[Michal Štefánik]: Unsupervised Estimation of Out-of-Distribution Performance (April 1, 2021)
Abstract
Neural language models consistently advance the state of the art on a wide range of NLP tasks, but they do not perform consistently well under domain shift, i.e. when applied to samples from a different language domain than the one they were trained on. This prevents their deployment in some critical applications. The questionable comparability of models evaluated only in-domain also slows further research progress in this direction.
We propose a set of simple evaluation methods that estimate the expected performance of a system on out-of-distribution (OOD) samples. We show how well each of these methods corresponds to the true, measured OOD performance, and we demonstrate the practical implications of our work for zero-shot evaluation.
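The abstract does not name the concrete estimators, so purely as an illustration of the evaluation setup, the sketch below uses one common unsupervised proxy, the average maximum softmax confidence over unlabeled samples, and checks how well it tracks true accuracy across several synthetic domains. The function name, the synthetic data, and the choice of proxy are assumptions for illustration, not the talk's actual method.

```python
import numpy as np
from scipy.stats import pearsonr

def avg_max_confidence(probs: np.ndarray) -> float:
    """Unsupervised performance estimate: mean maximum softmax
    probability over unlabeled samples (probs: n_samples x n_classes)."""
    return float(probs.max(axis=1).mean())

# Synthetic stand-in for per-domain model predictions; in practice, probs
# would come from a trained model and true_accs from labeled OOD test sets.
rng = np.random.default_rng(42)
estimates, true_accs = [], []
for _ in range(8):  # eight hypothetical OOD domains of varying difficulty
    n_samples, n_classes = 500, 3
    logits = rng.normal(scale=rng.uniform(0.5, 3.0), size=(n_samples, n_classes))
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    # Sample labels from the predictive distribution so that more confident
    # domains are also genuinely easier -- a toy substitute for real data.
    labels = np.array([rng.choice(n_classes, p=row) for row in probs])
    estimates.append(avg_max_confidence(probs))
    true_accs.append(float((probs.argmax(axis=1) == labels).mean()))

# How well does the unsupervised estimate track true OOD accuracy?
r, p = pearsonr(estimates, true_accs)
print(f"Pearson r = {r:.3f} (p = {p:.3g})")
```

A high correlation between the unsupervised estimate and the true per-domain accuracy is what makes such a proxy useful for the zero-shot setting, where no labeled OOD data is available.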
Finally, we present a set of interesting observations that adjust our understanding of neural language models, based on the novel insights these evaluation methods provide.