[Michal Štefánik] Introducing a Domain Adaptation Framework for Neural Language Modeling (14. 10. 2021)
Image: Illustrative projection of samples for a specific task of interest, lying in the intersection of the pre-training and adaptation domains. Taken from [3].
Abstract
It is commonly agreed that the order in which humans learn the phenomena of a system is crucial for their eventual level of comprehension: if a person is first exposed to the most complex patterns of the system and only later gets a chance to learn the essentials, they will barely be able to work with the complex patterns.
Specifically in NLP, meta-learning of learning curricula has been applied to learning Word2Vec embeddings [2] with interesting qualitative gains, but at the price of exhaustive computational demands in the meta-learning process.
For deep neural networks, which benefit from transfer learning, a similar curriculum selection can additionally be conducted over the ordering of both data samples and training objectives. For example, [1] shows that permuting objectives according to a manually-crafted curriculum can bring significant performance gains in machine translation. The methodology for selecting an optimal curriculum is not yet clear, but we do know that domain adaptation preceding end-task training is beneficial in essentially all cases [3].
This motivates the creation of a framework that eases training with a composition of multiple objectives applied in a selected schedule, which is exactly what the Transformers Domain Adaptation framework, or TDA for short, does.
TDA rebuilds the standard training pipeline, normally centered on a single combination of an objective and a compatible model, by parametrizing both the objectives and the schedule of their application.
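To illustrate the idea, below is a minimal sketch in plain PyTorch of what "objectives and their schedule as parameters of the training loop" can look like. It is not the actual TDA API; all names (Objective, sequential_schedule, the toy encoder and tensors) are hypothetical and chosen only for this example.

import torch
from torch import nn


class Objective:
    """Wraps a loss computed from a shared encoder on the objective's own data stream."""

    def __init__(self, name, head, batches):
        self.name = name
        self.head = head          # objective-specific head on top of the shared encoder
        self.batches = batches    # iterable of (inputs, targets) pairs

    def loss(self, encoder, batch):
        inputs, targets = batch
        return nn.functional.mse_loss(self.head(encoder(inputs)), targets)


def sequential_schedule(objectives):
    """Yield (objective, batch) pairs, finishing one objective before starting the next."""
    for objective in objectives:
        for batch in objective.batches:
            yield objective, batch


def train(encoder, objectives, schedule, lr=1e-3):
    params = list(encoder.parameters()) + [p for o in objectives for p in o.head.parameters()]
    optimizer = torch.optim.SGD(params, lr=lr)
    for objective, batch in schedule(objectives):
        optimizer.zero_grad()
        loss = objective.loss(encoder, batch)
        loss.backward()
        optimizer.step()
        print(f"{objective.name}: {loss.item():.4f}")


if __name__ == "__main__":
    torch.manual_seed(0)
    encoder = nn.Linear(8, 16)  # stand-in for a shared pre-trained encoder

    # An "adaptation" objective on unlabeled-style data, followed by the end-task objective,
    # mirroring the adapt-first-then-fine-tune ordering discussed above.
    adaptation = Objective("adaptation", nn.Linear(16, 8),
                           [(torch.randn(4, 8), torch.randn(4, 8)) for _ in range(3)])
    end_task = Objective("end-task", nn.Linear(16, 2),
                         [(torch.randn(4, 8), torch.randn(4, 2)) for _ in range(3)])

    train(encoder, [adaptation, end_task], sequential_schedule)

The point of such a parametrization is that swapping sequential_schedule for, say, a schedule that interleaves the two objectives changes only the scheduling function; the objectives, the shared model, and the training loop stay untouched.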
This talk will introduce the building blocks of the framework and practically show how they can be used to train more accurate, or distributionally more robust, named entity recognition, text classification, and machine translation models.
https://github.com/authoranonymous321/DA
Readings
[1]: Popel, M. et al.: Transforming machine translation: a deep learning system reaches news translation quality comparable to human professionals. https://www.nature.com/articles/s41467-020-18073-9
[2]: Tsvetkov, Y. et al.: Learning the Curriculum with Bayesian Optimization for Task-Specific Word Representation Learning. https://aclanthology.org/P16-1013
[3]: Gururangan, S. et al.: Don't Stop Pretraining: Adapt Language Models to Domains and Tasks. https://aclanthology.org/2020.acl-main.740/