Natural Language Modelling
PA154 Jazykové modelování (13)
Pavel Rychlý
pary@fi.muni.cz
May 25, 2021

Big models
■ bigger is better
■ many layers
■ need big machines
■ using advanced hardware: GPU, TPU

BERT
■ Google
■ pre-training on raw text
■ masking tokens, is-next-sentence
■ big pre-trained models available
■ domain (task) adaptation

Input: The man went to the [MASK]1 . He bought a [MASK]2 of milk .
Labels: [MASK]1 = store; [MASK]2 = gallon

Sentence A = The man went to the store.
Sentence B = He bought a gallon of milk.
Label = IsNextSentence

Sentence A = The man went to the store.
Sentence B = Penguins are flightless.
Label = NotNextSentence

GPT
■ OpenAI
■ GPT-2: 1.5 billion parameters
■ GPT-3: 175 billion parameters
■ very good text generation → potentially harmful applications
■ Misuse of Language Models
■ bias - generate stereotyped or prejudiced content: gender, race, religion
■ Sep 2020: Microsoft has "exclusive" use of GPT-3

T5: Text-To-Text Transfer Transformer
■ Google AI
■ transfer learning
■ C4: Colossal Clean Crawled Corpus

"translate English to German: That is good." → "Das ist gut."
"cola sentence: The course is jumping well." → "not acceptable"
"stsb sentence1: The rhino grazed on the grass. sentence2: A rhino is grazing in a field." → "3.8"
"summarize: state authorities dispatched emergency crews tuesday to survey the damage after an onslaught of severe weather in mississippi..." → "six people hospitalized after a storm in attala county."
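
The T5 examples above all follow one pattern: a task prefix plus the input text goes in, and the answer comes out as plain text, even for a regression score such as "3.8". The following is a minimal sketch of that casting step only. The function name `to_text_to_text` and its keyword arguments are hypothetical, chosen for illustration; only the prefixes themselves come from the slide, and no model is called.

```python
def to_text_to_text(task, **fields):
    """Build a T5-style input string: task prefix followed by the input text.

    Hypothetical helper for illustration; the prefixes mirror the slide examples.
    """
    if task == "translate":
        return f"translate {fields['source']} to {fields['target']}: {fields['text']}"
    if task == "cola":  # linguistic acceptability
        return f"cola sentence: {fields['sentence']}"
    if task == "stsb":  # semantic textual similarity; the target is a number written as text
        return f"stsb sentence1: {fields['sentence1']} sentence2: {fields['sentence2']}"
    if task == "summarize":
        return f"summarize: {fields['text']}"
    raise ValueError(f"unknown task: {task}")


# (input string, target string) pairs taken from the slide; every target is text.
examples = [
    (to_text_to_text("translate", source="English", target="German",
                     text="That is good."),
     "Das ist gut."),
    (to_text_to_text("cola", sentence="The course is jumping well."),
     "not acceptable"),
    (to_text_to_text("stsb",
                     sentence1="The rhino grazed on the grass.",
                     sentence2="A rhino is grazing in a field."),
     "3.8"),
]

for source_text, target_text in examples:
    print(f"{source_text!r} -> {target_text!r}")
```

Because both sides are strings, a single sequence-to-sequence model and a single loss can serve translation, classification, regression, and summarization alike.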

Pretrained models
■ huge training data
■ long training time
■ small model
■ fine-tuning on target task
■ multi-language models
■ universal tokenization: subword units
  ► Byte-Pair Encoding (BPE)
  ► WordPiece
  ► SentencePiece

ALBERT
■ A Lite BERT
■ factorized embedding parameters
■ cross-layer parameter sharing
■ inter-sentence coherence loss
  Next Sentence Prediction → Sentence-Order Prediction
■ much smaller
  No. parameters: 108M → 12M (base)

Sentence A = The man went to the store.
Sentence B = He bought a gallon of milk.
Label = IsNextSentence

Sentence A = The man went to the store.
Sentence B = Penguins are flightless.
Label = NotNextSentence

Intrinsic evaluation
■ direct evaluation of word embeddings
■ semantic similarity (WordSim-353, SimLex-999, ...)
■ word analogy (Google Analogy, BATS (Bigger Analogy Test Set))
■ concept categorization (ESSLLI-2008)

Extrinsic evaluation
■ using the model in a downstream NLP task
■ Part-of-Speech Tagging, Noun Phrase Chunking, Named Entity Recognition, Shallow Syntax Parsing, Semantic Role Labeling, Sentiment Analysis, Text Classification, Paraphrase Detection, Textual Entailment Detection

Multi-task benchmarks
■ GLUE (https://gluebenchmark.com)
  nine sentence- or sentence-pair language understanding tasks
■ SuperGLUE (https://super.gluebenchmark.com)
  more difficult language understanding tasks
■ XTREME - Cross-Lingual Transfer Evaluation of Multilingual Encoders (https://sites.research.google/xtreme)
  40 typologically diverse languages, 9 tasks

Libraries and Frameworks
■ Dive into Deep Learning: online book
  https://d2l.ai
■ Hugging Face Transformers: many ready-to-use models
  https://huggingface.co/transformers
■ jiant: library, many tasks for evaluation
  https://jiant.info
■ GluonNLP: reproduction of latest research results
  https://nlp.gluon.ai
■ low level libraries: NumPy, PyTorch, TensorFlow, MXNet
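
As a closing illustration of the subword units listed under Pretrained models, the following is a minimal sketch of how Byte-Pair Encoding can learn its merge rules, written in plain Python without any of the libraries above. The toy word counts, the `</w>` end-of-word marker, and all function names are assumptions made for this example; production tokenizers add details (byte-level fallback, special tokens, faster data structures) that are not shown here.

```python
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by the frequency of each word."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Rewrite every word so that each occurrence of `pair` becomes one symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merge rules from a word -> count mapping."""
    # Start from single characters plus an end-of-word marker (an assumption here).
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        # Most frequent pair; ties go to the pair that was seen first.
        best = max(pairs, key=pairs.get)
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab


# Toy corpus statistics, invented for illustration.
toy_counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges, vocab = learn_bpe(toy_counts, num_merges=4)
print(merges)
print(vocab)
```

Frequent endings such as "est" become single symbols after a few merges, while rare words stay split into smaller pieces, which is what lets one fixed-size vocabulary cover many languages.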