Deep Learning
Natural Language Modelling

PA154 Jazykové modelování (12)
Pavel Rychlý
pary@fi.muni.cz
May 18, 2021

Deep Learning
■ deep neural networks: many layers
■ trained on big data
■ using advanced hardware: GPU, TPU
■ supervised, semi-supervised or unsupervised

Neural Networks
■ Neuron: many inputs, weights, transfer function (threshold), one output:
  y_k = φ( Σ_{j=1..m} w_kj · x_j )   (see the neuron sketch after the slides)
■ Input/Hidden/Output layer
■ One-hot representation of words/classes: [0 0 0 1 0 0 0 0]

Training Neural Networks
■ supervised training
■ example: input + expected result
■ difference between output and the expected result
■ adjust weights according to a learning rule
■ backpropagation (feedforward neural networks)
■ gradient of the loss function, stochastic gradient descent (SGD)
  (see the training-loop sketch after the slides)

[Figure: feedforward network with input, hidden and output layers; backpropagation]

Recurrent Neural Network (RNN)
■ dealing with long inputs
■ feedforward NN + internal state (memory)
■ finite impulse RNN: can be unrolled into a strictly feedforward NN
■ infinite impulse RNN: a directed cyclic graph
■ additional storage managed by the NN: gated state/memory
■ backpropagation through time

[Figure: RNN unfolded through time]

Long short-term memory (LSTM)
■ LSTM unit: cell, input gate, output gate and forget gate
■ cell = memory
■ gates regulate the flow of information into and out of the cell
  (see the LSTM sketch after the slides)

GRU, BRNN, Encoder-Decoder
■ Gated recurrent unit (GRU): fewer parameters than LSTM, memory = output
■ Bi-directional RNN (BRNN): two hidden layers of opposite directions connected to the same output
■ hierarchical, multilayer
■ Encoder-Decoder: variable input/output size, not a 1-1 mapping, two components:
■ Encoder: variable-length sequence → fixed-size state
■ Decoder: fixed-size state → variable-length sequence

[Figure: Input → Encoder → State → Decoder → Output]

Sequence to Sequence: Learning
► Encoder: input sequence → state
► Decoder: state + output sequence → output sequence

Sequence to Sequence: Using
► Encoder: input sequence → state
► Decoder: state + sentence delimiter → output
  (see the encoder-decoder sketch after the slides)

[Figure: encoder-decoder translating "They are watching ." into "Ils regardent ."]

Transformers
■ using context to compute token/sentence/document embeddings
  (see the embedding sketch after the slides)
■ BERT = Bidirectional Encoder Representations from Transformers
■ GPT = Generative Pre-trained Transformer
■ many variants: tokenization, attention, encoder/decoder connections

BERT
■ developed by Google
■ pre-training on raw text: masking tokens, is-next-sentence
  (see the fill-mask sketch after the slides)
■ big pre-trained models available
■ domain (task) adaptation

Masked tokens:
Input: The man went to the [MASK]1 . He bought a [MASK]2 of milk .
Labels: [MASK]1 = store; [MASK]2 = gallon

Is-next-sentence:
Sentence A = The man went to the store.  Sentence B = He bought a gallon of milk.  Label = IsNextSentence
Sentence A = The man went to the store.  Sentence B = Penguins are flightless.  Label = NotNextSentence
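
Code sketches

The following sketches illustrate the slide topics in Python; they are minimal, hedged examples, not the lecture's own code. First, the neuron formula y_k = φ(Σ_j w_kj x_j) applied to a one-hot input vector; the vector size, the random weights and the step threshold are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of a single neuron y_k = phi(sum_j w_kj * x_j) with a
# one-hot input vector; sizes and the step threshold are illustrative
# assumptions, not taken from the slides.

def step(z, threshold=0.0):
    """Threshold transfer function."""
    return 1.0 if z > threshold else 0.0

vocab_size = 8
x = np.zeros(vocab_size)
x[3] = 1.0                          # one-hot representation: [0 0 0 1 0 0 0 0]

w_k = np.random.randn(vocab_size)   # weights of neuron k
y_k = step(w_k @ x)                 # weighted sum, then transfer function
print(y_k)
```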
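
A minimal supervised training loop with backpropagation and SGD, assuming PyTorch; the toy network, the cross-entropy loss and the random data are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Sketch of supervised training: compare output with the expected result,
# backpropagate the gradient of the loss, adjust weights with SGD.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(32, 8)              # input examples
y = torch.randint(0, 4, (32,))      # expected results (class labels)

for epoch in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)     # difference between output and expected result
    loss.backward()                 # backpropagation: gradient of the loss
    optimizer.step()                # adjust weights (SGD learning rule)
```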
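
A sketch of an LSTM unit reading a sequence of one-hot word vectors, assuming PyTorch's nn.LSTM; the dimensions and the random input are illustrative.

```python
import torch
import torch.nn as nn

# Minimal LSTM sketch; dimensions are illustrative assumptions.
vocab_size, hidden_size, seq_len = 8, 16, 5
lstm = nn.LSTM(input_size=vocab_size, hidden_size=hidden_size, batch_first=True)

# A batch of one sequence of random one-hot word vectors: (1, seq_len, vocab_size).
x = torch.eye(vocab_size)[torch.randint(0, vocab_size, (1, seq_len))]

outputs, (h_n, c_n) = lstm(x)
# outputs: hidden state at every time step, shape (1, seq_len, hidden_size)
# h_n: final hidden state; c_n: final cell state (the "memory" the gates regulate)
print(outputs.shape, h_n.shape, c_n.shape)
```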
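
A sketch of the encoder-decoder idea with GRUs: the encoder compresses a variable-length input sequence into a fixed-size state, and the decoder starts from a sentence delimiter and emits output tokens one at a time. The toy vocabularies, the BOS/EOS ids and the greedy decoding loop are assumptions; the weights are untrained, so the output is not a real translation.

```python
import torch
import torch.nn as nn

# Minimal encoder-decoder sketch (GRU-based); toy vocabularies and greedy
# decoding are illustrative assumptions, not the lecture's model.
SRC_VOCAB, TGT_VOCAB, EMB, HID, BOS, EOS = 10, 12, 8, 16, 0, 1

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(SRC_VOCAB, EMB)
        self.rnn = nn.GRU(EMB, HID, batch_first=True)
    def forward(self, src):                      # variable-length input sequence
        _, state = self.rnn(self.emb(src))
        return state                             # fixed-size state

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(TGT_VOCAB, EMB)
        self.rnn = nn.GRU(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, TGT_VOCAB)
    def forward(self, token, state):             # one step: previous token + state
        output, state = self.rnn(self.emb(token), state)
        return self.out(output), state

encoder, decoder = Encoder(), Decoder()
src = torch.tensor([[3, 5, 7, 2]])               # e.g. token ids of "They are watching ."
state = encoder(src)

token = torch.tensor([[BOS]])                    # start from the sentence delimiter
for _ in range(10):                              # greedy decoding with untrained weights
    logits, state = decoder(token, state)
    token = logits.argmax(-1)
    if token.item() == EOS:
        break
```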
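
A sketch of computing contextual token and sentence embeddings with a pre-trained BERT model, assuming the Hugging Face transformers library and the bert-base-uncased checkpoint (neither is named in the slides); mean pooling is just one simple way to get a sentence embedding.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumes the Hugging Face transformers library and the bert-base-uncased
# checkpoint; both are illustrative choices, not from the slides.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

batch = tokenizer("They are watching .", return_tensors="pt")
with torch.no_grad():
    hidden = model(**batch).last_hidden_state    # contextual embedding of every token
sentence_embedding = hidden.mean(dim=1)          # crude sentence embedding (mean pooling)
print(hidden.shape, sentence_embedding.shape)
```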
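
The masked-token objective from the BERT slide, reproduced at inference time with a fill-mask pipeline; again the transformers library and the checkpoint name are assumptions, and the pipeline handles a single [MASK] per call.

```python
from transformers import pipeline

# Assumes the Hugging Face transformers library and bert-base-uncased;
# reproduces the slide's masked-token example for one mask.
unmasker = pipeline("fill-mask", model="bert-base-uncased")
for prediction in unmasker("The man went to the [MASK] ."):
    print(prediction["token_str"], round(prediction["score"], 3))
```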