Deep Learning
Natural Language Modelling

PA154 Jazykové modelování (12)
Pavel Rychlý
pary@fi.muni.cz
May 18, 2021

Deep Learning
■ deep neural networks: many layers
■ trained on big data
■ using advanced hardware: GPU, TPU
■ supervised, semi-supervised or unsupervised

Neural Networks
■ Neuron: many inputs, weights, transfer function (threshold), one output:
  y_k = φ( Σ_{j=1..m} w_kj · x_j )   (see the neuron sketch after the slides)
■ Input/Hidden/Output layer
■ One-hot representation of words/classes: [0 0 0 1 0 0 0 0]

Training Neural Networks
■ supervised training
■ example: input + expected result
■ difference between output and the expected result
■ adjust weights according to a learning rule
■ backpropagation (feedforward neural networks)
■ gradient of the loss function, stochastic gradient descent (SGD)
  (see the training-loop sketch after the slides)

[Figure: feedforward network with input, hidden and output layers; backpropagation]

Recurrent Neural Network (RNN)
■ dealing with long inputs
■ feedforward NN + internal state (memory)
■ finite impulse RNN: can be unrolled into a strictly feedforward NN
■ infinite impulse RNN: a directed cyclic graph
■ additional storage managed by the NN: gated state/memory
■ backpropagation through time

[Figure: RNN unfolded through time]

Long short-term memory (LSTM)
■ LSTM unit: cell, input gate, output gate and forget gate
■ cell = memory
■ gates regulate the flow of information into and out of the cell
  (see the LSTM sketch after the slides)

GRU, BRNN, Encoder-Decoder
■ Gated recurrent unit (GRU): fewer parameters than LSTM, memory = output
■ Bi-directional RNN (BRNN): two hidden layers of opposite directions connected to the same output
■ hierarchical, multilayer
■ Encoder-Decoder: variable input/output size, not a 1-1 mapping, two components:
■ Encoder: variable-length sequence → fixed-size state
■ Decoder: fixed-size state → variable-length sequence

[Figure: Input → Encoder → State → Decoder → Output]

Sequence to Sequence: Learning
► Encoder: input sequence → state
► Decoder: state + output sequence → output sequence

Sequence to Sequence: Using
► Encoder: input sequence → state
► Decoder: state + sentence delimiter → output
  (see the encoder-decoder sketch after the slides)

[Figure: encoder-decoder translating "They are watching ." into "Ils regardent ."]

Transformers
■ using context to compute token/sentence/document embeddings
  (see the embedding sketch after the slides)
■ BERT = Bidirectional Encoder Representations from Transformers
■ GPT = Generative Pre-trained Transformer
■ many variants: tokenization, attention, encoder/decoder connections

BERT
■ developed by Google
■ pre-training on raw text: masking tokens, is-next-sentence
  (see the fill-mask sketch after the slides)
■ big pre-trained models available
■ domain (task) adaptation

Masked tokens:
Input: The man went to the [MASK]1 . He bought a [MASK]2 of milk .
Labels: [MASK]1 = store; [MASK]2 = gallon

Is-next-sentence:
Sentence A = The man went to the store.  Sentence B = He bought a gallon of milk.  Label = IsNextSentence
Sentence A = The man went to the store.  Sentence B = Penguins are flightless.  Label = NotNextSentence
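
Code sketches

The following sketches illustrate the slide topics in Python; they are minimal, hedged examples, not the lecture's own code. First, the neuron formula y_k = φ(Σ_j w_kj x_j) applied to a one-hot input vector; the vector size, the random weights and the step threshold are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of a single neuron y_k = phi(sum_j w_kj * x_j) with a
# one-hot input vector; sizes and the step threshold are illustrative
# assumptions, not taken from the slides.

def step(z, threshold=0.0):
    """Threshold transfer function."""
    return 1.0 if z > threshold else 0.0

vocab_size = 8
x = np.zeros(vocab_size)
x[3] = 1.0                          # one-hot representation: [0 0 0 1 0 0 0 0]

w_k = np.random.randn(vocab_size)   # weights of neuron k
y_k = step(w_k @ x)                 # weighted sum, then transfer function
print(y_k)
```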
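
A minimal supervised training loop with backpropagation and SGD, assuming PyTorch; the toy network, the cross-entropy loss and the random data are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Sketch of supervised training: compare output with the expected result,
# backpropagate the gradient of the loss, adjust weights with SGD.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(32, 8)              # input examples
y = torch.randint(0, 4, (32,))      # expected results (class labels)

for epoch in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)     # difference between output and expected result
    loss.backward()                 # backpropagation: gradient of the loss
    optimizer.step()                # adjust weights (SGD learning rule)
```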
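
A sketch of an LSTM unit reading a sequence of one-hot word vectors, assuming PyTorch's nn.LSTM; the dimensions and the random input are illustrative.

```python
import torch
import torch.nn as nn

# Minimal LSTM sketch; dimensions are illustrative assumptions.
vocab_size, hidden_size, seq_len = 8, 16, 5
lstm = nn.LSTM(input_size=vocab_size, hidden_size=hidden_size, batch_first=True)

# A batch of one sequence of random one-hot word vectors: (1, seq_len, vocab_size).
x = torch.eye(vocab_size)[torch.randint(0, vocab_size, (1, seq_len))]

outputs, (h_n, c_n) = lstm(x)
# outputs: hidden state at every time step, shape (1, seq_len, hidden_size)
# h_n: final hidden state; c_n: final cell state (the "memory" the gates regulate)
print(outputs.shape, h_n.shape, c_n.shape)
```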
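
A sketch of the encoder-decoder idea with GRUs: the encoder compresses a variable-length input sequence into a fixed-size state, and the decoder starts from a sentence delimiter and emits output tokens one at a time. The toy vocabularies, the BOS/EOS ids and the greedy decoding loop are assumptions; the weights are untrained, so the output is not a real translation.

```python
import torch
import torch.nn as nn

# Minimal encoder-decoder sketch (GRU-based); toy vocabularies and greedy
# decoding are illustrative assumptions, not the lecture's model.
SRC_VOCAB, TGT_VOCAB, EMB, HID, BOS, EOS = 10, 12, 8, 16, 0, 1

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(SRC_VOCAB, EMB)
        self.rnn = nn.GRU(EMB, HID, batch_first=True)
    def forward(self, src):                      # variable-length input sequence
        _, state = self.rnn(self.emb(src))
        return state                             # fixed-size state

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(TGT_VOCAB, EMB)
        self.rnn = nn.GRU(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, TGT_VOCAB)
    def forward(self, token, state):             # one step: previous token + state
        output, state = self.rnn(self.emb(token), state)
        return self.out(output), state

encoder, decoder = Encoder(), Decoder()
src = torch.tensor([[3, 5, 7, 2]])               # e.g. token ids of "They are watching ."
state = encoder(src)

token = torch.tensor([[BOS]])                    # start from the sentence delimiter
for _ in range(10):                              # greedy decoding with untrained weights
    logits, state = decoder(token, state)
    token = logits.argmax(-1)
    if token.item() == EOS:
        break
```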
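
A sketch of computing contextual token and sentence embeddings with a pre-trained BERT model, assuming the Hugging Face transformers library and the bert-base-uncased checkpoint (neither is named in the slides); mean pooling is just one simple way to get a sentence embedding.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumes the Hugging Face transformers library and the bert-base-uncased
# checkpoint; both are illustrative choices, not from the slides.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

batch = tokenizer("They are watching .", return_tensors="pt")
with torch.no_grad():
    hidden = model(**batch).last_hidden_state    # contextual embedding of every token
sentence_embedding = hidden.mean(dim=1)          # crude sentence embedding (mean pooling)
print(hidden.shape, sentence_embedding.shape)
```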
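
The masked-token objective from the BERT slide, reproduced at inference time with a fill-mask pipeline; again the transformers library and the checkpoint name are assumptions, and the pipeline handles a single [MASK] per call.

```python
from transformers import pipeline

# Assumes the Hugging Face transformers library and bert-base-uncased;
# reproduces the slide's masked-token example for one mask.
unmasker = pipeline("fill-mask", model="bert-base-uncased")
for prediction in unmasker("The man went to the [MASK] ."):
    print(prediction["token_str"], round(prediction["score"], 3))
```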