MUNI FI

Transformers
PA154 Language Modeling (10.1)
Pavel Rychlý
pary@fi.muni.cz
April 20, 2023

Multi-layer encoder/decoder
■ Encoder: input sequence → state
■ Decoder: state + sentence delimiter → output
■ Problem: fixed-size state
[Figure: encoder-decoder network — embedding and recurrent layers (×n) with a fully connected output layer, mapping sources to targets]

Attention
■ each decoder layer has access to all hidden states from the last encoder layer
■ use attention to extract the important parts (a vector)
■ important = similar to "me"
[Figure: attention weights over the encoder's hidden states for the input sequence]

Self-Attention
■ instead of sequential processing, attention to previous (and following) tokens
■ fully parallel processing during training

Transformers
■ Attention is All You Need
■ self-attention in both encoder and decoder
■ masked self-attention in the decoder, plus cross-attention to the encoder outputs
■ http://jalammar.github.io/illustrated-transformer/

Transformers variants
■ using context to compute token/sentence/document embeddings
■ BERT = Bidirectional Encoder Representations from Transformers
■ GPT = Generative Pre-trained Transformer
■ many variants: tokenization, attention, encoder/decoder connections
[Figure: architecture comparison of BERT, OpenAI GPT, and ELMo]

BERT
■ Google
■ encoder only
■ pre-training on raw text: masking tokens, is-next-sentence
■ big pre-trained models available
■ domain (task) adaptation

Input: The man went to the [MASK]1 . He bought a [MASK]2 of milk .
Labels: [MASK]1 = store; [MASK]2 = gallon

Sentence A = The man went to the store .
Sentence B = He bought a gallon of milk .
Label = IsNextSentence

Sentence A = The man went to the store .
Sentence B = Penguins are flightless .
Label = NotNextSentence

Using pre-trained models (BERT)
■ trained on a huge amount of data
■ fine-tuned on task-specific data
■ or: using the output of BERT as an input to a task-specific model (without modification of BERT)

GPT
■ OpenAI
■ decoder only
■ pre-training on raw text
■ trained on prediction of the next token
[Figure: GPT decoder stack — text & position embeddings, transformer blocks with layer norm, prediction/classifier head]
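
To make the attention computation from the Attention and Self-Attention slides concrete, here is a minimal NumPy sketch of scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, as formulated in "Attention is All You Need"; the function names and toy dimensions are illustrative, not taken from the lecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V, mask=None):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # similarity of each query to each key
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # block masked (future) positions
    weights = softmax(scores, axis=-1)         # "important = similar to me"
    return weights @ V                         # weighted sum of the values

# Toy self-attention: 4 tokens, embedding dimension 8; the same sequence
# serves as queries, keys, and values.
x = np.random.randn(4, 8)
causal = np.tril(np.ones((4, 4), dtype=bool))  # decoder-style mask: no look-ahead
out = attention(x, x, x, mask=causal)
print(out.shape)  # (4, 8)
```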
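
The masked-token objective from the BERT slide can be tried with a big pre-trained model used as-is, without modification of BERT. A hedged sketch using the Hugging Face `transformers` library and its `bert-base-uncased` checkpoint — both are assumptions of this example; the slides do not prescribe a particular toolkit.

```python
from transformers import pipeline

# Pre-trained BERT used unchanged for masked-token prediction.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# The slide's example sentence with one token masked.
for candidate in unmasker("The man went to the [MASK] and bought a gallon of milk."):
    print(candidate["token_str"], round(candidate["score"], 3))
```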
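
Similarly, GPT-style next-token prediction can be demonstrated with the publicly available GPT-2 checkpoint; again, the choice of library and model is an assumption of this sketch, not something the slides specify.

```python
from transformers import pipeline

# GPT-2: a decoder-only model trained purely on next-token prediction.
generator = pipeline("text-generation", model="gpt2")
result = generator("The man went to the store and", max_new_tokens=20)
print(result[0]["generated_text"])
```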