Transformers
PA154 Language Modeling (11.1)
Pavel Rychlý
pary@fi.muni.cz
May 7, 2024

Encoder-Decoder
■ variable input/output size, not a 1-1 mapping
■ two components:
■ Encoder: variable-length sequence -> fixed-size state
■ Decoder: fixed-size state -> variable-length sequence
[Figure: encoder-decoder block diagram]

Sequence to Sequence Learning
■ Encoder: input sequence -> state
■ Decoder: state + output sequence -> output sequence
[Figure: RNN encoder-decoder trained to translate "Ils regardent ." into "They are watching ."]

Sequence to Sequence: Using the Model
■ Encoder: input sequence -> state
■ Decoder: state + sentence delimiter -> output
[Figure: decoding "Ils regardent ." into "They are watching ." token by token, starting from the sentence delimiter and feeding each output back in]

Multi-layer encoder/decoder
[Figure: n× recurrent layers stacked on top of embeddings of the sources (encoder) and of the targets (decoder)]

Multi-layer encoder/decoder
■ Encoder: input sequence -> state
■ Decoder: state + sentence delimiter -> output
■ Problem: fixed-size state
[Figure: the same multi-layer architecture with a single fixed-size state passed from encoder to decoder]

Attention
■ each decoder layer has access to all hidden states of the last encoder layer
■ use attention to extract the important parts (a vector)
[Figure: a stack of encoders and decoders translating "Je suis étudiant" into "I am a student"]

Attention
■ use attention to extract the important parts (a vector)
■ important = similar to "me" (the current decoder state)
[Figure: the decoder state attending over all encoder outputs]

Self-Attention
■ instead of sequential processing
■ attention to previous (and following) tokens
■ fully parallel processing during training

Transformers
■ Attention is All You Need
■ self-attention in both encoder and decoder
■ masked self-attention and cross-attention (to the encoder) in the decoder
■ http://jalammar.github.io/illustrated-transformer/

Transformers variants
■ using context to compute token/sentence/document embeddings
■ BERT = Bidirectional Encoder Representations from Transformers
■ GPT = Generative Pre-trained Transformer
■ many variants: tokenization, attention, encoder/decoder connections
[Figure: comparison of the BERT and OpenAI GPT architectures]

BERT
■ Google
■ encoder only
■ pre-training on raw text
■ pre-training tasks: masked tokens, is-next-sentence prediction
■ big pre-trained models available
■ domain (task) adaptation

Input: The man went to the [MASK]1 . He bought a [MASK]2 of milk .
Labels: [MASK]1 = store; [MASK]2 = gallon

Sentence A = The man went to the store.
Sentence B = He bought a gallon of milk.
Label = IsNextSentence

Sentence A = The man went to the store.
Sentence B = Penguins are flightless.
Label = NotNextSentence
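The masked-token objective above can be probed directly with a released BERT checkpoint. The following is a minimal sketch, not part of the slides: it assumes the Hugging Face transformers library, PyTorch, and the publicly available bert-base-uncased checkpoint, and fills the two [MASK] positions from the example.

    # Sketch only: masked-token prediction with a pre-trained BERT checkpoint.
    # Assumes: pip install torch transformers
    import torch
    from transformers import AutoTokenizer, AutoModelForMaskedLM

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

    text = "The man went to the [MASK]. He bought a [MASK] of milk."
    inputs = tokenizer(text, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits          # shape: (1, seq_len, vocab_size)

    # Take the most likely vocabulary item at every [MASK] position.
    mask_positions = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    for pos in mask_positions:
        predicted_id = logits[0, pos].argmax().item()
        print(tokenizer.decode([predicted_id]))  # the model's guess for each blank

No fine-tuning is involved here; this is the raw pre-trained encoder, which is exactly the starting point of the next slide.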
Using pre-trained models
■ (BERT) trained on a huge amount of data
■ fine-tuned on task-specific data
■ alternatively, using the output of BERT as the input to a task-specific model (without modifying BERT)

GPT
■ OpenAI
■ decoder only
■ pre-training on raw text
■ trained on prediction of the next token
[Figure: GPT architecture (12× transformer decoder blocks with masked multi-head self-attention) and the input transformations used for classification, entailment, similarity and multiple-choice tasks]
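As a counterpart to the BERT sketch above, the next-token objective can be probed with a released GPT-style checkpoint. Again a minimal sketch, not from the slides, assuming the Hugging Face transformers library, PyTorch, and the publicly available gpt2 checkpoint.

    # Sketch only: next-token prediction with a pre-trained decoder-only model.
    # Assumes: pip install torch transformers
    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = "The man went to the"
    inputs = tokenizer(prompt, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits          # shape: (1, seq_len, vocab_size)

    # The last position carries the distribution over the next token;
    # masked self-attention guarantees it has only seen the prompt.
    next_id = logits[0, -1].argmax().item()
    print(repr(tokenizer.decode([next_id])))

Generation is just this step applied repeatedly, feeding each predicted token back in, mirroring the decoder loop from the sequence-to-sequence slides.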