Understanding LLM
PA154 Language Modeling
Pavel Rychlý (pary@fi.muni.cz)
28 February 2024

LLM = Large Language Models
- a new term, a big hype
- big expectations (AGI)
- little understanding

Language
- not a normal distribution (Zipf's law)
- many rare events
- ambiguous words
- variable: changing over time, sublanguages

Symbolic processing
- ambiguity, variability, and rare events make language hard for symbolic manipulation
- multiple words for the same meaning
- the same word for multiple meanings

Word embeddings
- words represented as vectors
- one word = a vector of 500 numbers
- similar words have closer vectors
- individual dimensions are not important; combinations of dimensions can represent different features

Word features
- grammatical: part of speech, number, gender
- syntactic: used with "in"/"at", always with a particle
- semantic: positive sentiment, movement meaning, fruits
- style: formal, colloquial
- domain: math, biology
- form

Word features
- features are not independent: math implies scientific; used with "in" implies noun; in capital form implies proper noun
- features are not discrete
- each feature corresponds to a (set of) dimensions
- most features are valid only for a small set of words
- most words have (almost) zero for most features
- multiple meanings = the union of features

Phrase/sentence embeddings
- the vector space is very big: even with just 2 values (0, 1) in each dimension there are 2^500 combinations, about 10^150
- the same vector space is used for phrases
- a phrase embedding can be the average of its word vectors (see the sketch below)
- different words with the same meaning get the same embedding: "Czech president", "president of the Czech Republic"
- the same holds for sentences, paragraphs, documents, ...
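A minimal sketch (Python/NumPy) of the ideas on these slides: words as vectors, similarity as vector closeness, and a phrase embedding as the average of its word vectors. The vectors, dimensions, and word list here are toy values invented for illustration, not taken from any real embedding model.

    import numpy as np

    # Toy 4-dimensional "embeddings" (real models use hundreds of dimensions,
    # e.g. the 500 mentioned on the slide). Values are invented for illustration.
    emb = {
        "president": np.array([0.9, 0.1, 0.3, 0.0]),
        "czech":     np.array([0.2, 0.8, 0.1, 0.1]),
        "republic":  np.array([0.3, 0.7, 0.2, 0.0]),
        "banana":    np.array([0.0, 0.1, 0.9, 0.8]),
    }

    def cosine(u, v):
        """Similar words have closer vectors -> higher cosine similarity."""
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    def phrase_embedding(words):
        """Phrase embedding as the average of word vectors (same vector space)."""
        return np.mean([emb[w] for w in words], axis=0)

    p1 = phrase_embedding(["czech", "president"])
    p2 = phrase_embedding(["president", "czech", "republic"])
    print(cosine(p1, p2))             # high: near-synonymous phrases
    print(cosine(p1, emb["banana"]))  # low: unrelated word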
Neurons
- input: a vector; output: a number between 0 and 1
- σ(x · w + b)
- a linear classifier
- a hyperplane cutting the vector space
- selects one feature

Neural networks
- the second layer can implement AND/OR operators
- grouping of features: all 10 features, or 5 out of 10 features
- the (original) vector space is cut by several hyperplanes
- regions in the vector space

Transformers
- the "Attention Is All You Need" paper
- attention is important, but there are other important components
- encoder/decoder
- layers of the same structure with different parameters
- input: a list of tokens (words)
- the output depends on the model

Transformers
- (figure)

BERT - encoder only
- output: a vector (embedding) for each token
- each layer: attention, a feed-forward network (2 layers), direct links

Direct links
- LayerNorm(x + Sublayer(x))
- adds some information to the original embedding
- for each token separately
- changes the original token embedding in small steps
- some part of the original information is preserved

Analogy
- for each token we have a stream
- information/signal is sent down the stream
- each layer can make a small transformation: adding information from the context, decreasing/increasing the importance of a feature

Feed forward
- first layer: each single neuron selects a feature; the result is a vector of features
- second layer: selects feature combinations and transforms them back into the word vector space

Attention
- highlights some features from the context
- multi-head = more features in a single step
- transforms them into the word vector space

GPT - decoder only
- auto-regressive decoder
- the new token is generated from the last token

Summary
- a transformer is a flow of information (see the sketch below)
- it starts with word embeddings
- adding/changing features = moving in the vector space
- all vectors are in the same vector space
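A minimal NumPy sketch of one encoder layer as described on these slides: single-head self-attention plus a two-layer feed-forward network, each wrapped in LayerNorm(x + Sublayer(x)) so that part of the original token embedding is preserved. All sizes and parameters are toy values chosen only to make the sketch runnable, not taken from BERT or any real model.

    import numpy as np

    rng = np.random.default_rng(0)
    d = 16          # embedding dimension (toy size; BERT uses 768)
    n_tokens = 5    # length of the input token sequence

    def layer_norm(x, eps=1e-5):
        """Normalize each token vector separately (mean 0, variance 1)."""
        return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

    def softmax(x):
        e = np.exp(x - x.max(-1, keepdims=True))
        return e / e.sum(-1, keepdims=True)

    def attention(x, Wq, Wk, Wv):
        """Single-head self-attention: each token highlights features from its context."""
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        scores = softmax(q @ k.T / np.sqrt(d))   # (n_tokens, n_tokens) mixing weights
        return scores @ v                        # context information per token

    def feed_forward(x, W1, b1, W2, b2):
        """Two layers: select features, then combine them back into the word vector space."""
        return np.maximum(0, x @ W1 + b1) @ W2 + b2

    # Randomly initialised parameters, just to make the sketch runnable.
    Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
    W1, b1 = rng.normal(size=(d, 4 * d)), np.zeros(4 * d)
    W2, b2 = rng.normal(size=(4 * d, d)), np.zeros(d)

    x = rng.normal(size=(n_tokens, d))           # input token embeddings

    # One encoder layer: each sub-layer only adds to the stream,
    # so part of the original token embedding is preserved (the "direct links").
    x = layer_norm(x + attention(x, Wq, Wk, Wv))
    x = layer_norm(x + feed_forward(x, W1, b1, W2, b2))
    print(x.shape)   # (5, 16): still one vector per token, in the same vector space

Stacking several such layers gives the flow of information described in the summary: each layer makes a small transformation of every token's stream while keeping everything in the same vector space.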