Understanding LLM
PA154 Language Modeling
Pavel Rychlý (pary@fi.muni.cz)
28 February 2024

LLM = Large Language Models
- a new term, a big hype
- big expectations (AGI)
- little understanding

Language
- not a normal distribution (Zipf's law)
- many rare events
- ambiguous words
- variable: changing over time, sublanguages

Symbolic processing
- ambiguity, variability, and rare events make language hard for symbolic manipulation
- multiple words for the same meaning
- the same word for multiple meanings

Word embeddings
- words represented as vectors
- one word = a vector of 500 numbers
- similar words have closer vectors
- individual dimensions are not important; combinations of dimensions can represent different features

Word features
- grammatical: part of speech, number, gender
- syntactic: used with "in"/"at", always with a particle
- semantic: positive sentiment, movement meaning, fruits
- style: formal, colloquial
- domain: math, biology
- form

Word features
- features are not independent: math implies scientific; used with "in" implies noun; in capital form implies proper noun
- features are not discrete
- each feature corresponds to a (set of) dimensions
- most features are valid only for a small set of words
- most words have (almost) zero for most features
- multiple meanings = the union of features

Phrase/sentence embeddings
- the vector space is very big: even with just 2 values (0, 1) in each dimension there are 2^500 combinations, about 10^150
- the same vector space is used for phrases
- a phrase embedding can be the average of its word vectors (see the sketch below)
- different words with the same meaning get the same embedding: "Czech president", "president of the Czech Republic"
- the same holds for sentences, paragraphs, documents, ...
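A minimal sketch (Python/NumPy) of the ideas on these slides: words as vectors, similarity as vector closeness, and a phrase embedding as the average of its word vectors. The vectors, dimensions, and word list here are toy values invented for illustration, not taken from any real embedding model.

    import numpy as np

    # Toy 4-dimensional "embeddings" (real models use hundreds of dimensions,
    # e.g. the 500 mentioned on the slide). Values are invented for illustration.
    emb = {
        "president": np.array([0.9, 0.1, 0.3, 0.0]),
        "czech":     np.array([0.2, 0.8, 0.1, 0.1]),
        "republic":  np.array([0.3, 0.7, 0.2, 0.0]),
        "banana":    np.array([0.0, 0.1, 0.9, 0.8]),
    }

    def cosine(u, v):
        """Similar words have closer vectors -> higher cosine similarity."""
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    def phrase_embedding(words):
        """Phrase embedding as the average of word vectors (same vector space)."""
        return np.mean([emb[w] for w in words], axis=0)

    p1 = phrase_embedding(["czech", "president"])
    p2 = phrase_embedding(["president", "czech", "republic"])
    print(cosine(p1, p2))             # high: near-synonymous phrases
    print(cosine(p1, emb["banana"]))  # low: unrelated word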
Neurons
- input: a vector; output: a number between 0 and 1
- σ(x · w + b)
- a linear classifier
- a hyperplane cutting the vector space
- selects one feature

Neural networks
- the second layer can implement AND/OR operators
- grouping of features: all 10 features, or 5 out of 10 features
- the (original) vector space is cut by several hyperplanes
- regions in the vector space

Transformers
- the "Attention Is All You Need" paper
- attention is important, but there are other important components
- encoder/decoder
- layers of the same structure with different parameters
- input: a list of tokens (words)
- the output depends on the model

Transformers
- (figure)

BERT - encoder only
- output: a vector (embedding) for each token
- each layer: attention, a feed-forward network (2 layers), direct links

Direct links
- LayerNorm(x + Sublayer(x))
- adds some information to the original embedding
- for each token separately
- changes the original token embedding in small steps
- some part of the original information is preserved

Analogy
- for each token we have a stream
- information/signal is sent down the stream
- each layer can make a small transformation: adding information from the context, decreasing/increasing the importance of a feature

Feed forward
- first layer: each single neuron selects a feature; the result is a vector of features
- second layer: selects feature combinations and transforms them back into the word vector space

Attention
- highlights some features from the context
- multi-head = more features in a single step
- transforms them into the word vector space

GPT - decoder only
- auto-regressive decoder
- the new token is generated from the last token

Summary
- a transformer is a flow of information (see the sketch below)
- it starts with word embeddings
- adding/changing features = moving in the vector space
- all vectors are in the same vector space
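A minimal NumPy sketch of one encoder layer as described on these slides: single-head self-attention plus a two-layer feed-forward network, each wrapped in LayerNorm(x + Sublayer(x)) so that part of the original token embedding is preserved. All sizes and parameters are toy values chosen only to make the sketch runnable, not taken from BERT or any real model.

    import numpy as np

    rng = np.random.default_rng(0)
    d = 16          # embedding dimension (toy size; BERT uses 768)
    n_tokens = 5    # length of the input token sequence

    def layer_norm(x, eps=1e-5):
        """Normalize each token vector separately (mean 0, variance 1)."""
        return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

    def softmax(x):
        e = np.exp(x - x.max(-1, keepdims=True))
        return e / e.sum(-1, keepdims=True)

    def attention(x, Wq, Wk, Wv):
        """Single-head self-attention: each token highlights features from its context."""
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        scores = softmax(q @ k.T / np.sqrt(d))   # (n_tokens, n_tokens) mixing weights
        return scores @ v                        # context information per token

    def feed_forward(x, W1, b1, W2, b2):
        """Two layers: select features, then combine them back into the word vector space."""
        return np.maximum(0, x @ W1 + b1) @ W2 + b2

    # Randomly initialised parameters, just to make the sketch runnable.
    Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
    W1, b1 = rng.normal(size=(d, 4 * d)), np.zeros(4 * d)
    W2, b2 = rng.normal(size=(4 * d, d)), np.zeros(d)

    x = rng.normal(size=(n_tokens, d))           # input token embeddings

    # One encoder layer: each sub-layer only adds to the stream,
    # so part of the original token embedding is preserved (the "direct links").
    x = layer_norm(x + attention(x, Wq, Wk, Wv))
    x = layer_norm(x + feed_forward(x, W1, b1, W2, b2))
    print(x.shape)   # (5, 16): still one vector per token, in the same vector space

Stacking several such layers gives the flow of information described in the summary: each layer makes a small transformation of every token's stream while keeping everything in the same vector space.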