Mathematical Representation of Words and Their Meanings
PLIN064 Introduction to Digital Humanities
Zuzana Nevěřilová
xpopelk@fi.muni.cz
Natural Language Processing Centre, B203
Faculty of Informatics, Masaryk University
20 November 2019

Mathematical Representation of Words and Their Meanings

Assumptions:
• words (or other finite representations) exist
• words are used in texts with different frequencies (in a certain distribution)
• some words are used together more often than other word tuples (n-grams)

word → number = not practical (which numbers are similar?)
word → vector = practical (vector similarity is measured by the angle between the vectors)

How to Convert One Word to a Vector

There are ∞ possibilities; however, we never compute with a single isolated word, so let's focus on words in contexts.

The cat sat on the mat ■

       the  cat  sat  on   mat  ■
The     1    0    0    0    0    0
cat     0    1    0    0    0    0
sat     0    0    1    0    0    0
on      0    0    0    1    0    0
the     1    0    0    0    0    0
mat     0    0    0    0    1    0
■       0    0    0    0    0    1

one hot encoding = the angle between the vectors of two different words is always 90°

vocabulary matrix of size V = 6
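The one hot encoding of "The cat sat on the mat" and the 90° claim can be checked with a minimal sketch. The helper names (`build_vocabulary`, `one_hot`, `angle_degrees`) are hypothetical, the token "." stands in for the end marker ■, and indices start at 0 here rather than at 1 as on the slides.

```python
import math

def build_vocabulary(tokens):
    """Assign each distinct token an index in order of first appearance."""
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

def one_hot(token, vocab):
    """Return a vector of zeros with a single 1 at the token's index."""
    vec = [0] * len(vocab)
    vec[vocab[token]] = 1
    return vec

def angle_degrees(u, v):
    """Angle between two non-zero vectors, computed from the cosine."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos))

# Lowercased tokens; "." is an assumed stand-in for the end marker.
tokens = ["the", "cat", "sat", "on", "the", "mat", "."]
vocab = build_vocabulary(tokens)
vectors = [one_hot(t, vocab) for t in tokens]

print(len(vocab))                                   # vocabulary size V
print(angle_degrees(one_hot("the", vocab), one_hot("cat", vocab)))
```

Both occurrences of "the" map to the same vector, which is why seven tokens need only six dimensions.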
One Hot Encoding

There is no information about meaning in one hot encoded vectors. The only information present is:
• the word is in the vocabulary
• the word is different from another word

For encoding/decoding, we need a lookup table:

the   1
cat   2
sat   3
on    4
mat   5
■     6

One Hot Encoding
• does not encode information about word distribution
• the vector space dimensionality is too big

Challenge:
• make the vector space more dense
• encode the words so that similar words have vectors pointing in similar directions (the angle between the two vectors → 0°)

Training Word Embeddings

Two basic approaches:
• count-based: co-occurrence matrix
• context-based: skip-grams in a sliding window

       the   cat   …
the    0.4   0.3   0.1
cat    0.3   0.1   0.2
…      0.1   0.2   0.1

the cat sat on the mat
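The count-based approach can be illustrated with a short sketch that counts co-occurrences in a sliding window over "the cat sat on the mat". The window size of 1 and the normalization by the total count are assumptions made for illustration; the values on the slide (0.4, 0.3, ...) are not reproduced by this toy sentence.

```python
from collections import defaultdict

def cooccurrence_counts(tokens, window=1):
    """Count how often each ordered pair (word, neighbour) occurs
    within `window` positions of each other."""
    counts = defaultdict(int)
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[(word, tokens[j])] += 1
    return counts

def normalize(counts):
    """Turn raw counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}

tokens = "the cat sat on the mat".split()
counts = cooccurrence_counts(tokens, window=1)
probs = normalize(counts)

for (w1, w2), p in sorted(probs.items()):
    print(w1, w2, round(p, 2))
```

Because every pair is counted from both sides, the resulting matrix is symmetric, as is the matrix shown on the slide.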
Context-Based Representation Learning

In (supervised) machine learning, we provide the algorithm with:
• inputs and correct outputs in many examples
• a loss function

The algorithm:
1. splits the input data into training and validation sets
2. makes hypotheses about the function from input to output on the training data
3. measures the loss (the error) on the validation data
4. changes the hypothesis and recalculates
machine learning = iterative process

Context-Based Representation Learning

the cat sat on the mat

Context       Target
(∅, cat)      the
(the, sat)    cat
(cat, on)     sat
(sat, the)    on
(on, mat)     the

The learning objective: given the context word w_i, what is the target word w_j?
The loss function: the number of correct targets in the validation set (the more correct targets, the lower the loss)
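The Context/Target table for "the cat sat on the mat" can be generated with a small sketch. The function name `context_target_pairs` is hypothetical, and `None` is an assumed stand-in for the empty context at the sentence boundary. The sketch also emits the pair for the final word "mat", which the table does not list.

```python
def context_target_pairs(tokens, pad=None):
    """For every token, pair its (left, right) neighbours with the token itself.
    `pad` marks a missing neighbour at the start or end of the sentence."""
    padded = [pad] + list(tokens) + [pad]
    pairs = []
    for i in range(1, len(padded) - 1):
        context = (padded[i - 1], padded[i + 1])
        pairs.append((context, padded[i]))
    return pairs

tokens = "the cat sat on the mat".split()
pairs = context_target_pairs(tokens)

for context, target in pairs:
    print(context, "->", target)
```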
Context-Based Representation Learning

vocabulary matrix V, embedding matrix E, context matrix C

Words w_i and w_j are one hot encoded in V as vectors v_i and v_j.

The i-th row is selected from E (using multiplication by v_i) → embedding vector e_i

The j-th column is selected from C (using multiplication by e_i; e_i is not one hot) → context vector c_j

By the two transformations, we calculate that the word w_j is the target for the context w_i. The loss function decides whether the calculation is (in)correct.

Context-Based Representation Learning: Summary
• we need a lookup table
• we build the vocabulary matrix arbitrarily
• we build the context matrix from observations
• the embedding matrix is calculated iteratively
• the vectors calculated using the embeddings encode similar words as similar vectors (with a small angle between them)

Example DH project:

Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., and Kuksa, P. (2011). Natural language processing (almost) from scratch. J. Mach. Learn. Res., 12:2493-2537.
https://lilianweng.github.io/lil-log/2017/10/15/learning-word-embedding.html
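As a sketch of the two transformations described in the Context-Based Representation Learning slides, the following pure-Python example multiplies a one hot vector by an embedding matrix E and then multiplies the result by a context matrix C. Several details are assumptions, not taken from the slides: the embedding size D = 3, the random initial values, the softmax over the scores, and the negative log-probability used as the loss (a common choice, and not the loss named on the slide). No training step is shown.

```python
import math
import random

def one_hot(index, size):
    """One hot row vector with a 1.0 at `index`."""
    return [1.0 if k == index else 0.0 for k in range(size)]

def vec_mat(vec, mat):
    """Row vector times matrix; the result has one entry per matrix column."""
    cols = len(mat[0])
    return [sum(vec[r] * mat[r][c] for r in range(len(mat))) for c in range(cols)]

def softmax(scores):
    """Turn arbitrary scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

vocab = ["the", "cat", "sat", "on", "mat"]
V, D = len(vocab), 3          # D is a hypothetical embedding size

rng = random.Random(0)        # fixed seed so the run is repeatable
E = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(V)]  # V x D
C = [[rng.uniform(-1, 1) for _ in range(V)] for _ in range(D)]  # D x V

i = vocab.index("cat")        # context word w_i
j = vocab.index("sat")        # target word w_j

v_i = one_hot(i, V)
e_i = vec_mat(v_i, E)         # multiplication by v_i selects the i-th row of E
scores = vec_mat(e_i, C)      # entry j is the dot product of e_i with column j of C
probs = softmax(scores)       # assumed: scores are normalized with softmax
loss = -math.log(probs[j])    # assumed loss: negative log-probability of the target

print(e_i == E[i])
print(loss)
```

In an iterative training loop, the entries of E and C would be adjusted to lower this loss over many (context, target) pairs; the rows of E are then the word embeddings.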