MUNI FI | PA154 Language Modeling (8.2)
Word Embeddings: Evaluation of Word Embeddings
Pavel Rychlý
Natural Language Processing Centre, Faculty of Informatics, Masaryk University
April 4, 2023

Evaluation of Word Embeddings
■ many hyperparameters, different training data
■ different results even for the same parameters and data
■ what is better? how to compare the quality of vectors?
■ evaluate a direct outcome: word similarities

Sketch Engine Thesaurus
Most similar words to "queen" in the British National Corpus (BNC), freq = 7,872 (70.10 per million):

  Lemma       Score   Freq
  king        0.242   16,899
  prince      0.213    6,355
  charles     0.189    8,952
  elizabeth   0.177    3,567
  edward      0.176    6,484
  mary        0.173    6,870
  gentleman   0.171    6,274
  lady        0.170   11,905
  husband     0.167   11,669
  sister      0.167    8,062
  mother      0.164   27,536
  princess    0.160    2,944
  father      0.159   23,824
  wife        0.157   18,308
  brother     0.155   11,049
  henry       0.151    6,699
  daughter    0.150   11,216
  anne        0.149    4,386

[Figure: word cloud of the most similar words to "queen" in the BNC.]

Thesaurus evaluation: Gold standard

  Source               Most similar words to queen
  Serelex              king, brooklyn, bowie, prime minister, mary, bronx, rolling stone, elton john, royal family, princess
  Thesaurus.com        monarch, ruler, consort, empress, regent, female ruler, female sovereign, queen consort, queen dowager
  SkE on BNC           king, prince, charles, elizabeth, edward, mary, gentleman, lady, husband, sister, mother, princess, father
  SkE on enTenTen08    princess, prince, king, emperor, monarch, lord, lady, sister, lover, ruler, goddess, hero, mistress, warrior
  word2vec on BNC      princess, prince, Princess, king, Diana, Queen, duke, palace, Buckingham, duchess, lady-in-waiting, Prince
  powerthesaurus.org   empress, sovereign, monarch, ruler, czarina, queen consort, king, queen regnant, princess, rani, queen regent

Thesaurus evaluation: Gold standard
■ very low inter-annotator agreement
■ there are many directions of similarity
■ existing gold standards are not usable

Analogy queries
■ evaluation of word embeddings (word2vec)
■ "a is to a* as b is to b*", where b* is hidden
■ syntactic: good is to best as smart is to smartest
■ semantic: Paris is to France as Tokyo is to Japan
■ agreement by humans:
  ■ Berlin - Germany
  ■ London - England / Britain / UK ?
■ best match for a linear combination of vectors:
  arg max_{b* in V} cos(b*, a* - a + b)

Analogy queries: Alternatives to cosine similarity
■ cos(x, y) = (x · y) / (||x|| ||y||)
■ arg max_{b* in V} cos(b*, a* - a + b)
    = arg max_{b* in V} ( cos(b*, a*) - cos(b*, a) + cos(b*, b) )       (CosAdd)
■ arg max_{b* in V} cos(b*, a*) · cos(b*, b) / cos(b*, a)               (CosMul, sketched below)
■ SkE uses Jaccard similarity instead of cosine similarity: JacAdd, JacMul
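The CosAdd and CosMul scores above can be computed directly over a normalized embedding matrix. The following is a minimal NumPy sketch; the matrix E, the lookup tables word_id and vocab, and the analogy helper are assumed names used for illustration, not part of word2vec or Sketch Engine.

    import numpy as np

    # Assumptions: E is a (V, d) matrix of row-normalized word vectors,
    # word_id maps a word to its row index, vocab maps an index back to the word.
    # With normalized rows, E @ E[i] gives the cosine similarity of word i to every word.

    def analogy(a, a_star, b, E, word_id, vocab, method="CosAdd", eps=1e-8):
        """Answer "a is to a* as b is to b*": return the best candidate for b*."""
        ia, ias, ib = word_id[a], word_id[a_star], word_id[b]
        sim_a, sim_as, sim_b = E @ E[ia], E @ E[ias], E @ E[ib]

        if method == "CosAdd":
            # arg max cos(b*, a*) - cos(b*, a) + cos(b*, b)
            scores = sim_as - sim_a + sim_b
        else:
            # CosMul: arg max cos(b*, a*) * cos(b*, b) / cos(b*, a)
            # (in practice the cosines are often shifted to be non-negative first)
            scores = sim_as * sim_b / (sim_a + eps)

        scores[[ia, ias, ib]] = -np.inf   # never return one of the query words
        return vocab[int(np.argmax(scores))]

For example, analogy("Paris", "France", "Tokyo", E, word_id, vocab) should return "Japan" if the embeddings capture the capital-of relation; the JacAdd/JacMul variants plug Sketch Engine's Jaccard similarity into the same scoring schemes.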
Thesaurus Evaluation
Results on the capital-common-countries question set (462 queries):

              BNC              SkELL
              count  percent   count  percent
  CosAdd        58    12.6       183   39.6
  CosMul        99    21.4       203   43.9
  JacAdd        32     6.9       319   69.0
  JacMul        57    12.3       443   95.9
  word2vec     159    34.4       366   79.2

Results depend not only on the data but also on the evaluation method.

Results on other corpora
More English corpora, using JacMul:

  Corpus                          size (M)   correct
  BNC                                  112        57
  SkELL                              1,520       443
  araneum maius (LCL sketches)       1,200       224
  enClueWeb16                       16,398       448
  enTenTen08                         3,268         0
  enTenTen12                        12,968         0
  enTenTen13                        22,878       439

Problems of analogy queries
■ a pair of words does not define an exact relation
  ■ Berlin - Germany: capital, biggest city, in what time? (Canberra, Rome)
■ rare words/phrases
  ■ Baltimore - Baltimore Sun : Cincinnati - Cincinnati Enquirer

Outlier detection
■ list of words, find the one which is not part of the cluster (a small sketch follows below)
■ examples:
  ■ red, blue, green, dark, yellow, purple, pink, orange, brown
  ■ t-shirt, sheet, dress, trousers, shorts, jumper, skirt, shirt, coat
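A query like the two example lists above can be answered with embeddings by picking the word that is, on average, least similar to the rest. The sketch below reuses the assumed E and word_id from the analogy example; it is one obvious baseline, not necessarily the exact procedure behind the results reported later.

    import numpy as np

    def find_outlier(words, E, word_id):
        """Pick the word with the lowest mean cosine similarity to the other words.

        Assumes E is a row-normalized embedding matrix and word_id maps words to
        row indices (hypothetical names, as in the analogy sketch above).
        """
        vecs = E[[word_id[w] for w in words]]           # (n, d) normalized vectors
        sims = vecs @ vecs.T                            # (n, n) pairwise cosines
        np.fill_diagonal(sims, 0.0)                     # ignore self-similarity
        mean_sim = sims.sum(axis=1) / (len(words) - 1)  # mean similarity to the rest
        return words[int(np.argmin(mean_sim))]

    # e.g. find_outlier(["red", "blue", "green", "dark", "yellow",
    #                    "purple", "pink", "orange", "brown"], E, word_id)
    # should return "dark" if the embeddings group colour terms together.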
Evaluating Outlier Detection
■ original data set by Camacho-Collados and Navigli
■ 8 sets, each with a cluster of 8 words and 8 outliers
■ 8 × 8 = 64 queries
■ Accuracy: the percentage of successfully answered queries
■ Outlier Position Percentage (OPP) score: the average relative position (in percent) of the
  right answer, i.e. of the outlier, in the list of candidate clusters ordered by their compactness
  (a code sketch of both metrics follows below)

Problems of the original data set
■ English only
■ needs extra knowledge
  ■ Mercedes Benz, BMW, Michelin, Audi, Opel, Volkswagen, Porsche, Alpina, Smart
    (outliers: Bridgestone, Boeing, Samsung, Michael Schumacher, Angela Merkel, Capri, pineapple)
  ■ Peter, Andrew, James, John, Thaddaeus, Bartholomew, Thomas, Noah, Matthew
  ■ January, March, May, July, Wednesday, September, November, February, June
  ■ tiger, dog, lion, cougar, jaguar, leopard, cheetah, wildcat, lynx
■ mostly proper names (7 of the 8 sets)

New data set: HAMOD
■ 7 languages: Czech, Slovak, English, German, French, Italian, Estonian
■ 128 clusters (8 words + 8 outliers)
■ https://github.com/lexicalcomputing/hamod

New data set - example
Two HAMOD clusters (the first 8 rows are the cluster words, the last 8 rows the outliers):

  Colors                       Electronics
  Czech          English       Czech            English
  červená        red           televize         television
  modrá          blue          reproduktor      speaker
  zelená         green         notebook         laptop
  žlutá          yellow        tablet           tablet
  fialová        purple        mp3 přehrávač    mp3 player
  růžová         pink          mobil            phone
  oranžová       orange        rádio            radio
  hnědá          brown         playstation      playstation
  dřevěná        wooden        blok             notebook
  skleněná       glass         sešit            workbook
  temná          dark          kniha            book
  zářivá         bright        CD               CD
  pruhovaný      striped       energie          energy
  puntíkovaný    dotted        světlo           light
  smutná         sad           papír            paper
  nízká          low           ráno             morning
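The Accuracy and OPP scores reported below can be computed along the lines of the definitions above: every word of a 9-word query is treated as a candidate outlier, candidates are ranked by how compact the remaining 8 words are, and OPP records the relative rank of the true outlier. The sketch below follows that reading, with compactness taken as mean pairwise cosine similarity; the exact implementation used for the HAMOD experiments may differ in details.

    from itertools import combinations

    def compactness(words, E, word_id):
        """Mean pairwise cosine similarity of a word set (row-normalized E assumed)."""
        vecs = E[[word_id[w] for w in words]]
        sims = [float(vecs[i] @ vecs[j]) for i, j in combinations(range(len(words)), 2)]
        return sum(sims) / len(sims)

    def evaluate(queries, E, word_id):
        """queries: list of (cluster_words, outlier) pairs, e.g. 8 cluster words + 1 outlier."""
        detected, rel_positions = 0, []
        for cluster, outlier in queries:
            words = list(cluster) + [outlier]
            # Removing the true outlier should leave the most compact remaining set.
            scores = {w: compactness([x for x in words if x != w], E, word_id) for w in words}
            ranking = sorted(words, key=scores.get, reverse=True)
            pos = ranking.index(outlier)                   # 0 = outlier correctly detected
            detected += pos == 0
            rel_positions.append(1.0 - pos / (len(words) - 1))
        accuracy = 100.0 * detected / len(queries)         # Accuracy in percent
        opp = 100.0 * sum(rel_positions) / len(queries)    # OPP score in percent
        return accuracy, opp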
Evaluation
9 clusters only, 72 queries:

  Embeddings                OPP    Accuracy
  Czes2                     92.2     70.8
  czTenTen12                93.4     79.2
  csTenTen17                94.3     81.9
  czTenTen12 (fastText)     97.7     87.5
  Czech Common Crawl        98.1     95.8

Construction
■ each human evaluator goes through all the sets (only once) for their native language
■ one exercise: 8 inliers + 1 outlier, randomly chosen from the list of outliers for each set
  (see the sketch at the end)
■ in each turn, the evaluator selects the outlier
■ simple web interface for the exercise
■ Inter-Annotator Agreement: Estonian 0.93, Czech 0.97

[Screenshot of the web interface: the words drums, piano, headphones, harp, double bass, flute,
guitar, violin, saxophone, plus "I'm not sure" and "Quit" buttons.]
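For illustration, one annotation exercise as described in the Construction slide can be generated by sampling a single outlier for a set and shuffling it in with the 8 inliers. The function and variable names below are made up for this sketch and are not the actual HAMOD tooling.

    import random

    def make_exercise(inliers, outliers, rng=random):
        """Build one exercise: the 8 inliers plus one randomly chosen outlier, shuffled."""
        outlier = rng.choice(list(outliers))
        words = list(inliers) + [outlier]
        rng.shuffle(words)
        return words, outlier   # the evaluator is asked to pick `outlier` out of `words`

    # Hypothetical example based on the colour cluster shown earlier:
    colors = ["red", "blue", "green", "yellow", "purple", "pink", "orange", "brown"]
    color_outliers = ["wooden", "glass", "dark", "bright", "striped", "dotted", "sad", "low"]
    presented, answer = make_exercise(colors, color_outliers)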