Current Challenges
Philipp Koehn
2 November 2023

Philipp Koehn Machine Translation: Current Challenges 2 November 2023

WMT 2016

[Figure: human evaluation score vs. BLEU (18 to 36) for WMT 2016 submissions, neural MT vs. statistical MT; systems shown: uedin-nmt, metamind, nyu-umontreal, uedin-syntax, online-b, kit-limsi, cambridge, online-a, jhu-syntax, jhu-pbmt, uedin-pbmt, online-f, online-g]

(In 2017, there were barely any statistical machine translation submissions.)

2017: Google: "Near Human Quality"

[Figure: ratings on a 0 to 6 scale (6 = perfect translation) for human, neural (GNMT), and phrase-based (PBMT) translation of English→Spanish, English→French, English→Chinese, Spanish→English, French→English, Chinese→English]

2018: More Hype

• "Microsoft Research Achieves Human Parity For Chinese English Translation" (written by Sue Gee, Wednesday, 21 March 2018): "Researchers in Microsoft's labs in Beijing and in Redmond and Washington have developed an AI machine translation system that can translate with the same accuracy as a human from Chinese to English."
• "SDL Cracks Russian to English Neural Machine Translation": "Global Enterprises to Capitalize on Near Perfect Russian to English Machine Translation as SDL Sets New Industry Standard"; "90% of the system's output labelled as perfect by professional Russian-English translators"

Just Better Fluency?
[Figure: human adequacy and fluency ratings (60 to 100) for ONLINE-B and UEDIN-NMT on CS→EN, DE→EN, RO→EN, RU→EN]
• Adequacy: +1%
• Fluency: +13%
(from: Sennrich and Haddow, 2017)

lack of training data

Amount of Training Data

[Figure: BLEU vs. corpus size (English words)]
English-Spanish systems trained on 0.4 million to 385.7 million words

Translation Examples

Training data   Translation
Source          A Republican strategy to counter the re-election of Obama
1/1024          Un órgano de coordinación para el anuncio de libre determinación
1/512           Lista de una estrategia para luchar contra la elección de hojas de Ohio
1/256           Explosion realiza una estrategia divisiva de luchar contra las elecciones de autor
1/128           Una estrategia republicana para la eliminación de la reelección de Obama
1/64            Estrategia siria para contrarrestar la reelección del Obama .
1/32            Una estrategia republicana para contrarrestar la reelección de Obama

domain mismatch

Domain Mismatch

BLEU, NMT / SMT (rows: training data, columns: test domain)

System      Law          Medical      IT           Koran        Subtitles
All Data    30.5 / 32.8  45.1 / 42.2  35.3 / 44.7  17.9 / 17.9  26.4 / 20.8
Law         31.1 / 34.4  12.1 / 18.2   3.5 / 6.9    1.3 / 2.2    2.8 / 6.0
Medical      3.9 / 10.2  39.4 / 43.5   2.0 / 8.5    0.6 / 2.0    1.4 / 5.8
IT           1.9 / 3.7    6.5 / 5.3   42.1 / 39.8   1.8 / 1.6    3.9 / 4.7
Koran        0.4 / 1.8    0.0 / 2.1    0.0 / 2.3   15.9 / 18.8   1.0 / 5.5
Subtitles    7.0 / 9.9    9.3 / 17.8   9.2 / 13.6   9.0 / 8.4   25.9 / 22.1

Translation Examples

Source      Schaue um dich herum.
Ref.        Look around you.
All         NMT: Look around you.
            SMT: Look around you.
Law         NMT: Sughum gravecorn.
            SMT: In order to implement dich Schaue .
Medical     NMT: EMEA / MB / 049 / 01-EN-Final Work progamme for 2002
            SMT: Schaue by dich around .
IT          NMT: Switches to paused.
            SMT: To Schaue by itself .
Koran       NMT: Take heed of your own souls.
            SMT: And you see.
Subtitles   NMT: Look around you.
            SMT: Look around you .

noisy data

Noise in Training Data

• Crawled parallel data from the web (very noisy)

              SMT           NMT
WMT17         24.0          27.2
+ Paracrawl   25.2 (+1.2)   17.3 (-9.9)

(German-English, 90m words each of WMT17 and crawl data)

BLEU, change vs. WMT17 baseline in parentheses:
Raw crawl data added   5%           10%          20%          50%          100%
NMT                    27.4 (+0.2)  26.6 (-0.6)  24.7 (-2.5)  20.9 (-6.3)  17.3 (-9.9)
SMT                    24.2 (+0.2)  24.2 (+0.2)  24.4 (+0.4)  24.8 (+0.8)  25.2 (+1.2)

• Corpus cleaning methods [Xu and Koehn, EMNLP 2017] give improvements

Types of Noise
• Misaligned sentences
• Disfluent language (from MT, bad translations)
• Wrong language data (e.g., French in German-English corpus)
• Untranslated sentences
• Short segments (e.g., dictionaries)
• Mismatched domain

Mismatched Sentences
• Artificially created by randomly shuffling sentence order
• Added to existing parallel corpus in different amounts

Amount added   5%           10%          20%          50%          100%
NMT            n/a          n/a          n/a          26.1 (-1.1)  25.3 (-1.9)
SMT            24.0 (-0.0)  24.0 (-0.0)  23.9 (-0.1)  23.9 (-0.1)  23.4 (-0.6)

• Bigger impact on NMT than SMT

Misordered Words
• Artificially created by randomly shuffling words in each sentence

Amount added   5%           10%          20%          50%          100%
Source, NMT    n/a          n/a          n/a          26.6 (-0.6)  25.5 (-1.7)
Source, SMT    24.0 (-0.0)  23.6 (-0.4)  23.9 (-0.1)  23.6 (-0.4)  23.7 (-0.3)
Target, NMT    n/a          n/a          n/a          26.7 (-0.5)  26.1 (-1.1)
Target, SMT    24.0 (-0.0)  24.0 (-0.0)  23.4 (-0.6)  23.2 (-0.8)  22.9 (-1.1)

• Similar impact on NMT and SMT, worse for source reshuffle

Untranslated Sentences

Amount added   5%           10%           20%           50%           100%
Source, NMT    17.6 (-9.8)  11.2 (-16.0)   5.6 (-21.6)   3.2 (-24.0)   3.2 (-24.0)
Source, SMT    23.8 (-0.2)  23.9 (-0.1)   23.8 (-0.2)   23.4 (-0.6)   21.1 (-2.9)
Target, NMT    27.2 (-0.0)  27.0 (-0.2)   26.7 (-0.5)   26.8 (-0.4)   26.9 (-0.3)

Wrong Language

Amount added      5%           10%          20%          50%          100%
fr source, NMT    26.9 (-0.3)  26.8 (-0.4)  26.8 (-0.4)  26.8 (-0.4)  26.8 (-0.4)
fr source, SMT    24.0 (-0.0)  23.9 (-0.1)  23.9 (-0.1)  23.9 (-0.1)  23.8 (-0.2)
fr target, NMT    26.7 (-0.5)  26.6 (-0.6)  26.7 (-0.5)  26.2 (-1.0)  25.0 (-2.2)
fr target, SMT    24.0 (-0.0)  23.9 (-0.1)  23.8 (-0.2)  23.5 (-0.5)  23.4 (-0.6)

• Surprisingly robust, maybe due to domain mismatch of French data

Short Sentences

Amount added      5%           10%          20%          50%
1-2 words, NMT    27.1 (-0.1)  26.5 (-0.7)  26.7 (-0.5)  n/a
1-2 words, SMT    24.1 (+0.1)  23.9 (-0.1)  23.8 (-0.2)  n/a
1-5 words, NMT    27.8 (+0.6)  27.6 (+0.4)  n/a          26.6 (-0.6)
1-5 words, SMT    24.2 (+0.2)  24.5 (+0.5)  24.5 (+0.5)  24.2 (+0.2)

• No harm done

Formal Constraints
• Subtitles
  - translation has to fit into space on screen (may have to be shortened)
  - input and output broken up into lines
• Speech translation
  - input often not well-formed
  - real-time translation: start while sentence is spoken
  - subtitles: have to be readable in limited time
  - dubbing: sync up with video of speaker's mouth movement
• Poetry
  - meter
  - rhyme

catastrophic errors

Catastrophic Errors
• "Facebook apologises for rude mistranslation of Xi Jinping's name": Company blames technical glitch that "caused incorrect translations" of Chinese leader's name from Burmese to English.
• "Facebook's auto translation AI fail leads to a nightmare for a Palestinian man": The AI feature had "Good morning" in Arabic wrongly translated as "attack them" in Hebrew.
  (by Gianluca Mezzofiore, October 24, 2017)
• "Thai Mistranslation Shows Risk of Auto-Translating Social Media Content" (Industry News, by Marion Marking, 3 Aug 2020): After a machine translation of a post from English into Thai about the King's birthday proved offensive to the Thai monarchy, Facebook Thailand said it was deactivating auto-translate on Facebook and Instagram, revamping machine translation (MT) quality, and offering the Thai people its "profound apology."

What are Catastrophic Errors?
• Generation of profanity
  - first step: maintain a list of offensive words for each language
  - only eliminate these words if the input did not include such words
  - but: offensive language is not limited to specific words
• Generation of violent / inciting content
• Opposite meaning
• Mistranslation of names
⇒ All this is hard to detect

robustness

Robustness to User Generated Content
English: daily content of #scaramouche from genshin impact [unrecoverable symbols] mute #mouchecc for no cc tweets! not leak free http://dailymouche.carrd.co
German: täglicher Inhalt von #scaramouche von genshin impact [unrecoverable symbols] stumm #mouchecc für keine CC-Tweets! nicht auslaufsicher http://dailymouche.carrd.co

Challenges
• Jargon and acronyms
• Misspellings (sometimes intended for effect)
• Mangled grammar
• Special symbols (emojis, etc.)
• Hashtags, URLs, ...
• Use of dialects
• Use of non-standard writing systems (e.g., Latin script due to lack of keyboard)

Some Methods
• Special handling of non-words like emojis, hashtags, URLs
• Creating synthetic noisy training data
• Adversarial training
• Resources
  - Machine translation of noisy text data set (MTNT)
  - WMT 2020 Shared Task on Machine Translation Robustness

bias

Gender Bias
English: The doctor asked the nurse to help her in the procedure
Spanish: El doctor le pidio a la enfermera que le ayudara con el procedimiento

Gender Bias
English: the doctor said: take the pill.
Spanish (Google Translate):
  La doctora dijo: toma la píldora. (feminine)
  El doctor dijo: toma la píldora. (masculine)

Robustness to Style
"You Sound Just Like Your Father": Commercial Machine Translation Systems Include Stylistic Biases (Dirk Hovy, Federico Bianchi, Tommaso Fornaciari; Bocconi University, Milan, Italy)

Dialect Bias
• Models often trained only on standard varieties (e.g., British and American English)
• Work less well on other dialects
• Bigger problem for automatic speech recognition

Evaluate Across Language Varieties
• BLEU score on the standard language is not enough
• Also need test sets for each language variety
[Figure: headline quality vs. acceptable degradation on important language varieties]

document-level translation

Document-Level Translation
The shop is selling a nice table. Jane is quite taken by it. The table would match the chairs in her living room.
• Machine translation translates one sentence at a time
• But: surrounding context may help
  - translation of pronouns may require co-reference
  - ambiguous words may be informed by broader context
  - consistent translation of repeated words

Conditioning on Broader Context
The shop is selling a nice table. Jane is quite taken by it. The table would match the chairs in her living room.
Der Laden verkauft einen schönen Tisch.
[Figure: Full Document Translation]
• Hierarchical attention
  - compute which previous sentences matter most
  - compute which words in these sentences matter most

Conditioning on Broader Context
The shop is selling a nice table. Jane is quite taken by it. The table would match the chairs in her living room.
Der Laden verkauft einen schönen Tisch. Er gefällt Jane sehr. ...
• Concatenate all sentences together
  - document = very long sentence
  - special treatment for sentence boundaries
  - requires scaling of neural decoding implementation

machine translation and large language models

The Large Language Model Wave
• Large language models have overtaken much of NLP

LMs as Unsupervised Learners (2019)
"Language Models are Unsupervised Multitask Learners" (Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever)
• Train language models on relatively clean text data (GPT-2)
• Convert any NLP problem into a text continuation problem
  - a precursor of prompt engineering
  - describes in some detail how each task is converted
  - impressive performance on many tasks
• Terrible at translation ... but all non-English text was removed from the training corpus

Three Stages of Training a Large Language Model
• Stage 1: Train on massive amounts of text (up to a trillion words)
• Stage 2: Instruction training
  - Examples of requests / responses constructed by human annotators
  - "Summarize the following: ..."
  - "Give me ten examples of ..."
  - "Translate from French into English: ..."
• Stage 3: Reinforcement learning from human feedback
  - Machine generates multiple responses to a prompt
  - Human annotators rank them
  - Train a reward model from these rankings
  - Fine-tune model with reward model

A Closer Look at PaLM for MT (2022)
"Prompting PaLM for Translation: Assessing Strategies and Performance" (David Vilar, Markus Freitag, Colin Cherry, Jiaming Luo, Viresh Ratnakar, George Foster; Google Research)
• Exploration of examples used for prompting
• Evaluation with BLEU / BLEURT / MQM (human eval)
• WMT 2021 test set for de,zh→en, WMT 2014 for fr→en

Comparison to State of the Art

Human Evaluation: MQM
• Language models make more adequacy errors, similar fluency
[Figure: German-English, MQM error categories (count of errors) for SOTA vs. PaLM: Mistranslation, Omission, Addition, Untranslated, Awkward]

Translation Data in Training? (2023)
"Searching for Needles in a Haystack: On the Role of Incidental Bilingualism in PaLM's Translation Capability" (Eleftheria Briakou, Colin Cherry, George Foster)
• PaLM is exposed to over 30 million translation pairs across at least 44 languages
  - 1.4% of training examples are bilingual
  - 0.34% have a translated sentence pair
• Most bilingual content is code-switched, about 20% contains translations

Impact of Translation Data
• Sentence pairs can be extracted from bilingual samples
  - split sample into sentences
  - align English and French sentences with cross-lingual sentence embedding
  ⇒ parallel training corpus
• Training on mined parallel data (WMT fr-en): 38.1 BLEU
  Training on WMT training data: 42.0 BLEU
• Worse translation quality if bilingual content is removed from PaLM training
• Much worse translation quality with smaller (1B, 8B) PaLM models

How About GPT? (2023)
"How Good Are GPT Models at Machine Translation? A Comprehensive Evaluation" (Amr Hendy, Mohamed Abdelrehim, Amr Sharaf, Vikas Raunak, Mohamed Gabr, Hitokazu Matsushita, Young Jin Kim, Mohamed Afify, Hany Hassan Awadalla; Microsoft)
[Figure: BLEU of WMT Best, Microsoft, and ChatGPT for de-en, en-de, cs-en, en-cs, ja-en, en-ja, zh-en, en-zh, ru-en, en-ru, uk-en, en-uk, is-en, en-is, ha-en, en-ha, fr-de, de-fr]
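The "Mismatched Sentences" and "Misordered Words" experiments create noise artificially and add it to the clean corpus in amounts from 5% to 100%. Below is a minimal Python sketch of that setup. It assumes the amount is measured relative to the size of the clean corpus and, for illustration only, draws the noisy pairs from the clean corpus itself. The names `misaligned`, `misordered`, and `add_noise` are hypothetical; this is not the code used in the experiments.

```python
import random
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)
NoiseFn = Callable[[List[Pair], int, random.Random], List[Pair]]


def misaligned(pairs: List[Pair], n: int, rng: random.Random) -> List[Pair]:
    """Draw n pairs and re-pair each source with a different pair's target."""
    sample = rng.sample(pairs, n)
    targets = [tgt for _, tgt in sample]
    # Rotating the target column by one position guarantees that no source
    # keeps its own target (as long as n >= 2 and the pairs are distinct).
    targets = targets[1:] + targets[:1]
    return [(src, tgt) for (src, _), tgt in zip(sample, targets)]


def misordered(side: str) -> NoiseFn:
    """Return a noise function that shuffles the words of one side."""
    if side not in ("source", "target"):
        raise ValueError("side must be 'source' or 'target'")

    def make(pairs: List[Pair], n: int, rng: random.Random) -> List[Pair]:
        noisy = []
        for src, tgt in rng.sample(pairs, n):
            words = (src if side == "source" else tgt).split()
            rng.shuffle(words)
            shuffled = " ".join(words)
            noisy.append((shuffled, tgt) if side == "source" else (src, shuffled))
        return noisy

    return make


def add_noise(pairs: List[Pair], make_noise: NoiseFn, amount: float,
              seed: int = 0) -> List[Pair]:
    """Append noisy pairs amounting to `amount` times the clean corpus size."""
    rng = random.Random(seed)
    n = round(len(pairs) * amount)
    return pairs + make_noise(pairs, n, rng)


if __name__ == "__main__":
    clean = [(f"satz {i} eins", f"sentence {i} one") for i in range(100)]
    for amount in (0.05, 0.10, 0.20, 0.50, 1.00):
        noisy = add_noise(clean, misaligned, amount)
        print(f"{amount:.0%}: {len(noisy)} pairs")
```

Passing the noise type as a function keeps the corpus-mixing step identical across experiments, so that only the kind of noise varies between the rows of the tables.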
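The first-step rule on the "What are Catastrophic Errors?" slide keeps a list of offensive words per language and removes them from the output only if the input contained none. A rough Python sketch of that rule follows. The word lists in `OFFENSIVE` are hypothetical toy examples, and tokenization is a simple regular expression.

```python
import re
from typing import Dict, List, Set

# Hypothetical toy word lists; a real system would maintain curated lists
# for every supported language.
OFFENSIVE: Dict[str, Set[str]] = {
    "en": {"damn", "idiot"},
    "de": {"verdammt", "idiot"},
}


def words(text: str) -> List[str]:
    """Lowercased word tokens (letters, digits, underscore)."""
    return re.findall(r"\w+", text.lower())


def contains_offensive(text: str, lang: str) -> bool:
    blocklist = OFFENSIVE.get(lang, set())
    return any(w in blocklist for w in words(text))


def filter_output(source: str, src_lang: str,
                  output: str, tgt_lang: str) -> str:
    """Drop offensive output words unless the input itself had offensive words."""
    if contains_offensive(source, src_lang):
        return output  # offensive input: an offensive translation may be faithful
    blocklist = OFFENSIVE.get(tgt_lang, set())
    kept = [tok for tok in output.split()
            if not any(w in blocklist for w in words(tok))]
    return " ".join(kept)


if __name__ == "__main__":
    print(filter_output("Du bist nett.", "de", "You are an idiot!", "en"))
    # prints: You are an
    print(filter_output("Du Idiot!", "de", "You idiot!", "en"))
    # prints: You idiot!
```

Exact word matching is brittle. For example, the inflected form "verdammter" does not match the list entry "verdammt". As the slide notes, offensive language is also not limited to specific words, so a filter like this can only be a first step.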
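In the "Concatenate all sentences together" approach, sentence boundaries have to be marked so that the translated document can be split back into sentences. Below is a minimal Python sketch. It assumes a hypothetical boundary token `<sep>` that the model has learned to reproduce in its output. The slides do not specify which boundary treatment is actually used.

```python
from typing import List

SEP = "<sep>"  # hypothetical sentence-boundary token


def concat_document(sentences: List[str], sep: str = SEP) -> str:
    """Turn a document into one long input sequence with marked boundaries."""
    for s in sentences:
        if sep in s:
            raise ValueError(f"sentence already contains {sep!r}: {s!r}")
    return f" {sep} ".join(s.strip() for s in sentences)


def split_translation(translation: str, n_sentences: int,
                      sep: str = SEP) -> List[str]:
    """Split the translated document back into sentences."""
    parts = [p.strip() for p in translation.split(sep)]
    if len(parts) != n_sentences:
        # The model dropped or added a boundary; sentence alignment is lost.
        raise ValueError(f"expected {n_sentences} sentences, got {len(parts)}")
    return parts


if __name__ == "__main__":
    doc = ["The shop is selling a nice table.", "Jane is quite taken by it."]
    print(concat_document(doc))
    # Output of some document-level model (hypothetical):
    output = "Der Laden verkauft einen schönen Tisch. <sep> Er gefällt Jane sehr."
    print(split_translation(output, len(doc)))
```

The count check matters because a model that merges or splits sentences can no longer be evaluated, or displayed, sentence by sentence.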
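The mining step on the "Impact of Translation Data" slide splits a bilingual sample into sentences and then aligns English and French sentences using a cross-lingual sentence embedding. The Python sketch below shows roughly how this might work. It takes a language labeller and an `embed` function as parameters, and the demo supplies hypothetical toy versions of both (`TOY`). The three-dimensional vectors stand in for a real multilingual sentence encoder. Pairing sentences by mutual best cosine similarity above a threshold is one simple alignment criterion, not necessarily the one used in the study.

```python
import math
import re
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mine_pairs(sample: str, lang_of: Callable[[str], str],
               embed: Callable[[str], Vector],
               threshold: float = 0.8) -> List[Tuple[str, str, float]]:
    """Extract (English, French, score) pairs that are mutual best matches."""
    sents = split_sentences(sample)
    en = [s for s in sents if lang_of(s) == "en"]
    fr = [s for s in sents if lang_of(s) == "fr"]
    if not en or not fr:
        return []
    sim = [[cosine(embed(e), embed(f)) for f in fr] for e in en]
    pairs = []
    for i, row in enumerate(sim):
        j = max(range(len(fr)), key=row.__getitem__)  # best French match for i
        best_en_for_j = max(range(len(en)), key=lambda k: sim[k][j])
        if best_en_for_j == i and row[j] >= threshold:
            pairs.append((en[i], fr[j], row[j]))
    return pairs


# Hypothetical toy encoder: (language, embedding) per sentence.
TOY = {
    "The cat sleeps.": ("en", [1.0, 0.0, 0.0]),
    "Le chat dort.": ("fr", [0.9, 0.1, 0.0]),
    "I like tea.": ("en", [0.0, 1.0, 0.0]),
    "J'aime le thé.": ("fr", [0.1, 0.95, 0.0]),
    "Bonjour à tous.": ("fr", [0.0, 0.0, 1.0]),
}

if __name__ == "__main__":
    sample = " ".join(TOY)
    for en_s, fr_s, score in mine_pairs(sample, lambda s: TOY[s][0],
                                        lambda s: TOY[s][1]):
        print(f"{score:.3f}  {en_s}  |||  {fr_s}")
```

In the toy sample, "Bonjour à tous." has no English counterpart, so it is left unpaired. Requiring a mutual best match together with a minimum score is what keeps such unmatched sentences out of the mined corpus.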