Large Language Models
Philipp Koehn
24 October 2024

Philipp Koehn    Machine Translation: Large Language Models    24 October 2024

Recall: Statistical Machine Translation
• Statistical machine translation: $\arg\max_e p(e \mid f) = \arg\max_e p(f \mid e)\, p(e)$
• Combination of translation model $p(f \mid e)$ and language model $p(e)$
  - translation model ensures correct meaning
  - language model ensures fluency

Neural vs. Statistical Machine Translation
[Figure: BLEU Scores with Varying Amounts of Training Data. BLEU against corpus size (English words) for Phrase-Based with Big LM, Phrase-Based, and Neural systems.]
[from Six Challenges for Neural Machine Translation, 2017, Koehn and Knowles]

What Happened to the LM in MT?
• Edinburgh SMT system 2013: 126 billion token LM [Durrani et al., 2013]
• Fusion model: merge predictions from MT and LM [Gulcehre et al., 2015]
• Backtranslation: synthesize source side of monolingual data [Sennrich et al., 2017]
• mBART: Monolingual pretraining [Liu et al., 2020]
• None of them used data at the scale used in SMT
• LLMs finally do that now (since 2022)

Training
[Figure: scaling curves, one per compute budget (6e18 to 1e22), plotted against training tokens]
Scaling laws: more data → bigger models → better performance
Today: trillions of words → 10s to 100s of billions of parameters
Llama3 405B: trained on 16,384 GPUs, available open source

Instruction Training
• Examples of requests and responses constructed by human annotators
• May be collected from actual user requests and edited by experts
• May be generated from existing data sets

Summarization:
  Summarize the following paragraph into one sentence.
  The Federal Reserve paused its campaign of interest rate increases for the first time in more than a year. But officials suggested that rates would rise more in 2023, as inflation remains "well above" the central bank's target.
  Summary: No interest rate rise for now but maybe later in the year.

Translation:
  Translate from English to German.
  English: My name is Ozymandias, King of Kings; Look on my Works, ye Mighty, and despair!
  German: Mein Name ist Ozymandias, König der Könige; Schau auf meine Werke, du Mächtiger, und verzweifle!

Question Answering:
  What is the highest mountain in the world?
  The highest mountain in the world is Mount Everest.

Adapting LLMs to MT
• Convert parallel data into chat format
  Translate the following sentence from German to English.
  German: Das Haus ist groß.
  English: The house is big.
• Use it as instruction data for fine-tuning large language models
• Our work: fine-tuning various LLMs with QLoRA [Zhang et al., 2023]
  - Llama2 and BLOOMZ show the best results
  - fine-tuning with QLoRA is sufficient
• Adapting for document-level translation [Wu et al., 2024]
• Adapting for specific domains [Zheng et al., 2024]
• BigTranslate: adapting an LLM for MT in over 100 languages [Yang et al., 2023]

ALMA
• Advanced Language Model-based trAnslator [Xu et al., 2024]
• Two-stage fine-tuning
  - fine-tune on monolingual data for the languages involved
  - fine-tune on parallel data
• Not much parallel data needed (only WMT development sets)
• Built on Llama2-7B; better translation performance than GPT-3.5

Prompt Engineering
• How a task is presented to the language model matters
• Black art: often unclear what is in the training data
• "Imagine you are...": creates a style or sets a context for generation
• Detailed instructions
  - "Translate every word."
  - "Do not provide any explanations."
  - "Do not drop or add any information."

Translation Guidance by Specification
• Language models allow text generation with specifications
  - "Write in the style of William Shakespeare."
  - "Use formal language."
  - "Write in the style of a marketing message."
  - "Use the following terminology in your translation:..."
• Unclear whether this requires any additional model training (especially for a very specific style, e.g., the house style of a company)
• Alternatively, provide an example of target-language text in the expected style

Chain of Thought
• Language model has very limited working memory
• Complex reasoning may require several inference steps

What is 56 times 204? Think step by step.
I first multiply 50 and 200, which is 10,000. Then I multiply 6 and 200, which is 1,200. Next I multiply 50 and 4, which is 200. Then I multiply 6 and 4, which is 24. Adding together 10,000 and 1,200 and 200 and 24 gives 11,424.

• Intermediate steps create working memory

Decomposition of Tasks
Write an outline for an argument that LLMs are great.
Outline:
1. LLMs have emerged as general problem solvers
2. LLMs have become publicly available
3. Many people find LLMs useful
4. ...
Write a paragraph about "LLMs have emerged as general problem solvers".
Language models (LLMs) have rapidly emerged as versatile general problem solvers across various domains. Leveraging vast datasets and sophisticated algorithms, LLMs exhibit remarkable capabilities in understanding, generating,...

Translating Step-by-Step [Briakou et al., Google, 2024]
• Inspired by the work process of professional translators
  - pre-drafting research: find hard translation problems
  - drafting: translation with focus on accuracy
  - refinement: improve fluency
  - proofreading: check for errors
• All these steps contribute to the quality of the translation

Translating Step-by-Step
Pre-drafting Research:
You will be asked to translate a piece of text from English into Chinese following the five stages of the translation process. Here is the context in which the text appears:
Drafting:
Now, let's move on to the drafting stage.
Draft Translation:
Pre-drafting Research (continued):
Context: However, he said the most "responsible way" to cut spending would be to pass all 12 bills.
With many Republicans reticent to see a shutdown and a 1 percent cut to defense spending, the urgency to pass the bills could see enough Republicans ally with Democrats to extend the deadline on the four spending bills and reach an agreement to fund the government, even if doing so puts Johnson in hot water with some of those on the hard right.
To start, let's do some pre-drafting research on the above context:
Research: During this phase, thorough research is essential to address components of the context text that pose translation challenges. The goal is to establish a comprehensive translation plan that covers the following category:
• Idiomatic Expressions:
  o Identify idiomatic expressions that cannot be
Drafting (continued):
In this phase, your primary objective is to create a draft translation that accurately conveys the meaning of the source text presented below. At this stage, it is crucial to focus on adequacy, ensuring that your translation closely adheres to the source text. Your response should conclude with the draft translation. If context is missing, generate a general translation that is adaptable to various contexts. Avoid adding any additional information not present in the source text. All elements of the source text should be present in the translation.
Give your best one translation for the following piece of text based on the pre-drafting analysis without providing alternatives:
English: However, he said the most "responsible way" to cut spending would be to pass all 12 bills.
With many Republicans reticent to see a shutdown and a 1 percent cut to defense spending, the urgency to pass the bills could see enough

in-context learning

In-Context Learning
• Problem
  - language models are trained on very diverse language usage
  - the model may be confused about what it is expected to do
• Solution: provide examples ("shots") of the task in the prompt
• This has been shown to be successful even for new tasks

Multi-Shot Translation
• Provide examples in the prompt

Translate from German to English. Here are some examples.
German: Ein Hund bellt.
English: A dog barks.
German: Ein Schwein grunzt.
English: A pig grunts.
German: Eine Katze miaut.
English: A cat meows.
German: Ein Wolf heult.
English: A wolf howls.
Now translate the following sentence.
German: Ein Vogel singt.
English:

• This is the standard approach when prompting language models

Provide Text as Style Guidance
• We want to translate in a particular style, e.g., patents

Translate in the style of a patent. Here is some example text of the style:
According to an aspect of this invention, a method includes detecting a syntactic chunk in a first string in a first language, assigning a syntactic label to the detected syntactic chunk in the first string, aligning the detected syntactic chunk in the first string to a syntactic chunk in a second language string, said aligning based...
Translate from German to English.
German: Eine oder mehrere der folgenden Funktionen können ebenfalls enthalten sein.
English:

Specify Terminology
• A common constraint on translation is company-specific terminology
• For example, legal domain
  - Rechtswissenschaft = jurisprudence (not law)
  - Kläger = plaintiff (not prosecutor)
  - Strafe = sentence (not penalty)
• Provide them in the prompt
• In reality not so simple: need to distinguish technical and casual use of terms

Translate from German to English. Use the following terminology in the translation...
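The prompts in the last few slides share one layout: an instruction line, optional terminology or examples, then the source sentence and an open target-language label for the model to complete. The following is a minimal Python sketch under stated assumptions: the function names `build_prompt` and `missing_terms`, the "source = target" way of listing terms, and the German test sentence are all hypothetical illustrations, not a specific system's API. No language model is called; the sketch only assembles the prompt and runs a naive post-hoc terminology check.

```python
from typing import Optional


def build_prompt(src_lang: str, tgt_lang: str, source: str,
                 terminology: Optional[dict[str, str]] = None,
                 examples: Optional[list[tuple[str, str]]] = None) -> str:
    """Assemble a plain-text translation prompt (hypothetical helper).

    terminology: required term pairs, listed as "source = target" lines
    examples:    (source, target) pairs shown as few-shot demonstrations
    """
    lines = [f"Translate from {src_lang} to {tgt_lang}."]
    if terminology:
        lines.append("Use the following terminology in the translation:")
        lines += [f"{src} = {tgt}" for src, tgt in terminology.items()]
    if examples:
        # Multi-shot layout: demonstration pairs before the actual input.
        lines.append("Here are some examples.")
        for src, tgt in examples:
            lines += [f"{src_lang}: {src}", f"{tgt_lang}: {tgt}"]
        lines.append("Now translate the following sentence.")
    # End with an empty target-language slot for the model to complete.
    lines += [f"{src_lang}: {source}", f"{tgt_lang}:"]
    return "\n".join(lines)


def missing_terms(source: str, output: str,
                  terminology: dict[str, str]) -> list[str]:
    """Return required target terms whose source term occurs in the input
    but which do not appear in the output.

    Naive case-insensitive substring matching (an assumption): it cannot
    tell technical from casual use of a term.
    """
    src_l, out_l = source.lower(), output.lower()
    return [tgt for src, tgt in terminology.items()
            if src.lower() in src_l and tgt.lower() not in out_l]


if __name__ == "__main__":
    # Term pairs from the slide; the sentence is a made-up example.
    terms = {"Kläger": "plaintiff", "Strafe": "sentence"}
    sentence = "Der Kläger akzeptiert die Strafe."
    print(build_prompt("German", "English", sentence, terminology=terms))
    print(missing_terms(sentence, "The prosecutor accepts the penalty.", terms))
```

Running the script prints the prompt below, followed by `['plaintiff', 'sentence']`, since the hypothetical output used the casual translations:

```
Translate from German to English.
Use the following terminology in the translation:
Kläger = plaintiff
Strafe = sentence
German: Der Kläger akzeptiert die Strafe.
English:
```

Substring matching is deliberately simple. It would accept "sentence" used in its grammatical sense and would miss inflected forms, which is the kind of technical-versus-casual distinction the slide warns about.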