Introduction

Administrativa
• Class web site:
• Graduate section: Tuesdays and Thursdays, 1:30-2:45, Ames 234
• Instructor: Philipp Koehn
• TAs: Huda Khayrallah, Brian Thompson, Tanay Agarwal
• Grading
  – five programming assignments (12% each)
  – final project (30%)
  – in-class presentation: language in ten minutes (10%)

Why Take This Class?
• Close look at an artificial intelligence problem
• Practical introduction to natural language processing
• Introduction to deep learning for structured prediction

Textbook
Neural Machine Translation
Philipp Koehn
Center for Speech and Language Processing
Department of Computer Science
Johns Hopkins University
1st public draft August 7, 2015
2nd public draft (arxiv) September 22, 2017
3rd draft September 25, 2017

some history

An Old Idea

Warren Weaver on translation as code breaking (1947):
When I look at an article in Russian, I say: "This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode".

Early Efforts and Disappointment
• Excited research in 1950s and 1960s
  – 1954 Georgetown experiment
  – machine could translate with a vocabulary of 250 words and 6 grammar rules
• 1966 ALPAC report:
  – only $20 million spent on translation in the US per year
  – no point in machine translation

Rule-Based Systems
• Rule-based systems
  – build dictionaries
  – write transformation rules
  – refine, refine, refine
• Météo system for weather forecasts (1976)
• Systran (1968), Logos and Metal (1980s)

"have" := if subject(animate) and object(owned-by-subject) then translate to "kade... aahe"
          if subject(animate) and object(kinship-with-subject) then translate to "laa... aahe"
          if subject(inanimate) then translate to "madhye... aahe"
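
The rule above can be read as a small decision procedure. A minimal Python sketch, assuming hypothetical features (`subject_animate`, `object_relation`) that a real system would have to obtain from linguistic analysis; the function name and the feature encoding are illustrative and not taken from any actual rule-based system:

```python
from typing import Optional


def translate_have(subject_animate: bool,
                   object_relation: Optional[str] = None) -> Optional[str]:
    """Pick the target pattern for English "have", following the rule above.

    object_relation is a hypothetical feature: "owned" (object is owned by
    the subject) or "kinship" (object is kin of the subject).
    """
    if subject_animate and object_relation == "owned":
        return "kade... aahe"
    if subject_animate and object_relation == "kinship":
        return "laa... aahe"
    if not subject_animate:
        return "madhye... aahe"
    return None  # no rule fires; more rules would be needed here


print(translate_have(True, "owned"))
print(translate_have(True, "kinship"))
print(translate_have(False))
```

The `None` fallthrough hints at why such systems need the "refine, refine, refine" step: every uncovered case calls for another hand-written rule.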

Statistical Machine Translation
• 1980s: IBM
• 1990s: increased research
• Mid 2000s: Phrase-Based MT (Moses, Google)
• Around 2010: commercial viability

Neural Machine Translation
• Late 2000s: successful use of neural models for computer vision
• Since mid 2010s: neural network models for machine translation
• 2016: Neural machine translation the new state of the art

Hype
[Figure: hype vs. reality of machine translation on a timeline from 1950 to 2010. Labeled along the timeline: Georgetown experiment, expert systems / 5th generation AI, statistical MT, neural MT.]

how good is machine translation?

Machine Translation: Chinese

Machine Translation: French

A Clear Plan
[Figure: translation pyramid between Source and Target. Analysis climbs the source side and generation descends the target side; transfer can take place at the lexical, syntactic, or semantic level, with the interlingua at the top.]

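Learning from Data
[Figure: statistical machine translation system, trained on data and then used for translation]
• Training: training data (parallel corpora, monolingual corpora, dictionaries) and linguistic tools → statistical machine translation system
• Using: source text → statistical machine translation system → translation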

why is that a good plan?

Word Translation Problems
• Words are ambiguous
  He deposited money in a bank account with a high interest rate.
  Sitting on the bank of the Mississippi, a passing ship piqued his interest.
• How do we find the right meaning, and thus translation?
• Context should be helpful

Syntactic Translation Problems
• Languages have different sentence structure
  das    behaupten   sie    wenigstens
  this   claim       they   at least
  the                she
• Convert from object-verb-subject (OVS) to subject-verb-object (SVO)
• Ambiguities can be resolved through syntactic analysis
  – the meaning "the" of "das" is not possible (not a noun phrase)
  – the meaning "she" of "sie" is not possible (subject-verb agreement)

Semantic Translation Problems
• Pronominal anaphora
  I saw the movie and it is good.
• How to translate "it" into German (or French)?
  – "it" refers to "movie"
  – "movie" translates to "Film"
  – "Film" has masculine gender
  – ergo: "it" must be translated into the masculine pronoun "er"
• We are not handling this very well [Le Nagard and Koehn, 2010]

Semantic Translation Problems
• Coreference
  Whenever I visit my uncle and his daughters, I can't decide who is my favorite cousin.
• How to translate "cousin" into German? Male or female?
• Complex inference required

Semantic Translation Problems
• Discourse
  Since you brought it up, I do not agree with you.
  Since you brought it up, we have been working on it.
• How to translate "since"? Temporal or conditional?
• Analysis of discourse structure — a hard problem

Learning from Data
• What is the best translation?
  Sicherheit → security 14,516
  Sicherheit → safety 10,015
  Sicherheit → certainty 334
• Counts in European Parliament corpus
• Phrasal rules
  Sicherheitspolitik → security policy 1580
  Sicherheitspolitik → safety policy 13
  Sicherheitspolitik → certainty policy 0
  Lebensmittelsicherheit → food security 51
  Lebensmittelsicherheit → food safety 1084
  Lebensmittelsicherheit → food certainty 0
  Rechtssicherheit → legal security 156
  Rechtssicherheit → legal safety 5
  Rechtssicherheit → legal certainty 723
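
One simple way to turn such counts into a choice is relative frequency: divide each count by the total for its source word and pick the largest share. The slide only lists the counts, so the estimator below is an assumption for illustration, and the names `COUNTS`, `translation_probs`, and `best_translation` are hypothetical:

```python
# Counts copied from the slide (European Parliament corpus).
COUNTS = {
    "Sicherheit": {"security": 14516, "safety": 10015, "certainty": 334},
    "Sicherheitspolitik": {"security policy": 1580, "safety policy": 13,
                           "certainty policy": 0},
    "Lebensmittelsicherheit": {"food security": 51, "food safety": 1084,
                               "food certainty": 0},
    "Rechtssicherheit": {"legal security": 156, "legal safety": 5,
                         "legal certainty": 723},
}


def translation_probs(source):
    """Relative frequency: count(source, target) / count(source)."""
    options = COUNTS[source]
    total = sum(options.values())
    return {target: n / total for target, n in options.items()}


def best_translation(source):
    """Return the target with the highest relative frequency."""
    probs = translation_probs(source)
    return max(probs, key=probs.get)


for word in COUNTS:
    best = best_translation(word)
    print(f"{word} -> {best} ({translation_probs(word)[best]:.3f})")
```

The word-level entry prefers "security", while the compound entries flip to "food safety" and "legal certainty", which is the point of the phrasal rules: larger units carry their own context.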

Learning from Data
• What is most fluent?
  a problem for translation 13,000
  a problem of translation 61,600
  a problem in translation 81,700
  a translation problem 235,000
• Hits on Google

Learning from Data
• What is most fluent?
  police disrupted the demonstration 2,140
  police broke up the demonstration 66,600
  police dispersed the demonstration 25,800
  police ended the demonstration 762
  police dissolved the demonstration 2,030
  police stopped the demonstration 722,000
  police suppressed the demonstration 1,400
  police shut down the demonstration 2,040

where are we now?

Word Alignment
[Figure: word alignment matrix between the German sentence "michael geht davon aus , dass er im haus bleibt" and the English sentence "michael assumes that he will stay in the house"]

Phrase-Based Model
• Foreign input is segmented in phrases
• Each phrase is translated into English
• Phrases are reordered
• Workhorse of today's statistical machine translation
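
The three steps above (segment, translate, reorder) can be sketched in a few lines. This is a toy illustration under strong assumptions: the phrase table is hand-written and hypothetical, segmentation is greedy longest-match, and the reordering is passed in by hand, whereas a real phrase-based system learns its table from data and searches over segmentations and reorderings with a scoring model.

```python
# Hypothetical toy phrase table; entries are illustrative, not learned.
PHRASE_TABLE = {
    ("natuerlich",): "of course",
    ("hat",): "has",
    ("john",): "john",
    ("spass", "am"): "fun with the",
    ("spiel",): "game",
}


def segment(words, table, max_len=3):
    """Greedy left-to-right segmentation, preferring the longest known phrase."""
    phrases, i = [], 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = tuple(words[i:i + n])
            if candidate in table:
                phrases.append(candidate)
                i += n
                break
        else:
            raise KeyError(f"no phrase covers {words[i]!r}")
    return phrases


def translate(sentence, order=None):
    """Segment the input, translate each phrase, then reorder.

    `order` lists source phrase indices in target order; it is supplied
    by hand here instead of being found by search.
    """
    phrases = segment(sentence.split(), PHRASE_TABLE)
    translated = [PHRASE_TABLE[p] for p in phrases]
    if order is None:
        order = range(len(translated))
    return " ".join(translated[i] for i in order)


# Swap "hat" and "john" to get English word order.
print(translate("natuerlich hat john spass am spiel", order=[0, 2, 1, 3, 4]))
# prints: of course john has fun with the game
```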

Syntax-Based Translation
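[Figure: syntax-based translation of "Sie will eine Tasse Kaffee trinken" into "she wants to drink a cup of coffee", shown as a German parse tree mapped to an English parse tree]
German, with part-of-speech tags: Sie/PPER will/VAFIN eine/ART Tasse/NN Kaffee/NN trinken/VVINF
English tree node labels: S, NP, VP, PP, PRO, VBZ, TO, VB, DET, NN, IN
The figure carries numbered markers ➊ to ➏.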

Semantic Translation
• Abstract meaning representation [Knight et al., ongoing]
  (w / want-01
    :agent (b / boy)
    :theme (l / love
      :agent (g / girl)
      :patient b))
• Generalizes over equivalent syntactic constructs (e.g., active and passive)
• Defines semantic relationships
  – semantic roles
  – co-reference
  – discourse relations
• In a very preliminary stage

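Neural Model
[Figure: layers of the neural translation model, listed from the input side to the output side]
• Input Word Embeddings
• Left-to-Right Recurrent NN
• Right-to-Left Recurrent NN
• Attention
• Input Context
• Hidden State
• Output Word Predictions
• Given Output Words
• Error
• Output Word Embedding
Example sentence pair shown in the figure: the house is big . / das Haus ist groß .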

what is it good for?

what is it good enough for?

Why Machine Translation?

Assimilation — reader initiates translation, wants to know content
• user is tolerant of inferior quality
• focus of majority of research (GALE program, etc.)

Communication — participants don't speak same language, rely on translation
• users can ask questions when something is unclear
• chat room translations, hand-held devices
• often combined with speech recognition, IWSLT campaign

Dissemination — publisher wants to make content available in other languages
• high demands for quality
• currently almost exclusively done by human translators

Problem: No Single Right Answer

Israeli officials are responsible for airport security.
Israel is in charge of the security at this airport.
The security work for this airport is the responsibility of the Israel government.
Israeli side was in charge of the security of this airport.
Israel is responsible for the airport's security.
Israel is responsible for safety work at this airport.
Israel presides over the security of the airport.
Israel took charge of the airport security.
The safety of this airport is taken charge of by Israel.
This airport's security is the responsibility of the Israeli security officials.

Quality
HTER   assessment
0%     publishable
10%    editable
20%
30%    gistable
40%    triagable
50%
(scale developed in preparation of DARPA GALE programme)
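
HTER (human-targeted translation edit rate) relates the number of edits a human needs to fix a machine translation to the length of the corrected sentence. A rough Python sketch follows, with two loud assumptions: it uses plain word-level Levenshtein distance (insertions, deletions, substitutions) and so ignores the block shifts that the full TER metric also counts, and the post-edited sentence is a made-up example. The function names are hypothetical.

```python
def word_edit_distance(hyp, ref):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,           # delete h
                            curr[j - 1] + 1,       # insert r
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


def edit_rate(mt_output, post_edited):
    """Edits divided by the number of words in the post-edited sentence."""
    hyp, ref = mt_output.split(), post_edited.split()
    return word_edit_distance(hyp, ref) / len(ref)


mt = "Israel is responsible for safety work at this airport ."
fixed = "Israel is responsible for security at this airport ."  # hypothetical post-edit
print(f"{edit_rate(mt, fixed):.0%}")
```

Here two edits (one substitution, one deletion) against nine post-edited tokens give a rate of about 22%.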

Applications
HTER   assessment    application examples
0%                   Seamless bridging of language divide
       publishable   Automatic publication of official announcements
10%    editable      Increased productivity of human translators
20%                  Access to official publications
                     Multi-lingual communication (chat, social networks)
30%    gistable      Information gathering
                     Trend spotting
40%    triagable     Identifying relevant documents
50%

Current State of the Art
HTER   assessment    language pairs and domains
0%                   French-English restricted domain
       publishable   French-English technical document localization
10%                  French-English news stories
       editable      German-English news stories
20%
30%    gistable      Swahili-English news stories
40%    triagable     Uyghur-English news stories
50%
(informal rough estimates by presenter)

Thank You
questions? 