Verbmobil: Translation of Face-To-Face Dialogs

Wolfgang Wahlster
German Research Center for Artificial Intelligence (DFKI)
Stuhlsatzenhausweg 3
D-6600 Saarbrücken 11, Germany
Phone: +49 681 302 5252 or 2363
Fax: +49 681 302 5341
E-mail: wahlster@dfki.uni-sb.de

Abstract

Verbmobil is a long-term project on the translation of spontaneous language in negotiation dialogs. We describe the goals of the project, the chosen discourse domains and the initial project schedule. We discuss some of the distinguishing features of Verbmobil and introduce the notions of translation on demand and variable depth of processing in speech translation. We describe the role of anytime modules for efficient dialog translation in close to real time.

The long-term vision behind the Verbmobil project is a portable translation device that you can carry to a meeting with speakers of other languages and that will translate what you say for them.

Fig. 1: English as the Common Dialog Language in Verbmobil

This very ambitious scientific goal will be pursued in a series of well-defined project phases. The first versions of Verbmobil will provide translation on demand for two participants who have a passive knowledge of English but neither of whom is a fluent speaker. We assume that most of the dialog will be conducted in English as a common dialog language. This is a realistic assumption for most international technical or business discussions. But for uncommon words or phrases, complex constructions and critical segments of the negotiation dialog, the participants may want to switch back to their native language. This means that they need translation help and therefore turn to their Verbmobil devices. In the course of the conversation each dialog partner can activate his version of Verbmobil (e.g. German-to-English or Japanese-to-English translation) and signal that he is now speaking in his native language (e.g.
German or Japanese), and that what he says should be translated into English (see Fig. 1).

In: O. Herzog, T. Christaller, D. Schütt (eds.): Grundlagen und Anwendungen der Künstlichen Intelligenz, Berlin: Springer, pp. 393-402.

This means that there are three input modes for Verbmobil:

1) Both dialog participants speak English with a German or Japanese accent. In this case, no translation is necessary, but Verbmobil has to follow the conversation and extract context information for subsequent translation tasks. This is an extremely difficult problem, since the input can be ill-formed in many ways, so that various phonetic and grammatical constraints have to be relaxed in order to cope with the foreign accent and unusual constructions. Often Verbmobil will extract only a very shallow discourse representation, using keyword spotting or other partial analysis techniques.

2) In the course of an utterance, a participant switches from English as the common dialog language back to German or Japanese as his native language. In this case, Verbmobil must generate a translation that fits the context of the English sentence fragment. For example, if a German participant says "Let's meet again in June außer am Pfingstmontag", Verbmobil should produce "except on Whit Monday" to complete the English fragment correctly (the speaker has signalled the code switch to Verbmobil).

3) The participant speaks in his own language and Verbmobil translates his utterance into English. In this case, Verbmobil must generate an appropriate approximation of the communicative intent of the input, in close to real time. In many situations, Verbmobil will be able to find translations that preserve most but not necessarily all of the content of the original, since translation is inescapably a matter of compromise.
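Input mode 2 can be made concrete with a small sketch. The Python fragment below is only an illustration under stated assumptions: the Segment type, the toy phrase table and the function complete_in_english are hypothetical names introduced here and are not part of Verbmobil. It assumes that the speaker's code-switch signal has already split the utterance into language-tagged segments, and it stands in for real context-sensitive translation with a simple table lookup.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Segment:
    """One stretch of an utterance in a single language (hypothetical type)."""
    lang: str  # "en" = common dialog language, "de" = speaker's native language
    text: str


# Toy phrase table (assumption): a real system would translate in context.
PHRASE_TABLE: Dict[Tuple[str, str], str] = {
    ("de", "außer am Pfingstmontag"): "except on Whit Monday",
}


def complete_in_english(segments: List[Segment]) -> str:
    """Keep English segments as spoken and translate only the native-language ones."""
    parts = []
    for seg in segments:
        if seg.lang == "en":
            parts.append(seg.text)
        else:
            translated = PHRASE_TABLE.get((seg.lang, seg.text))
            # Mark segments the toy table cannot handle instead of guessing.
            parts.append(translated if translated is not None
                         else f"[untranslated: {seg.text}]")
    return " ".join(parts)


utterance = [
    Segment("en", "Let's meet again in June"),
    Segment("de", "außer am Pfingstmontag"),
]
print(complete_in_english(utterance))
# prints: Let's meet again in June except on Whit Monday
```

The point of the sketch is only that the English fragment is passed through unchanged and the translated fragment has to fit it; choosing a translation that really matches the preceding English context is the hard part, which the lookup table hides.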

Clarification subdialogs play an important role in the conversational setting discussed above, since the dialog partners are not fluent speakers of English and Verbmobil is an imperfect understander and translator. In the Verbmobil project, two types of clarification subdialogs are studied (see Fig. 2):

1) Clarification subdialogs between the participants are conducted in English. There are two variants of this type of subdialog: either both dialog partners use English, or Verbmobil translates their utterances from their native language into English.

2) Clarification subdialogs between Verbmobil and one participant are conducted in the native language of the respective dialog partner.

Fig. 2: Two Types of Clarification Subdialogs

The Project Goals

There are four distinguishing features of the Verbmobil approach:

- speaker-adaptive recognition of spontaneous speech
- negotiation dialogs in face-to-face situations
- a portable translation device that can be tailored to the individual user and to specified application domains
- a three-language scenario (English, German, Japanese) with English as the dialog language, ensuring system transparency and user acceptance.

In contrast to previous projects on speech translation (cf. [2], [5]), Verbmobil does not deal with telephone conversations but with face-to-face dialogs in a small meeting room. In face-to-face dialog translation we can exploit the fact that information passes between the participants not only on the linguistic channel but also on various nonverbal and paralinguistic channels. The hearer can merge information from the translation with information from gestural motions of the hands, fingers, head and eyes, from eyeblinks and eyebrow movements, and from changes of body posture and orientation. The research program includes some empirical investigations of translation and interpreting as done by humans in similar situations.

Verbmobil does not deal with read speech input, but with incrementally produced spontaneous dialog contributions.
Such utterances are rarely well-formed, since speakers make errors and correct them. Verbmobil has to deal with false starts, aborted phrases, speech repairs, hesitations, interjections, self-correction phrases and many other characteristic features of spontaneous speech (see Fig. 3).

Fig. 3: Challenges of Language Technology

In the discourse situation studied for the initial demonstrator, the dialog partners discuss a possible date for their next meeting using a calendar in front of them. After the development of the initial demonstrator, the domain of discourse will be extended considerably for the first research prototype. Two negotiation tasks will be considered for the research prototype (see Fig. 4). Note that the appointment scheduling task is a subtask of both scenarios considered for the research prototype. The domains chosen deal with linguistically ordinary language, so that the linguistic knowledge sources can simply be extended when the domain is scaled up. In all conversational settings studied in the Verbmobil project the subject matter is limited and the aims of the dialog partners are known in advance. We take it that both dialog partners come to a meeting in a spirit of cooperation and that they are highly motivated to reach a successful conclusion.

Fig. 4: Discourse Domains for Verbmobil

Verbmobil channels energy into key areas of language technology and integrates major subfields of advanced information technology such as natural language processing, speech recognition and synthesis, machine translation, and dialog and knowledge processing (cf. Fig. 5).

Fig. 5: Integrating Major Subfields of Language Technology

There is no doubt that language is always situated, that this is very important for translation, and that a proper translation almost always depends on context. Verbmobil must therefore integrate research on translation with work on dialog processing as well as knowledge representation and reasoning.
Verbmobil is an interdisciplinary attempt to build a face-to-face translation system on the basis of current theories that leading researchers in artificial intelligence, computational linguistics, speech processing, neuro-computing and translation science would subscribe to. The Verbmobil consortium believes that the scientific foundation of dialog translation technology should never be compromised in the interest of achieving some functionality or speed-up in the short run by ad hoc techniques that cannot be generalized and scaled up.

Anytime Modules for Face-to-Face Dialog Translation

Obviously, there is a tradeoff between run-time and quality of results in systems for face-to-face dialog translation. Verbmobil's analysis should not be deeper than necessary, its translation should be as shallow as possible, and its generation process should start as soon as possible.

Fig. 6: Anytime Modules as Coroutines

This means that the major components of the system must work in an incremental mode, allowing the immediate processing of parts of an input that is provided step by step. These modules will be realized as anytime modules for the sake of resource-bounded processing of discourse. Anytime modules are modules whose quality of results improves gradually as computation time increases. They yield imperfect but not useless results if interrupted before completion. If an anytime module is restarted, it can improve what it has generated so far. For Verbmobil, anytime modules are needed on various levels of granularity, e.g.

- speech analysis, parsing, transfer, generation
- pronoun resolution, focus detection, lexical choice

Fig. 7: Lexical Choice as an Anytime Module

All Verbmobil modules integrate a wide spectrum of layered methods, from simple and low-cost to complex and expensive techniques. This can be illustrated by the problem of lexical choice.
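A minimal sketch of this behaviour, assuming a toy setting in which lexical choice has only two layers, is given below in Python. All names and data in it (FREQUENCY, MISLEADING, lexical_choice, run, the concept MEET_EVENT) are hypothetical and introduced purely for illustration; they do not describe the actual Verbmobil modules. A generator plays the role of the coroutine: it hands out a cheap first result and, if it is resumed, a refined one.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

# Toy data (assumption): relative frequency of concept-word pairs.
FREQUENCY: Dict[str, List[Tuple[str, float]]] = {
    "MEET_EVENT": [("date", 0.5), ("meeting", 0.3), ("appointment", 0.2)],
}

# Toy data (assumption): words with unwanted implicatures in a situation.
MISLEADING: Dict[str, Set[str]] = {"business": {"date"}}


def lexical_choice(concept: str, situation: str) -> Iterator[str]:
    """Yield successively better word choices; the caller may stop at any time."""
    ranked = sorted(FREQUENCY[concept], key=lambda pair: pair[1], reverse=True)
    # Layer 1 (cheap): the most frequent word for the concept.
    yield ranked[0][0]
    # Layer 2 (more expensive): skip words that would mislead in this situation.
    unwanted = MISLEADING.get(situation, set())
    suitable = [word for word, _ in ranked if word not in unwanted]
    if suitable:
        yield suitable[0]


def run(module: Iterator[str], steps: int,
        best: Optional[str] = None) -> Optional[str]:
    """Advance the module by at most `steps` results and return the latest one."""
    for _ in range(steps):
        try:
            best = next(module)
        except StopIteration:
            break  # the module has nothing left to improve
    return best


module = lexical_choice("MEET_EVENT", "business")
first = run(module, steps=1)                # interrupted after the cheap layer
better = run(module, steps=1, best=first)   # resumed: continues where it stopped
print(first, better)
# prints: date meeting
```

The step budget stands in for a real time budget so that the example stays deterministic. The interrupted call already returns a usable word, and resuming the same generator improves on it rather than starting from scratch.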
If lexical choice is implemented as an anytime module, the quality of the results can be measured in terms of the precision of communicating the intended concept in a given situation. The concept-to-word mapping can be achieved by a wide spectrum of techniques, from very fast methods using the frequency of concept-word pairs to very elaborate methods like checking possible communicative effects and implicatures.

The concept of anytime modules is tightly connected to the idea of variable depth of processing in a speech translation system. Verbmobil will use a multi-layered semantic representation language that allows for all kinds of underspecification in the surface-oriented layers. In many cases, ambiguous quantifier scope or PP attachment in the source language need not be resolved before translation, since a corresponding ambiguity can be captured in the target language. This leads to the new problem of language generation from disjunctive semantic structures. It is important that each layer of the semantic representation language comes with a specialized inference component, so that simple inferences can be drawn even on the level of surface-oriented representations. While these inferences may be based on primitive rewriting techniques, the inference engine on the more elaborate levels of meaning representation may be a full theorem prover.

Fig. 8: The Notion of Variable Depth of Processing

The Project Structure

The Verbmobil project is funded by the German Ministry for Research and Technology (BMFT) and an industrial consortium. For the first four years of the project the BMFT funding amounts to 60 million Deutschmarks. The BMFT commissioned two feasibility studies on the goals of Verbmobil: one from a consortium of German industrial and academic research groups (see [3]) and another from the Center for the Study of Language and Information (CSLI) in the US (see [1]).
Based on the positive recommendations of the two independent studies, a detailed project plan and schedule were prepared (see [4]), which formed the basis of a call for proposals in July 1992. An international advisory and review board consisting of 10 well-known experts in speech, language and translation technology was appointed by the BMFT. The scientific review of all submitted proposals was finished at the end of January 1993. The main phase of the project starts in May 1993. The project is planned for 8 to 10 years, and the first phase of 4 years is structured by 2 major milestones: a demonstrator after 2 years and a research prototype after 4 years (see Fig. 9). The central project coordination task and the implementation of the demonstrator and research prototype will be carried out by the German Research Center for AI (DFKI).

Fig. 9: The Project Schedule for Verbmobil

The success of such an ambitious translation project obviously depends on international cooperation. An intensive collaboration is planned with the ATR Interpreting Telecommunications Research Laboratories in Kyoto. In March 1993, this well-known Japanese center for speech translation research started a new project that will end in March 2000. Like Verbmobil, this project deals with the translation of spontaneous dialog language. The funding amounts to 16 billion yen. Data collection, speech modules and linguistic knowledge sources for the Japanese language are the major areas of the planned collaboration. For work packages concerning the English language, collaborations have been prepared with three US research groups: Carnegie Mellon University, CSLI at Stanford University and the International Computer Science Institute (ICSI) at Berkeley.

References:

[1] Kay, M., Gawron, J.M., Norvig, P.: Verbmobil: A Translation System for Face-to-Face Dialog. BMFT Study, CSLI, Stanford Univ., August 1991.
[2] Morimoto, T., Shikano, K., Iida, H., Kurematsu, A.: Integration of Speech Recognition and Language Processing in the Spoken Language Translation System SL-TRANS. In: Proc. of the Intern. Conference on Speech and Language Processing, 1990, pp. 921-928.
[3] Verbmobil-Consortium: A Portable Translation Device. BMFT Study, Siemens, Munich, August 1991 (in German).
[4] Wahlster, W., Engelkamp, J. (eds.): Scientific Goals and Networks of Work Packages of the Verbmobil Project. BMFT Study, DFKI, Saarbrücken, April 1992 (in German).
[5] Waibel, A., Jain, A.N., McNair, A.E., Saito, H., Hauptmann, A., Tebelskis, J.: JANUS - A Speech-to-Speech Translation System Using Connectionist and Symbolic Processing Strategies. In: Proc. of the 1991 Intern. Conf. on Acoustics, Speech, and Signal Processing, 1991, pp. 793-796.