XML Applications

W3C Voice Browser Activity

Standards for Voice and Dialogue applications
- VoiceXML
- SRGS
- SISR
- SSML
- PLS
- Call Control XML
- State Chart XML
- …
W3C Recommendations and Working Drafts

VoiceXML

Language for developing dialogue applications.
Specification
Primarily targeted at telephone applications.
- telephone support automation
- railway/bus schedule information
- ticket reservation
- …
Describes an algorithm for dialogue flow control (dialogue strategy).
Alternatively, the dialogue flow can be described by a finite-state automaton with output (Mealy automaton).
- SCXML
W3C standard (current version 2.1; version 3.0 is a Working Draft).
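
The automaton view can be sketched in SCXML. This is a minimal sketch and not part of the original slides: the state ids and event names (user.kind, user.topping, user.drink, user.yes, user.no) are hypothetical, and the log elements only stand in for spoken prompts. Each transition consumes one input event and emits one output, which is the Mealy-style behaviour.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sketch: state ids and event names are illustrative only -->
<scxml xmlns="http://www.w3.org/2005/07/scxml" version="1.0"
       datamodel="ecmascript" initial="askKind">
  <state id="askKind">
    <!-- input: the user named a pizza kind; output: the next prompt -->
    <transition event="user.kind" target="askTopping">
      <log label="prompt" expr="'What topping do you want?'"/>
    </transition>
  </state>
  <state id="askTopping">
    <transition event="user.topping" target="askDrink">
      <log label="prompt" expr="'What do you want to drink?'"/>
    </transition>
  </state>
  <state id="askDrink">
    <transition event="user.drink" target="confirm">
      <log label="prompt" expr="'Is the order correct?'"/>
    </transition>
  </state>
  <state id="confirm">
    <transition event="user.yes" target="done">
      <log label="prompt" expr="'Order submitted'"/>
    </transition>
    <!-- on "no" the dialogue starts over -->
    <transition event="user.no" target="askKind">
      <log label="prompt" expr="'What kind of pizza do you want?'"/>
    </transition>
  </state>
  <final id="done"/>
</scxml>
```

Unlike a VoiceXML form, this sketch fixes the order of the questions; mixed-initiative input would need additional transitions.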

VoiceXML - processing

An application needs to run on a VoiceXML platform or in a VoiceXML interpreter.
- desktop platforms - OptimTalk, publicVoiceXML, JVoiceXML
- open-source online - Asterisk+VoiceGlue, Asterisk+OpenVXI
- commercial online:
  - Bevocal Cafe
  - Voxeo Prophecy
- VoiceXML forms in XHTML documents
  - using namespaces (formerly W3C submission XHTML+Voice profile 1.0)
  - Supported in the Opera and Firefox web browsers.
- …
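
Embedding a VoiceXML form in an XHTML page through namespaces might look roughly as follows. This is a sketch under assumptions, not an example from the slides: the namespace URIs and the placement of the form in the head follow the conventions of the XHTML+Voice profile as far as they are recalled here, and details differ between profile versions. The form id welcome is hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:vxml="http://www.w3.org/2001/vxml"
      xmlns:ev="http://www.w3.org/2001/xml-events">
 <head>
  <title>FI pizzeria</title>
  <!-- VoiceXML form embedded through its namespace prefix -->
  <vxml:form id="welcome">
   <vxml:block>Welcome to FI pizzeria</vxml:block>
  </vxml:form>
 </head>
 <!-- XML Events attributes: run the voice form when the page loads -->
 <body ev:event="load" ev:handler="#welcome">
  <p>Welcome to FI pizzeria</p>
 </body>
</html>
```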

VoiceXML - example

Figure: VoiceXML example

 <?xml version="1.0" encoding="UTF-8"?>
 <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="pizza-mixed">
   <grammar src="pizza.grxml"/>
   <initial name="pizzaall">
    <prompt>Welcome to FI pizzeria</prompt>
    <nomatch count="2"><assign name="pizzaall" expr="true"/></nomatch>
    <noinput count="2"><assign name="pizzaall" expr="true"/></noinput>
   </initial>
   <field name="kind">
    <prompt>What kind of pizza do you want?</prompt>
    <nomatch>We have salami, mozzarella and appolo pizza</nomatch>
    <noinput>We have salami, mozzarella and appolo pizza</noinput>
    <grammar src="pizza.grxml#kind"/>
   </field>
   <field name="topping">
    <prompt>What topping do you want?</prompt>
    <nomatch>We offer ketchup and chilli.</nomatch>
    <noinput>We offer ketchup and chilli.</noinput>
    <grammar src="pizza.grxml#topping"/>
   </field>
   <field name="drink">
    <prompt>What do you want to drink?</prompt>
    <nomatch>Select one of coke, sprite and water</nomatch>
    <noinput>Select one of coke, sprite and water</noinput>
    <grammar src="pizza.grxml#drink"/>
   </field>
   <field name="ack">
    <prompt>Did you order a <value expr="kind"/> pizza with <value
    expr="topping"/> and <value expr="drink"/>?</prompt>
    <grammar src="yesno.grxml"/>
   </field>
   <filled>
    <if cond="ack=='yes'">
         <prompt>Order submitted</prompt>
    <else/>
         <clear namelist="kind topping drink ack"/>
    </if>
   </filled>
  </form>
 </vxml>

SRGS (Speech Recognition Grammar Specification)

Standard for describing context-free grammars.
- describes the accepted inputs of particular VoiceXML fields
Specification
Part of the W3C Voice Browser Activity standards.
Current version: 1.0

SRGS - motivation

- The user's voice input needs to be recognized (continuous speech recognition).
- success rate 50-99 %
Ways to improve the success rate:
- improve the language model
- restrict the problem domain
- improve the user model
Problem domain restriction + language model improvement = SRGS.

SRGS - example

Figure: SRGS grammar referenced in the previous VoiceXML example (pizza.grxml)

 <?xml version="1.0" encoding="UTF-8"?>
 <grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
          root="mixed" xml:lang="en-US" tag-format="semantics/1.0">
  <rule id="mixed">
    <item><ruleref special="GARBAGE"/> <ruleref uri="#kind"/> pizza <ruleref special="GARBAGE"/> <ruleref uri="#topping"/> and <ruleref uri="#drink"/>
    </item>
    <tag>
     {
       out.kind=rules.kind;
       out.topping=rules.topping;
       out.drink=rules.drink;
     }
    </tag>
  </rule>

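  <!-- scope="public" lets other documents reference this rule as pizza.grxml#kind -->
  <rule id="kind" scope="public">
   <one-of>
    <item>salami</item>
    <item>mozzarella</item>
    <item>polo</item>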
   </one-of>
  </rule>

 ...

 </grammar>

SISR (Semantic Interpretation for Speech Recognition)

Purpose:
- What is the meaning of recognized input?
Language for deriving the semantics of the recognized input.
Based on ECMAScript.
Used in speech recognition grammars (see the tag element in the preceding SRGS example).
SISR 1.0 Specification
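
A small grammar can show what semantic interpretation adds. The sketch below is hypothetical: the slides reference a yesno.grxml file but do not show it, so the rule name and the phrase lists are assumptions. Several spoken variants are mapped to a single value, which an application can then compare against a constant such as 'yes'.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical yes/no grammar; the phrase lists are illustrative only -->
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" root="yesno" tag-format="semantics/1.0">
  <rule id="yesno" scope="public">
    <one-of>
      <item>
        <one-of>
          <item>yes</item>
          <item>yeah</item>
          <item>sure</item>
        </one-of>
        <!-- SISR: every affirmative phrasing yields the same value -->
        <tag>out = "yes";</tag>
      </item>
      <item>
        <one-of>
          <item>no</item>
          <item>nope</item>
        </one-of>
        <tag>out = "no";</tag>
      </item>
    </one-of>
  </rule>
</grammar>
```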

SSML (Speech Synthesis Markup Language)

Specification: Speech Synthesis Markup Language
W3C standard
Current version: 1.1 (September 2010)
Used to describe prosodic characteristics of synthesized speech:
- loudness
- prosody
- emphasis
- speech rate
- voice kind (male, female, neutral)
- …
Contains markup for description of pronunciation of foreign words.
- IPA (International Phonetic Alphabet) can be utilized.
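
Several of the features listed above can be combined in one short document. This is an illustrative sketch, not taken from the slides: the sentences are made up, and the IPA transcription of Brno is an assumption supplied for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  <!-- voice kind -->
  <voice gender="female">
    <!-- speech rate and loudness -->
    <prosody rate="slow" volume="soft">Welcome to the pizzeria.</prosody>
    <!-- emphasis -->
    We are <emphasis level="strong">open</emphasis> today.
    <!-- pronunciation of a foreign word given in IPA -->
    You can find us in
    <phoneme alphabet="ipa" ph="ˈbr̩no">Brno</phoneme>.
  </voice>
</speak>
```

How closely a synthesizer follows these hints depends on the engine and on the voices it has available.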

SSML - example of loudness and breaks

Figure: SSML Breaks and loudness control example

 <?xml version="1.0" encoding="utf-8"?>
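 <speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis"
                      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                      xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                        http://www.w3.org/TR/speech-synthesis11/synthesis.xsd"
                      xml:lang="cs-CZ">
  <!-- "Good morning." spoken loudly, followed by a pause -->
  <prosody volume="loud">
   Dobre rano. <break />
  </prosody>
  <!-- "How are you?" at the default volume -->
  <prosody volume="default">
   Jak se mate?
  </prosody>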
 </speak>

SSML - example of intonation modeling

Figure: SSML Intonation modeling

 <speak ...>
  <!-- "Are you doing well?" with pitch rising towards the end of the question -->
  <prosody contour="(0%,50Hz) (75%,+10%) (80%,+20%) (90%,+30%)">
   Mas se dobre?
  </prosody>
 </speak>

PLS (Pronunciation Lexicon Specification)

Pronunciation Lexicon Specification
- W3C standard
- Current version: 1.0 (October 2008)
Developed to describe the pronunciation of words, abbreviations, etc.
Used for:
- Speech synthesis (SSML) - pronunciation of
  - foreign words
  - abbreviations
  - numeric values
  - …
- Speech recognition (SRGS) - PLS can list several pronunciations of a word so that each variant is recognized correctly.
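
For the synthesis case, an SSML 1.1 document can point to a lexicon and apply it to part of the text. The following is a sketch only: the lexicon URI and the id abbr are hypothetical, and the lexicon is assumed to contain an entry for the abbreviation CSR.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="cs-CZ">
  <!-- hypothetical lexicon location; xml:id gives it a name for lookup -->
  <lexicon uri="http://www.example.com/lexicon.pls" xml:id="abbr"/>
  <!-- tokens inside lookup are pronounced according to that lexicon -->
  <lookup ref="abbr">CSR</lookup>
</speak>
```

SRGS grammars reference a lexicon in a similar way, with a lexicon element placed as a child of grammar.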

PLS Structure

Root element - lexicon
- contains one or more lexicon entries - lexeme element
  - each lexeme contains:
    - one or more word notations - grapheme element
    - one or more word pronunciations - phoneme element
  - pronunciation may be written using IPA, SAMPA, etc.

PLS - example

Figure: PLS pronunciation example

 <?xml version="1.0" encoding="utf-8"?>
 <lexicon version="1.0"
       xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
         http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
       alphabet="sampa" xml:lang="cs-CZ">
  <lexeme>
   <grapheme>CSR</grapheme>
   <phoneme>tSe: es er</phoneme>
   <phoneme>tSeska: republika</phoneme>
  </lexeme>
 </lexicon>