Dialogue systems
Luděk Bártek
Laboratory of Searching and Dialogue, Faculty of Informatics, Masaryk University, Brno
spring 2022
Postprocessing, Prosody, Speech Synthesis Description Standards, SABLE, SSML

Speech Synthesis - Post-processing

■ Post-processing objective: make the synthesized speech more natural by enriching it with:
  ■ intonation
  ■ accents (sentence, word)
  ■ emphasis
  ■ breaks (pauses).
■ Tools - modification of:
  ■ F0, possibly also the higher formants
  ■ the sentence melody (local modification)
  ■ intensity (amplitude).

Prosody - Introduction

The raw speech synthesis output is monotone speech without intonation and accents; it sounds unnatural (a robotic voice).
Solution: adding prosody.
Basic prosodic factors:
■ pitch
■ loudness
■ duration.
The basic element of prosody is the syllable.
Prosody depends on the sentence type:
■ declarative sentences, wh-questions and imperative sentences: falling intonation
■ yes/no questions: rising intonation.
Prosody modelling: F0 modulation.

Sentence Intonation Examples

■ Speech without intonation
■ Declarative sentence
■ Yes/no question

The Pitch of the Fundamental Tone

The pitch of the fundamental tone corresponds to F0 (the fundamental frequency).
The F0 course over a vocalic nucleus is non-linear.
A change of intonation is not just a change of F0; the higher formants must be modified as well.
By the importance of F0, languages are divided into:
■ tone languages (Chinese, Vietnamese, ...) - the Chinese syllable "ma" may, depending on the F0 course, mean:
  ■ cannabis, hemp (麻)
  ■ horse (马)
  ■ mother (妈)
■ melodic (pitch) accent languages (Serbian, Slovenian, Lithuanian, Norwegian, Swedish, ...)
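
The falling and rising sentence intonation described above can be illustrated by a small sketch that assigns one F0 target to each syllable. Everything concrete here is an assumption made for illustration: the function name `f0_contour`, the 120 Hz base value and the size of the final shift are hypothetical, and the linear course is a deliberate simplification (the real F0 course is non-linear).

```python
def f0_contour(n_syllables, base_hz=120.0, end_shift_hz=-30.0):
    """Return one F0 target (in Hz) per syllable.

    The contour moves linearly from base_hz on the first syllable to
    base_hz + end_shift_hz on the last one. A negative shift gives a
    falling contour, a positive shift a rising one.
    """
    if n_syllables < 1:
        raise ValueError("n_syllables must be at least 1")
    if n_syllables == 1:
        return [base_hz + end_shift_hz]
    step = end_shift_hz / (n_syllables - 1)
    return [base_hz + i * step for i in range(n_syllables)]


# Hypothetical values: a five-syllable sentence spoken two ways.
declarative = f0_contour(5, end_shift_hz=-30.0)  # falling intonation
yes_no_question = f0_contour(5, end_shift_hz=40.0)  # rising intonation
```

A synthesizer would still have to turn such per-syllable targets into a smooth contour and adjust the higher formants accordingly; the sketch covers only the direction of the F0 change.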
Prosody - Other Prosodic Properties

Intensity (loudness):
■ Physical point of view: the signal intensity at a given moment in time.
■ Physiological point of view: the response of the organ of Corti to the perceived sound.
■ The two views differ:
  ■ the subjective perception of a sound does not correspond, even to a first approximation, to the physical intensity of the signal.
Duration:
■ The duration of a syllable may differ between contexts.
■ Small differences may occur even within the same context.
■ Typical syllable duration is 50 to 200 milliseconds.

Prosody - Further Prosodic Properties

Voice quality
■ jitter - voice vibration (irregular variation of F0)
■ shimmer - irregular variation of the amplitude
■ voice timbre
■ hoarseness
■ degree of sonority
■ ...
Speech rate
■ Can be understood as the reciprocal of the average syllable duration.
■ Can also be measured in other ways:
  ■ the number of characters of spoken text per unit of time (used in the evaluation of speech synthesizers).

Prosody - Further Prosodic Properties

■ Break (pause)
  ■ silent
  ■ filled - contains a characteristic sound:
    ■ eeh
    ■ aa
    ■ ee
    ■ ...
■ Hesitation
  ■ It says something directly about the pragmatics of the utterance.
  ■ It may be important for adapting the dialogue strategy of a dialogue system.
  ■ A typical case of information carried mainly by the prosodic layer of language.

Prosody - Basic Derived Prosodic Properties

Rhythm
■ Prosodic factor derived from the duration of
  ■ syllables
  ■ breaks
  over a time interval.
Word stress
■ derived from all the basic prosodic attributes
■ depends significantly on the language:
  ■ the position of the accent within a word/stress unit
  ■ the extent to which each prosodic factor is used to express it, especially loudness versus pitch.
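
The two speech-rate measures mentioned earlier (the reciprocal of the average syllable duration, and the number of characters per unit of time) can be sketched as follows. The function names and the sample durations are hypothetical and chosen only for illustration; the durations were picked to lie within the typical 50 to 200 ms range.

```python
def syllables_per_second(syllable_durations_ms):
    """Speech rate as the reciprocal of the average syllable duration."""
    if not syllable_durations_ms:
        raise ValueError("at least one syllable duration is required")
    mean_ms = sum(syllable_durations_ms) / len(syllable_durations_ms)
    return 1000.0 / mean_ms  # convert from 1/ms to 1/s


def characters_per_second(text, total_duration_s):
    """Speech rate as the number of text characters per second."""
    if total_duration_s <= 0:
        raise ValueError("total_duration_s must be positive")
    return len(text) / total_duration_s


# Hypothetical measurements for one short utterance.
durations_ms = [120, 180, 90, 150, 110]
rate_syllables = syllables_per_second(durations_ms)
rate_characters = characters_per_second("dobrý den", 0.65)
```

The syllable-based measure ignores breaks between syllables, so the two measures are not directly comparable.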
Sentence accent (intonation centre)
■ put simply, the prosodic emphasis of the core of the sentence's statement.

Prosody - Basic Derived Prosodic Properties

Intonation
■ in general: the time course of the voice spectrum
■ the most important component for speech melody is the fundamental frequency of the voice (F0)
■ the time course of F0
  ■ can be presented as a graph of frequency over time
■ Related terminology:
  ■ melody - the F0 contour
  ■ cadence - determined, for example, by emphasis, ...
  ■ intonation cadence
  ■ melody - the basic type of melodic course, determined by its grammatical function
  ■ F0 course

Prosody - Basic Derived Prosodic Properties

Emotional colouring of the voice
■ It is manifested by rapid changes in loudness and fundamental frequency.
■ It often extends beyond sentence boundaries.
■ If the dialogue system can detect it, it can select a suitable dialogue strategy.
Emphatic accent
■ Created by the emotional colouring of the voice.
■ Present in sentences uttered in situations with a strong emotional context:
  "That's really unheard of!"
  "It hurts like hell!"

Prosody - Basic Derived Prosodic Properties

Contrastive accent - an effort to emphasise a word or a syllable in contrast to another word or syllable:
  "I said to Sakvice, not Rakvice."
  "Byte, not bit."

Prosody - Basic Derived Prosodic Properties

Repetition
■ Prosodic attribute strongly tied to the speaker.
■ Repetition is a variant of filler words
  ■ the speaker often does not realise it
  ■ it should not be confused with stuttering (a speech disorder).
Filler words
■ besides their filler function, they can be characteristic of:
  ■ the speaker's style: "You were at the party yesterday, huh?"
  ■ a dialect or slang: "Man, that party last night was a blast, man?"
Prosody - Basic Derived Prosodic Properties

■ Break
  ■ A frequent occurrence in spoken language:
    ■ at the level of larger units (utterance/speech, sentence, prosodic phrase, ...)
    ■ inside words.
  ■ Related to other prosodic elements:
    ■ hesitation
    ■ repetition
    ■ filled break
    ■ ...

Prosody - Basic Derived Prosodic Properties

Corrections of a part of the utterance
■ A frequent phenomenon that can affect different parts of the utterance.
■ May be caused by:
  ■ retracting something just said
  ■ clarifying a part of the utterance
  ■ correcting a previous part of the utterance.
■ Frequently followed by a break or another prosodic phenomenon.

Prosody - Prosodic Segments of Speech

Speech (utterance).
Prosodic phrase
■ A group of words forming a single intonation unit.
■ Represents the basic structure that is compact from the prosodic point of view.
■ The division into prosodic phrases often follows the syntactic structure of the corresponding sentence.
Stress unit (accented beat)
■ A group of syllables subordinated to one word accent.
■ In Czech it is typically either a single word, or a word together with a monosyllabic word.
Syllable.

Speech Synthesis Description Standards

An effort to unify the description languages used by speech synthesizers.
They define mark-up describing:
■ prosody - speech rate, F0, emphasis of a part of the utterance, break, volume, ...
■ speaker - sex, age, ...
Standards in use:
■ SABLE
■ SSML

SABLE

■ Open standard for the prosodic mark-up of text.
■ Development started in the second half of the 1990s.
■ An XML/SGML application; an effort to unify three speech synthesis mark-up languages:
  ■ SSML - Speech Synthesis Mark-up Language (W3C, 1999).
  ■ STML - Spoken Text Mark-up Language (CSTR, University of Edinburgh; Lucent Technologies, 1997)
  ■ JSML - Java Speech Mark-up Language (Sun Microsystems, 2000)

SABLE - Fundamental Mark-up

SABLE - the root tag.
DIV
■ Used to divide a document into paragraphs and sentences.
■ The kind of document part is given by the attribute type.
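
A minimal sketch of how the root tag and DIV described above might be combined, generated here with Python's standard `xml.etree.ElementTree` module. Several details are assumptions rather than facts from the slides: the helper name `build_sable` is hypothetical, the attribute is written in upper case as `TYPE`, and the values `paragraph` and `sentence` are used as the attribute values for the two kinds of document part.

```python
import xml.etree.ElementTree as ET


def build_sable(paragraphs):
    """Build a SABLE element tree from a list of paragraphs.

    Each paragraph is a list of sentence strings. Attribute name and
    values (TYPE="paragraph" / TYPE="sentence") are assumptions.
    """
    root = ET.Element("SABLE")  # the root tag
    for sentences in paragraphs:
        paragraph = ET.SubElement(root, "DIV", {"TYPE": "paragraph"})
        for sentence_text in sentences:
            sentence = ET.SubElement(paragraph, "DIV", {"TYPE": "sentence"})
            sentence.text = sentence_text
    return root


# Hypothetical two-paragraph prompt of a dialogue system.
document = build_sable([
    ["Good morning.", "How can I help you?"],
    ["Goodbye."],
])
markup = ET.tostring(document, encoding="unicode")
```

Nesting sentence-level DIV elements inside paragraph-level ones mirrors the division into paragraphs and sentences; prosodic mark-up for the text itself is left out of this sketch.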