MUNI FI Dasher Dasher - Character LM PA154 Language Modeling (3.1) Pavel Rychlý pary@fi.muni.cz March 2, 2023 authors: David MacKay, David Ward Cambridge University; freeware support for highly efficient text input for using means other than a standard computer keyboard alternative for thousands of people with various physical disabilities on-screen text input using a positioning device (mouse, joystick...) uses a probabilistic predictive language model is still under development (technology remains the same) Davel Rychlý ■ Dasher - Character LM ■ March 2, 202a About Dasher Areas of use Dasher is free open-source software GNU Generel Public License alphabet for more than 150 languages font colour setting system learns and offers letter combinations that are more used assistive technology (disabilities - without hands, with one hand...) Pocket PC, iOS, Android, Linux, macOS, Microsoft Windows complex languages (e.g. Japanese) latest version 5.0.0 (beta) from April 8, 2016 Davel Rychlý ■ Dasher - Character LM ■ March 2, 2022 Davel Rychlý ■ Dasher - Character LM ■ March 2, 2022 Principle File Edit Options Control Prediction Help a □ c to □ I a em be □ lost w To be or not to be letters in alphabetical order, each letter is in a rectangle The rectangle with the selected letter contains again the complete alphabet from which the 2nd symbol can be selected, etc. basic idea: more probable letters are in a larger rectangle rectangle sizes are decided on the basis of the language model Inverse" arithmetic coding arithmetic coding (text compression): codeword is a number from the interval (0,1), by successive encoding of symbols the intervals are refined in the ratio of the probabilities of occurrence of a character lossless data compression method in Dasher, the ypsilon coordinate represents the entire interval (0,1), where each alphabet symbol has an associated segment of length corresponding to the probability of its occurrence in a given context Davel Rychlý ■ Dasher - Character LM ■ March 2, 202a 5/3a Davel Rychlý ■ Dasher - Character LM ■ March 2, 202a b/3a Arithmetic coding - example for four-symbol model codeword is a number from the interval [0,1) 60 % for symbol NEUTRAL; interval is [0, 0.6) 20 % for symbol POSITIVE; interval is [0.6, 0.8) 10 % for symbol NEGATIVE; interval is [0.8, 0.9) 10 % for symbol END-OF-DATA; interval is [0.9, 1) symbol in END-OF-DATA section means that decoding is complete Davel Rychlý ■ Dasher - Character LM ■ March 2, 202a Arithmetic decoding 06 0.8 0.9 1 036 0.48 0.54 0.6 .523 0.5:S4 0.5' Davel Rychlý ■ Dasher - Character LM ■ March 2, 202a message is encoded as the number 0.538 encoder with interval [0,1) is divided into four subintervals; the message is in the NEUTRAL section interval [0, 0.06) is divided into four subintervals; the message is in the NEGATIVE section interval [0.48, 0.54) is divided into four subintervals; the message is in the END-OF-DATA section8'33 Encoding the message "WIKI" by arithmetic coding Encoding of message "WIKI" by arithmetic coding continued Each symbol has its probability in the interval [0, 1) The number of message symbols or terminal symbol must be known interval is represented in binary system 1. 2. w 3. I 2 W I interval "W" is [0, 0.01) interval "I" is [0.01,0.11) interval "K" is [0.11,1) 6. [.0010101, .0010111) - .001011 Davel Rychlý ■ Dasher - Character LM ■ March 2, 202a Davel Rychlý ■ Dasher - Character LM ■ March 2, 202a Encoding of message "WIKI" by arithmetic coding continued PPM (Prediction by Partial Match) 1. 2. W 3. I w:i "it ~qp I :2 W h 1 .w=P 6. [.0010101, .0010111) - .001011 encodes "W" [0, 0.1) first followed by "I" [0.001, 0.0011) then the "K" is [0.00101, 0.0011) and finally "I" [0.0010101, 0.0010111) the result is a number from the final interval The language model used in Dasher is not limited to the concept of words combines information about n-grams with probabilities of occurrence of each symbol from the dictionary context 4-5 symbols Davel Rychlý ■ Dasher - Character LM ■ March 2, 202a 11 / 3a Davel Rychlý ■ Dasher - Character LM ■ March 2, 202a 12/3a PPM - 3 modes Language Model (3) Standard letter-based PPM (calculates probability by partial matching) Word-based model (word dictionary with frequencies) Mixture model (PPM/dictionary) Davel Rychlý ■ Dasher - Character LM ■ March 2, 202a Other features import of training data simply by loading the file data source for Czech: Institute of the Czech National Corpus, Faculty of Arts, Charles University any alphabet: e.g. also LaTeX, C, IPA other software - 2 modes: normal typing and word completion (user has to switch between them) Dasher has one mode which combines both Davel Rychlý ■ Dasher - Character LM ■ March 2, 202a Mouse, touchpad, touchscreen File Edit Options Help Dasher is great| writing speed using mouse: after 10 minutes of training 5-15 words/min., after an hour: 15-25 words/min., experienced users: 40 words per minute (as fast as typing by hand using a keyboard) sample of Dasher video: ipaq language model learns over time (learns new user's expressions or phrases) everything we write is automatically saved to a file as additional training data Davel Rychlý ■ Dasher - Character LM ■ March 2, 202a Input method types ■ computer mouse ■ touchpad ■ touchscreen ■ eyetracker ■ headmouse ■ trackball ■ trackpad ■ breath ■ buttons ■ tilt sensors Davel Rychlý ■ Dasher - Character LM ■ March 2, 202a Eyetracker camera + sensors that detect where the user is looking on the screen initial price: 2000 -4000 USD Davel Rychlý ■ Dasher - Character LM ■ March 2, 2022 17/32 Davel Rychlý ■ Dasher - Character LM ■ March 2, 2022 18/32 Eyetracker Eye Dasher Tobii Eye Tracker 5 price: 229 EUR Engineered for gaming also built in (gaming) laptops input speed: after ten minutes of training 7 words/minute, after an hour: 20 words/minute, experienced users: 30 words/minute eyetracking without Dasher, only with virtual (on-screen) keyboard: 15 words/min., error-rate 5x higher Davel Rychlý ■ Dasher - Character LM ■ March 2, 202a Davel Rychlý ■ Dasher - Character LM ■ March 2, 202a Eye Dasher - User friendliness Headmouse input using the virtual (on-screen) keyboard is discrete (waiting for the timer to expire, or blinking) Dasher provides continuous input video: eye_dasher IR camera reflexive points price: 500-1500 USD Davel Rychlý ■ Dasher - Character LM ■ March 2, 2022 Davel Rychlý ■ Dasher - Character LM ■ March 2, 2022 Breath Dasher Button Dasher m direct relation between lung volume and ypsilon coordinate value one-dimensional (cannto go back) therefore: Control mode Control area (Stop, Pause, Move, Delete) video: breath_dasher 3 directions forward up ■ forward down ■ back Davel Rychlý ■ Dasher - Character LM ■ March 2, 202a 23/3a Davel Rychlý ■ Dasher - Character LM ■ March 2, 202a 24/ 3a Dasher vs. speech recognition Speech Dasher: Efficient speech recognition correction inapplicability of automatic speech recognition systems in noisy environments even with the best recognizers about 5 % error rate (difficult editing of errors) Step 1: text input using a combination of speech and navigation via pointing device (mouse) Step 2: Speech recognizer makes an initial estimate of the text, the user edits or confirms the output initial error rate of 22 %, users usually fix everything faster than repair using separate speech recognition (special commands) faster than a standalone Dasher video: speech_dasher Davel Rychlý ■ Dasher - Character LM ■ March 2, 202a Davel Rychlý ■ Dasher - Character LM ■ March 2, 202a Other options - Swype Swype (2) virtual keyboard for smartphones and tablets developed by Nuance Communications typing by continuous stroke on QWERTY/QWERTZ/AZERTY/National keys word guessing using predictive dictionary (we can add our own words) more accuracy for longer words (short ones usually have more possibilities to interpret the stroke on the screen) typing without diacritics, offered variants with diacritics typing speed up to over 50 words/min. handles simple punctuation (even smileys) app is able to learn from Facebook, Gmail, Twitter... also available in Czech possibility to dictate in different languages using Dragon Dictation module (also in Czech) videohttp : / /www. youtube ■ com/watch?v=S J-RAef CG_c Davel Rychlý ■ Dasher - Character LM ■ March 2, 202a Davel Rychlý ■ Dasher - Character LM ■ March 2, 202a Other options -SwiftKey SwiftKey (2) free for Android, iOS, iPhone learns using previous text communication (SMS, Gmail, texts in RSS, it also adapts to letters that you repeatedly press slightly off) multiple languages (up to 5 simultaneous) typo correction next word prediction (offers the most likely variants of the following words) 800 emoji Emoji Prediction feature - learns to predict relevant emoji quality dictionaries (correspond to trends in communication) can be typed in Swype style English dictation functions can be turned on June 2012 release of SwiftKey Healthcare; prediction based on real clinical data April 2016 release of ShakeSpeak; emulating W. Shakespeare's speech to celebrate the 400th anniversary of his death year 2016 Microsoft buyout of SwiftKey video:http : / /www. youtube . com/watch?v=kA5Horw_SOE Davel Rychlý ■ Dasher - Character LM ■ March 2, 202a 29/3a Davel Rychlý ■ Dasher - Character LM ■ March 2, 202a 30/3a Other options - SlidelT Other options - GO Keyboard similar to Swype keyboard - typing by dragging between characters lower requirements for typing accuracy quality dictionaries (possibility to install more, including Czech) more than 70 language sets keyboard customisation option calculates the variants of the words the user wanted to type autocompletion of spaces and capital letters video:http : / /www . youtube . com/watch?v=Tp_7bWuvQwQ Davel Rychlý ■ Dasher - Character LM ■ March 2, 202a Other options Perfect keyboard Touch Pal keyboard Google keyboard Siine Shortcut keyboard Adaptxt keyboard ShapeWriter keyboard prediction in many languages possibility to change skins and backgrounds ability to import names and SMS into the dictionary support for Swype style text input detected a security issue in 2017; the app was sending user information back to China (language, location, network type, ...), more than 200 million users affected video:http : / /www . youtube . com/watch?v=XQRRvSwpmWc Davel Rychlý ■ Dasher - Character LM ■ March 2, 202a Davel Rychlý ■ Dasher - Character LM ■ March 2, 202a 33/3a