CAN COMPUTERS UNDERSTAND L2 CZECH?
RICHARD HOLAJ
Department of Czech Language, Faculty of Arts, Masaryk University, Brno, Czech Republic

WHY DO WE CARE?
PRONUNCIATION MATTERS
E-LEARNING IS AN IMPORTANT TOPIC

WHAT DO WE HAVE NOW?
AUDIO IN VOCABULARY
DUOLINGO-LIKE APPROACH
SPECIALIZED APPS

CAN WE DO BETTER?
AUTOMATIC SPEECH RECOGNITION + “INCORRECT SOUNDS”

HOW TO ACHIEVE IT?
COLLECT DATA
ANNOTATE
CREATE NEURAL NETWORK
TRAIN MODEL
EVALUATE

COLLECT DATA
AUDIO RECORDINGS
NON-NATIVE CZECH SPEAKERS
IDENTIFY “INCORRECT SOUNDS”

ANNOTATE
INTENDED SOUND    PRONOUNCED SOUND    ANNOTATION
a                 a                   a
                  ɛ                   a::e / e
                  ã:                  a:k1vN / á:vN

CREATE NEURAL NETWORK
PERSEPHONE¹
>3700 INDIVIDUAL SOUNDS MANUALLY LABELED
¹ Oliver Adams, Trevor Cohn, Graham Neubig, Hilaria Cruz, Steven Bird, et al. Evaluating phonemic transcription of low-resource tonal languages for language documentation. LREC 2018 (Language Resources and Evaluation Conference), May 2018, Miyazaki, Japan. pp. 3356-3365. ⟨halshs-01709648v4⟩

TRAIN MODEL

EVALUATE
ERROR RATE    TRAINING    VALIDATION    TEST
MODEL AV1     43 %        42 %          51 %
MODEL AV2     15 %        37 %          41 %

DETAILED EVALUATION
EXPECTED LABEL    OUTPUT LABEL    EXPECTED LABEL    OUTPUT LABEL
s                 z               u:kD              u
m                 n               au                a:kD
t:vA              t:vT            eu                e
z::dz             dz              au::a             a

WHAT NEXT?
WHOLE WORDS
ADJUST LABEL DICTIONARY
FOCUS ON MOST RELEVANT SPEAKERS

ANOPHONE
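
The slides report an error rate for each model without stating how it is computed. A common choice for phonemic transcription is a label error rate: the edit distance between the expected and output label sequences, divided by the number of expected labels. The sketch below assumes that definition. The function names and the example sequences are hypothetical; they are not the talk's data or Persephone's API.

```python
from typing import Iterable, Sequence, Tuple


def edit_distance(expected: Sequence[str], output: Sequence[str]) -> int:
    """Levenshtein distance between two label sequences (unit costs)."""
    # previous[j] holds the distance between the expected prefix seen so far
    # and the first j output labels.
    previous = list(range(len(output) + 1))
    for i, exp_label in enumerate(expected, start=1):
        current = [i]
        for j, out_label in enumerate(output, start=1):
            substitution = previous[j - 1] + (exp_label != out_label)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def label_error_rate(
    pairs: Iterable[Tuple[Sequence[str], Sequence[str]]]
) -> float:
    """Total edit distance divided by the total number of expected labels."""
    pairs = list(pairs)
    total_errors = sum(edit_distance(exp, out) for exp, out in pairs)
    total_labels = sum(len(exp) for exp, _ in pairs)
    if total_labels == 0:
        raise ValueError("no expected labels to evaluate against")
    return total_errors / total_labels


# Hypothetical utterances written with the slides' label notation
# (assumption: one list element per annotated sound).
example_pairs = [
    (["s", "a", "m"], ["z", "a", "n"]),       # two substitutions
    (["au", "t:vA"], ["a:kD", "t:vA"]),       # one substitution
]
print(label_error_rate(example_pairs))  # 3 errors over 5 labels -> 0.6
```

Computing this separately on the training, validation, and test splits would produce a table shaped like the one on the EVALUATE slide. Listing the individual substitutions would produce expected/output pairs like those under DETAILED EVALUATION.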