Introduction
PA154 Language Modeling (1.1)
Pavel Rychlý
pary@fi.muni.cz
February 16, 2023

PA154 - Technical Information
■ Slides in IS: https://is.muni.cz/auth/el/fi/jaro2023/PA154/
■ Final written exam (online): 50 points, 25 points for grade E
■ Optional individual projects: up to 25 points

Pavel Rychlý • Introduction • February 16, 2023 2/9

Individual projects
■ presentation on new research in language modeling
■ small project as a part of a bigger collaborative project
  ■ neural machine translation
  ■ lexical acquisition
■ small task
  ■ describe errors in ChatGPT
  ■ annotation of a language resource

Language model
■ model
  ■ (mathematical) abstraction
  ■ similar/same behavior as the modeled object
■ language model
  ■ a model of a natural language

Language models - what are they good for?
■ assigning scores to sequences of words
■ predicting words
■ generating text
■ statistical machine translation
■ automatic speech recognition
■ optical character recognition

Predicting words
Do you speak ...
Would you be so ...
Statistical machine ...
Faculty of Informatics, Masaryk ...
WWII has ended in ...
In the town where I was ...
Lord of the ...

Generating text
[Figure: generated image captions in four columns: "Describes without errors", "Describes with minor errors", "Somewhat related to the image", "Unrelated to the image"]
A person riding a motorcycle on a dirt road.
Two dogs play in the grass.
A group of young people playing a game of frisbee.
Two hockey players are fighting over the puck.
A herd of elephants walking across a dry grass field.
A close up of a cat laying on a couch.
A skateboarder does a trick on a ramp.
A little girl in a pink hat is blowing bubbles.
A red motorcycle parked on the side of the road.
A dog is jumping to catch a frisbee.
A refrigerator filled with lots of food and drinks.
A yellow school bus parked in a parking lot.

MT + OCR

Language models - probability of a sentence
■ An LM is a probability distribution over all possible word sequences.
■ What is the probability of an utterance s?

Probability of a sentence
P_LM(Catalonia President urges protests)
P_LM(President Catalonia urges protests)
P_LM(urges Catalonia protests President)

Ideally, the probability should strongly correlate with the fluency and intelligibility of a word sequence.
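
The idea of scoring word orders can be illustrated with a minimal sketch. It assumes a bigram model with add-one smoothing trained on a toy four-sentence corpus. Neither the corpus nor the choice of model comes from the slides, and the names `train_bigram` and `log_prob` are hypothetical helpers introduced only for this example.

```python
import math
from collections import Counter

# Toy training corpus (an assumption for illustration, not course data).
CORPUS = [
    "catalonia president urges protests",
    "president urges calm",
    "catalonia president calls election",
    "protests continue in catalonia",
]

BOS, EOS = "<s>", "</s>"  # sentence boundary markers


def train_bigram(sentences):
    """Count bigrams and their left contexts, including boundary markers."""
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = {EOS}
    for sentence in sentences:
        tokens = [BOS] + sentence.lower().split() + [EOS]
        vocab.update(tokens[1:])  # every token that can be predicted
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts, vocab


def log_prob(sentence, model):
    """Return log P_LM(sentence) using add-one (Laplace) smoothing."""
    context_counts, bigram_counts, vocab = model
    tokens = [BOS] + sentence.lower().split() + [EOS]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        numerator = bigram_counts[(prev, word)] + 1
        denominator = context_counts[prev] + len(vocab)
        total += math.log(numerator / denominator)
    return total


model = train_bigram(CORPUS)
candidates = [
    "Catalonia President urges protests",
    "President Catalonia urges protests",
    "urges Catalonia protests President",
]
scores = {s: log_prob(s, model) for s in candidates}

# Print the candidates from most to least probable under the toy model.
for s in sorted(candidates, key=scores.get, reverse=True):
    print(f"{scores[s]:8.3f}  {s}")
```

On this toy corpus the fluent word order receives the highest log-probability and the most scrambled order the lowest, which is the behavior the last slide asks for. A model trained on so little data only shows the mechanics and says nothing about real text.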