PLIN041 History of Computational Linguistics

Faculty of Arts
Spring 2017
Extent and Intensity
0/2/0. 4 credit(s). Type of Completion: graded credit.
Teacher(s)
Mgr. Dana Hlaváčková, Ph.D. (lecturer)
Guaranteed by
doc. PhDr. Zdeňka Hladká, Dr.
Department of Czech Language – Faculty of Arts
Contact Person: Jaroslava Vybíralová
Supplier department: Department of Czech Language – Faculty of Arts
Timetable
Wed 10:50–12:25 G13
Course Enrolment Limitations
The course is also offered to students of fields other than those it is directly associated with.
The capacity limit for the course is 20 student(s).
Current registration and enrolment status: enrolled: 0/20, only registered: 0/20, only registered with preference (fields directly associated with the programme): 0/20
Course objectives
The course offers essential information about the history of Computational Linguistics (CL), a field devoted to the study of natural languages through algorithmic description of the individual language levels. This means a formal description of morphology, syntax, semantics and pragmatics in the form of individual algorithms and their implementations. Approaches in CL rely either on systems of rules formulated as algorithms and their implementations, or on statistical and machine learning techniques.
CL originated at the end of the 1950s and the beginning of the 1960s, when experiments with Machine Translation (MT) started in the U.S. and the Soviet Union, between English and Russian (P. Toma) and between Russian and French (O. Kulagina). CL also contributed in its own way to the development of Artificial Intelligence (AI). The first MT experiments were not judged sufficiently successful (ALPAC Report, 1966), so the attention of researchers turned to natural language processing (NLP) as the general problem of processing language data by computer. In the 1960s the development of CL was, paradoxically, influenced by N. Chomsky (1963). His results on formal grammars, languages and the automata hierarchy led to attempts to create software applications describing concrete language levels (phonology, morphology, syntax) and to verify them. During the 1960s it became obvious that these approaches had not provided satisfactory results. As a result, the research paradigm started to shift from introspective techniques to empirical ones, and text corpora appeared (Brown Corpus; Francis, Kučera, 1961). In the 1970s and 1980s more attention was paid to text corpora and to the tools for managing them.
Various software tools for handling natural language appeared, in particular spelling and grammar checkers, various kinds of electronic dictionaries, and lexical databases. In the 1990s large text corpora came into standard use, e.g. the BNC with approx. 100 million tokens, or the ČNK (SYN2000). After 2000 more attention in CL was paid to statistical techniques exploiting machine learning. Statistical MT (SMT) appeared as an application working with very large text data, and it made it possible to obtain better results than the existing MT systems (Google Translate, 2007). Currently (2010 ...) research in CL concentrates on improving results in morphological and syntactic tagging. In semantics, intensive research addresses Word Sense Disambiguation (WSD) and semantic tagging. Another hot topic is modelling emotions with the techniques of affective computing. The subject is interdisciplinary: it links linguistic and computational approaches and falls under Artificial Intelligence and Cognitive Science. At the end of the course students should be able to understand and explain the methodological development of CL and to make deductions based on the acquired knowledge of CL.
Syllabus
  • 1950s and beginning of the 1960s: origin of Computational Linguistics (CL); formal description of the language levels (morphology, syntax, semantics, pragmatics) in the form of algorithms.
  • In the course of the 1960s: emergence of text corpora, their types, corpus tools, tagging, disambiguation. Rule-based and statistical approaches.
  • 1970s-1980s: research in morphological structures, morphological algorithms and analyzers, syntactic analyzers.
  • 1990s: semantic (lexical) analysis, machine-readable electronic dictionaries, lexical databases (WordNet, EuroWordNet, thesauri), tools for handling lexical resources. More research in the area of corpora.
  • 2000s-2010s: semantic analysis of the sentence based on TIL (NTA).
  • 2000s-2010s: tools for anaphora recognition and coreference. Software tools for language support: spelling and grammar checkers, translation programs.
  • 2000...: dialogue systems, human-computer communication. Tools for knowledge representation in computers.
  • In general, the explanation covers CL in both the Czech and the international context.
Literature
  • The Oxford handbook of computational linguistics. Edited by Ruslan Mitkov. Oxford: Oxford University Press, 2003, xx, 784 pp. ISBN 0198238827.
  • HAJIČOVÁ, Eva, Jarmila PANEVOVÁ and Petr SGALL. Úvod do teoretické a počítačové lingvistiky. Praha: Karolinum, 2002, 156 pp. ISBN 8024604701.
  • CHOMSKY, Noam. Syntaktické struktury : logický základ teorie jazyka : o pojmu "gramatické pravidlo". 1st ed. Praha: Academia, 1966, 209 pp.
Teaching methods
Teaching takes the form of lectures and seminars that combine slides with demos of the relevant software tools. Students do homework, prepare presentations based on the literature they have read, and develop smaller projects. Where appropriate, open discussion between the teacher and the students is used.
Assessment methods
  • A discussion of a selected topic (if needed).
  • A presentation (with slides) of the literature read, i.e. papers from journals and conference proceedings, as well as chapters from relevant books.
  • Small projects with some programming in Prolog.
Language of instruction
Czech
Further Comments
The course is taught annually.
The course is also listed under the following terms: Autumn 2013, Spring 2014, Autumn 2014, Spring 2015, Spring 2016, Spring 2018, Spring 2019, Autumn 2019, Autumn 2020, Autumn 2021, Autumn 2022, Autumn 2023, Autumn 2024.
  • Permalink: https://is.muni.cz/course/phil/spring2017/PLIN041