FF:PLIN011 Collecting of corpus data - Course Information
PLIN011 Collecting of corpus data
Faculty of Arts, Spring 2013
- Extent and Intensity
- 0/2. 3 credit(s). Type of Completion: z (credit).
- Teacher(s)
- Mgr. Dana Hlaváčková, Ph.D. (seminar tutor)
- Guaranteed by
- doc. PhDr. Zdeňka Hladká, Dr.
Department of Czech Language – Faculty of Arts
Contact Person: Jaroslava Vybíralová
Supplier department: Department of Czech Language – Faculty of Arts
- Timetable
- each odd Tuesday 12:30–14:05 G13
- Course Enrolment Limitations
- The course is also offered to students of fields other than those it is directly associated with.
The capacity limit for the course is 20 student(s).
Current registration and enrolment status: enrolled: 0/20, only registered: 0/20, only registered with preference (fields directly associated with the programme): 0/20
- Fields of study / plans the course is directly associated with
- Czech Language with Orientation on Computational Linguistics (programme FF, B-FI)
- Course objectives
- The aim of the course is to acquaint students with manual and automated procedures for collecting data for building written and spoken corpora. Attention is paid to selecting linguistically relevant material and preparing it for further computer processing. In the practice-oriented sessions, students learn to work with computer tools for building corpora (e.g. Corpus Builder for written corpora); the result of their own work will be an electronic corpus. For the spoken corpus, each student makes a digital recording of ordinary speech and transcribes it according to the specified rules, with emphasis on synchronizing text and audio (using the ELAN tool).
- Syllabus
- Main topics:
  - Introduction to building language corpora
  - Basic steps to create written corpora
  - Basic steps in the development of spoken corpora
  - Samples of written and spoken corpora
  - Instructions for making recording and transcription
  - Specifications of written corpus: the selection of appropriate texts with respect to focus
  - Building of written corpora, problem solving
  - Transcription of recordings: specific problems
  - Checking results, evaluation of students' work in the course
- Literature
- BARONI, Marco, Adam KILGARRIFF, Jan POMIKÁLEK and Pavel RYCHLÝ. WebBootCat: a Web Tool for Instant Corpora. In Proceeding of the EuraLex Conference 2006. 1st ed. Italy: Edizioni dell'Orso s.r.l., 2006, p. 123-132, 9 pp. ISBN 88-7694-918-6.
- Teaching methods
- Lectures, seminar discussions, practical demonstrations, work on the computer.
- Assessment methods
- Active participation in class; final project: data processing for a written and a spoken corpus.
- Language of instruction
- Czech
- Permalink: https://is.muni.cz/course/phil/spring2013/PLIN011