PLIN011 Collecting of corpus data

Faculty of Arts
Spring 2012
Extent and Intensity
0/2. 3 credit(s). Type of Completion: z (credit).
Teacher(s)
Mgr. Dana Hlaváčková, Ph.D. (seminar tutor)
Guaranteed by
doc. PhDr. Zdeňka Hladká, Dr.
Department of Czech Language – Faculty of Arts
Contact Person: Jaroslava Vybíralová
Supplier department: Department of Czech Language – Faculty of Arts
Timetable
each even Tuesday 10:50–12:25 L11
Course Enrolment Limitations
The course is also open to students of fields other than those it is directly associated with.
The capacity limit for the course is 20 student(s).
Current registration and enrolment status: enrolled: 0/20, only registered: 0/20, only registered with preference (fields directly associated with the programme): 0/20
Course objectives
The aim of the course is to acquaint students with manual and automated procedures for collecting data for building written and spoken corpora. Attention is paid to choosing linguistically relevant material and preparing it for further computer processing. In practice-oriented sessions, students learn to work with computer tools for building corpora (e.g. Corpus Builder for written corpora); the result of their own work will be an electronic corpus. For the spoken corpus, each student makes a digital recording of ordinary speech and transcribes it according to specified rules, with emphasis on synchronizing text and audio (using the Transcriber tool).
Syllabus
Main topics:
  • Introduction to building language corpora
  • Basic steps to create written corpora
  • Basic steps in the development of spoken corpora
  • Samples of written and spoken corpora
  • Instructions for making recording and transcription
  • Specifications of written corpus - the selection of appropriate texts with respect to focus
  • Building of written corpora, problem solving
  • Transcription of recordings - specific problems
  • Checking results, evaluation of students' work in the course
Literature
  • BARONI, Marco, Adam KILGARRIFF, Jan POMIKÁLEK and Pavel RYCHLÝ. WebBootCat: a Web Tool for Instant Corpora. In Proceeding of the EuraLex Conference 2006. 1st ed. Italy: Edizioni dell'Orso s.r.l., 2006, p. 123-132, 9 pp. ISBN 88-7694-918-6.
Teaching methods
Lectures, seminar discussions, practical demonstrations, work on the computer.
Assessment methods
Active participation in class; final project: data processing for a written and a spoken corpus.
Language of instruction
Czech
Further Comments
The course is also listed under the following terms: Spring 2011, Spring 2013, Spring 2018, Spring 2020, Spring 2021, Autumn 2021, Spring 2025.
  • Permalink: https://is.muni.cz/course/phil/spring2012/PLIN011