Aplikace počítačové lingvistiky při tvorbě on-line kurzu češtiny mluvtecesky.net

Title in English

Application of computer linguistics in the creation of the mluvtecesky.net on-line Czech language course

MU Faculty or unit

Language Centre

Description

The paper demonstrates what computational linguistics can contribute to the production of Czech language courses for foreigners. The on-line course mluvtecesky.net was developed within the CZKey project; the parts relying on computational linguistics were contributed by the Masaryk University Language Centre and the Natural Language Processing Centre at the same university's Faculty of Informatics. Lemmatizing the corpus of course texts yielded a frequency dictionary, which can be used to estimate the course's size and vocabulary coverage, to produce flashcards, to reveal some typos, and to ensure unified terminology. The word list was translated into other languages to give the student a rough idea of each word's meaning, and each word form present in the course text was linked to a lemma entry in a morphological database that contains full declensions and conjugations. In this way, the student can click on any word in the course and immediately see the corresponding lemma, its translation into the student's language, and a table of all the word's forms. Finally, an attempt was made to find a compromise between two extremes: a rather limited set of declension/conjugation paradigms, which is easy for learners to use, and a complex machine-derived system that links lemmas to paradigms without allowing any exceptions, which is versatile but large. The outcome is a medium-sized set of paradigms that, once learned, lets the student remember the forms of every word in the course by merely noting its paradigm and any indicated exceptions.
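
The pipeline described above can be illustrated with a minimal Python sketch. It is not the project's implementation: the tiny lexicon, the single paradigm (singular forms only), the function names, and the sample sentence are all assumptions made for illustration, and a real system would query a morphological analyser and database instead. The sketch shows a lemma-based frequency dictionary, a list of unrecognised forms as typo candidates, a click-style lookup of a word form, and a paradigm combined with per-lemma exceptions.

```python
import re
from collections import Counter

CASES = ["nom", "gen", "dat", "acc", "voc", "loc", "ins"]

# Hypothetical paradigm table: singular endings of the model noun "žena".
PARADIGMS = {"žena": ["a", "y", "ě", "u", "o", "ě", "ou"]}

# Hypothetical lexicon: lemma -> (paradigm, exceptions, English translation).
# "škola" follows "žena" except in the dative and locative singular.
LEMMA_INFO = {
    "žena": ("žena", {}, "woman"),
    "škola": ("žena", {"dat": "škole", "loc": "škole"}, "school"),
}


def decline(lemma):
    """Build the table of forms from the paradigm, then apply exceptions."""
    paradigm, exceptions, _ = LEMMA_INFO[lemma]
    stem = lemma[:-1]  # drop the final vowel of the lemma
    forms = {case: stem + end for case, end in zip(CASES, PARADIGMS[paradigm])}
    forms.update(exceptions)
    return forms


# Reverse index: every generated word form points back to its lemma.
FORM_TO_LEMMA = {
    form: lemma for lemma in LEMMA_INFO for form in decline(lemma).values()
}


def frequency_dictionary(text):
    """Count lemmas in the text; collect forms the lexicon does not know."""
    lemmas, unknown = Counter(), Counter()
    for token in re.findall(r"\w+", text.lower()):
        lemma = FORM_TO_LEMMA.get(token)
        if lemma is None:
            unknown[token] += 1  # out-of-lexicon word or possible typo
        else:
            lemmas[lemma] += 1
    return lemmas, unknown


def lookup(form):
    """What a student might see after clicking a word in the course text."""
    lemma = FORM_TO_LEMMA[form.lower()]
    return {
        "lemma": lemma,
        "translation": LEMMA_INFO[lemma][2],
        "forms": decline(lemma),
    }


if __name__ == "__main__":
    lemmas, unknown = frequency_dictionary("Žena vidí školu. Ženy jsou ve škole.")
    print(lemmas.most_common())
    print(sorted(unknown))
    print(lookup("školu"))
```

In this toy setup, the learner only needs to remember that "škola" follows the "žena" paradigm plus two listed exceptions, which mirrors the compromise between a small paradigm set and an exception-free machine-derived system.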