Dictionary Express: First Phases. Rapid dictionary-making method for European, Asian and other languages

Authors

KOVAŘÍK František BLAHUŠ Marek CUKR Michal JAKUBÍČEK Miloš KOVÁŘ Vojtěch

Year of publication 2024
Type Article in a scholarly periodical
Journal / Source AsiaLex 2024 Proceedings: Asian Lexicography - Merging cutting-edge and established approaches
Citation
www https://asialex2024.org/conference-program/
Keywords corpus annotation, semi-automatic lexicography, Dictionary Express, dictionary drafting, post-editing lexicography
Description Dictionary Express (DE) is a new methodology that combines automatic lexicographic tools with manual checking (annotation) of words, their forms, their usage, etc. The main goal of the project is to make dictionary making faster and less demanding by separating the process into simple tasks, in contrast to traditional dictionaries, which are compiled entry by entry. This means the non-automatic work can be done by a small team of native speakers who are not professional linguists, supervised by a smaller team of developers and lexicographers. The data is acquired from large corpora of current web language usage, which helps the dictionary stay accurate and up to date with current language trends. Several "rapid dictionaries" have already been created using this method. The time needed to complete a DE project depends on the quality of the corpus tagging and on the weekly workload. A DE project for Czech is now in progress; apart from creating a new Czech dictionary, it focuses on analysing the rapid dictionary-making process and its input and output data. In this paper, we present the main annotation tasks of the DE methodology, the data preparation, and some interesting phenomena that occurred during the first phases of the Czech Dictionary Express.
