Přegenerování a podgenerování : Jak efektivně vyhledávat v jazykových korpusech data pro lingvistický výzkum

Title in English	Over/under Generating : How to Search Data for Linguistic Analysis in Language Corpora
Authors	OSOLSOBĚ Klára
Year of publication	2024
Type	Requested lectures
MU Faculty or unit	Faculty of Arts
Citation
Description	In this talk, we show how to minimize overgeneration (to increase accuracy) and prevent undergeneration (to maintain coverage) in corpus-based word-formation research. Using the retrieval of candidates for one word-formation model (kutil) as a specific example, we show how observation of corpus data can be used to refine a corpus query progressively. The data obtained from the corpus are then analysed both quantitatively and qualitatively. Finally, we show to what extent the homonymy of nouns formed by conversion of l-participles negatively affects the results of POS disambiguation.
Related projects:	Lexikon a gramatika češtiny IV - 2024