Přegenerování a podgenerování : Jak efektivně vyhledávat v jazykových korpusech data pro lingvistický výzkum
Title in English | Over/under Generating : How to Search Data for Linguistic Analysis in Language Corpora |
---|---|
Authors | |
Year of publication | 2024 |
Type | Requested lectures |
MU Faculty or unit | |
Citation | |
Description | In this talk, we show how to minimize overgeneration (to increase accuracy) and prevent undergeneration (to maintain coverage) in corpus-based word-formation research. Using the retrieval of candidates for one word-formation model (kutil) as a specific example, we show how observation of corpus data can be used to progressively refine a corpus query. The data obtained from the corpus are analysed both quantitatively and qualitatively. Finally, we show to what extent the homonymy of nouns formed by conversion from l-participles negatively affects the results of POS disambiguation. |