Frequency of Low-Frequency Words in Text Corpora
Authors | |
---|---|
Year of publication | 2010 |
Type | Article in Proceedings |
Conference | Proceedings of Recent Advances in Slavonic Natural Language Processing, RASLAN 2010 |
MU Faculty or unit | |
Citation | |
www | https://nlp.fi.muni.cz/raslan/2010/paper15.pdf |
Field | Linguistics |
Keywords | Computational linguistics; Language model; Low-frequency; Text analysis; Text corpora |
Description | Low-frequency words, especially words occurring only once in a text corpus, are very popular in text analysis, and many lexicographers draw attention to such words. This paper presents a detailed statistical analysis of low-frequency words. The results provide important information for many practical applications, including lexicography and language modeling. |
Related projects: |