Computing Idioms Frequency in Text Corpora

Bušta, Jan

Authors	BUŠTA Jan
Year of publication	2008
Type	Article in Proceedings
Conference	Proceedings of Recent Advances in Slavonic Natural Language Processing 2008
MU Faculty or unit	Faculty of Informatics
Web	https://nlp.fi.muni.cz/raslan/2008/papers/12.pdf
Field	Linguistics
Keywords	frequency of idioms; headwords; text corpora; Czech language
Description	Idioms are phrases whose meaning is not composed of the meanings of the individual words in the phrase. They are a natural example of a violation of the principle of compositionality, which is why, in natural language processing, idioms fall under the problem of meaning extraction. Counting the frequency of idioms and similar phrases in corpora has one main aim: to find out which phrases are used often and which less often. This makes it possible to start capturing the meaning of whole phrases, not just of individual words, which improves natural language understanding.
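Related projects:	Centrum komputační lingvistiky (Centre for Computational Linguistics); Prostředky tvorby komplexní báze znalostí pro komunikaci se sémantickým webem v přirozeném jazyce (Tools for building a comprehensive knowledge base for natural-language communication with the Semantic Web)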
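The description frames the task as counting how often idiomatic phrases occur in a corpus. The following is a minimal Python sketch of that counting step only, not the method used in the paper. It assumes the corpus is already tokenized and lemmatized, treats each idiom as a fixed sequence of lemmas, and ignores the word-order variation and discontinuous occurrences that real Czech idioms often show. The function name `count_idioms` and the sample lemmas are hypothetical.

```python
from collections import Counter
from typing import Iterable, Sequence


def count_idioms(tokens: Sequence[str], idioms: Iterable[Sequence[str]]) -> Counter:
    """Count contiguous, case-insensitive occurrences of each idiom in a token list.

    Assumption (hypothetical sketch): each idiom is a fixed sequence of lemmas
    and must appear as an uninterrupted run of tokens in the same order.
    """
    # Normalize case once so matching is case-insensitive.
    words = [t.lower() for t in tokens]
    counts: Counter = Counter()
    for idiom in idioms:
        pattern = tuple(w.lower() for w in idiom)
        n = len(pattern)
        if n == 0:
            continue  # skip empty patterns
        key = " ".join(pattern)
        # Slide a window of length n over the corpus; overlapping matches each count.
        counts[key] = sum(
            1 for i in range(len(words) - n + 1) if tuple(words[i:i + n]) == pattern
        )
    return counts


if __name__ == "__main__":
    # Hypothetical lemmatized sentence containing the idiom "házet flintu do žita" twice.
    corpus = "on házet flinta do žito a ona také házet flinta do žito".split()
    idioms = [["házet", "flinta", "do", "žito"], ["mít", "máslo", "na", "hlava"]]
    for phrase, freq in count_idioms(corpus, idioms).most_common():
        print(phrase, freq)
```

Matching on lemmas rather than surface forms is what lets inflected Czech occurrences collapse onto one headword-like key; a realistic setup would more likely run phrase queries against an indexed corpus than scan a Python list.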