Fragments and Text Categorization

Authors

BLAŤÁK Jan, POPELÍNSKÝ Lubomír, MRÁKOVÁ Eva

Year of publication 2004
Type Article in Proceedings
Conference The Companion Volume to the Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics
MU Faculty or unit

Faculty of Informatics

Field Informatics
Keywords text classification; fragments
Description We introduce two novel methods of text categorization in which documents are split into fragments. We conducted experiments on English, French and Czech texts. In all cases, the task was binary document classification. We find that both methods increase the accuracy of text categorization. For the Naive Bayes classifier, this increase is significant.