Automatic Adaptation of Author's Stylometric Features to Document Types

Authors

RYGL Jan

Type Article in Proceedings
Conference Text, Speech, and Dialogue - 17th International Conference
MU Faculty or unit

Faculty of Informatics

Citation
WWW http://www.tsdconference.org/tsd2014/download/preprints/575.pdf
Doi http://dx.doi.org/10.1007/978-3-319-10816-2_7
Field Informatics
Keywords authorship verification; feature selection; machine learning; stylome; stylometric features
Description Many Internet users face the problem of anonymous documents and texts with counterfeit authorship. The number of questionable documents exceeds the capacity of human experts; therefore, a universal automated authorship identification system supporting all types of documents is needed. In this paper, five predominant document types are analysed in the context of authorship verification: books, blogs, discussions, comments and tweets. A method for the automatic selection of authors’ stylometric features using double-layer machine learning is proposed and evaluated. Experiments are conducted on ten disjoint train and test sets, and a method for efficiently training a large number of machine learning models is introduced (163,700 models were trained).