Automatic Adaptation of Author's Stylometric Features to Document Types

Authors	RYGL Jan
Year of publication	2014
Type	Article in Proceedings
Conference	Text, Speech, and Dialogue - 17th International Conference
MU Faculty or unit	Faculty of Informatics
web	http://www.tsdconference.org/tsd2014/download/preprints/575.pdf
Doi	http://dx.doi.org/10.1007/978-3-319-10816-2_7
Field	Informatics
Keywords	authorship verification; feature selection; machine learning; stylome; stylometric features
Description	Many Internet users encounter anonymous documents and texts with counterfeit authorship. The number of questionable documents exceeds the capacity of human experts, so a universal automated authorship identification system supporting all types of documents is needed. In this paper, five predominant document types are analysed in the context of authorship verification: books, blogs, discussions, comments and tweets. A method for automatically selecting authors’ stylometric features using double-layer machine learning is proposed and evaluated. Experiments are conducted on ten disjoint train and test sets, and a method for efficiently training a large number of machine learning models is introduced (163,700 models were trained).
Related projects:	Analýza přirozeného jazyka v prostředí internetu (Natural Language Analysis in the Internet Environment)
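
As a rough illustration of the terms used in the description, the sketch below computes a few simple stylometric features for a document and compares two documents feature by feature, as an authorship verification system might do before classification. It is not the method from the paper: the feature set, the `FUNCTION_WORDS` list, and the functions `stylometric_features`, `pair_differences` and `select_features` are all hypothetical, and the usefulness scores passed to `select_features` are assumed to come from some separately trained models (for example, one per document type).

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny list of function words (an assumption,
# not the feature set used in the paper).
FUNCTION_WORDS = ("the", "and", "of", "to", "in")


def stylometric_features(text):
    """Return a small, illustrative feature dictionary for one document."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    n_words = max(len(words), 1)  # avoid division by zero on empty input
    counts = Counter(words)
    features = {
        "avg_word_len": sum(len(w) for w in words) / n_words,
        "type_token_ratio": len(counts) / n_words,
        "punct_per_word": len(re.findall(r"[.,;:!?]", text)) / n_words,
    }
    # Relative frequency of each function word.
    for fw in FUNCTION_WORDS:
        features["fw_" + fw] = counts[fw] / n_words
    return features


def pair_differences(doc_a, doc_b):
    """Absolute per-feature difference between two documents.

    Small differences suggest a similar style; a verification model
    would take such a vector as its input.
    """
    fa = stylometric_features(doc_a)
    fb = stylometric_features(doc_b)
    return {name: abs(fa[name] - fb[name]) for name in fa}


def select_features(scores, threshold):
    """Keep features whose usefulness score reaches the threshold.

    `scores` is assumed to be estimated elsewhere, e.g. per document type.
    """
    return sorted(name for name, s in scores.items() if s >= threshold)
```

Under these assumptions, adapting to a document type would amount to calling `select_features` with scores estimated on that type (tweets, blogs, books and so on) and feeding only the selected entries of `pair_differences` to the final classifier.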