Automatic Adaptation of Author's Stylometric Features to Document Types
Title in Czech | Automatická adaptace stylometrických rysů autora podle typu dokumentů |
---|---|
Authors | |
Year of publication | 2014 |
Type | Article in proceedings |
Conference | Text, Speech, and Dialogue - 17th International Conference |
MU Faculty or unit | |
Citation | |
www | http://www.tsdconference.org/tsd2014/download/preprints/575.pdf |
Doi | http://dx.doi.org/10.1007/978-3-319-10816-2_7 |
Field | Informatics |
Keywords | authorship verification; feature selection; machine learning; stylome; stylometric features |
Description | Many Internet users face the problem of anonymous documents and texts with counterfeit authorship. The number of questionable documents exceeds the capacity of human experts, so a universal automated authorship identification system supporting all types of documents is needed. In this paper, five predominant document types are analysed in the context of authorship verification: books, blogs, discussions, comments and tweets. A method for automatically selecting authors' stylometric features using double-layer machine learning is proposed and evaluated. Experiments are conducted on ten disjoint train and test sets, and a method for efficiently training a large number of machine learning models is introduced (163,700 models were trained). |
Related projects: