Určování autorství anonymních textů na základě automaticky nalezených charakteristických znaků
Title in English | Determining Authorship of Anonymous Texts Based on Automatically Discovered Characteristic Features |
---|---|
Authors | |
Year of publication | 2011 |
MU Faculty or unit | |
Citation | |
Description | Master's thesis. The work builds on the most successful methods for determining the authorship of anonymous documents. We combine, optimize, and revise these methods and create new techniques for three main tasks: automatic authorship attribution for a given set of documents, verification that a document was written by a selected author, and clustering of documents by author. The implemented algorithms are tested on Czech documents, but the system is modular: once its language-dependent components are removed or replaced, it can process documents written in any language. Everything is implemented in Python. The system includes tools for preprocessing Czech data and for managing the documents stored in a PostgreSQL database. The thesis also reports empirical observations on the performance of the most popular authorship determination methods on Czech documents. Most measurements to date were performed on English texts (books, newspaper articles, rarely e-mails), and statistics for Czech data have been missing until now. |