A Framework for Authorship Identification in the Internet Environment

Authors

RYGL Jan, HORÁK Aleš

Year of publication 2011
Type Article in Proceedings
Conference Proceedings of Fifth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2011
MU Faculty or unit

Faculty of Informatics

Field Use of computers, robotics and its application
Keywords authorship identification; authorship similarity
Description Misuse of anonymous online communication for illegal purposes has become a major concern. In this paper, we present a framework named ART (Authorship Recognition Tool) that is designed to minimize manual procedures and maximize the efficiency of authorship identification based on the content of electronic documents on the Internet. The framework covers the phases of document retrieval and database document management. ART provides implementations of an efficient authorship identification algorithm and an authorship similarity algorithm, including the possibility of obtaining extra data for learning and tests. The framework also determines whether or not the identities of different authors are interlinked. Authorship is analysed by machine learning and natural language processing methods. Technical information such as the IP address is considered only as an optional attribute for the machine learning, because it can easily be forged and loses its value when the author communicates from public places or through proxy servers.
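
To make the notion of authorship similarity more concrete, the following is a minimal sketch of one common stylometric approach: comparing character n-gram profiles of two texts with cosine similarity. This record does not describe the algorithms that ART actually uses, so the technique, the function names (char_ngrams, cosine_similarity, authorship_similarity) and the default n-gram length of 3 are all assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in lowercased, whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def authorship_similarity(doc_a: str, doc_b: str, n: int = 3) -> float:
    """Hypothetical similarity score in [0, 1]; NOT the algorithm used by ART."""
    return cosine_similarity(char_ngrams(doc_a, n), char_ngrams(doc_b, n))
```

A score near 1 indicates closely matching n-gram profiles, while a score of 0 means the two texts share no n-grams. In a framework of the kind described above, such a score would plausibly be only one feature among many passed to a machine learning model, with unreliable attributes such as the IP address kept optional.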