Algorithm for Document Authorship Identification and Plagiarism Evaluation Based on Generalized Suffix Tree


Aleksandar Veljković




Identifying the author of an anonymous text document is an important problem when dealing with historical data. Authors have characteristic writing styles, expressed through specific phrases, sentence constructions, or word choices, so their documents carry that style and create an implicit connection with the author. This paper proposes an approach for identifying the authors of anonymous documents, based on the generalized suffix tree data structure and a defined similarity score, suitable for the analysis of digitized historical text documents. The proposed method can also be used for detecting and evaluating plagiarism, where the document's author is known but the document shows high similarity to documents by another author.