Methods of author identification
Issue Date :
2020
Author(s) :
Mykhailiuk Vladyslav
Abstract :
Two methods were implemented for identifying the unknown author of a work, given a library of known authors. A text clustering method was also implemented, and both identification methods were tested with and without clustering. In addition, a criterion was proposed for selecting the 𝑛-grams that best serve as markers for identifying the author.
The methods were tested on 800 texts by 16 authors. The method that uses the density of the distribution function proved suitable for identifying the authors of both large texts (50,000+ characters) and small texts (10,000+ characters), whereas the method that uses p-statistics is suitable only for large texts. With text clustering, both methods gave much better results on the test sample.
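The abstract does not say how the 𝑛-gram markers are computed or compared, so the following is only a minimal sketch of the general idea of using character 𝑛-gram frequencies as author markers. It is not the thesis's density-based or p-statistics method: the L1 distance between frequency profiles is a simple stand-in, and the names `ngram_profile`, `profile_distance`, and `closest_author` are hypothetical.

```python
from collections import Counter


def ngram_profile(text, n=3):
    """Relative frequencies of overlapping character n-grams in `text`."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    if total == 0:
        return {}
    return {gram: count / total for gram, count in Counter(grams).items()}


def profile_distance(p, q):
    """L1 distance between two n-gram frequency profiles (illustrative choice)."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def closest_author(unknown_text, library, n=3):
    """Return the author in `library` (author -> sample text) with the nearest profile."""
    target = ngram_profile(unknown_text, n)
    return min(
        library,
        key=lambda author: profile_distance(target, ngram_profile(library[author], n)),
    )
```

For example, `closest_author(text, {"Author A": sample_a, "Author B": sample_b})` would return the name whose sample text has the most similar 𝑛-gram frequencies under this stand-in distance.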
Bibliographic description :
Mykhailiuk V. Methods of author identification : graduation thesis … master's : 113 Applied Mathematics / Vladyslav Mykhailiuk. - Kyiv, 2020. - 25 p.
File(s) :
Format
Adobe PDF
Size :
646.15 KB
Checksum :
(MD5):fc7bcd9ca91cfedba81bbddea9f870da
This work is distributed under the Creative Commons license CC BY-NC