Parameters
Methods of author identification
Date of issue:
2020
Author(s):
Mykhailiuk Vladyslav
Abstract:
This research implemented two methods for identifying the unknown author of a work that belongs to a library of known authors. A text clustering method was also implemented, and both identification methods were tested with and without clustering. In addition, a criterion was proposed for selecting the 𝑛-grams that best serve as markers for identifying the author.
Testing used 800 texts by 16 authors. The method based on the density of the distribution function proved suitable for identifying the authors of both large texts (50,000+ characters) and small ones (10,000+ characters), whereas the method based on p-statistics is suitable only for large works. With text clustering, both methods gave much better results on the test sample.
Bibliographic description:
Mykhailiuk V. Methods of author identification : graduation thesis … master's : 113 Applied Mathematics / Vladyslav Mykhailiuk. - Kyiv, 2020. - 25 p.
File(s):
Format
Adobe PDF
Size:
646.15 KB
Checksum:
(MD5): fc7bcd9ca91cfedba81bbddea9f870da
This work is distributed under the terms of the Creative Commons CC BY-NC license