From word to number: linguistic and statistical “portrait” of the idiolect of Stepan Bandera

Authors: Zuban, Oksana Mykolaivna (Зубань, Оксана Миколаївна); Kryvenok, Vladyslav Mykhailovych (Кривенок, Владислав Михайлович)
Year of publication: 2024
Date (repository record): 2025-08-21
Citation: Зубань О., Кривенок В. Від слова до цифри: лінгвостатистичний “портрет” ідіостилю Степана Бандери. Українське мовознавство. 2024. Вип. 1 (54). С. 222-254.
UDC: 811.161.24’324’38
DOI: https://doi.org/10.17721/um/54(2024).222-254
Repository URI: https://ir.library.knu.ua/handle/15071834/7190

Abstract:

Background. Scholars have repeatedly studied the personality of S. Bandera in terms of non-linguistic categories. One of the most expressive aspects of the figure, the linguistic one, has not yet been studied systematically or in depth, and his journalistic heritage has not yet been subjected to systematic linguistic and statistical analysis. This gap motivates the present study.

Objective. The article describes a statistical parameterisation of the lexical and part-of-speech structure of S. Bandera’s texts, based on a textual sample of 10 journalistic articles (first editions).

Methods. Retrospective editing of the text of reprints; statistical methods (calculation of statistical parameters, comparison of empirical data with the confidence interval of Ukrainian journalism, statistical modelling); graphical presentation of statistical data.

Results:
1) the articles from the 1978 edition were edited retrospectively, and the textual material was organised in a digital format suitable for automatic processing;
2) a text corpus of all digitised first editions of the articles was compiled;
3) the two versions of the article texts, the first edition and the 1978 edition, were compared;
4) frequency dictionaries were compiled in the Ukrainian Language Corpus from the reconstructed texts;
5) a stylometric model of statistical research was formed, and several parameters of the statistical structure (lexical and part-of-speech) of S. Bandera’s text were calculated;
6) the empirical values of the linguistic and statistical parameters calculated manually were compared with the confidence intervals of the media style and with the empirical data calculated automatically by the TextAttributor 1.0 web application.

Conclusions. The study produced the first lexicographic system of frequency dictionaries in Ukrainian computer lexicography based on the first editions of S. Bandera’s articles. The linguistic and statistical modelling of the texts systematises the statistical data and forms a linguistic and statistical “portrait” of S. Bandera’s idiolect: the set of all empirical data on the statistical structure of his texts across nine statistical parameters, compared with the confidence intervals of the same parameters for the journalistic style of the Ukrainian language in general.

Language: uk (Ukrainian)
Keywords: Ukrainian language; idiolect; stylometric model; statistical parameter; corpus of texts; frequency dictionary
Original title: Від слова до цифри: лінгвостатистичний “портрет” ідіостилю Степана Бандери
Type: Article
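As an illustration of two steps named in the abstract (compiling a frequency dictionary, and comparing an empirical parameter with a confidence interval), here is a minimal Python sketch. It is not the authors' procedure and not the TextAttributor 1.0 implementation. The record does not name the nine parameters, so the two computed here (type-token ratio and share of hapax legomena) are assumptions chosen for illustration. The function names and the interval bounds in the usage lines are hypothetical. The tokenizer counts word forms and does not lemmatise.

```python
import re
from collections import Counter

# Word = run of letters, optionally joined by an apostrophe (as in "об’єктом").
# This tokenizer is an assumption; the source does not describe its tokenization.
WORD_RE = re.compile(r"[^\W\d_]+(?:['’ʼ][^\W\d_]+)*")


def frequency_dictionary(text):
    """Count lower-cased word forms (no lemmatisation)."""
    return Counter(WORD_RE.findall(text.lower()))


def lexical_parameters(freq):
    """Compute two illustrative lexical parameters from a frequency dictionary."""
    n_tokens = sum(freq.values())
    n_types = len(freq)
    if n_tokens == 0:
        raise ValueError("empty text: no tokens to analyse")
    hapax = sum(1 for count in freq.values() if count == 1)
    return {
        "tokens": n_tokens,
        "types": n_types,
        "type_token_ratio": n_types / n_tokens,
        "hapax_share_of_types": hapax / n_types,
    }


def within_interval(value, low, high):
    """True if an empirical value lies inside a given confidence interval."""
    return low <= value <= high


# Usage with a toy sentence and placeholder interval bounds (not real data).
freq = frequency_dictionary("слово до слова і цифра до цифри")
params = lexical_parameters(freq)
print(freq.most_common(1))
print(params["tokens"], params["types"])
print(within_interval(params["type_token_ratio"], 0.5, 0.9))
```

In a real study the interval bounds would come from a reference corpus of Ukrainian journalism, and ratios of this kind would be computed on samples of equal length, since they depend on text size.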