Mishchenko L. THE METHOD OF RECOGNIZING FAKE NEWS ON THE INTERNET BASED ON NATURAL LANGUAGE PROCESSING


Thesis for the degree of Doctor of Philosophy (PhD)

State registration number

0824U002552

Thesis Registration Form

0824U002552.pdf

Applicant for

Liudmyla Mishchenko

Specialization

123 - Computer Engineering

Specialized Academic Board

ДФ 26.002.184; ID 6516

National Technical University of Ukraine "Kiev Polytechnic Institute".

Abstract

The dissertation addresses the pressing scientific and practical problem of promptly recognizing and labeling fake news messages when only limited a priori information is available. Prompt, effective labeling is essential for mitigating the negative impact of fake news, especially given the high volume and dense flow of messages and the limited content of each one. The study analyzed measures of information and psychological influence (IPI), identifying citizens as the targets of IPI and changes in their opinions, moods, and actions as its goal. It found that the most common IPI method is spreading deceptive information through fake news, primarily as short messages on news websites and social media, usually written in a natural-language news format. Traditional approaches to fake news detection rely on complex groups of indicators, deliver results only after the fact, and cannot operate in real time. The study identified the speed of recognition and labeling as the key quality indicators in countering fake news. It analyzed the requirements for rapid fake news detection under conditions of high volume, dense flow, and limited message content, and chose content analysis as the detection method. The study detailed the processes for comprehensive and rapid fake news detection and labeling, analyzing the use of frequency analysis of text tokens from short messages and the creation of a dynamically updated fake news dictionary. It also evaluated the Bayesian statistical criterion, adapted to the linguistic style of messages, and the feasibility of unsupervised machine learning methods. The proposed method combines natural language processing (NLP) techniques: frequency analysis of text tokens, an enhanced content analysis method, binary message classification using an improved Naive Bayes classifier, and the BM25 ranking function.
This approach ensures rapid fake news recognition with 85-93% accuracy for binary content-based labeling, outperforming the traditional combination of TF-IDF with Naive Bayes, which achieves 80-90% accuracy, and thus improving fake news detection efficiency by 2.5%. The improved binary classification and labeling method, which uses the Naive Bayes classifier and the BM25 ranking function, features adaptive parameter selection based on experimental frequency analysis data from reliable sources. Compared to known implementations of Internet fake news classification, this method increases text classification accuracy by 14% on a dynamically updated set of short messages without compromising speed. The content analysis method was enhanced with an unsupervised learning scheme that uses a dynamically changing dataset from reliable sources to form a fake news recognition dictionary. This ensures that fake news features are formed in a timely manner and reflect dynamic changes in the style and scope of fake news. Experimental results showed that binary evaluations can be formed and fakes labeled for users consistently under critical access conditions. The method also allows prompt recognition and binary labeling of fake news on low-performance, low-power devices, including when access to global networks is temporarily unavailable. The developed server software supports the implementation of NLP models for the proposed method and provides prompt fake news labeling for mobile users. This software can be integrated into information-analytical resources and used to implement cloud services for Internet fake news recognition. In summary, the dissertation's relevance lies in solving the problem of rapid recognition of fake news messages under conditions of high volume, dense flow, and limited individual message content.
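The combination of token frequency analysis, BM25 weighting, and Naive Bayes classification described in the abstract can be illustrated with a minimal sketch. This is not the dissertation's implementation: the abstract does not specify the improved classifier or its adaptive parameter selection, so the class name BM25NaiveBayes, the helper functions, the smoothing constant alpha, and the conventional BM25 defaults k1=1.5 and b=0.75 are all assumptions chosen for illustration, as are the toy messages and labels.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a short message and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def bm25_weights(tokens, doc_freq, n_docs, avg_len, k1=1.5, b=0.75):
    """Return {term: BM25 weight} for one tokenized message."""
    tf = Counter(tokens)  # frequency analysis of the message's tokens
    norm = k1 * (1 - b + b * len(tokens) / avg_len)  # length normalization
    weights = {}
    for term, f in tf.items():
        df = doc_freq.get(term, 0)
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))  # non-negative IDF variant
        weights[term] = idf * f * (k1 + 1) / (f + norm)
    return weights


class BM25NaiveBayes:
    """Hypothetical sketch: multinomial Naive Bayes trained on BM25 weights
    instead of raw term counts. Parameter values are assumptions."""

    def __init__(self, k1=1.5, b=0.75, alpha=1.0):
        self.k1, self.b, self.alpha = k1, b, alpha

    def fit(self, messages, labels):
        docs = [tokenize(m) for m in messages]
        self.n_docs = len(docs)
        self.avg_len = sum(len(d) for d in docs) / self.n_docs or 1.0
        # Document frequency: number of messages containing each term.
        self.doc_freq = Counter(t for d in docs for t in set(d))
        self.vocab = set(self.doc_freq)
        self.class_docs = Counter(labels)
        # Accumulate BM25 weights per class and term.
        self.term_weight = defaultdict(Counter)
        for d, y in zip(docs, labels):
            w = bm25_weights(d, self.doc_freq, self.n_docs, self.avg_len,
                             self.k1, self.b)
            for t, value in w.items():
                self.term_weight[y][t] += value
        self.class_total = {y: sum(self.term_weight[y].values())
                            for y in self.class_docs}
        return self

    def predict(self, message):
        # Tokens never seen in training are ignored.
        counts = Counter(t for t in tokenize(message) if t in self.vocab)
        best_label, best_score = None, -math.inf
        for y, n_y in self.class_docs.items():
            score = math.log(n_y / self.n_docs)  # class prior
            denom = self.class_total[y] + self.alpha * len(self.vocab)
            for t, f in counts.items():
                # Additive smoothing keeps every likelihood strictly positive.
                score += f * math.log((self.term_weight[y][t] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = y, score
        return best_label


# Toy training data, invented for this example only.
clf = BM25NaiveBayes().fit(
    ["Shocking miracle cure", "Shocking secret exposed",
     "Council approves budget", "Parliament passes budget"],
    ["fake", "fake", "reliable", "reliable"],
)
print(clf.predict("Shocking cure"))  # fake
```

Because BM25 saturates term frequency and normalizes by message length, a term repeated many times in one short message contributes less than it would under raw counts, which is one plausible reason to prefer it over plain TF-IDF for short news items. In a setting with a dynamically updated message set, fit would be rerun (or the counters updated incrementally) as new labeled messages arrive.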

Thesis supervisor

Iryna A. Klymenko

Official opponents

Mariia S. Dorosh
Viktoriia Vysotska

Reviewers

Yurii Oliinyk
Oleksii Pysarchuk

Research papers

Mishchenko, L., Klymenko, I. (2023). Recognizing fake news based on natural language processing using the BM25 algorithm with fine-tuned parameters. Eastern-European Journal of Enterprise Technologies, 6 (2 (126)), pp. 33–40. DOI: https://doi.org/10.15587/1729-4061.2023.293513.

L. D. Mishchenko, I. A. Klymenko. A method of accelerated fake news recognition based on natural language processing and removal of vowels in words. Збірник наукових праць «Проблеми інформатизації та управління» 1(73)/2023, 2023-04-28. pp. 39-44. ISSN 2073-4751. DOI: https://doi.org/10.18372/2073-4751.73.17643.

L. Mishchenko, I. Klymenko. METHOD FOR DETECTING FAKE NEWS THROUGH WRITING STYLE. Технічні науки та технології, 4 (34). Chernihiv, Ukraine. DOI: https://doi.org/10.25140/2411-5363-2023-4(34).

L. Mishchenko, I. Klymenko. FAKE NEWS RECOGNITION USING NATURAL LANGUAGE PROCESSING AND A LOW-POWER ARCHITECTURE FOR EDGE COMPUTING. Збірник наукових праць «Проблеми інформатизації та управління» 4(76)/2023. pp. 49-57. DOI: https://doi.org/10.18372/2073-4751.76.18241

L. Mishchenko, I. Klymenko. METHOD FOR DETECTING FAKE NEWS BASED ON NATURAL LANGUAGE PROCESSING. The VI International Scientific and Practical Conference «Modern ways of solving the problems of science in the world», February 13-15, 2023. Warsaw, Poland. pp. 375-378.

Liudmyla Mishchenko. A METHOD FOR RECOGNIZING FAKE NEWS. Science, society, education: topical issues and development prospects: V International Scientific and Practical Conference. Kharkiv, Ukraine. 12-14 April 2020.

L. Mishchenko, I. Klymenko, V. Tkachenko. The fake news recognition method based on Naïve Bayes with improved TF-IDF algorithm. Mathematical Modeling and Simulation of Systems (MODS'2023). Chernihiv, Ukraine, November 13-15, 2023.

0824U002661

Alina Rybalchenko

The method of optimal data placement in billing OLTP-systems based on the rank approach

0824U002439

Serhii S. Korotkov

METHODOLOGY OF BUILDING A MANAGEMENT INFORMATION SYSTEM CITY TRANSPORT INFRASTRUCTURE BASED ON S-HYPERNETWORK THEORY

0824U002425

Kyrylo Pshenychnyi

Temporal finite state machines models and verification methods in hardware description languages

0824U002359

Yurii O. Voichur

Methods and tools for predicting the level of quality and security of computer systems' software

0824U002059

Kyrylo M. Leichenko

Methods and tools for planning the deployment of flying networks to ensure data transmission in conditions of devastation.