The dissertation addresses the urgent scientific and practical problem of promptly recognizing and labeling fake news messages using limited a priori information. Prompt labeling is essential for mitigating the negative impact of fake news, especially given the high volume and dense flow of messages and the limited content of each individual message. The study analyzed measures of information and psychological influence (IPI), identifying citizens as the targets of IPI and changes in their opinions, moods, and actions as its goal. It found that the most common IPI method is the spread of deceptive information as fake news, primarily through short messages on news websites and social media, usually written in natural language in a news format.
Traditional approaches to fake news detection rely on complex groups of indicators, which yield results only after the fact and cannot operate in real time. The study identified the speed of recognition and of labeling as the key quality indicators in countering fake news. It analyzed the requirements for rapid fake news detection under conditions of high volume, dense flow, and limited message content, and chose content analysis as the detection method.
The study detailed the processes for comprehensive and rapid detection and labeling of fake news, examining frequency analysis of text tokens in short messages and the creation of a dynamically updated fake news dictionary. It also evaluated a Bayesian statistical criterion adapted to the linguistic style of the messages and the feasibility of unsupervised machine learning methods.
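As an illustration of frequency analysis of text tokens in short messages, the following minimal Python sketch counts tokens across a batch of messages and reports their relative frequencies. The tokenizer pattern and the function names (`tokenize`, `token_frequencies`, `relative_frequencies`) are assumptions made for this example and are not taken from the dissertation.

```python
import re
from collections import Counter

# Assumed tokenizer: lowercase Latin letters, digits, and apostrophes only.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lowercase a short message and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def token_frequencies(messages):
    """Count how often each token occurs across a list of messages."""
    counts = Counter()
    for message in messages:
        counts.update(tokenize(message))
    return counts


def relative_frequencies(counts):
    """Convert absolute token counts into shares of all observed tokens."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {term: count / total for term, count in counts.items()}


if __name__ == "__main__":
    sample = ["Shocking cure found!", "Cure is shocking"]
    print(token_frequencies(sample).most_common(3))
```

Tokens with high counts in messages already labeled as fake are natural candidates for a fake news dictionary; how such candidates are actually selected is not specified here.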
The proposed method combines several natural language processing (NLP) techniques: frequency analysis of text tokens, an enhanced content analysis method, binary message classification using an improved Naive Bayes classifier, and the BM25 ranking function. This approach provides rapid fake news recognition with 85-93% accuracy for binary content-based labeling, outperforming the traditional combination of TF-IDF and Naive Bayes, which achieves 80-90% accuracy, and thus improving fake news detection efficiency by 2.5%.
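To make the combination of BM25 weighting and Naive Bayes classification concrete, the sketch below weights each token with the standard Okapi BM25 term score and feeds those weights into a multinomial Naive Bayes decision with Laplace smoothing. It is a generic illustration under stated assumptions, not the improved classifier developed in the dissertation: the parameter defaults `k1=1.5` and `b=0.75`, the smoothing constant `alpha`, and all function names are hypothetical, and the toy training data exists only to exercise the code.

```python
import math
from collections import Counter, defaultdict


def bm25_weights(tokens, doc_freq, n_docs, avg_len, k1=1.5, b=0.75):
    """Return {token: BM25 weight} for one tokenized message."""
    tf = Counter(tokens)
    # Length normalization term shared by all tokens of this message.
    norm = k1 * (1 - b + b * len(tokens) / avg_len)
    weights = {}
    for term, freq in tf.items():
        n_t = doc_freq.get(term, 0)
        idf = math.log((n_docs - n_t + 0.5) / (n_t + 0.5) + 1)
        weights[term] = idf * freq * (k1 + 1) / (freq + norm)
    return weights


def train(docs, labels):
    """Accumulate per-class BM25-weighted token totals from tokenized messages."""
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))
    n_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in docs) / n_docs
    class_totals = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        class_totals[label].update(bm25_weights(tokens, doc_freq, n_docs, avg_len))
    return {
        "doc_freq": doc_freq,
        "n_docs": n_docs,
        "avg_len": avg_len,
        "class_totals": class_totals,
        "class_counts": Counter(labels),
    }


def predict(model, tokens, alpha=1.0):
    """Pick the class with the highest smoothed log-posterior score."""
    vocab_size = len(model["doc_freq"])
    weights = bm25_weights(
        tokens, model["doc_freq"], model["n_docs"], model["avg_len"]
    )
    best_label, best_score = None, -math.inf
    for label, count in model["class_counts"].items():
        totals = model["class_totals"][label]
        denom = sum(totals.values()) + alpha * vocab_size
        score = math.log(count / model["n_docs"])  # class prior
        for term, weight in weights.items():
            score += weight * math.log((totals.get(term, 0.0) + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    docs = [
        ["shocking", "miracle", "cure"],
        ["secret", "miracle", "trick"],
        ["council", "approves", "budget"],
        ["minister", "signs", "budget"],
    ]
    labels = ["fake", "fake", "real", "real"]
    model = train(docs, labels)
    print(predict(model, ["miracle", "cure"]))
```

Compared with raw counts or TF-IDF, the BM25 weight saturates as a token repeats and normalizes for message length, which is one plausible reason it suits short messages; the accuracy figures reported above come from the dissertation's experiments, not from this sketch.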
The improved binary classification and labeling method, based on the Naive Bayes classifier and the BM25 ranking function, features adaptive parameter selection driven by experimental frequency analysis data from reliable sources. It increases text classification accuracy by 14% on a dynamically updated set of short messages without compromising speed, compared with known implementations of Internet fake news classification.
The content analysis method was enhanced with an unsupervised learning scheme that uses a dynamically changing dataset from reliable sources to form a fake news recognition dictionary. This ensures that fake news features are formed in a timely manner and reflect dynamic changes in the style and scope of fake news.
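One simple way to keep such a dictionary in step with changing style is to let older evidence fade as new batches of messages arrive. The sketch below is only an assumed illustration of that idea: the class name `DecayingDictionary`, the forgetting factor `decay`, and the pruning threshold `min_weight` are hypothetical and do not describe the scheme used in the dissertation.

```python
from collections import Counter


class DecayingDictionary:
    """Token-weight dictionary whose older evidence fades with each update."""

    def __init__(self, decay=0.9, min_weight=0.05):
        self.decay = decay            # hypothetical forgetting factor per batch
        self.min_weight = min_weight  # entries below this weight are dropped
        self.weights = Counter()

    def update(self, batch_tokens):
        """Decay existing weights, then add counts from a new batch.

        batch_tokens is a list of already tokenized messages.
        """
        for term in list(self.weights):
            self.weights[term] *= self.decay
            if self.weights[term] < self.min_weight:
                del self.weights[term]
        for tokens in batch_tokens:
            self.weights.update(tokens)

    def top(self, n=10):
        """Return the n currently heaviest tokens with their weights."""
        return self.weights.most_common(n)


if __name__ == "__main__":
    dictionary = DecayingDictionary(decay=0.5, min_weight=0.3)
    dictionary.update([["hoax", "cure"], ["hoax"]])
    dictionary.update([["budget"]])
    print(dictionary.top(3))
```

Tokens that stop appearing eventually fall below the threshold and leave the dictionary, while newly frequent tokens enter it without any manual labeling step.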
Experimental results showed that binary evaluations can be formed consistently and fakes labeled for users under critical access conditions. The method also allows prompt recognition and binary labeling of fake news on low-performance, low-power devices and during temporary loss of access to global networks.
The developed server software supports the implementation of NLP models for the proposed method, providing prompt fake news labeling for mobile users. This software can be integrated into information-analytical resources and used to implement cloud services for Internet fake news recognition.
In summary, the dissertation's relevance lies in solving the problem of rapid fake news message recognition under conditions of high volume, dense flow, and limited individual message content.