Kovilin Y. Model of answers generating in the search engines based on an unstructured knowledge base


Thesis for the degree of Candidate of Sciences (CSc)

State registration number

0420U102047

Applicant for

Specialization

  • 01.05.02 - Mathematical modeling and computational methods

11-11-2020

Specialized Academic Board

Д 08.084.01

National Metallurgical Academy of Ukraine

Abstract

The dissertation addresses a current scientific and applied problem: developing models that automate the processing of semantically unstructured documents in order to generate answers in search engines. It analyzes existing approaches to building applied models of natural-language texts. These approaches require substantial linguistic knowledge, preliminary ontological markup, or manually created semantic dictionaries, which significantly reduces the adaptability of the models and complicates their practical use. Existing software systems rely on ontological bases to extract knowledge, which makes them unsuitable for processing unstructured knowledge bases. The analysis showed a need for models that automate obtaining and representing semantic models of specialized texts without involving linguistic knowledge and without preliminary semantic markup or dictionaries. The first model developed is a mathematical model for obtaining the semantic characteristics of a specialized text. The model is novel and combines latent semantic analysis, clustering, and spatial analysis. It produces a semantic model of a text without a previously created semantic structure or semantic dictionaries, which makes it possible to fully automate the calculation of quantitative values of the text's semantic characteristics. Tests showed that, although the model uses the frequency characteristics of the text, its output depends on the semantics of the document rather than on its frequency profile.
In addition, the model reliably groups semantically related terms into semantic text labels and links each label to a set of sentences from the source text, which is a prerequisite for automatic text generation. The second model developed is a model for the automatic classification of incoming documents, which uses the quantitative characteristics from the semantic model of the text. It allows the system to filter incoming documents, protecting the knowledge base from being filled with inappropriate texts. Tests showed that a system built on this model reaches an accuracy of 90%, which is sufficient for this stage to operate correctly. The last model developed generates responses from an unstructured knowledge base. It builds on the first and second models and generates new textual knowledge relevant to the user's request. Tests showed that the model improves the semantic quality of the set of candidate texts for generation by a factor of 1.7 compared with solving the search problem directly. Expert assessment by a point-scoring method gave a value of 0.839, which confirms the applicability and adequacy of the model. The developed software application was deployed at the municipal cultural institution «Centralized system of libraries for children» in Dnipro as a search tool for processing electronic texts, at LLC «Sital Ukraine» as a tool for automatically generating text instructions, and at JSC «DniproAzot» as a tool for improving search processes.
The results of the dissertation are published in 14 scientific papers: 8 articles in journals recommended by the Ministry of Education and Science of Ukraine for the publication of dissertation results and in foreign publications, and 6 abstracts and papers in the proceedings of international and national conferences.
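The abstract names latent semantic analysis as one component of the first model but gives no formulas. The following is only a minimal sketch of that general technique, not the dissertation's model: it builds a term-document count matrix, projects the documents onto the top latent directions, and compares them by cosine similarity. The function names, the toy documents, and the choice of two latent dimensions are assumptions made for illustration, and the power-iteration routine is a dependency-free stand-in for a library SVD.

```python
import math
from collections import Counter


def term_document_matrix(docs):
    """Build a term-by-document count matrix (rows: terms, columns: documents)."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    return [[c[term] for c in counts] for term in vocab]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def top_eigenpairs(sym, k, iterations=200):
    """Top-k eigenpairs of a symmetric positive semidefinite matrix.

    Uses power iteration with deflation; a library SVD routine would
    normally be used instead (assumption: kept here to avoid dependencies).
    """
    n = len(sym)
    work = [[float(x) for x in row] for row in sym]
    pairs = []
    for rank in range(k):
        # A different deterministic start vector for each component.
        vec = [float((i + 1) ** (rank + 1)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in vec))
        vec = [x / norm for x in vec]
        for _ in range(iterations):
            nxt = matvec(work, vec)
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm < 1e-12:
                break
            vec = [x / norm for x in nxt]
        # Rayleigh quotient gives the eigenvalue estimate.
        value = sum(a * b for a, b in zip(vec, matvec(work, vec)))
        pairs.append((value, vec))
        # Deflate: remove the component that was just found.
        for i in range(n):
            for j in range(n):
                work[i][j] -= value * vec[i] * vec[j]
    return pairs


def lsa_document_vectors(docs, k=2):
    """Return one k-dimensional latent vector per document."""
    a = term_document_matrix(docs)
    n_docs = len(docs)
    # Gram matrix A^T A; its eigenvectors are the right singular vectors of A.
    gram = [[sum(row[i] * row[j] for row in a) for j in range(n_docs)]
            for i in range(n_docs)]
    pairs = top_eigenpairs(gram, k)
    # Document coordinates: singular value times right singular vector entry.
    return [[math.sqrt(max(value, 0.0)) * vec[i] for value, vec in pairs]
            for i in range(n_docs)]


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical toy corpus: two documents per topic.
docs = [
    "library search catalog books",
    "library catalog books readers",
    "nitrogen fertilizer plant production",
    "fertilizer production plant ammonia",
]
vectors = lsa_document_vectors(docs, k=2)
print(round(cosine(vectors[0], vectors[1]), 3))  # same topic
print(round(cosine(vectors[0], vectors[2]), 3))  # different topics
```

In this toy corpus the two library texts end up with nearly identical latent vectors, while a library text and a fertilizer text are nearly orthogonal. Grouping documents or terms by such similarities is one simple way a clustering step could follow, though the abstract does not specify the clustering method used.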
