Vavilenkova A. Theoretical basis of automatic analysis and synthesis of logic and linguistic models of text documents

Thesis for the degree of Doctor of Science (DSc)

State registration number

0517U000844

Applicant for

Specialization

  • 05.13.06 - Information technologies

13-12-2017

Specialized Academic Board

Д 26.204.01

Abstract

The dissertation addresses the lack of a theory for building effective tools for content-based processing of electronic text documents. It does so by developing a mathematical apparatus for the formal description of an electronic text document based on logic and linguistic modeling. The result of the work is the author's information technology for automatic comparative analysis of electronic text documents by content. The main purpose of the dissertation is to improve the quality of automatic comparison of electronic text documents, using theoretical and applied foundations for constructing such information technology that are developed here for the first time. The author created a mathematical apparatus for the formal description of an electronic text document based on predicate logic which, unlike formal grammars, makes it possible to structure textual information from the lowest level of logical relationships up to the text as a whole. The general form of the logic and linguistic model of a natural language sentence has been improved by introducing into its structure elementary predicates that describe parts of the sentence and represent indivisible units of content. These predicates were classified according to typical cases of harmonization of components of logic and linguistic models, which made it possible to present textual information as a conjunction of simple, indivisible content units. The author developed the basic principles and rules for the synthesis of logic and linguistic models of natural language sentences, based on identifying the means of meaningful connection in text documents. These principles and rules serve as a basis for constructing logic and linguistic models of electronic text documents.
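As an illustration only, the idea of a sentence model as a conjunction of elementary predicates can be sketched in Python. The predicate names (`action`, `object`, `attribute`), the example sentence, and the `Predicate` class are hypothetical and do not come from the dissertation; the sketch shows only the general shape of such a model.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Tuple


@dataclass(frozen=True)
class Predicate:
    """One elementary predicate: a relation name applied to word arguments."""
    name: str
    args: Tuple[str, ...]

    def __str__(self) -> str:
        return f"{self.name}({', '.join(self.args)})"


def sentence_model(predicates: Iterable[Predicate]) -> FrozenSet[Predicate]:
    """Treat a sentence model as a conjunction, stored as an unordered set."""
    return frozenset(predicates)


def render(model: FrozenSet[Predicate]) -> str:
    """Write the conjunction in readable form, sorted for stable output."""
    return " & ".join(sorted(str(p) for p in model))


# Hypothetical model of the sentence "The student reads a new book."
model = sentence_model([
    Predicate("action", ("student", "reads")),
    Predicate("object", ("reads", "book")),
    Predicate("attribute", ("book", "new")),
])

print(render(model))
# action(student, reads) & attribute(book, new) & object(reads, book)
```

Storing the conjunction as a set reflects that the order of the conjuncts does not matter; the actual predicate classification used in the dissertation is not reproduced here.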
Applying the developed principles of synthesis of logic and linguistic models of natural language sentences resulted in a knowledge base for the information technology of automatic comparative analysis of electronic text documents by content. Abstract models of logical conversion have been created to formalize the description of logical relationships between parts of text documents and their geometric interpretations. The study provides five types of abstract models (templates) for forming meaningful links between natural language sentences, based on the types of thematic progression. The author created the general form of the logic and linguistic model of an electronic text document, which contains a linguistic component and a semantic-syntactic component and is based on the basic principles and rules of the synthesis of logic and linguistic models and on the abstract models of logical conversion. The algorithm for automatic formation of logic and linguistic models of electronic text documents consists of the following stages: text division, construction of logic and linguistic models of the sentences of the text, synthesis of logic and linguistic models of natural language sentences, formation of a text base, determination of the characteristics of each paragraph, formation of complex syntactic parts of the text, and determination of the type of coherent text. The author developed a method of automatic comparative analysis of logic and linguistic models of electronic text documents. An algorithm for automatic analysis of logic and linguistic models of electronic text documents was also developed; it makes it possible to restore textual information from the logic and linguistic models of the natural language sentences that make up the document.
The author has created an information technology for automatic comparative analysis of electronic text documents based on logic and linguistic modeling. Unlike existing systems for automatic comparison of electronic documents, it takes content links in the texts into account, which increased the percentage of matches found by 12%. The effectiveness of the created information technology was determined by comparing it with existing systems for comparative analysis of electronic text documents on the following criteria: functional capabilities, methods used, speed, and the percentage of matches obtained during comparison.
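To make the notion of a "percentage of matches" between two documents concrete, the following minimal Python sketch compares two sets of elementary predicates with the Jaccard index. This measure is an assumption chosen for illustration and is not the comparison method defined in the dissertation; the predicate tuples are hypothetical.

```python
def jaccard(a, b):
    """Share of predicates common to both models, relative to all predicates."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty models are treated as identical
    return len(a & b) / len(a | b)


# Hypothetical predicate sets for two short documents.
doc_a = {
    ("action", "student", "reads"),
    ("object", "reads", "book"),
    ("attribute", "book", "new"),
}
doc_b = {
    ("action", "student", "reads"),
    ("object", "reads", "book"),
    ("attribute", "book", "old"),
}

score = jaccard(doc_a, doc_b)
print(f"{score:.0%}")  # 50%
```

Here two of the four distinct predicates are shared, so the score is 50%. Comparing predicates rather than raw word sequences is what lets such a measure react to content relations instead of surface wording.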
