Kosa V. A Method of Experimental Study of Terminological Saturation in Document Collections for Knowledge Elicitation

Thesis for the degree of Doctor of Philosophy (PhD)

State registration number

0821U101735

Applicant for

Specialization

  • 122 - Computer Science

26-05-2021

Specialized Academic Board

ДФ 17.051.026

Zaporizhzhia National University

Abstract

The object of research: the process of automatically extracting, from collections of relevant documents, the sets of terms that characterize an arbitrary subject domain, for the subsequent development of ontologies for this domain, taking into account the phenomenon of terminological saturation.

The subject of research: a method for the experimental study of terminological saturation in document collections, used to elicit knowledge to be represented about an arbitrary domain of discourse.

The goal of research: to improve the representativeness, efficiency, and effectiveness of eliciting knowledge from collections of professional documents bounded to an arbitrary domain, for the subsequent development of ontologies. This is to be achieved by developing a complex computational method for detecting and measuring terminological saturation in collections of professional text documents that describe the domain.

The results of research: A novel complex computational method has been developed for detecting and measuring terminological saturation in a sequence of incrementally expanded subcollections of a hypothetically available complete collection of professional documents describing an arbitrary domain. The following have been further developed:
- the formal definition of the measure of terminological difference between two sets of terms annotated with estimates of their significance;
- an optimized computational method for automated term extraction based on the C-value method. It computes partial C-values over each increment of a subcollection and then merges these partial C-values.
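This abstract does not spell out how the thesis merges partial C-values. As a minimal sketch, assume that multi-word candidate terms have already been extracted and counted for each increment. The standard C-value formula, C-value(a) = log2|a| · (f(a) − (1/P(T_a)) Σ_{b∈T_a} f(b)), can then be applied to running frequency totals. Here T_a is the set of longer candidates that contain a, and P(T_a) is its size.

Because T_a can change as new increments bring in new longer candidates, partial C-values are not simply additive in general. For that reason, this sketch merges the per-increment frequency counts and recomputes the scores, rather than merging C-values directly. The names `c_values`, `merge_increment` and `is_nested` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Tuple

Term = Tuple[str, ...]  # a candidate term represented as a sequence of words


def is_nested(short: Term, long: Term) -> bool:
    """True if `short` occurs as a contiguous word subsequence of the longer `long`."""
    n = len(short)
    return n < len(long) and any(long[i:i + n] == short for i in range(len(long) - n + 1))


def c_values(freq: Counter) -> Dict[Term, float]:
    """C-value of every multi-word candidate, computed from its own frequency and
    the frequencies of the longer candidates that contain it."""
    scores: Dict[Term, float] = {}
    for a, f_a in freq.items():
        if len(a) < 2:
            continue  # log2(1) = 0, so single-word candidates are skipped here
        containers = [b for b in freq if is_nested(a, b)]
        if containers:
            # subtract the mean frequency of the longer candidates containing `a`
            f_a = f_a - sum(freq[b] for b in containers) / len(containers)
        scores[a] = math.log2(len(a)) * f_a
    return scores


def merge_increment(total: Counter, increment: Counter) -> Counter:
    """Add the candidate frequencies counted over one increment to the running totals."""
    total.update(increment)  # Counter.update adds counts instead of replacing them
    return total


# Usage with two toy increments (hypothetical counts)
inc1 = Counter({("ontology", "learning"): 3, ("automated", "ontology", "learning"): 2})
inc2 = Counter({("ontology", "learning"): 1, ("term", "extraction"): 4})
totals = merge_increment(merge_increment(Counter(), inc1), inc2)
scores = c_values(totals)
```

Only the counts of each new increment have to be produced. The merged totals then give the same C-values as counting the whole subcollection at once.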
The computational pipeline for detecting and measuring terminological saturation has also been improved in three ways:
- by incorporating a method for selecting relevant documents for the collection increments;
- by applying the developed technique and algorithms for grouping partly similar terms;
- by ordering documents by descending citation frequency when forming collection increments.

The scope of application: the results of the research could be used:
- in academic research, for selecting representative collections of scientific papers on a chosen topic;
- in industry, for performing terminological analysis of document collections;
- in higher education, as a tool for bibliography selection by final-year students and PhD students.
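The abstract gives neither the exact terminological difference formula nor the stopping rule. The sketch below assumes a simple variant:
- thd is the total absolute change in significance scores between the term sets of two consecutive subcollections;
- a term present in only one of the two sets contributes its full score;
- saturation is declared at the first increment where thd falls below a threshold `eps`.

Documents are ordered by descending citation count before increments are formed, following the pipeline described above. The names `thd`, `order_by_citations`, `first_saturated_size` and `presence_scores`, the `citations`/`terms` keys, and the toy data are all hypothetical. The toy extractor stands in for a real one, such as a C-value-based extractor.

```python
from typing import Callable, Dict, List, Optional, Sequence

Terms = Dict[str, float]  # term -> significance score (e.g. a C-value)


def thd(prev: Terms, curr: Terms) -> float:
    """Assumed terminological difference: total absolute change of term scores.
    A term present in only one of the two sets contributes its full score."""
    return sum(abs(curr.get(t, 0.0) - prev.get(t, 0.0)) for t in set(prev) | set(curr))


def order_by_citations(docs: Sequence[dict]) -> List[dict]:
    """Order documents by descending citation frequency."""
    return sorted(docs, key=lambda d: d["citations"], reverse=True)


def first_saturated_size(docs: Sequence[dict],
                         extract: Callable[[List[dict]], Terms],
                         step: int, eps: float) -> Optional[int]:
    """Grow the subcollection by `step` documents at a time and return its size at the
    first increment where thd to the previous term set drops below `eps`.
    A trailing partial increment (fewer than `step` documents) is ignored here."""
    ordered = order_by_citations(docs)
    prev: Optional[Terms] = None
    for size in range(step, len(ordered) + 1, step):
        curr = extract(ordered[:size])
        if prev is not None and thd(prev, curr) < eps:
            return size
        prev = curr
    return None  # no saturation detected within the available documents


def presence_scores(subcollection: List[dict]) -> Terms:
    """Toy extractor (hypothetical): every term seen so far gets score 1.0."""
    return {t: 1.0 for d in subcollection for t in d["terms"]}


toy_docs = [
    {"citations": 1, "terms": ["ontology", "saturation"]},
    {"citations": 10, "terms": ["ontology", "term"]},
    {"citations": 5, "terms": ["ontology", "term"]},
    {"citations": 7, "terms": ["ontology", "saturation"]},
]
saturated_at = first_saturated_size(toy_docs, presence_scores, step=1, eps=0.5)
```

In this toy run, the third most-cited document adds no new terms. thd therefore drops to 0, and the sketch reports saturation at a subcollection of three documents.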
