Taranukha V. Models and Algorithms for Natural Language Text Processing for Flective Languages.

Thesis for the degree of Candidate of Sciences (CSc)

State registration number

0417U003836

Applicant for

Specialization

  • 01.05.01 - Theoretical Foundations of Informatics and Cybernetics

21-09-2017

Specialized Academic Board

Д 26.001.09

Taras Shevchenko National University of Kyiv

Abstract

This research addresses natural language text processing and the construction of specialized models, with particular emphasis on inflected (flective) languages. The major challenge in heuristic morphological analysis is homonymy. A heuristic algorithm optimized for morphological analysis has been developed; it uses individual words, the vocabulary of the text, and the immediate word context as data sources. A numerical experiment indicates that the algorithm achieves high quality. A text model is analyzed at the morpheme and phoneme levels, yielding a criterion for determining authorship that relies on features an author can hardly control consciously. An n-gram language model is investigated. A distributed model for classes at the grammatical and semantic levels is proposed. Based on the distributed model, a recombination method is proposed. The limits of applicability of recombination are formulated as theorems on the structure of grammatical and lemma classes. Experiments have been conducted to test the efficiency of filtering based on the distributed model.
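For readers unfamiliar with the n-gram language model mentioned in the abstract, the following is a minimal generic sketch of a bigram model with add-one (Laplace) smoothing. It is not the model developed in the thesis: the function names `train_bigram` and `bigram_prob`, the boundary markers `<s>` and `</s>`, the choice of smoothing, and the toy corpus are all assumptions made purely for illustration.

```python
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized sentences.

    Hypothetical helper: each sentence is a list of tokens, padded with
    the assumed boundary markers <s> and </s>.
    """
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams):
    """Estimate P(word | prev) with add-one (Laplace) smoothing."""
    vocab_size = len(unigrams)  # distinct symbols, boundary markers included
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

# Toy corpus, invented for this illustration only.
corpus = [["the", "cat", "sleeps"], ["the", "dog", "sleeps"]]
unigrams, bigrams = train_bigram(corpus)
p_seen = bigram_prob("the", "cat", unigrams, bigrams)
p_unseen = bigram_prob("the", "bird", unigrams, bigrams)
```

In this sketch a bigram seen in the corpus receives a higher probability than an unseen one, while smoothing keeps unseen bigrams from getting zero probability. For inflected languages the number of distinct word forms is large, so counts over surface forms tend to be sparse; this is one common motivation for working with grammatical and lemma classes instead.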
