Voznyuk T. The use of the semantic-syntactic tensor model of natural language for the analysis of coreferential relations in texts

Thesis for the degree of Candidate of Sciences (CSc)

State registration number

0416U001806

Applicant for

Specialization

  • 01.05.01 - Theoretical foundations of informatics and cybernetics

24-03-2016

Specialized Academic Board

Д 26.001.09

Taras Shevchenko National University of Kyiv

Abstract

The thesis is dedicated to improving systems for finding coreferential connections in text using the tensor model of natural language. A large (100 GB) text corpus was analyzed, and a six-dimensional tensor of typical syntactic structures was built using a pipeline data-processing architecture and syntactic parsers. New efficient algorithms for constructing the control space of syntactic structures were developed to improve the tensor model. This made it possible to retain more information about semantic and syntactic relationships in lower-dimensional tensors. The developed algorithms were applied to the problem of finding coreferential connections in natural language texts. A multi-sieve approach to coreference resolution was chosen because it had demonstrated the best results on well-known datasets. The thesis describes the new sieves designed for the system, which implements a support vector machine (SVM) classifier over a 28-dimensional feature space. The tensor model of language and the control space of syntactic structures were used to compute the feature vectors used in machine learning.
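The multi-sieve strategy mentioned above can be illustrated with a minimal Python sketch. It is a simplified mention-pair variant written for illustration, not the system described in the thesis: the `Mention` fields, the two string-matching sieves and the `classifier_sieve` wrapper are hypothetical. Sieves run in order from most to least precise, and each one links a mention to the nearest earlier mention it accepts, so later sieves work on the clusters built by earlier ones. A pairwise scorer, such as an SVM decision function over a feature vector, could be plugged in through `classifier_sieve`.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Mention:
    text: str  # surface string of the mention (hypothetical field)
    head: str  # head word of the mention (hypothetical field)


# A sieve decides whether a mention and a candidate antecedent corefer.
Sieve = Callable[[Mention, Mention], bool]


def exact_match(m: Mention, ante: Mention) -> bool:
    # High-precision sieve: identical surface strings, ignoring case.
    return m.text.lower() == ante.text.lower()


def head_match(m: Mention, ante: Mention) -> bool:
    # Lower-precision sieve: identical head words, ignoring case.
    return m.head.lower() == ante.head.lower()


def classifier_sieve(score: Callable[[Mention, Mention], float],
                     threshold: float = 0.0) -> Sieve:
    # Wraps any pairwise scorer (e.g. an SVM decision function computed
    # from a feature vector); the scorer itself is supplied by the caller.
    return lambda m, ante: score(m, ante) > threshold


def resolve(mentions: List[Mention], sieves: List[Sieve]) -> List[int]:
    """Return a cluster id for every mention, using union-find over indices."""
    parent = list(range(len(mentions)))

    def find(i: int) -> int:
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Apply sieves one after another, most precise first.
    for sieve in sieves:
        for i, m in enumerate(mentions):
            # Candidate antecedents are earlier mentions, closest first.
            for j in range(i - 1, -1, -1):
                if find(i) == find(j):
                    continue  # already in the same cluster
                if sieve(m, mentions[j]):
                    parent[find(i)] = find(j)  # merge the two clusters
                    break
    return [find(i) for i in range(len(mentions))]
```

For example, with the mentions "Taras Shevchenko", "the poet" and "Shevchenko", running `resolve(mentions, [exact_match, head_match])` links the first and third mentions through their shared head word and leaves "the poet" in its own cluster, since neither string-based sieve can connect it. Cases like this are where sieves driven by semantic features are needed.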
