Lang C. Methodology and software for classifying natural language text documents

Thesis for the degree of Candidate of Sciences (CSc)

State registration number

0412U002876

Thesis Registration Form

0412U002876.pdf

Applicant

Lang Chunlin

Specialization

05.13.05 - Computer systems and components

Date of defense

11-06-2012

Specialized Academic Board

Д 26.002.02

Publishing and Printing Institute of Igor Sikorsky Kyiv Polytechnic Institute

Abstract

The thesis addresses the problem of automatic language identification and classification of natural language text documents. It develops a method for automatic language identification based on statistical N-grams; presents a comparative analysis of text classification methods in order to select the one with the best precision and recall; proposes classifying natural language text documents with the developed statistical N-gram method; develops a method for automatically classifying text documents in real time; and implements a software module for the identification and classification of natural language text documents. The proposed classification method improves the accuracy and speed of classification and makes it possible to develop software for automatic text processing in multilingual information systems. Keywords: automatic language identification, classification of text documents, natural language, N-grams, multi-label classification.
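
The abstract does not give the details of the N-gram method, so the following is only a minimal sketch of one common way statistical N-grams can be used for language identification: ranked character N-gram profiles compared with an "out-of-place" distance. It is not the thesis's algorithm. The function names, the toy training samples in SAMPLES, and the parameters n_max and top_k are all assumptions made for illustration.

```python
from collections import Counter


def ngram_profile(text, n_max=3, top_k=300):
    """Map the top_k most frequent character n-grams (n = 1..n_max) to their rank."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # underscores mark word boundaries
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Most frequent first; ties are broken alphabetically so ranks are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {gram: rank for rank, (gram, _) in enumerate(ranked[:top_k])}


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from lang_profile get a fixed penalty."""
    max_penalty = len(lang_profile)
    return sum(
        abs(rank - lang_profile[gram]) if gram in lang_profile else max_penalty
        for gram, rank in doc_profile.items()
    )


def identify_language(text, lang_profiles):
    """Return the language whose profile is closest to the profile of text."""
    doc_profile = ngram_profile(text)
    return min(lang_profiles, key=lambda lang: out_of_place(doc_profile, lang_profiles[lang]))


# Hypothetical toy training data; a real system would use much larger corpora.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and runs through the forest",
    "de": "der schnelle braune fuchs springt über den faulen hund und läuft durch den wald",
}
PROFILES = {lang: ngram_profile(sample) for lang, sample in SAMPLES.items()}

if __name__ == "__main__":
    print(identify_language("the dog runs over the fox", PROFILES))
```

With training samples this small the result for unseen text is not reliable; the sketch only shows the shape of the computation. The same profile distance could in principle be applied to topic classification by building one profile per category instead of one per language.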

Thesis supervisor

Zajschev Volodimir Grigirjevich

Official opponents

Buzovskyi Oleh Volodymyrovych
Selihei Oleksandr Mynovych

Files

АВТОРЕФЕРАТ.doc

ДИССЕРТАЦИЯ.doc

Similar theses

0524U000048

Anna V. Khakhanova

Federated computing of vector-matrix transactions in cyber-social systems

0424U000015

Oleksandr Melnyk

Methods and means of forming graphic primitives on a hexagonal grid

0422U100014

Saprykin Oleksandr Sergiyovych

Models for automated analysis and diagnosis of polymorphic viruses in computer systems and networks

0421U103964

Uzdenov Taras Amurovych

Task scheduling methods for GRID-systems with inalienable resources

0521U101938

Mozhaiev Mykhailo O.

Models and methods of synthesis of a specialized computer system in providing forensic science services for justice