Ostrovskyy O. Recognition methods based on Markov models with hidden variables

Thesis for the degree of Candidate of Sciences (CSc)

State registration number

0414U005111

Applicant for

Specialization

  • 01.05.01 - Theoretical foundations of informatics and cybernetics

24-10-2014

Specialized Academic Board

Д 26.194.02

V.M. Glushkov Institute of Cybernetics of National Academy of Sciences of Ukraine

Abstract

The thesis is devoted to hidden Markov models and their generalizations as applied to two bioinformatics problems: recognition of functional fragments of genes and prediction of protein spatial structure. A mathematical framework is built that describes both recognition problems uniformly and adapts existing quality criteria to them. Probabilistic models based on hidden Markov models and high-order Markov chains are proposed, and their validity is justified by the distribution of amino acids in proteins synthesized from the different DNA strands. Based on the maximum likelihood principle, a dynamic programming algorithm that uses the proposed models is devised for solving the stated recognition problem. Mixtures of models with exclusive competences are considered, together with the corresponding recognition algorithm, which processes each observed string of states with a specific constituent algorithm chosen according to observable characteristics of the string. To partition the input space into competence regions, we propose predicates based on the content of short sequences of nucleotides or amino acids, and we describe algorithms that select these predicates from the training set using feature selection methods from machine learning. The problem of finding optimal parameters for the constituent models is solved, as is the problem of building optimal competence regions. In a computational experiment, the quality of the proposed models and their compositions is shown to be comparable to that of state-of-the-art bioinformatics algorithms.
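The maximum likelihood dynamic programming step mentioned above can be illustrated with the standard Viterbi algorithm for a first-order hidden Markov model. The following is only a minimal sketch, not the algorithm of the thesis, which also uses high-order Markov chains and model mixtures. The state names ("exon", "intron") and all probabilities are hypothetical values chosen for illustration.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most likely hidden-state path for obs and its log-probability."""
    if not obs:
        return [], 0.0
    # best[t][s]: log-probability of the best path that ends in state s at position t
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = []  # back[t][s]: predecessor of s on that best path
    for symbol in obs[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][symbol]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


def to_log(table):
    """Convert a nested dict of probabilities to log-probabilities."""
    return {k: {j: math.log(p) for j, p in row.items()} for k, row in table.items()}


# Hypothetical two-state toy model: GC-rich "exon" versus AT-rich "intron".
states = ["exon", "intron"]
log_start = {"exon": math.log(0.5), "intron": math.log(0.5)}
log_trans = to_log({
    "exon": {"exon": 0.9, "intron": 0.1},
    "intron": {"exon": 0.1, "intron": 0.9},
})
log_emit = to_log({
    "exon": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "intron": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
})

path, log_prob = viterbi("GGCCAATT", states, log_start, log_trans, log_emit)
print(path, log_prob)
```

Working in log-space avoids numerical underflow on long sequences, which matters for gene-length inputs. In this toy model the decoder labels the GC-rich prefix as "exon" and the AT-rich suffix as "intron".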
