Yamkovyi K. Information technologies for building composite indicators based on a machine learning approach

Thesis for the degree of Doctor of Philosophy (PhD)

State registration number

0823U101138

Applicant for

Specialization

  • 122 - Computer Science

20-12-2023

Specialized Academic Board

ДФ 64.050.103-2969

National Technical University "Kharkiv Polytechnic Institute"

Abstract

The dissertation solves the scientific and practical task of developing methods and information technologies for building composite indicators based on kernel methods of machine learning and the optimal concordance of expert and statistical information. Research object: the processes of building composite indicators in ranking and in multi-criteria evaluation and selection tasks. Research subject: methods and information technologies for building composite indicators based on the optimal concordance of expert and statistical information and on data aggregation. The purpose of the research is to develop methods and information technologies for building composite indicators based on kernel methods of machine learning and the optimal concordance of expert and statistical information, in order to increase the accuracy of the resulting models and limit their complexity.
The introduction substantiates the relevance of the dissertation topic, indicates the connection of the work with scientific topics, formulates the goal and objectives of the research, defines its object, subject and methods, shows the scientific novelty and practical significance of the results, and provides information about their practical use, the personal contribution of the applicant, the approbation of the research results and their coverage in publications. Information on the structure and scope of the dissertation is also given.
The first chapter analyzes the task of constructing composite indicators and reviews the approaches to their construction, in particular machine learning methods. Examples are given of composite indicators used in many areas to construct generalized indicators: human development, environmental efficiency, investment portfolio, etc. The choice of the goal and tasks of the work is substantiated.
The second chapter formulates the task of constructing a composite indicator in machine learning terms and obtains a solution for a nonlinear composite indicator model based on kernel ridge regression. It analyzes methods for the concordance of disparate expert information, which make it possible to find a compromise between expert assessments of composite indicators and statistical assessments of partial indicators. The chapter justifies the proposed method of optimal concordance of expert and statistical information, which regularizes the kernel regression with a priori information on the importance of partial indicators and significantly increases the accuracy of the resulting models.
The third chapter outlines the principles of the big data concept and describes the problems that arise as the amount of information used to construct composite indicators grows. Data aggregation methods are proposed to reduce the complexity of the kernel model, and grouping and clustering methods for data aggregation are considered. To increase the accuracy and efficiency of clustering, it is proposed to regularize with the target variable when calculating the distance between points in the feature space, and the resulting regularized clustering method is outlined. The chapter also identifies the problem of insufficient data labeling, which arises especially often as the amount of data grows. To address it, semi-supervised learning methods based on graph regularization and the kernel trick are proposed for optimizing the nonlinear preference function. To solve these problems, a two-stage data aggregation algorithm was developed that uses both global and local patterns in the structure of the data set during aggregation. This approach significantly reduces the sample size while preserving its properties and patterns.
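The aggregation idea from the third chapter can be illustrated with a minimal sketch. It is not the dissertation's algorithm: it assumes that "regularization with the target variable" can be approximated by running k-means on the features augmented with a scaled target, so that the squared distance becomes ||x - x'||^2 + lam * (y - y')^2, and that the kernel ridge model is then fitted on cluster centroids weighted by cluster size. The function name `aggregate_with_target`, the weight `lam`, and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.kernel_ridge import KernelRidge


def aggregate_with_target(X, y, n_clusters=20, lam=1.0, seed=0):
    """Aggregate (X, y) into cluster centroids.

    Assumption: clustering on [X, sqrt(lam) * y] makes the squared
    Euclidean distance equal to ||x - x'||^2 + lam * (y - y')^2,
    i.e. the target variable regularizes the distance.
    """
    Z = np.hstack([X, np.sqrt(lam) * y.reshape(-1, 1)])
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(Z)
    used = np.unique(labels)
    X_agg = np.vstack([X[labels == k].mean(axis=0) for k in used])
    y_agg = np.array([y[labels == k].mean() for k in used])
    weights = np.array([(labels == k).sum() for k in used], dtype=float)
    return X_agg, y_agg, weights


# Synthetic stand-in: 3 partial indicators and a nonlinear "expert" score.
rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 3))
y = np.sin(3 * X[:, 0]) + X[:, 1] * X[:, 2] + 0.05 * rng.normal(size=500)

# Stage 1: compress 500 observations into at most 20 weighted centroids.
X_agg, y_agg, w = aggregate_with_target(X, y)

# Stage 2: fit the kernel model on the aggregated sample only.
model = KernelRidge(kernel="rbf", alpha=0.1, gamma=1.0)
model.fit(X_agg, y_agg, sample_weight=w)
scores = model.predict(X)  # composite indicator values for all objects
```

Fitting on centroids shrinks the kernel matrix from 500 x 500 to at most 20 x 20, which is the complexity reduction the chapter describes. How much accuracy is retained depends on the data and on the choice of `lam` and `n_clusters`.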
The fourth chapter describes the proposed information technology for building composite indicators using machine learning methods, which implements the methods and algorithms developed in the work. The technology is implemented as an open-source Python library that inherits the scikit-learn interfaces and meets the requirements of the project development methodologies used in machine learning and data analysis, namely KDD and CRISP-DM. The functionality of the developed information technology, the accuracy of the proposed algorithms, and the research results were analyzed on several multidimensional data sets representing different subject domains. The results showed the efficiency and effectiveness of the methods and algorithms proposed in the work.
The conclusions present the main results of the dissertation in solving the stated scientific research problems.
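To show what inheriting the scikit-learn interfaces can mean in practice, here is a minimal sketch of an estimator that follows the scikit-learn `fit`/`predict` conventions. It is not the dissertation's library: the class name `CompositeIndicator`, its parameters, and the synthetic data are hypothetical, and the model inside is plain kernel ridge regression without the expert-information regularization described in the second chapter.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


class CompositeIndicator(RegressorMixin, BaseEstimator):
    """Hypothetical scikit-learn-style composite indicator model."""

    def __init__(self, alpha=1.0, gamma=None):
        # scikit-learn convention: store constructor arguments unchanged.
        self.alpha = alpha
        self.gamma = gamma

    def fit(self, X, y):
        # y holds expert assessments of the composite indicator.
        self.model_ = KernelRidge(kernel="rbf", alpha=self.alpha,
                                  gamma=self.gamma)
        self.model_.fit(X, y)
        return self

    def predict(self, X):
        return self.model_.predict(X)


# Synthetic stand-in: 4 partial indicators and noisy expert scores.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = (X @ np.array([0.4, 0.3, 0.2, 0.1])
     + 0.1 * np.tanh(X[:, 0] * X[:, 1])
     + 0.05 * rng.normal(size=200))

# Because the class follows the estimator API, it composes with
# standard scikit-learn tooling such as pipelines and cross-validation.
pipe = make_pipeline(StandardScaler(), CompositeIndicator(alpha=0.1))
cv_r2 = cross_val_score(pipe, X, y, cv=5)  # R^2 per fold

pipe.fit(X, y)
scores = pipe.predict(X)
ranking = np.argsort(-scores)  # object indices, best first
```

Following the estimator conventions is what lets such a model be dropped into existing preprocessing, model selection, and evaluation steps without custom glue code.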

Research papers

K. Yamkovyi, “Adaptation of LambdaMART model to semi-supervised learning,” Вiсник Нацiонального технiчного унiверситету «ХПI». Серiя: Системний аналiз, управлiння та iнформацiйнi технологiї, 2023, no. 1 (9), pp. 76–81. (category B)

L. Lyubchyk, K. Yamkovyi, “Comparative Analysis of Modified Semi-Supervised Learning Algorithms on a Small Amount of Labelled Data,” System Research & Information Technologies, 2022, no. 4, pp. 34–43. (category A, Scopus)

O. Akhiiezer, G. Grinberg, L. Lyubchyk, K. Yamkovyi, “Failure rate regression model building from aggregated data using kernel-based machine learning methods,” Вiсник Нацiонального технiчного унiверситету «ХПI». Серiя: Системний аналiз, управлiння та iнформацiйнi технологiї, 2022, no. 2 (8), pp. 51–56. (category B)

K. Yamkovyi, “Development and comparative analysis of semi-supervised learning algorithms on a small amount of labeled data,” Вiсник Нацiонального технiчного унiверситету «ХПI». Серiя: Системний аналiз, управлiння та iнформацiйнi технологiї, 2021, no. 1 (5), pp. 98–103. (category B)

L. Lyubchyk, O. Akhiiezer, G. Grinberg, K. Yamkovyi, “Machine Learning-Based Failure Rate Identification for Predictive Maintenance in Industry 4.0,” 2022 12th International Conference on Dependable Systems, Services and Technologies (DESSERT), IEEE, Athens, Greece, 2022, pp. 1–5. (Scopus, Greece)

L. Lyubchyk, G. Grinberg, K. Yamkovyi, “Integral Indicator for Complex System Building Based on Semi-Supervised Learning,” 2018 IEEE First International Conference on System Analysis & Intelligent Computing (SAIC), IEEE, Kyiv, Ukraine, 2018, pp. 1–5. (Scopus, Ukraine)
