Hornostal O. Ensemble method of computer system state identification


Thesis for the degree of Doctor of Philosophy (PhD)

State registration number

0824U001783

Applicant for

Specialization

  • 123 - Computer Engineering

20-06-2024

Specialized Academic Board

ДФ 64.050.138-5697

National Technical University "Kharkiv Polytechnic Institute"

Abstract

The dissertation addresses a topical scientific and applied problem: improving, developing, and implementing methods for identifying the state of computer systems, with the aim of increasing their effectiveness through ensemble machine learning methods.

The purpose of the dissertation is to improve the quality of computer system state identification by developing and improving methods for recognizing anomalies and abuses. The object of research is the process of detecting intrusions into computer systems under external influences. The subject of research is methods of identifying the state of computer systems based on machine learning with ensemble meta-algorithms.

The following scientific results were obtained:

1. The computer system state identification method based on decision trees and the bagging meta-algorithm was further developed by selecting optimal hyperparameters for the classifier and by applying a data preprocessing procedure that removes anomalous data and reduces the statistical dependence between features. This improved the quality of computer system state identification.

2. The ensemble method of identifying the state of a computer system was further developed by using a multilayer perceptron as the base model of the ensemble and by selecting optimal hyperparameters for the classifier, which improved the quality of its operation.

3. The ensemble method for identifying the state of a computer system based on the homogeneous bagging meta-algorithm was improved by developing a special procedure that reduces the number of base classifiers and ranks them during weighted voting. This reduced the operating time of the ensemble and improved the quality of classification of the computer system (CS) state.

4. For the first time, a method for identifying the state of a computer system was proposed that differs from known methods by using a heterogeneous bagging meta-algorithm and that includes a three-stage process for selecting base classification models based on the Pasting technique. This increased the efficiency of identifying the state of the computer system.

The practical significance of the obtained results is as follows:

- a software model for data preprocessing, focused on removing anomalous data and reducing the correlation between features, was created; it increases recognition speed by up to 1.62 times, reduces training time by up to 24.76 times, and improves the quality of data classification;

- a method for identifying the state of a computer system was developed that includes the established data preprocessing procedure, the selection of the input data generation algorithm, and the construction of a bagging classifier with tuned hyperparameters; it improves the quality of classification: the AUC-ROC value of the classifier increases by 11% on the training sample and by 3% on the test sample;

- a software model of an ensemble classifier that uses a multilayer perceptron as the base classifier, together with a procedure for selecting the optimal settings of its parameters, was implemented; it increases classification accuracy by 4.67%;

- software was developed that performs ensemble pruning based on maximizing the absolute accuracy of the base classifiers and classifies using weighted voting based on the logarithmic loss function; it improves the quality of bagging ensemble classification, raising the F1-Score metric by up to 2.4%;

- a method for forming a heterogeneous ensemble was developed that includes selecting base classifiers, training homogeneous bagging ensembles on them, creating combination groups (pools) of base classifiers, and forming a heterogeneous ensemble using the Pasting procedure; it increases the F1-Score on test data by 9.5% compared to the standard homogeneous bagging ensemble based on decision trees and by 2% compared to the highest value among the homogeneous ensembles.

The research confirmed the theoretical and practical value of the proposed methods, evaluated their effectiveness, and produced practical recommendations for their application.
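The ensemble pruning and weighted voting mentioned above (pruning by base-classifier accuracy, voting weighted by logarithmic loss) can be illustrated with a minimal Python sketch. The abstract does not give the exact pruning criterion or weight formula, so everything here is an assumption made for illustration: the function names, the rule "keep the `keep` most accurate members on a validation set", the weight "inverse log loss, normalized to sum to 1", and the toy probabilities. It is not the author's implementation.

```python
import math

def log_loss(y_true, probs, eps=1e-15):
    """Mean binary cross-entropy of predicted P(class=1) against 0/1 labels."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)

def accuracy(y_true, probs):
    """Share of samples where thresholding P(class=1) at 0.5 matches the label."""
    hits = sum(1 for y, p in zip(y_true, probs) if (p >= 0.5) == (y == 1))
    return hits / len(y_true)

def prune_and_weight(y_val, member_probs, keep):
    """Assumed rule: keep the `keep` most accurate members on validation data,
    then weight each kept member by 1 / log loss, normalized to sum to 1."""
    ranked = sorted(range(len(member_probs)),
                    key=lambda i: accuracy(y_val, member_probs[i]),
                    reverse=True)
    kept = ranked[:keep]
    raw = {i: 1.0 / log_loss(y_val, member_probs[i]) for i in kept}
    total = sum(raw.values())
    return {i: w / total for i, w in raw.items()}

def weighted_vote(weights, sample_probs):
    """Combine the kept members' P(class=1) for one sample into a 0/1 decision."""
    score = sum(w * sample_probs[i] for i, w in weights.items())
    return 1 if score >= 0.5 else 0

# Toy validation labels and per-member predicted probabilities (hypothetical data).
y_val = [1, 0, 1, 0]
member_probs = [
    [0.90, 0.10, 0.80, 0.20],  # member 0: correct and confident
    [0.60, 0.40, 0.70, 0.45],  # member 1: correct but less confident
    [0.40, 0.60, 0.30, 0.70],  # member 2: wrong on every sample, gets pruned
]

weights = prune_and_weight(y_val, member_probs, keep=2)

# New sample: member 0 predicts 0.2, member 1 predicts 0.9, member 2 is ignored.
decision = weighted_vote(weights, [0.2, 0.9, 0.99])
```

In this toy run member 2 is pruned, and member 0 receives the larger weight because its log loss is lower, so its low probability outweighs member 1's high one in the final vote. Other weighting schemes (for example, a softmax over negative log loss) would fit the same description equally well.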

Research papers

1. O. A. Hornostal and S. Yu. Gavrylenko, "Development of adaptive templates for recording anomalous behavior of a computer system", Information Processing Systems (collection of scientific papers), Kharkiv: ХУ ПС, 2016, issue 3(140), pp. 11-14. (Б)

2. O. Hornostal, S. Gavrylenko, V. Chelak, and V. Vassilev, "Development of a method for identification the state of a computer system using fuzzy cluster analysis", Advanced Information Systems, Kharkiv, 2020, vol. 4, no. 2, pp. 8-11. (Б)

3. O. Hornostal and S. Gavrylenko, "Development of a method for identification of the state of computer systems based on bagging classifiers", Advanced Information Systems, 2021, vol. 5, no. 4, pp. 5–9. (Б)

4. O. Hornostal and S. Gavrylenko, "Method for identifying the state of a computer system based on ensemble classifiers with an improved voting procedure", Control, Navigation and Communication Systems. Collection of scientific papers, 2023, vol. 3, issue 73, pp. 79-85. (Б)

5. O. Hornostal and S. Gavrylenko, "Application of heterogeneous ensembles in problems of computer system state identification", Advanced Information Systems, 2023, vol. 7, no. 4, pp. 5–12. (Б)

6. O. A. Hornostal and S. Yu. Gavrylenko, "Analysis of the effectiveness of filtering adverse network traffic using complex systems", Informatics, Management and Artificial Intelligence. Proceedings of the second scientific and technical conference of students, master's students and postgraduates, Kharkiv, 2015, p. 13.

7. O. A. Hornostal and S. Yu. Gavrylenko, "Detection of anomalous behavior of computer systems using Shewhart control charts and cumulative sum charts", Proceedings of the international conference "Problems of scientific, technical and legal support of cybersecurity in the modern world", Kharkiv, 2016, pp. 14-15.

8. O. Hornostal, V. Chelak, S. Gavrylenko, and S. Gornostal, "Intrusion detection in computer systems", Proceedings of the symposium "Metrology and metrology assurance", Sozopol, Bulgaria, 2016, pp. 342-347.

9. O. Hornostal, S. Gavrylenko, and V. Chelak, "Development of a heuristic scanner for an antivirus program on the basis of the Mamdani fuzzy logic method", Proceedings of the 28th International Scientific Symposium Metrology and Metrology Assurance, Sozopol, Bulgaria, 2018, pp. 129-133.

10. O. Hornostal, V. Chelak, S. Gavrylenko, and S. Gornostal, "Identification of the computer system state based on multidimensional discriminant analysis", in Proceedings of the 29th International Scientific Symposium Metrology and Metrology Assurance, Sozopol, Bulgaria, 2019, pp. 192-196. (Scopus, Bulgaria)

11. O. Hornostal and S. Gavrilenko, "Identification of Anomalies in the Behavior of a Computer System using Fuzzy Cluster Analysis", Proceedings of the 7th International Informatics, management and artificial intelligence, Kharkiv, 2019, p. 21.

12. O. Hornostal, V. Chelak, and S. Gavrylenko, "Research of Intelligent Data Analysis Methods for Identification of Computer System State", in Proceedings of the 30th International Scientific Symposium Metrology and Metrology Assurance (MMA), Sozopol, Bulgaria, 2020, pp. 1-5. (Scopus, Bulgaria)

13. O. Hornostal, S. Gavrylenko, and V. Chelak, "Ensemble approach based on bagging and boosting for identification the computer system state", in Proceedings of the 31st International Scientific Symposium Metrology and Metrology Assurance, Sozopol, Bulgaria, 2021, pp. 1-7. (Scopus, Bulgaria)

14. O. A. Hornostal and S. Yu. Gavrylenko, "Study of methods for improving the efficiency of bagging classifiers in problems of computer system state identification", Proceedings of the VIII international scientific and technical conference "Informatics, Management and Artificial Intelligence" (ІУШІ-2021), Kharkiv, 2021.

15. O. A. Hornostal and S. Yu. Gavrylenko, "Study of bagging algorithms for identifying the state of a computer system", Proceedings of the IV All-Ukrainian scientific and practical internet conference of students, postgraduates and young scientists on the topic "Modern computer systems and networks in management": collection of scientific papers, edited by H. O. Raiko, Kherson, 2021, pp. 27-28.

16. O. A. Hornostal and S. Yu. Gavrylenko, "Study and improvement of methods for increasing the accuracy of bagging ensembles for classifying the state of computer systems", at the ninth international scientific and technical conference "Informatics, Management and Artificial Intelligence" (ІУШІ-2022), Kharkiv - Kramatorsk, 2022, p. 29.

17. O. Hornostal, S. Gavrylenko, and V. Chelak, "Construction Method of Fuzzy Decision Trees for Identification the Computer System State", in Proceedings of the 32nd International Scientific Symposium Metrology and Metrology Assurance, Sozopol, Bulgaria, 2022, pp. 1-5. (Scopus, Bulgaria)

18. O. Hornostal, S. Gavrylenko, and V. Chelak, "Research of Methods of Identifying the Computer Systems State based on Bagging Classifiers", in IEEE 3rd KhPI Week on Advanced Technology (KhPIWeek), Kharkiv, Ukraine, 2022, pp. 1-6. doi: 10.1109/KhPIWeek57572.2022.9916439. (Scopus, Ukraine)

19. O. Hornostal and S. Gavrylenko, "Study of Methods for Improving the Meta-Algorithm of the Bagging Classifier", 2023 IEEE 4th KhPI Week on Advanced Technology (KhPIWeek), Kharkiv, Ukraine, 2023, pp. 1-6. (Scopus, Ukraine)

20. O. A. Hornostal and S. Yu. Gavrylenko, "Method for improving the quality of an ensemble classifier through diversification of base models", XXIII International scientific and technical conference "Problems of Informatics and Modeling", Kharkiv, 2023, pp. 33-34.
