Kyrychenko O. The Study of Statistical Characteristics of Complex Networks by Methods of Intelligent Data Analysis

Thesis for the degree of Doctor of Philosophy (PhD)

State registration number

0824U000903

Applicant for

Specialization

  • 121 - Software Engineering

19-02-2024

Specialized Academic Board

ДФ 76.051.044 (ID 4519)

Yuriy Fedkovych Chernivtsi National University

Abstract

The dissertation studies the statistical characteristics of complex networks and the cluster structure of the web space using methods of intelligent data analysis. In particular, it develops an information technology for clustering large data sets, which were collected and processed by purpose-built software. It also studies stochastic matrices, which, owing to their specific spectral properties, are the main mathematical object in the study of the cluster structure of the web space.

The dissertation consists of an introduction, four chapters, conclusions, a list of references, and appendices. The introduction substantiates the relevance of the research topic; formulates the goal, tasks, subject, object, and methods of the research; states the scientific novelty and the theoretical and practical significance of the results; and presents and analyzes the link between this research and scientific topics. It also describes the candidate's personal contribution and gives information on the approbation and publication of the main results.

Chapter 1 presents key information on the theory of complex networks and describes the main research areas and problems that the theory addresses. It gives an overview of the main models (the Erdős–Rényi, Watts–Strogatz, and Barabási–Albert models) and classifies and reviews the methods of cluster analysis, an important technique in the intelligent analysis of complex networks.

Chapter 2 describes crawling as one of the means of gathering information and reviews existing software tools for collecting information in the web space. The chapter is of practical value: its main result is specialized software, a crawler with a built-in analytical module for intelligent information processing.
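As an illustration of the kind of data a crawler collects, the following is a minimal sketch of the link-extraction step only, using the Python standard library. It is not the software developed in the dissertation; the names `LinkExtractor` and `host_edges` and the sample hosts are hypothetical, and fetching pages, politeness rules, and storage are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def host_edges(page_url, html, zone):
    """Return sorted (source_host, target_host) pairs for links that lead to
    another host inside the given zone, for example "edu.ua"."""
    parser = LinkExtractor(page_url)
    parser.feed(html)
    source = urlparse(page_url).hostname
    edges = set()
    for link in parser.links:
        target = urlparse(link).hostname  # None for mailto: and similar links
        if not target or target == source:
            continue
        if target == zone or target.endswith("." + zone):
            edges.add((source, target))
    return sorted(edges)


if __name__ == "__main__":
    # Sample page with made-up hosts; only the link to b.edu.ua is kept.
    sample = (
        '<a href="http://b.edu.ua/">B</a>'
        '<a href="/local">local</a>'
        '<a href="http://example.com/">external</a>'
    )
    print(host_edges("http://a.edu.ua/index.html", sample, "edu.ua"))
```

Pairs of hosts collected this way over many pages form the edge list of a directed graph, from which an adjacency matrix for a web segment can be assembled.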
Chapter 3 studies the following educational segments of the web space: Ukrainian (edu.ua), Israeli (ac.il), and Polish (edu.pl). Information on these segments was collected and processed using the information technology developed by the author. This technology makes it possible to obtain the statistical characteristics and cluster structure of these segments and to compare them.

Chapter 4 deals with clustering in a graph based on its adjacency matrix. The main object of study is the stochastic matrix P, which specifies the transition probabilities on the graph and is determined from the adjacency matrix A. The chapter analyzes the spectral properties of the stochastic matrix, taking into account the cluster structure of the graph. The main theoretical results of the chapter are as follows: the convergence of the eigenvalues of the matrix P is proved under conditions imposed on the elements of the adjacency matrix A (Theorem 4.3.1); the asymptotic equivalence of the spectra of the two matrices is established, which allows a stochastic matrix with independent elements to be used instead of the corresponding stochastic matrix P, whose elements are not independent (Lemma 4.4.1); and a partial approach to estimating the distribution of the elements of P is considered under the condition that the elements of A are exponentially distributed (Lemmas 4.5.1 and 4.5.2). A new algorithm is developed for checking whether elements (graph vertices) belong to the same cluster, and a criterion for estimating the optimal number of clusters k_opt is constructed. The developed method for estimating the number of clusters is compared with several classical algorithms using the Monte Carlo method. The main results of the dissertation are summarized in the conclusions.
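The relationship between the adjacency matrix A and the stochastic matrix P can be sketched as follows, assuming the usual row normalization p_ij = a_ij / sum_k a_ik. The cluster-count estimate shown here is the standard eigengap heuristic, used only as a stand-in; it is not the k_opt criterion of the dissertation, and the function names and the toy graph are hypothetical.

```python
import numpy as np


def transition_matrix(A):
    """Row-normalize a non-negative adjacency matrix A into a stochastic matrix P."""
    A = np.asarray(A, dtype=float)
    row_sums = A.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("every vertex needs at least one edge")
    return A / row_sums


def eigengap_cluster_count(P):
    """Estimate the number of clusters as the position of the largest gap
    between consecutive eigenvalues of P sorted in descending order.
    This is the generic eigengap heuristic, not the thesis criterion."""
    # For a symmetric A the eigenvalues of P are real; .real drops round-off.
    eigvals = np.sort(np.linalg.eigvals(P).real)[::-1]
    gaps = eigvals[:-1] - eigvals[1:]
    return int(np.argmax(gaps)) + 1, eigvals


if __name__ == "__main__":
    # Toy graph: two triangles joined by one weak edge of weight 0.1.
    A = np.array([
        [0, 1, 1, 0.0, 0, 0],
        [1, 0, 1, 0.0, 0, 0],
        [1, 1, 0, 0.1, 0, 0],
        [0, 0, 0.1, 0, 1, 1],
        [0, 0, 0.0, 1, 0, 1],
        [0, 0, 0.0, 1, 1, 0],
    ])
    P = transition_matrix(A)
    k, eigvals = eigengap_cluster_count(P)
    print("estimated number of clusters:", k)
    print("eigenvalues:", np.round(eigvals, 4))
```

In a graph with well-separated clusters, P has as many eigenvalues close to 1 as there are clusters, which is why the spectrum of P carries information about the cluster structure.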
The appendices present the scientific publications and information on the approbation of the dissertation results.

Theoretical significance. The theoretical results, namely the development of the theory of graph research and the formulated and proven lemmas and theorems, can be used for further research in this field. They can also be applied in educational courses at Yuriy Fedkovych Chernivtsi National University, in teaching aids for the educational process and research.

Practical significance. The crawler, the information technology, and the method for determining the optimal number of clusters developed in the dissertation can be used for further practical research on complex networks. The proposed approaches are used by the companies "Kvant Azimuth" and "Qlicks B.V.".

Keywords: model (mathematical, economic), simulation, dynamics, intelligent data analysis, clustering, k-means, information system, information technology, intelligent system, software, software testing, software testing levels, specification of software requirements, functional and non-functional software requirements, statistical methods.

Research papers

1. Kyrychenko O., Ostapov S., Kanovsky I. Investigation of the certain internet domain statistical characteristics / Статистичні характеристики деяких зон інтернету та їх дослідження. Eastern-European Journal of Enterprise Technologies. 2013. Vol. 6, no. 12(66). P. 91–96.

2. Kyrychenko O. L., Malyk I. V., Ostapov S. E. Stochastic models in artificial intelligence problems / Стохастичні моделі в задачах штучного інтелекту. Bulletin of Taras Shevchenko National University of Kyiv. Series: Physics and Mathematics. 2021. No. 2. P. 53–57.

3. Kyrychenko O. Information technology for statistical cluster analysis of information in complex networks. Computer Systems and Information Technologies. 2022. No. 4. P. 47–51.

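4. Kyrychenko O. Features of the software architecture for collecting and analyzing statistical information in the global network / Особливості архітектури програмного забезпечення для збору та аналізу статистичної інформації в глобальній мережі. Information Technology: Computer Science, Software Engineering and Cyber Security. 2023. No. 2. P. 107–112.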
