Nikitin V. Methods of increasing the efficiency of data consistency in information systems

Thesis for the degree of Doctor of Philosophy (PhD)

State registration number

0824U001862

Applicant for

Specialization

  • 126 - Information Systems and Technologies

21-06-2024

Specialized Academic Board

ДФ 26.002.160; ID 5546

National Technical University of Ukraine "Kiev Polytechnic Institute".

Abstract

The dissertation is devoted to developing methods for distributed document-oriented databases that speed up data reconciliation and improve collision resistance when searching for inconsistent data in various networked information systems, such as IoT, heterogeneous multicomputer systems, analytical systems for administrative management, financial systems, systems for research on environmental safety and nature management, and others. Special software for conducting the experiments was also implemented. The dissertation produced the following results. A method of ensuring data consistency in distributed NoSQL document-oriented databases using a transactional clock has been developed. The transactional clock receives transactions from client applications and stores them in the appropriate queues. Queues are processed according to transaction priority: a queue with a higher priority is processed before queues with lower priorities. This makes it possible to single out critical data (for example, financial transactions) that must be processed first. When a queue is processed, the transactional clock merges its transactions into a single resulting transaction. To do this, it uses the creation time of each transaction, which allows the transactions to be globally ordered and merged in the order in which they were created rather than the order in which the transactional clock received them. The resulting transaction is then sent to the other replicas, where the write takes place. Note that when a transactional clock is used, read operations are served directly by the replicas, which reduces the load on the host machine. The active anti-entropy method has been improved by using a spectral Bloom filter and a hashing algorithm instead of a Merkle tree. The motivation is that the classical active anti-entropy mechanism relies on many expensive hashing operations.
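As a rough illustration of the transactional clock described in the paragraph above, the following Python sketch keeps one queue per priority, processes the most urgent queue first, and merges its transactions in order of creation time. The names `Transaction`, `TransactionClock`, `receive` and `process_next_queue`, the convention that a lower number means a higher priority, and the "later write wins" merge rule are all assumptions made for this example; the abstract does not specify these details, and the sketch is not the service implemented in the dissertation.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Transaction:
    created_at: float                       # creation time set by the client
    priority: int = field(compare=False)    # assumption: lower number = more urgent
    changes: dict = field(compare=False)    # field name -> new value


class TransactionClock:
    """Hypothetical sketch: per-priority queues merged by creation time."""

    def __init__(self):
        self.queues = {}  # priority -> heap of transactions ordered by created_at

    def receive(self, tx):
        # Store the transaction in the queue that matches its priority.
        heapq.heappush(self.queues.setdefault(tx.priority, []), tx)

    def process_next_queue(self):
        # Pick the most urgent non-empty queue and merge it into one result.
        if not self.queues:
            return None
        priority = min(self.queues)
        heap = self.queues.pop(priority)
        merged = {}
        while heap:
            tx = heapq.heappop(heap)    # oldest creation time first
            merged.update(tx.changes)   # assumption: later-created writes win
        return merged


clock = TransactionClock()
# Arrival order differs from creation order on purpose.
clock.receive(Transaction(created_at=2.0, priority=0, changes={"balance": 50}))
clock.receive(Transaction(created_at=1.0, priority=0,
                          changes={"balance": 100, "owner": "a"}))
clock.receive(Transaction(created_at=0.5, priority=1, changes={"note": "low"}))

resulting = clock.process_next_queue()  # merged high-priority transaction
print(resulting)
```

In a real deployment the resulting transaction would then be forwarded to the replicas for writing; that step is omitted here.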
In addition, when a large amount of data is hashed, the probability of collisions increases, which can delay the identification of inconsistencies. If this happens, the system may remain in an inconsistent state because the reconciliation procedure starts later. For distributed NoSQL document-oriented databases, it was therefore decided to use the gossip protocol, a decentralized method of node interaction. Compared with a centralized approach, this improves reliability, because the failure of a single node does not affect the availability of the system. The problem with a centralized approach is that the failure of the master introduces some latency, since a consensus protocol has to elect a new master from the existing replicas. A decentralized approach keeps the information system available for write operations, although it makes consistency harder to maintain. To search for inconsistencies, a snapshot consisting of a spectral Bloom filter and a hash value is used. The algorithm for building the spectral Bloom filter was modified specifically for the developed active anti-entropy method, which speeds up the identification of data inconsistencies. The spectral Bloom filter is formed faster by using an algorithm based on prime numbers instead of hash functions. The experimental results show that the developed algorithm has higher collision resistance than a single hash function and is faster than using several hash functions. A collision-resistant hashing method was also developed specifically for the developed active anti-entropy method. Its purpose is to reduce the number of collisions when hashing data that differs in size. This is very important for active anti-entropy, as it allows mismatches to be detected early.
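The snapshot idea above (a spectral Bloom filter plus a hash value, compared between replicas) can be sketched in Python as follows. This is only an illustration under stated assumptions: it builds the filter with seeded SHA-256 from the standard library, whereas the dissertation uses its own prime-number-based algorithm and its own collision-resistant hashing method, neither of which is reproduced here. The names `SpectralBloomFilter`, `snapshot` and `replicas_differ`, the filter size, and the number of hash functions are hypothetical.

```python
import hashlib


class SpectralBloomFilter:
    """Spectral (counting) Bloom filter: each slot is a counter, not a bit."""

    def __init__(self, size=64, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.counters = [0] * size

    def _indexes(self, item):
        # Assumption: seeded SHA-256 stands in for the thesis's prime-based scheme.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for i in self._indexes(item):
            self.counters[i] += 1


def snapshot(documents):
    """Return (filter counters, overall digest) for a replica's documents."""
    bloom = SpectralBloomFilter()
    overall = hashlib.sha256()
    for doc in sorted(documents):  # sort so storage order does not matter
        bloom.add(doc)
        overall.update(hashlib.sha256(doc).digest())
    return bloom.counters, overall.hexdigest()


def replicas_differ(snap_a, snap_b):
    # Any difference in counters or digest signals a need for reconciliation.
    return snap_a != snap_b


replica_a = [b'{"id":1,"v":10}', b'{"id":2,"v":20}']
replica_b = [b'{"id":2,"v":20}', b'{"id":1,"v":10}']   # same data, other order
replica_c = [b'{"id":1,"v":10}', b'{"id":2,"v":21}']   # one stale document

print(replicas_differ(snapshot(replica_a), snapshot(replica_b)))
print(replicas_differ(snapshot(replica_a), snapshot(replica_c)))
```

In a gossip-style setup, each node would periodically exchange such snapshots with a peer and start reconciliation only when they differ; the exchange itself is outside the scope of this sketch.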
A dedicated transactional clock service with its own application programming interface has been implemented so that the developed transactional-clock consistency method can be used with a distributed MongoDB database in information systems. A dedicated active anti-entropy service with its own application programming interface has been implemented so that the developed active anti-entropy method can be used with a distributed MongoDB database in information systems. A prototype of a financial information system was implemented in which the distributed database consists of eight nodes. It was used to obtain the results of the study of the developed data consistency methods. The application programming interface was implemented in the Python 3 programming language. Docker and docker-compose were used to create the test environment and orchestrate the necessary components.

Research papers

Nikitin V., Krylov E., Kornaga Y., Anikin V. Combined indexing method in NoSQL databases // Adaptive Systems of Automatic Control Interdepartmental scientific and technical collection. №1(38), 2021. P. 3 – 9. DOI: https://doi.org/10.20535/1560-8956.38.2021.232948

Nikitin V., Krylov E., Kornaga Y., Anikin V. Modification of hashing algorithm to increase rate of operations in NoSQL databases // Adaptive Systems of Automatic Control Interdepartmental scientific and technical collection. № 2 (39), 2021. P. 39 – 43. DOI: https://doi.org/10.20535/1560-8956.39.2021.247395

Mukhin V., Kornaga Y., Zavgorodnii V., Fartushnyi I., Pashov R., Nikitin V., Stepanov A. Method of determining the required number of database nodes in a distributed data processing system / 2021 IEEE 3rd International Conference on Advanced Trends in Information Theory (ATIT) // IEEE. DOI: https://doi.org/10.1109/ATIT54053.2021.9678569

Nikitin V., Krylov E. Comparison of hashing methods for supporting consistency in distributed databases // Adaptive Systems of Automatic Control Interdepartmental scientific and technical collection. № 1 (40), 2022. P. 48 – 53. DOI: https://doi.org/10.20535/1560-8956.40.2022.261646

Nikitin V., Krylov E. A collision-resistant hashing algorithm for maintaining consistency in distributed nosql databases // Adaptive Systems of Automatic Control Interdepartmental scientific and technical collection. № 2 (41), 2022. P. 45 – 57. DOI: https://doi.org/10.20535/1560-8956.41.2022.271338

Krylov E. V., Nikitin V. A. Using a transactional clock to speed up the data reconciliation process in distributed systems // Category B professional journal "Scientific Bulletin of Uzhhorod University. Series «Mathematics and Informatics»". № 1 (42), 2023. P. 188 – 192. DOI: https://doi.org/10.24144/2616-7700.2023.42(1).188-192

Nikitin V., Krylov E. Primary-based Spectral Bloom filter for the ensuring consistency in distributed document-based NoSQL databases using active anti-entropy mechanism // Computer Systems and Information Technologies. №3, (2023). P. 75 – 80. DOI: https://doi.org/10.31891/csit-2023-3-9

Nikitin V., Krylov E. Active anti-entropy mechanism based on Spectral Bloom filter and PH-2 hash algorithm for reconcilation of replicas of NoSQL distributed document oriented databases // Information Technology and Society. №3 (9), 2023. P. 63 – 67. DOI: https://doi.org/10.32689/maup.it.2023.3.8

Nikitin V., Krylov E. Consistency optimization methods in distributed NoSQL databases // Software Engineering and Advanced Information Technologies (SoftTech-2022): proceedings of the III All-Ukrainian scientific and practical conference of young scientists and students (Kyiv, November 23–25, 2022). Section of the Department of Informatics and Software Engineering. Kyiv: Igor Sikorsky KPI, 2022. P. 27 – 30. URL: https://drive.google.com/file/d/1CP9EaBTT_rJAXsINbanSVGnP2jkg9FJ0/view

Belous R. V., Nikitin V. A. Options for ensuring strict consistency in NoSQL // Grail of Science, (24), P. 364 – 365. DOI: https://doi.org/10.36074/grail-of-science.17.02.2023.065
