This dissertation proposes scientifically grounded methods and software tools for improving query performance in high-load distributed systems. To this end, the research focuses on optimizing network traffic, improving data consistency mechanisms, and implementing efficient resource rebalancing. Special attention is paid to distributed systems that use the Raft consensus algorithm, since Raft forms the foundation for reliable data updates and synchronization across multiple nodes. Optimizing Raft and its related data transmission and consistency processes significantly reduces response time and network load, which is critical for the stable operation of high-load applications.
For the first time, a method for minimizing network traffic volume in the Raft consensus algorithm for distributed databases has been developed. The method combines principles of both Raft and leaderless replication and is based on a preliminary exchange of metadata between nodes followed by caching of the results. Before any primary data is transmitted, nodes exchange metadata containing information on cardinality and data vectors. This allows nodes to synchronize only the changes that require updates, which reduces the volume of data sent over the network. The results of the metadata exchange are then cached locally, which further reduces network load and improves the efficiency of the Raft algorithm in distributed databases.
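The metadata-first exchange described above can be illustrated with a minimal sketch. It is not the implementation developed in the dissertation. The `Node` class, its `sync_from` method, and the metadata layout are hypothetical, and the "data vectors" are assumed here to be per-key version numbers. The sketch also assumes that a replica changes only through synchronization, so an unchanged peer summary means nothing needs to be transferred.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical replica: store maps key -> (version, value)."""
    store: dict = field(default_factory=dict)
    # Last metadata summary received from each peer, keyed by peer name.
    meta_cache: dict = field(default_factory=dict)

    def metadata(self):
        """Compact summary sent before any payload.

        Cardinality is carried as a cheap size summary; this sketch
        decides what to transfer from the per-key versions only.
        """
        return {
            "cardinality": len(self.store),
            "versions": {key: version for key, (version, _) in self.store.items()},
        }

    def entries(self, keys):
        """Return the full entries for the requested keys only."""
        return {key: self.store[key] for key in keys}

    def sync_from(self, peer_name, peer):
        """Pull only missing or outdated entries; return how many were transferred."""
        peer_meta = peer.metadata()
        # Cached result: the peer summary is unchanged, so skip the transfer.
        if self.meta_cache.get(peer_name) == peer_meta:
            return 0
        needed = [
            key for key, version in peer_meta["versions"].items()
            if key not in self.store or self.store[key][0] < version
        ]
        self.store.update(peer.entries(needed))
        self.meta_cache[peer_name] = peer_meta
        return len(needed)


leader = Node(store={"a": (2, "x2"), "b": (1, "y1"), "c": (1, "z1")})
follower = Node(store={"a": (1, "x1"), "b": (1, "y1")})
print(follower.sync_from("leader", leader))  # transfers only "a" and "c"
print(follower.sync_from("leader", leader))  # summary unchanged, nothing sent
```

In this sketch only the small summary crosses the network on every round, while full entries are sent only for keys whose version is newer on the peer.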
The method for query optimization in distributed databases has been improved by enhancing data rebalancing with genetic algorithms that use elitism and adaptive crossover. This approach distributes data more efficiently across system nodes, reducing query execution time. Elitism ensures that the best solutions are preserved at each stage of the algorithm, while adaptive crossover increases solution diversity and accelerates convergence toward an optimal solution. As a result, the modified rebalancing method improves query performance in distributed databases, especially under high-load conditions.
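A minimal sketch of rebalancing with a genetic algorithm that uses elitism and adaptive crossover is given below. It is an illustration under stated assumptions, not the dissertation's algorithm. The function names, the cost function (load of the busiest node), tournament selection, uniform crossover, and the adaptation rule (raise the crossover rate when the best cost stagnates, lower it when it improves) are all choices made for this example.

```python
import random


def node_loads(assignment, shard_sizes, n_nodes):
    """Total shard size placed on each node."""
    loads = [0] * n_nodes
    for shard, node in enumerate(assignment):
        loads[node] += shard_sizes[shard]
    return loads


def cost(assignment, shard_sizes, n_nodes):
    """Load of the busiest node; lower means better balance."""
    return max(node_loads(assignment, shard_sizes, n_nodes))


def rebalance(shard_sizes, n_nodes, pop_size=30, generations=200,
              elite=2, mutation_rate=0.05, seed=0):
    """Return a shard -> node assignment found by the genetic search."""
    rng = random.Random(seed)
    n_shards = len(shard_sizes)

    def score(assignment):
        return cost(assignment, shard_sizes, n_nodes)

    population = [[rng.randrange(n_nodes) for _ in range(n_shards)]
                  for _ in range(pop_size)]
    crossover_rate, best_so_far = 0.7, None
    for _ in range(generations):
        population.sort(key=score)
        best = score(population[0])
        # Adaptive crossover (assumed rule): recombine more when stagnating.
        if best_so_far is not None and best >= best_so_far:
            crossover_rate = min(0.95, crossover_rate + 0.05)
        else:
            crossover_rate = max(0.5, crossover_rate - 0.05)
        best_so_far = best if best_so_far is None else min(best, best_so_far)
        # Elitism: the best individuals survive unchanged.
        next_gen = [individual[:] for individual in population[:elite]]
        while len(next_gen) < pop_size:
            # Tournament selection of two parents.
            p1 = min(rng.sample(population, 3), key=score)
            p2 = min(rng.sample(population, 3), key=score)
            child = p1[:]
            if rng.random() < crossover_rate:
                # Uniform crossover: each gene comes from either parent.
                child = [rng.choice(pair) for pair in zip(p1, p2)]
            # Mutation: occasionally move a shard to a random node.
            child = [rng.randrange(n_nodes) if rng.random() < mutation_rate else gene
                     for gene in child]
            next_gen.append(child)
        population = next_gen
    return min(population, key=score)


sizes = [8, 7, 6, 5, 4, 3, 2, 1]
plan = rebalance(sizes, n_nodes=3)
print(plan, node_loads(plan, sizes, 3))
```

Because the elite individuals are copied into every new generation, the best cost in the population never increases from one generation to the next, which is the property that elitism is meant to provide.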
Additionally, a data consistency method for distributed databases has been improved using a Levenshtein-based approach. Unlike existing methods, it minimizes network traffic during the consistency process, particularly when changes are frequent and minor. The method employs an advanced version of the Levenshtein algorithm to accurately identify the minimal differences between data versions, so that only the changes are transmitted rather than full data copies. Consequently, the volume of transmitted data is significantly reduced, which is particularly important in environments with frequent updates to small volumes of data. This ensures efficient replica synchronization while maintaining high system performance.
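The idea of transmitting only the differences can be sketched with the classical Levenshtein dynamic program. This is the textbook algorithm, not the advanced version used in the dissertation. The function names `edit_script` and `apply_script` and the operation format are hypothetical. The sender computes a minimal list of edit operations, and the replica applies them to its own copy.

```python
def edit_script(old, new):
    """Return a minimal list of (kind, position, char) edits turning old into new."""
    m, n = len(old), len(new)
    # dist[i][j] = Levenshtein distance between old[:i] and new[:j].
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i
    for j in range(n + 1):
        dist[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            substitute = dist[i - 1][j - 1] + (old[i - 1] != new[j - 1])
            dist[i][j] = min(substitute, dist[i - 1][j] + 1, dist[i][j - 1] + 1)
    # Backtrace from the end, so edits are listed from the highest position down.
    ops, i, j = [], m, n
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and dist[i][j] == dist[i - 1][j - 1] + (old[i - 1] != new[j - 1])):
            if old[i - 1] != new[j - 1]:
                ops.append(("replace", i - 1, new[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("delete", i - 1, ""))
            i -= 1
        else:
            ops.append(("insert", i, new[j - 1]))
            j -= 1
    return ops


def apply_script(old, ops):
    """Apply edits in the order produced by edit_script (highest position first)."""
    chars = list(old)
    for kind, position, char in ops:
        if kind == "replace":
            chars[position] = char
        elif kind == "delete":
            del chars[position]
        else:
            chars.insert(position, char)
    return "".join(chars)


delta = edit_script("kitten", "sitting")
print(len(delta))                     # 3 edits instead of the full 7-character value
print(apply_script("kitten", delta))  # sitting
```

Since the edits are ordered from the end of the string toward the beginning, applying one edit never shifts the positions that later edits refer to.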
A specialized software system was developed to evaluate the scientific results obtained. The software is an online electronic journal for students, teachers, parents, and school administration. It enables teachers to assign grades, create and manage homework, and generate performance reports for students. Students can access a personal dashboard displaying their academic performance, while parents can view information about their child's grades and assignments. The school administration can generate various reports and statistics on student performance and other metrics.
To ensure efficiency, reliability, and scalability, the application was built on a Raft-based architecture, which guarantees data consistency in the distributed system. It is implemented using a modern technology stack: Docker for containerization and deployment management, Laravel for backend development following MVC principles, and Vue.js for building a dynamic and responsive user interface. These technologies made it possible to create a flexible, fault-tolerant, and easily scalable system that supports all necessary functions and enables the research and analysis of the scientific results in the context of distributed systems.