Petrov O. Models, methods and tools for distributed data warehouses optimization

Thesis for the degree of Candidate of Sciences (CSc)

State registration number

0409U005036

Thesis Registration Form

0409U005036.pdf

Applicant

Petrov Olexandr Valeriyovich

Specialization

05.13.06 - Information technologies

Date of defense

16-10-2009

Specialized Academic Board

К11.051.08

Abstract

Thesis for the degree of Candidate of Technical Sciences in specialty 05.13.06 - Information technologies. Donetsk National Technical University, 2009. The dissertation addresses a new scientific and practical problem: optimizing the set of materialized views and the allocation of data in a distributed data warehouse (DDW). The thesis is devoted to developing new models and methods for DDW optimization. A structural analysis of the distributed data warehouse was conducted, and on its basis the main components of its physical and logical architecture were identified. Object models of the typical DDW components were then developed using an object-oriented approach and the UML language. A new approach to DDW optimization is proposed, based on the combined use of the previously developed object-oriented DDW model and a modified genetic algorithm. The modified genetic algorithm generates new data allocation schemas, which are represented by doubled multichromosomes and manipulated with the genetic operators of recombination, crossover and mutation. The object model, in turn, receives each data allocation schema and uses it to compute the value of the DDW efficiency criterion. These values are returned to the genetic algorithm, which uses them as fitness function values. The proposed approach increases DDW efficiency by reducing the average processing time of user select queries. A new modification of the genetic algorithm for the DDW optimization problem was developed.
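The coupling described above, in which a model scores each data allocation schema and a genetic algorithm uses that score as its fitness, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy instance (`QUERIES`, the cost constants), the function names, and the simple two-chromosome encoding (one chromosome for fragment placement, one for materialized view placement). It is not the thesis's doubled multichromosome encoding, its object model, or its operators, and it implements only crossover and mutation.

```python
import random

# Hypothetical toy instance: 3 nodes, 4 data fragments, 2 candidate materialized views.
N_NODES, N_FRAGMENTS, N_VIEWS = 3, 4, 2

# Each query: (issuing node, fragments it reads, view that can answer it or None).
# These values are invented for illustration only.
QUERIES = [
    (0, (0, 1), 0),
    (1, (1, 2), None),
    (2, (2, 3), 1),
    (0, (3,), None),
]
LOCAL_COST, REMOTE_COST, VIEW_COST, VIEW_MAINTENANCE = 1.0, 5.0, 0.5, 0.3


def average_query_time(individual):
    """Stand-in for the warehouse model: average cost of the select queries."""
    allocation, views = individual
    # Every materialized view (gene >= 0) carries a fixed maintenance cost.
    total = VIEW_MAINTENANCE * sum(1 for v in views if v >= 0)
    for node, fragments, view in QUERIES:
        if view is not None and views[view] == node:
            total += VIEW_COST  # answered from a view stored on the issuing node
        else:
            total += sum(
                LOCAL_COST if allocation[f] == node else REMOTE_COST
                for f in fragments
            )
    return total / len(QUERIES)


def random_individual(rng):
    allocation = [rng.randrange(N_NODES) for _ in range(N_FRAGMENTS)]
    views = [rng.randrange(-1, N_NODES) for _ in range(N_VIEWS)]  # -1: not materialized
    return allocation, views


def crossover(a, b, rng):
    """One-point crossover applied separately to each chromosome."""
    child = []
    for chrom_a, chrom_b in zip(a, b):
        cut = rng.randrange(1, len(chrom_a))
        child.append(chrom_a[:cut] + chrom_b[cut:])
    return tuple(child)


def mutate(individual, rng, p=0.1):
    """Resample each gene independently with probability p."""
    allocation, views = individual
    allocation = [rng.randrange(N_NODES) if rng.random() < p else g for g in allocation]
    views = [rng.randrange(-1, N_NODES) if rng.random() < p else g for g in views]
    return allocation, views


def optimize(generations=40, pop_size=20, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=average_query_time)  # lower is fitter
        parents = ranked[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - 1)
        ]
        population = [ranked[0]] + children  # elitism keeps the best schema found
    best = min(population, key=average_query_time)
    return best, average_query_time(best)
```

Calling `optimize()` returns the best schema found and its modeled average query time. Because the best individual is always carried over, the result is never worse than the best schema in the initial population.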
For the first time, doubled multichromosomes were used to encode data distribution schemas in a distributed data warehouse. New versions of the genetic operators of recombination, crossover and mutation were developed. This modification of the genetic algorithm yields suboptimal solutions to the DDW optimization problem. It optimizes not only the allocation of data among the nodes of the DDW but also the set of materialized views. Based on the developed object model and the modified genetic algorithm, a new tool for DDW modeling and optimization was developed. The tool consists of three subsystems: modeling, analysis and optimization of the distributed data warehouse. It can be used both at the DDW development stage and when the DDW is already implemented and in operation. The distributed data warehouse of the Donetsk regional office of Prominvestbank was selected as the object of experiments. Real operational data from this DDW were obtained and used to test the adequacy of the model. A series of computational experiments on DDW modeling and optimization was then carried out. First, the influence of data allocation and of the technical characteristics of the DDW on its efficiency was investigated. These experiments showed that doubling channel capacity leads to a slight increase in DDW efficiency. Creating some new materialized views and placing them on the nodes where they are used most also increases DDW efficiency. However, this increase is not enough to solve the DDW optimization problem, since the solutions obtained remain far from optimal.
Based on the experimental results, rational parameters of the modified genetic algorithm were determined (the probabilities of the genetic operators, the number of generations and the number of individuals in the population). The genetic algorithm was then run with these parameters to optimize the data warehouse. The suboptimal solution obtained with the genetic algorithm was compared with the optimal solution obtained by brute force; the difference between them was only 4.37%. This means that the developed genetic algorithm produces suboptimal solutions that are very close to the global optimum. Recommendations for increasing DDW efficiency were also developed. Keywords: distributed data warehouse, fact table, dimension table, materialized view, fragmentation, replication, object-oriented modeling, genetic algorithm, crossover, mutation, recombination.
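The comparison between a heuristic result and a brute-force optimum mentioned in the abstract can be sketched as follows. The abstract does not say how the reported difference was computed, so the relative-gap formula here is one common definition for a minimization problem and is an assumption. The toy cost function, the sample heuristic result, and all function names are likewise hypothetical.

```python
from itertools import product


def relative_gap_percent(heuristic_value, optimal_value):
    """Excess of a heuristic objective over the optimum, in percent (minimization)."""
    return 100.0 * (heuristic_value - optimal_value) / optimal_value


def brute_force_minimum(cost, n_genes, n_values):
    """Enumerate all n_values ** n_genes assignments and return the cheapest one."""
    return min(product(range(n_values), repeat=n_genes), key=cost)


def toy_cost(assignment):
    """Hypothetical cost: fragment i is cheap only when placed on node i % 3."""
    return sum(1.0 if node == i % 3 else 4.0 for i, node in enumerate(assignment))


# 4 fragments on 3 nodes: only 3 ** 4 = 81 candidates, so enumeration is feasible.
optimum = brute_force_minimum(toy_cost, n_genes=4, n_values=3)
heuristic = (0, 1, 2, 1)  # hypothetical result of a heuristic search
gap = relative_gap_percent(toy_cost(heuristic), toy_cost(optimum))
```

Exhaustive enumeration grows exponentially with the number of fragments, so a comparison of this kind is practical only on small instances.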

Thesis supervisor

Lazdyn Sergiy Volodymirovich

Official opponents

Левикін Віктор Макарович
Ульшин Віталій Олександрович

Files

aref.doc

Диссертация.pdf

Similar theses

0424U000107

Oleksandra V. Kovyrova

Models and instrumental tools of express diagnostics for using in the biology and medicine

0524U000111

Tetiana A. Honcharenko

Methodological foundations of the formation of a unified information environment for the automation of object-spatial systems in construction projects

0524U000108

Ihor M. Liakh

Methodological foundations of information technology for gene expression data processing and its application in the field of bioinformatics

0524U000074

Andrii Shyshatskyi

Intelligent methods of managing interference protection of radio communication systems under conditions of destabilizing influences

0524U000069

Gnatchuk Elizaveta Gennadievna

Theoretical and applied principles of information technology for supporting medical decision-making considering the civil law grounds