Belas A. Models and methods of intellectual data analysis for forecasting nonlinear nonstationary processes


Thesis for the degree of Doctor of Philosophy (PhD)

State registration number

0823U100148

Applicant for

Specialization

  • 122 - Computer Science

09-03-2023

Specialized Academic Board

ДФ 26.002.11

National Technical University of Ukraine "Kiev Polytechnic Institute".

Abstract

The dissertation addresses the problem of increasing the adequacy of mathematical models of nonlinear nonstationary financial and economic processes, and the accuracy of the corresponding forecasts, by applying modern methods of intellectual data analysis to statistical data in the form of time series. The classes of nonlinear nonstationary processes used for modeling and forecasting were selected and described, together with the mathematical models and approaches used to describe their dynamics from time series data. The process types selected for study are integrated processes, heteroskedastic processes, Levy processes, processes with a stochastic trend, and logistic-type processes. The work focuses on a class of financial and economic processes and formulates the problem of forecasting them; however, the developed methodology can also be applied to other systems (technical, medical, etc.) with correspondingly defined dynamics. Forecasting follows a predictive analytics approach based on machine learning methods. The methodological basis of the work is the SEMMA analytical methodology. To apply the developed methodology to real statistical data, data were collected on Walmart store sales, sales of antidiabetic drugs in Australia, and fuel sales in the USA. For exploratory data analysis, statistical and graphical approaches were proposed, along with statistical tests for process nonlinearity and nonstationarity; White's test for nonlinearity and the KPSS test for nonstationarity were chosen as the main ones. Methods for detecting and handling anomalies and missing values were considered. To separate the noise component from the time series, the Kalman filter and exponential smoothing were considered.
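As an illustration of the exponential smoothing step mentioned above, the following is a minimal sketch of simple (single) exponential smoothing. The function name and the choice of the first observation as the initial level are assumptions made for this example; the abstract does not state which variant or initialization the thesis uses.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing (illustrative sketch, not the thesis code).

    s[0] = x[0]
    s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]

    alpha close to 1 follows the data closely; alpha close to 0 smooths heavily.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        return []
    # Assumption: the first observation is used as the initial smoothed level.
    smoothed = [float(series[0])]
    for x in series[1:]:
        smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed


if __name__ == "__main__":
    noisy = [10, 12, 9, 14, 11, 15, 13]
    print(exponential_smoothing(noisy, alpha=0.3))
```

A smaller alpha removes more noise but also delays the response to genuine level shifts, which is one reason to compare forecasts with and without such filtering.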
Digital filtering should be used with care: it is an optional rather than a mandatory stage of model building, and the quality of the forecasts obtained with and without preliminary filtering must be compared at the end of the modeling process. The thesis proposes a method for building models of nonlinear processes that optimizes the structures of the linear and nonlinear components of the model in separate procedures and then combines them additively into a single model, which increases the adequacy of the model and the overall accuracy of the forecasts. Different methods of combining the forecast estimates of different models were considered, and boosting was chosen as the main one for use in the work. An approach to selecting and modeling the linear component of the process with regression models was described, as was an approach to forming forecast estimates from the resulting AR model; criteria for assessing the adequacy of the resulting models were considered, and BIC was selected. A set of criteria based on MSE, MAE, and RMSLE was built to assess forecast quality. To describe the nonlinear component, an autoregressive approach based on ARIMA with an algorithm for automatic model building was considered, as were approaches based on recurrent (RNN) and convolutional (CNN) neural networks, and their advantages and disadvantages were analyzed. For neural networks, approaches to optimizing model parameters were analyzed, and the Adam algorithm was proposed as the most effective. Different approaches to multi-step forecasting with neural networks were considered, and the multi-output network approach was chosen as the main one. The need for adaptive construction of models for forecasting nonlinear nonstationary processes and the main principles of adapting such models were considered.
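The three forecast quality criteria named above have standard definitions, which can be sketched as follows. The function names are chosen for this example, and the sketch assumes non-negative values for RMSLE (reasonable for sales data); it is not taken from the thesis.

```python
import math


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmsle(actual, predicted):
    """Root mean squared logarithmic error.

    Uses log(1 + x), so both inputs must be greater than -1; in practice
    the metric is applied to non-negative quantities such as sales volumes.
    """
    squared_log_errors = (
        (math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(actual, predicted)
    )
    return math.sqrt(sum(squared_log_errors) / len(actual))


if __name__ == "__main__":
    y_true = [120.0, 135.0, 150.0]
    y_pred = [118.0, 140.0, 149.0]
    print(mse(y_true, y_pred), mae(y_true, y_pred), rmsle(y_true, y_pred))
```

MSE penalizes large errors more heavily than MAE, while RMSLE measures relative rather than absolute deviation, so the three criteria can rank the same set of models differently.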
Appropriate approaches to adaptation were considered for both linear models and neural networks. To adapt linear models, the parameter estimation method based on Markov chain Monte Carlo (MCMC) was improved. The forecasts obtained were compared with the results of known approaches and methods. Across all practical experiments, the developed AR-CNN approach was shown to yield adequate models and accurate forecasts while remaining relatively easy to construct and computationally inexpensive.
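The additive combination of a linear and a nonlinear component can be illustrated with a deliberately small sketch. Everything here is an assumption made for illustration: the linear part is a least-squares AR(1) model, and a k-nearest-neighbour predictor on the AR residuals stands in for the CNN used in the thesis. It shows only the structure "forecast = linear part + nonlinear part fitted to the residuals", not the author's implementation.

```python
def fit_ar1(series):
    """Least-squares AR(1) with intercept: x[t] ~ c + phi * x[t-1].

    Raises ZeroDivisionError for a constant series (zero variance).
    """
    x_prev, x_next = series[:-1], series[1:]
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    cov = sum((a - mean_prev) * (b - mean_next) for a, b in zip(x_prev, x_next))
    var = sum((a - mean_prev) ** 2 for a in x_prev)
    phi = cov / var
    c = mean_next - phi * mean_prev
    return c, phi


def knn_residual_model(residuals, k=3):
    """Stand-in nonlinear component (NOT a CNN): predicts the next residual
    as the mean of the residuals that followed the k most similar past ones."""
    pairs = list(zip(residuals[:-1], residuals[1:]))

    def predict(last_residual):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - last_residual))[:k]
        return sum(p[1] for p in nearest) / len(nearest)

    return predict


def hybrid_one_step_forecast(series, k=3):
    """One-step forecast as linear AR(1) part plus nonlinear residual part."""
    c, phi = fit_ar1(series)
    # Residuals of the linear model are the input to the nonlinear component.
    residuals = [series[t] - (c + phi * series[t - 1]) for t in range(1, len(series))]
    nonlinear = knn_residual_model(residuals, k)
    linear_part = c + phi * series[-1]
    nonlinear_part = nonlinear(residuals[-1])
    return linear_part + nonlinear_part, linear_part, nonlinear_part


if __name__ == "__main__":
    data = [5.0, 6.2, 5.8, 7.1, 6.9, 8.3, 7.7, 9.0]
    print(hybrid_one_step_forecast(data))
```

Because the two components are fitted separately, either one can be replaced or re-tuned without refitting the other, which is the practical appeal of an additive design.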
