This Ph.D. thesis addresses the pressing scientific and technical problem of developing methods for real-time search and recognition of objects in video images on a mobile platform.
The introduction substantiates the relevance of the dissertation topic, formulates the purpose of the study and the scientific and technical tasks required to achieve it, shows the connection of the study with scientific programs and topics, and states the scientific novelty of the results, their practical value, and the applicant's personal contribution. Information on the testing of the results and on the author's publications is also presented.
The first chapter analyzes existing approaches to integrating object search and recognition systems, namely the varieties and architectural features of recognition models and the algorithms for tracking an arbitrary class of objects. The analysis showed that integrating such systems requires a particular set of filters, specialized activation functions, and object-tracking algorithms. The YOLO family of convolutional neural network models was chosen as the base neural network, as the most promising in the field of object recognition. Existing mobile systems for real-time object search and recognition were also analyzed. It was determined that a significant problem of such systems is the lack of an effective platform for automatically training models and integrating them into the mobile platform. A further problem is improving the efficiency of such systems, since they mostly run on limited hardware. The chapter concludes by forming a set of methods and tools for solving the problem of real-time search and recognition in video images on a mobile platform and by formulating the task of the dissertation research.
The second chapter proposes metrics for evaluating the results of object recognition and tracking. The general structure of the YOLOv4 convolutional neural network model for the mobile platform is formed and described. A modified clustering method based on k-means++ is used to generate the recognition anchors. Methods for filtering recognition results are developed. Three object tracking algorithms are developed: an algorithmic one, an algorithmic one with reinforcement learning, and an operational tracking algorithm based on the IoU minimization filter, using the Hungarian algorithm as a convergence function. Methods for memoizing tracked objects are developed. Finally, a method for quantizing the output weight coefficients of a convolutional neural network by affine transformations is proposed.
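To make the idea of quantization by affine transformation concrete, the following is a minimal sketch and not the method developed in the thesis. It assumes a common per-tensor scheme in which a float weight w is mapped to an unsigned 8-bit integer as q = round(w / scale) + zero_point; the function names, the bit width, and the min/max range selection are all illustrative assumptions.

```python
def affine_quantize(weights, num_bits=8):
    """Quantize floats to unsigned ints via q = round(w / scale) + zero_point.

    Hypothetical helper: per-tensor min/max range, assumed for illustration.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    # Extend the range to include 0.0 so that zero is represented exactly.
    w_min = min(min(weights), 0.0)
    w_max = max(max(weights), 0.0)
    # Fall back to a scale of 1.0 when every weight is zero.
    scale = (w_max - w_min) / (qmax - qmin) or 1.0
    zero_point = int(round(qmin - w_min / scale))
    quantized = [
        min(qmax, max(qmin, int(round(w / scale)) + zero_point))
        for w in weights
    ]
    return quantized, scale, zero_point


def affine_dequantize(quantized, scale, zero_point):
    """Invert the affine map: w is approximately scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in quantized]


if __name__ == "__main__":
    weights = [-0.4, 0.0, 0.25, 1.2]
    q, scale, zero_point = affine_quantize(weights)
    restored = affine_dequantize(q, scale, zero_point)
    for w, r in zip(weights, restored):
        print(f"{w:+.4f} -> {r:+.4f}")
```

In this sketch the reconstruction error of each weight stays below one quantization step (`scale`), which is the usual trade-off accepted in exchange for storing weights in a quarter of the space of 32-bit floats.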
In the third chapter, based on the proposed methods and tools, algorithms are developed for training the convolutional neural network model, automatically annotating input images, and converting the model into the Core ML format for the mobile platform. Using Docker as the selected scaling and containerization tool, the structure of a system for autonomous annotation, training, and conversion of such a model is built. Each module or service in this structure can be placed in its own Docker container, which makes the hardware resources available to it scalable. The interdependencies among the elements of the system are described. A means of integrating a built-in module for tracking moving objects on the iOS mobile platform is proposed. The integration uses the JavaScriptCore library to transfer data between the system and the module.
The fourth chapter presents the architecture of the developed system for the iOS mobile operating system and the Ubuntu operating system and justifies the choice of its components. The results of system analysis and testing are presented. The research results confirmed the effectiveness of the real-time search and recognition algorithms.
Keywords: object recognition, object tracking algorithm, results filtering, scalable environment, activation functions, video images, mobile platform, convolutional neural network, real-time map, object search time, object recognition time, scalable Docker system, YOLO family of convolutional neural network models.