The thesis includes an introduction, four chapters, a list of references (106 titles), 5 appendices, 55 figures, and 33 tables. The total volume is 148 pages.
The relevance of the research lies in the demand for automatic speech recognition (ASR) systems that are robust to interference and relatively easy to configure. One way to increase the robustness of ASR systems is to adapt the systems themselves so that they become more resistant to distortions. This direction has not been sufficiently studied because of the variety of interference types and the extremely high complexity of the training and recognition algorithms used. Given the high complexity and cost of existing systems, it is therefore relevant to develop reliable ASR systems that are resistant to interference of various kinds and relatively easy to configure.
The aim of the thesis is to develop new methods and improve existing methods for training ASR systems, as well as methods for evaluating the quality and intelligibility of speech signals, so as to increase the accuracy of ASR systems without significantly complicating the configuration procedure.
The object of the research is the process of training the ASR systems while taking into account the volume and nature of a priori information about the parameters of noise or reverberation interference.
The subject of the research is the influence of the volume and nature of a priori information about the parameters of noise or reverberation interference on the accuracy of the ASR system.
Objectives (tasks) of the research:
1. To provide an analytical review of modern speech recognition methods, focusing on the causes of degraded robustness of ASR systems to noise and reverberation, as well as on promising ways to restore such robustness.
2. To establish a relationship between objective measures of intelligibility and quality of reverberation-distorted speech signals, and to identify an objective measure of quality that could be used as a measure of intelligibility in classrooms of various sizes.
3. To establish a relationship between the intelligibility of reverberation-distorted speech and such parameters of reverberation interference as reverberation time and density of early sound reflections.
4. To investigate the potential of kurtosis as a measure of the degree of clipping of a speech signal, and as an indicator of clipping that is perceptible to the human auditory system.
5. To obtain quantitative estimates of the improvement in recognition accuracy for speech distorted by noise of different natures and intensities when the ASR system is trained on noise-distorted signals, taking into account the volume and nature of a priori information about the noise interference.
6. To establish whether it is possible in principle to increase the robustness of ASR systems to reverberation by training the ASR system on reverberation-distorted signals, taking into account the volume and nature of a priori information about the reverberation interference.
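Objectives 5 and 6 rely on training material that has been deliberately distorted. As a minimal sketch of how noise-distorted training signals at a known signal-to-noise ratio (SNR) could be prepared, the following Python fragment scales a noise record so that the mixture reaches a target SNR. The function names `power` and `mix_at_snr`, the sine tone used in place of speech, and the white Gaussian noise are illustrative assumptions and are not taken from the thesis.

```python
import math
import random


def power(x):
    """Mean power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so that the mixture has the given SNR in dB.

    Hypothetical helper: the thesis does not specify its mixing procedure.
    """
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


rng = random.Random(0)  # fixed seed for reproducibility
# Stand-in for a speech record: one second of a 220 Hz tone at 8 kHz.
speech = [math.sin(2 * math.pi * 220 * t / 8000) for t in range(8000)]
# Stand-in for interference: white Gaussian noise.
noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]

noisy = mix_at_snr(speech, noise, snr_db=10.0)

# Check the SNR actually achieved by the mixture.
residual = [y - s for y, s in zip(noisy, speech)]
achieved_snr_db = 10 * math.log10(power(speech) / power(residual))
```

Repeating the mixing step over several noise types and SNR values would give one possible multi-condition training set; how much is known in advance about the noise determines which conditions are worth including.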
The scientific novelty of the obtained results:
1. For the first time, for real speech signals, quantitative estimates were obtained of the improvement in recognition accuracy for speech distorted by noise of different natures and intensities when an ASR system is trained on noise-distorted signals.
2. For the first time, for real speech signals, quantitative estimates were obtained of the improvement in recognition accuracy for reverberation-distorted speech when an ASR system is trained on reverberation-distorted signals.
3. The indirect method of speech intelligibility assessment, which uses Bark spectral distortion as a measure of signal quality, has been improved.
4. The conclusions regarding the dependence of speech intelligibility on the density of sound reflections and on reverberation time have been refined using probabilistic models of room impulse responses.
5. The method for detecting clipping in speech signals and objectively evaluating the quality of clipped speech signals has been improved, based on the use of kurtosis as a measure of signal distortion.
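To illustrate why kurtosis can serve as a clipping indicator, the sketch below computes the kurtosis (fourth central moment divided by the squared variance) of a heavy-tailed signal before and after hard clipping. It is only an illustration under stated assumptions: Laplacian-distributed samples stand in for speech amplitudes, the clipping threshold of 1.0 is arbitrary, and the names `kurtosis` and `hard_clip` are hypothetical; the thesis's actual detection method is not reproduced here.

```python
import random


def kurtosis(x):
    """Non-excess kurtosis: fourth central moment over squared variance."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / (m2 * m2)


def hard_clip(x, threshold):
    """Limit every sample to the range [-threshold, +threshold]."""
    return [max(-threshold, min(threshold, v)) for v in x]


rng = random.Random(1)  # fixed seed for reproducibility
# Assumption: Laplacian samples (unit scale) as a rough stand-in for the
# heavy-tailed amplitude distribution of speech.
clean = [rng.choice((-1.0, 1.0)) * rng.expovariate(1.0) for _ in range(20000)]
clipped = hard_clip(clean, 1.0)

k_clean = kurtosis(clean)      # close to 6 for a Laplacian distribution
k_clipped = kurtosis(clipped)  # noticeably lower: the tails were cut off
```

Clipping removes the large-amplitude samples that dominate the fourth moment, so the kurtosis drops as the clipping becomes more severe. Comparing the measured value with the value expected for undistorted speech is one way such a drop could be turned into a detector.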
Approbation of research results. The results of the thesis were presented and discussed at 4 international conferences.
Publications. Based on the results of the research, 9 scientific publications were produced (including 3 articles in specialized scientific journals of Ukraine, 1 article in a periodical scientific journal of another country, and 1 article in a periodical scientific journal indexed in the Scopus database, Q3), 1 patent for a utility model, and 4 papers in conference proceedings.
The new theoretical and practical research results were applied in the educational process of the Department of Acoustic and Multimedia Electronic Systems.