Zagvazdin O. Computer-assisted documentation of speech information based on digital processing, segmentation and analysis of speech signals.


Thesis for the degree of Candidate of Sciences (CSc)

State registration number

0413U000611

Applicant for

Specialization

  • 01.05.02 - Mathematical modelling and computational methods

21-02-2013

Specialized Academic Board

Д 26.194.02

V.M. Glushkov Institute of Cybernetics of National Academy of Sciences of Ukraine

Abstract

The dissertation is devoted to developing tools that automate the documentation of speech signals, using a mathematical representation of speech signals together with methods for their segmentation and digital processing. A new approach to voice activity and pause detection is proposed. It is based on an adaptive noise threshold and detects pauses in speech signals with high accuracy, including signals with a high noise level or nonstationary noise. An approach to speaker change detection is suggested that uses the Bayesian Information Criterion (BIC) to compare the speaker models on either side of a pause: Gaussian Mixture Models (GMMs) are built for the signal fragments before and after each pause found by the adaptive threshold algorithm. Noise filtering methods for speech signals have been improved by using noise information taken from the signal segments without voice activity, which are themselves located with the adaptive threshold pause detection method. An improved approach to changing the playback rate of a speech signal while preserving its main acoustic characteristics, based on PSOLA (Pitch-Synchronous Overlap-Add), is also proposed. Using these approaches and methods, a computer-aided distributed transcription information system has been developed, and it has proved useful in raising the productivity of groups of distributed transcription operators. Owing to the digital signal processing and segmentation methods implemented in the system and its ergonomically designed user interface, the productivity gain over manual transcription on a computer without special tools has proved significant.
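The idea of pause detection with an adaptive noise threshold can be illustrated with a minimal sketch. The abstract does not specify the thesis's actual algorithm, so this is a generic energy-based detector built on stated assumptions: short-time frame energy in dB, a noise floor initialized from the first few frames (assumed to contain no speech) and updated by exponential smoothing only on frames classified as non-speech, and a fixed decision margin above that floor. All function names and parameters (`frame_energies_db`, `detect_pauses`, `margin_db`, `alpha`, `init_frames`, `min_pause_frames`) are hypothetical and chosen for illustration only.

```python
import math
import random
from typing import List, Tuple


def frame_energies_db(samples: List[float], frame_len: int) -> List[float]:
    """Short-time energy of non-overlapping frames, in dB."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        energies.append(10.0 * math.log10(power + 1e-12))  # small floor avoids log(0)
    return energies


def detect_pauses(samples: List[float], frame_len: int = 160,
                  margin_db: float = 6.0, alpha: float = 0.95,
                  init_frames: int = 5,
                  min_pause_frames: int = 3) -> List[Tuple[int, int]]:
    """Return (first_frame, last_frame) index pairs of detected pauses.

    Hypothetical parameters: margin_db is the threshold above the noise
    floor, alpha smooths the noise-floor estimate, init_frames leading
    frames are assumed to be noise, min_pause_frames is the shortest pause.
    """
    energies = frame_energies_db(samples, frame_len)
    if not energies:
        return []
    # Initial noise floor: mean energy of the leading frames (assumed silent).
    k = min(init_frames, len(energies))
    noise_db = sum(energies[:k]) / k
    is_speech = []
    for e in energies:
        speech = e > noise_db + margin_db
        is_speech.append(speech)
        if not speech:
            # Adapt the floor only on non-speech frames, so slowly
            # varying (nonstationary) noise is tracked without speech leaking in.
            noise_db = alpha * noise_db + (1.0 - alpha) * e
    # Group consecutive non-speech frames into pauses.
    pauses = []
    run_start = None
    for i, speech in enumerate(is_speech + [True]):  # sentinel closes a trailing run
        if not speech and run_start is None:
            run_start = i
        elif speech and run_start is not None:
            if i - run_start >= min_pause_frames:
                pauses.append((run_start, i - 1))
            run_start = None
    return pauses


# Synthetic test signal: alternating 10-frame segments of low-level noise
# and a loud tone (standing in for speech) mixed with the same noise.
rng = random.Random(0)
signal = []
for seg in range(4):
    voiced = seg % 2 == 1
    for i in range(10 * 160):
        x = rng.uniform(-0.01, 0.01)
        if voiced:
            x += 0.5 * math.sin(2 * math.pi * i / 40)
        signal.append(x)

print(detect_pauses(signal))  # [(0, 9), (20, 29)]
```

Because the floor is updated only on frames judged to be non-speech, a sudden rise in noise larger than `margin_db` would be mistaken for speech; a practical detector would need an extra rule for that case, which this sketch omits.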
