Plakhotnikova O. Ukrainian speech corpus: the theoretical basis of construction and practical implementation.


Thesis for the degree of Candidate of Sciences (CSc)

State registration number

0418U002888

Applicant for

Specialization

  • 10.02.01 - Ukrainian language

27-06-2018

Specialized Academic Board

Д 26.001.19

Taras Shevchenko National University of Kyiv

Abstract

The dissertation addresses the construction of a Ukrainian speech corpus from selected and specially recorded audio files held at the Laboratory of Experimental Phonetics, Institute of Philology, Taras Shevchenko National University of Kyiv. It is the first study to present the theoretical and methodological basis for building a corpus of modern Ukrainian speech using the ELAN computer program. The Corpus of the Transcribed Ukrainian Speech (CTUS) was built on this basis. Part of the corpus material (17 audio recordings with a total duration of 60 min 28 s) is available on The Language Archive website (Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands). The corpus is open to expansion and supplementation by developers and to use by researchers. The thesis substantiates the following structural characteristics of the CTUS according to the classifications of modern electronic text corpora: 1) research corpus; 2) multimedia corpus; 3) fragment corpus; 4) static corpus (potentially dynamic in the future); 5) synchronic corpus; 6) monolingual corpus; 7) phonetically annotated corpus. The research justifies segmenting the speech signal into syntagms in ELAN for transcribing the CTUS audio recordings (1813 syntagms in total). Principles of individualised transcription entries, adapted to ELAN, were developed to create the phonetic annotation of the CTUS audio database; these principles were applied in: 1) broad phonetic individualised transcription based on Cyrillic characters; 2) broad phonetic individualised transcription based on Latin characters (according to the standards of the International Phonetic Alphabet). Additional transcription symbols were used to mark positional and combinatory sound changes in speech.
The CTUS annotation files, created in ELAN, are compatible with the program's own search options and with the Trova search tool on The Language Archive website. These search options meet the main needs of research into the phonetic features of Ukrainian speech: all basic phonetic units, from an allophone to a whole syntagm, can be searched for using Cyrillic and Latin characters.
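
To illustrate the kind of search described above, the following is a minimal sketch in Python, not the ELAN or Trova implementation. It assumes the general layout of an ELAN annotation file (XML with TIME_SLOT, TIER and ALIGNABLE_ANNOTATION elements). The tier names "cyrillic" and "ipa", the sample transcriptions and the function name search_tier are all hypothetical and are not taken from the CTUS.

```python
import re
import xml.etree.ElementTree as ET

# Hand-made, EAF-like sample. Tier names and transcriptions are invented
# for illustration only; they do not come from the CTUS.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1200"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="2500"/>
  </TIME_ORDER>
  <TIER TIER_ID="cyrillic">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>добрий день</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>як справи</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
  <TIER TIER_ID="ipa">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a3" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>dobrɪj dɛnʲ</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a4" TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>jak sprawɪ</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def search_tier(eaf_xml, tier_id, pattern):
    """Return (start_ms, end_ms, text) for syntagms on one tier matching a regex."""
    root = ET.fromstring(eaf_xml)
    # Map time-slot IDs to millisecond offsets (None if a slot is unaligned).
    slots = {}
    for ts in root.iter("TIME_SLOT"):
        value = ts.get("TIME_VALUE")
        slots[ts.get("TIME_SLOT_ID")] = int(value) if value is not None else None
    rx = re.compile(pattern)
    hits = []
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            text = ann.findtext("ANNOTATION_VALUE") or ""
            if rx.search(text):
                start = slots.get(ann.get("TIME_SLOT_REF1"))
                end = slots.get(ann.get("TIME_SLOT_REF2"))
                hits.append((start, end, text))
    return hits


if __name__ == "__main__":
    # Find syntagms containing a given symbol on each (hypothetical) tier.
    print(search_tier(SAMPLE_EAF, "ipa", "a"))
    print(search_tier(SAMPLE_EAF, "cyrillic", "д"))
```

Because the pattern is a regular expression, the same function can look for a single symbol or for a whole syntagm, on either the Cyrillic or the Latin tier.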
