Yusyn Y. Methods and software tools for metamorphic testing of software systems for automatic clustering of natural language text data


Thesis for the degree of Doctor of Philosophy (PhD)

State registration number

0823U100106

Applicant for

Specialization

  • 121 - Software Engineering

20-02-2023

Specialized Academic Board

ДФ 26.002.06

National Technical University of Ukraine "Kiev Polytechnic Institute".

Abstract

The thesis addresses the scientific problem of improving the theoretical (methods) and practical (software) foundations for testing software systems that automatically cluster natural language text data. It presents several new scientific results. For the first time, MEETC, a method of metamorphic testing of software systems for automatic clustering of natural language text data, was developed. Unlike existing methods, it applies to software implementations of any deterministic text clustering method that does not take the number of clusters as an input parameter, and its effectiveness in mutation testing, measured by the mutation score, is 81-100%. For the first time, the MEETC-k method was developed. Unlike existing methods, it applies to software implementations of any deterministic text clustering method that takes the number of clusters as an input parameter, and its mutation score is 86-100%. For the first time, a basic software architecture for metamorphic testing based on the serverless computing model is proposed. It simplifies the development of software tools for metamorphic testing and speeds up the testing of text clustering software systems by 34-50% compared to existing architectures. For the first time, the architectural software design pattern "Metamorphic Testing-as-a-Service" (MTaaS) was developed. Its characteristic features are the decomposition of a metamorphic relation into separate components and the code generation of the bodies of metamorphic relations and serverless functions.
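The abstract does not list the metamorphic relations used by MEETC and MEETC-k, so the following is only a minimal Python sketch of the general idea, assuming one common relation for deterministic clustering: input-order invariance (reordering the texts must not change the resulting partition, up to renaming of cluster labels). Splitting the relation into a `transform` component and a `holds` component loosely mirrors the decomposition mentioned for the MTaaS pattern. All names here (`MetamorphicRelation`, `mr_holds`, `by_first_word`, and so on) are hypothetical and are not the thesis's API. The `mutation_score` helper uses the standard definition: killed mutants divided by non-equivalent mutants.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Sequence

# Hypothetical sketch: none of these names come from the thesis or its tools.

def as_partition(labels: Sequence[Hashable]) -> frozenset:
    """Group document indices by cluster label, so renamed clusters compare equal."""
    groups: dict = {}
    for idx, label in enumerate(labels):
        groups.setdefault(label, set()).add(idx)
    return frozenset(frozenset(g) for g in groups.values())


@dataclass(frozen=True)
class MetamorphicRelation:
    """A metamorphic relation split into two independent components."""
    name: str
    # Builds the follow-up input plus an index map: followup[i] == source[perm[i]].
    transform: Callable[[list], tuple]
    # Checks the source and follow-up outputs against each other.
    holds: Callable[[list, list, list], bool]


def reverse_order(docs: list) -> tuple:
    perm = list(range(len(docs)))[::-1]
    return [docs[i] for i in perm], perm


def same_partition(source_labels: list, followup_labels: list, perm: list) -> bool:
    restored = [None] * len(perm)
    for i, p in enumerate(perm):
        restored[p] = followup_labels[i]  # follow-up text i is source text perm[i]
    return as_partition(source_labels) == as_partition(restored)


ORDER_INVARIANCE = MetamorphicRelation("input order invariance", reverse_order, same_partition)


def mr_holds(mr: MetamorphicRelation, clusterer: Callable, docs: list, **params) -> bool:
    """Run source and follow-up executions; extra params (e.g. k) are passed unchanged."""
    source_out = clusterer(docs, **params)
    followup_docs, perm = mr.transform(list(docs))
    followup_out = clusterer(followup_docs, **params)
    return mr.holds(source_out, followup_out, perm)


def mutation_score(killed: int, total: int, equivalent: int = 0) -> float:
    """Standard mutation score: killed mutants / (all mutants - equivalent mutants)."""
    return killed / (total - equivalent)


# Usage with a toy deterministic "clusterer" and one hand-written mutant.
def by_first_word(docs):
    return [d.split()[0] for d in docs]


def buggy_by_first_word(docs):
    # Mutant: the last text is always put into a cluster of its own.
    return [d.split()[0] for d in docs[:-1]] + ["<last>"]


docs = ["cat a", "dog b", "cat c", "dog d"]
print(mr_holds(ORDER_INVARIANCE, by_first_word, docs))        # True
print(mr_holds(ORDER_INVARIANCE, buggy_by_first_word, docs))  # False: the mutant is killed
```

Passing extra keyword arguments through `mr_holds` unchanged is one way to cover clustering programs that take the number of clusters k as input, the case MEETC-k targets, without changing the relation itself.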
Using the proposed pattern reduces code duplication and coupling when developing software tools for serverless metamorphic testing, and it improves standard code quality metrics: the number of executable lines of code and class coupling for the system as a whole, and cyclomatic complexity and the maintainability index for individual components. For the first time, the CorDeGen family of methods for generating text corpora was developed. Its characteristic features are determinism and an easy a priori description of the structure of the generated corpus. Unlike existing corpus generation methods, these methods take the minimum possible number of input parameters, which simplifies the description, storage and reproduction of results:
1. the basic CorDeGen method, unlike the other methods of the family, provides the highest corpus generation speed, but requires a more complex a priori description of the structure of the generated corpus, because part of the generated terms is removed by natural language text preprocessing methods;
2. the CorDeGen+ method, unlike the other methods of the family, avoids the removal of generated terms from the corpus, which simplifies the a priori description of the structure of the generated corpus;
3. the SemCorDeGen method, unlike the other methods of the family, extends the set of input parameters so that the generated text corpora can be used with natural language text processing methods based on semantic models.
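The abstract does not give the CorDeGen algorithms, so the sketch below is not CorDeGen. It is a hypothetical Python generator that only illustrates the two properties emphasized above: determinism (the same parameters always produce the same corpus) and a structure known a priori (each text's intended cluster is returned alongside the corpus). The function name `generate_corpus`, its parameters, and the `topic{t}term{j}` token scheme are assumptions made for illustration.

```python
def generate_corpus(texts: int, topics: int, terms_per_topic: int = 5) -> tuple:
    """Hypothetical deterministic corpus generator (an illustration, NOT CorDeGen).

    Text i belongs to topic i % topics and uses only that topic's vocabulary,
    so the expected partition is known before any clustering is run.
    Returns (corpus, expected_labels).
    """
    if not 1 <= topics <= texts:
        raise ValueError("expected 1 <= topics <= texts")
    if terms_per_topic < 1:
        raise ValueError("expected terms_per_topic >= 1")
    corpus, expected = [], []
    for i in range(texts):
        topic = i % topics
        words = []
        for j in range(terms_per_topic):
            # Deterministic term frequency (1 to 3) that varies with the
            # text's position within its topic; no randomness is involved.
            count = 1 + (i // topics + j) % 3
            words.extend([f"topic{topic}term{j}"] * count)
        corpus.append(" ".join(words))
        expected.append(topic)
    return corpus, expected


corpus, expected = generate_corpus(texts=6, topics=3)
print(expected)  # [0, 1, 2, 0, 1, 2]
```

Because the expected labels come out together with the corpus, they can be compared directly with a clustering program's output (up to renaming of cluster labels), and the whole test input can be reproduced from just the two or three integer parameters.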
The practical significance of the results lies in simplifying the development of software tools for metamorphic testing in general (through the basic serverless architecture and the "MTaaS" design pattern) and for testing software systems for automatic clustering of natural language text data in particular (through the CorDeGen family of text corpus generation methods). The CorDeGen family can also be used in scientific research to improve its reproducibility. The developed software tools are openly available, and some of them are also published as public NuGet packages that third-party developers can add to their own projects.
