The dissertation presents the applicant's research on the development and implementation of an information technology for educational analytics that improves the validity of management decisions by applying methods of intelligent analysis of educational data.
In the context of the digital transformation of education and the rapid growth of educational data volumes (academic performance results, digital activity, attendance, interactions in virtual learning environments, etc.), the application of intelligent information technologies becomes critically important for improving the effectiveness of educational process management. Educational analytics today acts as a powerful data-driven decision-making tool.
There is an urgent need for an information technology that integrates machine learning methods, predictive modeling, model interpretability, and data visualization to effectively identify risks of academic failure, support personalized learning trajectories, and generate visual analytical reports for various stakeholders.
The aim of the dissertation research is to develop and substantiate an educational analytics information technology that effectively identifies patterns in educational process data using data mining methods, in order to improve the quality of managerial decisions and to forecast students' learning outcomes.
The current state of development of Learning Analytics and Educational Data Mining in Ukraine and internationally is analyzed, their conceptual and terminological frameworks are refined, and the conceptual differences between these domains are identified. Modern tasks and methods of educational analytics are summarized, including classification, clustering, regression-based prediction, analysis of behavioral patterns, and social interaction analysis. A scientific landscape analysis based on publications indexed in the Scopus and Dimensions databases is performed using the VOSviewer platform, which enables the identification of key research clusters and development trends. The necessity of applying data mining methods in education is substantiated, and principles and functional requirements for analytical systems in higher education institutions are formulated.
Educational data sources are systematized, and the structure of the information system is developed, covering subsystems for data collection, integration, processing, storage, and analysis. Methods for data integration and preprocessing from the Unified State Electronic Database on Education, the Moodle Learning Management System, and the «Dean’s Office» information system are proposed, enabling the formation of a multidimensional student profile. Algorithms for data cleaning, normalization, and analytical dataset formation are developed. Based on comparative analysis, the selection of data mining models (decision trees, Random Forest, LightGBM, logistic regression) for academic performance prediction tasks, as well as clustering and regression methods for analyzing educational processes, is justified.
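As an illustration of the integration and normalization steps described above, a minimal sketch follows. It is not the dissertation's implementation: the function names, field names (admission_score, lms_logins, attendance_rate), student identifiers, and values are all hypothetical, and min-max scaling is assumed here only as one common normalization choice.

```python
def build_profiles(*sources):
    """Merge per-student records from several sources, keyed by student id.

    Only students present in every source are kept (an inner join).
    """
    shared_ids = set(sources[0])
    for src in sources[1:]:
        shared_ids &= set(src)
    profiles = {}
    for sid in sorted(shared_ids):
        merged = {}
        for src in sources:
            merged.update(src[sid])
        profiles[sid] = merged
    return profiles


def normalize_field(profiles, field):
    """Min-max scale one numeric field to [0, 1] in place."""
    values = [p[field] for p in profiles.values()]
    lo, hi = min(values), max(values)
    span = hi - lo
    for p in profiles.values():
        # A constant field carries no information, so map it to 0.0.
        p[field] = 0.0 if span == 0 else (p[field] - lo) / span
    return profiles


# Hypothetical toy records standing in for the three data sources.
registry = {
    "s1": {"admission_score": 180},
    "s2": {"admission_score": 150},
    "s3": {"admission_score": 165},
}
lms = {
    "s1": {"lms_logins": 40},
    "s2": {"lms_logins": 10},
    "s3": {"lms_logins": 25},
}
deans_office = {
    "s1": {"attendance_rate": 0.9},
    "s2": {"attendance_rate": 0.6},
}

profiles = build_profiles(registry, lms, deans_office)
normalize_field(profiles, "lms_logins")
```

In this sketch student s3 is dropped because no record for s3 exists in the third source; a real pipeline might instead keep such students and impute the missing attributes.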
The architecture of the educational analytics system is proposed, including modules for data collection, feature engineering, modeling, result evaluation, and visualization. Attributes describing students' demographic, educational, and behavioral characteristics are formalized, along with algorithmic procedures for transforming them into a structured format suitable for intelligent analysis. Mathematical models for assessing and predicting students' academic performance are implemented using interpretable machine learning methods.
A scenario-based approach to academic performance prediction («Initial», «Intermediate», and «Final» scenarios) is implemented, allowing the evaluation of predictive effectiveness at different stages of the learning process. Experimental results confirm the high predictive capability of the developed models; in particular, for the final scenario, Accuracy values of up to 0.91 and Balanced Accuracy in the range of 0.77–0.82 are achieved, demonstrating the stability of the Random Forest and LightGBM models. The statistical reliability and validity of the obtained results are proven. The practical value of the technology for early identification of at-risk students, monitoring academic performance dynamics, and supporting managerial decision-making is demonstrated. Special attention is given to data visualization, which within the proposed technology serves as a key stage of the analytical process and is implemented using interactive Power BI dashboards.
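The gap between Accuracy and Balanced Accuracy reported above is typical of imbalanced classes, where at-risk students are a minority. The following minimal sketch shows how the two metrics can diverge; the labels are hypothetical toy data, not the dissertation's results. Balanced Accuracy is computed here as the mean of per-class recall.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Share of all predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    total = defaultdict(int)
    hits = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            hits[t] += 1
    recalls = [hits[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy labels: 1 = at-risk student, 0 = not at-risk.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [1, 0]  # one of the two at-risk students is missed

acc = accuracy(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)
print(f"accuracy={acc:.2f}, balanced_accuracy={bal_acc:.2f}")
```

Here the majority class is classified perfectly, so Accuracy stays high, while the missed at-risk student halves the recall of the minority class and pulls Balanced Accuracy down.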