Analysis of Student Character Determinants Using Explainable Machine Learning (SHAP) and School Profile Clustering: A Case Study of the Rapor Pendidikan (Education Report) of Bali Province
DOI:
https://doi.org/10.53863/kst.v7i02.1988
Keywords:
Educational Data Mining, Random Forest, SHAP, Student Character, Education Report
Abstract
Strengthening student character is a key performance indicator in the Merdeka Belajar curriculum, yet which school-environment factors most influence character achievement is often assumed rather than measured. This study aims to quantify the relationship between school climate and student character quality in Bali Province. Using the Indonesian Education Report (Rapor Pendidikan) dataset released by the Ministry of Primary and Secondary Education (Kemendikdasmen) for the 2023-2025 period, with a total of 727 data entries, the study applies an Educational Data Mining methodology: a Random Forest classifier combined with the Synthetic Minority Over-sampling Technique (SMOTE) to address class imbalance. The novelty of the study lies in its use of SHapley Additive exPlanations (SHAP) for model transparency and K-Means clustering for zoning of schools. Experimental results show that the model predicts character achievement with 77.03% accuracy. The SHAP analysis reveals that Climate for Diversity (influence score of 0.45) and Climate for Gender Equality (0.22) are the strongest predictors, far exceeding the influence of Climate for Security (0.13). This finding challenges the common assumption that physical security is the single most important factor. Furthermore, the clustering analysis identifies three school typologies in Bali, including one "Vulnerable" cluster that scores critically low on gender equality and diversity despite adequate security scores. The study recommends shifting the focus of education policy in Bali from a physical security approach toward strengthening tolerance and gender equality programs, which the analysis indicates have a stronger influence on character achievement.
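To make the idea of a SHAP "influence score" more concrete, the following is a minimal sketch of how Shapley attributions can be computed exactly for a very small model. It is not the study's pipeline: the scoring function `toy_model`, its weights, the all-zero baseline, and the example school are all invented for illustration, and only the three feature names are borrowed from the abstract. The study itself applies SHAP to a Random Forest, where the SHAP library would typically be used instead of brute-force enumeration.

```python
from itertools import combinations
from math import factorial

# Feature names borrowed from the abstract; everything else is hypothetical.
FEATURES = ["diversity", "gender_equality", "security"]


def toy_model(x):
    """Invented scoring function (NOT the study's Random Forest).

    Includes one interaction term so that the attributions are not
    simply the linear coefficients.
    """
    return (
        0.5 * x["diversity"]
        + 0.3 * x["gender_equality"]
        + 0.1 * x["security"]
        + 0.2 * x["diversity"] * x["gender_equality"]
    )


def value(subset, x, baseline):
    """Model output when only the features in `subset` take the school's
    values; all other features stay at the baseline (an assumed stand-in
    for 'feature absent')."""
    mixed = {f: (x[f] if f in subset else baseline[f]) for f in FEATURES}
    return toy_model(mixed)


def shapley_values(x, baseline):
    """Exact Shapley values by enumerating every coalition of features."""
    n = len(FEATURES)
    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            for subset in combinations(others, k):
                s = set(subset)
                # Standard Shapley weight: |S|! (n - |S| - 1)! / n!
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                # Marginal contribution of f when added to coalition S
                total += weight * (value(s | {f}, x, baseline) - value(s, x, baseline))
        phi[f] = total
    return phi


# Hypothetical school with all climate indices at 1.0, compared with an all-zero baseline.
school = {f: 1.0 for f in FEATURES}
baseline = {f: 0.0 for f in FEATURES}

phi = shapley_values(school, baseline)
for name, contribution in sorted(phi.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {contribution:.3f}")
```

In this toy setting the attributions add up to the difference between the model output for the school and for the baseline, and the interaction term is split evenly between the two features involved. Averaging the absolute attributions over many schools is one common way to obtain a single global importance score per feature; whether the influence scores reported above were derived in exactly this way is an assumption.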
License
Copyright (c) 2025 Md. Wira Putra Dananjaya, Ngakan Nyoman Kutha Krisnawijaya, Gede Humaswara Prathama, I Gusti Ngurah Darma Paramartha, Adie Wahyudi Oktavia Gama

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike 4.0 International License that allows others to share the work with an acknowledgment of the work’s authorship and initial publication in this journal.