Analysis of Student Character Determinants Using Explainable Machine Learning (SHAP) and School Profile Clustering: A Case Study of the Education Report (Rapor Pendidikan) for Bali Province

Authors

  • Md. Wira Putra Dananjaya, Universitas Pendidikan Nasional, Denpasar, Indonesia
  • Ngakan Nyoman Kutha Krisnawijaya, Universitas Pendidikan Nasional, Denpasar, Indonesia
  • Gede Humaswara Prathama, Universitas Pendidikan Nasional, Denpasar, Indonesia
  • I Gusti Ngurah Darma Paramartha, Universitas Pendidikan Nasional, Denpasar, Indonesia
  • Adie Wahyudi Oktavia Gama, Universitas Pendidikan Nasional, Denpasar, Indonesia

DOI:

https://doi.org/10.53863/kst.v7i02.1988

Keywords:

Educational Data Mining, Random Forest, SHAP, Student Character, Education Report

Abstract

Strengthening student character is a key performance indicator in the Merdeka Belajar curriculum, yet which aspects of the school environment most influence character achievement is often assumed rather than measured. This study quantifies the relationship between school climate and student character quality in Bali Province. Using the Indonesian Education Report (Rapor Pendidikan) dataset released by the Ministry of Primary and Secondary Education (Kemendikdasmen) for the 2023-2025 period, comprising 727 data entries, the study applies an Educational Data Mining methodology: the Random Forest algorithm combined with the Synthetic Minority Over-sampling Technique (SMOTE) to address class imbalance. The novelty of the study lies in its use of SHapley Additive exPlanations (SHAP) for model transparency and K-Means clustering for zoning mapping. Experimental results show that the model predicts character achievement with 77.03% accuracy. The SHAP analysis shows that Climate for Diversity (influence score of 0.45) and Climate for Gender Equality (0.22) were the strongest predictors, far exceeding Climate for Security (0.13). This finding challenges the common assumption that physical security is the single most important factor. Furthermore, the clustering analysis identified three school typologies in Bali, including one "Vulnerable" cluster that scored critically low on gender equality and diversity despite having adequate security scores. This study recommends shifting the focus of education policy in Bali from a physical security approach toward strengthening tolerance and gender equality programs, which the analysis indicates have a greater influence on character achievement.
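
To make the notion of a SHAP "influence score" concrete, the following is a minimal sketch that computes exact Shapley values for a toy scoring function over three school climate features and then ranks the features by mean absolute contribution. It rests on several assumptions not stated in the abstract: that the reported influence scores are mean absolute SHAP values, that a simple hand-written function can stand in for the trained Random Forest, and that absent features are replaced by a dataset-mean baseline. The function `toy_model`, its weights, and the four school records are hypothetical and are not taken from the study's data.

```python
from itertools import combinations
from math import factorial

FEATURES = ["diversity", "gender_equality", "security"]

def toy_model(x):
    """Hypothetical stand-in for the trained classifier's score (not the study's model)."""
    return (0.5 * x["diversity"] + 0.3 * x["gender_equality"]
            + 0.1 * x["security"] + 0.2 * x["diversity"] * x["gender_equality"])

def shapley_values(model, x, baseline):
    """Exact Shapley values: average marginal contribution of each feature
    over all coalitions, with absent features set to their baseline value."""
    n = len(FEATURES)
    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_f = {g: x[g] if (g in subset or g == f) else baseline[g]
                          for g in FEATURES}
                without_f = {g: x[g] if g in subset else baseline[g]
                             for g in FEATURES}
                total += weight * (model(with_f) - model(without_f))
        phi[f] = total
    return phi

# Hypothetical school climate scores on a 0-1 scale (illustrative only).
schools = [
    {"diversity": 0.9, "gender_equality": 0.8, "security": 0.7},
    {"diversity": 0.4, "gender_equality": 0.5, "security": 0.8},
    {"diversity": 0.6, "gender_equality": 0.3, "security": 0.6},
    {"diversity": 0.2, "gender_equality": 0.6, "security": 0.7},
]

# Baseline: the mean of each feature across all schools.
baseline = {f: sum(s[f] for s in schools) / len(schools) for f in FEATURES}

# Per-school attributions, then global importance as mean |Shapley value|.
per_school = [shapley_values(toy_model, s, baseline) for s in schools]
importance = {f: sum(abs(p[f]) for p in per_school) / len(per_school)
              for f in FEATURES}
ranking = sorted(FEATURES, key=importance.get, reverse=True)

for f in ranking:
    print(f"{f}: {importance[f]:.3f}")
```

Exhaustive enumeration is only practical for a handful of features. For a tree ensemble such as a Random Forest, a dedicated explainer from a SHAP library would normally be used instead, and the resulting scores depend on the chosen baseline and on the model output being explained.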

Published

2025-12-17

How to Cite

Dananjaya, M. W. P., Krisnawijaya, N. N. K., Prathama, G. H., Paramartha, I. G. N. D., & Gama, A. W. O. (2025). Analisis Determinan Karakter Siswa Menggunakan Explainable Machine Learning (SHAP) dan Klasterisasi Profil Sekolah Studi Kasus Rapor Pendidikan Provinsi Bali. Jurnal Kridatama Sains Dan Teknologi, 7(02), 936–948. https://doi.org/10.53863/kst.v7i02.1988