Evaluation of Machine Learning Models for Predicting Cancer Severity Based on Global Real-World Data
DOI: https://doi.org/10.53863/kst.v7i02.1940
Keywords: cancer, machine learning, Logistic Regression, K-Nearest Neighbors, Support Vector Machine
Abstract
Cancer is one of the leading causes of death worldwide and places a significant burden on healthcare systems. Information on cancer severity is crucial for prioritizing treatment and planning resources. This study develops and compares machine learning models for classifying cancer severity using global cancer patient data from 2015–2024. The dataset, obtained from the open data platform Kaggle (www.kaggle.com), comprises 50,000 patients with demographic, lifestyle, environmental, and clinical attributes, as well as a severity score (Target Severity Score). The severity score is converted into a binary variable with two classes: low and high severity. The research steps include data preprocessing (cleaning, one-hot encoding of categorical variables, and standardization), a stratified 80:20 split into training and testing data, and the development of three classification models: Logistic Regression, K-Nearest Neighbors (K-NN), and Support Vector Machine (SVM) with an RBF kernel. Model performance was evaluated using accuracy, precision, recall, F1-score, and the confusion matrix, and validated with 5-fold cross-validation. Experimental results showed that Logistic Regression achieved 99.82% accuracy, 99.86% precision, 99.78% recall, and a 99.82% F1-score, with very few misclassifications. SVM also performed well, with 98.22% accuracy, while K-NN reached an accuracy of only approximately 79.42%. Cross-validation results confirmed that Logistic Regression had the highest average accuracy and was the most stable across folds. Logistic Regression is therefore recommended as the primary model for predicting cancer severity on this dataset, and it has potential for further development as a component of a clinical decision support system.
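The workflow described in the abstract (binarized target, one-hot encoding, standardization, stratified 80:20 split, three classifiers, test metrics, 5-fold cross-validation) can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' code: the column names and the synthetic data generator are hypothetical stand-ins for the Kaggle table, the median is an assumed binarization threshold (the abstract does not state one), cross-validation is assumed to run on the training portion, and the hyperparameters are library defaults.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import (StratifiedKFold, cross_val_score,
                                     train_test_split)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC


def make_synthetic_patients(n=1000, seed=0):
    """Hypothetical stand-in for the real table; all column names are assumptions."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "age": rng.integers(20, 90, n),
        "smoking": rng.uniform(0, 10, n),
        "air_pollution": rng.uniform(0, 10, n),
        "gender": rng.choice(["Male", "Female"], n),
        "cancer_stage": rng.choice(["Stage I", "Stage II", "Stage III"], n),
    })
    # Synthetic severity score: a noisy linear function of two risk factors.
    df["severity_score"] = (0.3 * df["smoking"] + 0.2 * df["air_pollution"]
                            + rng.normal(0, 0.5, n))
    return df


def evaluate_models(df, target="severity_score", seed=42):
    # Assumption: scores above the median are labelled "high" (1), others "low" (0).
    y = (df[target] > df[target].median()).astype(int)
    X = df.drop(columns=[target])
    num_cols = X.select_dtypes(include="number").columns.tolist()
    cat_cols = [c for c in X.columns if c not in num_cols]

    # Stratified 80:20 split keeps the class ratio equal in both parts.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)

    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "K-NN": KNeighborsClassifier(n_neighbors=5),
        "SVM (RBF)": SVC(kernel="rbf"),
    }
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    results = {}
    for name, model in models.items():
        # Preprocessing lives inside the pipeline, so it is refit on each
        # training fold and never sees held-out rows.
        pipe = Pipeline([
            ("prep", ColumnTransformer([
                ("num", StandardScaler(), num_cols),
                ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
            ])),
            ("clf", model),
        ])
        cv_acc = cross_val_score(pipe, X_tr, y_tr, cv=cv, scoring="accuracy")
        pipe.fit(X_tr, y_tr)
        pred = pipe.predict(X_te)
        results[name] = {
            "accuracy": accuracy_score(y_te, pred),
            "precision": precision_score(y_te, pred),
            "recall": recall_score(y_te, pred),
            "f1": f1_score(y_te, pred),
            "confusion_matrix": confusion_matrix(y_te, pred),
            "cv_mean": cv_acc.mean(),
            "cv_std": cv_acc.std(),
        }
    return results


if __name__ == "__main__":
    for name, m in evaluate_models(make_synthetic_patients()).items():
        print(f"{name}: acc={m['accuracy']:.4f} f1={m['f1']:.4f} "
              f"cv={m['cv_mean']:.4f} +/- {m['cv_std']:.4f}")
```

Scores from this sketch reflect the synthetic data only and say nothing about the reported results. Keeping the scaler and encoder inside the pipeline is a design choice that prevents statistics from held-out folds leaking into preprocessing; near-perfect accuracy on a real dataset would also warrant checking whether the target is a deterministic function of the input features.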
License
Copyright (c) 2025 Sudriyanto Sudriyanto, Abdul Fatah, Moh Dafa Wahna Putra

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike 4.0 International License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.