Comparative Analysis of Random Forest and XGBoost for Detecting Phishing Websites: A Machine Learning Approach
DOI:
https://doi.org/10.53863/kst.v7i02.1933
Keywords:
Phishing Detection, Machine Learning, Random Forest, XGBoost, Cybersecurity
Abstract
Phishing attacks are among the most significant cybersecurity threats of the digital era, with over 300,000 complaints reported globally in 2023. In Indonesia, the National Cyber and Crypto Agency reported 47,231,390 incidents of anomalous traffic related to phishing in 2023, making phishing one of the greatest threats to the national digital ecosystem. Modern phishing attacks are increasingly sophisticated and manual detection is ineffective against them, so automatic detection based on machine learning is required. This study presents a comparative analysis of the Random Forest and XGBoost algorithms for automatic phishing website detection. Although both algorithms have proven effective in the cybersecurity domain, comprehensive comparisons that consider performance, interpretability, and computational efficiency together in the context of phishing detection remain limited. This study addresses that gap in order to help optimize national phishing detection systems. The research methodology comprises data collection, preprocessing, model implementation, hyperparameter optimization using randomized search with 5-fold stratified cross-validation, and comparative analysis. Experimental results show that optimized XGBoost delivers the best performance, with 97.78% accuracy and 73% faster training, while Random Forest reaches 97.65% accuracy and offers interpretability advantages. Feature importance analysis identifies SSL certificate status and anchor URL characteristics as the most critical discriminative features. The study concludes that optimized XGBoost is the better choice for production deployment of real-time phishing detection systems, while Random Forest is more suitable for scenarios that require model transparency. These findings contribute to the development of national phishing detection systems that support the Indonesian government's digitalization program and protect the public from increasing cybersecurity threats.
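The tuning step named in the abstract, randomized search with 5-fold stratified cross-validation, can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names `stratified_k_fold`, `randomized_search`, and `toy_score`, the parameter grid, and the toy labels are all hypothetical. In a real pipeline the scoring callback would train a Random Forest or XGBoost model on the training indices and return its accuracy on the held-out fold, and a library such as scikit-learn would normally supply the splitting and search logic.

```python
import random
from collections import defaultdict
from statistics import mean


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs whose class ratios mirror the full set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a near-equal share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


def randomized_search(fit_and_score, param_space, labels, n_iter=10, k=5, seed=0):
    """Sample n_iter random configurations; return the best by mean CV score."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        # Draw one value per hyperparameter, independently and uniformly.
        params = {name: rng.choice(values) for name, values in param_space.items()}
        scores = [
            fit_and_score(params, train_idx, test_idx)
            for train_idx, test_idx in stratified_k_fold(labels, k, seed)
        ]
        avg = mean(scores)
        if avg > best_score:
            best_params, best_score = params, avg
    return best_params, best_score


def toy_score(params, train_idx, test_idx):
    """Hypothetical stand-in for 'train a model, return held-out accuracy'."""
    return 1.0 - abs(params["max_depth"] - 6) / 10


# Toy labels: 1 = phishing, 0 = legitimate (illustrative only, not the paper's data).
labels = [1] * 60 + [0] * 40
space = {"max_depth": [2, 4, 6, 8], "n_estimators": [100, 200, 400]}
best_params, best_score = randomized_search(toy_score, space, labels, n_iter=10)
print(best_params, round(best_score, 3))
```

Stratification keeps the phishing-to-legitimate ratio roughly constant in every fold, so each of the five accuracy estimates is measured on a comparable class mix. Randomized search evaluates a fixed number of sampled configurations rather than the full grid, trading exhaustiveness for a bounded tuning cost.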
License
Copyright (c) 2025 Yogi Perdana

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike 4.0 International License that allows others to share the work with an acknowledgment of the work’s authorship and initial publication in this journal.
















