Comparative Sentiment Analysis of Election News Articles with Smote using Classification Algorithm
DOI:
https://doi.org/10.53863/kst.v6i02.1253
Keywords:
sentiment analysis, SMOTE, KNN, decision tree
Abstract
This research performs sentiment analysis on news articles about general elections, in particular the presidential and vice-presidential election. It compares the performance of two classification algorithms, Decision Tree and K-Nearest Neighbors (KNN), and evaluates how effectively SMOTE (Synthetic Minority Over-sampling Technique) addresses class imbalance: in this dataset, articles with positive sentiment outnumber those with negative sentiment. The main objectives are to determine which algorithm is superior for sentiment classification and to see how far SMOTE improves model performance. The dataset was collected by scraping and then subjected to text normalization, stop word removal, and feature extraction. SMOTE was applied to balance the classes, and Decision Tree and KNN classifiers were then trained on the data. The results showed that Decision Tree consistently outperformed KNN, reaching 85% accuracy, 44% precision, 47% recall, and a 45% F1 score. Applying SMOTE improved the performance of both algorithms, with a more pronounced effect on Decision Tree. The study therefore concludes that Decision Tree combined with SMOTE is a more effective and reliable approach than KNN for sentiment analysis of election articles. These results contribute to the development of sentiment analysis methods for understanding the dynamics of public opinion in a political context.
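The balancing step described in the abstract can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. This is not the authors' implementation; the function name `smote`, the parameter values, and the toy two-dimensional feature vectors are assumptions made only for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy feature vectors standing in for extracted text features (hypothetical values).
positive = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.1], [0.9, 0.3], [0.8, 0.0], [0.6, 0.2]]
negative = [[0.1, 0.9], [0.2, 0.7]]

# Oversample the minority (negative) class until both classes have equal size.
new_negative = smote(negative, n_new=len(positive) - len(negative), k=5, seed=42)
balanced_negative = negative + new_negative
print(len(positive), len(balanced_negative))
```

In practice, oversampling of this kind is usually applied to the training split only, so that evaluation metrics are computed on untouched data; the abstract does not state how the split was handled.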
License
Copyright (c) 2024 Fathir Fathir, Afsa Rizki, Yuliyanti Yuliyanti, Siti Mutmainah
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike 4.0 International License that allows others to share the work with an acknowledgment of the work’s authorship and initial publication in this journal.