Comparison of Algorithms on Machine Learning For Spam Email Classification

Hery Iswanto(1*), Erni Seniwati(2), Yuli Astuti(3), Dina Maulina(4)

(1) Informatics Study Program, Universitas AMIKOM Yogyakarta
(2) Information Systems Study Program, Universitas AMIKOM Yogyakarta
(3) Informatics Management Study Program, Universitas AMIKOM Yogyakarta
(4) Informatics Management Study Program, Universitas AMIKOM Yogyakarta
(*) Corresponding Author


The rapid growth of email use and the convenience it provides have made email the most frequently used means of communication. Along with this growth, many parties abuse email as a channel for advertising, phishing, and other unwanted messages. Such messages are called spam email. One way to address the spam problem is to filter messages based on their content. In previous studies on spam email classification, Naïve Bayes is the most commonly used method. This study therefore adds the Random Forest and K-Nearest Neighbor (KNN) methods and compares the three to find which achieves better accuracy in classifying spam emails. In the trial, the Naïve Bayes classification algorithm achieved an accuracy of 83.5%, Random Forest 83.5%, and KNN 82.75%.


