Performance Evaluation Of SVM With Parameter Optimization On Credit Card Fraud Data Subset Using SMOTE

Ahmad Farrel Mahardika(1), Irza Nuzul Fahrezi(2), Muhammad Hadya Alleredha(3), Khusnatul Amaliah(4*), Dani Rofianto(5)

(1) Politeknik Negeri Lampung, Indonesia
(2) Politeknik Negeri Lampung, Indonesia
(3) Politeknik Negeri Lampung, Indonesia
(4) Politeknik Negeri Lampung, Indonesia
(5) Politeknik Negeri Lampung, Indonesia
(*) Corresponding Author

Abstract


This study evaluates the performance of the Support Vector Machine (SVM) algorithm in detecting credit card fraud by addressing the class imbalance problem with the Synthetic Minority Oversampling Technique (SMOTE) and by optimizing parameters through Grid Search. The dataset, sourced from Kaggle, consists of 10,001 transactions and is balanced by applying SMOTE exclusively to the training data, which prevents data leakage. The optimization process yields the best parameters at C = 10 and gamma = 0.1. The model is evaluated using recall, precision, F1-score, and AUC-ROC. The results show a significant improvement in recognizing fraudulent transactions: the final model records a recall of 0.68, a precision of 0.90, an F1-score of 0.77, and an AUC-ROC of 0.98. These findings indicate that combining SMOTE with parameter optimization improves the effectiveness of SVM in classifying the minority class more accurately. This approach has strong potential for application in automated fraud detection systems in the financial sector.
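To illustrate the pipeline described in the abstract, the following is a minimal sketch (not the authors' code) of how SMOTE can be confined to the training folds while C and gamma of an RBF SVM are tuned by Grid Search, using scikit-learn and imbalanced-learn. The file name, feature columns, and label column are assumptions; the parameter grid simply includes the reported best values (C = 10, gamma = 0.1), and the evaluation prints recall, precision, F1-score, and AUC-ROC on a held-out test set.

# Minimal sketch under stated assumptions; dataset path and label column are placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import classification_report, roc_auc_score
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline  # applies SMOTE only inside training folds (no leakage)

df = pd.read_csv("credit_card_fraud.csv")              # hypothetical file name
X = df.drop(columns=["is_fraud"])                       # hypothetical label column
y = df["is_fraud"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("smote", SMOTE(random_state=42)),                  # oversampling of the minority class
    ("svm", SVC(kernel="rbf", probability=True, random_state=42)),
])

# Grid includes the best values reported in the paper (C = 10, gamma = 0.1).
param_grid = {"svm__C": [0.1, 1, 10, 100], "svm__gamma": [0.01, 0.1, 1]}
search = GridSearchCV(pipe, param_grid, scoring="f1", cv=5, n_jobs=-1)
search.fit(X_train, y_train)

y_pred = search.predict(X_test)
y_score = search.predict_proba(X_test)[:, 1]
print("Best params:", search.best_params_)
print(classification_report(y_test, y_pred, digits=2))  # precision, recall, F1-score
print("AUC-ROC:", roc_auc_score(y_test, y_score))

Using the imbalanced-learn Pipeline is what keeps SMOTE restricted to the training portion of each cross-validation fold; oversampling before splitting would leak synthetic minority samples into the evaluation data.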







DOI: https://doi.org/10.30645/ijistech.v9i1.398
