An Ensemble Voting Classifier Based On Machine Learning Models for Phishing Detection
DOI:
https://doi.org/10.32628/IJSRSET251211
Keywords:
Phishing Detection, Ensemble Learning, Hard Voting Classifier, Machine Learning, Cybersecurity
Abstract
Phishing attacks remain a pervasive threat and call for more effective detection systems. This paper introduces a novel ensemble hard voting classifier that combines the predictions of Logistic Regression, Gradient Boosting, and K-Nearest Neighbors to identify phishing websites with improved accuracy. The study uses a Kaggle dataset of over 11,000 websites, each described by 30 features. Exploratory data analysis revealed feature patterns and correlations that informed the subsequent data preprocessing phase. We standardized feature scales with StandardScaler and split the dataset 80-20 into training and test sets, so that the models are evaluated on data they did not see during training. The ensemble capitalizes on the diversity of its constituent classifiers and outperforms the individual models, reaching an accuracy of 95.02%. Our approach demonstrates that an ensemble hard voting classifier not only improves the detection rate but also provides balanced precision-recall performance, which is crucial for real-world applications.
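To make the hard voting step concrete, the following is a minimal sketch of majority voting over the label predictions of three classifiers. It is an illustration under stated assumptions, not the authors' implementation: the function names `hard_vote` and `accuracy`, the label encoding (1 for phishing, -1 for legitimate), and the toy prediction lists are all hypothetical and are not taken from the paper or its dataset.

```python
from collections import Counter


def hard_vote(predictions_per_model):
    """Combine per-model label predictions by majority vote.

    predictions_per_model: list of equal-length label sequences,
    one sequence per classifier (hypothetical input format).
    With an odd number of models and two classes, ties cannot occur.
    """
    if not predictions_per_model:
        raise ValueError("need predictions from at least one model")
    n_samples = len(predictions_per_model[0])
    if any(len(p) != n_samples for p in predictions_per_model):
        raise ValueError("all models must predict the same number of samples")

    combined = []
    # zip(*...) yields, for each sample, the tuple of votes from all models.
    for votes in zip(*predictions_per_model):
        label, _count = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data, invented for illustration only (assumed: 1 = phishing, -1 = legitimate).
y_true = [1, -1, 1, 1, -1, -1]
logreg_pred = [1, -1, -1, 1, -1, 1]   # stands in for Logistic Regression
gboost_pred = [1, -1, 1, 1, 1, -1]    # stands in for Gradient Boosting
knn_pred = [-1, -1, 1, 1, -1, -1]     # stands in for K-Nearest Neighbors

ensemble_pred = hard_vote([logreg_pred, gboost_pred, knn_pred])
print(ensemble_pred)
print(accuracy(y_true, ensemble_pred))
```

In this toy case each model errs on different samples, so the majority vote corrects every individual mistake. That is the diversity effect the abstract refers to, although real classifiers rarely disagree this conveniently. In scikit-learn, the same combination rule is available as `VotingClassifier` with `voting="hard"`; whether the authors used that class is not stated in the abstract.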
License
Copyright (c) 2025 International Journal of Scientific Research in Science, Engineering and Technology
This work is licensed under a Creative Commons Attribution 4.0 International License.