A Review on Data Level Approaches for Managing Imbalanced Classification Problem

Ayushi Chaplot; Naveen Choudhary; Kalpana Jain

doi:10.32628/IJSRSET196225

Authors

Ayushi Chaplot, Department of CSE, College of Technology and Engineering, Udaipur, Rajasthan, India
Naveen Choudhary, Department of CSE, College of Technology and Engineering, Udaipur, Rajasthan, India
Kalpana Jain, Department of CSE, College of Technology and Engineering, Udaipur, Rajasthan, India

Keywords:

Imbalanced data, Oversampling, Undersampling, Multiclass Classification.

Abstract

In real-world applications, datasets are rarely distributed symmetrically across classes, and the distribution varies from one application to another. An asymmetric distribution of this kind is called an imbalanced (or skewed) class distribution. Classifying data with a skewed class distribution can lead to poor classifier performance. Several data-level approaches exist for handling imbalanced datasets, in which the instances of one class outnumber the instances of another. This paper discusses these data-level approaches and presents a comparative study of them.
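To make the idea of a data-level approach concrete, the following is a minimal sketch of the two families named in the keywords, oversampling and undersampling, in their simplest random form. The abstract does not specify which methods the paper compares, so the function names (`random_oversample`, `random_undersample`), the toy data, and the choice of balancing to the largest or smallest class are illustrative assumptions rather than the authors' procedure.

```python
import random
from collections import Counter, defaultdict


def _group_by_label(samples, labels):
    """Collect samples into one list per class label."""
    groups = defaultdict(list)
    for x, y in zip(samples, labels):
        groups[y].append(x)
    return groups


def random_oversample(samples, labels, seed=0):
    """Hypothetical helper: duplicate randomly chosen instances of the
    smaller classes until every class matches the largest class."""
    rng = random.Random(seed)
    groups = _group_by_label(samples, labels)
    target = max(len(g) for g in groups.values())
    out_x, out_y = [], []
    for y, group in groups.items():
        # Draw with replacement to fill the gap up to the target size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_x.extend(group + extra)
        out_y.extend([y] * target)
    return out_x, out_y


def random_undersample(samples, labels, seed=0):
    """Hypothetical helper: keep a random subset of each class so that
    every class matches the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_label(samples, labels)
    target = min(len(g) for g in groups.values())
    out_x, out_y = [], []
    for y, group in groups.items():
        # Draw without replacement; discarded instances are simply dropped.
        out_x.extend(rng.sample(group, target))
        out_y.extend([y] * target)
    return out_x, out_y


# Toy imbalanced dataset (assumed for illustration): 8 majority, 2 minority.
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

_, y_over = random_oversample(X, y)
_, y_under = random_undersample(X, y)
print("original:    ", dict(Counter(y)))
print("oversampled: ", dict(Counter(y_over)))
print("undersampled:", dict(Counter(y_under)))
```

Both helpers work for any number of classes, which matches the multiclass setting listed in the keywords. Random oversampling only repeats existing minority instances and random undersampling discards majority instances, so each trades one risk (overfitting to duplicates) for another (losing information).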

