A Review on Data Level Approaches for Managing Imbalanced Classification Problem

Authors (3): Ayushi Chaplot, Naveen Choudhary, Kalpana Jain

In real-world applications, class distributions are rarely symmetric. The degree of asymmetry varies from one application to another and with how the data in that application are distributed. An asymmetric distribution of this kind is called an imbalanced or skewed class distribution. Classifying data with a skewed class distribution can lead to poor classifier performance. Several data-level approaches address imbalanced datasets, in which one class has more instances than another. This paper discusses these data-level approaches and presents a comparative study of them.
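As a rough illustration of what a data-level approach does, the sketch below shows the two simplest resampling ideas, random oversampling and random undersampling, on toy data. It is not taken from the paper: the helper names `random_oversample` and `random_undersample`, the toy dataset, and the fixed seed are all assumptions made for this example, and only the Python standard library is used.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen instances of each smaller class
    until every class has as many instances as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [s for s, y in zip(samples, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_samples.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_samples, out_labels


def random_undersample(samples, labels, seed=0):
    """Keep a random subset of each class so that every class has
    as many instances as the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_samples, out_labels = [], []
    for cls in counts:
        pool = [s for s, y in zip(samples, labels) if y == cls]
        out_samples.extend(rng.sample(pool, target))
        out_labels.extend([cls] * target)
    return out_samples, out_labels


# Toy data (hypothetical): 8 majority ("neg") and 2 minority ("pos") instances.
X = [[float(i)] for i in range(10)]
y = ["neg"] * 8 + ["pos"] * 2

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)

print("original:    ", dict(Counter(y)))
print("oversampled: ", dict(Counter(y_over)))
print("undersampled:", dict(Counter(y_under)))
```

Oversampling keeps all original instances but repeats minority ones, which can encourage overfitting; undersampling discards majority instances and may lose information. Synthetic methods such as SMOTE generate new minority points instead of duplicating existing ones.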

Authors and Affiliations

Ayushi Chaplot
Department of CSE, College of Technology and Engineering, Udaipur, Rajasthan, India
Naveen Choudhary
Department of CSE, College of Technology and Engineering, Udaipur, Rajasthan, India
Kalpana Jain
Department of CSE, College of Technology and Engineering, Udaipur, Rajasthan, India

Keywords: Imbalanced data, Oversampling, Undersampling, Multiclass Classification.


Publication Details

Published in : Volume 6 | Issue 2 | March-April 2019
Date of Publication : 2019-04-30
License:  This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 91-97
Manuscript Number : IJSRSET196225
Publisher : Technoscience Academy

Print ISSN : 2395-1990, Online ISSN : 2394-4099

Cite This Article :

Ayushi Chaplot, Naveen Choudhary, Kalpana Jain, "A Review on Data Level Approaches for Managing Imbalanced Classification Problem," International Journal of Scientific Research in Science, Engineering and Technology (IJSRSET), Print ISSN: 2395-1990, Online ISSN: 2394-4099, Volume 6, Issue 2, pp. 91-97, March-April 2019. Available at doi: https://doi.org/10.32628/IJSRSET196225
Journal URL : https://ijsrset.com/IJSRSET196225
