Review of Data Pre-processing Techniques for Classification

Authors

  • Trilok Suthar, Asst. Prof., IT Department, Sigma Institute of Engineering, Vadodara, Gujarat, India
  • Digvijaysinh Mahida, Asst. Prof., IT Department, Sigma Institute of Engineering, Vadodara, Gujarat, India
  • Pinkal Shah, Asst. Prof., IT Department, Sigma Institute of Engineering, Vadodara, Gujarat, India

Keywords:

Data Mining, Noise, Preprocessing Dataset, Skewness

Abstract

Data mining is the process of extracting useful patterns from large datasets. These patterns, and the models built from them, play an effective role in decision making. The results of data mining depend heavily on the quality of the data. Raw data is usually susceptible to missing values, noise, incompleteness, inconsistency, and outliers, so it is important to pre-process the data before applying classification algorithms. Data pre-processing is one of the most important data mining steps; it deals with data preparation and transformation. Pre-processed data makes knowledge discovery more efficient. Pre-processing includes several techniques, such as cleaning, integration, transformation, and data reduction. This paper reviews the various pre-processing techniques applied to classification.
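
As a rough illustration of two of the technique families named in the abstract, the sketch below shows one cleaning step (filling missing values with the column mean) and one transformation step (min-max scaling). It is not taken from the paper: the function names `impute_mean` and `min_max_scale` and the sample `ages` column are hypothetical, and mean imputation is only one of many possible ways to handle missing values.

```python
from statistics import mean


def impute_mean(column):
    """Data cleaning (assumed strategy): replace missing values (None)
    with the mean of the observed values in the same column."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def min_max_scale(column):
    """Data transformation: rescale values linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no spread; map every value to 0.0.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Hypothetical attribute with one missing value.
ages = [20, None, 40, 60]
cleaned = impute_mean(ages)      # missing entry filled with the mean of 20, 40, 60
scaled = min_max_scale(cleaned)  # values mapped onto [0, 1]
print(cleaned)
print(scaled)
```

The cleaned and scaled column could then be passed to any classifier; scaling matters most for distance-based methods, where attributes with large ranges would otherwise dominate.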

Published

2018-04-10

Section

Research Articles

How to Cite

[1]
Trilok Suthar, Digvijaysinh Mahida, Pinkal Shah, "Review of Data Pre-processing Techniques for Classification," International Journal of Scientific Research in Science, Engineering and Technology (IJSRSET), Print ISSN: 2395-1990, Online ISSN: 2394-4099, Volume 4, Issue 5, pp. 304-307, March-April 2018.