Review of Data Pre-processing Techniques for Classification

Authors (3): Trilok Suthar, Digvijaysinh Mahida, Pinkal Shah

Data mining is the process of extracting useful patterns from large datasets. These patterns, and the models built on them, play an important role in decision making. The results of data mining depend heavily on the quality of the data. Raw data is usually susceptible to missing values, noise, incompleteness, inconsistency, and outliers, so it must be pre-processed before classification algorithms are applied. Data pre-processing is one of the most important steps in data mining; it deals with data preparation and transformation. Pre-processed data makes knowledge discovery more efficient. Pre-processing includes several techniques, such as data cleaning, integration, transformation, and reduction. This paper reviews the pre-processing techniques applied to classification.
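As an illustration of the cleaning and transformation steps named above, the following is a minimal sketch in Python using only the standard library. It is not taken from the paper: the function names, the choice of mean imputation, the 1.5 x IQR clipping rule, and min-max scaling are all assumptions chosen as common textbook instances of these techniques.

```python
from statistics import mean, quantiles


def impute_mean(values):
    """Cleaning (assumed technique): replace missing entries (None)
    with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def clip_outliers(values, k=1.5):
    """Cleaning (assumed technique): clip values that fall outside
    [Q1 - k*IQR, Q3 + k*IQR] back to the nearest bound."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


def min_max_scale(values):
    """Transformation (assumed technique): rescale linearly to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no spread; map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def preprocess(values):
    """Chain the three steps on a single numeric attribute."""
    return min_max_scale(clip_outliers(impute_mean(values)))
```

A call such as `preprocess([4.0, None, 5.0, 6.0, 7.0, 250.0])` would fill the missing entry, pull any value beyond the IQR bounds back to the bound, and return values in the range 0 to 1, ready to pass to a classifier. Integration and reduction are omitted here because they depend on the structure of the datasets involved.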

Authors and Affiliations

Trilok Suthar
Asst. Prof., IT Department, Sigma Institute of Engineering, Vadodara, Gujarat, India
Digvijaysinh Mahida
Asst. Prof., IT Department, Sigma Institute of Engineering, Vadodara, Gujarat, India
Pinkal Shah
Asst. Prof., IT Department, Sigma Institute of Engineering, Vadodara, Gujarat, India

Keywords : Data Mining, Noise, Preprocessing Dataset, Skewness

Publication Details

Published in : Volume 4 | Issue 5 | March-April 2018
Date of Publication : 2018-04-10
License:  This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 304-307
Manuscript Number : CI014
Publisher : Technoscience Academy

Print ISSN : 2395-1990, Online ISSN : 2394-4099

Cite This Article :

Trilok Suthar, Digvijaysinh Mahida, Pinkal Shah, "Review of Data Pre-processing Techniques for Classification", International Journal of Scientific Research in Science, Engineering and Technology (IJSRSET), Print ISSN : 2395-1990, Online ISSN : 2394-4099, Volume 4, Issue 5, pp. 304-307, March-April 2018.