Review of Data Pre-processing Techniques for Classification
Keywords:
Data Mining, Noise, Preprocessing Dataset, Skewness
Abstract
Data mining is the process of extracting useful patterns from large datasets. These patterns, and the models built on them, play an important role in decision making. The results of data mining depend heavily on the quality of the data. Raw data are often affected by missing values, noise, incompleteness, inconsistency and outliers, so they need to be pre-processed before classification algorithms are applied. Data pre-processing is one of the most important steps in data mining and deals with data preparation and transformation. Pre-processed data make knowledge discovery more efficient. Pre-processing includes several techniques, such as cleaning, integration, transformation and data reduction. This paper reviews the various pre-processing techniques applied to classification.
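As an illustration of two of the techniques named above, the following minimal sketch applies a cleaning step (mean imputation of missing values) and a transformation step (min-max scaling) to a small numeric table before it would be passed to a classifier. The function names `impute_mean` and `min_max_scale` and the sample data are hypothetical and are not taken from the paper. The sketch assumes that missing values are encoded as `None` and that every column has at least one observed value.

```python
from statistics import mean


def impute_mean(rows):
    """Cleaning: replace each missing value (None) with its column mean."""
    columns = list(zip(*rows))
    # Column means are computed over the observed (non-None) values only.
    fills = [mean(v for v in col if v is not None) for col in columns]
    return [
        [fills[j] if v is None else v for j, v in enumerate(row)]
        for row in rows
    ]


def min_max_scale(rows):
    """Transformation: rescale every column to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        # A constant column (hi == lo) is mapped to 0.0 to avoid dividing by zero.
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Hypothetical raw data: two numeric attributes, with one missing value in each.
raw = [[1.0, 200.0], [None, 400.0], [3.0, None]]
cleaned = impute_mean(raw)
scaled = min_max_scale(cleaned)
print(cleaned)
print(scaled)
```

Mean imputation is only one of several ways to handle missing values. It is used here because it is short to write, not because the paper recommends it.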
License
Copyright (c) IJSRSET

This work is licensed under a Creative Commons Attribution 4.0 International License.