Review of Data Pre-processing Techniques for Classification

Trilok Suthar; Digvijaysinh Mahida; Pinkal Shah

doi:10.32628/CI014

Authors

Trilok Suthar, Asst. Prof., IT Department, Sigma Institute of Engineering, Vadodara, Gujarat, India
Digvijaysinh Mahida, Asst. Prof., IT Department, Sigma Institute of Engineering, Vadodara, Gujarat, India
Pinkal Shah, Asst. Prof., IT Department, Sigma Institute of Engineering, Vadodara, Gujarat, India

Keywords:

Data Mining, Noise, Preprocessing Dataset, Skewness

Abstract

Data mining is the process of extracting useful patterns from large datasets. These patterns and models play an important role in decision making. The results of data mining depend heavily on the quality of the data. Raw data are usually susceptible to missing values, noise, incompleteness, inconsistency and outliers, so they should be pre-processed before being passed to classification algorithms. Data pre-processing is one of the most important data mining steps; it deals with data preparation and transformation. Pre-processed data make knowledge discovery more efficient. Pre-processing includes several techniques, such as cleaning, integration, transformation and data reduction. This paper reviews the various pre-processing techniques applied to classification.
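
As an illustration of two of the technique families named in the abstract, the following minimal Python sketch applies a cleaning step (mean imputation of missing values) and a transformation step (min-max normalization) to a toy numeric table. The function names impute_missing and min_max_scale, the use of None to mark a missing value, and the sample data are assumptions made for this example only; they are not taken from the paper, and the paper may review different or additional methods.

```python
from statistics import mean


def impute_missing(rows):
    """Cleaning: replace each None with the mean of the observed values in its column.

    Assumes every column has at least one observed (non-None) value.
    """
    columns = list(zip(*rows))
    col_means = [mean(v for v in col if v is not None) for col in columns]
    return [
        [col_means[j] if v is None else v for j, v in enumerate(row)]
        for row in rows
    ]


def min_max_scale(rows):
    """Transformation: rescale each column linearly to the range [0, 1].

    A constant column (max == min) is mapped to 0.0 to avoid division by zero.
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, lows, highs)
        ]
        for row in rows
    ]


# Hypothetical toy dataset: two numeric attributes, None marks a missing value.
raw = [
    [5.0, 200.0],
    [None, 400.0],
    [7.0, None],
    [9.0, 600.0],
]

cleaned = impute_missing(raw)    # no missing values remain
scaled = min_max_scale(cleaned)  # every attribute now lies in [0, 1]
```

The scaled rows could then be passed to any classifier. Mean imputation is only one simple choice for handling missing values; alternatives such as k-nearest-neighbour imputation may suit a given dataset better.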

