Investigation of Performance Analysis of Classification Algorithm in Data Mining

Authors: Dr. Mohd Ashraf, Dr. Zair Hussain

Data mining is now one of the most active fields of research. Extracting useful nuggets of information from data is becoming crucial, and one of its important techniques is classification, which groups data into predefined classes. Various classification techniques exist, each based on a different algorithm, and each algorithm has its own areas of best and worst performance. This paper concentrates on four well-known algorithms, namely Decision Tree, Naïve Bayes, K-Nearest Neighbour and Genetic Programming, and on how their running time and accuracy change as the number of instances is incrementally decreased. It also investigates how the results differ between binary-class and multiclass datasets and suggests which algorithms to use for particular kinds of dataset.
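The experimental idea described in the abstract (measure accuracy and running time while the number of instances shrinks, for both a binary and a multiclass problem) can be illustrated with a minimal Python sketch. It is not the authors' setup: the data are synthetic Gaussian blobs rather than the datasets used in the paper, only a simple k-nearest-neighbour classifier is shown as a stand-in for one of the four algorithms, and all names (`make_dataset`, `knn_predict`, `evaluate`, `shrinking_experiment`) and parameter values are hypothetical.

```python
import random
import time
from collections import Counter


def make_dataset(n, n_classes, seed=0):
    """Synthetic 2-D data: one Gaussian blob per class (illustrative only)."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % n_classes
        # Blob centres are spaced 3 units apart along the diagonal.
        x = rng.gauss(3.0 * label, 1.0)
        y = rng.gauss(3.0 * label, 1.0)
        data.append(((x, y), label))
    rng.shuffle(data)
    return data


def knn_predict(train, point, k=3):
    """Majority vote among the k training points nearest to `point`."""
    def sq_dist(item):
        (x, y), _ = item
        return (x - point[0]) ** 2 + (y - point[1]) ** 2

    nearest = sorted(train, key=sq_dist)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def evaluate(train, test, k=3):
    """Return (accuracy, elapsed seconds) for k-NN on a fixed test set."""
    start = time.perf_counter()
    correct = sum(knn_predict(train, p, k) == label for p, label in test)
    elapsed = time.perf_counter() - start
    return correct / len(test), elapsed


def shrinking_experiment(n_classes, sizes=(400, 200, 100, 50), n_test=100, seed=0):
    """Evaluate on progressively smaller training sets, keeping the test set fixed."""
    data = make_dataset(max(sizes) + n_test, n_classes, seed)
    test, pool = data[:n_test], data[n_test:]
    results = []
    for size in sizes:
        acc, secs = evaluate(pool[:size], test)
        results.append((size, acc, secs))
    return results


if __name__ == "__main__":
    for name, n_classes in (("binary", 2), ("multiclass", 4)):
        for size, acc, secs in shrinking_experiment(n_classes):
            print(f"{name:10s} n={size:4d} accuracy={acc:.3f} time={secs:.4f}s")
```

Keeping the test set fixed while only the training set shrinks means that any change in accuracy or time can be attributed to the number of training instances. The same loop could wrap any other classifier that exposes a comparable predict function.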

Authors and Affiliations

Dr. Mohd Ashraf
Department of Computer Science & Engineering, Maulana Azad National Urdu University Hyderabad, Telangana, India
Dr. Zair Hussain
Department of Information Technology, Maulana Azad National Urdu University Hyderabad, Telangana, India

Keywords: Decision Tree, Naïve Bayes, K-Nearest Neighbor, Genetic Programming, Accuracy

Publication Details

Published in : Volume 4 | Issue 4 | March-April 2018
Date of Publication : 2018-04-30
License:  This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 58-66
Manuscript Number : IJSRSET184425
Publisher : Technoscience Academy

Print ISSN : 2395-1990, Online ISSN : 2394-4099

Cite This Article :

Dr. Mohd Ashraf, Dr. Zair Hussain, "Investigation of Performance Analysis of Classification Algorithm in Data Mining", International Journal of Scientific Research in Science, Engineering and Technology (IJSRSET), Print ISSN: 2395-1990, Online ISSN: 2394-4099, Volume 4, Issue 4, pp. 58-66, March-April 2018.
Journal URL : http://ijsrset.com/IJSRSET184425
