Investigation of Performance Analysis of Classification Algorithm in Data Mining

Authors

  • Dr. Mohd Ashraf, Department of Computer Science & Engineering, Maulana Azad National Urdu University, Hyderabad, Telangana, India
  • Dr. Zair Hussain, Department of Information Technology, Maulana Azad National Urdu University, Hyderabad, Telangana, India

Keywords:

Decision Tree, Naïve Bayes, K-Nearest Neighbor, Genetic Programming, Accuracy

Abstract

Data mining is now one of the most active fields of research. Extracting useful nuggets of information from data is becoming crucial, and classification is one of its most important techniques: it assigns data to predefined classes. Many classification techniques exist, each built on a different algorithm, and each algorithm has its own areas of best and worst performance. This paper concentrates on four of the most famous algorithms, namely Decision Tree, Naïve Bayes, K-Nearest Neighbor and Genetic Programming, and examines how their running time and accuracy change as the number of instances is incrementally decreased. It also investigates how the results differ between binary-class and multiclass datasets, and suggests which algorithms to use for particular kinds of dataset.
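As a rough illustration of the kind of experiment the abstract describes, the sketch below times two of the four algorithms (K-Nearest Neighbor and Naïve Bayes) and measures their accuracy while the training set shrinks. It is not the authors' setup: the data is a synthetic two-class set, the implementations are minimal plain-Python versions, and every function name (`make_dataset`, `evaluate`, `shrink_experiment`, and so on) is hypothetical.

```python
import math
import random
import time
from collections import Counter, defaultdict


def make_dataset(n, seed=0):
    """Synthetic two-class data: two Gaussian blobs in 2-D (assumed stand-in)."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2
        center = 0.0 if label == 0 else 3.0
        data.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    rng.shuffle(data)
    return data


def knn_predict(train, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def fit_gaussian_nb(train):
    """Estimate each class prior plus a per-feature mean and variance."""
    by_class = defaultdict(list)
    for features, label in train:
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            # Small constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[label] = (len(rows) / len(train), stats)
    return model


def nb_predict(model, x):
    """Pick the class with the highest log-posterior under Gaussian features."""
    def log_posterior(label):
        prior, stats = model[label]
        score = math.log(prior)
        for value, (mean, var) in zip(x, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


def evaluate(train, test):
    """Return {name: (accuracy, seconds)} for both classifiers."""
    results = {}
    start = time.perf_counter()
    hits = sum(knn_predict(train, x) == y for x, y in test)
    results["kNN"] = (hits / len(test), time.perf_counter() - start)

    start = time.perf_counter()
    model = fit_gaussian_nb(train)
    hits = sum(nb_predict(model, x) == y for x, y in test)
    results["NaiveBayes"] = (hits / len(test), time.perf_counter() - start)
    return results


def shrink_experiment(sizes=(400, 200, 100, 50), n_test=100):
    """Re-evaluate on progressively smaller training sets with a fixed test set."""
    data = make_dataset(max(sizes) + n_test)
    test, pool = data[:n_test], data[n_test:]
    return {n: evaluate(pool[:n], test) for n in sizes}


if __name__ == "__main__":
    for n, res in shrink_experiment().items():
        for name, (acc, secs) in res.items():
            print(f"{n:4d} instances  {name:10s}  accuracy={acc:.2f}  time={secs:.4f}s")
```

Holding the test set fixed while only the training pool shrinks keeps the accuracy figures comparable across sizes. A multiclass variant would only need `make_dataset` to emit more than two labels, since both classifiers here already handle an arbitrary number of classes.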

Published

2018-04-30

Section

Research Articles

How to Cite

[1]
Dr. Mohd Ashraf, Dr. Zair Hussain, "Investigation of Performance Analysis of Classification Algorithm in Data Mining," International Journal of Scientific Research in Science, Engineering and Technology (IJSRSET), Print ISSN: 2395-1990, Online ISSN: 2394-4099, Volume 4, Issue 4, pp. 58-66, March-April 2018.