Ensemble Learning Approach based Rule Extraction from Support Vector Machine

Authors

  • Chitra A  Computer Science and Engineering, Sri Krishna College of Technology, Coimbatore, Tamilnadu, India
  • Anto S  Computer Science and Engineering, Sri Krishna College of Technology, Coimbatore, Tamilnadu, India

Keywords:

diagnosis of diabetes, ensemble learning, random forest (RF), rule extraction, support vector machines (SVMs)

Abstract

In recent years, support vector machines (SVMs) have shown good performance in a number of application areas. Existing systems concentrate on identifying the risk of pre-diabetes or undiagnosed diabetes and on helping people decide whether they should see a physician for further evaluation. However, existing systems built on C4.5, naïve Bayes tree, and neural network algorithms have issues with their prediction results. To address this, we propose a system that uses an SVM to screen for diabetes, with an added ensemble learning module, forming an ensemble system for diabetes diagnosis. Specifically, rules are extracted from the trained SVM to provide a comprehensible and transparent representation. These rule sets can be regarded as a second opinion for diagnosis and as a tool that lay users can apply to screen individuals with undiagnosed diabetes. From the experimental results, we conclude that the proposed system is better than the existing approaches in terms of reducing the incidence of diabetes and its complications.
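The idea of extracting rules from an SVM with the help of an ensemble can be illustrated with a minimal sketch. It is not the authors' procedure, which the abstract does not spell out. It assumes a common learning-based scheme: train an SVM, relabel the training data with the SVM's own predictions, and fit a small random forest (the ensemble named in the keywords) on those labels so that each shallow tree reads as a set of if-then rules. The synthetic data, the feature names `f0`..`f5`, the function name `extract_rules`, and all hyperparameters are assumptions chosen for illustration, and scikit-learn is assumed to be available.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import export_text


def extract_rules(n_trees=5, max_depth=3, seed=0):
    """Approximate an SVM with a small random forest and return its rules."""
    # Synthetic stand-in for a diabetes screening data set (assumption).
    X, y = make_classification(
        n_samples=600, n_features=6, n_informative=4, random_state=seed
    )
    names = [f"f{i}" for i in range(X.shape[1])]  # hypothetical feature names
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )

    # 1. Train the black-box SVM on the true labels.
    svm = SVC(kernel="rbf", gamma="scale").fit(X_tr, y_tr)

    # 2. Replace the true labels with the SVM's own predictions, so the
    #    rule learner imitates the SVM rather than the raw data.
    svm_labels = svm.predict(X_tr)

    # 3. Fit a small ensemble of shallow trees on the SVM-labelled data.
    forest = RandomForestClassifier(
        n_estimators=n_trees, max_depth=max_depth, random_state=seed
    ).fit(X_tr, svm_labels)

    # 4. Render each tree as nested if-then rules in plain text.
    rules = [export_text(tree, feature_names=names) for tree in forest.estimators_]

    # Fidelity: fraction of held-out samples where forest and SVM agree.
    fidelity = float((forest.predict(X_te) == svm.predict(X_te)).mean())
    return rules, fidelity


if __name__ == "__main__":
    rules, fidelity = extract_rules()
    print(f"fidelity to SVM on held-out data: {fidelity:.3f}")
    print(rules[0])
```

Fidelity (agreement with the SVM) is reported rather than accuracy because the goal of this kind of extraction is to explain the SVM's decisions. Keeping `max_depth` small trades some fidelity for rules short enough to be read by a non-specialist.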

Published

2017-12-31

Section

Research Articles

How to Cite

[1]
Chitra A, Anto S, "Ensemble Learning Approach based Rule Extraction from Support Vector Machine", International Journal of Scientific Research in Science, Engineering and Technology (IJSRSET), Print ISSN: 2395-1990, Online ISSN: 2394-4099, Volume 2, Issue 2, pp. 1260-1266, March-April 2016.