A survey on Diabetes Prediction Models Using Data Mining Techniques: issues and challenges.

Authors

  • Swati D. Patel, Assistant Professor, Dharmsinh Desai University, Nadiad, Gujarat, India

DOI:

https://doi.org/10.32628/IJSRSET23103208

Keywords:

Diabetes Prediction Models, Data Mining, Early Intervention, Machine Learning Algorithms, Feature Selection

Abstract

Diabetes is a chronic disease that affects a significant number of individuals worldwide, and timely detection and management can prevent or delay the development of severe complications. To aid early diagnosis and treatment, data mining techniques have been used extensively to build predictive models for diabetes. This review paper provides an overview of recent studies on diabetes prediction models developed using data mining techniques. It discusses the techniques employed for diabetes prediction, such as decision trees, neural networks, logistic regression, and support vector machines, as well as ensemble methods, which combine multiple models to improve performance, and it analyzes the strengths and limitations of these techniques. The review emphasizes the importance of feature selection in enhancing the performance of diabetes prediction models: feature selection can reduce data dimensionality, eliminate irrelevant or redundant features, and improve model interpretability. Finally, the paper outlines potential areas for future research in this field, including developing more interpretable models, exploring deep learning techniques, and integrating multiple data sources to improve prediction accuracy.
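The abstract's point that feature selection eliminates irrelevant or redundant features can be illustrated with a simple filter method. The sketch below is not taken from the paper or from any surveyed study; it is a greedy correlation filter, assuming numeric features and a binary outcome coded 0/1. A feature is kept only if its absolute Pearson correlation with the outcome is at least a relevance threshold and its absolute correlation with every feature already kept is at most a redundancy threshold. The function names, the thresholds (0.3 and 0.9), and the six-record toy dataset are hypothetical and chosen only for illustration. This is simpler than subset-scoring filters such as correlation-based feature selection (CFS).

```python
import math
from typing import Dict, List

# Hypothetical toy data (not from the paper): 6 patients, label 1 = diabetic.
TOY_FEATURES: Dict[str, List[float]] = {
    "glucose": [100, 85, 90, 170, 150, 160],
    "bmi": [22, 25, 24, 30, 33, 31],
    "weight": [44, 52, 46, 62, 66, 60],  # nearly a linear copy of bmi
    "noise": [5, 3, 4, 4, 3, 5],         # unrelated to the outcome
}
TOY_LABELS: List[float] = [0, 0, 0, 1, 1, 1]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def select_features(
    features: Dict[str, List[float]],
    labels: List[float],
    relevance_min: float = 0.3,   # hypothetical relevance threshold
    redundancy_max: float = 0.9,  # hypothetical redundancy threshold
) -> List[str]:
    """Greedy correlation filter: keep relevant, non-redundant features."""
    relevance = {name: abs(pearson(col, labels)) for name, col in features.items()}
    selected: List[str] = []
    # Visit features from most to least correlated with the outcome.
    for name in sorted(features, key=lambda f: relevance[f], reverse=True):
        if relevance[name] < relevance_min:
            continue  # irrelevant: too weakly related to the outcome
        if any(abs(pearson(features[name], features[kept])) > redundancy_max
               for kept in selected):
            continue  # redundant: nearly duplicates a feature already kept
        selected.append(name)
    return selected


if __name__ == "__main__":
    print(select_features(TOY_FEATURES, TOY_LABELS))  # ['glucose', 'bmi']
```

On this toy data, `noise` is dropped as irrelevant because its correlation with the outcome is zero, and `weight` is dropped as redundant because its correlation with `bmi` is about 0.98, leaving `glucose` and `bmi`. A greedy pass like this is cheap, but unlike wrapper or embedded methods it never trains a classifier, so it can keep features that do not actually help a given model.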

Published

2023-08-30

Section

Research Articles

How to Cite

Swati D. Patel, "A survey on Diabetes Prediction Models Using Data Mining Techniques: issues and challenges," International Journal of Scientific Research in Science, Engineering and Technology (IJSRSET), Print ISSN: 2395-1990, Online ISSN: 2394-4099, Volume 10, Issue 4, pp. 263-267, July-August 2023. Available at doi: https://doi.org/10.32628/IJSRSET23103208