A survey on Diabetes Prediction Models Using Data Mining Techniques: issues and challenges.
DOI:
https://doi.org/10.32628/IJSRSET23103208
Keywords:
Diabetes Prediction Models, Data Mining, Early Intervention, Machine Learning Algorithms, Feature Selection
Abstract
Diabetes is a chronic disease that affects a significant number of individuals worldwide, and timely detection and management can prevent or delay the development of severe complications. To aid early diagnosis and treatment, data mining techniques have been used extensively to build predictive models for diabetes. This review provides an overview of recent studies on such models. It discusses the data mining techniques employed for diabetes prediction, including decision trees, neural networks, logistic regression, and support vector machines, as well as ensemble methods, which combine multiple models to improve performance, and it analyzes the strengths and limitations of these techniques. The review emphasizes the importance of feature selection for the performance of diabetes prediction models: feature selection can reduce data dimensionality, eliminate irrelevant or redundant features, and improve model interpretability. Finally, the paper outlines potential areas for future research, including developing more interpretable models, exploring deep learning techniques, and integrating multiple data sources to improve prediction accuracy.
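The role of feature selection described above can be illustrated with a minimal sketch of a correlation-based filter. This is an illustration under stated assumptions, not a method taken from any of the surveyed studies: the function names (`pearson`, `select_features`), the thresholds `min_relevance` and `max_redundancy`, and the toy records are all hypothetical. The filter drops features that correlate weakly with the outcome (irrelevant) and features that correlate almost perfectly with an already selected feature (redundant).

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def select_features(columns, target, min_relevance=0.3, max_redundancy=0.9):
    """Filter-style selection (thresholds are illustrative assumptions).

    columns: dict mapping feature name -> list of values
    target:  list of 0/1 outcome labels
    """
    relevance = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    ranked = sorted(relevance, key=relevance.get, reverse=True)
    kept = []
    for name in ranked:
        if relevance[name] < min_relevance:
            continue  # irrelevant: weak association with the outcome
        if any(abs(pearson(columns[name], columns[k])) > max_redundancy for k in kept):
            continue  # redundant: near-duplicate of a feature already kept
        kept.append(name)
    return kept


# Hypothetical toy records, invented for illustration only.
glucose = [85, 90, 100, 110, 140, 150, 160, 180]
columns = {
    "glucose": glucose,
    "glucose_mmol": [g / 18 for g in glucose],  # same measurement, other unit
    "bmi": [22, 31, 24, 27, 29, 33, 26, 35],
    "record_id": [3, 6, 1, 8, 2, 7, 4, 5],      # unrelated to the outcome
}
diabetic = [0, 0, 0, 0, 1, 1, 1, 1]

selected = select_features(columns, diabetic)
print(selected)
```

On this toy data the filter keeps `bmi` and one of the two glucose columns, and it discards `record_id` together with the duplicate glucose measurement. The shorter feature list is easier to inspect, which is the link between dimensionality reduction and interpretability. Wrapper and embedded methods would replace the correlation test with model-based scoring.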
License
Copyright (c) IJSRSET

This work is licensed under a Creative Commons Attribution 4.0 International License.