Random Forest and CatBoost with Handling Imbalanced Class for Detection of Risk Factors Anemia in Children (5-12 Years)

Authors

  • Ditia Yosmita Praptiwi, Department of Statistics, IPB University, Bogor, Indonesia
  • Dr. Anang Kurnia, S.Si., M.Si, Department of Statistics, IPB University, Bogor, Indonesia
  • Dr. Anwar Fitrianto, S.Si., M.Sc, Department of Statistics, IPB University, Bogor, Indonesia
  • Dr. Fitrah Ernawati, M.Sc., National Research and Innovation Agency, Bogor, Indonesia

DOI:

https://doi.org/10.32628/IJSRSET24113134

Keywords:

Anemia, Ensemble Learning, G-SMOTE, Imbalanced Class, SHAP

Abstract

The prevalence of anemia in children (5-12 years) remains a public health issue in Indonesia. Early detection and control of risk factors are crucial for prevention, and machine learning models, ensemble learning models in particular, offer one practical approach. However, health data commonly exhibit class imbalance. This study therefore performs classification modeling with two ensemble learning models, Random Forest (RF) and CatBoost, combined with six methods for handling imbalanced classes: Random Over Sampling, SMOTE, G-SMOTE, Random Under Sampling, Instance Hardness Threshold (IHT), and SMOTE-ENN. SHAP is then used to explain the best-performing model based on Shapley values. The results indicate that CatBoost with G-SMOTE resampling performs best among the methods compared. Averaged over 100 validation replicates, the CatBoost G-SMOTE model achieves a sensitivity of 0.7104, specificity of 0.7043, G-Mean of 0.7067, and AUC of 0.7844. Handling class imbalance with G-SMOTE effectively increases sensitivity in both proposed ensemble learning models, while SMOTE-ENN yields effective G-Mean values for the Random Forest (RF) algorithm. Based on Shapley values, the features contributing most to the prediction of anemia in children (5-12 years) are ferritin, vitamin A, vegetable consumption, diagnosed pneumonia, zinc, total calcium, and consumption of soft or carbonated drinks.
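The abstract relies on SMOTE-family oversampling and reports sensitivity, specificity and G-Mean. As a rough illustration only, the sketch below implements plain SMOTE interpolation (not G-SMOTE, which instead draws synthetic samples from a geometric region around each selected minority sample) together with the confusion-matrix metrics, assuming the usual definition G-Mean = √(sensitivity × specificity) and the positive (anemic) class coded as 1. The helpers `smote_oversample` and `classification_metrics` are hypothetical and are not the authors' code; a real analysis would more likely use an established library.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Plain SMOTE sketch (hypothetical helper, NOT the G-SMOTE variant).

    Each synthetic sample lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base` by Euclidean distance
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nn)))
    return synthetic


def classification_metrics(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity, G-Mean), assuming the usual
    definition G-Mean = sqrt(sensitivity * specificity)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn)  # recall on the positive (anemic) class
    specificity = tn / (tn + fp)  # recall on the negative class
    return sensitivity, specificity, math.sqrt(sensitivity * specificity)


# Usage: 4 positives, 6 negatives (toy labels, not study data)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec, g_mean = classification_metrics(y_true, y_pred)
print(f"sensitivity={sens:.4f} specificity={spec:.4f} G-Mean={g_mean:.4f}")
# sensitivity=0.7500 specificity=0.6667 G-Mean=0.7071

minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(smote_oversample(minority, n_new=3, k=2, seed=42))
```

The G-Mean rewards a classifier only when both classes are recognized, which is why it suits imbalanced data better than accuracy. Because the abstract's figures are averages over 100 replicates, the averaged G-Mean (0.7067) need not equal √(0.7104 × 0.7043) ≈ 0.7073 computed from the averaged sensitivity and specificity.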

Published

05-06-2024

Section

Research Articles

How to Cite

[1]
Ditia Yosmita Praptiwi, Anang Kurnia, Anwar Fitrianto, and Fitrah Ernawati, “Random Forest and CatBoost with Handling Imbalanced Class for Detection of Risk Factors Anemia in Children (5-12 Years)”, Int J Sci Res Sci Eng Technol, vol. 11, no. 3, pp. 302–312, Jun. 2024, doi: 10.32628/IJSRSET24113134.