Adaptive Federated Data Cleaning with Explainability: A Basic Threshold-Driven Approach for Heterogeneous Data Environments
DOI:
https://doi.org/10.32628/IJSRSET25122207
Keywords:
Federated Learning, Data Cleaning, Threshold-Based Algorithm, Explainable AI, Outlier Removal, Adaptive Systems
Abstract
Automated data cleaning is critical for ensuring data quality and the robustness of machine learning models. However, modern data environments are increasingly decentralized and heterogeneous, which makes centralized cleaning methods less viable, particularly when privacy is a concern. In this paper, we propose a framework for adaptive data cleaning based on a simple threshold-driven algorithm within a federated learning context. Our approach uses statistical measures (mean and standard deviation) to identify and remove outliers across distributed nodes. We also integrate explainability features so that each cleaning decision is transparent to end users. Experimental evaluations on both synthetic and real-world datasets indicate that our method improves data quality while preserving user privacy. We discuss current limitations and outline future avenues for improving scalability and extending the framework to multimodal data.
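The abstract does not give the algorithm's details, so the following is only a minimal sketch of how a mean and standard deviation threshold rule with per-decision explanations might look in a federated setting. Everything specific here is an assumption rather than the authors' method: the function names (local_summary, aggregate, clean_node), the Decision record, the choice to share only counts, sums, and sums of squares with a coordinator, and the default threshold k = 3.

```python
import math
from dataclasses import dataclass


@dataclass
class Decision:
    """One cleaning decision plus a human-readable explanation (hypothetical type)."""
    value: float
    kept: bool
    reason: str


def local_summary(values):
    """Sufficient statistics a node could share instead of its raw data (assumption)."""
    return len(values), sum(values), sum(v * v for v in values)


def aggregate(summaries):
    """Combine node summaries into a global mean and population standard deviation."""
    n = sum(s[0] for s in summaries)
    total = sum(s[1] for s in summaries)
    total_sq = sum(s[2] for s in summaries)
    mean = total / n
    # Clamp at zero to guard against tiny negative values from rounding.
    variance = max(total_sq / n - mean * mean, 0.0)
    return mean, math.sqrt(variance)


def clean_node(values, mean, std, k=3.0):
    """Flag values more than k standard deviations from the mean (k is an assumed default)."""
    decisions = []
    for v in values:
        z = abs(v - mean) / std if std > 0 else 0.0
        kept = z <= k
        verdict = "kept" if kept else "removed"
        reason = (f"{verdict}: value {v} is {z:.2f} standard deviations "
                  f"from mean {mean:.2f} (threshold {k})")
        decisions.append(Decision(v, kept, reason))
    return decisions


if __name__ == "__main__":
    # Two hypothetical nodes; raw values never leave the node, only summaries do.
    nodes = [[10, 11, 9, 10], [10, 12, 8, 100]]
    mean, std = aggregate([local_summary(vals) for vals in nodes])
    for vals in nodes:
        for d in clean_node(vals, mean, std, k=2.0):
            print(d.reason)
```

In this sketch each node applies the shared mean and standard deviation locally, and the reason string is what would be shown to an end user for each value. A single extreme value inflates the global standard deviation, so the usage example sets k = 2 rather than the default.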
License
Copyright (c) 2025 International Journal of Scientific Research in Science, Engineering and Technology

This work is licensed under a Creative Commons Attribution 4.0 International License.