Federated Learning for Privacy-Preserving HR Analytics in Healthcare and Finance

Authors

  • Sudheer Devaraju  Walmart Global Tech, Bangalore, India
  • Srikanth Katta  Takeda Global, Haryana, India

DOI:

https://doi.org/10.32628/IJSRSET23116180

Keywords:

Federated Learning, HR Analytics, Data Privacy, Healthcare, Finance

Abstract

HR analytics and data privacy are growing in importance, especially in highly regulated industries such as healthcare and finance, where AI is increasingly used for workforce analysis. Traditional machine learning approaches, however, require pooling sensitive employee data from multiple companies in one central location, which can violate privacy regulations and increases security risk. In this paper, we examine federated learning as a paradigm for collaboratively training AI models across organizations without compromising data privacy. We use a federated learning framework that lets healthcare and finance companies jointly train HR analytics models while their data remains local, as privacy regulations require. The framework protects individual employee data during collaborative learning through secure aggregation protocols, differential privacy techniques, and homomorphic encryption. We evaluate the framework on real-world datasets and show that it improves both model performance and privacy preservation. Our results show that federated training achieves accuracy similar to centralized training with greatly reduced privacy risk. This research demonstrates the potential of federated learning for privacy-preserving HR analytics and cross-organizational collaboration in sensitive industries.
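
The abstract names the building blocks but this page does not show how they fit together. The following is a minimal sketch of one federated averaging round with clipped updates, pairwise masking as a toy stand-in for secure aggregation, and optional Gaussian noise in the spirit of differential privacy. It is not the authors' implementation: the function names, the `max_norm` and `noise_std` parameters, and the shared-seed mask generation are all assumptions made for illustration, and homomorphic encryption is not shown.

```python
import random

def clip(update, max_norm):
    """Scale an update vector so its L2 norm is at most max_norm."""
    norm = sum(x * x for x in update) ** 0.5
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]

def pairwise_masks(n_clients, dim, seed):
    """Toy stand-in for secure aggregation (assumption, not a real protocol):
    each pair (i, j) with i < j shares a random vector that client i adds
    and client j subtracts, so all masks cancel in the sum."""
    rng = random.Random(seed)
    masks = [[0.0] * dim for _ in range(n_clients)]
    for i in range(n_clients):
        for j in range(i + 1, n_clients):
            for k in range(dim):
                shared = rng.uniform(-1.0, 1.0)
                masks[i][k] += shared
                masks[j][k] -= shared
    return masks

def federated_round(global_w, client_updates, max_norm=1.0, noise_std=0.0, seed=0):
    """Apply one averaging round to global_w and return the new weights."""
    n, dim = len(client_updates), len(global_w)
    masks = pairwise_masks(n, dim, seed)
    # Each organization clips its local update and masks it before upload,
    # so the server never sees an individual update in the clear.
    masked = [
        [c + m for c, m in zip(clip(u, max_norm), masks[i])]
        for i, u in enumerate(client_updates)
    ]
    # The server only sums the masked vectors; the pairwise masks cancel.
    total = [sum(column) for column in zip(*masked)]
    # Optional Gaussian noise on the sum. noise_std is a free parameter here
    # and is not calibrated to any formal (epsilon, delta) guarantee.
    noise_rng = random.Random(seed + 1)
    noisy = [t + noise_rng.gauss(0.0, noise_std) for t in total]
    return [w + s / n for w, s in zip(global_w, noisy)]

# Two hypothetical organizations, each contributing a 2-dimensional update.
new_weights = federated_round([0.0, 0.0], [[0.3, 0.4], [0.0, 0.5]])
```

In a real deployment the pairwise masks would come from a key agreement between clients rather than a seed the server could know, and the noise scale would be derived from the clipping bound and a chosen privacy budget.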

Published

2023-11-16

Section

Research Articles

How to Cite

[1]
Sudheer Devaraju, Srikanth Katta, "Federated Learning for Privacy-Preserving HR Analytics in Healthcare and Finance," International Journal of Scientific Research in Science, Engineering and Technology (IJSRSET), Print ISSN: 2395-1990, Online ISSN: 2394-4099, Volume 10, Issue 6, pp. 415-423, November-December 2023. Available at DOI: https://doi.org/10.32628/IJSRSET23116180