Privacy-Preserving Collaborative Model Document Clustering Using the TF-IDF Approach
Keywords:
Continuous Bag of Words, Skip-gram model, Google DeepMind, AlphaGo, Healthcare provider

Abstract
With the growing popularity of public computing infrastructures (e.g., cloud platforms), it is easier than ever for distributed users across the Internet to perform collaborative learning through shared infrastructure. While the potential benefits of (collaborative) machine learning can be enormous, the large-scale training data may pose substantial privacy risks. In other words, the centralized collection of data from different participants may raise serious concerns about data confidentiality and privacy. For instance, in certain application scenarios such as healthcare, individuals/patients may be unwilling to reveal their sensitive information (e.g., protected health information) to any other party, and the exposure of such private information is prohibited by laws and regulations such as HIPAA. To address such privacy issues, a straightforward approach is to encrypt sensitive data before sharing it. However, data encryption hinders data utilization and computation, making it difficult to efficiently perform (collaborative) machine learning compared with computation in the plaintext domain.
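To make the underlying task concrete, the following is a minimal sketch of plaintext-domain TF-IDF document clustering, i.e., the baseline computation that a privacy-preserving protocol would need to protect. It assumes scikit-learn and uses a hypothetical toy corpus; it is an illustration of the TF-IDF approach named in the title, not the paper's privacy-preserving scheme.

```python
# A plaintext-domain baseline: TF-IDF vectorization followed by k-means
# clustering. In the privacy-preserving setting, participants could not
# share these raw documents in the clear.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Hypothetical toy corpus standing in for participants' documents.
documents = [
    "patient record with diagnosis details",
    "clinical notes on patient treatment",
    "cloud platform resource billing report",
    "invoice for cloud computing services",
]

# Each document becomes a TF-IDF weighted term vector.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = vectorizer.fit_transform(documents)

# Cluster the vectors; k=2 is chosen arbitrarily for this toy corpus.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(tfidf_matrix)
print(labels)  # e.g., [0 0 1 1]: healthcare documents vs. cloud-billing documents
```

Once the documents are encrypted, the distance computations inside k-means can no longer be carried out directly, which is precisely the utilization-versus-confidentiality tension described above.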
License
Copyright (c) IJSRSET

This work is licensed under a Creative Commons Attribution 4.0 International License.