Big Data Processing with Data Provenance Using HDM Framework
Keywords:
Big Data, Data Flow Optimization, Provenance Management
Abstract
Big Data applications are becoming more complex and are experiencing frequent changes and updates. In practice, manual optimization of complex big data jobs is time-consuming and error-prone. Maintaining and managing evolving big data applications is a challenging task as well. We demonstrate HDM (Hierarchically Distributed Data Matrix), a big data processing framework with built-in data flow optimizations and integrated maintenance of data provenance information that supports the management of continuously evolving big data applications. In HDM, the data flow of a job is automatically optimized based on its functional DAG representation to improve performance during execution. Additionally, comprehensive metadata related to the explanation, execution and dependency updates of HDM applications is stored and maintained in order to facilitate the debugging, monitoring, tracing and reproducing of HDM jobs and programs.
License
Copyright (c) IJSRSET

This work is licensed under a Creative Commons Attribution 4.0 International License.