Big Data Processing with Data Provenance Using HDM Framework

Authors: Rajat Bodankar, Roshani Talmale

Big Data applications are becoming more complex and experiencing frequent changes and updates. In practice, manual optimization of complex big data jobs is time-consuming and error-prone. Maintenance and management of evolving big data applications is a challenging task as well. We demonstrate HDM, the Hierarchically Distributed Data Matrix, a big data processing framework with built-in data flow optimizations and integrated maintenance of data provenance information that supports the management of continuously evolving big data applications. In HDM, the data flow of jobs is automatically optimized based on the functional DAG representation to improve performance during execution. Additionally, comprehensive metadata related to explanation, execution and dependency updates of HDM applications is stored and maintained to facilitate the debugging, monitoring, tracing and reproducing of HDM jobs and programs.
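The two ideas in the abstract, optimizing a job's data flow before execution and keeping metadata about what was planned and run, can be illustrated with a toy sketch. This is not HDM's API: the names Stage, ProvenanceLog, fuse and execute are hypothetical, and fusing consecutive element-wise stages into a single pass is assumed here as one plausible example of a data flow optimization, not taken from the abstract.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Stage:
    """One node in a linear data flow: a named element-wise operation."""
    name: str
    kind: str  # "map" or "filter"
    fn: Callable[[Any], Any]


@dataclass
class ProvenanceLog:
    """Hypothetical metadata store for planning and execution events."""
    entries: List[dict] = field(default_factory=list)

    def record(self, event: str, **details: Any) -> None:
        self.entries.append({"event": event, **details})


def fuse(stages: List[Stage]) -> Callable[[List[Any]], List[Any]]:
    """Combine consecutive map/filter stages into one pass over the data."""
    def run(data: List[Any]) -> List[Any]:
        out = []
        for item in data:
            keep = True
            for stage in stages:
                if stage.kind == "map":
                    item = stage.fn(item)
                elif not stage.fn(item):  # filter rejected the item
                    keep = False
                    break
            if keep:
                out.append(item)
        return out
    return run


def execute(stages: List[Stage], data: List[Any], log: ProvenanceLog) -> List[Any]:
    """Plan, optimize and run a job, recording each step in the log."""
    log.record("plan", stages=[s.name for s in stages])
    fused = fuse(stages)
    log.record("optimize", rule="fuse_elementwise", fused=len(stages))
    result = fused(data)
    log.record("execute", input_size=len(data), output_size=len(result))
    return result


stages = [
    Stage("double", "map", lambda x: x * 2),
    Stage("keep_gt_4", "filter", lambda x: x > 4),
    Stage("inc", "map", lambda x: x + 1),
]
log = ProvenanceLog()
result = execute(stages, [1, 2, 3, 4], log)
print(result)
for entry in log.entries:
    print(entry)
```

The log entries stand in for the kind of metadata that would let a job be traced or reproduced later; a real system would also persist versions of the program and its dependencies.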

Authors and Affiliations

Rajat Bodankar
M.Tech Scholar, Department of Computer Science and Engineering, Tulsiramji Gaikwad-Patil College of Engineering and Technology, Nagpur, Maharashtra, India
Roshani Talmale
Project Guide, Department of Computer Science and Engineering, Tulsiramji Gaikwad-Patil College of Engineering and Technology, Nagpur, Maharashtra, India

Keywords: Big Data, Data Flow Optimization, Provenance Management


Publication Details

Published in : Volume 4 | Issue 6 | January-February 2018
Date of Publication : 2018-02-28
License:  This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 210-214
Manuscript Number : IJSRSET1848154
Publisher : Technoscience Academy

Print ISSN : 2395-1990, Online ISSN : 2394-4099

Cite This Article :

Rajat Bodankar, Roshani Talmale, "Big Data Processing with Data Provenance Using HDM Framework", International Journal of Scientific Research in Science, Engineering and Technology (IJSRSET), Print ISSN : 2395-1990, Online ISSN : 2394-4099, Volume 4, Issue 6, pp. 210-214, January-February 2018.