Implementation of Aggregation of Map and Reduce Function for Performance Improvisation

Authors

  • Varsha B. Bobade, Department of Computer Engineering, JSPM's Imperial College of Engineering & Research, Wagholi, Pune, Maharashtra, India

Keywords:

Big Data, Hadoop Framework, Online Aggregation, Combiners.

Abstract

Big Data is a term that refers to data sets whose size (volume), complexity (variability), and rate of growth (velocity) make them difficult to capture, manage, process, or analyze. Hadoop can be used to analyze this enormous amount of data. Hadoop is an open source software project that enables the distributed processing of large data sets across clusters of commodity servers.

I propose a modified MapReduce architecture that allows data to be pipelined between operators. This reduces completion times and improves system utilization for batch jobs as well. I present a modified version of the Hadoop MapReduce framework that supports online aggregation, which allows users to see early returns from a job as it is being computed. The objective of the proposed technique is to significantly improve the performance of Hadoop MapReduce for efficient Big Data processing.
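The idea of online aggregation with combiners can be illustrated with a small, self-contained sketch. This is not the paper's implementation and does not use the Hadoop API: it is plain Python, and the function names (map_words, combine, online_word_count) and the word-count workload are assumptions chosen for illustration. Each batch of input stands in for one mapper's output, the combiner pre-aggregates it locally, and the reduce side merges each batch as it arrives and reports a running result instead of waiting for the whole job.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (word, 1) for every word in a line."""
    for word in line.lower().split():
        yield word, 1


def combine(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Combiner: pre-aggregate one batch locally before passing it on."""
    partial: Dict[str, int] = defaultdict(int)
    for key, value in pairs:
        partial[key] += value
    return dict(partial)


def online_word_count(
    lines: List[str], batch_size: int
) -> Iterator[Tuple[float, Dict[str, int]]]:
    """Pipelined reduce (hypothetical helper): merge each combined batch as it
    arrives and yield (fraction_of_input_processed, running_counts)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    totals: Dict[str, int] = defaultdict(int)
    for start in range(0, len(lines), batch_size):
        batch = lines[start:start + batch_size]
        partial = combine(pair for line in batch for pair in map_words(line))
        for key, value in partial.items():
            totals[key] += value
        done = min(start + batch_size, len(lines))
        # Early return: a snapshot of the aggregate over the input seen so far.
        yield done / len(lines), dict(totals)


if __name__ == "__main__":
    sample = ["big data hadoop", "hadoop map reduce", "big data", "map reduce hadoop"]
    for progress, snapshot in online_word_count(sample, batch_size=2):
        print(f"{progress:.0%} processed: {snapshot}")
```

Each yielded snapshot is exact only for the portion of input already processed; the last snapshot equals the result a conventional batch job would produce. A real online aggregation system would typically also scale the partial result or attach an error estimate, which this sketch omits.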

Published

2016-10-30

Section

Research Articles

How to Cite

Varsha B. Bobade, "Implementation of Aggregation of Map and Reduce Function for Performance Improvisation," International Journal of Scientific Research in Science, Engineering and Technology (IJSRSET), Print ISSN: 2395-1990, Online ISSN: 2394-4099, Volume 2, Issue 5, pp. 196-201, September-October 2016.