Effective Distribution of Large Scale Datasets Clustering Based on Map Reduce


T Vignesh Kumar, M Yuvaraj, S Anusha
Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate. Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, querying, and information privacy. The term often refers simply to the use of predictive analytics or other advanced methods to extract value from data, and seldom to a particular size of data set. Accuracy in big data may lead to more confident decision making, and better decisions can result in greater operational efficiency, cost reduction, and reduced risk. Data generated from a variety of sources with massive volumes, high rates, and different data structures are collectively known as Big Data. The MapReduce framework was built as a parallel distributed programming model to process such large-scale datasets effectively and efficiently. Big Data software analysis solutions were implemented on the MapReduce framework; we describe their dataset structures and how they were implemented with MongoDB as the NoSQL database. NoSQL encompasses a wide variety of database technologies that were developed in response to the demands of building modern applications. MongoDB stores data using a flexible document data model. Documents contain one or more fields, including arrays, binary data, and sub-documents.
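As an illustration of the programming model mentioned above, the following is a minimal single-process sketch of the map, shuffle, and reduce steps applied to document-shaped records. It is not the implementation described in the paper: the sample documents, the field names (`tags`, `meta`), and the function names are hypothetical, and plain Python dictionaries stand in for MongoDB documents.

```python
from collections import defaultdict

# Hypothetical records shaped like MongoDB documents:
# scalar fields, an array field ("tags"), and a sub-document ("meta").
documents = [
    {"_id": 1, "tags": ["cluster", "mapreduce"], "meta": {"source": "sensor"}},
    {"_id": 2, "tags": ["mapreduce"], "meta": {"source": "log"}},
    {"_id": 3, "tags": ["cluster", "nosql"], "meta": {"source": "sensor"}},
]


def map_phase(doc):
    """Emit one (tag, 1) pair for every entry in the document's array field."""
    for tag in doc["tags"]:
        yield tag, 1


def shuffle(pairs):
    """Group emitted values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values emitted for one key into a single count."""
    return key, sum(values)


def run_job(docs):
    """Run map, shuffle, and reduce sequentially in one process."""
    pairs = (pair for doc in docs for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


tag_counts = run_job(documents)
print(tag_counts)
```

In a real MapReduce deployment the map and reduce calls would run in parallel on different nodes; the sketch only shows how the data flows between the phases.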

With data at this scale, the demand for a service stack that can distribute, manage, and process massive data sets has risen drastically. In this paper, we investigate the Big Data Broadcasting problem, in which a single source node broadcasts a large chunk of data to a set of nodes with the objective of minimizing the maximum completion time. Big-data computing is a new critical challenge for the ICT industry: engineers and researchers are dealing with data sets of petabyte scale in the cloud computing paradigm.
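To make the objective concrete, the sketch below compares the maximum completion time of two simple broadcast strategies. It is an illustration under stated assumptions, not the algorithm proposed in the paper: all nodes are assumed to have the same uplink bandwidth, a node forwards the chunk only after receiving it in full, and each node uploads to one receiver at a time. The function and parameter names are hypothetical.

```python
def sequential_broadcast_time(num_receivers, size_gb, uplink_gbps):
    """Source alone sends the full chunk to each receiver, one after another."""
    transfer = size_gb * 8 / uplink_gbps  # seconds to upload one full copy
    return num_receivers * transfer


def relay_broadcast_time(num_receivers, size_gb, uplink_gbps):
    """Every node that already holds the chunk forwards it to one new node per round."""
    transfer = size_gb * 8 / uplink_gbps  # seconds to upload one full copy
    holders, rounds = 1, 0  # only the source holds the data at the start
    while holders < num_receivers + 1:
        holders *= 2  # each current holder serves one additional node
        rounds += 1
    return rounds * transfer


if __name__ == "__main__":
    for n in (7, 100, 1000):
        print(n, sequential_broadcast_time(n, 1, 8), relay_broadcast_time(n, 1, 8))
```

Under these assumptions the sequential strategy grows linearly with the number of receivers, while relaying grows with the number of doubling rounds, which is why letting receivers help distribute the data reduces the maximum completion time.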

Keywords: Clustering, Datasets, Map Reduce, Big Data, ICT, MongoDB, NoSQL Database, ERP, LSBT, LHC

Publication Details

Published in: Volume 2 | Issue 2 | March-April 2016
Date of Publication: 2016-04-30
Print ISSN: 2395-1990
Online ISSN: 2394-4099
Page(s): 505-508
Manuscript Number: IJSRSET1622136
Publisher: Technoscience Academy

Cite This Article

T Vignesh Kumar, M Yuvaraj, S Anusha, "Effective Distribution of Large Scale Datasets Clustering Based on Map Reduce", International Journal of Scientific Research in Science, Engineering and Technology (IJSRSET), Print ISSN: 2395-1990, Online ISSN: 2394-4099, Volume 2, Issue 2, pp. 505-508, March-April 2016.
URL: http://ijsrset.com/IJSRSET1622136.php