Manuscript Number : IJSRSET1623136
A Survey on Hadoop Storage Issues
Authors(2) : Reetesh Rai, Shravan Kumar
Hadoop is an open-source implementation of Google's MapReduce framework, and MapReduce is the heart of Apache Hadoop. The file system Hadoop uses to store files is the Hadoop Distributed File System (HDFS), an open-source implementation of the Google File System (GFS). Hadoop enables parallel processing of large data sets by splitting a data set into smaller partitions; the job tracker assigns each partition to a separate task on a data node. The data node is the node where the data actually resides. The task tracker runs on the data node; it executes the tasks and reports their status to the job tracker. In MapReduce, the slowest-running task determines the job completion time: if one task is slow, it delays the progress of the entire job. Such a slowest-running task is known as a straggler. Stragglers can arise for many reasons; one of them is data skew. This paper reviews the different types of data skew, where in MapReduce data skew can occur, and what measures are taken to overcome these problems.
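The skew mechanism described above can be sketched in a few lines: in a hash-partitioned word count, all occurrences of a "hot" key land on one reducer, so that reducer's load (and thus the whole job's completion time) far exceeds the others'. This is an illustrative sketch, not Hadoop code; the function names and the toy data set are assumptions.

```python
# Minimal MapReduce-style word count illustrating data skew:
# a single hot key sends all of its pairs to one reducer,
# which then becomes the straggler that delays the whole job.
# All names here are illustrative, not part of Hadoop's API.
from collections import defaultdict

def map_phase(lines):
    """Emit (word, 1) pairs, like a mapper would."""
    for line in lines:
        for word in line.split():
            yield word, 1

def partition(pairs, num_reducers):
    """Hash-partition intermediate pairs across reducers.
    Every pair for a given key goes to the same reducer, so a
    hot key concentrates its entire load on one bucket."""
    buckets = [defaultdict(int) for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[hash(key) % num_reducers][key] += value
    return buckets

# Toy input: one dominant key ("hot") plus a few rare keys.
lines = ["hot " * 1000] + ["cold word"] * 10
buckets = partition(map_phase(lines), num_reducers=4)
loads = [sum(b.values()) for b in buckets]
# The bucket holding "hot" carries at least 1000 of the 1020 pairs;
# in Hadoop, that reducer would finish last and stall the job.
```

Because job completion time is governed by the maximum per-reducer load rather than the average, the imbalance shown here translates directly into a longer job, which is why the skew-mitigation techniques surveyed in the paper matter.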
Reetesh Rai
LNCT, Jabalpur, Madhya Pradesh, India

Shravan Kumar
LNCT, Jabalpur, Madhya Pradesh, India

Keywords : MapReduce, HDFS, Straggler, Data Skew

Publication Details
Published in : Volume 2 | Issue 3 | May-June 2016
Date of Publication : 2016-06-30
License : This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 499-505
Publisher : Technoscience Academy
Journal URL : https://ijsrset.com/IJSRSET1623136