A Survey of Approaches for Hadoop with Clustering Techniques

Prof. Abhishek Pandey; Ankita Malviya

doi:10.32628/IJSRSET184978

Authors

Prof. Abhishek Pandey, Takshshila Institute of Engineering and Technology, Jabalpur, Madhya Pradesh, India
Ankita Malviya, Takshshila Institute of Engineering and Technology, Jabalpur, Madhya Pradesh, India

Keywords:

Big Data, Clustering, Data Mining

Abstract

Data mining environments generate large volumes of information that must be analyzed, and patterns must be extracted from it to gain knowledge. In this new era, with the explosion of both structured and unstructured data in genomics, meteorology, science, ecological research, and many other fields, it has become difficult to process, manage, and analyze patterns using traditional databases and architectures. A proper architecture therefore needs to be understood in order to gain knowledge from Big Data. This paper presents a review of various algorithms necessary for handling such large data sets. These algorithms define the structures and methods implemented to handle Big Data. The paper also lists various tools that were developed for analyzing such data.

