A Survey of Approaches for Hadoop with Clustering Techniques

Authors

  • Prof. Abhishek Pandey  Takshshila Institute of Engineering and Technology, Jabalpur, Madhya Pradesh, India
  • Ankita Malviya  Takshshila Institute of Engineering and Technology, Jabalpur, Madhya Pradesh, India

Keywords:

Big Data, Clustering, Data Mining

Abstract

Data mining environments generate large volumes of information that must be analyzed, and patterns must be extracted from it to gain knowledge. In the current era, with an explosion of both structured and unstructured data in genomics, meteorology, science, ecological research, and many other fields, it has become difficult to process, manage, and analyze patterns using traditional databases and architectures. A suitable architecture therefore needs to be understood in order to gain knowledge from Big Data. This paper presents a review of various algorithms necessary for handling such large data sets. These algorithms define various structures and methods implemented to handle Big Data. The paper also lists various tools that were developed for analyzing it.

Published

2018-07-30

Section

Research Articles

How to Cite

[1]
Prof. Abhishek Pandey, Ankita Malviya, "A Survey of Approaches for Hadoop with Clustering Techniques", International Journal of Scientific Research in Science, Engineering and Technology (IJSRSET), Print ISSN: 2395-1990, Online ISSN: 2394-4099, Volume 4, Issue 9, pp. 384-389, July-August 2018.