Data mining technology does not consist only of efficient and effective algorithms executed as standalone kernels. Rather, it consists of complex applications built on the non-trivial interaction of hardware and software components running in large-scale distributed environments. This distributed character is both a cause and an effect of the inherently distributed nature of data, on the one hand, and of the spatiotemporal complexity that characterizes many data mining (DM) applications, on the other. For a growing number of application fields, Distributed Data Mining (DDM) is therefore a critical technology. In this paper, after reviewing the open problems in DDM, we describe the implementation of DM jobs on Grid environments and introduce the design of the Knowledge Grid system.
Vishal Bhemwala, Bhavesh Patel, Dr. Ashok Patel
Data Mining, Knowledge Grid, Distributed Data Mining
Published in: Volume 2, Issue 1, January-February 2016
Cite This Article
Vishal Bhemwala, Bhavesh Patel, Dr. Ashok Patel, "Distributed Data Mining: Implementing Data Mining Jobs on Grid Environments", International Journal of Scientific Research in Science, Engineering and Technology (IJSRSET), Print ISSN: 2395-1990, Online ISSN: 2394-4099, Volume 2, Issue 1, pp. 327-332, January-February 2016.
URL : http://ijsrset.com/IJSRSET162168.php