FIDOOP-DP: Implementation of Data Partitioning in Frequent Itemset on Bigdata using Hadoop Pseudo Distributed Environment

V. R. B. Rohini; Dr. G. P. Saradhi Varma

doi:10.32628/IJSRSET1738198

Authors

V. R. B. Rohini, PG Scholar (M.Tech), Department of Information Technology, Sagi Ramakrishnam Raju Engineering College, Bhimavaram, Andhra Pradesh, India
Dr. G. P. Saradhi Varma, Professor, Department of Information Technology, Sagi Ramakrishnam Raju Engineering College, Bhimavaram, Andhra Pradesh, India

Keywords:

Big Data, Data Mining, Frequent Itemset, Machine Learning, MapReduce

Abstract

Frequent itemset mining (FIM) is one of the primary concerns in data mining. Although FIM has been studied extensively, standard and improved solutions do not scale well. This is typically the case when (i) the amount of data is extremely large and/or (ii) the minimum support (MinSup) threshold is very low. In this paper, I propose a highly scalable parallel frequent itemset mining (PFIM) algorithm called Parallel Absolute Top Down (PATD). The PATD algorithm makes mining very large databases (terabytes of data) simple and compact. Its mining process consists of a single parallel job, which dramatically reduces the mining runtime, the communication cost and the energy consumption overhead on a distributed computing platform. Based on a clever and efficient data partitioning strategy called Item-Based Data Partitioning (IBDP), the PATD algorithm mines each data partition independently, relying on an absolute minimum support (AMinSup) instead of a relative one. PATD has been extensively evaluated using real-world data sets. My experimental results suggest that the PATD algorithm is significantly more efficient and scalable than alternative approaches.
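The abstract describes PATD only at a high level. The Python sketch below illustrates the two ideas it relies on, under illustrative assumptions that are not taken from the paper: a relative MinSup is converted once into an absolute count over the whole dataset, and a simplified item-based partitioning (a stand-in for IBDP, not its published definition) sends every transaction containing an item to the partition that owns that item. Because each itemset is counted only in the partition that owns its smallest item, and that partition holds every transaction containing that item, local counts equal global counts, so the absolute threshold can be applied to each partition independently. The function names (`absolute_minsup`, `item_based_partitions`, `mine_partition`, `parallel_mine`), the round-robin item assignment and the brute-force enumeration up to `max_len` are all hypothetical and do not reflect PATD's actual mining procedure or Hadoop job structure.

```python
from collections import Counter
from itertools import combinations
from math import ceil


def absolute_minsup(relative_minsup, num_transactions):
    """Convert a relative MinSup (fraction of all transactions) into an
    absolute count, computed once over the whole dataset."""
    return ceil(relative_minsup * num_transactions)


def item_based_partitions(transactions, num_partitions):
    """Hypothetical item-based partitioning (a simplified stand-in for IBDP).

    Items are assigned to partitions round-robin in sorted order. A partition
    receives every transaction that contains at least one item it owns."""
    items = sorted({item for t in transactions for item in t})
    owner = {item: idx % num_partitions for idx, item in enumerate(items)}
    partitions = [[] for _ in range(num_partitions)]
    for t in transactions:
        # Each transaction is copied at most once into each partition.
        for pid in {owner[item] for item in t}:
            partitions[pid].append(frozenset(t))
    return owner, partitions


def mine_partition(pid, partition, owner, abs_minsup, max_len):
    """Count only itemsets whose smallest item is owned by partition `pid`.

    Every transaction containing that smallest item is present in this
    partition, so the local count equals the global support and the absolute
    threshold can be applied without merging counts across partitions."""
    counts = Counter()
    for t in partition:
        items = sorted(t)
        for k in range(1, max_len + 1):
            # combinations() over a sorted list yields sorted tuples,
            # so combo[0] is the itemset's smallest item.
            for combo in combinations(items, k):
                if owner[combo[0]] == pid:
                    counts[combo] += 1
    return {itemset: n for itemset, n in counts.items() if n >= abs_minsup}


def parallel_mine(transactions, relative_minsup, num_partitions=2, max_len=3):
    abs_minsup = absolute_minsup(relative_minsup, len(transactions))
    owner, partitions = item_based_partitions(transactions, num_partitions)
    frequent = {}
    # Partitions are independent; a Hadoop job could run one task per
    # partition, whereas this sketch simply loops over them sequentially.
    for pid, part in enumerate(partitions):
        frequent.update(mine_partition(pid, part, owner, abs_minsup, max_len))
    return frequent


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c", "d"}]
    for itemset, support in sorted(parallel_mine(data, 0.4).items()):
        print(itemset, support)
```

With five transactions and a relative MinSup of 0.4, the absolute threshold is 2, and the result is the same whether the data is split into one, two or three partitions. In an actual Hadoop deployment, each partition would presumably be processed by its own task; the sequential loop here only stands in for that step.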
