Detecting Duplicate Records - A Case Study

T. Parimalam; R. Deepa; R. Nirmala Devi; P. Yamuna Devi

doi:10.32628/IJSRSET151563

Authors

T. Parimalam, Department of Computer Science, Nandha Arts and Science College, Erode, Tamil Nadu, India
R. Deepa, Department of Computer Science, Nandha Arts and Science College, Erode, Tamil Nadu, India
R. Nirmala Devi, Department of Computer Science, Nandha Arts and Science College, Erode, Tamil Nadu, India
P. Yamuna Devi, Department of Computer Science, Nandha Arts and Science College, Erode, Tamil Nadu, India

Keywords:

Database, Duplicate Detection, Records

Abstract

Databases play an important role in today's IT-based economy. Many industries and systems depend on the accuracy of databases to carry out operations. Therefore, the quality of the information stored in databases can have significant cost implications for a system that relies on that information to function and conduct business. In the real world, entities often have two or more representations in databases. Duplicate detection is the process of identifying multiple representations of the same real-world entity. The purpose of this paper is to provide a thorough study of the different methods used for detecting duplicate records. The paper also discusses the different duplicate detection tools in detail.

