Detecting Duplicate Records - A Case Study
Keywords:
Database, Duplicate Detection, Records
Abstract
Databases play an important role in today's IT-based economy. Many industries and systems depend on the accuracy of databases to carry out their operations. The quality of the information stored in a database can therefore have significant cost implications for any system that relies on that information to function and conduct business. In the real world, an entity often has two or more representations in a database. Duplicate detection is the process of identifying multiple representations of the same real-world entity. The purpose of this paper is to provide a thorough study of the different methods used for detecting duplicate records. The paper also discusses the different duplicate detection tools in detail.
License
Copyright (c) IJSRSET

This work is licensed under a Creative Commons Attribution 4.0 International License.