An Enhanced Common Data Cleaning Framework for Data mining

Agusthiyar. R; Dr. K. Narashiman

doi:10.32628/IJSRSET1625110

Authors

Agusthiyar. R, Research Scholar, Anna University, Chennai, Tamil Nadu, India
Dr. K. Narashiman, Professor & Director, AUTVS Center for Quality Management, Anna University, Chennai, Tamil Nadu, India

Keywords:

Data cleaning, Data mining, Extract, Transform and Load (ETL), Extensible Markup Language (XML), Enhanced Common Data Cleaning (ECDC)

Abstract

In this information era, data is available in huge volumes, but the information it yields is often not sufficient to meet requirements. This creates an urgent need for data cleaning, and data cleaning solutions have become highly important for data mining users. Data cleaning deals with detecting and eliminating errors and inconsistencies in large data sets. For any real-world data set, doing this task manually is very cumbersome, as it requires a huge amount of human resources and time. As a result, several organizations spend millions of dollars per year to detect data errors. Owing to the wide range of possible data inconsistencies and the sheer volume of data, data cleaning is considered one of the biggest problems in data warehousing. Data cleaning is typically required when multiple data sources need to be integrated. In this research work, an Enhanced Common Data Cleaning (ECDC) framework is developed and proposed.

