A Review - Web Scrapper Tool for Data Extraction
Keywords:
Web Data Extraction, Multiple Tree Merging, Schema, Vision-based Page Segmentation, Web page, Wrapper generation, Web Mining
Abstract
Web databases contain a huge amount of structured data that can be obtained only through their query interfaces. Query results are presented in dynamically generated web pages, usually in the form of data records, for human use. Automatic web data extraction is therefore critical to web integration. A number of approaches have been proposed. Early work was mostly based on the source code or the tag tree of the page. More recent approaches use visual features to extract data and perform better than the earlier work. However, these approaches still have inherent limitations. In this paper, we propose a novel approach that makes use of visual features to extract data from web pages, including both data records and data items. Experimental results on a large set of query result pages from different domains show that the proposed approach is highly effective.
License
Copyright (c) IJSRSET

This work is licensed under a Creative Commons Attribution 4.0 International License.