A Review - Web Scrapper Tool for Data Extraction


Dhanse Sufyan, Malik Arjumand, Khan Abdul Qayume, Prof. Murkute P. K., Prof. Naved Raza Q.Ali.
Web databases contain a huge amount of structured data that can be obtained only through their query interfaces. Query results are presented in dynamically generated web pages, usually in the form of data records, for human use. Automatic web data extraction is critical to web integration, and a number of approaches have been proposed. Early work was mostly based on the source code or the tag tree of the page. More recent approaches use visual features to extract data and perform better than the earlier work; however, they still have inherent limitations. In this paper, we propose a novel approach that uses visual features to extract data from web pages, including both data records and data items. Experimental tests on a large set of query result pages from different domains show that the proposed approach is highly effective.

Keywords: Web Data Extraction, Multiple Tree Merging, Schema, Vision-based Page Segmentation, Web Page, Wrapper Generation, Web Mining.

Publication Details

Published in: Volume 2, Issue 1, January-February 2016
Date of Publication: 2016-02-28
Print ISSN: 2395-1990
Online ISSN: 2394-4099
Page(s): 614-620
Manuscript Number: IJSRSET1623140
Publisher: Technoscience Academy

Cite This Article

Dhanse Sufyan, Malik Arjumand, Khan Abdul Qayume, Prof. Murkute P. K., Prof. Naved Raza Q.Ali, "A Review - Web Scrapper Tool for Data Extraction", International Journal of Scientific Research in Science, Engineering and Technology (IJSRSET), Print ISSN: 2395-1990, Online ISSN: 2394-4099, Volume 2, Issue 1, pp. 614-620, January-February 2016.
URL: http://ijsrset.com/IJSRSET1623140.php