A Review - Web Scrapper Tool for Data Extraction

Authors: Dhanse Sufyan, Malik Arjumand, Khan Abdul Qayume, Prof. Murkute P. K., Prof. Naved Raza Q.Ali.

Web databases contain a huge amount of structured data that can be reached only through their query interfaces. Query results are presented in dynamically generated web pages, usually as data records, and are intended for human readers. Automatic web data extraction is critical for web integration. A number of approaches have been proposed. Early work is mostly based on the source code or the tag tree of the page. More recent approaches use visual features to extract data and perform better than the earlier work. However, these approaches still have inherent limitations. In this paper, we propose a novel approach that makes use of visual features to extract data from web pages, including both data records and data items. Experimental results on a large set of query result pages from different domains show that the proposed approach is highly effective.
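As a rough illustration of the earlier tag-tree family of approaches mentioned above (not the visual-feature method proposed in this paper), the following minimal sketch treats the largest group of sibling elements with identical tag structure as the candidate data records. All names here (`Node`, `TagTreeBuilder`, `signature`, `find_records`, `extract_records`, `SAMPLE_PAGE`) are hypothetical, and the sketch assumes small, well-formed HTML; real result pages would need tolerant parsing and approximate rather than exact structure matching.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag (a partial list, enough for a sketch).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class Node:
    """A minimal tag-tree node: tag name, child nodes, and its own text."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = []


class TagTreeBuilder(HTMLParser):
    """Builds a simple tag tree; assumes well-formed HTML (an assumption)."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        if data.strip():
            self.current.text.append(data.strip())


def signature(node):
    """Structural signature: the tag plus the signatures of its children."""
    return node.tag + "(" + ",".join(signature(c) for c in node.children) + ")"


def all_text(node):
    """Collect a node's own text first, then the text of its descendants."""
    items = list(node.text)
    for child in node.children:
        items.extend(all_text(child))
    return items


def find_records(node, best=None):
    """Return the largest group of same-signature siblings in the tree."""
    best = best or []
    # Only siblings that contain child elements can hold several data items.
    counts = Counter(signature(c) for c in node.children if c.children)
    if counts:
        sig, n = counts.most_common(1)[0]
        if n >= 2 and n > len(best):
            best = [c for c in node.children if signature(c) == sig]
    for child in node.children:
        best = find_records(child, best)
    return best


def extract_records(html):
    """Return each candidate data record as a list of its data items."""
    builder = TagTreeBuilder()
    builder.feed(html)
    return [all_text(record) for record in find_records(builder.root)]


# Hypothetical query result page used only for illustration.
SAMPLE_PAGE = (
    "<html><body><h1>Results</h1><ul>"
    "<li><b>Book A</b><span>$10</span></li>"
    "<li><b>Book B</b><span>$12</span></li>"
    "<li><b>Book C</b><span>$9</span></li>"
    "</ul></body></html>"
)
records = extract_records(SAMPLE_PAGE)
```

Exact signature matching breaks as soon as one record carries an optional field or extra markup, which is one reason purely tag-based extraction is considered limited and why later work turned to visual features.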

Authors and Affiliations

Dhanse Sufyan
Al-Ameen College of Engineering, Koregaon Bhima, Savitribai Phule Pune University, Pune, India
Malik Arjumand
Al-Ameen College of Engineering, Koregaon Bhima, Savitribai Phule Pune University, Pune, India
Khan Abdul Qayume
Al-Ameen College of Engineering, Koregaon Bhima, Savitribai Phule Pune University, Pune, India
Prof. Murkute P. K.
Al-Ameen College of Engineering, Koregaon Bhima, Savitribai Phule Pune University, Pune, India
Prof. Naved Raza Q.Ali.
Al-Ameen College of Engineering, Koregaon Bhima, Savitribai Phule Pune University, Pune, India

Keywords: Web Data Extraction, Multiple Tree Merging, Schema, Vision-based Page Segmentation, Web page, Wrapper generation, Web Mining.

Publication Details

Published in : Volume 2 | Issue 1 | January-February 2016
Date of Publication : 2016-02-28
License: This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 614-620
Manuscript Number : IJSRSET1623140
Publisher : Technoscience Academy

Print ISSN : 2395-1990, Online ISSN : 2394-4099

Cite This Article :

Dhanse Sufyan, Malik Arjumand, Khan Abdul Qayume, Prof. Murkute P. K., Prof. Naved Raza Q.Ali., "A Review - Web Scrapper Tool for Data Extraction", International Journal of Scientific Research in Science, Engineering and Technology (IJSRSET), Print ISSN : 2395-1990, Online ISSN : 2394-4099, Volume 2, Issue 1, pp.614-620, January-February-2016.
Journal URL : http://ijsrset.com/IJSRSET1623140