A Review - Web Scrapper Tool for Data Extraction

Authors

  • Dhanse Sufyan, Al-Ameen College of Engineering, Koregaon Bhima, Savitribai Phule Pune University, Pune, India
  • Malik Arjumand, Al-Ameen College of Engineering, Koregaon Bhima, Savitribai Phule Pune University, Pune, India
  • Khan Abdul Qayume, Al-Ameen College of Engineering, Koregaon Bhima, Savitribai Phule Pune University, Pune, India
  • Prof. Murkute P. K., Al-Ameen College of Engineering, Koregaon Bhima, Savitribai Phule Pune University, Pune, India
  • Prof. Naved Raza Q.Ali., Al-Ameen College of Engineering, Koregaon Bhima, Savitribai Phule Pune University, Pune, India

Keywords:

Web Data Extraction, Multiple Tree Merging, Schema, Vision-based Page Segmentation, Web page, Wrapper generation, Web Mining.

Abstract

Web databases contain a huge amount of structured data that can be obtained only through their query interfaces. Query results are presented in dynamically generated web pages, usually in the form of data records, for human use. Automatic web data extraction is therefore critical to web integration. A number of approaches have been proposed. Early work is mostly based on the source code or the tag tree of the page. Recent approaches use visual features to extract data and perform better than the earlier work. However, these approaches still have inherent limitations. In this paper, we propose a novel approach that makes use of visual features to extract data from web pages, including the data records and the data items. Experimental results on a large set of query result pages in different domains show that the proposed approach is highly effective.

Published

2016-02-28

Section

Research Articles

How to Cite

[1]
Dhanse Sufyan, Malik Arjumand, Khan Abdul Qayume, Prof. Murkute P. K., Prof. Naved Raza Q.Ali., "A Review - Web Scrapper Tool for Data Extraction", International Journal of Scientific Research in Science, Engineering and Technology (IJSRSET), Print ISSN: 2395-1990, Online ISSN: 2394-4099, Volume 2, Issue 1, pp. 614-620, January-February 2016.