A Review study on Designing of Focused Crawler

Authors

  • Bhagyashri Shambharkar Shankar, M.Tech Scholar, Department of Computer Science and Engineering, Tulsiramji Gaikwad-Patil College of Engineering and Technology, Nagpur, Maharashtra, India
  • Prof. Jayant Adhikari, Department of Computer Science and Engineering, Tulsiramji Gaikwad-Patil College of Engineering and Technology, Nagpur, Maharashtra, India
  • Prof. Rajesh Babu, Department of Computer Science and Engineering, Tulsiramji Gaikwad-Patil College of Engineering and Technology, Nagpur, Maharashtra, India

Keywords:

Focused Crawler, Genetic algorithm, PageRank, Best First Search.

Abstract

The web has grown rapidly in popularity alongside the development of the internet, which increases the need for methods that locate deep-web interfaces efficiently. A web crawler is a program that browses the World Wide Web automatically. Deep-web databases are usually sparsely distributed and change constantly. To address this problem, previous work offers two kinds of crawler: generic crawlers and focused crawlers. Focused crawling has drawn considerable attention from researchers over the past decade. A focused crawler searches the internet for a specific term or topic. Vertical search must be precise, and a good search strategy improves accuracy; the Best-First search strategy is commonly used, but it tends to get trapped in local optima. To improve global search, we present a focused crawler based on an improved genetic algorithm, which is a global search algorithm. The fitness function considers topic correlation and topic importance. Topic correlation is analyzed with the vector space model, and topic importance is estimated with an improved PageRank algorithm. The genetic operations are optimized based on the browsing behavior of users. The selection operation chooses webpages with greater fitness, the crossover operation sorts links by topic importance, and the mutation operation searches combined keywords with a search engine. Compared with previous genetic algorithms, the experimental results show that the improved genetic algorithm can increase the precision and recall of the focused crawler and enlarge its search scope. Evaluation experiments were conducted to examine the effectiveness of the approach.
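
As an illustration of the fitness function described in the abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes that topic correlation is the cosine similarity of raw term-frequency vectors (one simple form of the vector space model), that topic importance is a precomputed score already normalized to [0, 1] (for example, a PageRank-style value), and that the two are combined by a weighted sum. The weight `alpha` and the function names `cosine_similarity`, `fitness` and `select` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine of the angle between the term-frequency vectors of two texts."""
    va = Counter(text_a.lower().split())
    vb = Counter(text_b.lower().split())
    # Only terms present in both texts contribute to the dot product.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def fitness(page_text: str, topic_text: str, importance: float,
            alpha: float = 0.5) -> float:
    """Weighted sum of topic correlation and topic importance.

    `importance` is assumed to be precomputed and normalized to [0, 1];
    `alpha` is a hypothetical weight, not a value from the paper.
    """
    correlation = cosine_similarity(page_text, topic_text)
    return alpha * correlation + (1 - alpha) * importance


def select(pages, topic_text: str, k: int, alpha: float = 0.5):
    """Selection step: keep the k pages with the greatest fitness.

    `pages` is a list of (url, text, importance) tuples.
    """
    ranked = sorted(
        pages,
        key=lambda p: fitness(p[1], topic_text, p[2], alpha),
        reverse=True,
    )
    return ranked[:k]
```

Calling `select(pages, "focused crawler", k=2)` on a list of `(url, text, importance)` tuples returns the two highest-scoring pages. The crossover and mutation operations mentioned in the abstract are not sketched here, because the abstract does not give enough detail to reproduce them.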

Published

2019-03-30

Section

Research Articles

How to Cite

Bhagyashri Shambharkar Shankar, Prof. Jayant Adhikari, Prof. Rajesh Babu, "A Review study on Designing of Focused Crawler", International Journal of Scientific Research in Science, Engineering and Technology (IJSRSET), Print ISSN: 2395-1990, Online ISSN: 2394-4099, Volume 6, Issue 2, pp. 08-14, March-April 2019.