Title: Review of web crawlers
Authors: S.R. Sreeja; Sangita Chaudhari
Addresses: Department of Computer Science, A.C. Patil College of Engineering, Sector 4, Kharghar, Navi Mumbai, Maharashtra, 410210, India; Department of Computer Science, A.C. Patil College of Engineering, Sector 4, Kharghar, Navi Mumbai, Maharashtra, 410210, India
Abstract: The web is a repository of a large amount of data, and information on the web is organised in the form of pages. Because the web holds an effectively unlimited amount of information, searching it and finding the appropriate information is a task that requires expertise. Web crawlers are programmes that assist search engines by automating the task of visiting web pages and downloading their contents. They also help in ranking the downloaded pages, so that a search engine can produce a list of web pages ordered by relevance and display this list as the search result. Crawling is also used to validate web pages, analyse them, notify users of page updates, visualise web pages and, sometimes, collect e-mail addresses for spam. Web crawlers can be of different types, each using different strategies and techniques to crawl web pages. This paper presents a review of various types of web crawlers.
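The abstract describes a crawler as a programme that automates visiting web pages and downloading their contents. As a minimal sketch of that loop, the code below runs a breadth-first crawl: it keeps a frontier queue of URLs, downloads each one, extracts its links and queues any URL not seen before. This is one common strategy and is not taken from the paper. The `fetch` callable is a hypothetical stand-in for an HTTP download, and an in-memory dictionary (`SITE`, with made-up `example.test` URLs) plays the role of the web so that the example runs offline. A real crawler would also need robots.txt handling, politeness delays and error handling, none of which are shown here.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed: str, fetch: Callable[[str], Optional[str]],
          max_pages: int = 100) -> Dict[str, str]:
    """Breadth-first crawl from `seed`; returns {url: downloaded HTML}.

    `fetch` is a hypothetical download function that returns the page
    HTML, or None if the page cannot be retrieved.
    """
    frontier = deque([seed])  # FIFO queue of URLs to visit => breadth-first order
    seen = {seed}             # URLs already queued, so each is fetched at most once
    pages: Dict[str, str] = {}
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        html = fetch(url)     # the "download" step
        if html is None:
            continue          # skip pages that could not be fetched
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        parser.close()
        for href in parser.links:
            # Resolve relative links and drop #fragments before de-duplicating.
            link, _ = urldefrag(urljoin(url, href))
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return pages


# Hypothetical in-memory "web" standing in for real HTTP downloads.
SITE = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b#top">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/missing">?</a>',
    "http://example.test/b": '<a href="http://example.test/">home</a>',
}

if __name__ == "__main__":
    pages = crawl("http://example.test/", SITE.get)
    print(list(pages))  # URLs in the order they were downloaded
```

Using a FIFO queue gives breadth-first order. Swapping the queue for a priority queue keyed on some relevance score is how focused crawlers change the visiting order, while the download-and-extract loop stays the same.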
Keywords: deep web crawlers; focused crawlers; web forums; forum crawlers; web intelligence; web crawler review.
DOI: 10.1504/IJKWI.2014.065035
International Journal of Knowledge and Web Intelligence, 2014 Vol.5 No.1, pp. 49-61
Received: 31 Oct 2013
Accepted: 06 Feb 2014
Published online: 25 Oct 2014