Web toolkit: an agent based scalable search engine using cellular automata based classification for duly ranked retrieved data
Online publication date: Tue, 21-Oct-2014
by Anirban Kundu, Debajyoti Mukhopadhyay
International Journal of Intelligent Information and Database Systems (IJIIDS), Vol. 5, No. 2, 2011
Abstract: Web page classification is central to categorising web documents so that search engines can index, search and retrieve them. Different crawling techniques are used to accumulate web pages of different domains in separate databases, depending on the practical scenario. Downloaded web pages are parsed for further processing. A classifier is designed dynamically using single cycle multiple attractor cellular automata (SMACA) to map downloaded web pages of different domains into a specific structure. This paper proposes an alternative technique for automatically categorising web pages into different domains. Retrieved web pages are ranked automatically at the time of classifier formation. The system consists of crawling, ranking and storage parts created in a different way: a hierarchical concept is applied over the parallel crawler, the GF(2^P) concept is introduced in ranking, and SMACA is used in indexing storage. Overall, a search engine module has been created using an agent-based method.