
SearchTools.com

Search Indexing Robots: Books and Articles


InfoSpiders: Adaptive Retrieval Agents Choosing Heuristic Neighborhoods for Information Discovery (ARACHNID), October 2001
University of Iowa research on intelligent agents and adaptive spiders, with examples provided as Java applets.
 
White Paper: The robots.txt file and the robots meta tag SearchMechanics / eBrandManagement.com, September 2000.
Practical descriptions for the webmaster on how the robots instructions are treated by search engine robots and other crawlers.
Programming Bots, Spiders and Intelligent Agents in MS Visual C++ David Pallmann, Microsoft Press, 1999
Provides context for the proper use of robots on the Web, with C++ and MFC examples for various kinds of agents, including site indexing. Advanced topics include multithreading, adaptation, logging, and notification. Knowledge of network programming and Internet protocols is not required; the book relies heavily on WinInet and MSIE.
 
Mercator: A Scalable, Extensible Web Crawler World Wide Web, volume 2 (1999), number 4 (December) by Allan Heydon and Marc Najork
Describes the design and architecture of a scalable, modular multi-server crawler. Topics include filtering by type, extracting links, queuing, testing for duplicates, domain name resolution and alias host names, testing for multiple links to the same page, threading and synchronous I/O, session IDs, and more.
 
 
Robots and Spiders and Crawlers Ultraseek White Paper, September 1999
Detailed discussion of how search engine indexing robots follow links and read Web pages to store the information in search indexes. Includes coverage of problem areas such as image maps, frames, JavaScript and dynamic data. Notes describe how the Ultraseek Spider handles these problems.
 
Controlling Search Engines ZDnet devhead / Interactive Designer, January 25, 1999
An article on using META tags to control how search engines index a site.
 
Brace Your Site for the Onslaught of Bots ZDnet devhead, November 1, 1997 by David S. Linthicum
Information for Web site managers about site-spidering robots, including IE 4's subscription bot and robots.txt.
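
Several of the entries above (the Mercator paper in particular) describe a crawl pipeline: queue URLs, fetch pages, extract links, filter by type, and test for duplicates. The following is only a minimal Python sketch of that general idea, not Mercator's actual design. The in-memory PAGES table, the example.com URLs, the SKIPPED_SUFFIXES list, and the crawl and LinkExtractor names are all hypothetical, chosen so the example runs without network access.

```python
import hashlib
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit

# Hypothetical in-memory "web" so the sketch runs without network access.
PAGES = {
    "http://example.com/": (
        '<a href="/a.html">A</a> <a href="/b.html#top">B</a> '
        '<a href="/logo.gif">logo</a>'
    ),
    "http://example.com/a.html": '<a href="/">home</a> <a href="b.html">B</a>',
    "http://example.com/b.html": '<a href="/copy.html">copy</a>',
    # Same content as b.html, reachable under a different URL.
    "http://example.com/copy.html": '<a href="/copy.html">copy</a>',
}

# Assumed filter: file types this sketch does not try to index.
SKIPPED_SUFFIXES = (".gif", ".jpg", ".png", ".zip")


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch):
    """Breadth-first crawl; fetch(url) returns HTML text or None."""
    queue = deque([seed])
    seen_urls = {seed}          # duplicate-URL test
    seen_fingerprints = set()   # duplicate-content test
    indexed = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        fingerprint = hashlib.sha1(html.encode("utf-8")).hexdigest()
        if fingerprint in seen_fingerprints:
            continue  # same content already indexed under another URL
        seen_fingerprints.add(fingerprint)
        indexed.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before comparing.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlsplit(absolute).path.lower().endswith(SKIPPED_SUFFIXES):
                continue  # filter by type
            if absolute not in seen_urls:
                seen_urls.add(absolute)
                queue.append(absolute)
    return indexed


if __name__ == "__main__":
    for page in crawl("http://example.com/", PAGES.get):
        print(page)
```

A production crawler would add much of what the entries above discuss and this sketch omits: robots.txt checks, politeness delays, DNS handling, host aliases, and multithreaded fetching.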

The official guidelines were written up in 1996 or so:

For more information on robots on the SearchTools Site:

Robots Information Page
Summary of the most important things about web crawling robots.
Robots.txt Page
Instructions for using the Robots.txt file to direct robot crawlers and spiders away from sections of a web site.
META Robots Tag Page
Describes the META Robots tag contents and implications for search indexing robots.
Indexing Robot Checklist
A list of important items for those creating robots for search indexing.
List of Robot Source Code
Links to free and commercial source code for robot indexing spiders.
List of Robot Development Consultants
Consultants who can provide services in this area.
SearchTools Robots Testing
Testbed for search indexing robots.
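
As a rough illustration of the two mechanisms covered by the Robots.txt and META Robots Tag pages listed above, the sketch below shows how a robot might honor each one. It uses Python's standard urllib.robotparser and html.parser modules. The robots.txt content, the "ExampleBot" user-agent name, the example.com URLs, and the helper names are assumptions made for the example, not taken from those pages.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that steers all robots away from two sections.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /cgi-bin/
"""


def make_robots_checker(robots_txt):
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class MetaRobotsParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for item in attributes.get("content", "").split(","):
                if item.strip():
                    self.directives.add(item.strip().lower())


def meta_robots_policy(html):
    """Return (may_index, may_follow) for one page's HTML."""
    parser = MetaRobotsParser()
    parser.feed(html)
    found = parser.directives
    may_index = "noindex" not in found and "none" not in found
    may_follow = "nofollow" not in found and "none" not in found
    return may_index, may_follow


if __name__ == "__main__":
    checker = make_robots_checker(ROBOTS_TXT)
    print(checker.can_fetch("ExampleBot", "http://example.com/index.html"))
    print(checker.can_fetch("ExampleBot", "http://example.com/private/x.html"))
    page = '<html><head><meta name="robots" content="NOINDEX, follow"></head></html>'
    print(meta_robots_policy(page))
```

The two checks apply at different stages: robots.txt is consulted before a page is requested, while the META Robots tag can only be read after the page has been fetched and parsed.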
 
Page Updated 2002-12-18

SearchTools.com - Copyright © 1998-2007 Search Tools Consulting
This work is provided under a Creative Commons Sampling Plus 1.0 License.