SearchTools.com
Search Indexing Robots: Books and Articles
- InfoSpiders: Adaptive
Retrieval Agents Choosing Heuristic Neighborhoods for Information Discovery
(ARACHNID) October, 2001
- University of Iowa work on issues of intelligent agents and adaptive
spiders. Examples as Java Applets.
- White
Paper: The robots.txt file and the robots meta tag SearchMechanics
/ eBrandManagement.com, September 2000.
- Practical descriptions for the webmaster on how the robots instructions
are treated by search engine robots and other crawlers.
- Programming
Bots, Spiders and Intelligent Agents in MS Visual C++ David Pallmann,
Microsoft Press, 1999
- Provides context for the proper use of robots on the Web, with C++ and MFC
examples for various kinds of agents, including site indexing. Advanced
topics include multithreading, adaptation, logging, and notification.
Knowledge of network programming and Internet protocols is not required:
the book relies heavily on WinInet and MSIE. Get the book from Amazon
and give this site the affiliate fee.
- Mercator:
A Scalable, Extensible Web Crawler World Wide Web, volume 2 (1999),
number 4 (December) by Allan Heydon and Marc Najork
- Describes the design and architecture of a scalable, multi-server robot
crawler and its modularization, including filtering by type, extracting
links, queuing, testing for duplicates, domain name resolution and alias
host names, testing for multiple links to the same page, threading and
synchronous I/O, session IDs, and more.
- Robots and Spiders and Crawlers Ultraseek White Paper, September
1999
- Detailed discussion of how search engine indexing robots follow links
and read Web pages to store the information in search indexes. Includes
coverage of problem areas such as image maps, frames, JavaScript and
dynamic data. Notes describe how the Ultraseek Spider handles these
problems.
- Controlling
Search Engines ZDnet devhead / Interactive Designer, January
25, 1999
- Nice article about using META tags.
- Brace
Your Site for the Onslaught of Bots ZDnet devhead, November 1,
1997 by David S. Linthicum
- Information for Web site managers about site-spidering robots, including
IE 4's subscription bot and robots.txt.
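
Several of the items above discuss how robots treat robots.txt instructions. As a rough illustration only, the sketch below uses Python's standard urllib.robotparser module to check whether a crawler may fetch a URL. The robots.txt content, the crawler names (AnyCrawler, ExampleBot), and the example.com URLs are all hypothetical and do not come from any of the listed papers; real robots may interpret the file differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: ExampleBot
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A crawler with no specific record falls back to the "*" record.
print(parser.can_fetch("AnyCrawler", "http://www.example.com/index.html"))
print(parser.can_fetch("AnyCrawler", "http://www.example.com/private/page.html"))

# ExampleBot has its own record, which disallows the whole site.
print(parser.can_fetch("ExampleBot", "http://www.example.com/index.html"))
```

A well-behaved robot would run a check like this before requesting each page and skip any URL for which can_fetch returns False.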
Page Updated 2002-12-18