As of January 2012, this site is no longer being updated due to work and health issues.
SearchTools.com uses tests to evaluate how well search indexing robots handle robot rules and complex linking. Many robots (also known as crawlers and spiders) are easily confused by anything beyond a simple URL, so these tests help us distinguish the more sophisticated robots from the less sophisticated ones.
In addition, these tests will tell us how many robots can handle text in ALT and Comment tags, HTML header tags such as Meta Keywords, and more.
To try out this system, we've coded each page with "RTest": more specifically, with "RTestGood" for pages that should be indexed successfully, and "RTestProblem" for pages that should not be indexed.
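The marker scheme above can be audited mechanically: any page a robot indexed can be classified by which marker it contains. The marker strings come from this page; the sample documents and the classifier itself are a hypothetical sketch, not the actual test harness.

```python
def classify_rtest(page_text):
    """Return 'good', 'problem', or None based on the RTest marker in a page."""
    if "RTestProblem" in page_text:
        return "problem"
    if "RTestGood" in page_text:
        return "good"
    return None

# Illustrative sample pages, not the real test documents.
pages = {
    "frames-test.html": "<html><body>RTestGood frames content</body></html>",
    "disallowed.html": "<html><body>RTestProblem secret</body></html>",
}

results = {name: classify_rtest(text) for name, text in pages.items()}
print(results)
```

A page that shows up in a search index despite carrying "RTestProblem" reveals a robot that ignored the rules meant to exclude it.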
Following Robot Standards
- Robots Tests - test whether indexing robots honor robots.txt and the META Robots tag
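The robots.txt side of this test can be sketched with Python's standard urllib.robotparser, which implements the same exclusion rules a well-behaved robot should honor. The rules and URLs below are illustrative, not the actual SearchTools.com robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Parse a hypothetical robots.txt that disallows one test directory.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /test/robots-txt/disallow/",
])

# A compliant robot must skip the disallowed path and may fetch the rest.
print(rp.can_fetch("*", "http://www.searchtools.com/test/robots-txt/disallow/page.html"))  # blocked
print(rp.can_fetch("*", "http://www.searchtools.com/test/robots-txt/allow/page.html"))     # allowed
```

A robot that indexes the disallowed page anyway fails this test; the META Robots tag ("noindex", "nofollow") is the per-page equivalent and is checked the same way.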
Following Links
- Frames - check how well indexing robots can index framed documents and noframes, and how they display them when found
- Image Maps - some indexing robots will not recognize links in client-side image maps (server-side image maps are even worse, and no robot will test them for links).
- JavaScript Menu - see if indexing robots recognize JavaScript href menus or follow noscript links
- JavaScript Document.Write - test whether indexing robots can handle complex JavaScript
- Redirect - see how well the indexing robots follow server redirects and META Refresh redirect links.
- Beyond ".html" - will search robots follow links to text pages with different file suffixes, such as .txt, .asp, .cf, .pl, .ssi, .shtml, and .xml?
- Non-text pages - testing whether indexing robots will index binary files such as Acrobat and Microsoft Word (.pdf, .doc, .xls, etc.)
- Non-alphabetic Characters in URLs - links to pages with characters such as !, (), and ~.
- Directory Listings - links to files in a folder automatically generated by the server.
- Relative Links - following both standard and strange relative links.
- Directory Link Depth - how deep into a site will an indexer go? Does it matter whether the directory name is different or the same? I have 20 levels of test documents, each very slightly different.
- Password-Protected pages - some pages may be made accessible to search indexing robots if the search admin gives them the right password (realm: protectallow, user name: robot, password: allow). Others are both disallowed and password-protected, so they should never be indexed.
Indexing Text
- Image Alt Tag Test - see which indexing robots index text in Alt tags for better descriptions of images.
- Comment Test - test if indexing robots index text in comments or follow links in comments.
- Dates - examples of problem dates (old, future, dynamic) and some attempts to insert correct dates for search indexing.
- Extended Character Codes Test - examples and text of non-English characters in Unicode, such as diacritical letters.
- Meta Tag Special Tags Test - some engines can index and retrieve words in the HTML Meta tags, Dublin Core, and more.
- Detecting Duplicate Pages - some indexing systems will recognize duplicate pages and only display one copy. The problem is that some small differences are vital (contract dates, name spellings) while others are irrelevant (auto-inserted dates, copyrights).
- MP3 File MetaData - for a summary of the issues, see our MP3 Search Report.
- NoIndex tags
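The Alt-tag and Comment tests above probe whether a robot reads text that is invisible on the rendered page. A minimal sketch of extracting that text, again with the standard html.parser and illustrative markup:

```python
from html.parser import HTMLParser

class AltCommentExtractor(HTMLParser):
    """Collect image alt text and HTML comment text from a page."""
    def __init__(self):
        super().__init__()
        self.alts = []
        self.comments = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.alts.append(alt)

    def handle_comment(self, data):
        self.comments.append(data.strip())

p = AltCommentExtractor()
p.feed('<img src="logo.gif" alt="SearchTools logo"><!-- hidden note -->')
print(p.alts, p.comments)
```

Whether the extracted alt text improves image descriptions, or the comment text leaks into the index, is what distinguishes one robot from another on these tests.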
Retrieval and Relevance
- Relevance Ranking - looking at how search engines perform relevance ranking -- number of matches, length of file, position on page, meta tags, title and header tags, etc.
- Anchors - testing whether external anchor text is used to index a target page, whether text in anchors is ranked higher than other instances of that text, and whether there's any way to see the closest anchor text as part of search results.
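The ranking factors listed above (match count, file length, position of matches in title versus body) can be illustrated with a toy scorer. This is an illustrative scheme invented for the example, not any engine's actual algorithm; the weight and documents are assumptions.

```python
def score(query, title, body, title_weight=3.0):
    """Toy relevance score: title matches count extra, long documents are penalized."""
    q = query.lower()
    title_hits = title.lower().count(q)
    body_hits = body.lower().count(q)
    length = max(len(body.split()), 1)
    return (title_weight * title_hits + body_hits) / length

# Hypothetical documents: (title, body).
docs = {
    "a": ("Robots and spiders", "robots index pages"),
    "b": ("Home page", "this page mentions robots once in a much longer body of text"),
}

ranked = sorted(docs, key=lambda d: score("robots", *docs[d]), reverse=True)
print(ranked)
```

Document "a" outranks "b" because its title match is weighted heavily and its body is short, mirroring the factors the relevance tests try to isolate.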
Now testing the www.jrank.org hosted search service
Comment on These Tests
Page Created: 2005-05-11