As of January 2012, this site is no longer being updated, due to work and health issues.
In addition to server-wide robot control using robots.txt, web page creators can also specify that certain pages should not be indexed by search engine robots, or that the links on the page should not be followed by robots. The Robots META tag, placed in the HTML <HEAD> section of a page, can specify either or both of these actions.
As of June 2008, the revised Robots Exclusion Protocol added NOARCHIVE, NOODP and NOSNIPPET to the list of values supported by Google and MSN Live Search, and Yahoo added NOYDIR.
The default values are now assumed to be INDEX, FOLLOW, ARCHIVE, ODP, SNIPPET and YDIR. There is no actual need to include these, unless someone on your internal web team needs reminding.
These values are usually combined into one line for all robots. If a robot doesn't understand a directive, it will just ignore it.
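The defaults and the "ignore what you don't understand" rule can be sketched in a few lines of Python. This is only an illustration of how a robot might resolve a content value, not a description of how any particular search engine does it; the names `DEFAULTS` and `resolve_directives` are made up for this example.

```python
# Hypothetical sketch: resolve a Robots META content value against the defaults.
# DEFAULTS and resolve_directives are illustrative names, not part of any real crawler.

DEFAULTS = {
    "index": True,
    "follow": True,
    "archive": True,
    "odp": True,
    "snippet": True,
    "ydir": True,
}


def resolve_directives(content):
    """Return the effective settings for a META robots content string."""
    settings = dict(DEFAULTS)
    for token in content.split(","):
        name = token.strip().lower()
        if name in settings:
            # Positive form, e.g. INDEX or FOLLOW.
            settings[name] = True
        elif name.startswith("no") and name[2:] in settings:
            # Negative form, e.g. NOINDEX or NOSNIPPET.
            settings[name[2:]] = False
        # Anything else is an unknown directive and is silently ignored.
    return settings


if __name__ == "__main__":
    result = resolve_directives("NOINDEX, NOSUCHTHING")
    print(result["index"], result["follow"])  # prints: False True
```

An empty content value leaves every setting at its default, which matches the point that the default values never need to be written out.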
In general, it's better to use the same directives for all robots. While it's possible to include several lines with several robot crawler User-agent names, that might be an indicator of the bad kind of search cloaking (hiding the real page text from the search engine).
If you add Robots META tags to a framed site, be sure to include them on both the enclosing page and the frame content pages. The frameset could have NOINDEX, FOLLOW to avoid picking up any stray text on the frameset page.
Examples
<HEAD>
<title>Should Not Be Indexed Or Followed</title>
<META name="robots" content="NOINDEX,NOFOLLOW" />
</HEAD>

<HEAD>
<title>Rapidly-changing content, search result might be misleading</title>
<META name="googlebot" content="NOARCHIVE, NOODP, NOSNIPPET" />
<META name="slurp" content="NOARCHIVE, NOYDIR, NOSNIPPET" />
</HEAD>
(from my robots test suite)
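To check what a page actually declares, a script can pull the Robots META tags out of the HTML. The following is a minimal sketch using Python's standard `html.parser` module; the class name `RobotsMetaCollector`, the function `robots_meta` and the `ROBOT_NAMES` set are assumptions made for this illustration, and the set only covers the robot names used in the examples above.

```python
from html.parser import HTMLParser

# Hypothetical names for illustration: only the robot names from the examples.
ROBOT_NAMES = {"robots", "googlebot", "slurp"}


class RobotsMetaCollector(HTMLParser):
    """Collect the content of META tags whose name is a known robot name."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not attribute values.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name in ROBOT_NAMES:
            self.found[name] = attributes.get("content") or ""


def robots_meta(html_text):
    """Return a dict mapping robot name to its META content value."""
    collector = RobotsMetaCollector()
    collector.feed(html_text)
    collector.close()
    return collector.found


if __name__ == "__main__":
    page = (
        "<HEAD> <title>Rapidly-changing content</title> "
        '<META name="googlebot" content="NOARCHIVE, NOODP, NOSNIPPET" /> '
        '<META name="slurp" content="NOARCHIVE, NOYDIR, NOSNIPPET" /> </HEAD>'
    )
    for robot, content in robots_meta(page).items():
        print(robot, "->", content)
```

Run over a whole site, a check like this makes it easy to spot pages that give different robots different directives.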
One SEO site had tested various punctuation (commas, semicolons, spaces) inside the meta robots value, and a comma followed by a space between the commands seemed to be the best delimiter. It's also human-readable, always a plus. If anyone knows who they are, please tell me, so I can credit them.
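If the tags are generated by a script or template, the comma-plus-space delimiter can be applied in one place. A small sketch in Python, with the hypothetical helper names `robots_content` and `robots_meta_tag` invented for this example:

```python
def robots_content(*directives):
    """Join directives with a comma followed by a space, skipping blanks."""
    cleaned = [d.strip().upper() for d in directives if d.strip()]
    return ", ".join(cleaned)


def robots_meta_tag(*directives, name="robots"):
    """Build a Robots META tag string for the given robot name."""
    return '<META name="{}" content="{}">'.format(name, robots_content(*directives))


if __name__ == "__main__":
    print(robots_meta_tag("noarchive", "nosnippet"))
    # prints: <META name="robots" content="NOARCHIVE, NOSNIPPET">
```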
Task: Indexer: ignore content; Robot: follow links
Entry: <META name="ROBOTS" content="NOINDEX">
Notes: Use this for pages with many links on them, but no useful data, such as a site map. Because "follow" is the default, you don't have to include it.

Task: Indexer: include content; Robot: do not follow links
Entry: <META name="ROBOTS" content="NOFOLLOW, INDEX">
Notes: Use this for pages which have useful content but outdated or problematic links.

Task: Indexer: ignore content; Robot: do not follow links
Entry: <META name="ROBOTS" content="NOINDEX,NOFOLLOW">
Notes: This is for sections of a site that shouldn't be indexed and shouldn't have links followed. Adding access control, such as a password, is much better for security.

Task: Indexer: include content; Robot: follow links
Entry: <META name="ROBOTS" content="INDEX,FOLLOW">
Notes: This is the default behavior: you don't have to include these.

Task: Search results pages should not show the "cache" link.
Entry: <META name="ROBOTS" content="NOARCHIVE">
Notes: Useful if the content changes frequently: headlines, auctions, etc. The search engine still archives the information, but won't show it in the results.

Task: Search results pages should not display the Open Directory Project (ODP) title and description for the page.
Entry: <META name="ROBOTS" content="NOODP">
Notes: Danny Sullivan provides good examples of how outdated descriptions and even titles show up when the ODP content is used for search results. Encourages search engines to use the page title tag, and the match terms in context or the META Description tag content, instead of the ODP content, which may be misleading or outdated.

Task: Search results pages should not display the Yahoo Directory title and description for the page.
Entry: <META name="ROBOTS" content="NOYDIR"> (Yahoo Slurp robot only)
Notes: Same as above, only for the Yahoo directory; the other search indexers will ignore it.

Task: Search results pages should not display any description or text context for this page. Title only, I guess.
Entry: <META name="ROBOTS" content="NOSNIPPET">
Notes: Encourages the search engines to use the title only, and to suppress the "cache" link. Might be useful if the site has special plus box listings in search results, but otherwise, not so much.
For more information, see the original HTML Author's Guide to the Robots META tag and the HTML 4.01 specification, Appendix B.4.1
Robots Exclusion Protocol, New Agreement, June 2008
For more information about robots on the SearchTools Site:
- Robots Information Page: Summary of the most important things to know about web crawling robots.
- META Robots Tag Page: Describes the META Robots tag contents and implications for search indexing robots.
- Indexing Robot Checklist: A list of important items for those creating robots for search indexing.
- List of Robot Source Code: Links to free and commercial source code for robot indexing spiders.
- List of Robot Development Consultants: Consultants who can provide services in this area.
- Articles and Books on Robots and Spiders: Overview articles and technical discussions of robot crawlers, mainly for search engines.
- SearchTools Robots Testing: Test cases for common robot problems.
Page Updated: 2008-07-03
Avi Rappoport of Search Tools Consulting can help you evaluate your search engine, whether it's on a site, portal, intranet, or enterprise. Please contact SearchTools for more information.
This information copyright © 2000-2011 Avi Rappoport, Search Tools Consulting. Some Rights Reserved, under the Creative Commons Attribution-Share Alike 3.0 United States License. Always attribute copied content to the page's full URL. Permissions beyond the scope of this license are available upon request.