As of January, 2012, this site is no longer being updated, due to work and health issues
Document Date Issues for Search Indexing
Search indexing spiders (also known as robots or crawlers) follow links in HTML pages to find new pages. They also recheck pages they have already indexed to see whether the content has changed. Generally, they do this in one of three ways: fetching the whole page again (HTTP GET); more efficiently, fetching just the headers (HTTP HEAD); or, more efficiently still, sending an If-Modified-Since header (a conditional GET), which returns the whole page only if it has been updated since the spider last asked about it.
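As a rough illustration of the conditional GET decision, here is a minimal Python sketch of what a server might do with the date a spider sends. The function name conditional_get_status is hypothetical, and the sketch assumes the server already knows the page's true last-modified time as a timezone-aware datetime; real servers handle more cases than this.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_get_status(last_modified, if_modified_since=None):
    """Return 304 if the page is unchanged since the client's date, else 200.

    last_modified must be a timezone-aware datetime (an assumption of
    this sketch). if_modified_since is the raw header value, or None.
    """
    if if_modified_since is None:
        return 200  # plain GET: always send the whole page
    try:
        client_date = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore it and send the page
    if client_date.tzinfo is None:
        # A "-0000" offset parses as naive; treat it as UTC here.
        client_date = client_date.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds.
    if last_modified.replace(microsecond=0) <= client_date:
        return 304  # Not Modified: the spider can keep its indexed copy
    return 200


page_date = datetime(2001, 7, 18, 12, 0, 0, tzinfo=timezone.utc)
header = format_datetime(page_date, usegmt=True)
print(header, conditional_get_status(page_date, header))
```

Note that this only saves work when last_modified is honest. If the server reports the request time instead, the comparison never succeeds and every visit becomes a full re-index.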
If the reported date is in the distant past, in the future, or simply the instant of the indexer's request, the indexer wastes cycles re-indexing unchanged content. Worse, it misleads searchers about how current the content is, which is a vital element in assessing the value of a search result. Dates on web servers are not reliable, which is one reason Google's and Yahoo's web search results rarely even show a page date. Enterprise search can do better, if you can make the required changes on the server or publishing side.
Types of Date Problems
When misconfigured, some servers send ludicrously incorrect dates, from before the Web (1969, or 1920, for example) or in the future. Pages indexed from these servers display dates that simply don't make sense.
Even file modification dates are not truly reliable: many systems will reset that date when the file is copied, or opened just for reading.
In addition, many dynamic web servers, including database-backed sites, PHP files, CGIs and SSI, create a new page every time a browser client or robot requests one. These systems set the "modified date" to the current date and time, which is technically correct. That's fine for truly dynamic data (such as airline fares or auction prices), but for more static information, it can be a serious problem. The dynamic date misrepresents the publication and/or modification dates. This means that old material seems new, and new material is lost in the shuffle.
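An indexer can screen for these problems before trusting a date. The following Python sketch is one possible check, not a standard algorithm: the 1990 cutoff, the five-second tolerance, the label strings, and the function name classify_reported_date are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumed cutoff: anything dated before 1990 predates the Web.
WEB_EPOCH = datetime(1990, 1, 1, tzinfo=timezone.utc)


def classify_reported_date(reported, requested_at,
                           tolerance=timedelta(seconds=5)):
    """Label a server-reported date relative to the time of the request.

    Both arguments must be timezone-aware datetimes.
    """
    if reported < WEB_EPOCH:
        return "too-old"  # e.g. 1969 or 1920 from a misconfigured server
    if reported > requested_at + tolerance:
        return "future"
    if abs(requested_at - reported) <= tolerance:
        # Probably a dynamic page stamped at request time.
        return "generated-on-request"
    return "plausible"


now = datetime(2009, 6, 1, 12, 0, 0, tzinfo=timezone.utc)
for d in (datetime(1969, 12, 31, tzinfo=timezone.utc),
          now + timedelta(days=1),
          now - timedelta(seconds=1),
          datetime(2008, 3, 15, tzinfo=timezone.utc)):
    print(d.isoformat(), classify_reported_date(d, now))
```

A "plausible" label only means the date is not obviously wrong. A copied file with a reset modification date would still pass this check.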
Store and Serve Correct Content Update Dates
The best solution is to keep track of significant changes, store the date on which the content was last significantly updated (in the file modification date, the text, the CMS, and/or metadata), and make sure that date is sent by the server when the page is requested.
Significant updates include changes to the file name, URL, text (including correcting spelling), internal links, authorship, affiliations and so on. Insignificant changes include styles, layouts, incoming links, and invisible metadata.
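One way a publishing system could apply this distinction is to fingerprint only the significant parts of a page and change the stored date only when that fingerprint changes. The Python sketch below assumes a page is represented as a dictionary. The field names (url, text, internal_links, author, affiliation) and both function names are hypothetical, not taken from any particular CMS.

```python
import hashlib
from datetime import date

# Hypothetical field names; styles, layout, and invisible metadata are
# deliberately left out so that changing them does not alter the date.
SIGNIFICANT_FIELDS = ("url", "text", "internal_links", "author", "affiliation")


def fingerprint(page):
    """Hash only the fields whose changes count as significant."""
    digest = hashlib.sha256()
    for field in SIGNIFICANT_FIELDS:
        digest.update(field.encode("utf-8"))
        digest.update(b"\x00")  # separator so adjacent values cannot merge
        digest.update(str(page.get(field, "")).encode("utf-8"))
        digest.update(b"\x00")
    return digest.hexdigest()


def updated_date(old_page, new_page, old_date, today):
    """Keep the stored date unless a significant field changed."""
    if fingerprint(old_page) != fingerprint(new_page):
        return today
    return old_date


old = {"url": "/dates.html", "text": "Docment dates", "stylesheet": "a.css"}
restyled = dict(old, stylesheet="b.css")   # insignificant change
corrected = dict(old, text="Document dates")  # spelling fix: significant
print(updated_date(old, restyled, date(2001, 7, 18), date(2005, 1, 1)))
print(updated_date(old, corrected, date(2001, 7, 18), date(2005, 1, 1)))
```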
If the server cannot be forced to send the dates correctly, adding an HTTP-EQUIV meta tag or a metadata tag will give smart search engines the information they need to store a correct date.
While not explicitly part of the HTTP/HTML recommendations, some servers will extract HTTP-EQUIV tags from pages and include them in the headers sent back to the requesting client. You can use these tags to send a more useful date as part of the server response, although it may not get processed correctly for "HEAD" requests. (This solution was suggested by "The Sad State of Dates" in the New Idea Engineering Newsletter.)
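For publishing systems that generate pages from templates, a tag of this kind could be produced along the following lines. This Python sketch assumes the tag mirrors the HTTP Last-Modified header and uses the usual HTTP date format. The function name last_modified_meta is hypothetical, and whether a given server or search engine honors the tag has to be tested case by case.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from html import escape


def last_modified_meta(updated):
    """Build an HTTP-EQUIV meta tag from a timezone-aware datetime.

    Assumption: the tag mirrors the HTTP Last-Modified header.
    """
    # Convert to UTC and format as an HTTP date ending in "GMT".
    http_date = format_datetime(updated.astimezone(timezone.utc), usegmt=True)
    return '<meta http-equiv="Last-Modified" content="%s" />' % escape(
        http_date, quote=True)


print(last_modified_meta(datetime(2001, 7, 18, 12, 0, 0, tzinfo=timezone.utc)))
```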
There are examples of schema Date meta tags in the HTML recommendations, and support for the Dublin Core date tags, which are standard across the Web. Again, these will probably not be sent with "HEAD" requests, but enterprise search engines can extract them during indexing for date accuracy.
When displaying dates, W3C recommends the ISO 8601 format, YYYY-MM-DD. The Dublin Core example looks like this:
<meta name="DC.date" content="2001-07-18" />
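On the indexing side, pulling this date out of a page is straightforward. The Python sketch below uses only the standard library to find the first DC.date meta tag and parse it as an ISO 8601 date. The class and function names are hypothetical, and a real indexer would likely also look at other tags and at dates in the page text.

```python
from datetime import date
from html.parser import HTMLParser


class MetaDateParser(HTMLParser):
    """Collect the first parseable DC.date meta tag in a page."""

    def __init__(self):
        super().__init__()
        self.dc_date = None

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta" or self.dc_date is not None:
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "dc.date":
            return
        try:
            self.dc_date = date.fromisoformat((attr.get("content") or "").strip())
        except ValueError:
            pass  # not an ISO 8601 date: leave the date unset


def extract_dc_date(html):
    """Return the page's DC.date as a date object, or None."""
    parser = MetaDateParser()
    parser.feed(html)
    parser.close()
    return parser.dc_date


page = '<html><head><meta name="DC.date" content="2001-07-18" /></head></html>'
print(extract_dc_date(page))
```

When no usable tag is found the function returns None, leaving the indexer free to fall back on the server's date or to show no date at all.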
SearchTools Date Tests
We have added a simple date test suite at searchtools.com, allowing search engine developers to point the robot spider at the files and see what happens.
Search engine administrators should look at the dates of their pages and see if they are accurate or if they need some work.
Search Tools With Date Solutions
- Google Search Appliance (and Mini) - Interface to get dates from meta tags or text
- Ultraseek - Interface to extract dates from meta tags
Search Tools Consulting's principal analyst, Avi Rappoport, may be available to help you with selection, indexing and search log analysis, as well as relevance evaluation, user experience testing, and functional search engine work. Please contact us for more information.
SearchTools.com - copyright © 2005-2009 by Search Tools Consulting.
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.