As of January, 2012, this site is no longer being updated, due to work and health issues

SearchTools Survey - October 2000

File Formats and Search Engines

HTML is the basic file format of the Web, but we found that half of the sites in the survey are serving some files that are neither HTML nor plain text. Many sites serve cross-platform standard formats such as PDF, PostScript and XML, while others serve office productivity files, including Microsoft Word, PowerPoint, Excel and WordPerfect.

There was some confusion among our survey respondents about file formats: some noted that they serve pages generated by server-side processing (JSP or ColdFusion) or by backend databases. Most site search engines can handle these, because they are HTML pages by the time they reach the client, whether that client is a browser or an indexing robot (crawler). Many of the other formats listed below, however, are true binary files that browsers cannot read on their own.
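The point about server-generated pages can be sketched in a few lines. This is a minimal illustration, not how any particular search engine works: the function name `is_indexable_as_text` and the sample URLs are hypothetical, and it assumes the crawler looks at the `Content-Type` header of the response rather than at the URL's extension.

```python
def is_indexable_as_text(content_type: str) -> bool:
    """Return True when the response media type is HTML or plain text."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in {"text/html", "text/plain"}


# Hypothetical responses: the JSP and ColdFusion URLs still arrive as HTML,
# while the PDF arrives as a binary format.
responses = {
    "/catalog/item.jsp": "text/html; charset=ISO-8859-1",
    "/reports/index.cfm": "text/html",
    "/docs/manual.pdf": "application/pdf",
}

indexable = {url: is_indexable_as_text(ct) for url, ct in responses.items()}
```

Under this assumption, how a page was generated on the server does not matter to the indexer; only the format that reaches the client does.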

Some site search engines will index complex file formats. To serve a matching file, they may send the original to the client and let the browser launch the creating application, or they may attempt to convert the file to HTML and serve that version instead.
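The two serving strategies described above might look like this in outline. This is a sketch under stated assumptions: `SearchHit`, `serve_hit` and the `convert_to_html` flag are hypothetical names, and the "conversion" simply wraps already-extracted text in a bare HTML page, which is far simpler than what a real converter does.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class SearchHit:
    url: str
    serve_as: str  # "original" or "html"
    body: str      # converted HTML, or empty when the original file is sent


def serve_hit(url: str, extracted_text: str, convert_to_html: bool) -> SearchHit:
    """Choose between sending the original file and an HTML rendition."""
    if convert_to_html:
        # Escape the text so characters like "<" do not break the page.
        body = "<html><body><pre>" + escape(extracted_text) + "</pre></body></html>"
        return SearchHit(url, "html", body)
    # Pass-through: the browser receives the file and launches a helper application.
    return SearchHit(url, "original", "")


converted = serve_hit("/docs/budget.xls", "Q1 < Q2", convert_to_html=True)
passthrough = serve_hit("/docs/budget.xls", "Q1 < Q2", convert_to_html=False)
```

Pass-through keeps the file's full fidelity but requires the creating application on the user's machine; conversion works in any browser but may lose layout.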

A few search engines will index image, audio and video file metadata, such as the file name. Virage and Excalibur can index the multimedia data itself, although this requires a significant investment in time and resources.
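Indexing by file name, the simplest kind of multimedia metadata, can be illustrated as follows. The function `filename_terms` and the sample path are hypothetical, and the tokenizing rule (split on anything that is not a letter or digit) is an assumption, not a description of any surveyed product.

```python
import re
from pathlib import PurePosixPath


def filename_terms(path: str) -> list[str]:
    """Split a file name into lowercase index terms, plus its extension."""
    name = PurePosixPath(path)
    # Break the stem on any run of characters that is not a letter or digit.
    stem_terms = [t.lower() for t in re.split(r"[^A-Za-z0-9]+", name.stem) if t]
    extension = name.suffix.lstrip(".").lower()
    return stem_terms + ([extension] if extension else [])


terms = filename_terms("/media/Product_Launch-2000.mov")
```

A search for "launch" could then match the file even though nothing inside the video itself has been analyzed.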

Note that web-wide search engines such as AltaVista, Inktomi and Google will not currently index anything beyond HTML and text files.

Format                  Without search   With search
HTML                    414              259
PDF                     171              152
text                    147              119
Word                    115              82
PowerPoint              57               65
Excel                   60               56
XML                     41               49
PostScript              20               29
WordPerfect             12               17
Lotus 1-2-3             4                11
Flash                   1                2
multimedia              1                2
SGML                    0                2
QuickTime               1                1
AVI                     1                1
RTF                     1                1
Zip                     0                1
Brad                    0                1
RealAudio               0                1
chemical formats        0                1
Applix                  0                1
StarOffice              0                1
Quark                   0                1
WordPro                 0                1
FFT                     0                1
RFT                     0                1
icl                     0                1
compressed files        0                1
email files             1                0
downloading EXE files   1                0
HKE                     1                0
Domino .nsf             1                0
audio                   1                0
VIV (Vivo)              1                0
MPEG                    1                0
publisher 98            1                0
af3 (ABC Flowchart)     1                0
dot (GML)               1                0
PTML                    1                0
WAV                     1                0
MP3                     1                0

This survey is copyright © 1998-2003 by Search Tools Consulting, and all rights are reserved. The survey was designed, analyzed and reported by Avi Rappoport. Personal information in the survey will be kept private at all times. For reprint permissions or survey aggregate data purchase, please contact Search Tools Consulting.

Avi Rappoport of Search Tools Consulting can help you evaluate your search engine, whether it's on a site, portal, intranet, or enterprise. Please contact SearchTools for more information.

This information copyright © 2000-2011 Avi Rappoport, Search Tools Consulting. Some Rights Reserved, under the Creative Commons Attribution-Share Alike 3.0 United States License. Always attribute copied content to the page's full URL. Permissions beyond the scope of this license are available upon request.