As of January, 2012, this site is no longer being updated, due to work and health issues
Overview Books, Articles and Reports
Annotated links to books, reports and articles which provide general information
about web site, intranet and portal search engines, including how to evaluate
and choose software. See also Background Information listings, Links, Training, Newsgroups & Mailing Lists, Comparative
Reviews, and Tools Listings.
For a local overview, see the Guide to Search
- Autonomy, Endeca rate among top enterprise search vendors NetworkWorld, July 2, 2008 by Jon Brodkin.
Provides summary of a Forrester Enterprise Search report ($795)
- Updated 2008 Enterprise Search Vendor Roundup by Miles Kehoe and Mark Bennett, Enterprise Search Blog, by New Idea Engineering, January 10, 2008
Short descriptions of enterprise search vendors, includes information about the acquisition of FAST Search and Transfer by Microsoft.
- Inverted files for text search engines by Justin Zobel and Alistair Moffat
in ACM Computing Surveys. 2006;38(2) (56 pages).
Available in PDF format: http://doi.acm.org/10.1145/1132956.1132959 It's $10 if you don't have an ACM account and you have to register on the site. There also seems to be a copy of the PDF file on CiteSeer.
- This is great: exhaustive and detailed, with both practical and theoretical information, including finding that inverted indexes are both significantly faster to search and easier to maintain than relational database management systems, signature files and suffix arrays. It also has a thorough annotated bibliography. Best of all, Zobel and Moffat agree with me on lowercasing all words in the index and including stopwords, which they say "have an important role in phrase queries".
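To make the point about lowercasing and stopwords concrete, here is a minimal sketch of a positional inverted index. It is an illustration only, not code from the Zobel and Moffat survey; the function names (`tokenize`, `build_index`, `phrase_search`) and the sample documents are assumptions made up for this example. Because stopwords such as "to", "be" and "the" stay in the index, phrase queries made largely of stopwords can still be answered.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase every word and keep stopwords in the token stream.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    # Map each term to {doc_id: [positions]} so phrases can be checked later.
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def phrase_search(index, phrase):
    # Return sorted ids of documents that contain the terms consecutively.
    terms = tokenize(phrase)
    if not terms or any(t not in index for t in terms):
        return []
    # Only documents containing every term can possibly match.
    candidates = set(index[terms[0]])
    for t in terms[1:]:
        candidates &= set(index[t])
    hits = []
    for doc_id in candidates:
        for start in index[terms[0]][doc_id]:
            if all(start + i in index[t][doc_id] for i, t in enumerate(terms)):
                hits.append(doc_id)
                break
    return sorted(hits)


# Hypothetical sample documents, invented for this sketch.
docs = {
    1: "To be or not to be",
    2: "The Who played live",
    3: "Who is to be invited",
}
index = build_index(docs)
```

With this index, `phrase_search(index, "The Who")` matches only document 2, which would be impossible if "the" had been dropped as a stopword at indexing time.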
- Search Moves Well Beyond Google Information Week, June 12, 2006 by Thomas Claburn
Describes how FirstGov has switched from a FAST implementation by IBM to a combination of MSN Search and Vivisimo Search and Clustering, which now brings back results from local jurisdictions as well as the federal government. The administrator in charge likes the clustered category links. Mentions Google Search Appliance market share, and Autonomy's powerful but complex features. One search administrator points out the need to invest resources to make best use of search systems.
- A sharp eye for details: Enterprise search systems must scour more than ever before Government Computer News, May 22, 2006 by S. Michael Gallagher
Describes some of the specific demands of government business processes, and how some features such as video search are being incorporated into enterprise search by companies like FAST. Divides search into four categories: departmental; general-purpose enterprise search; customizable specialized search; and document management systems with search. Describes federated search from OpenText and Documentum, Vivisimo as well as Google's Search Appliance OneBox technology. In contrast, EasyAsk emphasizes multiple query and navigation depending on the data. The article also describes the value of entity extraction and taxonomy to improve search results.
ignites search technology Network World, September 13, 2004 by Ann
An overview of the problems enterprises have with finding information, including
complex queries and multiple repositories. Search administrators recommend
trying before buying, defining problems such as dynamic updates, repetitive
language, context, scale and query options. Search engines mentioned include Ultraseek, Autonomy,
the Google Search Appliance, and iPhrase.
- Unstructured Information Management Report Infosphere, March, 2003
by Magnus Stensmo and Mikael Thorson, $325/€295
for a single PDF license
Long analysis of the newly-named Unstructured Information Management
sector, which includes sophisticated search, categorization and visualization
tools. Case studies include medical and patent research. Describes text analysis
processes for information discovery, including rules-based vs. machine learning
techniques, used for assigning documents to taxonomic categories and generating
- Report analyzes 40 vendors, using the criteria of Search & Retrieval,
Information Extraction, Categorization, Clustering, Taxonomy Management,
Visualization, Other Apps, Completeness of Vision and Ability to Execute.
- Highest scores to Autonomy and Verity in the search market, Stratify,
nStein and Inxight in categorization,
Kartoo and Antarctica in visualization.
- Vendors covered: 80-20, APR
(SmartLogik), ClearForest, Convera, Endeca, Endymion, Entrieva, FAST, Google Search Appliance, Hapax,
IBM, Insightful, Intelligenxia, Intelliseek, Inxight, Jeeves
Solutions, Kartoo, LingoMotors, Megaputer, Microsoft,
Mohomine, Nstein, Open Text, Oracle,
Readware, Semagix, SER, Speed of Mind, SPSS, SRA, Stratify, Temis Group,
Teragram, Text Analysis International, Triplehop, Verity,
Are Here - Still lost? A cadre of new companies want to show you the way. New Architect January 2003, by David Howard
Discusses new approaches to site search, surveys search technologies with
sample vendors. Describes the SinglePoint approach, which uses classification,
term frequency, inverse database frequency, length, timeliness, keyword prominence
and positioning, and ways for users to disambiguate queries. iPhrase and InQuira offer natural-language processing
of search queries, although it's difficult to get users to enter full sentences.
Mentions VIMA's image-recognition software, search engine optimization techniques
and email search.
On Information Week, January 20, 2003 by Tony Kontzer
Discusses the need for useful search engines in corporate intranets. Describes
experiences in Ford's Learning Network using Autonomy;
Bank One and Kaiser Permanente using the Google
Search Appliance for simplicity, low cost and speed; KPMG UK's use of Verity K2 for sophisticated taxonomy
and social networking; Gateway's implementation of iPhrase for support technicians; and EDS using Recommind for role-related search results.
- Searching for Order Government Executive Magazine, January 15, 2003
by Karen D. Schwartz
Government agencies have significant information needs, especially intelligence
analysts investigating possible attacks. Article covers both simple and
complex search and retrieval approaches. The US Defense Technical Information
Center uses software from Thunderstone, Convera and Verity K2, which has
personalization features designed to predict what users will need based on previous
searches. These retrieval engines often integrate with security systems.
Alternatively, some US military and intelligence organizations use web-like
search engine software, including the Google Search Appliance, which can
index external sites. Tim Hoechst of Oracle recommends CMS for internal
core documents and a secondary search engine for additional content. This
may address the problems with finding data in various repositories within
an agency, and across agencies. The US State Department is using Convera
RetrievalWare to provide a single point of access to a distributed system.
Describes the GILS (Government Information Locator Service) standard for
databases and searching, which provides metadata and standardization.
25 E-Commerce Search Engines $99 from 37Signals, January 2003
The research firm systematically evaluated search on online stores.
Criteria were accuracy and relevance for simple searches, handling misspellings,
responding to "mixed" specifications (such as color, size and
material in the same search), automatically expanding to synonyms and related
terms, providing options for sorting and filtering results, and handling
failed searches where no matches were found. They found that 92% of the
commerce sites (including Lands' End, Amazon, Wine.com, QVC and the Apple
Store) found relevant results for standard searches, but most had significant
problems with the other tests. Includes detailed analysis and screenshots
of the results, and rating for each site.
still haven't found what I'm looking for... Search engine technology works
both ways Newspapers & Technology, December 2002 by Hays Goodman
Describes webwide search engine submission issues, and site search tools
for news sites. Examples are Atomz at
Cincinnati.com, which replaced Netscape and Excite search engines, and was
so successful that it is now being used at many of the Gannett newspaper
sites. The Deseret News has used Convera RetrievalWare since 1998, apparently on both internal and public sites.
Data Management: the elephant in the corner (customer access required) the451 Report, November 2002 by Nick Patience and Rachel Chalmers
Describes the huge amounts of unstructured data in enterprise computer networks,
wasted time re-creating this information, and the lack of tools equivalent
to data mining, business intelligence and OLAP. Identifies four application
sectors: content, document and knowledge management; search and retrieval;
categorization, taxonomy and data visualization; XML databases. They evaluate
these sectors from a business perspective, defining strengths, trends and
leading vendors. Point out that the Web expanded the size of the search market
but did not sustain it, while the categorization market is volatile, with
many small and recently-acquired companies. Analysts believe that the leading
relational database vendors (IBM, Oracle and Microsoft) may be able to lead
the unstructured data market as well. Describes Verity
K2 as an integrated search and taxonomy system, preferred over Autonomy,
which is a "Rolls Royce" company in knowledge management and collaboration,
and Ultraseek (Inktomi Enterprise Search),
even with the Quiver taxonomy engine. In Categorization and Visualization, they feel that Inxight
is the best among the field including Antarctica, Applied Semantics, ClearForest,
IBM Lotus Discovery, Mohomine, Entrieva (Semio), Stratify and The Brain.
for Value in Search Technology ($40) Gilbane Report Vol 10, Num 7, September 2002 by Sebastian Holt
Analysis of the search market starts with the insight that search is ubiquitous,
but can never come up to user expectations. As a business, search has become
a commodity, so search engine vendors are trying to both improve algorithms
and show the meaningful value to customers. Describes various taxonomy and
clustering techniques, strategies to make search better such as personalization,
examples, taxonomies, and browsing results. Compares manual editing, mainly
for categorization and taxonomies, to automatic processes. Predicts that vendors
will extend search to related topics such as contextual advertising, business
intelligence and domain name suggestions. Highlights innovative search engine
approaches including Albert natural-language
search, ClearForest entity extraction,
and solution providers Autonomy comprehensive
information management, Applied
Semantics ontology and automated classification, and DreMedia video indexing.
- In Search of Search Solutions ($25) Gilbane Report Vol 10, Num 3, April 2002 by Sebastian Holt
Considering the issue of search engines for finding information, this
report attributes the main factors driving growth in the market to significant
increases in the amount and variety of content (now including multimedia,
metadata and structured data), and to the number and variety of users of search
engines. Describes the process of gathering data and converting formats for
indexing, including structural and semantic analysis. Summarizes query processing,
retrieval, relevance ranking, and metasearch (federated search). Provides
a table based on content type and sophistication of search algorithm, and
marks the general capabilities of 32 search engines and taxonomy categorization
tools. Refers to the "principle of least surprise" to recommend
predictable software over products which may be spectacular but unreliable.
Recommends that purchasers test search engines with real data and understand
the weaknesses of the software.
& Content Classification: Delphi Group Report (guest or customer access
required) Delphi, April 11, 2002
Defines the increasing need for tools to organize information and avoid overload,
especially with ambiguous words. Covers the evolution of technologies from
simple to more complex search engines with extreme recall, metadata search,
link ranking, and taxonomy-creation software for hierarchical structures.
Describes how people find information, both for known item searches and for
discovery about a topic, where arrangements of subject categories trigger
associations and relationships. Interactive and iterative browsing within
categories helps locate dynamic data which otherwise might be hard to find.
Searching within categories or zones helps avoid false matches. Taxonomies
can integrate with internal applications such as proposal-generation, CRM
and data mining. Provides a useful checklist for analyzing taxonomy tools.
A survey of 450 executives, IT staff or managers at large enterprise organizations
with at least 50,000 documents shows that most spend more than 2 hours per day
searching for information, 73% say that finding information is difficult,
and the main impediments are "bad tools" (28%) and "data changes"
(35%). Describes the principles of automatic classification and taxonomy-building,
with details on eleven products. NOTE: vendors paid Delphi to be included
in this report.
Searching Is No Longer Enough. Internet Retailer; April 2002 by Kurt
Discusses online store search engines: Mercado, EasyAsk, Endeca, and Netrics,
referring to the 2001 Forrester report. Describes
customer needs for technology that allows both searching and browsing by category
or product attribute. Includes evidence of the good results at Tower Records
with Endeca, but finds few examples of faceted metadata usage. Includes additional
references to studies of e-commerce search engine demands and limits, suggesting
significant changes from the traditional "relevance ranking" approach.
Changes include recognizing the problems of a long search result, redirecting
searches for brands or products not carried to similar available products,
sorting and drilling down on product attributes, spelling correction and synonyms.
Wave Of Innovations Heats Up Site Search (customer access required) Forrester
TechStrategy Brief, March 1, 2002 by Harley Manning with Christopher Mines
Describes several new features of site search including better relevance as pioneered by the public Google search engine and the Google
Search Appliance (does not cover whether it works in controlled link environments), improved forgiveness to handle misspellings and mismatched vocabularies
as implemented by Netrics, automated
intelligence - classification and categorization
engines such as Autonomy and the new ClearForest, and creative pricing, mentioning EasyAsk,
which, in one installation, received a percentage of increased revenue. Describes
the market difficulties including the high number of vendors and shrinking
corporate site budgets.
Future of Search e-doc Enterprise Content Management at Work; January-February
2002 by Dan Agan
Convera marketing director describes the capabilities of enterprise search
engines in increasing recall for search results. Also covers future advances,
such as searching databases, multimedia, email, web and binary documents at
the same time: correlating information from one source with other sources
such as customer transactions, email, and research reports. Advanced computational
linguistics and document vectors could make new connections that were unknown
when the original text was written. Visualization tools could organize results,
show concept clusters and link knowledge within an organization. Multimedia
searching will make video, audio and image files much more useful.
the Firewall - Buying the Proper Search Solution for Your Intranet EContent Magazine; February 2002, by Martin White
Begins with the evolution of search engines from the 1970s through the Web
and to today's Intranets. Describes the high expectations for Intranet search,
the problems of expressing a search in useful vocabulary, defining the appropriate
search engine and presentation of the results to the user in a meaningful
relevance order. Covers the difficulties of evaluating a search engine with
a standard test set or a subset of content: recommends using staff such as
librarians, and expert consultants to help locate a workable system.
Your Site's Search (guest or customer access may be required) Forrester
Report, $595, December 2001 by Kyle Johnson
In covering site search, this report recommends distinguishing ecommerce
product search from customer service / tech support search from information
fulltext search, and choosing a search engine for each function separately.
Describes current state of site search, including interviews with search
admins, discussion of common problems, recommendations for cleaning up content
such as page titles and no-matches pages,
and improving focus. Includes general descriptions of many search engines.
Them to What They Want User Interface Engineering Report, $24, November
2001 by Erik Ojakaar and Jared M. Spool
Provocative research report describes the problems with badly-designed search
engines, such as commerce searches that do not include site information
(such as an Amazon.com search for "return policy"), and those
which are not tolerant of spelling and vocabulary variations.
the Right Search Engine WebTechniques, September 2001 by Steven Champeon
Good background on the considerations required in choosing a search engine,
from size and location of indexing through customization, search zones,
field and metadata searching. Also covers complex search functions, proximity,
fuzzy, concept, synonyms, stemming and regular expressions.
Seeking Search Technology [Commentary] BusinessWeek Online, September
24, 2001 by Robert D. Hof
Quotes eminent analysts from Jupiter, Patricia Seybold Group and Forrester
to support the value of a good search engine on commerce web sites in particular.
Recommends ultrafast updates, as exemplified by FAST on eBay, tolerance
of misspellings [and typos], synonym recognition such as EasyAsk on Lands' End,
and search fields on every page, like Ritz Interactive. Also suggests using
Amazon-like recommendations and providing information stored in private
product databases to web search indexers such as Google.
far and wide for the right data InfoWorld, August 27 / September
3, 2001 by Cathleen Moore.
Describes the value of search engines and categorization as essential elements
of corporate portal infrastructures, to handle the "deluge" of
information within enterprises. Quotes Aberdeen analyst Guy Creese who points
out that without a good way to search, corporations would be "blowing
their investment in the content". Covers recent announcements of search
and categorization features by Autonomy, Verity, AltaVista, iPhrase, and Smartlogik.
Portals: The Current Big Thing [Survey Results] InformationWeek,
July 23, 2001
Describes results of a survey of 100 IT and business professionals, whose
companies hope for better productivity and efficiency using a portal approach.
The most search-related aspect is the desire to "improve decision-making"
(around 75% report this as a goal). Enterprise Portals are used mainly by
employees, but half are used by customers and/or business partners, so security
must be a major element. Budgets for enterprise portals are low, 1% to 5%
of overall IT spending, and very few companies are delivering "richer"
content (multimedia, presumably) due to cost, privacy and bandwidth concerns.
Visible and Simple Useit.com Alertbox, May 13, 2001 by Jakob Nielsen
Provides simple rules for search interfaces, with results of research showing
that users are impatient and quick to give up searching when they encounter
problems, type very short words, and rarely look beyond the first page of
Search For Success InternetWeek, March 29, 2001 by Jody Dodson
An expert in customer service for Internet business points out that fixing
the site search may be much more cost-effective than complex CRM solutions.
He recommends considering outsourcing using Ask Jeeves or a similar service;
providing a content directory as well as a search engine; matching search
form to function; knowing the audience and providing the appropriate search
capabilities (such as product codes); and explaining the search functionality
Up the Search Engines to Keep the E-Aisles Clear New York Times,
February 28, 2001 by Lisa Guernsey (registration may be required to read
Discusses the difficulty of locating items in online stores, referring to
the Forrester report of last spring (see below).
Describes the use of thesaurus tools for synonym searching and taking advantage
of database structure in online stores. Quotes the vendors Mercado,
which provides search for WebVan and Tower Records, and EasyAsk,
as well as the chief scientist at Verity.
Web Sites With Depth Web Techniques, February 2001 by Jakob Nielsen
and Marie Tahir
In a discussion of e-commerce sites, these analysts point out that search
engines are an area that could be a strength of online business, but are
generally a waste of time. They recommend making sure that the search engine
covers the "nonproduct needs" such as how to pay, check a gift
registry, and return items. They suggest designing thoughtful results, especially
when there is no item that matches a search exactly (see our report
on No Matches Pages). Another way to reduce the number of results is winnowing, allowing users to narrow down the list.
of Quality Search Clickz, January 18, 2001 by Gerry McGovern
Excellent summary of the basics of a site search engine: covers metadata,
search form interfaces, and search results pages.
Search of Intelligent Search Interactive Week; November 6, 2000 by
Excellent discussion of the problems with standard search engines, including
short critiques of search engines on various sites, including REI, the FBI,
the White House, Office Depot. Also covers how the webwide search engines
deal with search relevance, how categories can help, the Forrester search report and future
search engines get smarter eWEEK; October 27 2000 by Grant DuBois
Short description of two search engines (Mercado IntuiFind and empolis orenge) which work with e-commerce
catalogs to provide better results for searches.
User Loyalty by Improving Search Capability ClickZ Today October
18, 2000 by Paul Bruemmer
Inspired by the Forrester search report, this article
includes marketing information from several search engine developers, including
AltaVista, Mohomine, and Twirlix.
- Search Engines:
The Hunt Is On Network Computing Magazine: October 16, 2000 by Avi
In-depth discussion of search engines for e-commerce and other web sites
covers features and future trends, software vs. services, database vs. text
searching, and open-source
search engines covering ht://Dig and mnoGoSearch (formerly UdmSearch).
The testing included indexing over 150,000 pages, and covered administration
tools, customization, search features, relevance ranking and search logs.
Products were Ultraseek (then Inktomi
Search) (which won Editor's Choice), AltaVista
Search, and Excalibur RetrievalWare,
services were Atomz Enterprise Search and Searchbutton Corporate, which
has since addressed some of the shortcomings reported. Also included an email
poll of Network Computing readers.
Search Of... If you want good search on your site, commit to doing it right
- unless you want to alienate visitors Industry Standards, October
9, 2000 by G. Patrick Pawling
Describes problems with site search, such as searches which expand names
too far (Seger to Segarra for example) and summarizes the Forrester Report on Search. Reports that one company
put "buy" buttons on search results pages and found 30% of its
orders came from there. Mentions Mercado options to adjust search results for e-commerce purposes. Describes Mohomine
automated summarization and categorization tools. Quotes Jupiter Communications
analyst Lydia Loizides as estimating the cost at between $50,000 and $2
million. Describes alternate approaches, such as a conversational or interview-driven
search, and choosing an area, such as multimedia, to reduce the number of
For A Better Match Inter@active Week, July 31, 2000 by Charles Babcock
Describes problems with searching and several approaches to improving
results, from implementing better sets of keywords and structured data to
using Information Retrieval algorithms
to define concepts and clusters.
'Search and Split' Real Estate Online, July 21 2000 by Jeff Linnell
Looking at site searching for real estate web sites -- recommends MondoSearch for results categorization features.
Site Search Engines Byte.com WebTools, July 20 2000 by Bruce Stewart
Nice description of the process of registering and implementing a remote
search service, with a listing of seven services and summary of their features.
Search Stink? Forrester Report, June 2000 by Paul Hagen
Focused on ecommerce and B2B sites, this report describes the importance
of site searching, and the problems with standard search engines. They emphasize
that a simple term frequency algorithm for relevance rankings will often
fail to return the best matches at the top of the list. It also points out
how important content management, metadata and information architecture
are for good search results.
- Recommendations include building a vocabulary and synonym listings so
that searches for a specific term will find pages with all variants and
equivalent terms, improving content management, and implementing good
user interfaces to the search engine. They even have a section on the
benefits of fixing search, showing how it makes bottom-line sense.
- Cost of search, mainly for e-commerce sites, is given at $150,000 for
a search engine, $150,000 to integrate with existing databases and $60,000
for user interface and testing, along with an estimate of $4 per page
or item for page titles, descriptions, removing duplicates and creating
a controlled vocabulary.
- SearchTools Analysis of the Forrester Report
- I like this report a lot: it's clear on how important site search is
and how traditional algorithms fail to retrieve and sort the results well.
However, they don't emphasize the special issues that may arise in searching
structured data (such as product catalogs), and they mix up retrieving relevant items with ranking (so that the relevant items appear on
the first page). In addition, they don't understand how much people like
a simple search field -- it's a Web convention that is now inescapable.
And the cost is appropriate for a large e-commerce site, rather than a smaller
or simpler site, such as an online magazine, small store, or corporate site.
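The Forrester recommendation to build a vocabulary and synonym listing, so that a search for one term finds pages using any equivalent term, can be sketched in a few lines. This is a rough illustration under stated assumptions, not the method of the report or of any product it covers: the `SYNONYMS` table, the `normalize` and `search` functions and the sample pages are all hypothetical.

```python
import re

# Hypothetical controlled vocabulary: each variant maps to a preferred term.
SYNONYMS = {
    "sofa": "couch",
    "settee": "couch",
    "couch": "couch",
    "laptop": "notebook",
    "notebook": "notebook",
}


def normalize(text):
    # Lowercase, split into words, and map each word to its preferred term.
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [SYNONYMS.get(w, w) for w in words]


def search(pages, query):
    # Return sorted ids of pages that contain every normalized query term.
    wanted = set(normalize(query))
    return sorted(
        page_id
        for page_id, text in pages.items()
        if wanted <= set(normalize(text))
    )


# Hypothetical catalog pages, invented for this sketch.
pages = {
    "a": "Leather sofa with oak legs",
    "b": "Two-seat settee in grey",
    "c": "Lightweight laptop bag",
}
```

Applying the same normalization to both the pages and the query is the design choice that matters here: a shopper who types "couch" reaches the "sofa" and "settee" pages even though neither contains the word typed.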
keep users happy, ensure the search tool is matched to your audience's needs Infoworld, May 5, 2000 by Laura Wonnacott
Nice summary of the importance of implementing a usable search engine,
including the forms of the queries, results options, search failure and
Search to Your Site webmonkey, March 1, 2000 by Avi Rappoport
Describes remote search hosting services,
how they work, how to set up your site and customize the look and feel.
Older but still useful listings: overviews
of search engines from 1996-1999.
Page Updated 2008-09-17