As of January, 2012, this site is no longer being updated, due to work and health issues
Search Tools Product Report
Google Search Appliance (GSA) and Mini Search Appliance
Google Search Appliance product info
Mini Search Appliance product info
See also: Google Custom Search services
Platform: Google's customized Linux on supplied hardware
Prices: (note that appliances are not supported after two years: customers are expected to replace them)
Mini Appliance: 1 year with tech support: $2,000 for up to 50,000 documents; $3,000 for up to 100,000; $6,000 for up to 200,000; $9,000 for up to 300,000. Additional single year of tech support: $1,000.
$30,000 for up to 500,000 documents and two years of tech support, no additional pricing given
GB-5005 2007 pricing not given (old pricing: $230,000, includes one or two collections
of up to 1.5 million documents each, secure system crawling)
GB-8008 2007 pricing not given (old pricing: $450,000 for an 8U server rack with
secure system crawling, additional load balancing features, and capacity for
up to 5 collections of 4 million documents each).
- Educational discounts are available
- Finding content
Fast and flexible link crawler follows web links using Google robot technology.
Excellent control of exactly which servers, hosts, directories and URLs are
indexed or excluded, with wildcard and regular expression support, along with interactive testing of the patterns.
- Supports proxies and redirection.
- File system crawling using directory browsing (aka web-enabled file system) using Microsoft IIS, SMB and CIFS (Common Internet File System) over TCP/IP.
Note that file system crawling does not include security and access controls.
- Crawls can be scheduled as full crawls or run as continuous incremental crawls, with options to specify which URL patterns should get frequent or rare revisits. Adaptive recrawling, based on change interval, is between 2 days and 20 days.
- Excellent introductory document on the Google site: Administering Crawl
Crawls password-protected sites, including Basic Authentication (HTTP passwords), forms-based authentication, and NTLM versions 1 and 2.
- Multiple "collections" are subsets of the main index, for searching within zones, directories, or other url-based patterns.
- Real-time reporting of indexing progress, including special listings
for locations and problems.
- Offers a drill-down display only, no overall report
- Note: the Mini seems to have a 20-minute delay before crawling or recrawling any specified URL.
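The include/exclude control described above can be pictured with a short sketch. It is an illustration only: the pattern syntax used here (shell-style wildcards, plus a `regexp:` prefix for regular expressions), the function names and the example URLs are assumptions made for this example, not the appliance's actual pattern language.

```python
import re
from fnmatch import fnmatchcase


def matches(pattern: str, url: str) -> bool:
    """Return True if the URL matches a single pattern.

    Hypothetical syntax: a 'regexp:' prefix marks a regular expression;
    anything else is treated as a shell-style wildcard pattern.
    """
    if pattern.startswith("regexp:"):
        return re.search(pattern[len("regexp:"):], url) is not None
    return fnmatchcase(url, pattern)


def should_crawl(url: str, include: list, exclude: list) -> bool:
    """Crawl a URL only if it matches an include pattern and no exclude pattern."""
    if any(matches(p, url) for p in exclude):
        return False
    return any(matches(p, url) for p in include)


# Example patterns (hypothetical hosts and paths).
include = ["http://intranet.example.com/*"]
exclude = ["*/private/*", r"regexp:\.(zip|exe)$"]

for url in (
    "http://intranet.example.com/docs/guide.html",
    "http://intranet.example.com/private/salaries.html",
    "http://intranet.example.com/files/setup.exe",
    "http://other.example.com/index.html",
):
    print(url, should_crawl(url, include, exclude))
```

Checking exclusions first means an exclude pattern always wins over an include pattern, which is one reasonable reading of "indexed or excluded"; the appliance's own precedence rules may differ.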
- GSA special features:
- Advanced security and access control
- Automated feed system pushes URLs and content to the search indexer
- Web feeds - a list of URLs and optional metadata for the web crawler
- Content feeds - a list of URLs, their associated content, and optional metadata (the normal crawler does not read the actual content at that URL). This is great for database indexing.
- Feeds give search admins much more control over what gets indexed and exactly when.
- Feed format can be HTML or XML, using HTML forms, Python, C#, etc., and must come from a trusted IP address.
- Feed Protocol Developer Guide
- Database Crawler
- Sends SQL queries to Oracle, IBM DB2, Microsoft SQL Server, MySQL, and Sybase databases and indexes the results.
- Connector Framework and Third-party connectors for SharePoint, Lotus Notes/Domino, Documentum and LiveLink.
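To make the content-feed idea above more concrete, here is a minimal sketch that turns database rows into an XML feed document containing each URL, its content and optional metadata. The element and attribute names (`gsafeed`, `header`, `datasource`, `feedtype`, `group`, `record`, `meta`, `content`) are assumptions for illustration and should be checked against the Feed Protocol Developer Guide; the function name and sample rows are hypothetical. The sketch only builds the document and does not send it anywhere.

```python
import xml.etree.ElementTree as ET


def build_content_feed(datasource: str, records: list) -> str:
    """Build an XML content feed from a list of record dictionaries.

    Each record needs 'url' and 'content'; 'metadata' is an optional dict.
    Element names are assumed for illustration, not taken from this report.
    """
    root = ET.Element("gsafeed")
    header = ET.SubElement(root, "header")
    ET.SubElement(header, "datasource").text = datasource
    ET.SubElement(header, "feedtype").text = "incremental"

    group = ET.SubElement(root, "group")
    for rec in records:
        record = ET.SubElement(group, "record", url=rec["url"], mimetype="text/html")
        metadata = ET.SubElement(record, "metadata")
        for name, value in rec.get("metadata", {}).items():
            ET.SubElement(metadata, "meta", name=name, content=value)
        # The content travels inside the feed, so the crawler never fetches the URL.
        ET.SubElement(record, "content").text = rec["content"]

    return ET.tostring(root, encoding="unicode")


# Hypothetical rows, as they might come from a product database.
rows = [
    {
        "url": "http://db.example.com/products/1001",
        "content": "<html><body>Widget, blue, 10 cm</body></html>",
        "metadata": {"category": "widgets"},
    },
    {
        "url": "http://db.example.com/products/1002",
        "content": "<html><body>Gadget, red, 5 cm</body></html>",
    },
]

feed_xml = build_content_feed("product_db", rows)
print(feed_xml)
```

Because each record carries its own content, the search admin decides exactly what text is indexed for a URL and when it is pushed, which is the control the feed system is meant to provide.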
Handles over 200 file types including HTML, PDF, PostScript, RTF,
Microsoft Word, Excel, PowerPoint, WordPerfect, and Lotus (using their own converters for common formats and the Oracle (Stellent) Outside-In file format readers for the obscure ones).
- Note: in version 4.6.4, Microsoft Office 2007 is not yet supported.
Identifies 28 languages and most current character sets, and stores that info with the document.
- Indexes metatag data and document properties
- Does not index punctuation
- Indexes HTML documents up to 2.5 MB, ignores the rest.
- Non-HTML documents up to 30 MB: converts to HTML; if shorter than approx. 4 MB, indexes the first 2 MB, and if larger than 4 MB, ignores the document.
- Documents larger than 30 MB: ignores.
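The size limits above can be restated as a small decision function. This is a sketch under stated assumptions: that "MB" means 1024 × 1024 bytes, that the 4 MB threshold applies to the size of the converted HTML, and that the 30 MB cutoff applies to every document type. The function name is hypothetical, and the thresholds are the approximate figures given in this report.

```python
from typing import Optional

MB = 1024 * 1024  # assumption: binary megabytes


def indexed_bytes(is_html: bool, size: int, converted_size: Optional[int] = None) -> int:
    """Return how many bytes of a document would be indexed under the rules above.

    size           -- size of the original document in bytes
    converted_size -- size after conversion to HTML (non-HTML documents only)
    """
    if size > 30 * MB:
        return 0  # documents larger than 30 MB are ignored
    if is_html:
        return min(size, int(2.5 * MB))  # the rest of the page is ignored
    if converted_size is None:
        raise ValueError("converted_size is required for non-HTML documents")
    if converted_size > 4 * MB:
        return 0  # conversion result too large: whole document ignored
    return min(converted_size, 2 * MB)  # only the first 2 MB is indexed


print(indexed_bytes(True, 5 * MB))            # large HTML page, truncated
print(indexed_bytes(False, 10 * MB, 3 * MB))  # PDF converting to 3 MB of HTML
print(indexed_bytes(False, 10 * MB, 5 * MB))  # PDF converting to 5 MB of HTML
```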
- Query Processing
Defaults to matching all words in the query
- Matches without case-sensitivity
- Uses the Google query language, including Internet Query Operators - (minus) and "" (quotes), along with the Boolean OR
- Advanced search includes special fields such as URL and title, just like the google.com search
- Metadata search options include exact matching, partial matching and whether the metadata field exists
- Displays spellchecker "did you mean?" for misspelled or mistyped words.
- GSA: Query stemming and synonym lists for English, French, Italian, German, Spanish, and Portuguese.
Retrieves all pages matching every query term
- Includes duplicate pages, but hides them by default, with a parameter to override this
- Search Suggestions are called "KeyMatches" and look like Google ads -- there's a nice web-based interface to edit them, or the appliances will import them from comma delimited files.
- Synonyms for acronyms and jargon, displayed as suggested alternative queries
- "One Box" option for real-time queries to databases
- LDAP and other directories, Cognos Business Intelligence modules, Lotus Domino modules, NetSuite, MS Exchange, SAS reports, Salesforce modules, etc.
- OneBox Developers Guide
- Relevance ranking uses all the Google algorithms, including PageRank. While this is much less important for sites and intranets, the other algorithms provide high-quality relevance ranking
- GSA: relevance weight can be adjusted via the "Source Biasing" interface.
- Search form and Results UI
- Multiple "front ends" can specify page layout, search suggestions and synonyms, limit to specified domain, language, file type or meta tag value, and URLs to be ignored.
Default looks like the Google web search results.
- Document dates may not appear if the search engine does not trust the server dates
- Hides duplicate pages based on snippet similarity and/or host, server or directory co-occurrence
- Checks pages for authorized access before including in results list. May be slow if many pages have many kinds of authorization.
- Basic results page customization: logo, text and link colors.
- Option to use XSLT and set many variables: admins can go further and write XSLT code, but it's not designed as a scripting or layout language.
- Complete XML search results -- can use a scripting language or presentation program to manipulate and display them.
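As a sketch of the "scripting language" route for the XML results, the following parses a small results document and renders it as an HTML list. The element names (`GSP`, `RES`, `R` with an `N` attribute, `U`, `T`, `S`) and the sample data are assumptions for illustration; the appliance's XML reference is the authority on the real output format.

```python
import xml.etree.ElementTree as ET
from html import escape

# Sample results document; structure assumed for illustration.
SAMPLE = """<GSP>
  <Q>vacation policy</Q>
  <RES>
    <R N="1">
      <U>http://intranet.example.com/hr/vacation.html</U>
      <T>Vacation Policy</T>
      <S>How to request time off</S>
    </R>
    <R N="2">
      <U>http://intranet.example.com/hr/faq.html</U>
      <T>HR FAQ</T>
      <S>Common questions &amp; answers</S>
    </R>
  </RES>
</GSP>"""


def parse_results(xml_text: str) -> list:
    """Extract rank, URL, title and snippet from each result element."""
    root = ET.fromstring(xml_text)
    return [
        {
            "rank": int(r.get("N", "0")),
            "url": r.findtext("U", ""),
            "title": r.findtext("T", ""),
            "snippet": r.findtext("S", ""),
        }
        for r in root.iter("R")
    ]


def render_html(results: list) -> str:
    """Render the parsed results as an ordered HTML list, escaping all text."""
    items = [
        f'<li><a href="{escape(r["url"])}">{escape(r["title"])}</a>: {escape(r["snippet"])}</li>'
        for r in results
    ]
    return "<ol>\n" + "\n".join(items) + "\n</ol>"


results = parse_results(SAMPLE)
print(render_html(results))
```

Working from the XML this way keeps layout decisions in ordinary application code, which avoids stretching XSLT into a scripting or layout language.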
- Metrics and Search Analysis
- Google offers up to 25 queries per second per server.
- Search status shows recent queries per second, per collection
Search log reports provide a monthly, weekly or daily snapshot of search activity, segmented by collection
- For each time period, the report shows the top 100 queries, top no match searches, traffic by day and hour, etc.
- Note: search logs may be deleted after two months, to avoid filling up the server disk
- Crawl status shows documents served, crawling rate and errors.
- Crawl reporting provides interactive drilldown through directories to see the status of each page
- GSA version G70 has a "list format" which displays all the crawled URLs and status
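The kind of per-collection summary these log reports provide (top queries and top no-match searches) can be approximated with a few lines of code. The log format below, tab-separated timestamp, collection, query and result count, is hypothetical and invented for this sketch; the appliance's real log format is not described in this report.

```python
from collections import Counter, defaultdict

# Hypothetical search log: timestamp, collection, query, result count.
LOG = """\
2007-10-01T09:15:00\tintranet\tvacation policy\t12
2007-10-01T09:20:00\tintranet\tVacation Policy\t12
2007-10-01T10:02:00\tpublic\tpricing\t40
2007-10-01T10:05:00\tintranet\texpence report\t0
2007-10-01T11:30:00\tintranet\tvacation policy\t12
2007-10-01T11:45:00\tintranet\texpence report\t0
"""


def summarize(log_text: str, top_n: int = 100) -> dict:
    """Count queries and zero-result queries per collection."""
    queries = defaultdict(Counter)
    no_match = defaultdict(Counter)
    for line in log_text.splitlines():
        if not line.strip():
            continue
        _timestamp, collection, query, count = line.split("\t")
        key = query.strip().lower()  # fold case, since matching is case-insensitive
        queries[collection][key] += 1
        if int(count) == 0:
            no_match[collection][key] += 1
    return {
        collection: {
            "top_queries": counter.most_common(top_n),
            "top_no_match": no_match[collection].most_common(top_n),
        }
        for collection, counter in queries.items()
    }


report = summarize(LOG)
for collection, stats in report.items():
    print(collection, stats)
```

The no-match list is often the most useful part of such a report, since misspellings like the one in the sample data point directly at candidates for synonyms or KeyMatches.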
- Completely done via web browser admin.
- SSH option for remote support from Google staff.
- Admin delegation allows others access to all or limited set of features, collections and front ends.
- Note: The Mini can do a remote shut-down but requires a physical power cycle to start up again.
Articles and Reviews
- Taylor Woodrow Construction And Google Team Up To Improve Employee Intranet Press Release, August 9, 2007.
This UK-based construction company needed to search the intranet beyond their CMS content. Installing the GSA provided this functionality, users noticed the improved search immediately, and the HR department reports reduced calls to the support department by 25% (5 per day).
- CAMH tackles indexing with Google search appliances Business.ca, June 7, 2007 by Rafael Ruffolo
Short case study of GSAs at the Canadian Centre for Addiction and Mental Health. They use two search appliances (one for internal documents, the other external), looking for simplicity and Google's relevance ranking. The GSA compares favorably to previous search engines in both quality and configuration -- having a separate appliance avoids competition for server resources. The search administrator had some installation glitches and wishes for automated email reporting, but the speed, easy maintenance, and reduction in negative feedback on search make it a success.
- Review: Google Mini 2.2 THE Journal, May 31, 2007, by Dave Nagel
Covers setup, features, upgrading from 2.0, and general recommendations.
- Google - Mini 2.0 review IT Reviews, November 28, 2006
Informative review mentions version 2.0 features, and describes problems with indexing Windows network shares, unlimited collections, facilities to set crawl frequency on specific sections, limitations of synonyms, results customization, and network file security issues. Verdict is positive, prices in UK pounds.
- Search Moves Well Beyond Google Information Week, June 12, 2006 by Thomas Claburn
Describes how FirstGov has switched from a FAST implementation by IBM to a combination of MSN Search and Vivisimo Search and Clustering, which now brings back results from local jurisdictions as well as the federal government. The administrator in charge likes the clustered category links. Mentions Google Search Appliance market share, and Autonomy's powerful but complex features. One search administrator points out the need to invest resources to make best use of search systems.
- Search Yourself: A Review of the Google Mini SmallBusinessComputing.com, May 24, 2006 by Aaron Weiss
Describes installing and configuring a Google Mini Appliance, and the functions offered by the appliance. Mainly describes the features, with a note about how loud the blue pizza-box hardware was.
ignites search technology Network World, September 13, 2004 by Ann
An overview of the problems enterprises have with finding information, including
complex queries and multiple repositories. Search administrators recommend
trying before buying, defining problems such as dynamic updates, repetitive
language, context, scale and query options. Search engines mentioned include
the Google Search Appliance, and iPhrase.
- City Ogles Google Impact Information Week, January 22, 2003 by Tony Kontzer
The City of San Diego was unhappy with their search engine, an old version
of Verity bundled with another program. Instead of a costly upgrade to Verity
K2, the city chose to install the Google Search Appliance, to great appreciation
from employees and citizens. It also indexes Documentum, Sun One and database
applications, improving access and simplifying tasks. The article is slightly
misleading: the Verity application was not an appliance, and the Google Search
Appliance crawls the application data via their Web interfaces, which is not
explained in the article.
- CareerBuilder Passes Google For Another Search Engine InformationWeek,
January 22, 2003 by Tony Kontzer
CareerBuilder, a recruitment and job-finding service, was limited by its original
search engine, Microsoft Index Server.
After assessing a number of search engines, including the Google Search
Appliance, they chose Fast Data Search,
mainly so they could adjust search results and provide pay-for-position services.
The speed of the FAST search engine, up to 140 searches per second, and near-real-time
index updates were other considerations in the choice.
On Information Week, January 20, 2003 by Tony Kontzer
Discusses the need for useful search engines in corporate intranets. Describes
experiences in Ford's Learning Network using Autonomy;
Bank One and Kaiser Permanente using the Google Search Appliance
for simplicity, low cost and speed; KPMG UK's use of Verity
for sophisticated taxonomy and social networking; Gateway's implementation
for support technicians;
and EDS using Recommind
Takes Aim At The Enterprise InternetWeek.com, June 18 2002, by Richard
Reports from Google Search Appliance users -- National Semiconductor
says it's easy to deploy and end-users are predisposed to like it. Sur La
Table says the Google product was significantly cheaper than other similar
search engines, and delivers results in XML for programmatic processing
using open standards.
- Google Search Appliance
version 3: SearchTools Analysis SearchTools.com, September 30, 2002
by Avi Rappoport
Updated review with new features.
- Google Search Appliance version
1: SearchTools Analysis SearchTools.com, March 23, 2002 by Avi Rappoport
Detailed review of the features of the search engine with screenshots, covering
robot crawling, indexing features, search features, results customizing and
reports. Conclusion: This is an excellent search engine for HTTP-accessible
content, with comprehensive administration tools, wonderful reports, familiar
search features and powerful customization options. However, it doesn't have
the significant advantages over the competition that the public Google search
does: relevance ranking is similar to other high-quality search engines. If
you want more control over crawling schedules or relevance weighting, or you
need to integrate with enterprise security systems, textual databases, content
or document management systems, this version can't accommodate you. But it's
effective, fast and particularly well-priced for Intranets or web sites with
millions of documents.
Wave Of Innovations Heats Up Site Search (customer access required) Forrester
TechStrategy Brief, March 1, 2002 by Harley Manning with Christopher Mines
Describes several new features of site search including better relevance
as pioneered by the public Google search engine and the Google Search Appliance
(does not cover whether it works in controlled link environments), improved
forgiveness to handle misspellings and mismatched vocabularies as implemented
by Netrics, automated intelligence - classification
and categorization engines such as Autonomy
and the new ClearForest, and creative pricing, mentioning EasyAsk,
which in one installation, received a percentage of increased revenue. Describes
the market difficulties including the high number of vendors and shrinking
corporate site budgets.
- Google's enterprising
search CNET News.com February 12, 2002 by Whit Andrews, Gartner Analyst
Describes the Google Search Appliance features, including competitive pricing
and "appealing" features for searching common document formats.
Points out some problems with access to databases, security and access control,
and PageRank limitations in the structured design of business sites and intranets.
Also mentions difficulty integrating with business intelligence and customer
relationship management applications.
Page Updated 2007-10-03