As of January 2012, this site is no longer being updated due to work and health issues.
Price: Free (open source) under the GNU License
Platforms: Any Unix with C++, STL, and GNU make; Windows 95/NT/2000 (under Cygwin)
Features
- Indexes local files, and remote web sites using a robot spider based on wget
- Can index and search meta tags (including Dublin Core) and ALT attributes
- Indexes a variety of file types, such as mail, news, Unix manual pages, PDF, PostScript, LaTeX and RTF documents, ID3 tags in MP3 files, and Microsoft Office documents
- Can exclude text within HTML or XHTML elements, such as headers and footers
- Automatically excludes very frequent terms (stop words) from the index
- Modular, incremental indexing architecture, with on-the-fly filters applied to files before indexing
- Heavily geared toward English
- Queries can use Boolean AND, OR, and NOT, right truncation with a wildcard character, and stemming
- Can run as a standalone search server.
- Results display shows title, file URL, size and relevance rank score
- Search results can be output as XML, with both a DTD and an XML Schema
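Automatic exclusion of frequent terms can be pictured as a document-frequency cutoff: a word that appears in more than a given percentage of the indexed files carries little value for searching and is treated as a stop word. The C++ sketch below assumes exactly that rule; the function name `frequent_terms` and the `max_percent` threshold are hypothetical, not this tool's actual interface, and tokenizing is simplified to whitespace splitting.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the words that occur in more than
// max_percent percent of the documents. An indexer could drop these
// words as automatically detected stop words.
std::set<std::string> frequent_terms(const std::vector<std::string>& docs,
                                     double max_percent) {
    std::map<std::string, std::size_t> doc_freq;  // word -> number of docs containing it
    for (const std::string& doc : docs) {
        std::istringstream in(doc);
        std::set<std::string> seen;  // count each word at most once per document
        std::string word;
        while (in >> word) {  // simplified tokenizing: split on whitespace
            if (seen.insert(word).second) ++doc_freq[word];
        }
    }

    std::set<std::string> result;
    if (docs.empty()) return result;
    for (const auto& [word, df] : doc_freq) {
        double percent = 100.0 * static_cast<double>(df) / static_cast<double>(docs.size());
        if (percent > max_percent) result.insert(word);
    }
    return result;
}
```

With the four documents "the cat sat", "the dog ran", "the cat ran" and "a bird" and a 50% cutoff, only "the" (present in 75% of the documents) is excluded; "cat" and "ran" sit exactly at 50% and are kept.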
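Boolean queries with right truncation map naturally onto an inverted index. The C++ sketch below assumes a simple in-memory index; the `InvertedIndex` type and the `op_and`, `op_or` and `op_not` helpers are hypothetical names, not this tool's API. AND, OR and NOT become set intersection, union and difference, and a term ending in `*` is resolved by collecting every indexed word that starts with the prefix. Stemming is left out; it would normally run the same stemmer over both indexed words and query words.

```cpp
#include <algorithm>
#include <iterator>
#include <map>
#include <set>
#include <string>

using DocSet = std::set<int>;  // IDs of matching documents

// Hypothetical in-memory inverted index: word -> documents containing it.
// std::map keeps words sorted, so all words sharing a prefix are contiguous.
struct InvertedIndex {
    std::map<std::string, DocSet> postings;

    void add(int doc_id, const std::string& word) { postings[word].insert(doc_id); }

    // Exact lookup, or right truncation when the term ends in '*'.
    DocSet lookup(const std::string& term) const {
        DocSet result;
        if (!term.empty() && term.back() == '*') {
            const std::string prefix = term.substr(0, term.size() - 1);
            for (auto it = postings.lower_bound(prefix);
                 it != postings.end() && it->first.compare(0, prefix.size(), prefix) == 0;
                 ++it) {
                result.insert(it->second.begin(), it->second.end());
            }
        } else if (auto it = postings.find(term); it != postings.end()) {
            result = it->second;
        }
        return result;
    }
};

// a AND b: documents matching both operands.
DocSet op_and(const DocSet& a, const DocSet& b) {
    DocSet r;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(), std::inserter(r, r.end()));
    return r;
}

// a OR b: documents matching either operand.
DocSet op_or(const DocSet& a, const DocSet& b) {
    DocSet r;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(), std::inserter(r, r.end()));
    return r;
}

// a AND NOT b: documents matching a but not b.
DocSet op_not(const DocSet& a, const DocSet& b) {
    DocSet r;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(), std::inserter(r, r.end()));
    return r;
}
```

Keeping the words in a sorted `std::map` is what makes right truncation cheap here: `lower_bound` jumps straight to the first candidate, and the scan stops at the first word that no longer shares the prefix.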
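Excluding text inside particular HTML or XHTML elements, such as headers and footers, amounts to tracking whether the parser is currently inside one of them. The C++ sketch below is a minimal illustration, assuming well-formed markup with lowercase tag names; `indexable_text` and its `excluded` set are hypothetical names, and comments, `<script>` blocks and character entities are not handled.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical helper: returns the text of an HTML/XHTML string with every
// element named in `excluded` (e.g. "header", "footer") removed, including
// anything nested inside it. Assumes well-formed markup with lowercase tag
// names; comments, scripts and character entities are not handled.
std::string indexable_text(const std::string& html,
                           const std::set<std::string>& excluded) {
    std::string out;
    int skip_depth = 0;  // > 0 while inside an excluded element
    std::size_t i = 0;
    while (i < html.size()) {
        if (html[i] != '<') {  // ordinary character
            if (skip_depth == 0) out += html[i];
            ++i;
            continue;
        }
        std::size_t close = html.find('>', i);
        if (close == std::string::npos) break;  // unterminated tag: stop here
        std::string tag = html.substr(i + 1, close - i - 1);
        i = close + 1;
        if (skip_depth == 0) out += ' ';  // keep words on either side of a tag apart

        // Extract the element name from "name ...", "/name" or "name/".
        bool is_end = !tag.empty() && tag[0] == '/';
        bool self_closing = !tag.empty() && tag.back() == '/';
        std::size_t start = is_end ? 1 : 0;
        std::size_t end = start;
        while (end < tag.size() &&
               (std::isalnum(static_cast<unsigned char>(tag[end])) || tag[end] == '-')) {
            ++end;
        }
        std::string name = tag.substr(start, end - start);

        if (self_closing || excluded.count(name) == 0) continue;
        if (is_end) {
            if (skip_depth > 0) --skip_depth;  // leaving an excluded element
        } else {
            ++skip_depth;  // entering an excluded element
        }
    }
    return out;
}
```

A depth counter rather than a simple flag keeps the sketch correct when an excluded element is nested inside another excluded element of the same name.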