As of January 2012, this site is no longer being updated, due to work and health issues.
Search Tools Code Library Report
Lucene Search Engine
now part of Apache Jakarta
Platform: Java (designed for cross-platform use); runs on every Unix flavor and has been ported to many other programming languages
Price: free, open source, Apache Software License
- Version 2.9 features below
- Very fast indexing, minimal RAM required
- Index compression to 30% of original text
- Indexes text and HTML, document classes available for XML, PDF and RTF
- Search supports phrase and Boolean queries, with plus, minus and quote-mark operators
- Allows single- and multiple-character wildcards anywhere in the search words, as well as fuzzy and proximity search
- Will search for punctuation such as + or ?
- Field searches for title, author, etc., and date-range searching
- Supports most European languages
- Option to store and display full text of indexed documents
- Search results in relevance order
- APIs for file format conversion, languages and user interfaces
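Fuzzy search, listed above, matches terms that are within a small edit (Levenshtein) distance of the query term. The following is a minimal sketch of that idea in plain Java, not Lucene's own code: the class `FuzzyMatchSketch` and its methods `editDistance` and `fuzzyMatches` are hypothetical names chosen for illustration, and a real engine would avoid comparing the query against every term in the index.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; these names are hypothetical and not part of Lucene. */
final class FuzzyMatchSketch {

    /** Levenshtein distance: minimum number of single-character inserts, deletes, or substitutions. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // building the first j characters of b from "" takes j inserts
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // reducing the first i characters of a to "" takes i deletes
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1], prev[j]) + 1;
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of allocating a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns the terms within maxEdits of the query term, in input order. */
    static List<String> fuzzyMatches(String query, List<String> terms, int maxEdits) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (editDistance(query, term) <= maxEdits) {
                hits.add(term);
            }
        }
        return hits;
    }
}
```

With a limit of two edits, a query for "search" would match the misspelling "serach" and the longer form "searches", but not an unrelated term such as "index".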
Articles & Reviews
- JavaGuru Lucene FAQ, jguru.com, updated as of July 2002 by Otis Gospodnetic
Helpful information for indexing, searching, updates, configuration, etc.
- Lucene search engine: Powerful, flexible, and free, JavaWorld, September 2000, by Brian Goetz
Thoughtful description of implementing the Lucene search engine for searching
Eyebrowse email archives, which are stored in a MySQL database. Discusses
the features, including the powerful indexing and updating scheme, in some
detail, and includes code snippets showing how to call the library.
Lucene 2.9 features - Sept. 24, 2009
- "Near real-time" search: a new way to search the current in-memory segment before the index has been written to disk.
- FieldCache - takes advantage of the fact that most segments of the index are static and processes only the parts that change, saving time and memory. Efficiency is also improved.
- NumericField and NumericRangeQuery (previously called TrieRange) - improve Lucene's number indexing: faster searching for numbers, geo-locations, and dates, faster sorting, and much faster range searching.
- Faster wildcard and prefix searching, and a reverse string filter to enable leading wildcards
- Lucene Local (Contrib / Spatial) - can limit queries based on geographic location
- Faster searching over multiple segments
- Better and faster term vector highlighting of match terms in context on results page.
- New Query Parser framework, supports additional syntaxes
- Improvements to Payloads (metadata about index terms)
- TokenStream strong typing options
- Improved transaction processing
- Better Chinese, Arabic, and Persian support
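The reverse string filter mentioned in the list above makes leading wildcards cheap: if every term is also indexed in reversed form, a query such as `*ing` becomes an ordinary prefix lookup for `gni` in the reversed terms. The sketch below illustrates the idea with plain Java collections. It does not use Lucene classes, and `ReversedTermIndexSketch` and its methods are hypothetical names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Illustrative sketch only; these names are hypothetical and not part of Lucene. */
final class ReversedTermIndexSketch {
    private final TreeSet<String> terms = new TreeSet<>();    // sorted term dictionary
    private final TreeSet<String> reversed = new TreeSet<>(); // the same terms, reversed

    /** Adds a term in both its normal and reversed form. */
    void add(String term) {
        terms.add(term);
        reversed.add(reverse(term));
    }

    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    /** Prefix scan over a sorted set: seek to the prefix, then walk while it still matches. */
    private static List<String> prefixScan(TreeSet<String> dict, String prefix) {
        List<String> out = new ArrayList<>();
        for (String t : dict.tailSet(prefix, true)) {
            if (!t.startsWith(prefix)) {
                break; // sorted order guarantees no later term can match
            }
            out.add(t);
        }
        return out;
    }

    /** Trailing wildcard, e.g. "sea*": a direct prefix scan of the normal dictionary. */
    List<String> prefixSearch(String prefix) {
        return prefixScan(terms, prefix);
    }

    /** Leading wildcard, e.g. "*ing": a prefix scan for "gni" over the reversed dictionary. */
    List<String> suffixSearch(String suffix) {
        List<String> out = new ArrayList<>();
        for (String r : prefixScan(reversed, reverse(suffix))) {
            out.add(reverse(r)); // restore the original term
        }
        return out;
    }
}
```

The trade-off in this sketch is a second copy of the term dictionary in exchange for turning a full scan of all terms into a seek plus a short walk.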
Backward and Forward Compatibility
There are significant changes in version 2.9 - described in the changes.txt file or the web site (change log). A very few items are not backward compatible and several classes are deprecated.
All applications should be recompiled against the new Lucene 2.9 JAR and tested carefully.
Version 3.0 will no longer support Java 1.4 or the deprecated classes.
As soon as Lucene 2.9 is released:
- Carrot2 3.1.0 will come out with bug fixes
- Solr 1.4 will use the Lucene 2.9 JAR; it is coming soon, within a few weeks, they hope
Note: this content was extracted painfully by Avi from the Lucene site/wiki/JIRA/mailing list archive, and clarified by Grant Ingersoll's webcast sponsored by Lucid Imagination. I will be happy to fix mistakes and clarify confusion; just comment or send a message.
Last Update: 2009-09-24