As of January, 2012, this site is no longer being updated, due to work and health issues
Faceted Metadata Search and Browse
Metadata is information about information: more precisely, it's structured information about resources. This can be a single set of hierarchical subject labels, such as a Yahoo or Open Directory Project category. More often, the metadata has several facets: attributes in various orthogonal sets of categories. This is often stored in database record fields and tables, especially for product catalogs.
Examples of faceted metadata include:
- Music catalog: songs have attributes such as artist, title, length, genre, date...
- Recipes: cuisine, main ingredients, cooking style, holiday...
- Travel site: articles have authors, dates, places, prices...
- Regulatory documents: product and part codes, machine types, expiration dates...
- Image collection: artist, date, style, type of image, major colors, theme...
In all these cases, there is no single way to provide navigation for everyone, because users have such disparate needs. One person might want to look through all the U2 albums, while another is looking for classical guitar or 1940s jazz releases.
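As a rough illustration of the idea, the sketch below models a tiny music catalog as records with several independent facets and groups the same records along whichever facet a given user cares about. The record titles, the field names and the `browse_by` helper are all invented for this example.

```python
from collections import defaultdict

# Hypothetical catalog: titles and facet values are placeholders, not real data.
CATALOG = [
    {"title": "Album A", "artist": "U2", "genre": "rock", "decade": "1980s"},
    {"title": "Album B", "artist": "U2", "genre": "rock", "decade": "1990s"},
    {"title": "Album C", "artist": "Guitarist X", "genre": "classical guitar", "decade": "1950s"},
    {"title": "Album D", "artist": "Quartet Y", "genre": "jazz", "decade": "1940s"},
]

def browse_by(records, facet):
    """Group record titles under each value of one facet."""
    groups = defaultdict(list)
    for record in records:
        groups[record[facet]].append(record["title"])
    return dict(groups)

# The same records, organized two different ways for two different users.
by_artist = browse_by(CATALOG, "artist")
by_decade = browse_by(CATALOG, "decade")
print(by_artist["U2"])     # the U2 fan's view
print(by_decade["1940s"])  # the 1940s listener's view
```

No facet is privileged here: each one is just another way of slicing the same set of records.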
Traditional Approaches to Structured Data Access
Traditional field-based or parametric search engines for structured data have used a command line or provided a form to fill out:
AU:rosenfeld TI:web PB:oreilly
These require a lot of knowledge on the searcher's side: they have to know the values or choose from a popup menu. If they include too many parameters, they will probably not find any records that match their requirements -- a dead end. The possible values are hidden from the searcher, so all the work the editorial staff has done in defining and assigning attributes is lost.
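To make the dead-end problem concrete, here is a minimal sketch of a fielded search that requires every criterion to match. The field codes follow the query shown above; the record, the `YR` field and the function names are assumptions made for illustration only.

```python
def normalize(text):
    """Lowercase and drop punctuation so that O'Reilly matches oreilly."""
    return "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())

def parse_fielded_query(query):
    """Turn 'AU:rosenfeld TI:web' into {'AU': 'rosenfeld', 'TI': 'web'}."""
    criteria = {}
    for term in query.split():
        field, sep, value = term.partition(":")
        if not sep:
            continue  # this sketch ignores terms without a field prefix
        criteria[field.upper()] = normalize(value)
    return criteria

def parametric_search(records, query):
    """Return records where every field criterion is a substring match."""
    criteria = parse_fielded_query(query)
    return [
        record for record in records
        if all(value in normalize(record.get(field, ""))
               for field, value in criteria.items())
    ]

# Illustrative record only.
BOOKS = [
    {"AU": "Rosenfeld", "TI": "Information Architecture for the World Wide Web",
     "PB": "O'Reilly"},
]

found = parametric_search(BOOKS, "AU:rosenfeld TI:web PB:oreilly")
dead_end = parametric_search(BOOKS, "AU:rosenfeld TI:web PB:oreilly YR:2012")
print(len(found), len(dead_end))
```

The second query adds one criterion the record cannot satisfy, and the searcher gets an empty list with no hint about which parameter caused it.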
Full text search engines can index all HTML metadata or gather data from multiple database fields or tables. Full text search wipes out the value of the metadata: a number 3 is just a number, not a size, price, product ID or other meaningful number, as it is in context of the tagged page or database record. Similarly, it's hard to know whether a recipe, for example, has chili pepper as a significant ingredient or minor flavoring. While many searches are just fine without that information, there are other cases where providing that context would be extremely helpful.
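The loss of context can be shown with a small sketch. The product records and both search functions below are hypothetical; the point is only that a full-text match on "3" cannot tell a size from a price or a product ID, while a field-aware match can.

```python
# Hypothetical product records, invented for this example.
PRODUCTS = [
    {"id": "A-100", "name": "ring", "size": 3, "price": 250},
    {"id": "B-3", "name": "bracelet", "size": 7, "price": 40},
    {"id": "C-200", "name": "pendant", "size": 5, "price": 3},
]

def full_text_search(records, term):
    """Flatten every field value into a bag of tokens and match the term."""
    hits = []
    for record in records:
        text = " ".join(str(value) for value in record.values())
        tokens = text.lower().replace("-", " ").split()
        if term.lower() in tokens:
            hits.append(record["id"])
    return hits

def field_search(records, field, value):
    """Match the value only in the named field, keeping its meaning."""
    return [record["id"] for record in records if record[field] == value]

print(full_text_search(PRODUCTS, "3"))   # matches a size, an ID and a price
print(field_search(PRODUCTS, "size", 3)) # matches only the size-3 item
```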
Faceted Metadata Search Solution
A good solution to these problems involves exposing the facets in dynamic taxonomies, so that the search user can see exactly the options they have available at any time. They can switch easily between searching and browsing, using their own terminology for search while recognizing the organization and vocabulary of the data.
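One way to picture a dynamic taxonomy is as a set of counts recomputed from whatever results are currently in view. The sketch below is a simplified in-memory version under that assumption; the recipe records and the `facet_counts` and `drill_down` names are invented here, and production engines compute these counts from an index rather than by scanning records.

```python
from collections import Counter

# Hypothetical recipe records.
RECIPES = [
    {"name": "Recipe 1", "cuisine": "Mexican", "style": "grilled"},
    {"name": "Recipe 2", "cuisine": "Mexican", "style": "baked"},
    {"name": "Recipe 3", "cuisine": "Italian", "style": "baked"},
    {"name": "Recipe 4", "cuisine": "Thai", "style": "stir-fried"},
]

def facet_counts(results, facets):
    """Count the values of each facet within the current result set.

    Values with no matching records never appear, so every option
    shown to the user leads to at least one result.
    """
    return {
        facet: dict(Counter(r[facet] for r in results if facet in r))
        for facet in facets
    }

def drill_down(results, facet, value):
    """Narrow the current results to records with the chosen facet value."""
    return [r for r in results if r.get(facet) == value]

all_counts = facet_counts(RECIPES, ["cuisine", "style"])
mexican = drill_down(RECIPES, "cuisine", "Mexican")
mexican_counts = facet_counts(mexican, ["cuisine", "style"])
print(all_counts["style"])
print(mexican_counts["style"])
```

After the user chooses Mexican, the "stir-fried" option disappears from the style facet because no remaining recipe has it, which is how the interface avoids offering a dead end.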
Features for metadata search include:
- Displaying aspects of the current results set in multiple categorization schemes
- Showing only populated categories, no dead-ends (links leading to empty lists)
- Displaying a count of the contents of each category, so the user knows in advance how many items each choice will show
- Generating groupings on the fly, such as size, price or date
- Drilling down by facet, so a diamond buyer could choose price, clarity, size and setting
- Adding special facets within categories: a Yellow Pages site would want to show cuisine and location for restaurant listings but not plumbers.
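Two of the features above, groupings generated on the fly and facets that apply only within certain categories, can be sketched briefly. The band edges, the sample prices and the category-to-facet mapping are all assumptions chosen for illustration.

```python
from collections import Counter

def price_band(price, edges=(500, 2000)):
    """Label a price with a band computed at query time, not stored in the record."""
    low = 0
    for edge in edges:
        if price < edge:
            return f"{low}-{edge - 1}"
        low = edge
    return f"{low}+"

def band_counts(items):
    """Count items per generated price band; empty bands never appear."""
    return dict(Counter(price_band(item["price"]) for item in items))

# Hypothetical diamond listings.
DIAMONDS = [{"price": 300}, {"price": 450}, {"price": 1200}, {"price": 5000}]

# Hypothetical mapping of listing category to the extra facets worth showing.
SPECIAL_FACETS = {"restaurant": ["cuisine", "location"]}

def special_facets(category):
    """Return the extra facets for a category, or none if it has no special ones."""
    return SPECIAL_FACETS.get(category, [])

print(band_counts(DIAMONDS))
print(special_facets("restaurant"), special_facets("plumber"))
```

Because the bands are derived when the query runs, the edges can be changed to suit the result set without retagging any records.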
|Tower Records (Endeca)|| ||BeachHouse.com (Siderean)|
|Do a search for your favorite artist or record title, and you'll see a list of search results, and on the left, a set of options including Genre, album feature, price range, format and more.||After doing a search, the mid-right listing shows options for the matching articles, allowing travelers to choose the one that is most likely to answer their questions.||In this case, a search for beach houses which have internet connections finds some results, and the interface allows vacationers to search and browse by the country, cost, number of bedrooms, and other criteria.|
Applying Faceted Metadata Approaches to Unstructured Text
Most site and intranet documents don't have such rich tagging. They may have a title, modification date, and author, and almost all have a location and file size. However, there are tools available to perform entity extraction and external tagging, recognizing companies, people, products and other standard text. Using these tags, even unstructured documents can be approached with faceted metadata searching.
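As a very rough sketch of external tagging, the example below looks up known names in a document and records them as facet values. The name lists, the sample sentence and the `extract_entities` function are assumptions for illustration; real entity extraction tools rely on linguistic or statistical analysis rather than a fixed lookup list.

```python
import re

# Hypothetical lookup lists (a "gazetteer"), one per facet.
GAZETTEER = {
    "company": ["Endeca", "Siderean", "Inxight"],
    "person": ["Marti Hearst", "Peter Merholz"],
}

def extract_entities(text):
    """Return {facet: [names found]} for every known name present in the text."""
    tags = {}
    for facet, names in GAZETTEER.items():
        found = [
            name for name in names
            if re.search(r"\b" + re.escape(name) + r"\b", text, re.IGNORECASE)
        ]
        if found:
            tags[facet] = found
    return tags

# Invented sample text.
doc = "Notes from a talk by Marti Hearst, with product demos from Endeca and Inxight."
tags = extract_entities(doc)
print(tags)
```

Once documents carry tags like these, they can be counted and filtered by facet in the same way as database records.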
Faceted Metadata Search Resources
- NCSU Adds Faceted Navigation to Library Catalog
- The North Carolina State University Library OPAC (Online Public Access Catalog) is now powered by Endeca. Traditionally, these catalogs were either homegrown or based on database management systems, neither of which was able to provide much context or relevance ranking. Searching for a very general topic such as civil war (distressingly common) brings up both a list of 9,179 results and two sets of facets: the Library of Congress Subject Headings, and such options as Subject Topic, genre, format, location, region, era, language and author. Each of these has a preview count, so a searcher knows that choosing "History: America" will limit the results to 2,726, while there are only 49 works in Spanish, 5 in the Textiles Library, and 23 written by Stephen Crane. The power of this approach is in exposing the options rather than hiding them behind a form, and this is an excellent implementation of it. (link found via pixelcharmer)
- Libraries and Faceted Metadata
- Presentation by Avi Rappoport of SearchTools about the likely value of using the faceted metadata approach for library catalogs. Presented at Internet Librarian 2004.
- Flamenco Project
- UC Berkeley professor Marti Hearst's seminal research on how faceted metadata can provide a dynamic information-architecture context for browsing and searching on web sites. She reports on extensive usability studies done with both textual and image databases.
- Peter Merholz on Faceted Metadata Search (early 2002)
- An eminent information architect explains the value of creating and searching faceted metadata.
- SearchTools Report on Metadata
- Faceted metadata is not just for search: it's a way of describing content in its many aspects. It is more flexible and extensible than traditional hierarchical organizations, because it does not attempt to put things in one category only. There are many systems for creating and maintaining faceted classifications.
- Facets and Multiple Angles of Access, Information Flow Newsletter, August 2002, by Ramana Rao
- Insights into faceted metadata from the founder of Inxight, starting with how an information seeker might look for a resource. Describes the actual challenges of developing a faceted system with a diverse collection of documents. Points out that the value of this approach lies in removing limits on access to resources.
- When Search Is Not Enough: Guided Navigation from Endeca, IDC Bulletin, May 2002, by Mary Flanagan and Susan Feldman
- A commissioned report, describing the problems of information overload in search results and the faceted metadata search implemented by Endeca's Navigation Engine. Provides business background, competitive analysis and software overviews. Uses Tower Records as a case study.
- Dynamic Taxonomies: A Model for Large Information Bases (link to PDF), IEEE Transactions on Knowledge and Data Engineering, May/June 2000, by Giovanni M. Sacco
- Academic paper starts with the problems of searching and browsing huge data sets, and of expressing multiple taxonomies. Describes dynamic taxonomies derived from documents by analyzing their concepts, and a visual framework to browse the content. This allows users to find appropriate documents in a few clicks, even in very large data sets. Because the system only displays categories containing documents that fit the criteria, the users will never be in a situation where there are no documents. (See Universal Knowledge Processor page).
Faceted Metadata Search Engines
- Atomz (WebSideStory) - faceted search for online stores and catalogs as a remote service
- Convera - mainly faceted by topic
- Endeca - online stores, directories, and intranets
- FAST - very large directories, stores, etc.
- Flamenco - working systems as described by Prof. Hearst of UC Berkeley
- i411 - directories, online content, stores
- Inxight - dynamic clustering also provides faceted access to intranets with semi-structured data, such as pharmaceuticals
- Mercado - online stores
- Siderean Seamark - online stores, content sites
- Solr - free open source Java-based faceted metadata search and browse
- Universal Knowledge Processor - examples include online catalog, image database, and newspaper
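For readers who want to try one of the engines listed above, the sketch below builds a faceted query URL for Solr using its standard request parameters (`q`, `fq`, `facet`, `facet.field`, `facet.mincount`). The host, the core name `catalog`, the field names and the `build_facet_query` helper are assumptions; the exact URL path depends on the Solr version and setup.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_facet_query(base_url, text, facet_fields, filters=None):
    """Build a Solr select URL that asks for facet counts on the given fields.

    facet.mincount=1 asks Solr to omit empty facet values, and each fq
    entry narrows the results to one chosen facet value.
    """
    params = [("q", text), ("facet", "true"), ("facet.mincount", "1"), ("wt", "json")]
    params += [("facet.field", field) for field in facet_fields]
    params += [("fq", f'{field}:"{value}"') for field, value in (filters or {}).items()]
    return base_url + "?" + urlencode(params)

# Hypothetical server, core and field names.
url = build_facet_query(
    "http://localhost:8983/solr/catalog/select",
    "guitar",
    ["genre", "decade"],
    filters={"genre": "classical"},
)
query = parse_qs(urlsplit(url).query)
print(url)
```

The response to such a request would include the matching documents plus a count for each populated value of the requested facet fields.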