As of January 2012, this site is no longer being updated, due to work and health issues.
SearchTools.com: Related Topics
Distributed Indexing with SOIF and RDM
Harvest and WAIS pioneered distributed indexing technology, allowing local servers to gather and index data and then pass it on to search servers. This lets indexes work together and update as needed, rather than forcing each search indexer to crawl each site separately. It can improve searching of very large sites while also reducing server overhead.
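To make the gather-then-pass-on model concrete, here is a minimal Python sketch. It assumes each site's gatherer sends compact summary records (URL, title, keywords, modification time) to a central broker, which indexes them and replaces a record only when a newer version arrives. The `Summary` and `Broker` names and their fields are hypothetical illustrations, not Harvest's actual data model or code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    """One gathered record, roughly what a local gatherer might send upstream (hypothetical)."""
    url: str
    title: str
    keywords: tuple[str, ...]
    last_modified: int  # e.g. a Unix timestamp; hypothetical field


class Broker:
    """Hypothetical search server that indexes summaries pushed by gatherers.

    It never crawls sites itself; it only merges what gatherers send.
    """

    def __init__(self) -> None:
        self.records: dict[str, Summary] = {}
        self.index: dict[str, set[str]] = defaultdict(set)  # term -> URLs

    def receive(self, batch: list[Summary]) -> int:
        """Merge a batch from one gatherer; return how many records changed."""
        changed = 0
        for rec in batch:
            old = self.records.get(rec.url)
            if old is not None and old.last_modified >= rec.last_modified:
                continue  # stale or duplicate update: keep what we have
            if old is not None:
                self._unindex(old)  # drop terms from the outdated version
            self.records[rec.url] = rec
            for term in self._terms(rec):
                self.index[term].add(rec.url)
            changed += 1
        return changed

    def search(self, term: str) -> list[str]:
        """Return matching URLs in sorted order (no ranking in this sketch)."""
        return sorted(self.index.get(term.lower(), set()))

    @staticmethod
    def _terms(rec: Summary) -> set[str]:
        # Index title words plus keywords, case-folded.
        return set(rec.title.lower().split()) | {k.lower() for k in rec.keywords}

    def _unindex(self, rec: Summary) -> None:
        for term in self._terms(rec):
            urls = self.index.get(term)
            if urls is not None:
                urls.discard(rec.url)
                if not urls:
                    del self.index[term]  # avoid keeping empty postings
```

With this design the broker's freshness depends entirely on gatherers pushing newer summaries, which is the trade-off that lets one search server cover many large sites without crawling each of them.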
Harvest's data format and syntax is called SOIF (Summary Object Interchange Format). Netscape spent some time creating the Resource Description Messaging (RDM) mechanism, which uses SOIF as its syntax for defining data and HTTP for transmitting it. RDM also supports a schema describing each specific SOIF, an RDM Server Description, and a flexible mechanism to browse or search the data without depending on a particular query language.
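As I understand the Harvest documentation, a SOIF object starts with `@TEMPLATE-TYPE { URL`, lists one `Attribute{size}:` entry per attribute (followed by a tab and the value, where size is the value's length in bytes), and ends with a closing `}`. The Python sketch below serializes and parses objects in that layout; the byte count is what lets a value contain newlines without any escaping. The function names `dump_soif` and `load_soif` are hypothetical, and the grammar details should be checked against the Harvest SOIF specification before relying on them.

```python
import re

# Object header, assumed form: "@TEMPLATE-TYPE { URL" followed by a newline.
_HEAD = re.compile(rb"@([A-Za-z0-9_\-]+)\s*\{\s*(\S+)\n")
# Attribute header, assumed form: "Name{size}:" + TAB; size = value length in bytes.
_ATTR = re.compile(rb"([A-Za-z0-9_.\-]+)\{(\d+)\}:\t")


def dump_soif(template: str, url: str, attrs: dict[str, str]) -> bytes:
    """Serialize one object in the SOIF-style layout sketched above."""
    out = [f"@{template} {{ {url}\n".encode()]
    for name, value in attrs.items():
        data = value.encode("utf-8")
        out.append(f"{name}{{{len(data)}}}:\t".encode() + data + b"\n")
    out.append(b"}\n")
    return b"".join(out)


def load_soif(buf: bytes) -> list[tuple[str, str, dict[str, str]]]:
    """Parse a stream of objects into (template, url, attributes) tuples."""
    objects = []
    pos = 0
    while True:
        head = _HEAD.search(buf, pos)
        if head is None:
            return objects  # no more objects in the stream
        template, url = head.group(1).decode(), head.group(2).decode()
        pos = head.end()
        attrs: dict[str, str] = {}
        while True:
            # Skip whitespace between attribute entries.
            while pos < len(buf) and buf[pos:pos + 1] in b" \t\r\n":
                pos += 1
            if buf[pos:pos + 1] == b"}":
                pos += 1  # end of this object
                break
            attr = _ATTR.match(buf, pos)
            if attr is None:
                raise ValueError(f"malformed attribute at byte {pos}")
            size = int(attr.group(2))
            start = attr.end()
            value = buf[start:start + size]  # read exactly `size` bytes
            if len(value) != size:
                raise ValueError(f"truncated value at byte {start}")
            attrs[attr.group(1).decode()] = value.decode("utf-8")
            pos = start + size
        objects.append((template, url, attrs))
```

For example, `dump_soif("FILE", "http://example.com/", {"Title": "SOIF demo"})` yields the bytes `@FILE { http://example.com/\nTitle{9}:\tSOIF demo\n}\n`, and `load_soif` on that output returns `[("FILE", "http://example.com/", {"Title": "SOIF demo"})]`.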
This was a hot topic in late 1996 and early 1997, but has a very low profile now. It's not clear to me how SOIF and RDM work with XML, RDF and other new standards.
- Task Force on Cooperative Hierarchical Indexing Coordination
- European effort to provide distributed indexing, including the ROADS and DESIRE projects.
- TF-CHIC: Library of Distributed Indexing-Related Documents and Sites
- Impressive list of links with some annotations.
- Standards in a Distributed Indexing Architecture 1998
- Interesting information from the group on Cooperative Hierarchical Indexing Coordination, covering search brokering, indexing and gathering, to be presented at the TERENA (Trans-European Research and Education Networking Association) Networking Conference in October 1998.
- CIP (Common Indexing Protocol) Working Group of the IETF (probably in abeyance)
- Web Collections using XML Submission 1997
- Draft document from a group at Microsoft for W3C on using XML to provide hierarchical structure for web collection data. Mentions SOIF but does not elaborate on the relationship.
Follow-up from the author: We more or less concluded that you can just use XML to easily represent structured data and that the Web Collections spec wasn't needed. Part of it lives on in CDF (the bit that talked about a schema for describing a web site), part lives on in the WebDAV protocol (being able to get the properties of Collections on web sites, set them, etc.), and part is in XML-Data (which gives a structure for specifying schemas for structured data).
- W3C Distributed Indexing Workshop: RDM/SOIF 1996
- Short summary of the terms from a Netscape workshop with W3C.
- The Common Indexing Protocol 1997
- A proposal from the IETF's FIND working group allowing servers to use SOIF and other standards for global index server interchange. This allows servers to provide context and format information, dynamic data, and metadata about their own information. This data can be distributed via email or HTTP, "pushed" when the data is updated, and controlled for compensation purposes.
- Example Usage: IBM Grand Central Station
- Long article about IBM's use of SOIF in their Grand Central Station search-improvement project.