As of January, 2012, this site is no longer being updated, due to work and health issues
Concordances, KWIC and KWOC
The idea of showing words with their surrounding text comes from hand-created concordances, and led to the Information Science concept of KWIC (Key Word In Context): permuting the content of a text so that each word appears in a central column, sorted alphabetically, with its surrounding words on either side. This was developed in the late 1950s by Hans Peter Luhn of IBM and used for projects including automating concordances of Shakespeare, chemical listings, and library catalogs.
Here is an example, showing lines from the English and Scottish ballads collected by Francis James Child:
lime (14)
79[C.10] 4 /Which was builded of lime and sand;/Until they came to
247A.6 4 /That was well biggit with lime and stane.
303A.1 2 bower,/Well built wi lime and stane,/And Willie came
247A.9 2 /That was well biggit wi lime and stane,/Nor has he stoln
305A.2 1 a castell biggit with lime and stane,/O gin it stands not
305A.71 2 is my awin,/I biggit it wi lime and stane;/The Tinnies and
79[C.10] 6 /Which was builded with lime and stone.
305A.30 1 a prittie castell of lime and stone,/O gif it stands not
108.15 2 /Which was made both of lime and stone,/Shee tooke him by
175A.33 2 castle then,/Was made of lime and stone;/The vttermost
178[H.2] 2 near by,/Well built with lime and stone;/There is a lady
178F.18 2 built with stone and lime!/But far mair pittie on Lady
178G.35 2 was biggit wi stane and lime!/But far mair pity o Lady
2D.16 1 big a cart o stane and lime,/Gar Robin Redbreast trail it
There were also KWOC (Key Word Out of Context) systems, which put the key word at the start of each line and were found useful for automatically creating left-aligned alphabetical listings.
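To make the two layouts concrete, here is a minimal Python sketch that builds both kinds of listing from a few lines of text. It is only an illustration under simple assumptions, not Luhn's original method: the function names (kwic_rows, format_kwic, format_kwoc), the word pattern and the context width are all invented for this example, and a real concordance would also handle stop words, hyphenation and references to the source text.

```python
import re

# Assumption: a "word" is a run of ASCII letters; real systems use richer rules.
WORD_RE = re.compile(r"[A-Za-z]+")


def kwic_rows(lines, width=25):
    """Build one row per word occurrence.

    Each row is (keyword, left context, word, right context, line number).
    """
    rows = []
    for line_no, line in enumerate(lines, start=1):
        for match in WORD_RE.finditer(line):
            left = line[max(0, match.start() - width):match.start()]
            right = line[match.end():match.end() + width]
            rows.append((match.group().lower(), left, match.group(), right, line_no))
    # Sort alphabetically by keyword, then by line number.
    rows.sort(key=lambda row: (row[0], row[4]))
    return rows


def format_kwic(rows, width=25):
    """KWIC: pad the left context so every keyword starts in the same column."""
    return [
        f"{left:>{width}}{word}{right}  [{line_no}]"
        for _, left, word, right, line_no in rows
    ]


def format_kwoc(rows, key_width=12):
    """KWOC: move the keyword out to the left margin, then show its context."""
    return [
        f"{key:<{key_width}}{(left + word + right).strip()}  [{line_no}]"
        for key, left, word, right, line_no in rows
    ]


if __name__ == "__main__":
    sample = ["Well built wi lime and stane", "Was made of lime and stone"]
    lime_rows = [row for row in kwic_rows(sample) if row[0] == "lime"]
    print("\n".join(format_kwic(lime_rows)))
    print("\n".join(format_kwoc(lime_rows)))
```

The same width must be passed to kwic_rows and format_kwic, since the keyword column only lines up when the left context is never longer than the padding.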
Showing the word in context in search results
From its beginning, Google has displayed result items with search terms in context, which they call "snippets", but they have never given much credit to KWIC or its inventor. Earlier web search engines, such as AltaVista and HotBot, had used the contents of the HTML page "description" meta tag, auto-summarized text, and/or content extracted from the beginning of a page, attempting to describe the page as a whole. Google went in a different direction, with what they called a "sneak preview" of the found documents, bolding all the matched search terms, as far back as 2001.
There's even a video from 2007, describing the snippet extraction process.
Other major web search engines, such as Yahoo and Microsoft Live Search, and many enterprise search engines show the matched terms in context as well -- it's not just a Google thing. A snippet of text taken from the visible page is also much less likely than a description meta tag to include search-spam (words designed to be found and ranked in results, but not terribly meaningful to page readers). So nearly all webwide search engines show matched words in context, in simple self-defense.
Each search engine has its own algorithm to define the best text to display. Factors in the decision about which context phrases to show may include:
- if the search terms are matched as phrases, show at least one phrase match
- text where search terms are in close proximity may provide the best context
- try to show an instance of each search term
- text closer to the beginning of a page tends to contain the more important uses of the words
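The factors above can be combined in many ways, and no search engine's actual algorithm is described here. As one hedged illustration, the following Python sketch slides a fixed-size window over the page text and prefers, in order, windows that contain the whole query as a phrase, windows that cover more distinct terms (which a small window can only do when the terms are close together), and windows nearer the start of the page. The function name best_snippet, the window size and the use of <b> tags for bolding are assumptions made for this example.

```python
import re


def best_snippet(text, query, window=10):
    """Return the `window`-word stretch of text that best shows the query terms."""
    words = text.split()
    # Normalized copies (lowercase, punctuation removed) used only for matching.
    norm = [re.sub(r"\W", "", w).lower() for w in words]
    terms = [re.sub(r"\W", "", t).lower() for t in query.split()]
    terms = [t for t in terms if t]
    if not words or not terms:
        return ""

    best_key, best_start = None, 0
    last_start = max(0, len(words) - window)
    for start in range(last_start + 1):
        chunk = norm[start:start + window]
        # Factor 1: does the window contain the query as an exact phrase?
        has_phrase = any(
            chunk[i:i + len(terms)] == terms
            for i in range(len(chunk) - len(terms) + 1)
        )
        # Factors 2 and 3: how many distinct terms fit inside this small window?
        distinct = len(set(terms) & set(chunk))
        # Factor 4: on ties, earlier text wins (a larger -start means an earlier window).
        key = (has_phrase, distinct, -start)
        if best_key is None or key > best_key:
            best_key, best_start = key, start

    # Bold every matched term, as in the search results described above.
    shown = [
        f"<b>{word}</b>" if normalized in terms else word
        for word, normalized in zip(
            words[best_start:best_start + window],
            norm[best_start:best_start + window],
        )
    ]
    return " ".join(shown)


if __name__ == "__main__":
    page = (
        "Old castles were common. This castle was well built with "
        "lime and stone near the river. Lime is also a fruit."
    )
    print(best_snippet(page, "lime and stone", window=8))
```

When nothing matches, the sketch falls back to the first words of the page, much as the earlier engines did. Because ties go to the earliest window, a match can end up at the edge of the snippet; a production system would more likely center the match and trim to sentence boundaries.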
Benefits of showing match terms in context
Displaying the search terms in the source text context is a useful way to leverage human/search engine interaction. It makes the result items much more transparent: showing why the search engine matched those terms within that document. This process accords with the concepts in Information Foraging Theory, which describes the psychological processes of making choices when faced with large chunks of information.
Page Created: 2008-10-21