Consider the increasingly common situation in which a library wants to expose its digital content to its users. Suppose it knows that its users prefer search engines that search the contents of many sites simultaneously, rather than site-specific engines such as the one on the library's Web site. In order to support the preferences of its users, this library must make its contents accessible to search engines of the first type.

The easiest way to do this is for the library to convert its contents to HTML pages and let the harvesting search engines such as Google and Yahoo! collect those pages and provide searching on them. However, a serious problem with harvesting search engines is that they place limits on how much data they will collect from any one site. Google and Yahoo! will not harvest a 3-million-record book catalog, even if the library can figure out how to turn the catalog entries into individual Web pages.

An alternative to exposing library content to harvesting search engines as HTML pages is to provide a local search interface and let a metasearch engine combine the results of searching the library's site with the results from searching many other sites simultaneously. Users of metasearch engines get the same advantage that users of harvesting search engines get (i.e., the ability to search the contents of many sites simultaneously) plus those users get access to data that the harvesting search engines do not have. The issue for the library is determining how much functionality it must provide in its local search engine so that the metasearch engine can, in turn, provide acceptable functionality to its users.
The amount of functionality that the library provides will determine which metasearch engines will be able to access the library's content.

Metasearch engines, such as A9 and Vivísimo, are search engines that take a user's query, send it to other search engines, and integrate the responses [1]. The level of integration usually depends on the metasearch engine's ability to understand the responses it receives from the various search engines it has queried. If the response is HTML intended for display on a browser, then the metasearch engine developers have to write code to parse through the HTML looking for the content. In such a case, the perceived value of the content determines the level of effort that the metasearch engine developers put into the parsing task; low-value content will have a low priority for developer time and will either suffer from poor integration or be excluded.

For metasearch engines to work, they need to know how to send a search to the local search engine and how to interpret the results. Metasearch engines such as Vivísimo and A9 have staffs of programmers who write code to translate the queries they get from users into queries that the local search engines can accept. Metasearch engines also have to develop code to convert all the responses returned by the local search engines into some common format so that those results can be combined and displayed to the user. This is te...
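
The two jobs described above, translating a user's query for each local search engine and converting each engine's response into a common format, can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the two stand-in engines (`catalog_engine`, `archive_engine`), their query syntaxes and response formats, and the `Adapter` and `Record` types are invented for illustration and do not describe how A9, Vivísimo, or any real library system works.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Record:
    """Common result format shared by all sources (hypothetical)."""
    title: str
    source: str


# Two stand-in local search engines (hypothetical). Each expects its own
# query syntax and answers in its own response format.
def catalog_engine(query: str) -> List[Dict[str, str]]:
    # Expects 'title=<term>' and returns a list of dicts.
    term = query.split("=", 1)[1].lower()
    books = ["Moby Dick", "Whale Songs", "Walden"]
    return [{"ti": b} for b in books if term in b.lower()]


def archive_engine(query: str) -> str:
    # Expects 'q:<term>' and returns newline-separated text.
    term = query.split(":", 1)[1].lower()
    items = ["Whale oil ledger, 1851", "Harbor map"]
    return "\n".join(i for i in items if term in i.lower())


@dataclass
class Adapter:
    """Per-engine code the metasearch developers would have to write."""
    name: str
    translate: Callable[[str], str]    # user query -> engine-specific query
    search: Callable[[str], Any]       # the local engine itself
    parse: Callable[[Any], List[str]]  # engine response -> list of titles


ADAPTERS = [
    Adapter("catalog", lambda q: f"title={q}", catalog_engine,
            lambda raw: [row["ti"] for row in raw]),
    Adapter("archive", lambda q: f"q:{q}", archive_engine,
            lambda raw: [line for line in raw.split("\n") if line]),
]


def metasearch(user_query: str, adapters: List[Adapter] = ADAPTERS) -> List[Record]:
    """Send one user query to every engine and merge the normalized results."""
    merged: List[Record] = []
    for adapter in adapters:
        raw = adapter.search(adapter.translate(user_query))
        merged.extend(Record(title=t, source=adapter.name)
                      for t in adapter.parse(raw))
    return merged


if __name__ == "__main__":
    for record in metasearch("whale"):
        print(f"{record.source}: {record.title}")
```

In this sketch, searching for "whale" returns one record from each source. Each additional engine requires its own `translate` and `parse` functions, which is the per-site developer effort the text describes.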