Abstract: The major emphasis of this paper is on analytical techniques for predicting the performance of various collection fusion scenarios. Analytical models of information retrieval system performance, for both single and multiple processors, increase our understanding of the parameters (e.g., number of documents, ranking algorithms, stemming algorithms, stop word lists) that affect system behavior. While there is a growing literature on the implementation of distributed information retrieval systems and digital libraries, little research has focused on analytic models of performance. We analytically describe the performance of single and multiple processors, both when different processors have the same parameter values and when their values differ. The use of different ranking algorithms and parameter values at different sites is also examined.
When do information retrieval systems using two document clusters provide better retrieval performance than systems using no clustering? We answer this question for one set of assumptions and suggest how it may be studied under other assumptions. The "Cluster Hypothesis" poses an empirical question about the relationships between documents and user-supplied relevance judgments, while the "Cluster Performance Question" proposed here focuses on when and why information retrieval or digital library performance differs between clustered and unclustered text databases. This may be generalized to study the relative performance of m versus n clusters.