Abstract

The general model and simulation algorithms for bibliographic retrieval systems presented in an earlier paper [1] are expanded. The new model integrates the physical as well as the logical and semantic elements of these systems. A modified algorithm is developed for the simulation of user relevance judgments, and is validated, by means of recall-precision curves and a Kolmogorov-Smirnov test of recall, for two test collections. Other approaches to goodness-of-fit testing are suggested.

Zeigler [2] defines a real system as a part of the world which is a source of behavioral data, a model as a set of instructions for generating such behavioral data, and a computer simulation as the computational process which, by means of a suitable encoding of the model instructions, can actually generate the data. The real systems described by the model and simulation algorithms in this paper are bibliographic retrieval systems, i.e., systems which provide data in the form of references or document descriptions relating to an informational query. These systems presently exist in a variety of commercial, experimental, automated, and non-automated forms. The purpose in modeling and simulating such systems is to attempt to optimize certain aspects of their operation, notably the effectiveness of the document and query representations and the efficiency of the accessing algorithms and associated data structures. Simulation permits a controlled variation of such parameters as indexing exhaustivity, vocabulary size, document set size, query exhaustivity, and search expression structure,
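As a rough illustration of the Kolmogorov-Smirnov comparison of recall named in the abstract, the sketch below computes a two-sample statistic between recall values from a real collection and from a simulation. The two-sample form, the function names, the sample recall values, and the large-sample 5% coefficient of 1.36 are assumptions made for this example; the excerpt does not say how the paper sets up its test.

```python
from bisect import bisect_right
from math import sqrt


def ks_two_sample(observed, simulated):
    """Return the two-sample Kolmogorov-Smirnov statistic D.

    D is the largest absolute gap between the two empirical CDFs,
    evaluated at every value that occurs in either sample.
    """
    xs = sorted(observed)
    ys = sorted(simulated)
    n, m = len(xs), len(ys)
    d = 0.0
    for v in sorted(set(xs) | set(ys)):
        f_obs = bisect_right(xs, v) / n  # share of observed values <= v
        f_sim = bisect_right(ys, v) / m  # share of simulated values <= v
        d = max(d, abs(f_obs - f_sim))
    return d


def ks_critical_value(n, m, c_alpha=1.36):
    """Large-sample critical value; c_alpha = 1.36 is the usual 5% coefficient."""
    return c_alpha * sqrt((n + m) / (n * m))


# Hypothetical per-query recall values, invented for illustration only.
real_recall = [0.20, 0.35, 0.50, 0.55, 0.70, 0.80]
simulated_recall = [0.25, 0.30, 0.45, 0.60, 0.65, 0.90]

d = ks_two_sample(real_recall, simulated_recall)
threshold = ks_critical_value(len(real_recall), len(simulated_recall))
print(f"D = {d:.3f}, approximate 5% critical value = {threshold:.3f}")
```

If D stays below the critical value, the test does not reject the hypothesis that simulated and real recall follow the same distribution. With samples as small as these, the large-sample threshold is only indicative.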
Two dynamic models of library circulation, the Markov model originally proposed by Morse and the mixed Poisson model proposed by Burrell and Cane, are applied to a large eleven‐year university circulation data set. Goodness of fit tests indicate that neither model fits the data. In both cases, the set of non‐circulating items is larger than that predicted by the model.
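The finding about non-circulating items can be pictured with a small zero-class check. The sketch below compares the observed share of items with no circulations against the share predicted by a plain Poisson model and by a gamma-mixed Poisson (negative binomial) model fitted by the method of moments. This is a simplified static illustration, not the Markov model or the exact mixed Poisson formulation the abstract refers to. The gamma mixing distribution, the function name, and the circulation counts are all assumptions.

```python
from math import exp
from statistics import mean, pvariance


def predicted_zero_fraction(counts):
    """Return (poisson_p0, mixed_p0): predicted shares of items with zero loans.

    poisson_p0 uses a single Poisson rate equal to the sample mean.
    mixed_p0 assumes gamma-distributed rates (negative binomial counts),
    with the shape parameter r estimated by the method of moments.
    """
    m = mean(counts)
    v = pvariance(counts)
    poisson_p0 = exp(-m)
    if v > m:
        r = m * m / (v - m)            # from variance = m + m^2 / r
        mixed_p0 = (r / (r + m)) ** r  # negative binomial P(X = 0)
    else:
        # No overdispersion: the mixture collapses to the plain Poisson model.
        mixed_p0 = poisson_p0
    return poisson_p0, mixed_p0


# Hypothetical loans-per-item counts for 100 items, invented for illustration.
counts = [0] * 60 + [1] * 15 + [2] * 10 + [3] * 8 + [5] * 4 + [8] * 3

observed_p0 = counts.count(0) / len(counts)
poisson_p0, mixed_p0 = predicted_zero_fraction(counts)
print(f"observed {observed_p0:.3f}, Poisson {poisson_p0:.3f}, mixed {mixed_p0:.3f}")
```

In this invented data set the observed zero class exceeds both predictions, which is the pattern the abstract reports for the real circulation data.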
Since the introduction of the Zipf distribution, many functions have been suggested for the frequency of words in text. Some of these models have also been applied to the distribution of index terms in a set of documents. The models are of two forms: rank-frequency and frequency-size. The former serve well to describe the distribution of high-frequency terms; the latter the distribution of low-frequency terms. In this article, a split model is proposed, which uses both a rank function for the high-frequency terms and a size function for the low-frequency terms, with the point of transition being determined either empirically or by rule. This model is fitted to the marginal empirical term distributions for four document datasets. Distributions to describe index term exhaustivity and term co-occurrence are also considered briefly.
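A minimal sketch of the split idea follows. Terms above a transition frequency are fitted with a rank-frequency function, and terms at or below it with a frequency-size function. The abstract does not give the functional forms, so both pieces are assumed here to be simple power laws fitted by least squares on log-log axes. The function names, the hand-picked transition value, and the term frequencies are likewise hypothetical.

```python
from collections import Counter
from math import exp, log


def fit_power_law(xs, ys):
    """Least-squares fit of y = c / x**a on log-log axes; returns (c, a)."""
    lx = [log(x) for x in xs]
    ly = [log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((x - mx) ** 2 for x in lx)
    sxy = sum((x - mx) * (y - my) for x, y in zip(lx, ly))
    slope = sxy / sxx
    return exp(my - slope * mx), -slope


def fit_split_model(term_freqs, transition):
    """Fit a rank function above the transition and a size function at or below it."""
    ordered = sorted(term_freqs, reverse=True)
    # High-frequency part: (rank, frequency) pairs for terms above the transition.
    high = [(rank, f) for rank, f in enumerate(ordered, start=1) if f > transition]
    # Low-frequency part: (frequency k, number of terms occurring exactly k times).
    low = sorted(Counter(f for f in ordered if f <= transition).items())
    rank_fit = fit_power_law(*zip(*high))
    size_fit = fit_power_law(*zip(*low))
    return rank_fit, size_fit


# Hypothetical term frequencies, invented for illustration: six frequent terms
# following 120 / rank, plus rare terms whose class sizes follow 16 / k**2.
term_freqs = [120, 60, 40, 30, 24, 20] + [4] * 1 + [2] * 4 + [1] * 16

(rank_c, rank_a), (size_c, size_a) = fit_split_model(term_freqs, transition=4)
print(f"rank part: f(r) = {rank_c:.1f} / r^{rank_a:.2f}")
print(f"size part: n(k) = {size_c:.1f} / k^{size_a:.2f}")
```

Each part needs at least two distinct points for the regression to be defined, so the transition cannot sit at either extreme of the frequency range.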