A novel probabilistic retrieval model is presented. It forms a basis to interpret the TF-IDF term weights as making relevance decisions. It simulates the local relevance decision-making for every location of a document, and combines all of these "local" relevance decisions into the "document-wide" relevance decision for the document. The significance of interpreting TF-IDF in this way is the potential to: (1) establish a unifying perspective about information retrieval as relevance decision-making; and (2) develop advanced TF-IDF-related term weights for future elaborate retrieval models. Our novel retrieval model is simplified to a basic ranking formula that directly corresponds to the TF-IDF term weights. In general, we show that the term-frequency factor of the ranking formula can be rendered into different term-frequency factors of existing retrieval systems. In the basic ranking formula, the remaining quantity $-\log p(\bar{r} \mid t \in d)$ is interpreted through $p(\bar{r} \mid t \in d)$, the probability of randomly picking a nonrelevant usage (denoted by $\bar{r}$) of term $t$. Mathematically, we show that this quantity can be approximated by the inverse document frequency (IDF). Empirically, we show that this quantity is related to IDF, using four reference TREC ad hoc retrieval data collections.
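To make the TF-IDF correspondence concrete, the following is a minimal sketch of a conventional TF-IDF ranking, not the model from the abstract above. It assumes raw term counts as the term-frequency factor and the textbook IDF, log(N/df), in the position where the abstract places the quantity that IDF approximates. The function names, the toy documents, and the handling of unseen terms are all illustrative assumptions.

```python
import math
from collections import Counter


def idf(term, docs):
    """Textbook IDF: log(N / df), where df is the number of documents containing the term."""
    n_docs = len(docs)
    df = sum(1 for d in docs if term in d)
    if df == 0:
        return 0.0  # assumption: a term absent from the collection contributes nothing
    return math.log(n_docs / df)


def tf_idf_score(query, doc, docs):
    """Sum of (raw term frequency x IDF) over the query terms."""
    counts = Counter(doc)
    return sum(counts[t] * idf(t, docs) for t in query)


# Hypothetical toy collection: each document is a list of tokens.
docs = [
    ["probabilistic", "retrieval", "model"],
    ["retrieval", "evaluation", "rank"],
    ["term", "weights", "retrieval", "term"],
]
query = ["term", "retrieval"]

scores = [tf_idf_score(query, d, docs) for d in docs]
# Rank document indices by descending score.
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking, scores)
```

In this toy collection "retrieval" occurs in every document, so its IDF is zero and only "term" separates the documents. A different term-frequency factor (for example, a length-normalized one) could be substituted for the raw count without changing the structure of the sum.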
This paper presents a new perspective on the probability ranking principle (PRP) by defining retrieval effectiveness in terms of our novel expected rank measure of a set of documents for a particular query. This perspective is based on preserving decision preferences, and it imposes weaker conditions on PRP than the utility-theoretic perspective on PRP does.
We propose a novel probabilistic retrieval model which weights terms according to their contexts in documents. The term weighting function of our model is similar to those of the language model and the binary independence model. Retrospective experiments (i.e., experiments in which relevance information is available) illustrate the potential of our probabilistic context-based retrieval: the precision at the top 30 documents is about 43% for TREC-6 data and 52% for TREC-7 data.
A new principled framework is presented for retrieval evaluation of ranked outputs. It applies decision theory to model relevance decision preferences and shows that the Probability Ranking Principle (PRP) specifies optimal ranking. It has two new components, namely a probabilistic evaluation model and a general measure of retrieval effectiveness. Its probabilities may be interpreted as either subjective or objective. Its performance measure is the expected weighted rank, which is the weighted average rank of a retrieval list. Starting from this measure, the expected forward rank and some existing retrieval effectiveness measures (e.g., top n precision and discounted cumulative gain) are instantiated using suitable weighting schemes after making certain assumptions. The significance of these instantiations is that the ranking prescribed by PRP is shown to be optimal simultaneously for all these existing performance measures. In addition, the optimal expected weighted rank may be used to normalize the expected weighted rank of retrieval systems for (summary) performance comparison (across different topics) between systems. The framework also extends PRP and our evaluation model to handle graded relevance, thereby generalizing the existing measures discussed (e.g., top n precision) and probabilistic retrieval models to graded relevance.
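For readers unfamiliar with the two existing measures named above, here is a minimal sketch of top n precision and discounted cumulative gain in their commonly used forms. It does not implement the expected weighted rank, whose weighting schemes are not given in the abstract. The function names, the graded labels, and the log2(rank + 1) discount are assumptions chosen for illustration.

```python
import math


def precision_at_n(gains, n):
    """Fraction of the top n ranked documents with a positive relevance label."""
    return sum(1 for g in gains[:n] if g > 0) / n


def dcg_at_n(gains, n):
    """Discounted cumulative gain using the common log2(rank + 1) discount (an assumption)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:n], start=1))


# Hypothetical graded relevance labels, listed in the order a system ranked the documents.
ranked_gains = [3, 0, 2, 0, 1]
# The same labels reordered by descending gain, as a best-case reference list.
ideal_gains = sorted(ranked_gains, reverse=True)

p3 = precision_at_n(ranked_gains, 3)
dcg = dcg_at_n(ranked_gains, 5)
ideal_dcg = dcg_at_n(ideal_gains, 5)
print(p3, dcg, ideal_dcg, dcg / ideal_dcg)
```

Dividing by the score of the best-case ordering is the usual way to normalize DCG across topics. It is loosely analogous to the normalization by the optimal expected weighted rank described above, though the two are not the same computation.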