We consider the problem of evaluating retrieval systems using incomplete judgment information. Buckley and Voorhees recently demonstrated that retrieval systems can be efficiently and effectively evaluated using incomplete judgments via the bpref measure [6]. When relevance judgments are complete, the value of bpref approximates the value of average precision computed with complete judgments. However, when relevance judgments are incomplete, the value of bpref deviates from the value of average precision with complete judgments, though it continues to rank systems in a manner similar to average precision evaluated with a complete judgment set. In this work, we propose three evaluation measures that (1) remain approximations to average precision even when relevance judgments are incomplete and (2) are more robust to incomplete or imperfect relevance judgments than bpref. The proposed estimates of average precision are simple and accurate, and we demonstrate their utility using TREC data.
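For concreteness, the following is a minimal sketch of the baseline bpref measure of Buckley and Voorhees [6] as it is commonly implemented (the min(R, N)-normalized variant used by tools such as trec_eval); it is not one of the estimators proposed in this work, and the function name and input format are illustrative assumptions.

```python
def bpref(ranking, judgments):
    """Compute bpref for a single topic.

    ranking:    list of doc ids in retrieved order
    judgments:  dict doc_id -> True (relevant) / False (nonrelevant);
                unjudged documents are simply absent and are ignored.
    """
    R = sum(1 for rel in judgments.values() if rel)        # judged relevant
    N = sum(1 for rel in judgments.values() if not rel)    # judged nonrelevant
    if R == 0:
        return 0.0

    bound = min(R, N) if N > 0 else R
    nonrel_above = 0   # judged nonrelevant docs seen so far in the ranking
    total = 0.0
    for doc in ranking:
        if doc not in judgments:        # unjudged docs are skipped entirely
            continue
        if judgments[doc]:
            total += 1.0 - min(nonrel_above, bound) / bound
        else:
            nonrel_above += 1
    return total / R
```

Because unjudged documents are skipped rather than treated as nonrelevant, bpref degrades gracefully as judgments are removed; the estimators proposed here aim to keep that robustness while remaining approximations to average precision itself.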
We consider the problem of large-scale retrieval evaluation. Recently, two methods based on random sampling were proposed as a solution to the extensive effort required to judge tens of thousands of documents. While the first method, proposed by Aslam et al. [1], is quite accurate and efficient, it is overly complex, making it difficult for the community to use; the second method, infAP, proposed by Yilmaz et al. [14], is relatively simple but less efficient than the former, since it employs uniform random sampling from the set of complete judgments. Further, neither of these methods provides confidence intervals on the estimated values. The contribution of this paper is threefold: (1) we derive confidence intervals for infAP, (2) we extend infAP to incorporate nonrandom relevance judgments by employing stratified random sampling, hence combining the efficiency of stratification with the simplicity of random sampling, and (3) we describe how this approach can be utilized to estimate nDCG from incomplete judgments. We validate the proposed methods using TREC data and demonstrate that they can incorporate nonrandom samples, such as those available in the TREC Terabyte track '06.
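As a hypothetical illustration of the stratified-sampling machinery this abstract refers to (not the paper's actual infAP derivation or its confidence intervals), the sketch below estimates a population mean, such as the proportion of relevant documents in a judgment pool, from strata sampled at different rates, and attaches a textbook normal-approximation confidence interval; all names and the interface are assumptions.

```python
import math

def stratified_estimate(strata, z=1.96):
    """Stratified-sampling estimate of a population mean with a
    normal-approximation confidence interval.

    strata: list of (stratum_size, sampled_values) pairs, where
            stratum_size is the number of items in the stratum and
            sampled_values are the measurements for the sampled items.
    Returns (estimate, half_width); the interval is estimate +/- half_width.
    """
    total = sum(size for size, _ in strata)
    mean = 0.0
    var = 0.0
    for size, values in strata:
        n = len(values)
        w = size / total                      # stratum weight
        m = sum(values) / n                   # within-stratum sample mean
        s2 = sum((v - m) ** 2 for v in values) / (n - 1) if n > 1 else 0.0
        mean += w * m
        # variance contribution with finite-population correction
        var += (w ** 2) * (s2 / n) * (1 - n / size)
    return mean, z * math.sqrt(var)

# Toy usage: two pools judged at different sampling rates, binary relevance values.
est, hw = stratified_estimate([(100, [1, 0, 1, 1]), (900, [0, 0, 1, 0, 0])])
print(f"estimated proportion relevant: {est:.3f} +/- {hw:.3f}")
```

Weighting each stratum by its size is what lets deliberately oversampled (nonrandom) pools be combined with sparsely sampled ones without biasing the estimate, which is the intuition behind combining stratification with simple random sampling.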
Personalization of search results offers the potential for significant improvements in Web search. Among the many observable user attributes, approximate user location is particularly simple for search engines to obtain and allows personalization even for a first-time Web search user. However, acting on user location information is difficult, since few Web documents include an address that can be interpreted as constraining the locations where the document is relevant. Furthermore, many Web documents, such as local news stories, lottery results, and sports team fan pages, may not correspond to physical addresses at all, yet the location of the user still plays an important role in their relevance. In this paper, we show how to infer a more general notion of location relevance that uses not only physical location but also locations of interest for Web pages. We compute this information using implicit user behavioral data, characterize the most location-centric pages, and show how location information can be incorporated into Web search ranking. Our results show that a substantial fraction of Web search queries can be significantly improved by incorporating location-based features.
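The paper's location-relevance model is learned from implicit user behavioral data; purely as a hypothetical sketch of how such a signal could be exposed to a ranker, the code below scores the match between a user's coordinates and a page's inferred distribution of locations of interest. The function names, the 100 km decay scale, and the input format are assumptions for illustration, not the authors' actual features.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    rad = math.radians
    dlat, dlon = rad(lat2 - lat1), rad(lon2 - lon1)
    a = (math.sin(dlat / 2) ** 2
         + math.cos(rad(lat1)) * math.cos(rad(lat2)) * math.sin(dlon / 2) ** 2)
    return 6371.0 * 2 * math.asin(math.sqrt(a))

def location_interest_feature(user_loc, page_interest):
    """Score how well a page's inferred locations of interest match the user.

    user_loc:      (lat, lon) of the searcher
    page_interest: dict (lat, lon) -> weight, e.g. the fraction of historical
                   clicks on the page coming from users near that point
    Returns a value in [0, 1]; larger means the page's interest profile is
    concentrated near the user.
    """
    if not page_interest:
        return 0.0
    total = sum(page_interest.values())
    score = 0.0
    for loc, w in page_interest.items():
        d = haversine_km(*user_loc, *loc)
        score += (w / total) / (1.0 + d / 100.0)   # decay with distance
    return score
```

A feature like this would typically be one input among many to a learned ranking function rather than a standalone score.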