Searching an organization's document repositories for experts provides a cost-effective solution to the task of expert finding. We present two general strategies for expert searching given a document collection, which are formalized using generative probabilistic models. The first directly models an expert's knowledge based on the documents that they are associated with, whilst the second locates documents on topic and then finds the associated experts. Forming reliable associations between documents and people is crucial to the performance of expert finding systems. Consequently, in our evaluation we compare the different approaches, exploring a variety of associations along with other operational parameters (such as topicality). Using the TREC Enterprise corpora, we show that the second strategy consistently outperforms the first. A comparison against other unsupervised techniques reveals that our second model delivers excellent performance.
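As a sketch of how these two strategies are commonly formalized (the smoothing parameter \lambda, the background model p(t), and the document-candidate association probabilities p(d \mid ca) are modelling assumptions, not details taken from this abstract): the first, candidate-centric strategy builds a language model \theta_{ca} for each candidate ca from their associated documents and ranks candidates by query likelihood,

\[ p(q \mid \theta_{ca}) = \prod_{t \in q} \Big( (1-\lambda) \sum_{d} p(t \mid d)\, p(d \mid ca) + \lambda\, p(t) \Big)^{n(t,q)}, \]

while the second, document-centric strategy first scores documents against the topic and then propagates those scores to the associated candidates,

\[ p(q \mid ca) = \sum_{d} \Big( \prod_{t \in q} \big( (1-\lambda)\, p(t \mid d) + \lambda\, p(t) \big)^{n(t,q)} \Big)\, p(d \mid ca), \]

where n(t,q) is the frequency of term t in query q.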
Statistical language models have been successfully applied to many information retrieval tasks, including expert finding: the process of identifying experts given a particular topic. In this paper, we introduce and detail language modeling approaches that integrate the representation, association and search of experts using various textual data sources into a generative probabilistic framework. This provides a simple, intuitive, and extensible theoretical framework to underpin research into expertise search. To demonstrate the flexibility of the framework, two search strategies to find experts are modeled that incorporate different types of evidence extracted from the data, before being extended to also incorporate co-occurrence information. The models proposed are evaluated in the context of enterprise search systems within an intranet environment, where it is reasonable to assume that the list of experts is known, and that data to be mined is publicly accessible. Our experiments show that excellent performance can be achieved by using these models in such environments, and that this theoretical and empirical work paves the way for future principled extensions.
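To make the document-centric strategy concrete, the following is a minimal Python sketch (not the authors' implementation): documents are scored with a smoothed query-likelihood model and those scores are aggregated over document-candidate association weights. The data structures, the smoothing weight lam, and the toy statistics in the usage example are assumptions for illustration only.

import math
from collections import defaultdict

def score_candidates(query_terms, docs, doc_cands, collection_tf, collection_len, lam=0.5):
    # docs: {doc_id: {term: term_frequency}}
    # doc_cands: {doc_id: {candidate_id: p(candidate | doc)}} -- association strengths
    # collection_tf / collection_len: background (collection) term statistics
    scores = defaultdict(float)
    for doc_id, tf in docs.items():
        doc_len = sum(tf.values()) or 1
        # query likelihood under a smoothed document language model, p(q | d)
        log_p_q_d = 0.0
        for t in query_terms:
            p_t_d = tf.get(t, 0) / doc_len
            p_t_c = collection_tf.get(t, 0) / max(collection_len, 1)
            log_p_q_d += math.log((1 - lam) * p_t_d + lam * p_t_c + 1e-12)
        # propagate the document score to its associated candidates
        for cand, p_ca_d in doc_cands.get(doc_id, {}).items():
            scores[cand] += math.exp(log_p_q_d) * p_ca_d
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# toy usage (all numbers are made up)
docs = {"d1": {"language": 3, "models": 2}, "d2": {"search": 4}}
doc_cands = {"d1": {"alice": 1.0}, "d2": {"alice": 0.5, "bob": 0.5}}
collection_tf = {"language": 3, "models": 2, "search": 4}
print(score_candidates(["language", "models"], docs, doc_cands, collection_tf, collection_len=9))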
Searching is inherently an interactive process, usually requiring numerous iterations of querying and assessing in order to find the desired amount of relevant information. Essentially, the search process can be viewed as a combination of inputs (queries and assessments) which are used to "produce" output (relevance). Under this view, it is possible to adapt microeconomic theory to analyze and understand the dynamics of Interactive Information Retrieval. In this paper, we treat the search process as an economics problem and conduct extensive simulations on TREC test collections, analyzing various combinations of inputs in the "production" of relevance. The analysis reveals that the total Cumulative Gain obtained during the course of a search session is functionally related to querying and assessing. Furthermore, this relationship can be characterized mathematically by the Cobb-Douglas production function. Subsequent analysis using cost models, grounded in cognitive load as the cost, reveals which search strategies minimize the cost of interaction for a given level of output. This paper demonstrates how economics can be applied to formally model the search process. This development establishes the theoretical foundations of Interactive Information Retrieval, providing numerous directions for empirical experimentation that are motivated directly from theory.
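As a sketch of the functional form named here (which interaction counts play the role of the inputs, e.g. total assessments versus assessments per query, is an assumption in this illustration), the Cobb-Douglas characterization is

\[ CG = k\, Q^{\alpha} A^{\beta}, \]

where CG is the Cumulative Gain of the session, Q the number of queries issued, A the number of assessments made, and k, \alpha, \beta are parameters fitted to the simulated session data.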
Expertise retrieval has been largely unexplored on data other than the W3C collection. At the same time, many intranets of universities and other knowledge-intensive organisations offer examples of relatively small but clean multilingual expertise data, covering broad ranges of expertise areas. We first present two main expertise retrieval tasks, along with a set of baseline approaches based on generative language modeling, aimed at finding expertise relations between topics and people. For our experimental evaluation, we introduce (and release) a new test set based on a crawl of a university site. Using this test set, we conduct two series of experiments. The first is aimed at determining the effectiveness of baseline expertise retrieval methods applied to the new test set. The second is aimed at assessing refined models that exploit characteristic features of the new test set, such as the organizational structure of the university, and the hierarchical structure of the topics in the test set. Expertise retrieval models are shown to be robust with respect to environments smaller than the W3C collection, and current techniques appear to be generalizable to other settings.
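One plausible way to fold such contextual features into a baseline candidate model, offered here only as an illustration and not necessarily the authors' exact refinement, is to smooth the candidate's term distribution with that of their organizational unit:

\[ p(t \mid \theta_{ca}) = \lambda_{ca}\, p(t \mid ca) + \lambda_{ou}\, p(t \mid ou(ca)) + (1 - \lambda_{ca} - \lambda_{ou})\, p(t), \]

where ou(ca) denotes the organizational unit of candidate ca and the mixture weights \lambda_{ca}, \lambda_{ou} are assumed tuning parameters.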
We investigate how the cost associated with querying in the context of information retrieval affects how users interact with a search system. Microeconomic theory is used to generate the cost-interaction hypothesis that states as the cost of querying increases, users will pose fewer queries and examine more documents per query. A between-subjects laboratory study with 36 undergraduate subjects was conducted, where subjects were randomly assigned to use one of three search interfaces that varied according to the amount of physical cost required to query: Structured (high cost), Standard (medium cost) and Query Suggestion (low cost). Results show that subjects who used the Structured interface submitted significantly fewer queries, spent more time on search results pages, examined significantly more documents per query, and went to greater depths in the search results list. Results also showed that these subjects spent longer generating their initial queries, saved more relevant documents and rated their queries as more successful. These findings have implications for the usefulness of microeconomic theory as a way to model and explain search interaction, as well as for the design of query facilities.
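The cost-interaction hypothesis can be motivated with a minimal derivation (a sketch assuming total queries Q and total assessments A as inputs, linear per-unit costs c_q and c_a, and the Cobb-Douglas output CG = k Q^\alpha A^\beta; the study's own cost model may differ in detail). Minimizing interaction cost

\[ C(Q, A) = c_q Q + c_a A \quad \text{subject to} \quad k\, Q^{\alpha} A^{\beta} = G \]

gives the first-order condition c_q / c_a = (\alpha A)/(\beta Q), i.e.

\[ \frac{A}{Q} = \frac{\beta}{\alpha} \cdot \frac{c_q}{c_a}, \]

so as the unit cost of querying c_q rises, the cost-minimizing mix shifts toward fewer queries and more assessments per query, which is the behaviour tested in the study.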