Hypergeometric language models for republished article finding

Tsagkias, Manos; Rijke, Maarten de; Weerkamp, Wouter

doi:10.1145/2009916.2009983

Cited by 12 publications

(15 citation statements)

References 40 publications


“…Our task concerns both. Similar tasks exist, such as summarizing social media in real time [182] and finding replications of news articles while they appear [212]. We believe that our linking and dynamic query modeling (DQM) approaches is applicable to those tasks too.…”

Section: Theme 2 - Struggling and Success in Web Search (mentioning)

confidence: 90%

Context & Semantics in News & Web Search

Odijk

2016

SIGIR Forum

This thesis presents research towards a core aim of information retrieval (IR): providing users with easy access to information. Three research themes guide the research presented in this thesis, contributing to three aspects of IR research: the domain in which an IR system is used, the users interacting with the system, and the different access scenarios in which these users engage with an IR system. Central to these research themes is the aim to gain insights into the behavior of searchers and develop algorithms to support them in their quest, whether it is a researcher exploring or studying a large collection, a web searcher struggling to find something, or a television viewer searching for related content. The first research theme is motivated by the information seeking tasks of researchers exploring and studying large collections. To enable their search on a larger scale, we propose computational methods to connect collections and to infer the perspective offered in a news story. Motivated by how historians select documents for close reading, we propose novel methods for connecting collections using automatically extracted temporal references. To illustrate how these algorithms can be used to automatically create connections between collections, we introduce a novel search interface to explore and analyze the connected collections. The interface highlights different perspectives and requires little domain knowledge. Based on how communication scientists study framing in news, we propose an automatic thematic content analysis approach. The second research theme is addressed in a mixed-methods study on how web searchers behave when they cannot find what they are looking for. Based on large-scale log analysis, crowd-sourced labeling, and predictive modeling we show behavioral differences given task success and failure. Based on these findings we propose ways in which systems can reduce struggling in search. 
To support searchers, we propose and evaluate algorithms that accurately predict the nature of future actions and their anticipated impact on search outcomes. Our findings have implications for the design of search systems that help searchers struggle less and succeed more. In the third and final research theme, we consider a pro-active search scenario, specifically in a live television setting. We propose algorithms that leverage contextual information to retrieve diverse related content for a leaned-back TV viewer. While watching television, people increasingly consume additional content related to what they are watching. Two methods to automatically retrieve content based on subtitles are introduced, one using entity linking, and one that uses reinforcement learning to generate effective queries for finding related content. Both methods are highly efficient and are currently used in a live television setting in near real time. Each research chapter in this thesis provides insights and algorithms that help searchers when using IR applications. For varying domains, users, and access scenarios, the research presented in this thesis improves the ease of access to information.

“…The work we present in Chapter 6 combines the ad hoc search and document filtering tasks in searching for background information based on a textual stream. Other examples of such tasks include summarizing social media in real time [182] and finding replications of news articles while they appear [212].…”

Section: Beyond Document Retrieval (mentioning)

confidence: 99%

Context & Semantics in News & Web Search

Odijk

2016

SIGIR Forum

“…The query generation can be based on any language model [12,11,2,19,10,9,16]. So far, using a multinomial distribution [11,2,19] for θD has been most popular and most successful, which is also adopted in our paper.…”

Section: Query Likelihood Methods (mentioning)

confidence: 99%

Query likelihood with negative query generation

Zhai

2012

Proceedings of the 21st ACM International Conference on Information and Knowledge Management

The query likelihood retrieval function has proven to be empirically effective for many retrieval tasks. From a theoretical perspective, however, the justification of the standard query likelihood retrieval function requires an unrealistic assumption that ignores the generation of a "negative query" from a document. This suggests that it is a potentially non-optimal retrieval function. In this paper, we attempt to improve the query likelihood function by bringing back the negative query generation. We propose an effective approach to estimate the probabilities of negative query generation based on the principle of maximum entropy, and derive a more complete query likelihood retrieval function that also contains the negative query generation component. The proposed approach not only bridges the theoretical gap in the existing query likelihood retrieval function, but also improves retrieval effectiveness significantly with no additional computational cost.

“…The body of the news article itself is an important source of information for training language models that represent it [20,24,27], as witnessed from the successful previous work in probabilistic modeling for retrieval. We follow [18,25] and use entire contents of article body, and title for training a unigram language model.…”

Section: Article Models (mentioning)

confidence: 99%

Language intent models for inferring user browsing behavior

Tsagkias

Blanco

2012

Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval

Self Cite

Modeling user browsing behavior is an active research area with tangible real-world applications, e.g., organizations can adapt their online presence to their visitors' browsing behavior with positive effects in user engagement and revenue. We concentrate on online news agents, and present a semisupervised method for predicting news articles that a user will visit after reading an initial article. Our method tackles the problem using language intent models trained on historical data which can cope with unseen articles. We evaluate our method on a large set of articles and in several experimental settings. Our results demonstrate the utility of language intent models for predicting user browsing behavior within online news sites.
