Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval 2011
DOI: 10.1145/2009916.2009983

Hypergeometric language models for republished article finding

Abstract: Republished article finding is the task of identifying instances of articles that have been published in one source and republished more or less verbatim in another source, which is often a social media source. We address this task as an ad hoc retrieval problem, using the source article as a query. Our approach is based on language modeling. We revisit the assumptions underlying the unigram language model taking into account the fact that in our setup queries are as long as complete news articles. We argue th…
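
The multinomial-versus-hypergeometric contrast behind this abstract can be made concrete with a small sketch. The code below is our own illustration, not the scoring function from the paper. It compares the probability of a query's term counts under a multinomial built from the document's maximum-likelihood estimates (sampling with replacement) with a textbook multivariate hypergeometric (sampling without replacement from the document's own tokens). The function names are hypothetical, and no smoothing is applied, so an unseen or over-represented query term yields probability zero.

```python
import math
from collections import Counter

# Illustrative sketch only (not the paper's model). Function names are
# hypothetical; no smoothing is applied in either model.


def log_multinomial(query: Counter, doc: Counter) -> float:
    """Log-probability of the query's term counts under a multinomial whose
    term probabilities are the document's maximum-likelihood estimates
    (sampling WITH replacement). Returns -inf if a query term is unseen."""
    doc_len = sum(doc.values())
    q_len = sum(query.values())
    # Multinomial coefficient: q_len! / prod(k_t!), computed in log space.
    logp = math.lgamma(q_len + 1) - sum(math.lgamma(k + 1) for k in query.values())
    for term, k in query.items():
        if doc[term] == 0:
            return float("-inf")
        logp += k * math.log(doc[term] / doc_len)
    return logp


def log_comb(n: int, k: int) -> float:
    """log C(n, k) via lgamma; callers guarantee 0 <= k <= n."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log_hypergeometric(query: Counter, doc: Counter) -> float:
    """Log-probability of drawing the query's term counts WITHOUT replacement
    from an urn holding the document's tokens (multivariate hypergeometric).
    Returns -inf when the query needs more copies of a term than exist."""
    doc_len = sum(doc.values())
    q_len = sum(query.values())
    if q_len > doc_len:
        return float("-inf")
    # Denominator C(|d|, |q|); document terms absent from the query add C(K, 0) = 1.
    logp = -log_comb(doc_len, q_len)
    for term, k in query.items():
        if k > doc[term]:
            return float("-inf")
        logp += log_comb(doc[term], k)
    return logp
```

On the toy document `a a b b c`, the short query `a b` scores 0.32 under the multinomial and 0.4 under the hypergeometric. Using the whole document as its own query scores about 0.154 under the multinomial but exactly 1 under the hypergeometric, which illustrates why sampling with replacement becomes a poor fit once queries are as long as the documents they are matched against.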

Cited by 12 publications (15 citation statements). References 40 publications.

“…Our task concerns both. Similar tasks exist, such as summarizing social media in real time [182] and finding replications of news articles while they appear [212]. We believe that our linking and dynamic query modeling (DQM) approaches is applicable to those tasks too.…”
Section: Theme 2-struggling and Success In Web Search (mentioning; confidence: 90%)
“…The work we present in Chapter 6 combines the ad hoc search and document filtering tasks in searching for background information based on a textual stream. Other examples of such tasks include summarizing social media in real time [182] and finding replications of news articles while they appear [212].…”
Section: Beyond Document Retrieval (mentioning; confidence: 99%)
“…The query generation can be based on any language model [12,11,2,19,10,9,16]. So far, using a multinomial distribution [11,2,19] for θD has been most popular and most successful, which is also adopted in our paper.…”
Section: Query Likelihood Methods (mentioning; confidence: 99%)
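
The query-likelihood setup this statement describes, with a multinomial θD, can be sketched as follows. Two choices here are assumptions that the statement does not make: Dirichlet smoothing of θD against the collection model, and the hypothetical default μ = 2000. Whitespace tokenization and the helper names `query_log_likelihood` and `rank` are likewise illustrative. The multinomial coefficient is dropped because it is identical for every document given the same query, so it does not change the ranking.

```python
import math
from collections import Counter


def query_log_likelihood(query: Counter, doc: Counter, collection: Counter,
                         coll_len: int, mu: float = 2000.0) -> float:
    """Rank-equivalent log P(q | theta_D): sum over query terms of
    c(t, q) * log P(t | theta_D), where theta_D is a multinomial with
    Dirichlet smoothing (an assumed choice; mu is a hypothetical default)."""
    doc_len = sum(doc.values())
    score = 0.0
    for term, q_count in query.items():
        p_coll = collection[term] / coll_len
        if p_coll == 0.0:
            # Term absent from the whole collection: it would lower every
            # document equally (to -inf), so skip it to keep scores finite.
            continue
        p_doc = (doc[term] + mu * p_coll) / (doc_len + mu)
        score += q_count * math.log(p_doc)
    return score


def rank(query_text: str, docs: dict, mu: float = 2000.0) -> list:
    """Return (doc_id, score) pairs sorted by descending query likelihood."""
    tokenized = {d: Counter(text.lower().split()) for d, text in docs.items()}
    collection = Counter()
    for counts in tokenized.values():
        collection.update(counts)
    coll_len = sum(collection.values())
    if coll_len == 0:
        # Empty collection: nothing to score against.
        return [(d, 0.0) for d in tokenized]
    query = Counter(query_text.lower().split())
    scores = {d: query_log_likelihood(query, c, collection, coll_len, mu)
              for d, c in tokenized.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

For example, with `docs = {"d1": "storm hits coast", "d2": "election results announced"}`, `rank("storm coast damage", docs)` places `d1` first; `damage` occurs in neither document and is skipped.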
“…The body of the news article itself is an important source of information for training language models that represent it [20,24,27], as witnessed from the successful previous work in probabilistic modeling for retrieval. We follow [18,25] and use entire contents of article body, and title for training a unigram language model.…”
Section: Article Models (mentioning; confidence: 99%)
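
Training an article model from the title and the full body, as this statement describes, amounts to pooling both fields before estimating term probabilities. The sketch below assumes a lowercase `\w+` tokenizer, plain concatenation with no field weighting, and maximum-likelihood estimates without smoothing; `tokenize` and `article_unigram_model` are hypothetical names.

```python
import re
from collections import Counter

TOKEN_RE = re.compile(r"\w+")  # assumed tokenizer: runs of word characters


def tokenize(text: str) -> list[str]:
    """Lowercase and split into word-character runs (a simplifying assumption)."""
    return TOKEN_RE.findall(text.lower())


def article_unigram_model(title: str, body: str) -> dict[str, float]:
    """Maximum-likelihood unigram model P(t | article) estimated from the
    pooled title and body tokens, with no field weighting or smoothing."""
    counts = Counter(tokenize(title) + tokenize(body))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {term: c / total for term, c in counts.items()}
```

For instance, `article_unigram_model("Storm hits coast", "The storm moved along the coast.")` pools nine tokens, giving `storm`, `coast`, and `the` a probability of 2/9 each and the remaining terms 1/9 each.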