Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval 2014
DOI: 10.1145/2600428.2609485

Evaluating answer passages using summarization measures

Abstract: Passage-based retrieval models have been studied for some time and have been shown to have some benefits for document ranking. Finding passages that are not only topically relevant, but are also answers to the users' questions would have a significant impact in applications such as mobile search. To develop models for answer passage retrieval, we need to have appropriate test collections and evaluation measures. Making annotations at the passage level is, however, expensive and can have poor coverage. In this …

Cited by 29 publications
(18 citation statements)
References 11 publications
“…We found that the average term‐level kappa ratio between two different human summaries is 0.33. This agreement is comparable with previous studies, which reported a kappa of approximately 0.35 and 0.39, respectively, for manual summaries of news reports and columns (Hori, Hirao, & Isozaki, ) and 0.38 for manually annotated answer passages (Keikha, Park, & Croft, ).…”
Section: Experiments Results and Analysis
Citation type: supporting
confidence: 90%
“…There have been previous efforts on developing benchmark data sets for non-factoid question answering or answer passage retrieval [4,7,20]. Perhaps the closest prior research to our work is the WebAP data set created by Keikha et al [7,20]. Compared to WebAP, WikiPassageQA has a two significant differences: (1) the number of questions in WikiPassageQA is significantly larger than that of WebAP (4165 v.s.…”
Section: Existing Related Datasets
Citation type: mentioning
confidence: 99%
“…Currently, there is only one collection specifically created for retrieving answer passages in documents, WebAP [7], where contiguous sentences of a document are labeled as relevant to a query.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…The manual inspection was done on the 20% of each worker's submission as well as the QA pairs with no agreement. [Footnote 7: Like for example "Can someone explain the theory of e = mc²?"] [Footnote 8: We increased the previous assignment limit to 10,000 for annotating the test set.]…”
Section: Relevance Assessment
Citation type: mentioning
confidence: 99%
“…Despite the widely-known importance of studying answer passage retrieval for non-factoid questions [1,2,8,20], the research progress for this task is limited by the availability of high-quality public data. Some existing collections, e.g., [8,14], consist of few queries, which are not sufficient to train sophisticated machine learning models for the task. Some others, e.g., [1], significantly suffer from incomplete judgments.…”
Section: Introduction
Citation type: mentioning
confidence: 99%