Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
DOI: 10.18653/v1/2021.findings-acl.157

CiteWorth: Cite-Worthiness Detection for Improved Scientific Document Understanding

Abstract: Scientific document understanding is challenging as the data is highly domain specific and diverse. However, datasets for tasks with scientific text require expensive manual annotation and tend to be small and limited to only one or a few fields. At the same time, scientific documents contain many potential training signals, such as citations, which can be used to build large labelled datasets. Given this, we present an in-depth study of cite-worthiness detection in English, where a sentence is labelled for wh…
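The abstract's core idea is that citations already present in scientific text can serve as free labels: strip the citation markers from a sentence and label it cite-worthy if it originally contained one. As a minimal sketch of that labelling step (the marker patterns and function name below are illustrative assumptions, not the paper's actual preprocessing):

```python
import re

# Hypothetical patterns for two common citation-marker styles;
# the paper's real extraction pipeline is not reproduced here.
BRACKET_CITE = re.compile(r"\s*\[\d+(?:\s*,\s*\d+)*\]")          # e.g. [3, 7]
PAREN_CITE = re.compile(r"\s*\([A-Z][A-Za-z-]+(?: et al\.?)?,?\s*\d{4}\)")  # e.g. (Devlin, 2019)

def make_example(sentence: str):
    """Remove citation markers and return (stripped_sentence, label),
    where label is 1 if the sentence originally cited something, else 0."""
    stripped = PAREN_CITE.sub("", BRACKET_CITE.sub("", sentence)).strip()
    label = int(stripped != sentence.strip())
    return stripped, label

sentences = [
    "Transformers achieve strong results on this task [3, 7].",
    "We describe our experimental setup in the next section.",
]
examples = [make_example(s) for s in sentences]
# First sentence becomes a positive (cite-worthy) example with its
# marker removed; the second becomes a negative example unchanged.
```

Training a binary sentence classifier on examples built this way is what lets such datasets scale to millions of citances without manual annotation.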

Cited by 12 publications (9 citation statements)
References 26 publications
“…A number of more recent papers also leverage or study individual citation context, including Cohan et al [29]; Jebari et al [30]; Wright & Augenstein [31]; Lauscher et al [32]. We note that annotating individual citations is more costly than annotating the abstract of a paper, as we do.…”
Section: Related Work
confidence: 82%
“…The authors argue that these trends imply that NLP has become a rapid discovery science (Collins, 1994), i.e., a particular shift a scientific field can undergo when it reaches a high level of consensus on its research topics, methods, and technologies, and then starts to continually improve on each other's methods. A number of recent papers also leverage or study citation context, including ; Jebari et al (2021); Wright and Augenstein (2021); Lauscher et al (2021). Our approach differs from Jurgens et al (2018) in several ways: for example, we do not analyze individual citations, but directly evaluate the stance of a complete paper (as measured by its framing in the paper's abstract); most importantly, we are particularly interested in negative stances, which as a relation is absent in the classification scheme of Jurgens et al (2018).…”
Section: Related Work
confidence: 99%
“…We exploit citation relationships to generate claims paired with potential evidence, using citances from the CiteWorth dataset (Wright and Augenstein, 2021) as source citances for generation. Supports claims are produced by directly pairing a generated claim with the abstracts of documents cited by the source citance.…”
Section: RQ1: Fact Checking Performance
confidence: 99%
“…We develop an initial set of guidelines for the annotators and conduct two rounds of pilot annotations to improve instructions and increase agreement. For the final evaluation, we generate claims on a set of 100 citances sampled from the CiteWorth dataset (Wright and Augenstein, 2021), which contains citations in context for over 1M citances spanning 10 domains. We limit the citances to those from papers in biology and medicine to match the domain of Sci-Fact.…”
Section: RQ2: Claim Quality Evaluation
confidence: 99%