CORWA: A Citation-Oriented Related Work Annotation Dataset

Li, Xiangci; Mandal, Biswadip; Ouyang, Jessica

doi:10.18653/v1/2022.naacl-main.397

Cited by 1 publication

(9 citation statements)

References 14 publications

“…Similarly, PDF parsing/extraction techniques were applied in 65% (n=15) of studies, the remaining studies applied extraction to other document formats (e.g., journal articles available online in HTML format; see . While similar methods, which additionally take into account syntactic structure, including chunking and dependency parsing were less frequently applied (Angrosh et al, 2014; Li et al, 2022; Nayak et al, 2021; Pertsas & Constantopoulos, 2018). Tagging methods, including PoS tagging (assigning grammatical categories, e.g., noun, verb), followed by concept tagging (e.g., semantic annotation), or sequence tagging, where labels were assigned based on order of appearance, were used in 43% (n=15) of studies.…”

Section: Data Preprocessing and Feature Engineering (mentioning)

confidence: 99%

“…Bidirectional Encoder Representations from Transformers (BERT) and other BERT-based language models made up the majority of transformer-based approaches. Specifically BERT (Aumiller et al, 2020; Shen et al, 2022) and SciBERT (Goldfarb-Tarrant et al, 2020; Li et al, 2022) were the most utilized for tasks relevant to extracting data from research in social sciences. Other language models included BioBERT (Chen et al, 2020) and distilBERT (Goldfarb-Tarrant et al, 2020).…”

Section: Model Architectures and Components (mentioning)

confidence: 99%

“…Other language models included BioBERT (Chen et al, 2020) and distilBERT (Goldfarb-Tarrant et al, 2020). We identified a recent application of the Hugging Face LED model (Li et al, 2022), a pretrained longformer model developed to address length limitations associated with other transformer-based approaches (see Beltagy et al, 2020).…”

Section: Model Architectures and Components (mentioning)

confidence: 99%

“…Five studies provided description of user feedback and other ratings. User feedback (among other metrics) was reported by Li et al (2022) who conducted expert human comparative assessment to assess fluency, relevance, coherence, and overall quality of model citation span/sentence generation outputs. This category also included evaluation metrics not listed in the sources we adapted when developing our protocol (see O'Mara-Eaves et al, 2015, p. 3, Table 1; Schmidt et al, 2021, pp.…”

Section: Evaluation Metrics (mentioning)

confidence: 99%

“…SysRev (Bozada et al, 2021) was also the only tool cataloged in the SR Toolbox (Marshall et al, 2022). Six of the twenty-three studies (26%) made source code openly available (Chen et al, 2021; Denzler et al, 2021; Goldfarb-Tarrant et al, 2020; Iwatsuki et al, 2017; Li et al, 2022). Article references and corresponding repositories are detailed in Table 3.…”

Section: Availability Accessibility and Transferability (mentioning)

confidence: 99%

(Semi)automated approaches to data extraction for systematic reviews and meta-analyses in social sciences: A living review

Legate, Nimon, Noblin

2024

F1000Res

Background: An abundance of rapidly accumulating scientific evidence presents novel opportunities for researchers and practitioners alike, yet such advantages are often overshadowed by resource demands associated with finding and aggregating a continually expanding body of scientific information. Data extraction activities associated with evidence synthesis have been described as time-consuming to the point of critically limiting the usefulness of research. Across social science disciplines, the use of automation technologies for timely and accurate knowledge synthesis can enhance research translation value, better inform key policy development, and expand the current understanding of human interactions, organizations, and systems. Ongoing developments surrounding automation are highly concentrated in research for evidence-based medicine with limited evidence surrounding tools and techniques applied outside of the clinical research community. The goal of the present study is to extend the automation knowledge base by synthesizing current trends in the application of extraction technologies of key data elements of interest for social scientists.

Methods: We report the baseline results of a living systematic review of automated data extraction techniques supporting systematic reviews and meta-analyses in the social sciences. This review follows PRISMA standards for reporting systematic reviews.

Results: The baseline review of social science research yielded 23 relevant studies.

Conclusions: When considering the process of automating systematic review and meta-analysis information extraction, social science research falls short as compared to clinical research that focuses on automatic processing of information related to the PICO framework. With a few exceptions, most tools were either in the infancy stage and not accessible to applied researchers, were domain specific, or required substantial manual coding of articles before automation could occur. Additionally, few solutions considered extraction of data from tables which is where key data elements reside that social and behavioral scientists analyze.
