Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022
DOI: 10.18653/v1/2022.acl-long.511

SciNLI: A Corpus for Natural Language Inference on Scientific Text

Abstract: Existing Natural Language Inference (NLI) datasets, while being instrumental in the advancement of Natural Language Understanding (NLU) research, are not related to scientific text. In this paper, we introduce SciNLI, a large dataset for NLI that captures the formality in scientific text and contains 107,412 sentence pairs extracted from scholarly papers on NLP and computational linguistics. Given that the text used in scientific literature differs vastly from the text used in everyday language both in terms …

Cited by 11 publications (5 citation statements)
References 31 publications
“…We name the experiment settings as HEALTHVER_pred and HEALTHVER_truth respectively. We also experiment on another dataset SciNLI (Sadat and Caragea, 2022) without any annotation. The structures in SciNLI are derived from the parsing model trained on HEALTHVER.…”
Section: Methods (mentioning)
confidence: 99%
“…We proceed to train a joint entity and relation extraction model for extracting the sentence structures. Additionally, we utilize the extraction model to conduct experiments on another dataset SciNLI (Sadat and Caragea, 2022).…”
Section: Graph Representation and Reasoning (mentioning)
confidence: 99%
“…To bolster the data set’s comprehensiveness, existing NLI data sets were incorporated. Specifically, the Stanford Natural Language Inference (SNLI) data set [27] and the SciNLI data set [28] were integrated. These data sets contributed a diverse range of general NLI instances, enriching the model’s ability to handle a wider spectrum of language structures and inferences.…”
Section: Data Set Preparation For Training and Validation (mentioning)
confidence: 99%
“…The task is essential in many NLP applications, e.g. in discourse relation recognition (Chan et al, 2023), scientific document classification (Sadat and Caragea, 2022), or e-commerce product categorization (Shen et al, 2021). In practice, documents might be tagged with multiple categories that can be organized in a concept hierarchy, such as a taxonomy of a knowledge graph (Pan et al, 2017b,a), cf.…”
Section: Introduction (mentioning)
confidence: 99%