Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
DOI: 10.18653/v1/2021.findings-acl.78

COM2SENSE: A Commonsense Reasoning Benchmark with Complementary Sentences

Abstract: Commonsense reasoning is intuitive for humans but has been a long-standing challenge for artificial intelligence (AI). Recent advancements in pretrained language models have shown promising results on several commonsense benchmark datasets. However, the reliability and comprehensiveness of these benchmarks for assessing models' commonsense reasoning ability remain unclear. To this end, we introduce a new commonsense reasoning benchmark dataset comprising natural language true/false statements, with each sampl…

Cited by 15 publications (17 citation statements)
References 41 publications
“…However, these methods risk ignoring target-culture complexity or forcing source-culture concepts onto the target culture. For example, English commonsense datasets (Singh et al., 2021) include culture-specific concepts such as food ingredients, rituals and celebrations, and societal expectations. Translating this dataset into other languages will require making decisions about how, and whether, to modify these items to make them more intelligible in the target culture.…”
Section: Data Collection
confidence: 99%
“…We use a combination of 17 datasets for our largest-scale training data retrieval. The datasets include αNLI, SWAG (Zellers et al., 2018), RACE (Lai et al., 2017) (we only use the middle-school subset), CODAH, RiddleSense (Lin et al., 2021), SciTail, Com2Sense (Singh et al., 2021), AI2 Science Questions (Clark et al., 2019), WinoGrande, CommonsenseQA (Talmor et al., 2019), CommonsenseQA 2.0 (Talmor et al., 2021), ASQ (Fu et al., 2019), OBQA (Mihaylov et al., 2018), PhysicalIQA (Bisk et al., 2020), SocialIQA (Sap et al., 2019b), CosmosQA (Huang et al., 2019) and HellaSWAG (Zellers et al., 2019). We present details of the datasets that we use for training data retrieval in Table 6.…”
Section: A Datasets
confidence: 99%
“…That is, progress will be made by focusing on the reasoning and inferential aspects of commonsense statement understanding and generation. Very recently, other scholars have released benchmarks and datasets in this pursuit: WinoWhy (Zhang et al., 2020), COM2SENSE (Singh et al., 2021), UNICORN (Lourie et al., 2021) and CommonsenseQA 2.0 (Talmor et al., 2021). We hope that this wealth of benchmarks and datasets can unlock substantial progress.…”
Section: Benchmarks
confidence: 99%