Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
DOI: 10.18653/v1/2021.findings-acl.78

COM2SENSE: A Commonsense Reasoning Benchmark with Complementary Sentences

Abstract: Commonsense reasoning is intuitive for humans but has been a long-standing challenge for artificial intelligence (AI). Recent advancements in pretrained language models have shown promising results on several commonsense benchmark datasets. However, the reliability and comprehensiveness of these benchmarks for assessing models' commonsense reasoning ability remain unclear. To this end, we introduce a new commonsense reasoning benchmark dataset comprising natural language true/false statements, with each sampl…

Cited by 15 publications (17 citation statements)
References 41 publications
“…However, these methods risk ignoring target-culture complexity or forcing source-culture concepts onto the target culture. For example, English commonsense datasets (Singh et al., 2021) include culture-specific concepts such as food ingredients, rituals and celebrations, and societal expectations. Translating this dataset into other languages will require making decisions about how, and whether, to modify these items to make them more intelligible in the target culture.…”
Section: Data Collection
confidence: 99%
“…We use a combination of 17 datasets for our largest-scale training data retrieval. The datasets include αNLI, SWAG (Zellers et al., 2018), RACE (Lai et al., 2017) (we only use the middle-school subset), CODAH, RiddleSense (Lin et al., 2021), SciTail, Com2Sense (Singh et al., 2021), AI2 Science Questions (Clark et al., 2019), WinoGrande, CommonsenseQA (Talmor et al., 2019), CommonsenseQA 2.0 (Talmor et al., 2021), ASQ (Fu et al., 2019), OBQA (Mihaylov et al., 2018), PhysicalIQA (Bisk et al., 2020), SocialIQA (Sap et al., 2019b), CosmosQA (Huang et al., 2019) and HellaSWAG (Zellers et al., 2019). We present details of the datasets that we use for training data retrieval in Table 6.…”
Section: A Datasets
confidence: 99%
“…That is, progress will be made by focusing on the reasoning and inferential aspects of commonsense statement understanding and generation. Very recently, other scholars have released benchmarks and datasets in this pursuit: WinoWhy (Zhang et al., 2020), COM2SENSE (Singh et al., 2021), UNICORN (Lourie et al., 2021) and CommonsenseQA 2.0 (Talmor et al., 2021). We hope that this wealth of benchmarks and datasets can unlock substantial progress.…”
Section: Benchmarks
confidence: 99%