Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
DOI: 10.18653/v1/d19-1454

Social IQa: Commonsense Reasoning about Social Interactions

Abstract: We introduce SOCIAL IQA, the first large-scale benchmark for commonsense reasoning about social situations. SOCIAL IQA contains 38,000 multiple choice questions for probing emotional and social intelligence in a variety of everyday situations (e.g., Q: "Jordan wanted to tell Tracy a secret, so Jordan leaned towards Tracy. Why did Jordan do this?" A: "Make sure no one else could hear"). Through crowdsourcing, we collect commonsense questions along with correct and incorrect answers about social interactions, usi…
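To make the benchmark's multiple-choice format concrete, here is a minimal Python sketch of one item built from the example in the abstract. The field names (context, question, answers, label) and the two distractor answers are illustrative assumptions for this sketch, not the dataset's exact schema.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SocialIQaItem:
    """One three-way multiple-choice question (field names assumed, not official)."""
    context: str                    # the social situation
    question: str                   # a probe about motivation, reaction, etc.
    answers: Tuple[str, str, str]   # three candidate answers
    label: int                      # index (0-2) of the correct answer

    def is_correct(self, predicted_index: int) -> bool:
        return predicted_index == self.label


# The example from the abstract; the two distractors are placeholders for illustration only.
example = SocialIQaItem(
    context="Jordan wanted to tell Tracy a secret, so Jordan leaned towards Tracy.",
    question="Why did Jordan do this?",
    answers=(
        "Make sure no one else could hear",   # correct answer from the abstract
        "Show off his strength",              # placeholder distractor
        "Get closer to the window",           # placeholder distractor
    ),
    label=0,
)
assert example.is_correct(0)
```

A question-answering model would score each of the three answers given the context and question, and accuracy is simply the fraction of items where the top-scoring answer matches the label.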

Cited by 290 publications (308 citation statements) · References 29 publications
“…The model generates the correct stereotypes when there is high lexical overlap with the post (e.g., examples d and e). This is in line with previous research showing that large language models rely on correlational patterns in data (Sap et al., 2019c; Sakaguchi et al., 2020).…”
Section: Classification (supporting)
confidence: 93%
“…Based on the protocols we introduce, we show that performing at state-of-the-art on these datasets does not necessarily imply strong common-sense reasoning capability. We are happy to see a rising interest in the WSC in the community, including very recent work by Ruan et al. (2019) and Sap et al. (2019), which reinforces the need for proper evaluation protocols. With the release of an increasing number of fine-grained inference tasks aimed at these abilities (Roemmele et al., 2011; Morgenstern et al., 2016; Wang et al., 2018; Rashkin et al., 2018; McCann et al., 2018), the issue of experimental validity in CSR will also become even more important.…”
Section: Results (mentioning)
confidence: 80%
“…But despite these impressive performance improvements in a variety of NLP tasks, it remains unclear whether these models are performing complex reasoning, or if they are merely learning complex surface correlation patterns (Davis and Marcus, 2015; Marcus, 2018). This difficulty in measuring progress in commonsense reasoning using downstream tasks has yielded increased efforts at developing robust benchmarks for directly measuring commonsense capabilities in multiple settings, such as social interactions (Sap et al., 2019b; Rashkin et al., 2018a) and physical situations (Zellers et al., 2019; Talmor et al., 2019).…”
Section: Type of the Tutorial (mentioning)
confidence: 99%
“…In response, recent work has focused on using crowdsourcing and automatic filtering to design large-scale benchmarks while maintaining negative examples that are adversarial to machines (Zellers et al., 2018). We will review recent benchmarks that have emerged to assess whether machines have acquired physical (e.g., Talmor et al., 2019; Zellers et al., 2019), social (e.g., Sap et al., 2019b), or temporal commonsense reasoning capabilities, as well as benchmarks that combine commonsense abilities with other tasks (e.g., reading comprehension; Ostermann et al., 2018).…”
Section: Description (mentioning)
confidence: 99%