2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2019.00688

From Recognition to Cognition: Visual Commonsense Reasoning

Abstract: Why is [person4] pointing at [person1]?
a) He is telling [person3] that [person1] ordered the pancakes.
b) He just told a joke.
c) He is feeling accusatory towards [person1].
d) He is giving [person1] directions.
a) [person1] has the pancakes in front of him.
b) [person4] is taking everyone's order and asked for clarification.
c) [person3] is looking at the pancakes and both she and [person2] are smiling slightly.
d) [person3] is delivering food to the table, and she might not know whose order is wh…
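The abstract's example reflects the paper's two-stage task format: a model first picks an answer to the question (Q→A), then picks a rationale justifying that answer (QA→R), and combined Q→AR credit requires getting both right. Below is a minimal sketch of that scoring logic; the field names and labels are illustrative assumptions for this sketch, not the dataset's exact schema.

def q_to_ar_correct(pred_answer: int, pred_rationale: int, example: dict) -> bool:
    """Q->AR is scored as correct only when the model both picks the
    right answer and, given the right answer, picks the right rationale."""
    return (pred_answer == example["answer_label"]
            and pred_rationale == example["rationale_label"])

# Hypothetical record modeled on the abstract's example (labels made up
# for illustration; the real data ships with four choices per stage):
example = {
    "question": "Why is [person4] pointing at [person1]?",
    "answer_label": 0,      # index into the four answer choices
    "rationale_label": 0,   # index into the four rationale choices
}

print(q_to_ar_correct(pred_answer=0, pred_rationale=2, example=example))  # False: rationale wrong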

Cited by 653 publications (626 citation statements); references 90 publications (187 reference statements).
“…knowledge and leverage more sophisticated reasoning mechanisms (Zhang et al., 2018; Ostermann et al., 2018), showing that the previous state-of-the-art models often struggle to solve these newer tasks reliably. As a result, commonsense has received a lot of attention in other areas as well, such as natural language inference (Zellers et al., 2018b, 2019) and visual question answering (Zellers et al., 2018a). Despite the importance of commonsense knowledge, however, previous work on QA methods takes a coarse-grained view of commonsense, without considering the subtle differences across the various knowledge types and resources.…”
Section: Introduction (mentioning)
confidence: 99%
“…Commonsense knowledge and reasoning. There is a recent surge of novel large-scale datasets for testing machine commonsense with various focuses, such as situation prediction (SWAG) (Zellers et al., 2018), social behavior understanding (Sap et al., 2019a,b), visual scene comprehension (Zellers et al., 2019), and general commonsense reasoning (Talmor et al., 2019), which encourages the study of supervised learning methods for commonsense reasoning. Trinh and Le (2018) find that large language models show promising results on the WSC resolution task (Levesque, 2011), but this approach can hardly be applied in a more general question-answering setting, and it also does not provide the explicit knowledge used in inference.…”
Section: Related Work (mentioning)
confidence: 99%
“…Empowering machines with the ability to perform commonsense reasoning has been seen as the bottleneck of artificial general intelligence (Davis and Marcus, 2015). Recently, there have been a few emerging large-scale datasets for testing machine commonsense with various focuses (Zellers et al., 2018; Sap et al., 2019b; Zellers et al., 2019). In a typical dataset, CommonsenseQA (Talmor et al., 2019), given a question like "Where do adults use glue sticks?…”
Section: Introduction (mentioning)
confidence: 99%
“…Commonsense Reasoning (VCR, visualcommonsense.com, Zellers et al., 2019) is a corpus that contains a sample of stills from movies. Questions and answers revolve around conclusions or assumptions that require knowledge external to the images.…”
Section: Visual (mentioning)
confidence: 99%
“…Effect of Textual Model Size. The original VCR work by Zellers et al. (2019) made use of BERT-base, while we use BERT-large to initialize our models. To test how much of our improvements are simply due to our model being larger, we retrained B2T2 models using BERT-base and found that we lose 2.9% accuracy.…”
Section: Ablations (mentioning)
confidence: 99%
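The ablation quoted above swaps the text encoder's initialization between BERT-base and BERT-large. B2T2's own code is not reproduced here; the sketch below only illustrates that kind of swap using the Hugging Face transformers API, and the helper function and model choice are assumptions for illustration.

# Minimal sketch of initializing a text encoder from BERT-base vs.
# BERT-large, as in the ablation quoted above. Not the B2T2 codebase;
# just an illustration using the Hugging Face `transformers` API.
from transformers import BertModel

def build_text_encoder(use_large: bool = True) -> BertModel:
    # Hypothetical helper: pick the pretrained checkpoint to start from.
    name = "bert-large-uncased" if use_large else "bert-base-uncased"
    return BertModel.from_pretrained(name)

encoder = build_text_encoder(use_large=False)  # the retrained BERT-base variant
print(encoder.config.hidden_size)              # 768 for base, 1024 for large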