2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr46437.2021.01113
AGQA: A Benchmark for Compositional Spatio-Temporal Reasoning

Abstract: LLM chains enable complex tasks by decomposing work into a sequence of sub-tasks. Crowdsourcing workflows similarly decompose complex tasks into smaller tasks for human crowdworkers. Chains address LLM errors analogously to the way crowdsourcing workflows address human error. To characterize opportunities for LLM chaining, we survey 107 papers across the crowdsourcing and chaining literature to construct a design space for chain development. The design space connects an LLM designer's objectives to strategies …

Cited by 63 publications (24 citation statements)
References 153 publications (650 reference statements)
“…We report the experimental results on the new AGQA 2.0 dataset. Please refer to the original AGQA paper for more details on the Human baseline, the Most Likely baseline, the categories on which we evaluate, and the three evaluated models (HCRN, HME, and PSAC) [2]. Table 1.…”
Section: Results
confidence: 99%
“…As many visual events are a composition of actors interacting with objects over time, computer vision researchers have developed benchmarks to measure models' ability to reason compositionally. Action Genome Question Answering (AGQA) measures compositional reasoning using a Visual Question Answering (VQA) task [2]. AGQA generates questions about videos using natural language templates and ground truth scene graph annotations.…”
Section: Action Genome Question Answering
confidence: 99%
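The citation above describes AGQA's core generation idea: questions are produced by instantiating natural-language templates with ground-truth scene graph annotations. A minimal sketch of that pattern is below; the graph structure, template string, and function names are illustrative assumptions, not the benchmark's actual code or schema.

```python
# Hypothetical sketch of template-based question generation from a video
# scene graph, in the spirit of AGQA. The annotation format and the
# temporal "after" template are illustrative assumptions.

SCENE_GRAPH = [
    {"action": "holding a dish", "start": 0.0, "end": 3.1},
    {"action": "putting the dish down", "start": 3.1, "end": 5.0},
]

def fill_template(template: str, **slots) -> str:
    """Instantiate a natural-language template with scene-graph slots."""
    return template.format(**slots)

def generate_questions(graph):
    """Pair temporally consecutive actions into 'after' question-answer pairs."""
    qa_pairs = []
    for earlier, later in zip(graph, graph[1:]):
        question = fill_template(
            "What did the person do after {action}?", action=earlier["action"]
        )
        qa_pairs.append((question, later["action"]))
    return qa_pairs

for question, answer in generate_questions(SCENE_GRAPH):
    print(question, "->", answer)
```

Because both the questions and answers are derived programmatically from annotations, this style of generation scales to millions of question-answer pairs and allows evaluation to be sliced by reasoning category (e.g. temporal ordering), as the AGQA paper does.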