2021
DOI: 10.48550/arxiv.2109.04947
Preprint

Tiered Reasoning for Intuitive Physics: Toward Verifiable Commonsense Language Understanding

Abstract: Large-scale, pre-trained language models (LMs) have achieved human-level performance on a breadth of language understanding tasks. However, evaluations only based on end task performance shed little light on machines' true ability in language understanding and reasoning. In this paper, we highlight the importance of evaluating the underlying reasoning process in addition to end performance. Toward this goal, we introduce Tiered Reasoning for Intuitive Physics (TRIP), a novel commonsense reasoning dataset with …

Cited by 1 publication (1 citation statement)
References 52 publications
“…113,000 questions Crowd sourcing
TG-CSR [116] | Question answering | 331 questions | Expert construction
TimeDial [109] | Cloze tasks for temporal reasoning | 1,100 dialogues | Crowd sourcing
Torque [103] | Order of events in a news story | 3,200 news stories, 21,000 questions | Crowd sourcing
Tracie [161] | Order of implicit and explicit event | 5500 problems | Crowd sourcing
Triangle COPA [51] | Why did that thing do that? | 100 examples | Expert construction
TRIP [129] | Which story is more plausible? | 2100 stories | Crowd sourcing
WNLI [136] | Entailment | 849 sentences | Expert construction.…”
Section: Task
mentioning
confidence: 99%