2022
DOI: 10.48550/arxiv.2209.09513
Preprint

Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering

Cited by 10 publications (8 citation statements)
References 0 publications
“…Previous efforts to train models to use explanations (Mishra et al., 2022), whether from scratch (Camburu et al., 2018; Lampinen, Roy, et al., 2022), through fine-tuning (Lampinen, Dasgupta, et al., 2022), or through conditioning with in-context prompts at evaluation time (Lu et al., 2022; Wei et al., 2022), have shown improved performance over models without explicit explanations. However, much of the existing literature remains largely empirical, with limited theoretical accounts of the phenomenon (Xie et al., 2021).…”
Section: Discussion
confidence: 99%
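The "conditioning with in-context prompts" mentioned in the excerpt above can be illustrated with a minimal prompt builder in which each exemplar carries an explanation alongside its answer. The function name and exemplar content below are illustrative assumptions, not drawn from the cited works:

```python
# Minimal sketch of explanation-augmented in-context prompting, in the
# spirit of chain-of-thought conditioning. Exemplar content is hypothetical.

def build_prompt(exemplars, question):
    """Concatenate (question, explanation, answer) exemplars before the query."""
    parts = []
    for ex in exemplars:
        parts.append(
            f"Question: {ex['question']}\n"
            f"Explanation: {ex['explanation']}\n"
            f"Answer: {ex['answer']}\n"
        )
    # The query ends at "Explanation:" so the model continues with its reasoning.
    parts.append(f"Question: {question}\nExplanation:")
    return "\n".join(parts)

exemplars = [{
    "question": "Which is a conductor: rubber or copper?",
    "explanation": "Copper is a metal, and metals conduct electricity.",
    "answer": "copper",
}]
prompt = build_prompt(exemplars, "Which is a conductor: wood or iron?")
```

The key design choice, per the excerpt, is that explanations appear in the context at evaluation time rather than being trained into the model.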
“…One potential application of multimodal information retrieval is multimodal reasoning. Lu et al. (2022a) first introduce ScienceQA, a large-scale multimodal science question dataset annotated with lectures and explanations. Based on this benchmark, a follow-up work proposes Multimodal Chain-of-Thought (Multimodal-CoT), which incorporates language and vision modalities into a two-stage (rationale generation and answer inference) framework, surpassing GPT-3.5 by a large margin with a much smaller fine-tuned model.…”
Section: Retrieval Augmented Multimodal Reasoning
confidence: 99%
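The two-stage (rationale generation, then answer inference) framework described in the excerpt above can be sketched as follows; `rationale_model` and `answer_model` are hypothetical stand-ins for fine-tuned multimodal language models, not an actual Multimodal-CoT implementation:

```python
# Hedged sketch of a two-stage rationale-then-answer pipeline in the style of
# Multimodal-CoT. The model callables are hypothetical placeholders.

def multimodal_cot(question, context, image_features, rationale_model, answer_model):
    # Stage 1: generate a rationale from the language input fused with vision features.
    rationale = rationale_model(text=f"{question}\n{context}", vision=image_features)
    # Stage 2: append the rationale to the input and infer the final answer.
    answer = answer_model(text=f"{question}\n{context}\n{rationale}",
                          vision=image_features)
    return rationale, answer

# Usage with trivial stub models standing in for the fine-tuned networks:
stub_rationale = lambda text, vision: "A magnet attracts iron."
stub_answer = lambda text, vision: "iron"
r, a = multimodal_cot("Which object is magnetic?", "Options: iron, wood",
                      image_features=None,
                      rationale_model=stub_rationale, answer_model=stub_answer)
```

Separating the stages lets the answer model condition on a generated rationale rather than producing answer and rationale jointly.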
“…Chain of Thought (CoT) reasoning is inherently tied to VCR, as reasoning paths are highly associated with selecting rationales R. The rise in popularity of CoT techniques for linguistic tasks is closely interconnected with the development of LLMs, which have been shown to reveal intermediate reasoning steps [57]. There are not yet many works in the VL direction, even though the introduction of novel, appropriate datasets with grounded answer rationales highlights the prospects of such an approach [101]. Specifically, [101] tackles VCR by captioning the image and then feeding the caption, together with the existing linguistic input, to the LLM.…”
Section: Visual Commonsense Reasoning (VCR)
confidence: 99%
“…Another promising work in this direction introduces Multimodal-CoT without using language as the mediating modality, proposing a two-stage process to separately infer the answer A and the rationale R, while stating that an LM with fewer than 1B parameters is adequate for state-of-the-art performance [102].…”
Section: Visual Commonsense Reasoning (VCR)
confidence: 99%
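The caption-mediated VCR approach attributed to [101] above (caption the image, then pass the caption with the linguistic input to an LLM) can be sketched as a short pipeline; `caption_model` and `llm` are hypothetical stand-ins for a captioning model and a language model:

```python
# Sketch of caption-mediated visual commonsense reasoning: the image is first
# mapped into language via a caption, which is then given to an LLM together
# with the question. Both model callables are hypothetical placeholders.

def vcr_via_caption(image, question, answer_choices, caption_model, llm):
    caption = caption_model(image)  # language serves as the mediating modality
    prompt = (f"Context: {caption}\n"
              f"Question: {question}\n"
              f"Choices: {', '.join(answer_choices)}\n"
              f"Answer with a rationale:")
    return llm(prompt)

# Usage with trivial stubs in place of real models:
stub_caption = lambda image: "a horseshoe magnet next to a pile of nails"
stub_llm = lambda prompt: "iron, because magnets attract ferrous metals"
out = vcr_via_caption(None, "Which object will the magnet attract?",
                      ["iron", "wood"], stub_caption, stub_llm)
```

This contrasts with the Multimodal-CoT approach in the same excerpt, which fuses vision features directly instead of routing them through a caption.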