2022
DOI: 10.48550/arxiv.2212.09662
Preprint

MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering

Cited by 4 publications (19 citation statements)
References 0 publications
“…As depicted in Figure 3, our approach diverged from LLaVA's methodology of using a fixed set of questions to prompt GPT-4 [9]. We observed a greater diversity in our question set, owing to their generation process conditioned on text context.…”
Section: Dataset Construction (citation type: mentioning)
Confidence: 89%
“…They are based on the Pix2Struct model, which was pre-trained on website visual understanding (from screenshot to HTML code) [8]. Matcha fine-tuned Pix2Struct on various datasets, such as Github IPython notebooks for [chart > code], a mix of PlotQA, web-crawled data, and Wikipedia tables for [chart > table], and math reasoning datasets for [image > answer] [9]. DePlot, in turn, was further fine-tuned on [chart > table] datasets, including ChartQA, to specialize in converting charts into linearized tables with titles, legends, and interpolated data point values [10].…”
Section: Chart VQA Expert Systems (citation type: mentioning)
Confidence: 99%
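The citation statement above describes DePlot's output as a linearized table with a title, legend headers, and data point values. As a rough illustration of what such a flattened chart-to-table string might look like, here is a minimal sketch; the `linearize_table` helper, the `TITLE |` prefix, and the pipe-separated cell layout are illustrative assumptions, not the model's documented output specification.

```python
# Hypothetical sketch of a linearized chart table in the style the citation
# statement describes: title first, then header cells, then one data row per
# line, with " | " as the cell separator. The exact format DePlot emits may
# differ; this only illustrates the idea of a chart flattened to text.

def linearize_table(title, header, rows):
    """Flatten a chart's underlying data table into a single string."""
    lines = [f"TITLE | {title}"]
    lines.append(" | ".join(header))
    for row in rows:
        lines.append(" | ".join(str(v) for v in row))
    return "\n".join(lines)

table = linearize_table(
    "Quarterly revenue",
    ["Quarter", "Revenue"],
    [("Q1", 10), ("Q2", 14)],
)
print(table)
```

A downstream language model can then answer questions against this plain-text table instead of the chart pixels, which is the division of labor the cited pipeline relies on.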