2023
DOI: 10.1609/aaai.v37i11.26634

MPMQA: Multimodal Question Answering on Product Manuals

Abstract: Visual contents, such as illustrations and images, play a big role in product manual understanding. Existing Product Manual Question Answering (PMQA) datasets tend to ignore visual contents and only retain textual parts. In this work, to emphasize the importance of multimodal contents, we propose a Multimodal Product Manual Question Answering (MPMQA) task. For each question, MPMQA requires the model not only to process multimodal contents but also to provide multimodal answers. To support MPMQA, a large-scale…

Citations: cited by 3 publications (2 citation statements)
References: 15 publications
“…Intuitively, more uniformly distributed prediction (i.e., noisy correspondence) leads to higher estimated energy uncertainty (Zhang et al 2023). Therefore we select the clean samples by applying a threshold τ and the maximum similarity constraint (Qin et al 2022), i.e.,…”
Section: Energy-guided Sample Filtration
Citation type: mentioning
confidence: 99%
“…Text-only Paper Understanding [1,3,4,20,26,27] 21,30], tables [9,25], documents [22,28,29,43] and infographic images [23], etc.…”
Section: Related Work
Citation type: mentioning
confidence: 99%