2018 IEEE International Conference on Multimedia and Expo (ICME) 2018
DOI: 10.1109/icme.2018.8486468

Essay-Anchor Attentive Multi-Modal Bilinear Pooling for Textbook Question Answering

citations
Cited by 8 publications
(3 citation statements)
references
References 9 publications
“…In contrast, F-GCN [12] applies graph convolutional networks [16] on textual contexts and diagrams to build unified graphs that memorize relevant question background information and predicts answers by reasoning over the graphs. EAMB [17] applies the essay-anchor attentive multi-modal bi-linear pooling method to learn the joint representations of text and diagrams. It first builds textual graphs based on textual contexts and then applies bilinear-based MFB [18] model to fuse graph and diagram representations.…”
Section: Related Work (mentioning)
confidence: 99%
“…In particular, different from using the simple label embeddings $e_o$ and $e_r$ in $\mathrm{SG}_F$, $e_o$ is the key to connect domains of vision and language. Thus, we introduce the Multi-modal Factorized Bilinear Pooling (MFB) [45] to fuse the region features and label embeddings to augment object representation $e_o$, which is known to be effective in multi-modal tasks [13,16]. Node embedding: To encode the SG nodes at a unified representation $u = \{u_o, u_a, u_r\} \in \mathbb{R}^u$, we introduce the Graph Convolutional Network (GCN) [18], which can embed the graph structure into vector representations.…”
Section: Scene Graph Encoder (mentioning)
confidence: 99%
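
The statements above mention MFB as the step that fuses two feature vectors (graph and diagram representations in one case, region features and label embeddings in the other). As a rough illustration only, the sketch below follows the commonly described MFB recipe: project both inputs, multiply element-wise, sum-pool over groups of k factors, then apply signed square-root and L2 normalization. The function names, dimensions, and random weights are hypothetical and are not taken from EAMB or from any of the citing papers.

```python
import math
import random


def matvec_t(M, v):
    """Return M^T v, where M is a list of len(v) rows."""
    cols = len(M[0])
    return [sum(M[i][j] * v[i] for i in range(len(v))) for j in range(cols)]


def mfb_fuse(x, y, U, V, k):
    """Fuse vectors x and y with factorized bilinear pooling (illustrative).

    U has shape (len(x), o*k) and V has shape (len(y), o*k); the result has
    length o. These shapes are assumptions made for this sketch.
    """
    px = matvec_t(U, x)                       # project first modality
    py = matvec_t(V, y)                       # project second modality
    joint = [a * b for a, b in zip(px, py)]   # element-wise product
    # Sum-pool over non-overlapping windows of k factors.
    z = [sum(joint[i:i + k]) for i in range(0, len(joint), k)]
    # Signed square-root (power) normalization.
    z = [math.copysign(math.sqrt(abs(v)), v) for v in z]
    # L2 normalization, skipped if the vector is all zeros.
    norm = math.sqrt(sum(v * v for v in z))
    return [v / norm for v in z] if norm > 0 else z


# Hypothetical sizes: 4-d and 3-d inputs, 2-d output, k = 3 factors.
rng = random.Random(0)
dx, dy, o, k = 4, 3, 2, 3
U = [[rng.gauss(0, 1) for _ in range(o * k)] for _ in range(dx)]
V = [[rng.gauss(0, 1) for _ in range(o * k)] for _ in range(dy)]
x = [rng.gauss(0, 1) for _ in range(dx)]
y = [rng.gauss(0, 1) for _ in range(dy)]
z = mfb_fuse(x, y, U, V, k)
```

In a trained model U and V would be learned parameters; random values are used here only so the sketch runs on its own.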
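“…Meanwhile, well-performing probabilistic programming libraries have also been developed to support the development and deployment of Bayesian deep learning models. For example, "ZhuSuan" [82], developed by our team [Figure 9 (Color online): The architecture of image-text question answer system] is taken as an example for illustration [85]. As shown in Figure 9, given the picture, answer the following question: "How many layers (types) are there beneath the continental crust?…”
Section: Improvements to Deep Learning Methods (unclassified)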