Proceedings of the Thematic Workshops of ACM Multimedia 2017
DOI: 10.1145/3126686.3126695
Generative Attention Model with Adversarial Self-learning for Visual Question Answering

Cited by 18 publications (19 citation statements)
References 10 publications
“…They used the LSTM model for the time-series portion of the data and a fine-tuned BioBERT (Lee et al., 2020) for the clinical record. The multimodal segmentation attention module proposed by Su et al. (2020) fuses blocks of features along each channel direction and captures correlations between feature vectors (Ilievski and Feng, 2017). The fusion module is designed to be compatible with features of various spatial dimensions and sequence lengths for both CNNs and RNNs.…”
Section: Multi-modal Learning
confidence: 99%
“…Much of the earlier literature uses VGGNet [1][2][3][4] for extracting image features. With the availability of greater computing resources, researchers have started using ResNet [5][6][7] for image feature extraction, which is heavier than VGGNet but provides richer features. Now, most of the literature uses the Faster R-CNN bottom-up approach for extracting object-level image features [12][13][14][15][16][17].…”
Section: Related Work
confidence: 99%
“…The first stage involves extracting image features using various CNN models. Most of the literature extracts image features using pre-trained VGGNet [1][2][3][4], ResNet [5][6][7], or GoogLeNet [8][9][10][11] models. These models extract global features of the image.…”
Section: Introduction
confidence: 99%
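The "global features" mentioned in the statement above are typically produced by pooling a CNN's final spatial feature map into one vector per image. A minimal numpy sketch of that pooling step (the feature map here is a random stand-in, not output from an actual VGGNet or ResNet):

```python
import numpy as np

# Stand-in for a CNN backbone's final feature map for one image:
# shape (channels, height, width), e.g. 512 channels over a 7x7 grid.
feature_map = np.random.rand(512, 7, 7)

# Global average pooling collapses the spatial grid, yielding a single
# 512-dimensional "global" feature vector for the whole image.
global_feature = feature_map.mean(axis=(1, 2))

print(global_feature.shape)  # (512,)
```

This is the key contrast with the object-level (bottom-up) features discussed in the Related Work statement, where each detected region keeps its own feature vector instead of being averaged away.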
“…In this article, we propose using deep-learning features from two different pictures, matched with related questions, to infer the best answer. Through adversarial learning, the model attends to significant parts of the target image that distinguish it from the adversarial image [9].…”
Section: Introduction
confidence: 99%
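The attention mechanism referenced in the statement above can be sketched as question-guided weighting over image-region features. This is a generic, hypothetical illustration with random stand-in features, not the paper's actual model; the adversarial self-learning component would additionally push the weights toward regions that distinguish the target image from an adversarial one:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical region features: 49 regions (a 7x7 grid), 512-dim each,
# and a question embedding of matching dimension.
regions = np.random.rand(49, 512)
question = np.random.rand(512)

# Score each region against the question, then normalize into
# attention weights that sum to 1.
scores = regions @ question
weights = softmax(scores)

# Attended image representation: weighted sum of region features.
attended = weights @ regions

print(weights.sum())   # ~1.0
print(attended.shape)  # (512,)
```

The answer decoder would then consume `attended` together with the question embedding; under adversarial training, the same weighting is computed for the adversarial image so the model learns which regions are genuinely discriminative.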