2022
DOI: 10.1155/2022/4378553
|View full text |Cite
|
Sign up to set email alerts
|

Multiple Context Learning Networks for Visual Question Answering

Abstract: A novel Multiple Context Learning Network (MCLN) is proposed to model multiple contexts for visual question answering (VQA), aiming to learn comprehensive contexts. Three kinds of contexts are discussed and the corresponding three context learning modules are proposed based on a uniform context learning strategy. Specifically, the proposed context learning modules are visual context learning module (VCL), textual context learning module (TCL), and visual-textual context learning module (VTCL). The VCL and TCL,… Show more

Help me understand this report
View preprint versions

Search citation statements

Order By: Relevance

Paper Sections

Select...
1

Citation Types

0
1
0

Year Published

2023
2023
2024
2024

Publication Types

Select...
2

Relationship

0
2

Authors

Journals

citations
Cited by 2 publications
(1 citation statement)
references
References 24 publications
0
1
0
Order By: Relevance
“…Seeing is Knowing (106) , MULAN (107) Faster R-CNN with ResNet-101 GAT (108) , ATH (109) , DMMGR (24) , MCLN (110) , MCAN (111) , F-SWAP (112) , SRRN (35) , TVQA (113) Faster R-CNN with Resnet-152 RA-MAP (114) , MASN (115) , Anamoly based (114) , Vocab based (116) , DA-Net (117) ResNet CNN within Faster R-CNN MuVAM (118) FasterR-CNN with ResNext-152 CBM (119) RCNN (120) Multi-image (89) VGGNet (121) VQA-AID (122) EfficientNetV2 (123) RealFormer (124) YOLO (125) Scene Text VQA (126) CLIPViT-B CCVQA (14) Resnet NFNet (127) Flamingo (128) ViT (129) VLMmed (46) , ConvS2S+ViT (130) , BMT (10) , M2I2 (52) XCLIP with ViT-L/14 CMQR (32) RsNet18, Swin, ViT LV-GPT (43) GLIP (131) REVIVE (132) CLIP (133) KVQAE (30) 2.6.4 VGGNet (121) VGGNet (Visual Geometry Group Network) is a CNN with a small number of layers, achieving good performance in image classification tasks. It is basically known for its simplicity and generalizability to new datasets.…”
Section: Faster Rcnnmentioning
confidence: 99%
“…Seeing is Knowing (106) , MULAN (107) Faster R-CNN with ResNet-101 GAT (108) , ATH (109) , DMMGR (24) , MCLN (110) , MCAN (111) , F-SWAP (112) , SRRN (35) , TVQA (113) Faster R-CNN with Resnet-152 RA-MAP (114) , MASN (115) , Anamoly based (114) , Vocab based (116) , DA-Net (117) ResNet CNN within Faster R-CNN MuVAM (118) FasterR-CNN with ResNext-152 CBM (119) RCNN (120) Multi-image (89) VGGNet (121) VQA-AID (122) EfficientNetV2 (123) RealFormer (124) YOLO (125) Scene Text VQA (126) CLIPViT-B CCVQA (14) Resnet NFNet (127) Flamingo (128) ViT (129) VLMmed (46) , ConvS2S+ViT (130) , BMT (10) , M2I2 (52) XCLIP with ViT-L/14 CMQR (32) RsNet18, Swin, ViT LV-GPT (43) GLIP (131) REVIVE (132) CLIP (133) KVQAE (30) 2.6.4 VGGNet (121) VGGNet (Visual Geometry Group Network) is a CNN with a small number of layers, achieving good performance in image classification tasks. It is basically known for its simplicity and generalizability to new datasets.…”
Section: Faster Rcnnmentioning
confidence: 99%