2022
DOI: 10.1007/s10489-022-04355-w

Local self-attention in transformer for visual question answering

Cited by 32 publications (16 citation statements)
References 45 publications
“…In this paper, an intrusion detection model (RESNETCCN) is proposed that fuses traffic detection requirements. In our future work, we will introduce more new ideas into our model, such as blockchain cryptography [8], [18], [9], [19], [16], alliance chain [36], [7], [20], visual Q&A [5], [28], transformer [21], panoramic image [17], reinforcement learning [3], internet of things [23], [24], and shared data [6]. We will continue to explore network intrusion detection methods in further areas, such as unsupervised and semi-supervised learning [2] for detecting anomalous network traffic. In addition, we will also try to introduce new evaluation metrics and establish systematic evaluation methods for intrusion detection.…”
Section: Discussion (mentioning)
confidence: 99%
“…As shown in Table 4, we compare the MAGM model with the current SOTA models; the last row of Table 4 reports the test results of the MAGM model proposed in this paper. The bilinear attention network BAN [20] considers the bilinear interaction between multimodal inputs to fully exploit the question and image feature information; BAN-Counter [20] combines BAN with Counter [20], a neural network counting component that further improves accuracy on Number-type questions through its robust counting function; Bottom-up [59] and Bottom-up+MFH [25] combine regional visual features with question-guided visual attention; the LSAT-R [28] model uses local self-attention, which effectively avoids the redundant information in global self-attention (“-R” indicates that the LSAT model is trained on the VQA2.0 dataset using the same region image features as the MAGM model and the other SOTA models for comparison); Unified VLP [46] is a bidirectional and seq2seq-based unified vision-language pre-training model that can be fine-tuned for vision-language generation and understanding tasks. The pre-training models ViLBERT [43] and VisualBERT [47] use the BERT architecture, where VisualBERT is a single-stream model and ViLBERT is a two-stream model.…”
Section: Methods (mentioning)
confidence: 99%
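To make the bilinear interaction mentioned above more concrete, the following is a minimal sketch of question-guided bilinear attention over region features. It is an illustrative simplification rather than BAN's actual implementation: the feature dimensions, the single attention glimpse, and the class name `BilinearAttentionSketch` are assumptions for the example.

```python
import torch
import torch.nn as nn


class BilinearAttentionSketch(nn.Module):
    """Question-guided bilinear attention over image regions (illustrative only)."""

    def __init__(self, q_dim=1024, v_dim=2048, hidden=512):
        super().__init__()
        self.proj_q = nn.Linear(q_dim, hidden)  # project question token features
        self.proj_v = nn.Linear(v_dim, hidden)  # project image region features

    def forward(self, q_feat, v_feat):
        # q_feat: (batch, n_tokens, q_dim); v_feat: (batch, n_regions, v_dim)
        q = self.proj_q(q_feat)                   # (batch, n_tokens, hidden)
        v = self.proj_v(v_feat)                   # (batch, n_regions, hidden)
        # bilinear interaction: a score for every (question token, image region) pair
        att = torch.einsum("bth,brh->btr", q, v)  # (batch, n_tokens, n_regions)
        att = att.softmax(dim=-1)                 # normalize over image regions
        # question-guided pooling of the visual features
        fused = torch.einsum("btr,brh->bth", att, v)
        return fused, att


# example: 14 question tokens attending over 36 image region features
fused, att = BilinearAttentionSketch()(torch.randn(2, 14, 1024), torch.randn(2, 36, 2048))
```

A full BAN-style model would add a low-rank factorization, multiple glimpses, and residual connections; the sketch only shows how each (token, region) pair receives a bilinear score that then guides pooling of the visual features.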
“…The second type is content-based sparse attention [18, 41-43], which dynamically computes attention weights for the input data, adaptively allocating attention so that essential information is prioritized while the processing of irrelevant details is minimized, thereby offering greater flexibility and adaptability. Furthermore, the local attention mechanism [16, 44] is also regarded as a specialized form of sparse attention, predominantly using a window mechanism to achieve a localized, sparse focus within the data. In VQA tasks, researchers often employ sparse attention mechanisms [29, 45, 46] to select crucial question or image features.…”
Section: Sparse Attention (mentioning)
confidence: 99%
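As a concrete illustration of the window mechanism described above, the sketch below restricts self-attention to fixed-size, non-overlapping windows, so attention weights are only ever computed inside each window. This is a minimal, assumption-laden example: the function name, the window size, and the omission of learned projections and multiple heads are simplifications, not the design of any cited model.

```python
import torch
import torch.nn.functional as F


def local_self_attention(x, window=4):
    """Window-based local self-attention sketch: each position attends only
    to positions inside its own fixed-size window, so the attention pattern
    is sparse compared with full global self-attention."""
    b, n, d = x.shape
    assert n % window == 0, "sequence length must be divisible by the window size"
    # split the sequence into non-overlapping windows
    xw = x.view(b, n // window, window, d)           # (b, n_windows, window, d)
    scores = torch.matmul(xw, xw.transpose(-2, -1))  # scores computed only within each window
    attn = F.softmax(scores / d ** 0.5, dim=-1)      # scaled softmax over window members
    out = torch.matmul(attn, xw)                     # weighted sum inside each window
    return out.view(b, n, d)


# example: a sequence of 16 tokens attended to in windows of 4
y = local_self_attention(torch.randn(2, 16, 64), window=4)
```

Compared with global self-attention, which scores every pair of the n positions, this pattern scores only window-sized blocks, which is the source of the sparsity and reduced redundancy the excerpts refer to.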