2022
DOI: 10.1016/j.knosys.2021.108075
Mixhead: Breaking the low-rank bottleneck in multi-head attention language models

Cited by 14 publications (8 citation statements)
References 19 publications
“…The computational process is similar to self-attention [31]. We introduce the Softmaxes [32] to replace the single softmax [28], which can predict the relation weight for any two positions in the feature maps and obtain more accurate probability values. Since the features are pooled and downsampled, the smaller scale incurs only a small amount of computation.…”
Section: Methods
confidence: 99%
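The "Softmaxes" replacement described in this excerpt appears to be a mixture-of-softmaxes output head, the standard remedy for the softmax/low-rank bottleneck. A minimal NumPy sketch of that idea follows; all names and shapes are illustrative assumptions, not code from the cited paper:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mixture_of_softmaxes(h, W_mix, W_out):
    """Combine k softmax components with input-dependent mixture
    weights; the mixture can have higher rank than any single
    softmax over a d-dimensional context.

    h: (d,) context vector
    W_mix: (k, d) mixture-weight projection
    W_out: (k, V, d) per-component output projections
    """
    pi = softmax(W_mix @ h)            # (k,) mixture weights
    logits = W_out @ h                 # (k, V) per-component logits
    comps = softmax(logits, axis=-1)   # (k, V) component distributions
    return pi @ comps                  # (V,) final distribution
```

Because the result is a convex combination of valid distributions, it remains a valid probability vector while escaping the rank-d limit of a single softmax layer.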
“…After the edge encoder and path encoder, we adopt two linear projection layers to transform the outputs of the edge encoder and the longest-path encoder into a high-dimensional space. We also add a talking-heads layer [32] to overcome the low-rank bottleneck [33].…”
Section: GraphNovo Architecture
confidence: 99%
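The talking-heads layer referenced here mixes attention logits across heads before and after the softmax, one known way to soften the per-head low-rank bottleneck. A hedged NumPy sketch (function and parameter names are assumptions for illustration, not the cited paper's implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def talking_heads_attention(Q, K, V, P_logits, P_weights):
    """Talking-heads attention sketch.

    Q, K, V: (h, n, d_k) per-head queries/keys/values
    P_logits, P_weights: (h, h) learned head-mixing matrices
    """
    d_k = Q.shape[-1]
    logits = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)        # (h, n, n)
    logits = np.einsum("gh,hnm->gnm", P_logits, logits)     # mix heads before softmax
    weights = softmax(logits, axis=-1)                      # (h, n, n)
    weights = np.einsum("gh,hnm->gnm", P_weights, weights)  # mix heads after softmax
    return weights @ V                                      # (h, n, d_k)
```

With identity mixing matrices this reduces exactly to standard multi-head scaled dot-product attention; the learned mixing lets information flow between heads' attention maps.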
“…We remark, however, that the embedding dimension d will always be much smaller than the number of images or texts, which means that we are actually imposing a low-rank constraint on the joint probability distribution. In NLP, this effect has been referred to as the "softmax bottleneck" [43]. We now consider a set of factors Z = Z1 × .…”
Section: Linear Compositionality in VLMs
confidence: 99%
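The low-rank constraint this excerpt describes is easy to verify numerically: a pairwise score matrix built from d-dimensional embeddings can never exceed rank d, no matter how many items it scores. The sizes and variable names below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 100, 8                        # far more items than embedding dimensions

img_emb = rng.normal(size=(N, d))    # hypothetical image embeddings
txt_emb = rng.normal(size=(N, d))    # hypothetical text embeddings

# Joint score matrix over all (image, text) pairs. Its rank is capped at d,
# so the induced joint distribution cannot represent arbitrary tables of
# probabilities -- the "softmax bottleneck" in this bilinear setting.
S = img_emb @ txt_emb.T              # (N, N)
print(np.linalg.matrix_rank(S))      # at most d = 8
```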