2022
DOI: 10.48550/arxiv.2207.09094
Preprint

MoEC: Mixture of Expert Clusters

Abstract: Sparse Mixture of Experts (MoE) has received great interest due to its promising scaling capability with affordable computational overhead. MoE converts dense layers into sparse experts and uses a gated routing network so that experts are activated conditionally. However, as the number of experts grows, MoE with an outrageous number of parameters suffers from overfitting and sparse data allocation. Such problems are especially severe on tasks with limited data, thus hindering MoE models from improving performance…
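The abstract summarizes the sparse MoE mechanism in one sentence. As a minimal sketch, assuming a generic top-k softmax gate of the kind used in common sparsely gated MoE layers, the code below shows how a linear router selects k experts per token and mixes only their outputs. It is not MoEC's cluster-level method, and every name here (`top_k_route`, `moe_layer`, `tokens_per_expert`, the toy scaling experts) is hypothetical. The demo also counts routed tokens per expert as a simple illustration of the "sparse data allocation" problem.

```python
import math
import random


def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(x, gate_weights, k):
    """Pick the k experts with the highest gate logits for token x.

    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    Assumes a plain linear gate (hypothetical, for illustration only).
    """
    # Linear gate: one logit per expert, logit_e = <x, w_e>.
    logits = [sum(xi * wi for xi, wi in zip(x, w)) for w in gate_weights]
    # Keep only the k largest logits; all other experts stay inactive.
    top = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[:k]
    # Renormalize the gate probabilities over the selected experts only.
    weights = softmax([logits[e] for e in top])
    return list(zip(top, weights))


def moe_layer(x, gate_weights, experts, k=2):
    # Only the k routed experts are evaluated (conditional computation).
    # Assumes each expert returns a vector of the same length as x.
    out = [0.0] * len(x)
    for e, p in top_k_route(x, gate_weights, k):
        y = experts[e](x)
        out = [o + p * yi for o, yi in zip(out, y)]
    return out


def tokens_per_expert(tokens, gate_weights, k):
    # Count how many routed token assignments each expert receives.
    counts = [0] * len(gate_weights)
    for x in tokens:
        for e, _ in top_k_route(x, gate_weights, k):
            counts[e] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(0)
    d, k = 4, 2
    tokens = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(1000)]
    for num_experts in (4, 16, 64):
        gates = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(num_experts)]
        counts = tokens_per_expert(tokens, gates, k)
        # Average load per expert: total assignments / number of experts.
        print(num_experts, sum(counts) / num_experts)
```

With a fixed pool of 1,000 tokens and k = 2, there are always 2,000 routed assignments, so the average load per expert drops from 500 to 125 to 31.25 as the expert count goes from 4 to 16 to 64. This is one simple arithmetic reading of why each expert sees less data as MoE scales, which the abstract notes is most harmful on tasks with limited data.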
