2021
DOI: 10.48550/arxiv.2110.12667
Preprint

Mixture-of-Variational-Experts for Continual Learning

Abstract: One significant shortcoming of machine learning is the poor ability of models to solve new problems quickly and without forgetting previously acquired knowledge. To better understand this issue, continual learning has emerged to systematically investigate learning protocols where the model sequentially observes samples generated by a series of tasks. First, we propose an optimality principle that facilitates a trade-off between learning and forgetting. We derive this principle from an information-theoretic formulation of…
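The truncated abstract does not state the principle explicitly. As an illustrative sketch only, written in the spirit of variational continual learning and not necessarily the paper's exact objective, a learning/forgetting trade-off over a task sequence t = 1, 2, ... can be expressed as

\max_{q_t} \; \mathbb{E}_{q_t(\theta)}\!\left[\log p(\mathcal{D}_t \mid \theta)\right] \;-\; \beta \, \mathrm{KL}\!\left(q_t(\theta) \,\|\, q_{t-1}(\theta)\right),

where q_t is the variational posterior over parameters \theta after observing the current task's data \mathcal{D}_t, q_{t-1} is the posterior carried over from earlier tasks, and \beta controls how strongly old knowledge is retained relative to fitting the new task.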

Cited by 1 publication (1 citation statement)
References 22 publications
“…The ViT architecture, comprised of FF and self-attention layers, was developed to overcome the drawbacks of these approaches. The ViT architecture forms the basis of a visual transformer originally developed for NLP tasks [34]. The transformer's capability to interpret coincident data sequences simultaneously is one of its prime advantages, coveted for CV applications [35].…”
Section: Proposed Methodology, A. System Model and Architecture
Citation type: mentioning (confidence: 99%)
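As a purely illustrative aid to the quoted description (assumed PyTorch; the module names and sizes below are not taken from the cited works), a single ViT-style encoder block pairs a multi-head self-attention sub-layer with a feed-forward (FF) sub-layer and processes all patch tokens of a sequence in parallel:

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """Minimal ViT-style encoder block: self-attention + feed-forward (FF)."""
    def __init__(self, dim=256, heads=8, ff_mult=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.ff = nn.Sequential(
            nn.Linear(dim, ff_mult * dim),
            nn.GELU(),
            nn.Linear(ff_mult * dim, dim),
        )

    def forward(self, x):
        # x: (batch, num_patches, dim) -- one embedding per image patch
        h = self.norm1(x)
        # Every patch token attends to every other token in one step,
        # the parallel sequence processing highlighted in the quote.
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        # Position-wise feed-forward (FF) sub-layer with a residual connection.
        return x + self.ff(self.norm2(x))

tokens = torch.randn(2, 16, 256)   # 2 sequences of 16 patch embeddings
out = EncoderBlock()(tokens)       # -> shape (2, 16, 256)
```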