2020
DOI: 10.1007/s11063-020-10351-3

Specialization in Hierarchical Learning Systems

Abstract: Joining multiple decision-makers together is a powerful way to obtain more sophisticated decision-making systems, but it requires addressing the questions of division of labor and specialization. We investigate to what extent information constraints in hierarchies of experts not only provide a principled method for regularization but also enforce specialization. In particular, we devise an information-theoretically motivated on-line learning rule that allows partitioning of the problem space into multiple sub-prob…

Citations: cited by 14 publications (11 citation statements)
References: 57 publications
“…Secondly, we propose to use information processing constraints on the gating layer and the classifier based on a theory of bounded rationality (Ortega et al., 2015). To this end, we follow the work of Hihn and Braun (2020b), where they introduce and motivate such constraints and show their favorable effects on generalization in the meta-learning setting (Hihn and Braun, 2020a). We show that these types of constraints enable efficient representation learning, which is the pretext task.…”
Section: Methods (mentioning)
confidence: 99%
“…This term is not tractable as the data generating distribution p(x) is unknown. We approximate the true marginal by running an exponential running mean with window length τ (Hihn and Braun, 2020b; Leibfried and Grau-Moya, 2020):…”
Section: Methods (mentioning)
confidence: 99%
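
The equation that follows this citation statement is elided in the excerpt, so the sketch below only illustrates the general idea of an exponential running mean with window length τ. It assumes the common form prior ← (1 − 1/τ) · prior + (1/τ) · batch mean; the function name `update_marginal` and the toy data are hypothetical and are not taken from the cited papers.

```python
import numpy as np

def update_marginal(prior, batch_posteriors, tau):
    """One exponential-running-mean step (assumed form, window length tau).

    prior:            shape (n,), current estimate of the marginal
    batch_posteriors: shape (batch, n), per-sample conditional distributions
    """
    alpha = 1.0 / tau                          # step size implied by the window
    batch_mean = np.mean(batch_posteriors, axis=0)
    new_prior = (1.0 - alpha) * prior + alpha * batch_mean
    return new_prior / new_prior.sum()         # guard against numerical drift

# Toy usage: conditionals biased toward option 0 pull the estimate that way.
rng = np.random.default_rng(0)
prior = np.full(3, 1.0 / 3.0)
for _ in range(200):
    logits = rng.normal(size=(16, 3)) + np.array([1.0, 0.0, -1.0])
    post = np.exp(logits - logits.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)
    prior = update_marginal(prior, post, tau=20.0)
```

A larger τ averages over more batches and gives a smoother but slower-moving estimate.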
“…During each forward pass, the gating policy selects the top-k experts, such that the output of the layer is a weighted sum of k experts. To enable continual learning, we leverage an information-theoretic formulation of specialization in multi-expert systems [17] that emerged from an information-theoretic formulation of bounded rationality [34,10]. Specialized experts focus on a particular sub-region of inputs, enabling them to adapt quickly to new data.…”
Section: Introduction (mentioning)
confidence: 99%
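
The top-k gating step described in this statement can be sketched as follows. This is a minimal illustration and not the citing paper's implementation: the function name `top_k_gate` is hypothetical, and renormalizing the gate weights with a softmax over only the selected k experts is an assumption.

```python
import numpy as np

def top_k_gate(gate_logits, expert_outputs, k):
    """Combine the k highest-scoring experts into one output.

    gate_logits:    shape (n_experts,), one gating score per expert
    expert_outputs: shape (n_experts, d), each expert's output for one input
    """
    top = np.argsort(gate_logits)[-k:]          # indices of the k largest scores
    z = gate_logits[top] - gate_logits[top].max()
    weights = np.exp(z) / np.exp(z).sum()       # softmax over selected experts only
    return weights @ expert_outputs[top], top, weights

# Toy usage: experts 1 and 2 tie for the highest score and get equal weight.
logits = np.array([0.0, 2.0, 2.0, -1.0])
outputs = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [5.0, 5.0]])
out, chosen, w = top_k_gate(logits, outputs, k=2)
```

Only the selected experts contribute to the output, so the remaining experts need not be evaluated in a full implementation.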