Proceedings of the 22nd International Conference on Machine Learning (ICML '05), 2005
DOI: 10.1145/1102351.1102457
Compact approximations to Bayesian predictive distributions

Cited by 43 publications (59 citation statements)
References 6 publications
“…Although k-means is known to have some limitations, in our experience it usually performs reasonably well. An alternative approach would be to minimize the locations of the projected parameters {θ_c^⊥}_{c=1}^C jointly using, for example, the method of Snelson and Ghahramani (2005), but this is computationally much more expensive.…”
Section: Clustered
confidence: 99%
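As a rough illustration of the clustering idea in the excerpt above (not the cited papers' exact procedure), one could compact a large set of posterior parameter draws by running k-means and keeping the cluster centres, with weights given by cluster sizes, as the representative parameters; all names below (posterior_draws, C, theta_compact) are illustrative.

import numpy as np
from sklearn.cluster import KMeans

# Illustrative stand-in for posterior draws of a 10-dimensional parameter.
rng = np.random.default_rng(0)
posterior_draws = rng.normal(size=(4000, 10))

# Keep C = 20 cluster centres as the compact parameter set.
C = 20
km = KMeans(n_clusters=C, n_init=10, random_state=0).fit(posterior_draws)
theta_compact = km.cluster_centers_

# Mixture weights proportional to cluster sizes, so predictions average over
# the C representatives instead of all 4000 draws.
weights = np.bincount(km.labels_, minlength=C) / len(posterior_draws)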
“…Actually, this BMA distribution minimizes, on average, the distance (measured as the KL divergence, see Ref. [6]) to the true distribution [3]. The BMA distribution is better than any single model prediction [3].…”
Section: BMA Principle
confidence: 99%
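A hedged restatement of the BMA principle referenced above, in standard notation (the notation is ours, not that of Refs. [3] or [6]): the BMA predictive is the posterior-weighted mixture of the per-model predictives, and its expected log score is at least that of any single model, which is the sense in which it is, on average, closest in KL divergence to the true distribution.

% Posterior-weighted mixture of per-model predictive densities.
\[
  p_{\mathrm{BMA}}(y \mid x, \mathcal{D})
    = \sum_{m} p(y \mid x, \mathcal{D}, M_m)\, p(M_m \mid \mathcal{D}).
\]
% "Better than any single model prediction" in the log-score sense:
\[
  \mathbb{E}\bigl[\log p_{\mathrm{BMA}}(y \mid x, \mathcal{D})\bigr]
    \;\ge\; \mathbb{E}\bigl[\log p(y \mid x, \mathcal{D}, M_m)\bigr]
  \quad \text{for every model } M_m,
\]
% where the expectation is taken over y drawn from the BMA predictive itself;
% the inequality follows from the non-negativity of the KL divergence.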
“…[6] we choose to seek the PEA approximation that is as close as possible to this optimal BMA classifier with respect to KL divergence. Looking for an approximation of the BMA distribution is indeed a good strategy.…”
Section: Learning Strategy: Approximating BMA For Patch Functions
confidence: 99%
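The objective described in this excerpt, seeking the approximation closest to the BMA classifier in KL divergence, can be sketched generically as follows; this is not the cited paper's PEA procedure, and every name (bma_probs, predict_proba, w) is hypothetical. Averaged over inputs, minimizing KL(p_BMA || q_w) is, up to the constant entropy of p_BMA, the same as minimizing the cross-entropy between the BMA probabilities and the compact model's probabilities.

import numpy as np

def avg_kl_to_bma(w, X, bma_probs, predict_proba):
    """Average cross-entropy between the BMA class probabilities and the compact
    model q_w over the rows of X; equals the average KL(p_BMA || q_w) up to a
    term that does not depend on w (the average entropy of p_BMA)."""
    q = predict_proba(w, X)  # compact model's class probabilities, shape (n_inputs, n_classes)
    return -np.mean(np.sum(bma_probs * np.log(q + 1e-12), axis=1))

# bma_probs would be the posterior-weighted average of the ensemble members' class
# probabilities on X; X can be unlabelled, since only the BMA output is being matched.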
“…Our work brings together ideas often referred to in the literature as knowledge distillation (Hinton et al, 2015), model compression (Bucilȃ et al, 2006), compact approximations (Snelson and Ghahramani, 2005a), mimicking models (Ba and Caruana, 2014), and teacher-student models (Romero et al, 2015). As a technical term, "knowledge distillation" was introduced only recently by Hinton et al (2015), who used it to mean model compression.…”
Section: Introduction
confidence: 99%