2017
DOI: 10.48550/arxiv.1710.04759
Preprint

Bayesian Hypernetworks

Abstract: We study Bayesian hypernetworks: a framework for approximate Bayesian inference in neural networks. A Bayesian hypernetwork h is a neural network which learns to transform a simple noise distribution, p(ε) = N(0, I), to a distribution q(θ) := q(h(ε)) over the parameters θ of another neural network (the "primary network"). We train q with variational inference, using an invertible h to enable efficient estimation of the variational lower bound on the posterior p(θ|D) via sampling. In contrast to most methods…
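The construction in the abstract can be made concrete: sample ε ~ N(0, I), push it through an invertible hypernetwork h to obtain primary-network parameters θ = h(ε), and compute log q(θ) = log p(ε) − log|det ∂h/∂ε| via the change-of-variables formula so the variational lower bound can be estimated by sampling. The sketch below is a minimal illustration under those assumptions, using a single affine coupling layer as h and toy stand-ins for the primary network's log-likelihood and the prior; the class and function names are hypothetical, not taken from the paper's code.

```python
# Minimal sketch of a Bayesian hypernetwork (illustrative names, toy setup).
import math
import torch
import torch.nn as nn

class AffineCouplingHypernet(nn.Module):
    """Invertible h: eps -> theta with a tractable log|det dh/deps|."""
    def __init__(self, dim, hidden=32):
        super().__init__()
        self.half = dim // 2
        # conditioner: first half of eps -> (log-scale, shift) for the second half
        self.conditioner = nn.Sequential(
            nn.Linear(self.half, hidden), nn.Tanh(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, eps):
        e1, e2 = eps[:, :self.half], eps[:, self.half:]
        log_s, t = self.conditioner(e1).chunk(2, dim=-1)
        theta = torch.cat([e1, e2 * torch.exp(log_s) + t], dim=-1)
        log_det = log_s.sum(dim=-1)                      # log|det dh/deps|
        return theta, log_det

def elbo(h, log_lik, log_prior, dim, n_samples=8):
    """Monte Carlo estimate of E_q[log p(D|theta) + log p(theta) - log q(theta)]."""
    eps = torch.randn(n_samples, dim)                    # eps ~ N(0, I)
    theta, log_det = h(eps)
    log_p_eps = -0.5 * (eps ** 2).sum(-1) - 0.5 * dim * math.log(2 * math.pi)
    log_q = log_p_eps - log_det                          # change of variables
    return (log_lik(theta) + log_prior(theta) - log_q).mean()

# Toy training loop: the "data fit" term is a Gaussian centered at a random point.
dim = 6
h = AffineCouplingHypernet(dim)
target = torch.randn(dim)
log_lik = lambda th: -0.5 * ((th - target) ** 2).sum(-1)
log_prior = lambda th: -0.5 * (th ** 2).sum(-1)
opt = torch.optim.Adam(h.parameters(), lr=1e-2)
for _ in range(200):
    loss = -elbo(h, log_lik, log_prior, dim)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Maximizing this estimate trains h so that sampled θ both fit the data and stay close to the prior, which is the training signal the abstract describes; cheap i.i.d. samples of θ come from drawing fresh ε.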

Cited by 42 publications (56 citation statements)
References 5 publications
“…Hyper-network. A naive hyper-network implementation might be over-parameterized, as it requires a quadratic number of parameters with respect to the size of the target network. Thus, we apply a similar trick to Krueger et al (2017) to make g tractably predict edits for modern large deep neural networks (e.g., BERT), namely, g makes use of the gradient information ∇_W L(a; x, θ) as it carries rich information about how f accesses the knowledge stored in θ (i.e., which parameters to update to increase the model likelihood given a).…”
Section: A Supplementary Materials, A.1 Relaxation and Approximation Of...
Mentioning, confidence: 99%
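The excerpt above points at a general way to keep a hypernetwork's output tractable: rather than emitting a full weight matrix (quadratic in layer size), the generator emits low-dimensional quantities that modulate an existing tensor such as the task gradient. The sketch below illustrates that pattern only; it is a hypothetical simplification, not the exact parameterization of Krueger et al. (2017) (which rescales primary weights) or of the citing paper, and all names and shapes are assumptions.

```python
# Hedged sketch: a hypernetwork g that outputs row/column modulation vectors
# and applies them to the task gradient, instead of emitting a full weight matrix.
import torch
import torch.nn as nn

class GradientConditionedEditor(nn.Module):
    def __init__(self, cond_dim, out_rows, in_cols, hidden=64):
        super().__init__()
        # g emits out_rows + in_cols numbers instead of out_rows * in_cols
        self.net = nn.Sequential(
            nn.Linear(cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, out_rows + in_cols),
        )
        self.rows, self.cols = out_rows, in_cols

    def forward(self, cond, grad_W):
        # cond: conditioning vector (e.g., an encoding of the desired edit)
        # grad_W: gradient of the loss w.r.t. the target weight matrix W
        v = self.net(cond)
        row_scale = v[: self.rows].unsqueeze(1)      # (rows, 1)
        col_scale = v[self.rows:].unsqueeze(0)       # (1, cols)
        return row_scale * grad_W * col_scale        # proposed update to W, same shape

# Usage with toy shapes: a 32x16 target weight matrix and an 8-dim condition.
editor = GradientConditionedEditor(cond_dim=8, out_rows=32, in_cols=16)
cond = torch.randn(8)
grad_W = torch.randn(32, 16)      # in practice this would come from autograd
delta_W = editor(cond, grad_W)    # (32, 16) edit from only 48 generator outputs
```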
“…Hypernetwork. The goal of hypernetworks is to generate the weights of a target network, which is responsible for the main task [Bertinetto et al, 2016, Chung et al, 2016, Ha et al, 2016, Krueger et al, 2017, Lorraine & Duvenaud, 2018].…”
(Figure 1 of the citing paper: Object space as discriminative weights. Objects live in a low-dimensional manifold of a high-dimensional latent space.)
Section: Related Work
Mentioning, confidence: 99%
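As background for the excerpt above, a hypernetwork in the sense of Ha et al. (2016) is simply a network whose outputs are used as the weights of a target network. The following is a minimal, hypothetical sketch in which the target network is a single linear layer and the generator is conditioned on an embedding z; names and dimensions are illustrative, not taken from any cited paper.

```python
# Minimal hypernetwork sketch: generate the weights of a linear target layer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Hypernetwork(nn.Module):
    def __init__(self, z_dim, in_features, out_features, hidden=64):
        super().__init__()
        self.in_f, self.out_f = in_features, out_features
        # maps a task/layer embedding z to a flat vector of target weights and biases
        self.generator = nn.Sequential(
            nn.Linear(z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, in_features * out_features + out_features),
        )

    def forward(self, z, x):
        flat = self.generator(z)
        W = flat[: self.in_f * self.out_f].view(self.out_f, self.in_f)
        b = flat[self.in_f * self.out_f :]
        return F.linear(x, W, b)   # target-network forward pass with generated weights

# Usage: predictions for a batch x under weights generated from embedding z.
hyper = Hypernetwork(z_dim=8, in_features=16, out_features=4)
z = torch.randn(8)
x = torch.randn(32, 16)
y = hyper(z, x)                    # shape (32, 4); gradients flow into the hypernetwork
```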
“…Nirkin et al, 2021, Sitzmann et al, 2020. For example, [Krueger et al, 2017] proposes Bayesian hypernetworks to learn the variational inference in neural networks and [Bertinetto et al, 2016] proposes to learn the network parameters in one shot. HyperSeg [Nirkin et al, 2021] presents real-time semantic segmentation by employing a U-Net within a U-Net architecture, and [Finn et al, 2019] applies hypernetwork to adapt to new tasks for continual lifelong learning.…”
Section: Related Work
Mentioning, confidence: 99%
“…Hierarchical modeling in the Bayesian framework has been successful to design the form of the prior (Daumé III, 2009; Zhao et al, 2017; Klushyn et al, 2019; Wang & Van Hoof, 2020) and posterior distributions (Ranganath et al, 2016; Krueger et al, 2017; Zhen et al, 2020) based on many observations. It allows the latent variable to follow a complicated distribution and forms a highly flexible approximation (Krueger et al, 2017).…”
Section: Related Work
Mentioning, confidence: 99%