2017
DOI: 10.48550/arxiv.1710.04759
Preprint

Bayesian Hypernetworks

Abstract: We study Bayesian hypernetworks: a framework for approximate Bayesian inference in neural networks. A Bayesian hypernetwork h is a neural network which learns to transform a simple noise distribution, p(ε) = N(0, I), to a distribution q(θ) := q(h(ε)) over the parameters θ of another neural network (the "primary network"). We train q with variational inference, using an invertible h to enable efficient estimation of the variational lower bound on the posterior p(θ|D) via sampling. In contrast to most methods…
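The construction in the abstract can be made concrete: sample ε ~ N(0, I), push it through an invertible hypernetwork h to obtain primary-network parameters θ = h(ε), and compute log q(θ) = log p(ε) − log|det ∂h/∂ε| via the change-of-variables formula so the variational lower bound can be estimated by sampling. The sketch below is a minimal illustration under those assumptions, using a single affine coupling layer as h and toy stand-ins for the primary network's log-likelihood and the prior; the class and function names are hypothetical, not taken from the paper's code.

```python
# Minimal sketch of a Bayesian hypernetwork (illustrative names, toy setup).
import math
import torch
import torch.nn as nn

class AffineCouplingHypernet(nn.Module):
    """Invertible h: eps -> theta with a tractable log|det dh/deps|."""
    def __init__(self, dim, hidden=32):
        super().__init__()
        self.half = dim // 2
        # conditioner: first half of eps -> (log-scale, shift) for the second half
        self.conditioner = nn.Sequential(
            nn.Linear(self.half, hidden), nn.Tanh(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, eps):
        e1, e2 = eps[:, :self.half], eps[:, self.half:]
        log_s, t = self.conditioner(e1).chunk(2, dim=-1)
        theta = torch.cat([e1, e2 * torch.exp(log_s) + t], dim=-1)
        log_det = log_s.sum(dim=-1)                      # log|det dh/deps|
        return theta, log_det

def elbo(h, log_lik, log_prior, dim, n_samples=8):
    """Monte Carlo estimate of E_q[log p(D|theta) + log p(theta) - log q(theta)]."""
    eps = torch.randn(n_samples, dim)                    # eps ~ N(0, I)
    theta, log_det = h(eps)
    log_p_eps = -0.5 * (eps ** 2).sum(-1) - 0.5 * dim * math.log(2 * math.pi)
    log_q = log_p_eps - log_det                          # change of variables
    return (log_lik(theta) + log_prior(theta) - log_q).mean()

# Toy training loop: the "data fit" term is a Gaussian centered at a random point.
dim = 6
h = AffineCouplingHypernet(dim)
target = torch.randn(dim)
log_lik = lambda th: -0.5 * ((th - target) ** 2).sum(-1)
log_prior = lambda th: -0.5 * (th ** 2).sum(-1)
opt = torch.optim.Adam(h.parameters(), lr=1e-2)
for _ in range(200):
    loss = -elbo(h, log_lik, log_prior, dim)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Maximizing this estimate trains h so that sampled θ both fit the data and stay close to the prior, which is the training signal the abstract describes; cheap i.i.d. samples of θ come from drawing fresh ε.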

Cited by 42 publications (56 citation statements)
References 5 publications
“…Hyper-network. A naive hyper-network implementation might be over-parameterized, as it requires a quadratic number of parameters with respect to the size of the target network. Thus, we apply a similar trick to Krueger et al (2017) to make g tractably predict edits for modern large deep neural networks (e.g., BERT), namely, g makes use of the gradient information ∇_W L(a; x, θ) as it carries rich information about how f accesses the knowledge stored in θ (i.e., which parameters to update to increase the model likelihood given a).…”
Section: A Supplementary Materials, A.1 Relaxation and Approximation Of...
Mentioning, confidence: 99%
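The excerpt above points at a general way to keep a hypernetwork's output tractable: rather than emitting a full weight matrix (quadratic in layer size), the generator emits low-dimensional quantities that modulate an existing tensor such as the task gradient. The sketch below illustrates that pattern only; it is a hypothetical simplification, not the exact parameterization of Krueger et al. (2017) (which rescales primary weights) or of the citing paper, and all names and shapes are assumptions.

```python
# Hedged sketch: a hypernetwork g that outputs row/column modulation vectors
# and applies them to the task gradient, instead of emitting a full weight matrix.
import torch
import torch.nn as nn

class GradientConditionedEditor(nn.Module):
    def __init__(self, cond_dim, out_rows, in_cols, hidden=64):
        super().__init__()
        # g emits out_rows + in_cols numbers instead of out_rows * in_cols
        self.net = nn.Sequential(
            nn.Linear(cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, out_rows + in_cols),
        )
        self.rows, self.cols = out_rows, in_cols

    def forward(self, cond, grad_W):
        # cond: conditioning vector (e.g., an encoding of the desired edit)
        # grad_W: gradient of the loss w.r.t. the target weight matrix W
        v = self.net(cond)
        row_scale = v[: self.rows].unsqueeze(1)      # (rows, 1)
        col_scale = v[self.rows:].unsqueeze(0)       # (1, cols)
        return row_scale * grad_W * col_scale        # proposed update to W, same shape

# Usage with toy shapes: a 32x16 target weight matrix and an 8-dim condition.
editor = GradientConditionedEditor(cond_dim=8, out_rows=32, in_cols=16)
cond = torch.randn(8)
grad_W = torch.randn(32, 16)      # in practice this would come from autograd
delta_W = editor(cond, grad_W)    # (32, 16) edit from only 48 generator outputs
```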
“…Hypernetwork. The goal of hypernetworks is to generate the weights of a target network, which is responsible for the main task [Bertinetto et al, 2016, Chung et al, 2016, Ha et al, 2016, Krueger et al, 2017, Lorraine & Duvenaud, 2018].…”
(Figure 1 of the citing paper: Object space as discriminative weights. Objects live in a low-dimensional manifold of a high-dimensional latent space.)
Section: Related Work
Mentioning, confidence: 99%
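As background for the excerpt above, a hypernetwork in the sense of Ha et al. (2016) is simply a network whose outputs are used as the weights of a target network. The following is a minimal, hypothetical sketch in which the target network is a single linear layer and the generator is conditioned on an embedding z; names and dimensions are illustrative, not taken from any cited paper.

```python
# Minimal hypernetwork sketch: generate the weights of a linear target layer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Hypernetwork(nn.Module):
    def __init__(self, z_dim, in_features, out_features, hidden=64):
        super().__init__()
        self.in_f, self.out_f = in_features, out_features
        # maps a task/layer embedding z to a flat vector of target weights and biases
        self.generator = nn.Sequential(
            nn.Linear(z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, in_features * out_features + out_features),
        )

    def forward(self, z, x):
        flat = self.generator(z)
        W = flat[: self.in_f * self.out_f].view(self.out_f, self.in_f)
        b = flat[self.in_f * self.out_f :]
        return F.linear(x, W, b)   # target-network forward pass with generated weights

# Usage: predictions for a batch x under weights generated from embedding z.
hyper = Hypernetwork(z_dim=8, in_features=16, out_features=4)
z = torch.randn(8)
x = torch.randn(32, 16)
y = hyper(z, x)                    # shape (32, 4); gradients flow into the hypernetwork
```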
“…Nirkin et al, 2021, Sitzmann et al, 2020. For example, [Krueger et al, 2017] proposes Bayesian hypernetworks to learn the variational inference in neural networks and [Bertinetto et al, 2016] proposes to learn the network parameters in one shot. HyperSeg [Nirkin et al, 2021] presents real-time semantic segmentation by employing a U-Net within a U-Net architecture, and [Finn et al, 2019] applies hypernetwork to adapt to new tasks for continual lifelong learning.…”
Section: Related Work
Mentioning, confidence: 99%
“…Hierarchical modeling in the Bayesian framework has been successful to design the form of the prior (Daumé III, 2009; Zhao et al, 2017; Klushyn et al, 2019; Wang & Van Hoof, 2020) and posterior distributions (Ranganath et al, 2016; Krueger et al, 2017; Zhen et al, 2020) based on many observations. It allows the latent variable to follow a complicated distribution and forms a highly flexible approximation (Krueger et al, 2017).…”
Section: Related Work
Mentioning, confidence: 99%