2022
DOI: 10.48550/arxiv.2206.00164
Preprint

A Theoretical Framework for Inference Learning

Abstract: Backpropagation (BP) is the most successful and widely used algorithm in deep learning. However, the computations required by BP are challenging to reconcile with known neurobiology. This difficulty has stimulated interest in more biologically plausible alternatives to BP. One such algorithm is the inference learning algorithm (IL). IL has close connections to neurobiological models of cortical function and has achieved equal performance to BP on supervised learning and auto-associative tasks. In contrast to B…
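The truncated abstract names inference learning (IL) but does not describe its mechanics. As a rough illustration only, here is a minimal predictive-coding-style sketch of the general IL idea on a tiny linear network: an inference phase relaxes hidden activities to minimize a quadratic prediction-error energy with input and target clamped, then a learning phase applies local weight updates. The layer sizes, learning rates, and energy function are illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small linear multilayer network (sizes are an illustrative choice).
sizes = [4, 8, 2]
W = [rng.normal(0, 0.1, (m, n)) for n, m in zip(sizes[:-1], sizes[1:])]

def energy(acts, W):
    """Sum of squared prediction errors between consecutive layers."""
    return sum(float(np.sum((acts[l + 1] - W[l] @ acts[l]) ** 2))
               for l in range(len(W)))

def infer(x, y, W, steps=200, lr=0.1):
    """Inference phase: clamp input and target, relax hidden activities
    by gradient descent on the energy."""
    acts = [x]
    for Wl in W:
        acts.append(Wl @ acts[-1])           # forward pass initializes activities
    acts[-1] = y                             # clamp the output layer to the target
    for _ in range(steps):
        for l in range(1, len(acts) - 1):    # update hidden layers only
            eps_l = acts[l] - W[l - 1] @ acts[l - 1]   # error at layer l
            eps_up = acts[l + 1] - W[l] @ acts[l]      # error fed back from above
            acts[l] -= lr * (eps_l - W[l].T @ eps_up)
    return acts

def learn(acts, W, lr=0.05):
    """Learning phase: local, Hebbian-like weight updates from the errors."""
    for l in range(len(W)):
        eps = acts[l + 1] - W[l] @ acts[l]
        W[l] += lr * np.outer(eps, acts[l])

x = rng.normal(size=4)
y = np.array([1.0, -1.0])
E0 = energy(infer(x, y, W), W)
for _ in range(50):
    learn(infer(x, y, W), W)
E1 = energy(infer(x, y, W), W)
print(E0, E1)  # training should reduce the energy on this example
```

Note that every weight update uses only quantities available at the two layers it connects, which is the locality property that motivates IL as a biologically plausible alternative to BP.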

Cited by 2 publications (8 citation statements) · References 28 publications
“…We compare these methods in the case of a one-hidden-layer neural network, identical for the teacher and the student, with input dimension d1 = 50, d2 = 10 hidden units, and a scalar output. This corresponds to the function W^(2) σ(W^(1) x + b^(1)) (Eq. 11), where σ(x) = max(0, x) and W indicates the collection of all parameters: W^(1) ∈ R^(d2×d1) and W^(2) ∈ R^(1×d2). We specify the prior by setting λ…”
Section: Teacher Student Convergence (Methods), mentioning, confidence: 99%
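As a concrete reading of the quoted setup, a minimal sketch of the teacher network W^(2) σ(W^(1) x + b^(1)) with d1 = 50 and d2 = 10 might look as follows. The random Gaussian parameters here merely stand in for the actual teacher, and the prior scale λ (truncated in the quote) is not modeled.

```python
import numpy as np

rng = np.random.default_rng(1)
d1, d2 = 50, 10                  # input dimension and hidden units, as in the quote

# Teacher parameters (the student shares this architecture).
W1 = rng.normal(size=(d2, d1))   # W^(1) ∈ R^(d2 × d1)
b1 = rng.normal(size=d2)         # b^(1)
W2 = rng.normal(size=(1, d2))    # W^(2) ∈ R^(1 × d2)

def f(x):
    """One-hidden-layer network W^(2) σ(W^(1) x + b^(1)) with
    σ(x) = max(0, x) (ReLU) and a scalar output."""
    return (W2 @ np.maximum(0.0, W1 @ x + b1)).item()

x = rng.normal(size=d1)
print(f(x))  # a single scalar output
```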
“…, where P indicates the posterior distribution; this is the score rescaled by Δ and averaged over the first-layer weights. We also tried taking the gradient with respect to the second-layer weights W^(2), or with respect to Z^(2). The results do not change significantly, and the score methods still severely underestimate the thermalization time.…”
Section: B.4 Score Methods, mentioning, confidence: 99%
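A hedged sketch of the quoted quantity, assuming a Gaussian likelihood with noise variance Δ on the teacher-student network above: the score with respect to the first-layer weights, rescaled by Δ and averaged over posterior samples. The samples, data point, and Δ below are all placeholder assumptions; a real computation would average over draws from the actual posterior chain.

```python
import numpy as np

rng = np.random.default_rng(2)
d1, d2, Delta = 50, 10, 0.1       # Delta: assumed Gaussian noise variance

x = rng.normal(size=d1)           # a single (placeholder) data point
y = 1.0
W2 = rng.normal(size=d2)          # second-layer weights, held fixed here
b1 = rng.normal(size=d2)

def grad_loglik_W1(W1):
    """Gradient of the Gaussian log-likelihood
    -(y - W2·σ(W1 x + b1))² / (2Δ) with respect to W1,
    using the ReLU derivative 1[z > 0]."""
    z = W1 @ x + b1
    f = W2 @ np.maximum(0.0, z)
    return (y - f) / Delta * np.outer(W2 * (z > 0), x)

# Stand-in for posterior samples of W^(1) (a real run would use an MCMC chain).
samples = [rng.normal(size=(d2, d1)) for _ in range(100)]

# The score rescaled by Δ and averaged over the first-layer weight samples.
score = Delta * np.mean([grad_loglik_W1(W1) for W1 in samples], axis=0)
print(score.shape)  # (10, 50)
```

The quote's observation is that swapping W^(1) for W^(2) or Z^(2) in the gradient above changes little: such score-based diagnostics flatten out well before the chain has actually thermalized.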