2018
DOI: 10.48550/arxiv.1812.11446
Preprint

Greedy Layerwise Learning Can Scale to ImageNet

Abstract: Shallow supervised 1-hidden layer neural networks have a number of favorable properties that make them easier to interpret, analyze, and optimize than their deep counterparts, but lack their representational power. Here we use 1-hidden layer learning problems to sequentially build deep networks layer by layer, which can inherit properties from shallow networks. Contrary to previous approaches using shallow networks, we focus on problems where deep learning is reported as critical for success. We thus study CNN…
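
The layer-by-layer scheme the abstract describes reduces to a simple loop: solve a 1-hidden-layer problem (one block plus an auxiliary linear classifier), freeze it, and repeat on top of the frozen features. Below is a minimal PyTorch sketch of that loop; the block and head shapes (`make_block`, `AuxHead`) are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of greedy layer-wise learning, assuming PyTorch.
# Block and head shapes are illustrative, not the paper's architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_block(in_ch, out_ch):
    # One "1-hidden-layer" problem: a conv block trained in isolation.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(),
        nn.MaxPool2d(2),
    )

class AuxHead(nn.Module):
    # Auxiliary linear classifier on spatially pooled features.
    def __init__(self, ch, n_classes):
        super().__init__()
        self.fc = nn.Linear(ch, n_classes)

    def forward(self, h):
        return self.fc(F.adaptive_avg_pool2d(h, 1).flatten(1))

def train_greedy(loader, widths=(3, 64, 128, 256), n_classes=10, epochs=1):
    blocks = []
    for k in range(len(widths) - 1):
        block = make_block(widths[k], widths[k + 1])
        head = AuxHead(widths[k + 1], n_classes)
        opt = torch.optim.SGD(
            list(block.parameters()) + list(head.parameters()), lr=0.1)
        for _ in range(epochs):
            for x, y in loader:
                with torch.no_grad():      # earlier blocks are frozen
                    for b in blocks:
                        x = b(x)
                loss = F.cross_entropy(head(block(x)), y)
                opt.zero_grad()
                loss.backward()            # gradients stay local to block k
                opt.step()
        blocks.append(block.eval())        # freeze, then stack the next layer
    return nn.Sequential(*blocks), head    # last aux head doubles as classifier
```

The key property is that `loss.backward()` never crosses block boundaries, so each training problem stays shallow while the stacked features become progressively deeper.
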

Cited by 11 publications (22 citation statements) | References 37 publications
“…[18] used the layer-wise method to train residual blocks in ResNet sequentially, then refined the network with standard end-to-end training. [19] studied the progressive separability of layer-wise trained supervised neural networks and demonstrated that Greedy Layer-wise Learning (GLL) can scale to large-scale datasets like ImageNet. Other attempts at supervised layer-wise learning involve a synthetic gradient [20] and a layer-wise loss that combines a local classifier loss with a similarity-matching loss [21].…”
Section: Related Work (mentioning)
confidence: 99%
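
The two-phase recipe attributed to [18] (sequential residual-block training followed by end-to-end refinement) might look like the sketch below; the `ResBlock` shape, the `stem`, and the hyperparameters are assumptions for illustration, not the cited implementation.

```python
# Sketch of the two-phase recipe attributed to [18], assuming PyTorch:
# train residual blocks sequentially, then refine end-to-end.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x):
        return F.relu(x + self.conv2(F.relu(self.conv1(x))))

def aux_head(ch, n_classes):
    return nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                         nn.Linear(ch, n_classes))

def train_then_refine(loader, stem, n_blocks=4, ch=64, n_classes=10):
    blocks = []
    for _ in range(n_blocks):              # phase 1: one block at a time
        blk, head = ResBlock(ch), aux_head(ch, n_classes)
        opt = torch.optim.SGD(
            list(blk.parameters()) + list(head.parameters()), lr=0.1)
        for x, y in loader:
            with torch.no_grad():          # frozen stem + earlier blocks
                h = stem(x)
                for b in blocks:
                    h = b(h)
            loss = F.cross_entropy(head(blk(h)), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
        blocks.append(blk)
    # reuse the last auxiliary head as the final classifier
    net = nn.Sequential(stem, *blocks, head)
    opt = torch.optim.SGD(net.parameters(), lr=0.01)
    for x, y in loader:                    # phase 2: end-to-end refinement
        loss = F.cross_entropy(net(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return net
```
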
“…Greedy and Randomized Layer-wise Learning. We adapted the supervised Greedy Layer-wise Learning (GLL) [19] method to self-supervised learning by training convolutional layers sequentially with auxiliary heads and a self-supervised loss, as shown in Figure 1. The base encoders are trained layer by layer.…”
Section: Layer-wise Learning With Random Feedback (mentioning)
confidence: 99%
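
The adaptation this quote describes (sequential training of conv blocks with auxiliary heads under a self-supervised objective) might look like the sketch below, using a SimCLR-style contrastive loss as a stand-in; the loss choice and projection head are assumptions, not the cited paper's code.

```python
# Sketch of sequential self-supervised layer-wise training, assuming PyTorch
# and a SimCLR-style contrastive objective as a stand-in loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.5):
    # Contrastive loss between two augmented views of the same batch.
    z = F.normalize(torch.cat([z1, z2]), dim=1)
    sim = z @ z.t() / tau
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool)
    sim = sim.masked_fill(mask, float('-inf'))   # exclude self-similarity
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

def train_layerwise_ssl(loader, blocks, widths, proj_dim=128):
    # `loader` yields two views (x1, x2); `blocks[k]` outputs widths[k] channels.
    trained = []
    for block, ch in zip(blocks, widths):
        proj = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                             nn.Linear(ch, proj_dim))   # throwaway aux head
        opt = torch.optim.Adam(
            list(block.parameters()) + list(proj.parameters()), lr=1e-3)
        for x1, x2 in loader:
            with torch.no_grad():          # earlier blocks stay frozen
                for b in trained:
                    x1, x2 = b(x1), b(x2)
            loss = nt_xent(proj(block(x1)), proj(block(x2)))
            opt.zero_grad()
            loss.backward()
            opt.step()
        trained.append(block.eval())
    return nn.Sequential(*trained)
```
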
“…We survey three categories of BP literature: (i) better hardware implementations of BP [15,16,31,11,32,25], (ii) workarounds to approximate BP [33,7,10], and (iii) biologically inspired algorithms. Biologically inspired algorithms can further be segregated into four types: (i) those inspired by biological observations [29,7,26,17], which try to approximate BP with the intention of resolving its biological implausibility; (ii) propagation of an alternative to error [19,21]; (iii) leveraging local errors, the power of single-layer networks, and layer-wise pre-training to approximate BP [24,23,3]; and (iv) resolving the locking problem using decoupling [14,6,12,1,20] and its variants [27,8,22,4]. We were deeply motivated by (ii), (iii), and (iv) in coming up with the idea of 'front contributions': propagating something other than error, the idea of a single-layer network, and decoupling collectively inspire 'front contributions'.…”
Section: Introduction and Related Work (mentioning)
confidence: 99%
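
Of the mechanisms this survey lists, the synthetic-gradient/decoupling idea in (ii) and (iv) is the least obvious from a one-line description: a small local model predicts the gradient at a layer's output, so the layer can update without waiting for downstream backpropagation. A minimal DNI-style sketch, with all module shapes assumed for illustration:

```python
# Sketch of the synthetic-gradient / decoupling idea, assuming PyTorch;
# module shapes and the training loop are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

layer = nn.Linear(784, 256)      # layer trained from a *predicted* gradient
sg = nn.Linear(256, 256)         # synthetic-gradient model: h -> dL/dh estimate
head = nn.Linear(256, 10)        # downstream network producing the true loss
opt_layer = torch.optim.SGD(layer.parameters(), lr=0.1)
opt_sg = torch.optim.SGD(sg.parameters(), lr=0.01)
opt_head = torch.optim.SGD(head.parameters(), lr=0.1)

def decoupled_step(x, y):
    h = F.relu(layer(x))
    g_hat = sg(h.detach())                  # predict the gradient at h
    opt_layer.zero_grad()
    h.backward(gradient=g_hat.detach())     # update the layer immediately,
    opt_layer.step()                        # without downstream backprop

    h2 = h.detach().requires_grad_(True)    # downstream runs as usual
    loss = F.cross_entropy(head(h2), y)
    opt_head.zero_grad()
    loss.backward()
    opt_head.step()

    opt_sg.zero_grad()                      # fit predictor to the true gradient
    F.mse_loss(g_hat, h2.grad.detach()).backward()
    opt_sg.step()
    return loss.item()
```
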
“…using gradient descent, where F denotes the network function space. Studying inductive bias in the context of autoencoders is relevant since (1) components of convolutional autoencoders are building blocks of many CNNs; (2) layer-wise pre-training using autoencoders is a standard technique to initialize individual layers of CNNs to improve training [2,5,8]; and (3) autoencoder architectures are used in many image-to-image tasks such as image segmentation or inpainting [25]. Furthermore, the inductive bias that we characterize in autoencoders may apply to more general architectures.…”
Section: Introduction (mentioning)
confidence: 99%
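
The layer-wise autoencoder pre-training mentioned in (2) trains each layer to reconstruct its own input, then keeps the encoder as an initialization. A generic stacked-autoencoder sketch follows; the architecture and loss are illustrative assumptions, not taken from [2,5,8].

```python
# Generic sketch of layer-wise autoencoder pre-training, assuming PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F

def pretrain_stacked(loader, widths=(3, 32, 64), epochs=1):
    encoders = []
    for k in range(len(widths) - 1):
        enc = nn.Conv2d(widths[k], widths[k + 1], 3, padding=1)
        dec = nn.ConvTranspose2d(widths[k + 1], widths[k], 3, padding=1)
        opt = torch.optim.Adam(
            list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
        for _ in range(epochs):
            for x, _ in loader:
                with torch.no_grad():      # re-encode with frozen earlier layers
                    for e in encoders:
                        x = F.relu(e(x))
                recon = dec(F.relu(enc(x)))     # reconstruct this layer's input
                loss = F.mse_loss(recon, x)
                opt.zero_grad()
                loss.backward()
                opt.step()
        encoders.append(enc)               # keep the encoder, drop the decoder
    # `encoders` now initializes the first layers of a CNN before fine-tuning.
    return encoders
```
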