2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021
DOI: 10.1109/cvpr46437.2021.01329

How does topology influence gradient propagation and model performance of deep networks with DenseNet-type skip connections?

Cited by 14 publications (23 citation statements)
References 19 publications
“…We denote the network width as w_c = [2, 3, 4]. Finally, the maximum number of channels that can supply skip connections is given by t_c = [2, 5, 6]. That is, the first cell can have a maximum of two skip connection candidates per layer (i.e., previous channels that can supply skip connections), the second cell can have a maximum of five skip connections candidates per layer, and so on.…”
Section: Methods (mentioning)
confidence: 99%
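
As a rough illustration of the per-cell configuration quoted above, the following minimal sketch stores w_c and t_c as lists and caps the number of skip connection candidates for a layer at t_c. The function name `skip_candidates` and the rule that layer i of a cell sees w_c * i earlier channels are assumptions made for illustration only; the quoted statement gives just the caps themselves.

```python
from typing import List

# Values taken from the quoted citation statement.
WIDTHS: List[int] = [2, 3, 4]     # w_c: network width of each cell
SKIP_CAPS: List[int] = [2, 5, 6]  # t_c: max channels that can supply skip connections


def skip_candidates(cell: int, layer: int) -> int:
    """Return the number of skip connection candidates for one layer.

    Assumption (not stated in the quote): layer `layer` (0-based) of a cell
    can see WIDTHS[cell] * layer channels from earlier layers of that cell.
    The count is then capped at t_c, as the quote describes.
    """
    available = WIDTHS[cell] * layer
    return min(available, SKIP_CAPS[cell])


if __name__ == "__main__":
    for cell in range(len(WIDTHS)):
        counts = [skip_candidates(cell, layer) for layer in range(4)]
        print(f"cell {cell}: w_c={WIDTHS[cell]}, t_c={SKIP_CAPS[cell]}, candidates={counts}")
```

Under this assumed rule, the cap only matters once a cell holds more earlier channels than t_c allows, which is why deeper layers in the same cell all report the same count.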
“…Similarly, a DNN architecture can be seen as a network of connected neurons. As discussed in [5], the topology of deep networks has a significant impact on how effectively the gradients can propagate through the network and thus the test performance of neural networks. These observations motivate us to take an approach from network science to quantify the topological property of neural networks to accelerate NAS.…”
Section: Introduction (mentioning)
confidence: 99%
“…Training ViTs is slow: hence an architecture search guided by evaluating trained models' accuracies will be dauntingly expensive. We note a recent surge of training-free neural architecture search methods for ReLU-based CNNs, leveraging local linear maps (Mellor et al, 2020), gradient sensitivity (Abdelfattah et al, 2021), number of linear regions (Chen et al, 2021e;f), or network topology (Bhardwaj et al, 2021). However, ViTs are equipped with more complex non-linear functions: self-attention, softmax, and GeLU.…”
Section: Assessing ViT Complexity At Initialization Via Manifold Prop... (mentioning)
confidence: 99%
“…Also, some interesting phenomena (Frankle et al, 2020) are observed during the early phase of NN training, such as trainable sparse sub-networks emerge (Frankle et al, 2019), gradient descent moves into a small subspace (Gur-Ari et al, 2018), and there exists a critical effective connection between layers (Achille et al, 2019). Bhardwaj et al (2021) built a nice connection between architectures (with concatenation-type skip connections) and the performance, and proposed a new topological metric to identify NNs with similar accuracy. Many of these studies are built on dynamical system and network science.…”
Section: Epoch (mentioning)
confidence: 99%