2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019
DOI: 10.1109/cvpr.2019.00186

Searching for a Robust Neural Architecture in Four GPU Hours

Abstract: Conventional neural architecture search (NAS) approaches are based on reinforcement learning or evolutionary strategy, which take more than 3000 GPU hours to find a good model on CIFAR-10. We propose an efficient NAS approach learning to search by gradient descent. Our approach represents the search space as a directed acyclic graph (DAG). This DAG contains billions of sub-graphs, each of which indicates a kind of neural architecture. To avoid traversing all the possibilities of the sub-graphs, we develop a di…

Cited by 577 publications (551 citation statements)
References 23 publications
“…The deeper and wider architectures of deep CNNs bring about the superior performance of computer vision tasks [6,26,45]. However, they also cause the prohibitively expensive computational cost and make the model deployment on mobile devices hard if not impossible.…”
Section: Introduction (mentioning)
confidence: 99%
“…In order to back-propagate gradient though , we propose using the Gumbel-Max trick [39, 40] to re-formulate Equation (1), which makes it possible to sample from a discrete probability distribution in an efficient way, as can see in (5) and (6). This method is proposed to perform NAS for the first time in GDAS [41]. DARTS needs to keep all intermediate results in memory, but the Gumbel-Max trick selects only one operation at a time.…”
Section: Methods (mentioning)
confidence: 99%
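
The Gumbel-Max trick mentioned in the statement above can be illustrated with a minimal NumPy sketch. It is not code from GDAS or from the citing paper; the function names, the logits, and the sample count are assumptions chosen for illustration. The sketch adds Gumbel(0, 1) noise to unnormalized log-probabilities and takes the argmax, which yields a single index distributed according to the softmax of the logits.

```python
import numpy as np


def gumbel_max_sample(logits, rng):
    """Draw one index from softmax(logits) using the Gumbel-Max trick."""
    # Gumbel(0, 1) noise: g = -log(-log(u)) with u ~ Uniform(0, 1).
    # The small lower bound keeps log(u) finite.
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    gumbel_noise = -np.log(-np.log(u))
    # The argmax of the perturbed logits is a sample from softmax(logits).
    return int(np.argmax(logits + gumbel_noise))


def softmax(logits):
    """Numerically stable softmax, used only to check the sampler."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()


# Illustrative logits over four hypothetical candidate operations.
rng = np.random.default_rng(0)
logits = np.array([1.0, 0.5, -0.5, 2.0])

samples = [gumbel_max_sample(logits, rng) for _ in range(20000)]
empirical = np.bincount(samples, minlength=len(logits)) / len(samples)
```

With enough samples, `empirical` should approach `softmax(logits)`. Each draw selects exactly one index, which matches the statement's point that only one operation is selected at a time.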
“…The dense units are also fixed as 512, 1024, 2048 and 4096 for MNIST, CIFAR10, CIFAR100 and Tiny-ImageNet experiments. However, these hyperparameters may also be encoded in the search space and then searched using Binary CSA as demonstrated in [37]. Furthermore, the ablation experiments are performed to study the impact of tournament select method over random selection and our proposed dynamic flight length distribution , (Eq.…”
Section: Methods (mentioning)
confidence: 99%
“…This method is also simpler than RL based methods as it does not involve controller. GDAS [37] proposes to use a differentiable architecture sampler and applies it to directed acyclic graphs (DAGs).…”
Section: Differential Evolution Based Neural Architecture Search (mentioning)
confidence: 99%
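
The idea of sampling one sub-graph from a DAG-shaped search space can be sketched as a toy example. This is an illustration under stated assumptions, not the GDAS implementation: the operation names, the four-node cell, and the uniform logits are hypothetical, and the sketch covers only the discrete sampling step, not the gradient estimation that makes the sampler differentiable.

```python
import itertools

import numpy as np

# Hypothetical candidate operations; the names are placeholders.
CANDIDATE_OPS = ["skip", "conv3x3", "conv5x5", "avg_pool", "max_pool"]


def build_edges(num_nodes):
    """All edges (i, j) with i < j, so the graph is acyclic by construction."""
    return list(itertools.combinations(range(num_nodes), 2))


def sample_subgraph(edge_logits, rng):
    """Pick one operation per edge by perturbing logits with Gumbel noise."""
    subgraph = {}
    for edge, logits in edge_logits.items():
        u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
        noisy = logits - np.log(-np.log(u))
        subgraph[edge] = CANDIDATE_OPS[int(np.argmax(noisy))]
    return subgraph


rng = np.random.default_rng(0)
edges = build_edges(num_nodes=4)  # toy cell with 4 nodes and 6 edges
# Uniform logits: every operation is equally likely on every edge.
edge_logits = {edge: np.zeros(len(CANDIDATE_OPS)) for edge in edges}

subgraph = sample_subgraph(edge_logits, rng)
# Each edge independently takes one of the candidate operations.
num_subgraphs = len(CANDIDATE_OPS) ** len(edges)
```

Even this toy cell admits `5 ** 6` distinct sub-graphs, which hints at why exhaustively enumerating sub-graphs becomes impractical as the number of edges and operations grows.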