2019
DOI: 10.48550/arxiv.1902.06583
Preprint

Fast Efficient Hyperparameter Tuning for Policy Gradients

Cited by 4 publications (5 citation statements)
References: 0 publications
“…RL training is notoriously sensitive to hyper-parameters and environment changes [18]. Recently, many works attempted to take techniques in AutoML to alleviate human intervention, for example, hyper-parameter optimization [7,36,49,53], reward search [9,45] and network architecture search [38,10]. In contrast to these methods which optimize a new configuration for each environment, we search for auxiliary loss functions that generalize across different settings such as (i) different robots of control; (ii) different data types of observation; (iii) partially observable settings; (iv) different network architectures; (v) different benchmark domains.…”
Section: Related Work
confidence: 99%
“…[30] Hyperparameter tuning is selecting a set of optimal hyperparameters for a machine learning or CNN method. [31] To evaluate the system's performance, this study considers Learning Rate (α), Batch Size (m), and Optimizers. Depending on the hyperparameter values, different results are obtained.…”
Section: Experimental Design Evaluation Metrics Evaluations Outcomes ...
confidence: 99%
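The tuning procedure that statement describes can be sketched as an exhaustive grid search over the three hyperparameters it names. This is an illustrative sketch only: `evaluate` is a hypothetical stand-in for training the model and reporting validation accuracy, and the candidate values are assumptions, not the cited study's grid.

```python
import itertools

def evaluate(lr, batch_size, optimizer):
    # Toy scoring function for illustration only; a real run would
    # train the CNN with these settings and return validation accuracy.
    base = {"adam": 0.90, "sgd": 0.85, "rmsprop": 0.88}[optimizer]
    return base - abs(lr - 1e-3) * 10 - abs(batch_size - 32) / 1000

def grid_search(lrs, batch_sizes, optimizers):
    # Try every combination of learning rate, batch size (m), and
    # optimizer; keep the configuration with the best score.
    best_cfg, best_score = None, float("-inf")
    for lr, m, opt in itertools.product(lrs, batch_sizes, optimizers):
        score = evaluate(lr, m, opt)
        if score > best_score:
            best_cfg, best_score = (lr, m, opt), score
    return best_cfg, best_score

cfg, score = grid_search([1e-2, 1e-3, 1e-4], [16, 32, 64],
                         ["adam", "sgd", "rmsprop"])
```

Grid search is the simplest baseline; the AutoML methods surveyed above replace the exhaustive loop with smarter search strategies.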
“…OMPAC [10] uses a genetic algorithm to select the policy's softmax temperature and TD(λ) parameters in discrete environments such as Atari and Tetris. HOOF [37] generates a population of policy gradient updates with different loss function parameters and selects the best combination to continue training with weighted importance sampling. Agent57 [2] uses hyperparameter selection by a multi-armed bandit to improve exploration and surpasses human performance on the Atari benchmark.…”
Section: Related Work
confidence: 99%
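The HOOF selection step described above can be sketched as off-policy evaluation of each candidate with weighted importance sampling (WIS): trajectories are collected once under the behavior policy, each candidate's expected return is re-weighted by its likelihood ratio, and the highest-scoring candidate continues training. The 1-D Gaussian policy and the `(actions, return)` trajectory format are illustrative assumptions, not the paper's exact setup.

```python
import math

def log_prob(mean, std, action):
    # Log-density of a 1-D Gaussian policy at `action`.
    return -0.5 * ((action - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))

def wis_return(candidate, behavior, trajectories):
    # Weighted importance sampling estimate of the candidate policy's
    # return from trajectories gathered under the behavior policy.
    weights, returns = [], []
    for actions, ret in trajectories:
        log_w = sum(log_prob(*candidate, a) - log_prob(*behavior, a)
                    for a in actions)
        weights.append(math.exp(log_w))
        returns.append(ret)
    total = sum(weights)
    if total == 0:
        return float("-inf")
    return sum(w * r for w, r in zip(weights, returns)) / total

def select_best(candidates, behavior, trajectories):
    # Keep the candidate whose WIS-estimated return is highest.
    return max(candidates, key=lambda c: wis_return(c, behavior, trajectories))

# Tiny demo: trajectories whose high-return actions sit near mean 1.0
# favor the candidate centered there.
best = select_best([(0.0, 1.0), (1.0, 1.0)], (0.0, 1.0),
                   [([1.0, 1.0], 10.0), ([-1.0], 1.0)])
```

Because the estimate reuses one batch of trajectories for all candidates, no extra environment interaction is needed per candidate, which is the efficiency argument behind this style of hyperparameter selection.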