Approximating gradients for differentiable quality diversity in reinforcement learning

Tjanaka, Bryon; Fontaine, Matthew C.; Togelius, Julian; Nikolaidis, Stefanos

doi:10.1145/3512290.3528705

Cited by 23 publications

(16 citation statements)

References 25 publications


“…Quality diversity for reinforcement learning (QD-RL). As defined in prior work [12], QD-RL is a special instance of QD in which φ parameterizes a reinforcement learning (RL) agent's policy π_φ and the objective is the expected discounted return of the agent. QD-RL extends Markov Decision Processes (MDPs) [28] and is formulated as a tuple (S, U, p, r, γ, m).…”

Section: Problem Statement (mentioning)

confidence: 99%
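
The QD-RL objective quoted above, the expected discounted return, can be illustrated with a minimal Python sketch. It assumes that γ in the tuple is the discount factor and that m is a measure function describing the agent's behavior; the excerpt does not spell either out. The names `discounted_return`, `evaluate_episode`, and `foot_contact_fraction` are hypothetical, and a single recorded episode stands in for the expectation.

```python
from typing import Callable, List, Sequence, Tuple


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Return sum_t gamma**t * rewards[t] for one episode."""
    total = 0.0
    # Accumulate backwards so each step is: r_t + gamma * (return from t+1).
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def foot_contact_fraction(contacts: Sequence[Sequence[int]]) -> List[float]:
    """Hypothetical measure function: fraction of steps each foot is on the ground.

    contacts[t][i] is 1 if foot i touches the ground at step t, else 0.
    """
    steps = len(contacts)
    feet = len(contacts[0])
    return [sum(step[i] for step in contacts) / steps for i in range(feet)]


def evaluate_episode(
    rewards: Sequence[float],
    contacts: Sequence[Sequence[int]],
    measure_fn: Callable[[Sequence[Sequence[int]]], List[float]],
    gamma: float,
) -> Tuple[float, List[float]]:
    """Return the (objective, measures) pair a QD algorithm would consume."""
    return discounted_return(rewards, gamma), measure_fn(contacts)
```

For example, `evaluate_episode([1.0, 1.0], [[1, 0], [1, 1]], foot_contact_fraction, 0.9)` pairs the discounted return of a two-step episode with the per-foot contact fractions for a two-footed agent.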

“…For example, in locomotion, exploring corresponds to finding new controllers which use the robot's feet a different amount, while optimizing corresponds to making existing controllers walk faster. Prior work [12] shows that … [Fig. 1: We propose variants of the CMA-MAE algorithm which scale to high-dimensional controllers.]…”

Section: Introduction (mentioning)

confidence: 99%

“…Prior algorithms [13], [12] seem to strike a balance between these two aspects of QD in locomotion tasks, leading to state-of-the-art results. However, these algorithms have practical limitations due to their dependence on deep reinforcement learning (RL) methods.…”

Section: Introduction (mentioning)

confidence: 99%

“…Compared with deep RL, ESs can run entirely on CPU and do not require network training, and ESs such as CMA-ES [15] are designed to have almost no hyperparameters. Given these benefits, prior work [12], [11] has developed QD algorithms based on ESs, but these methods have not yet been able to match the performance of deep RL-based QD methods.…”

Section: Introduction (mentioning)

confidence: 99%


Training Diverse High-Dimensional Controllers by Scaling Covariance Matrix Adaptation MAP-Annealing

Tjanaka, Fontaine, Aniruddha et al. 2022

Preprint


Pre-training a diverse set of robot controllers in simulation has enabled robots to adapt online to damage in robot locomotion tasks. However, finding diverse, high-performing controllers requires specialized hardware and extensive tuning of a large number of hyperparameters. On the other hand, the Covariance Matrix Adaptation MAP-Annealing (CMA-MAE) algorithm, an evolution strategies (ES)-based quality diversity algorithm, does not have these limitations and has been shown to achieve state-of-the-art performance in standard benchmark domains. However, CMA-MAE cannot scale to modern neural network controllers due to its quadratic complexity. We leverage efficient approximation methods in ES to propose three new CMA-MAE variants that scale to very high dimensions. Our experiments show that the variants outperform ES-based baselines in benchmark robotic locomotion tasks, while being comparable with state-of-the-art deep reinforcement learning-based quality diversity algorithms. Source code and videos are available at https://scalingcmamae.github.io


“…[Footnote 6: The following website serves as a database with research related to QD: https://qualitydiversity.github.io/ maintained by Antoine Cully.] …convergence search for preserving the high-performing individuals within the novel niches [114]. Multi-dimensional Archive of Phenotypic Elites (MAP-Elites) is another algorithm in the QD family, and one that has gained considerable popularity in multiple areas such as games [115][116][117] and robotics [118,119]. As the other QD algorithms, MAP-Elites explores the behavioral space for a collection of solutions that are both high-performing and diverse among each other, with the caveat that MAP-Elites discretizes the behavior space as a grid of cells informed by a set of feature dimensions that illuminate the behavior space.…”

Section: Quality Diversity (mentioning)

confidence: 99%
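
The grid discretization described in the statement above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in any of the cited works: it assumes uniform cell widths, Gaussian mutation of a randomly chosen elite, and a maximized objective. The names `cell_index`, `map_elites`, and `sphere` are hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Cell = Tuple[int, ...]
Elite = Tuple[float, List[float]]  # (objective, solution)


def cell_index(measures: Sequence[float], lower: Sequence[float],
               upper: Sequence[float], dims: Sequence[int]) -> Cell:
    """Map a measure vector to the grid cell that contains it (clamped to the grid)."""
    index = []
    for m, lo, hi, n in zip(measures, lower, upper, dims):
        frac = (m - lo) / (hi - lo)
        index.append(min(n - 1, max(0, int(frac * n))))
    return tuple(index)


def map_elites(evaluate: Callable[[List[float]], Tuple[float, List[float]]],
               init: Sequence[float], iterations: int,
               lower: Sequence[float], upper: Sequence[float],
               dims: Sequence[int], sigma: float = 0.1,
               seed: int = 0) -> Dict[Cell, Elite]:
    """Keep the best solution found so far in each cell of the measure grid."""
    rng = random.Random(seed)
    archive: Dict[Cell, Elite] = {}
    for _ in range(iterations):
        if archive:
            # Assumed variation operator: Gaussian noise on a random elite.
            _, parent = rng.choice(list(archive.values()))
            child = [x + rng.gauss(0.0, sigma) for x in parent]
        else:
            child = list(init)
        objective, measures = evaluate(child)
        cell = cell_index(measures, lower, upper, dims)
        # Insert if the cell is empty or the child beats the current elite.
        if cell not in archive or objective > archive[cell][0]:
            archive[cell] = (objective, child)
    return archive


def sphere(x: List[float]) -> Tuple[float, List[float]]:
    """Toy problem: negated sphere objective, first two coordinates as measures."""
    return -sum(v * v for v in x), [x[0], x[1]]
```

Calling `map_elites(sphere, [0.0, 0.0], 200, [-1, -1], [1, 1], [5, 5])` returns a dictionary keyed by grid cell, so diversity shows up as the number of filled cells and quality as the objective stored in each one.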

Exploring Game Design Through Human-AI Collaboration

Alvarez


Game design is a hard and multi-faceted task that intertwines different gameplay mechanics, audio, level, graphic, and narrative facets. Games' facets are developed in conjunction with others with a common goal that makes games coherent and interesting. These combinations result in plenty of games in diverse genres, which usually require a collaboration of a diverse group of designers. Collaborators can take different roles and support each other with their strengths, resulting in games with unique characteristics. The multi-faceted nature of games and their collaborative properties and requirements make game design an exciting task for Artificial Intelligence (AI). The generation of these facets together requires a holistic approach, which is one of the most challenging tasks within computational creativity. Given the collaborative aspect of games, this thesis approaches their generation through Human-AI collaboration, specifically using a mixed-initiative co-creative (MI-CC) paradigm. This paradigm creates an interactive and collaborative scenario that leverages AI and human strengths with an alternating and proactive initiative to approach a task. However, this paradigm introduces several challenges, such as Human and AI goal alignment or competing properties. In this thesis, game design and the generation of game facets by themselves and intertwined are explored through Human-AI collaboration. The AI takes a colleague's role with the designer, giving rise to multiple dynamics, challenges, and opportunities. The main hypothesis is that AI can be incorporated into systems as a collaborator, enhancing design tools, fostering human creativity, and reducing workload. The challenges and opportunities that arise from this are explored, discussed, and approached throughout the thesis. As a result, multiple approaches and methods such as quality-diversity algorithms and designer modeling are proposed to generate game facets in tandem with humans, create a better workflow, enhance the interaction, and establish adaptive experiences.


A population-based approach for multi-agent interpretable reinforcement learning

Crespi, Ferigo, Custode et al. 2023

Applied Soft Computing

