Proceedings of the Genetic and Evolutionary Computation Conference 2019
DOI: 10.1145/3321707.3321723

Learning with delayed synaptic plasticity

Abstract: The plasticity property of biological neural networks allows them to perform learning and optimize their behavior by changing their configuration. Inspired by biology, plasticity can be modeled in artificial neural networks by using Hebbian learning rules, i.e. rules that update synapses based on the neuron activations and reinforcement signals. However, the distal reward problem arises when the reinforcement signals are not available immediately after each network output to associate the neuron activations th…
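
The abstract refers to Hebbian learning rules that update synapses from neuron activations and a reinforcement signal, and to the distal reward problem that arises when that signal is delayed. A minimal sketch of a reward-modulated Hebbian update follows; the network size, learning rate and exact rule form are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def hebbian_update(w, pre, post, reward, eta=0.01):
    """Reward-modulated Hebbian rule: dw = eta * reward * outer(post, pre).

    If the reward only becomes available several steps after the
    activations that caused it, this immediate update cannot be applied;
    that delay is the distal reward problem the paper addresses.
    """
    return w + eta * reward * np.outer(post, pre)

# Illustrative single-layer network (sizes are arbitrary assumptions).
rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(2, 3))   # 3 inputs -> 2 outputs

pre = rng.random(3)                      # presynaptic activations
post = np.tanh(w @ pre)                  # postsynaptic activations
reward = 1.0                             # immediate reinforcement signal

w = hebbian_update(w, pre, post, reward)
```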

Cited by 5 publications (3 citation statements)
References 30 publications
“…In Orchard and Wang [37], linear and non-linear learning rules are evolved to adapt to a simple foraging task. Yaman et al [62] use genetic algorithms to optimize delayed synaptic plasticity that can learn from distal rewards. Common to these examples are that learning rules that update neural connections have access to a reward signal during the lifetime of the agent.…”
Section: Hebbian Plasticity
mentioning, confidence: 99%
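The statement above notes that such learning rules are optimized with genetic algorithms while the agent has access to a reward signal during its lifetime. A minimal sketch of that outer evolutionary loop follows, with a toy fitness function standing in for an actual agent lifetime; the population size, mutation scale and four-coefficient rule encoding are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def lifetime_fitness(coeffs):
    """Placeholder fitness: in the papers cited above this would run an agent
    whose synapses are updated by a plasticity rule parameterised by `coeffs`,
    using the reward signal it receives during its lifetime. Here a toy
    quadratic keeps the sketch self-contained (an assumption, not the task)."""
    target = np.array([0.5, -0.2, 0.1, 0.0])
    return -np.sum((coeffs - target) ** 2)

# Simple truncation-selection genetic algorithm over rule coefficients.
pop = rng.normal(size=(20, 4))
for generation in range(100):
    fit = np.array([lifetime_fitness(c) for c in pop])
    parents = pop[np.argsort(fit)[-5:]]                        # keep the best 5
    children = np.repeat(parents, 4, axis=0)                    # clone parents
    children += rng.normal(scale=0.05, size=children.shape)     # mutate
    pop = children

best = pop[np.argmax([lifetime_fitness(c) for c in pop])]
```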
“…The eligibility traces were proposed to trace the pairwise activations of pre- and post-synaptic neurons during an episode [3]. Data structures inspired by the eligibility traces were previously employed to associate the pairwise neuron activations with reinforcement signals [7,16,18]. Shown in Table 1, we use neuron activation traces (NATs) in each synapse to keep track of their activations (i.e.…”
Section: Evolving Plasticity For Producing Novelty
mentioning, confidence: 99%
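The statement above describes eligibility-trace-like structures, neuron activation traces (NATs), kept per synapse so that pairwise pre/post activations can later be associated with a delayed reinforcement signal. A minimal sketch follows, assuming binary activations and a four-coefficient update rule; both are assumptions made for illustration, not the paper's exact formulation.

```python
import numpy as np

def update_nats(nats, pre, post):
    """Accumulate, per synapse, how often each pairwise combination of binary
    pre/post activations occurred during the episode.
    `nats` has shape (n_post, n_pre, 4): one counter per activation pattern."""
    for j, a_post in enumerate(post):
        for i, a_pre in enumerate(pre):
            pattern = 2 * int(a_pre > 0) + int(a_post > 0)   # 00, 01, 10, 11
            nats[j, i, pattern] += 1
    return nats

def delayed_update(w, nats, reward, rule, eta=0.01):
    """Apply the synaptic change once the distal reward arrives: each synapse
    is updated from its accumulated trace, weighted by the reward and by a
    four-coefficient rule (an illustrative stand-in for an evolved rule)."""
    return w + eta * reward * (nats @ rule)

n_pre, n_post = 3, 2
w = np.zeros((n_post, n_pre))
nats = np.zeros((n_post, n_pre, 4))
rule = np.array([0.0, -0.5, -0.5, 1.0])     # e.g. reward strengthens co-activation

rng = np.random.default_rng(2)
for step in range(10):                      # one episode, no reward yet
    pre = rng.integers(0, 2, size=n_pre)
    post = rng.integers(0, 2, size=n_post)
    nats = update_nats(nats, pre, post)

w = delayed_update(w, nats, reward=1.0, rule=rule)   # reward arrives at episode end
```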
“…MCC was demonstrated for the very first time in a maze navigation problem [13]. Mazes are, in fact, a paradigmatic example of tasks with sparse, delayed reward [15], for which various approaches based on quality search [16], [17] or novelty [18] have been proposed. According to the setting proposed in [13], tasks (mazes) are co-evolved with agents (maze navigators) controlled by neural networks.…”
Section: Introduction
mentioning, confidence: 99%