2018
DOI: 10.48550/arxiv.1803.02999
Preprint

On First-Order Meta-Learning Algorithms

Abstract: This paper considers meta-learning problems, where there is a distribution of tasks, and we would like to obtain an agent that performs well (i.e., learns quickly) when presented with a previously unseen task sampled from this distribution. We analyze a family of algorithms for learning a parameter initialization that can be fine-tuned quickly on a new task, using only first-order derivatives for the meta-learning updates. This family includes and generalizes first-order MAML, an approximation to MAML obtained …
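The family of algorithms the abstract describes learns an initialization using only first-order derivatives: adapt a copy of the parameters to a sampled task with ordinary SGD, then move the initialization toward the adapted parameters. A minimal sketch of this Reptile-style outer loop on a hypothetical toy problem (1-D sine-wave regression with a fixed-feature linear model; all names such as `inner_sgd`, `meta_step`, and the feature construction are illustrative, not from the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed Fourier-style frequencies so the model is linear in its weights.
FREQS = np.linspace(0.5, 2.0, 8)

def features(x):
    """Map inputs to a fixed sin/cos feature basis, shape (len(x), 16)."""
    return np.concatenate([np.sin(x[:, None] * FREQS),
                           np.cos(x[:, None] * FREQS)], axis=1)

def loss_grad(w, x, y):
    """Mean squared error and its gradient for y_hat = features(x) @ w."""
    phi = features(x)
    err = phi @ w - y
    return (err ** 2).mean(), 2 * phi.T @ err / len(x)

def sample_task():
    """A task is a sine wave with random amplitude and phase."""
    return rng.uniform(0.5, 2.0), rng.uniform(0, np.pi)

def inner_sgd(w, amp, phase, steps=10, lr=0.02):
    """Adapt a copy of the weights to one task with plain SGD (first-order only)."""
    w = w.copy()
    for _ in range(steps):
        x = rng.uniform(-np.pi, np.pi, size=16)
        y = amp * np.sin(x + phase)
        _, g = loss_grad(w, x, y)
        w -= lr * g
    return w

# Reptile-style outer loop: theta <- theta + eps * (phi - theta),
# where phi is the task-adapted weight vector. No second derivatives,
# and no need to backpropagate through the inner optimization.
w_meta = np.zeros(16)
meta_step = 0.1
for _ in range(300):
    amp, phase = sample_task()
    w_task = inner_sgd(w_meta, amp, phase)
    w_meta += meta_step * (w_task - w_meta)
```

The point of the sketch is the outer update: unlike full MAML, which differentiates through the inner SGD trajectory, only the difference between adapted and initial weights is used, so memory and compute per meta-update stay at the level of plain training.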

Cited by 554 publications (933 citation statements)
References 9 publications
“…While first-order MAML requires less memory and compute for each update, it performed significantly worse than MAML. Other first-order MAML methods such as REPTILE (Nichol et al., 2018) are an alternative that can reduce memory usage at the cost of more compute time and some accuracy.…”
Section: Discussion (mentioning; confidence: 99%)
“…The meta-model is built so that it can be rapidly adapted (online) to any new learning task that may be encountered, exploiting just a few experiences from the new task. The works in [11], [12] validate a meta-learning framework that can be used in several learning tasks, e.g., it can be applied to both supervised ML (regression and classification) and RL scenarios. Other works propose meta-learning for more specific scenarios, i.e., the update rule and selective copy of weights of deep networks [36], [37], [38] and recurrent networks [39], [40], [41].…”
Section: Learning Concepts In Networking (mentioning; confidence: 99%)
“…Once derived with the above procedure, meta-model Θ can be used as a starting point for finding any specific model that suits a newly encountered task, by only using a small amount of experience collected on this new task [11], [12]. In our scenario, the learning tasks of FALCON are the different network conditions it may encounter and should adapt to by deriving specific scheduling policies.…”
Section: A. Algorithm (mentioning; confidence: 99%)
“…Meta reinforcement learning (e.g., Duan et al., 2016; Finn et al., 2017; Nichol et al., 2018; Xu et al., 2018; Rakelly et al., 2019; Zintgraf et al., 2020) can be seen as a generalized setting of reward transfer, where the tasks can also differ in the underlying dynamics. And usually these methods still need few-shot interactions with the environment to generalize, differing from our purely offline setting.…”
Section: Related Work (mentioning; confidence: 99%)