Learning-to-Learn Stochastic Gradient Descent with Biased Regularization

Denevi, Giulia; Ciliberto, Carlo; Grazzi, Riccardo; Pontil, Massimiliano

doi:10.48550/arxiv.1903.10399

Cited by 9 publications

(13 citation statements)

References 10 publications


“…There has also been a line of recent work providing guarantees for gradient-based meta-learning (MAML) [Finn et al, 2017]. Finn et al [2019], Khodak et al [2019a] and Denevi et al [2019] work in the framework of online convex optimization (OCO) and use a notion of task similarity that assumes closeness of all tasks to a single fixed point in parameter space to provide guarantees. Khodak et al [2019b] strengthens earlier meta-learning guarantees in the OCO framework and provides bounds with more general notions of data-dependent task similarity.…”

Section: Related Work (mentioning)

confidence: 99%

Provable Meta-Learning of Linear Representations

Tripuraneni, Jin, Jordan

2020

Preprint


Meta-learning, or learning-to-learn, seeks to design algorithms that can utilize previous experience to rapidly learn new skills or adapt to new environments. Representation learning, a key tool for performing meta-learning, learns a data representation that can transfer knowledge across multiple tasks, which is essential in regimes where data is scarce. Despite a recent surge of interest in the practice of meta-learning, the theoretical underpinnings of meta-learning algorithms are lacking, especially in the context of learning transferable representations. In this paper, we focus on the problem of multi-task linear regression, in which multiple linear regression models share a common, low-dimensional linear representation. Here, we provide provably fast, sample-efficient algorithms to address the dual challenges of (1) learning a common set of features from multiple, related tasks, and (2) transferring this knowledge to new, unseen tasks. Both are central to the general problem of meta-learning. Finally, we complement these results by providing information-theoretic lower bounds on the sample complexity of learning these linear features, showing that our algorithms are optimal up to logarithmic factors.


“…A theoretical study was proposed by [2], but the strategies in this paper are not feasible in practice. This problem was improved recently [11,5,12,17,36,16,14,21]. The closest work to this paper is [13], where the authors propose an efficient strategy to learn the starting point of online gradient descent.…”

Section: Related Work (mentioning)

confidence: 99%

Meta-strategy for Learning Tuning Parameters with Guarantees

Meunier, Alquier

2021

Preprint


Online gradient methods, like the online gradient algorithm (OGA), often depend on tuning parameters that are difficult to set in practice. We consider an online meta-learning scenario, and we propose a meta-strategy to learn these parameters from past tasks. Our strategy is based on the minimization of a regret bound. It allows the initialization and the step size in OGA to be learned with guarantees. We provide a regret analysis of the strategy in the case of convex losses. It suggests that, when there are parameters θ_1, ..., θ_T that solve tasks 1, ..., T well, respectively, and that are close enough to one another, our strategy indeed improves on learning each task in isolation.
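To make the role of the two tuning parameters named in this abstract concrete, the following is a minimal sketch of plain online gradient descent, where the initialization and the step size are explicit inputs. It is not the meta-strategy of the cited paper: the function names, the quadratic losses, and the numeric values are assumptions chosen only for illustration.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def online_gradient_algorithm(
    grads: Sequence[Callable[[Vector], Vector]],
    init: Vector,
    step_size: float,
) -> List[Vector]:
    """Run online gradient descent and return the point played at each round.

    `init` and `step_size` are the two tuning parameters; a meta-learner
    would pick them from past tasks, here they are simply passed in.
    """
    theta = list(init)
    played = []
    for grad in grads:
        played.append(list(theta))  # commit to a point before the loss is revealed
        g = grad(theta)
        theta = [t - step_size * gi for t, gi in zip(theta, g)]
    return played


def make_quadratic_grad(target: Vector) -> Callable[[Vector], Vector]:
    """Gradient of the illustrative loss ||theta - target||^2 (an assumption)."""
    return lambda theta: [2.0 * (t - c) for t, c in zip(theta, target)]


def cumulative_loss(played: List[Vector], target: Vector) -> float:
    """Total quadratic loss suffered over all rounds."""
    return sum(
        sum((t - c) ** 2 for t, c in zip(theta, target)) for theta in played
    )


# Hypothetical task: every round has the same minimizer.
target = [1.0, -2.0]
rounds = [make_quadratic_grad(target)] * 20

far = online_gradient_algorithm(rounds, init=[0.0, 0.0], step_size=0.1)
near = online_gradient_algorithm(rounds, init=[0.9, -1.9], step_size=0.1)
print(cumulative_loss(far, target), cumulative_loss(near, target))
```

In this toy setting an initialization close to the task's minimizer suffers less cumulative loss than a distant one, which is the kind of gain that learning the initialization from related past tasks is meant to capture.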


“…This thesis focuses on the first MAML algorithms, but the techniques here can be extended to analyze the Hessian-free multi-step MAML. Alternatively to meta-initialization algorithms such as MAML, meta-regularization approaches aim to learn a good bias for a regularized empirical risk minimization problem for intra-task learning [2,22,21,20,104,8,132]. [8] formalized a connection between meta-initialization and meta-regularization from an online learning perspective.…”

Section: Related Work (mentioning)

confidence: 99%

Bilevel Optimization for Machine Learning: Algorithm Design and Convergence Analysis

2021

Preprint


Bilevel optimization has become a powerful framework in a variety of machine learning applications including signal processing, meta-learning, hyperparameter optimization, reinforcement learning and network architecture search. There are generally two classes of bilevel optimization formulations for modern machine learning: 1) problem-based bilevel optimization, whose inner-level problem is formulated as finding a minimizer of a given loss function; and 2) algorithm-based bilevel optimization, whose inner-level solution is an output of a fixed algorithm. For the first problem class, two popular types of gradient-based algorithms have been proposed to estimate the gradient of the outer-level objective (hypergradient) via approximate implicit differentiation (AID) and iterative differentiation (ITD). Algorithms for the second problem class include the popular model-agnostic meta-learning (MAML) and almost no inner loop (ANIL). Although bilevel optimization algorithms have been widely used, their convergence rate and fundamental limitations have not been well explored.
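The citation statement for this entry describes meta-regularization as learning a good bias for a regularized empirical risk minimization problem. As a rough illustration of such an inner-level problem, the sketch below solves a one-dimensional least-squares problem with a quadratic penalty around a bias `h`. The squared loss, the penalty form (λ/2)(w − h)², the scalar setting, and all names are assumptions made for this example and are not taken from the cited works.

```python
from typing import Sequence


def biased_ridge_1d(
    xs: Sequence[float], ys: Sequence[float], bias: float, lam: float
) -> float:
    """Minimize (1/(2n)) * sum((x*w - y)^2) + (lam/2) * (w - bias)^2 over w.

    Setting the derivative to zero gives
        w * (mean(x^2) + lam) = mean(x*y) + lam * bias,
    so the minimizer has the closed form returned below.
    """
    n = len(xs)
    mean_xx = sum(x * x for x in xs) / n
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    return (mean_xy + lam * bias) / (mean_xx + lam)


# Hypothetical task whose data are generated by w = 2.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w_unregularized = biased_ridge_1d(xs, ys, bias=5.0, lam=0.0)
w_strong_bias = biased_ridge_1d(xs, ys, bias=5.0, lam=1e6)
w_good_bias = biased_ridge_1d(xs, ys, bias=2.0, lam=1.0)
print(w_unregularized, w_strong_bias, w_good_bias)
```

With no regularization the solution fits the data, with a very large λ it collapses onto the bias, and when the bias already matches the task the penalty costs nothing. An outer-level procedure would aim to choose `h` so that the last case holds across tasks.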

