2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021
DOI: 10.1109/iccv48922.2021.00660

Searching for Robustness: Loss Learning for Noisy Classification Tasks

Abstract: We present a "learning to learn" approach for discovering white-box classification loss functions that are robust to label noise in the training data. We parameterise a flexible family of loss functions using Taylor polynomials, and apply evolutionary strategies to search for noise-robust losses in this space. To learn re-usable loss functions that can apply to new tasks, our fitness function scores their performance in aggregate across a range of training datasets and architectures. The resulting white-box lo…
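
The abstract outlines the core recipe: a loss parameterised by Taylor-polynomial coefficients, searched with an evolutionary strategy whose fitness aggregates performance across several training tasks. The sketch below illustrates that recipe under assumptions of our own rather than the paper's exact formulation: the polynomial is taken in p_y, the predicted probability of the true class; the tasks are synthetic linear-softmax problems with symmetric label noise; and the features, model, and hyperparameters are toy stand-ins (requires numpy and the pycma package).

```python
# Minimal sketch (assumptions, not the paper's code): candidate losses are
# degree-4 Taylor polynomials in p_y, the predicted probability of the true
# class; CMA-ES searches the coefficient vector; fitness is clean validation
# accuracy after training a linear-softmax model on noisy labels, averaged
# over several synthetic tasks.
import numpy as np
import cma  # pip install cma

rng = np.random.default_rng(0)

def make_task(n=500, d=20, c=3, noise=0.3):
    """Synthetic classification task with symmetric label noise."""
    W_true = rng.normal(size=(d, c))
    X = rng.normal(size=(n, d))
    y = (X @ W_true + rng.normal(scale=0.5, size=(n, c))).argmax(1)
    y_noisy = y.copy()
    flip = rng.random(n) < noise
    y_noisy[flip] = rng.integers(0, c, flip.sum())
    return X, y_noisy, y  # train on noisy labels, score on clean ones

def poly_loss_grad(p_y, theta):
    """Derivative w.r.t. p_y of the candidate loss sum_k theta[k] * p_y**k."""
    ks = np.arange(1, len(theta))
    return (theta[1:] * ks) @ np.stack([p_y ** (k - 1) for k in ks])

def fitness(theta, tasks, steps=200, lr=0.1):
    """Negative mean clean accuracy (CMA-ES minimises)."""
    accs = []
    for X, y_noisy, y_clean in tasks:
        n, d = X.shape
        c = y_noisy.max() + 1
        W = np.zeros((d, c))
        for _ in range(steps):
            logits = X @ W
            logits -= logits.max(1, keepdims=True)
            P = np.exp(logits)
            P /= P.sum(1, keepdims=True)
            p_y = P[np.arange(n), y_noisy]
            onehot = np.eye(c)[y_noisy]
            # chain rule: dL/dlogits = dL/dp_y * dp_y/dlogits
            dL_dp = poly_loss_grad(p_y, theta)        # shape (n,)
            dp_dlogits = p_y[:, None] * (onehot - P)  # shape (n, c)
            W -= lr * X.T @ (dL_dp[:, None] * dp_dlogits) / n
        accs.append(((X @ W).argmax(1) == y_clean).mean())
    return -float(np.mean(accs))

tasks = [make_task() for _ in range(3)]  # fitness aggregates across tasks
es = cma.CMAEvolutionStrategy(np.zeros(5), 0.5, {"maxiter": 15})
while not es.stop():
    cands = es.ask()
    es.tell(cands, [fitness(np.asarray(t), tasks) for t in cands])
print("best coefficients:", es.result.xbest)
```

In the paper itself the fitness is computed over multiple dataset and architecture pairs with real networks; the synthetic tasks above only stand in for that aggregation.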

Cited by 9 publications (7 citation statements) | References 21 publications

Citation statements:
“…For example, in [24,28], differentiable surrogates of non-differentiable performance metrics are learned to reduce the misalignment between the performance metric and the loss function. Alternatively, in [4,9,27,46], loss functions are learned to improve sample efficiency and asymptotic performance in supervised and reinforcement learning, while in [3,20,35], they are learned to improve model robustness to domain shift and domain generalization.…”
Section: Gradient-based Approaches (mentioning)
confidence: 99%
“…MetaReg (Balaji et al, 2018) meta-learns regularization parameters to improve domain generalisation. ARL (Gao et al, 2021) meta-learns a loss function to improve the robustness of learning from noisy labels.…”
Section: Related Work (mentioning)
confidence: 99%
“…This tells us that the expected convergence rate on novel tasks depends on the learning divergence on training tasks, plus complexity terms such as the F-norm of the meta-learned optimiser weights M. Note that restricting the diameter r of the parameter space is usually required to obtain generalisation guarantees (Bartlett et al., 2017; Long & Sedghi, 2020; Gouk et al., 2021), so this is not an unusual or counterproductive requirement.…”
Section: Generalisation of the Learned Optimiser (mentioning)
confidence: 99%
“…A promising alternative paradigm is to use evolution-based methods to learn M, favoring their inherent ability to avoid local optima by maintaining a population of solutions, their ease of parallelization across multiple processors, and their ability to directly optimize non-differentiable functions. Examples of such work include [16] and [17], which both represent M as parameterized Taylor polynomials optimized with covariance matrix adaptation evolutionary strategies (CMA-ES). These approaches successfully derive interpretable loss functions but, as before, they assume the parametric form in advance via the degree of the polynomial.…”
Section: Evolution-based Approaches (mentioning)
confidence: 99%
“…In particular, many loss function learning approaches use a parametric loss function representation such as a neural network [15] or Taylor polynomial [16], [17], which is limited as it imposes unnecessary assumptions and constraints on the structure of the learned loss function. However, the current non-parametric alternative is a two-stage discovery and optimization process, which infers both the loss function structure and parameters simultaneously using genetic programming and covariance matrix adaptation [18], but quickly becomes intractable for large-scale optimization problems.…”
Section: Introduction (mentioning)
confidence: 99%
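
The last quoted passage contrasts parametric loss representations (a neural network [15] or a Taylor polynomial [16], [17]) with a non-parametric two-stage search. As a rough illustration of what a neural-network parameterisation looks like, here is a hypothetical sketch; the input features, architecture, and output activation are assumptions made for exposition, not the design of [15].

```python
# Illustrative sketch of a parametric (neural-network) loss representation;
# the chosen input features and architecture are assumptions for exposition.
import torch
import torch.nn as nn

class LearnedLoss(nn.Module):
    """Maps per-sample prediction statistics to a scalar loss value.
    The fixed inputs and architecture are the 'structural constraints' the
    quoted passage refers to; the MLP weights are the meta-parameters."""
    def __init__(self, hidden=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1), nn.Softplus())

    def forward(self, logits, targets):
        probs = logits.softmax(dim=-1)
        p_y = probs.gather(1, targets[:, None]).squeeze(1)          # prob. of true class
        entropy = -(probs * probs.clamp_min(1e-8).log()).sum(-1)    # prediction entropy
        return self.net(torch.stack([p_y, entropy], dim=-1)).mean()

loss_fn = LearnedLoss()
logits, y = torch.randn(8, 3), torch.randint(0, 3, (8,))
print(loss_fn(logits, y))  # used in place of cross-entropy when training a base model
```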