2021
DOI: 10.1093/imaiai/iaaa033
The information complexity of learning tasks, their structure and their distance

Abstract: We introduce an asymmetric distance in the space of learning tasks and a framework to compute their complexity. These concepts are foundational for the practice of transfer learning, whereby a parametric model is pre-trained for one task and then fine-tuned for another. The framework we develop is non-asymptotic, captures the finite nature of the training dataset and allows distinguishing learning from memorization. It encompasses, as special cases, classical notions from Kolmogorov complexity and Shannon and F…

Cited by 20 publications (12 citation statements)
References 6 publications
“…On a different note, there have been a few proposals of viable metrics for evaluating similarity between real datasets, with the goal of obtaining predictors of the associated transfer-learning performance. Some of these architecture-agnostic metrics evaluate dataset distances based on information theory, information geometry and optimal transport [53][54][55][56][57]. An interesting direction for future work would be to connect these distance metrics with the parametric transformations in the CHMM.…”
Section: Connection With Related Work
confidence: 99%
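The optimal-transport notion of dataset distance mentioned in the quote above can be sketched in a few lines. This is an illustrative example, not code from any of the cited papers: for two equal-size 1-D samples, the optimal transport plan matches sorted samples rank-by-rank, so the empirical Wasserstein-1 distance reduces to a mean absolute difference of order statistics. The datasets `source` and `target` are synthetic stand-ins.

```python
# Hypothetical sketch of an architecture-agnostic dataset distance via
# optimal transport (Wasserstein-1), for 1-D features of equal sample size.
import numpy as np

def w1_distance(a, b):
    """Empirical Wasserstein-1 distance between two equal-size 1-D samples.
    In 1-D the optimal coupling pairs the i-th smallest point of `a` with
    the i-th smallest point of `b`."""
    a, b = np.asarray(a), np.asarray(b)
    assert a.shape == b.shape
    return float(np.mean(np.abs(np.sort(a) - np.sort(b))))

rng = np.random.default_rng(0)
source = rng.normal(loc=0.0, size=500)   # features of a synthetic dataset A
target = rng.normal(loc=0.5, size=500)   # dataset B, shifted by 0.5

d = w1_distance(source, target)          # roughly recovers the mean shift
```

Note the distance here is symmetric; the paper's task distance is deliberately asymmetric (transferring A→B need not cost the same as B→A), which a plain OT metric between feature distributions does not capture.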
“…In addition, specifically for histopathology images, non-hierarchical OT has been used to compare individual cell morphology (Basu et al, 2014; Wang et al, 2011) and to quantify domain shift at the tile level (Stacke et al, 2021). Moreover, the relationship between OT-computed dataset distances and the difficulty of transfer has been previously described (Alvarez-Melis and Fusi, 2020; Gao and Chaudhari, 2021; Achille et al, 2021), although the notion of distance they define is generic and thus does not leverage the hierarchical nature of individual datasets.…”
Section: Related Work
confidence: 99%
“…The latter depends on both the architecture used and the training algorithm, and increases during the training process while the training loss decreases. While there is a wide variety of learning methods for large DNNs, most are variants of stochastic gradient descent (SGD), which share the same qualitative behavior and can, to first-order approximation, be interpreted as minimizing the Lagrangian [1] (free energy)…”
Section: Concepts Understandable By Differentiable Programming
confidence: 99%
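The free-energy Lagrangian alluded to above can be sketched as a training loss plus a weighted complexity term. The example below is a minimal illustration in our own notation, not the paper's: it runs plain gradient descent on F(w) = loss(w) + β·C(w), where C(w) = ||w||² is a crude stand-in for the information the weights store about the training set, and β trades off fit against complexity.

```python
# Minimal sketch (hypothetical notation): gradient descent on a
# free-energy-style Lagrangian  F(w) = training_loss(w) + beta * C(w),
# with C(w) = ||w||^2 as a toy complexity proxy, on a linear regression task.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)   # noisy linear targets

beta = 0.01   # Lagrange multiplier: weight on the complexity term
lr = 0.05
w = np.zeros(3)
for _ in range(500):
    grad_loss = 2 * X.T @ (X @ w - y) / len(y)   # gradient of mean squared error
    grad_complexity = 2 * w                       # gradient of ||w||^2
    w -= lr * (grad_loss + beta * grad_complexity)

loss = float(np.mean((X @ w - y) ** 2))   # ends near the noise floor (~0.01)
```

With small β the minimizer stays close to the unregularized solution; increasing β shrinks the weights, trading training loss for lower complexity, which is the qualitative behavior the quoted passage attributes to SGD variants.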