The information complexity of learning tasks, their structure and their distance

Achille, Alessandro; Paolini, Giovanni; Mbeng, Glen Bigan; Soatto, Stefano

doi:10.1093/imaiai/iaaa033

Cited by 20 publications

(12 citation statements)

References 6 publications

“…On a different note, there have been few proposals of viable metrics for evaluating similarity between real datasets, with the goal of obtaining predictors of the associated transfer learning performance. Some of these architecture agnostic metrics evaluate dataset distances based on information theory, information geometry and optimal transport [53][54][55][56][57]. An interesting direction for future work would be to connect these distance metrics with the parametric transformations in the CHMM.…”

Section: Connection With Related Work (mentioning)

confidence: 99%

Probing transfer learning with a model of synthetic correlated datasets

Gerace, Saglietti, Mannelli et al. 2022

Mach. Learn.: Sci. Technol.

Transfer learning can significantly improve the sample efficiency of neural networks, by exploiting the relatedness between a data-scarce target task and a data-abundant source task. Despite years of successful applications, transfer learning practice often relies on ad-hoc solutions, while theoretical understanding of these procedures is still limited. In the present work, we re-think a solvable model of synthetic data as a framework for modeling correlation between data-sets. This setup allows for an analytic characterization of the generalization performance obtained when transferring the learned feature map from the source to the target task. Focusing on the problem of training two-layer networks in a binary classification setting, we show that our model can capture a range of salient features of transfer learning with real data. Moreover, by exploiting parametric control over the correlation between the two data-sets, we systematically investigate under which conditions the transfer of features is beneficial for generalization.

“…In addition, specifically for histopathology images, non-hierarchical OT has been used to compare individual cell morphology (Basu et al, 2014; Wang et al, 2011) and to quantify domain shift at the tile level (Stacke et al, 2021). In addition, the relationship between OT-calculated dataset distances and the difficulty of transferability has been previously described by (Alvarez-Melis and Fusi, 2020; Gao and Chaudhari, 2021; Achille et al, 2021), although the notion of distance they define is generic and thus does not leverage the hierarchical nature of individual datasets.…”

Section: Related Work (mentioning)

confidence: 99%

Hierarchical Optimal Transport for Comparing Histopathology Datasets

Yeaton, Krishnan, Mieloszyk et al. 2022

Preprint

Scarcity of labeled histopathology data limits the applicability of deep learning methods to under-profiled cancer types and labels. Transfer learning allows researchers to overcome the limitations of small datasets by pre-training machine learning models on larger datasets similar to the small target dataset. However, similarity between datasets is often determined heuristically. In this paper, we propose a principled notion of distance between histopathology datasets based on a hierarchical generalization of optimal transport distances. Our method does not require any training, is agnostic to model type, and preserves much of the hierarchical structure in histopathology datasets imposed by tiling. We apply our method to H&E stained slides from The Cancer Genome Atlas from six different cancer types. We show that our method outperforms a baseline distance in a cancer-type prediction task. Our results also show that our optimal transport distance predicts difficulty of transferability in a tumor vs. normal prediction setting.

“…The latter depends on both the architecture used and the training algorithm, and increases during the training process, while the training loss decreases. While there is a wide variety of learning methods for large DNNs, most are variants of stochastic gradient descent (SGD), that share the same qualitative behavior and can, to first-order approximation, be interpreted as minimizing the Lagrangian [1] (free energy)…”

Section: Concepts Understandable By Differentiable Programming (mentioning)

confidence: 99%

On the Learnability of Physical Concepts: Can a Neural Network Understand What's Real?

Achille, Soatto 2022

Preprint

Self Cite

We revisit the classic signal-to-symbol barrier in light of the remarkable ability of deep neural networks to generate realistic synthetic data. DeepFakes and spoofing highlight the feebleness of the link between physical reality and its abstract representation, whether learned by a digital computer or a biological agent. Starting from a widely applicable definition of abstract concept, we show that standard feed-forward architectures cannot capture but trivial concepts, regardless of the number of weights and the amount of training data, despite being extremely effective classifiers. On the other hand, architectures that incorporate recursion can represent a significantly larger class of concepts, but may still be unable to learn them from a finite dataset. We qualitatively describe the class of concepts that can be "understood" by modern architectures trained with variants of stochastic gradient descent, using a (free energy) Lagrangian to measure information complexity. Even if a concept has been understood, however, a network has no means of communicating its understanding to an external agent, except through continuous interaction and validation. We then characterize physical objects as abstract concepts and use the previous analysis to show that physical objects can be encoded by finite architectures. However, to understand physical concepts, sensors must provide persistently exciting observations, for which the ability to control the data acquisition process is essential (active perception). The importance of control depends on the modality, benefiting visual more than acoustic or chemical perception. Finally, we conclude that binding physical entities to digital identities is possible in finite time with finite resources, therefore in principle solving the signal-to-symbol barrier problem, but awareness that the barrier has been overcome cannot be achieved in finite time by an external agent in general, thus engendering the need for continuous validation. 
We conduct a critical discussion of the assumptions and limitations of our analysis and indicate open avenues for future work.
