A grand challenge in representation learning is to learn the different explanatory factors of variation behind high-dimensional data. Encoder models are typically trained to optimize performance on training data, while the real objective is to generalize well to unseen data. Although there is ample numerical evidence that noise injection at the representation level (during training) can improve the generalization ability of encoders, an information-theoretic understanding of this principle remains elusive. This paper presents a sample-dependent bound on the generalization gap of the cross-entropy loss that scales with the information complexity (IC) of the representations, i.e., the mutual information between the inputs and their representations. The IC is investigated empirically for standard multi-layer neural networks trained with SGD on the MNIST and CIFAR-10 datasets; the generalization gap and the IC appear to be directly correlated, suggesting that SGD implicitly selects encoders that minimize the IC. We specialize the IC to study the role of Dropout in the generalization capacity of deep encoders and show that it is directly related to the encoder capacity, a measure of how distinguishable samples are from their representations. Our results support several recent regularization methods.
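To make the IC concrete, the following is a minimal sketch (not the paper's method) of how a tractable upper bound on I(X; Z) can be computed for a noise-injected encoder Z = f(X) + eps with eps ~ N(0, sigma^2 I), using the standard variational bound I(X; Z) <= E_x[KL(N(f(x), sigma^2 I) || N(0, I))]. The encoder architecture, noise level, and all names are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical encoder; the paper's architecture is not specified here.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(),
                        nn.Linear(256, 32))

def ic_upper_bound(x, sigma=0.1):
    """Variational upper bound on I(X; Z) for Z = f(X) + eps with
    eps ~ N(0, sigma^2 I), against a standard normal marginal r(z):
        I(X; Z) <= E_x[ KL(N(f(x), sigma^2 I) || N(0, I)) ].
    The KL between diagonal Gaussians has a closed form."""
    mu = encoder(x)                                   # (batch, d)
    d = mu.shape[1]
    log_var = torch.log(torch.tensor(sigma ** 2))
    kl = 0.5 * ((mu ** 2).sum(dim=1)                  # ||f(x)||^2 term
                + d * sigma ** 2                      # trace term
                - d                                   # dimension offset
                - d * log_var)                        # log-determinant term
    return kl.mean()                                  # batch average, in nats
```

In training, a penalty such as loss = cross_entropy + lambda * ic_upper_bound(x) would implement the kind of complexity regularization the abstract alludes to.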
Aims. With the aim of assessing the effects of bars on disc galaxy properties, we present an analysis of different characteristics of spiral galaxies with strong bars, weak bars, and no bars. Methods. We identified barred galaxies in the Sloan Digital Sky Survey (SDSS). By visual inspection of SDSS images we classified the face-on spiral galaxies brighter than g < 16.5 mag into strong-bar, weak-bar, and unbarred. With the goal of providing an appropriate quantification of the influence of bars on galaxy properties, we also constructed a suitable control sample of unbarred galaxies with redshift, magnitude, morphology, bulge size, and local density environment distributions similar to those of the barred galaxies. Results. We found 522 strong-barred and 770 weak-barred galaxies, which represent a bar fraction of 25.82% with respect to the full sample of spiral galaxies, in good agreement with several previous studies. We also found that strong-barred galaxies show lower efficiency in star formation activity and older stellar populations (as derived from the Dn(4000) spectral index) with respect to weak-barred and unbarred spirals from the control sample. In addition, there is a significant excess of strong-barred galaxies with red colors. The color-color and color-magnitude diagrams show that unbarred and weak-barred galaxies extend further into the blue region, while strong-barred disc objects are mostly grouped in the red region. Strong-barred galaxies present an important excess of high metallicity values compared to unbarred and weak-barred disc objects, which show similar 12 + log(O/H) distributions. Regarding the mass-metallicity relation, we found that weak-barred and unbarred galaxies are fitted by similar curves, while strong-barred ones follow a curve that falls abruptly, most significantly in the range of low stellar masses (log(M*/M⊙) < 10.0). These results indicate that prominent bars have an accelerating effect on gas processing, reflected in significant changes in the physical properties of the host galaxies.
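As a quick consistency check (the parent sample size is not quoted in this abstract), the reported counts and bar fraction imply the size of the full spiral sample:

\[
N_{\mathrm{spirals}} \approx \frac{522 + 770}{0.2582} = \frac{1292}{0.2582} \approx 5004 .
\]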
This paper investigates, on information-theoretic grounds, a learning problem based on the principle that any regularity in a given dataset can be exploited to extract compact features from the data, i.e., using fewer bits than needed to fully describe the data itself, in order to build meaningful representations of relevant content (multiple labels). We begin by introducing the noisy lossy source coding paradigm with the log-loss fidelity criterion, which provides the fundamental tradeoffs between the cross-entropy loss (average risk) and the information rate of the features (model complexity). Our approach allows an information-theoretic formulation of the multi-task learning (MTL) problem, a supervised learning framework in which the prediction models for several related tasks are learned jointly from common representations to achieve better generalization performance. Then, we present an iterative algorithm for computing the optimal tradeoffs, whose global convergence is proven provided certain conditions hold. An important property of this algorithm is that it provides a natural safeguard against overfitting, because it minimizes the average risk while taking into account a penalization induced by the model complexity. Remarkably, empirical results illustrate that there exists an optimal information rate, minimizing the excess risk, that depends on the nature and the amount of available training data. An application to hierarchical text categorization is also investigated, extending previous works.
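The abstract does not spell out the update equations. For intuition, here is a minimal sketch of a Blahut-Arimoto-style alternating minimization for the rate/risk tradeoff in the single-task, discrete-alphabet case (the classical information bottleneck iteration, not necessarily the authors' exact algorithm); the function name, the tradeoff parameter beta, and the toy distribution are illustrative assumptions.

```python
import numpy as np

def ib_iterations(p_xy, n_t, beta, n_iter=300, seed=0, eps=1e-12):
    """Alternating minimization of I(X;T) - beta * I(T;Y) over
    stochastic encoders p(t|x), for a given joint p_xy of shape
    (n_x, n_y). Returns the encoder q_t_x of shape (n_t, n_x)."""
    rng = np.random.default_rng(seed)
    p_x = p_xy.sum(axis=1)                       # marginal p(x)
    p_y_x = p_xy / p_x[:, None]                  # conditional p(y|x)
    q_t_x = rng.random((n_t, len(p_x)))          # random encoder init
    q_t_x /= q_t_x.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        q_t = q_t_x @ p_x                        # marginal p(t)
        # Decoder by Bayes' rule: p(y|t) = sum_x p(t|x) p(x) p(y|x) / p(t)
        q_y_t = (q_t_x * p_x[None, :]) @ p_y_x / (q_t[:, None] + eps)
        # KL(p(y|x) || p(y|t)) for every (t, x) pair
        kl = (p_y_x[None, :, :] *
              np.log((p_y_x[None, :, :] + eps) /
                     (q_y_t[:, None, :] + eps))).sum(axis=2)
        # Encoder update: p(t|x) proportional to p(t) * exp(-beta * KL)
        q_t_x = q_t[:, None] * np.exp(-beta * kl)
        q_t_x /= q_t_x.sum(axis=0, keepdims=True)
    return q_t_x

# Toy usage: two inputs, two labels, two feature values.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
encoder = ib_iterations(p_xy, n_t=2, beta=5.0)
```

Larger values of beta trade a higher information rate (model complexity) for a lower cross-entropy risk, mirroring the tradeoff and the overfitting safeguard described above.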