Toward Better Generalization Bounds with Locally Elastic Stability

Deng, Zhun; He, Hangfeng; Su, Weijie J.

doi:10.48550/arxiv.2010.13988

Cited by 2 publications

(4 citation statements)

References 18 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…We show that the transformations used by ContRE would not significantly affect the generalization estimation, by additionally reporting the correlations between transformed training and testing data, i.e., L 𝜽 (𝑅(𝒁 )) and L 𝜽 (𝑅(𝒁 𝑡𝑒𝑠𝑡 )) (0.983, 0.903 and 0.989 for CIFAR10, CIFAR100 and ImageNet respectively), but for further extensions, transformations should be cautiously chosen. (2) Theoretical analyses of local elasticity [3,11,15] seems related to our approach, but theoretical links among contrastive examples, generalization performance and this notion are not available yet. (3) Lack of appropriate data transformations, it would be difficult to extend to other data formats, such as texts, audios, graphs etc.…”

Section: Conclusion and Discussionmentioning

confidence: 99%

“…Corneanu et al [8] define DNNs on the topological space and estimate the gaps from the topology summaries. Note that ContRE could be loosely guaranteed through data-dependent theoretical bounds [3] and potentially related to the local elasticity [11,15] while the theoretical adaptation to DNN models should be more cautiously considered. We leave this as future work.…”

Section: Contrastive Learning and Contrastivementioning

confidence: 99%

“…Recently, a practical approach to evaluate generalization performance that exploits the training samples with image transformations, e.g., crops, shifts, rotations and color distortions, has been discussed in [23] and suggested as a potential way to assess the generalization performance using such contrastive examples (named after contrastive learning [5,7] for the transformed examples). Though this approach could be loosely backed by some theoretical results [3,11], the empirical use of contrastive examples for performance evaluation would be less effective when the similar data transformations have been used for training data augmentation to produce the model. This failure can be well explained by the high capacity of DNNs in memorization to overfit every seen sample [46].…”

Section: Introductionmentioning

confidence: 99%

See 2 more Smart Citations

Practical Assessment of Generalization Performance Robustness for Deep Networks via Contrastive Examples

Wu¹,

Li²,

Xiong³

et al. 2021

Preprint

View full text Add to dashboard Cite

Training images with data transformations have been suggested as contrastive examples to complement the testing set for generalization performance evaluation of deep neural networks (DNNs) [23]. In this work, we propose a practical framework ContRE 1 that uses Contrastive examples for DNN geneRalization performance Estimation. Specifically, ContRE follows the assumption in [5, 16] that robust DNN models with good generalization performance are capable of extracting a consistent set of features and making consistent predictions from the same image under varying data transformations. Incorporating with a set of randomized strategies for well-designed data transformations over the training set, ContRE adopts classification errors and Fisher ratios on the generated contrastive examples to assess and analyze the generalization performance of DNN models in complement with a testing set. To show the effectiveness and efficiency of ContRE, extensive experiments have been done using various DNN models on three open source benchmark datasets with thorough ablation studies and applicability analyses. Our experiment results confirm that (1) behaviors of deep models on contrastive examples are strongly correlated to what on the testing set, and (2) ContRE is a robust measure of generalization performance complementing to the testing set in various settings.

show abstract

Section: Conclusion and Discussionmentioning

confidence: 99%

Section: Contrastive Learning and Contrastivementioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

See 1 more Smart Citation

Practical Assessment of Generalization Performance Robustness for Deep Networks via Contrastive Examples

Wu¹,

Li²,

Xiong³

et al. 2021

Preprint

View full text Add to dashboard Cite

show abstract

“…It was shown that stability is closely related to the fundamental problem of learnability (Rakhlin et al, 2005;Shalev-Shwartz et al, 2010). Hardt et al (2016) pioneered the generalization analysis of SGD via stability, which inspired several upcoming work to understand stochastic optimization algorithms based on different algorithmic stability measures, e.g., uniform stability (Chen et al, 2018;Lin et al, 2016;Madden et al, 2020;Mou et al, 2018;Richards et al, 2020), argument stability (Bassily et al, 2020;Lei & Ying, 2020;Liu et al, 2017), on-average stability (Kuzborskij & Lampert, 2018;Lei & Ying, 2021a), hypothesis stability (Charles & Papailiopoulos, 2018;Foster et al, 2019;London, 2017), Bayes stability (Li et al, 2020) and locally elastic stability (Deng et al, 2020).…”

Section: Related Workmentioning

confidence: 99%

Stability and Generalization of Stochastic Gradient Methods for Minimax Problems

Lei,

Yang,

Yang

et al. 2021

Preprint

View full text Add to dashboard Cite

Many machine learning problems can be formulated as minimax problems such as Generative Adversarial Networks (GANs), AUC maximization and robust estimation, to mention but a few. A substantial amount of studies are devoted to studying the convergence behavior of their stochastic gradient-type algorithms. In contrast, there is relatively little work on their generalization, i.e., how the learning models built from training examples would behave on test examples. In this paper, we provide a comprehensive generalization analysis of stochastic gradient methods for minimax problems under both convex-concave and nonconvexnonconcave cases through the lens of algorithmic stability. We establish a quantitative connection between stability and several generalization measures both in expectation and with high probability. For the convex-concave setting, our stability analysis shows that stochastic gradient descent ascent attains optimal generalization bounds for both smooth and nonsmooth minimax problems. We also establish generalization bounds for both weakly-convex-weakly-concave and gradient-dominated problems.The weak PD empirical risk of (w, v) is defined asWe refer to △ w (w, v) − △ w S (w, v) as the weak PD generalization error of the model (w, v).Definition 2 (Strong Primal-Dual Risk). The strong PD population risk of a model (w, v) is defined asThe strong PD empirical risk of (w, v) is defined asWe refer to △ s (w, v) − △ s S (w, v) as the strong PD generalization error of the model (w, v).Definition 3 (Primal Risk). The primal population risk of a model w is defined as R(w) = sup v∈V F (w, v). The primal empirical risk of w is defined as R S (w) = sup v∈V F S (w, v). We refer to R(w) − R S (w) as the primal generalization error of the model w, and R(w) − inf w ′ R(w ′ ) as the excess primal population risk.According to the above definitions, we know w, v). Furthermore, we refer to F (w, v) − F S (w, v) as the plain generalization error as it is standard in SLT. A standard approach to handle a population risk is to decompose it into a generalization error (estimation error) and an empirical risk (optimization error) (Bousquet & Bottou, 2008). For example, the weak PD population risk can be decomposed as △ w (w, v) = △ w (w, v) − △ w S (w, v) + △ w S (w, v). (2.2)

show abstract

Toward Better Generalization Bounds with Locally Elastic Stability

Cited by 2 publications

References 18 publications

Practical Assessment of Generalization Performance Robustness for Deep Networks via Contrastive Examples

Practical Assessment of Generalization Performance Robustness for Deep Networks via Contrastive Examples

Stability and Generalization of Stochastic Gradient Methods for Minimax Problems

Contact Info

Product

Resources

About