Self-supervised contrastive learning has recently been shown to be very effective in preventing deep networks from overfitting noisy labels. Despite its empirical success, the theoretical understanding of how contrastive learning boosts robustness is very limited. In this work, we rigorously prove that the representation matrix learned by contrastive learning boosts robustness by having: (i) one prominent singular value corresponding to each subclass in the data, with all remaining singular values being significantly smaller; and (ii) a large alignment between the prominent singular vectors and the clean labels of each subclass. These properties allow a linear layer trained on the representations to learn the clean labels quickly, and prevent it from overfitting the noise for a large number of training iterations. We further show that the low-rank structure of the Jacobian of deep networks pre-trained with contrastive learning allows them to achieve superior initial performance when fine-tuned on noisy labels. Finally, we demonstrate that the initial robustness provided by contrastive learning enables robust training methods to achieve state-of-the-art performance under extreme noise levels, e.g., average accuracy gains of 27.18% on CIFAR-10 and 15.58% on CIFAR-100 with 80% symmetric label noise, and a 4.11% accuracy gain on WebVision.

Test accuracy (%) on CIFAR-10 and CIFAR-100 under symmetric (Sym) and asymmetric (Asym) label noise:

| Method | CIFAR-10 Sym 20% | CIFAR-10 Sym 50% | CIFAR-10 Sym 80% | CIFAR-10 Asym 40% | CIFAR-100 Sym 20% | CIFAR-100 Sym 50% | CIFAR-100 Sym 80% | CIFAR-100 Asym 40% |
|---|---|---|---|---|---|---|---|---|
| F-correction | 85.1 ± 0.4 | 76.0 ± 0.2 | 34.8 ± 4.5 | 83.6 ± 2.2 | 55.8 ± 0.5 | 43.3 ± 0.7 | − | 42.3 ± 0.7 |
| Decoupling | 86.7 ± 0.3 | 79.3 ± 0.6 | 36.9 ± 4.6 | 75.3 ± 0.8 | 57.6 ± 0.5 | 45.7 ± 0.4 | − | 43.1 ± 0.4 |
| Co-teaching | 89.1 ± 0.3 | 82.1 ± 0.6 | 16.2 ± 3.2 | 84.6 ± 2.8 | 64.0 ± 0.3 | 52.3 ± 0.4 | − | 47.7 ± 1.2 |
| MentorNet | 88.4 ± 0.5 | 77.1 ± 0.4 | 28.9 ± 2.3 | 77.3 ± 0.8 | 63.0 ± 0.4 | 46.4 ± 0.4 | − | 42.4 ± 0.5 |
| D2L | 86.1 ± 0.4 | 67.4 ± 3.6 | 10.0 ± 0.1 | 85.6 ± 1.2 | 12.5 ± 4.2 | 5.6 ± 5.4 | − | 14.1 ± 5.8 |
| INCV | 89.7 ± 0.2 | 84.8 ± 0.3 | 52.3 ± 3.5 | 86.0 ± 0.5 | 60.2 ± 0.2 | 53.1 ± 0.4 | − | 50.7 ± 0.2 |
| T-Revision | 79.3 ± 0.5 | 78.5 ± 0.6 | 36.2 ± 1.6 | 76.3 ± 0.8 | 52.4 ± 0.3 | 37.6 ± 0.3 | − | 32.3 ± 0.4 |
| L_DMI | 84.3 ± 0.4 | 78.8 ± 0.5 | 20.9 ± 2.2 | 84.8 ± 0.7 | 56.8 ± 0.4 | 42.2 ± 0.5 | − | 39.5 ± 0.4 |
| Crust * | 85.3 ± 0.5 | 86.8 ± 0.3 | 33.8 ± 1.3 | 76.7 ± 3.4 | 62.9 ± 0.3 | 55.5 ± 1.1 | 18.5 ± 0.8 | 52.5 ± 0.4 |
| Mixup | 89.7 ± 0.7 | 84.5 ± 0.3 | 40.7 ± 1.1 | 86.3 ± 0.1 | 64.0 ± 0.4 | 53.4 ± 0.5 | 15.1 ± 0.1 | 54.4 ± 2.0 |
| ELR * | 90.6 ± 0.6 | 87.7 ± 1.0 | 69.5 ± 5.0 | 86.6 ± 2.9 | 63.6 ± 1.7 | 52.5 ± 4.2 | 23.4 ± 1.9 | 59.7 ± 0.1 |
| CL+E2E * | 88.8 ± 0.5 | 82.8 ± 0.2 | 72.0 ± 0.3 | 83.5 ± 0.5 | 63.5 ± 0.2 | 56.1 ± 0.3 | 36.7 ± 0.3 | 52.4 ± 0.2 |
| CL+Crust * | 86.5 ± 0.7 | 87.6 ± 0.3 | 77.9 ± 0.3 | 85.9 ± 0.4 | 63.0 ± 0.8 | 58.3 ± 0.1 | 34.8 ± 1.5 | 53.3 ± 0.7 |
| CL+Mixup * | 90.8 ± 0.2 | 84.6 ± 0.4 | 74.8 ± 0.3 | 87.5 ± 1.3 | 64.4 ± 0.4 | 55.5 ± 0.1 | 30.3 ± 0.4 | 55.5 ± 0.8 |
| CL+ELR * | 91.3 ± 0.0 | 89.1 ± 0.1 | 77.7 ± 0.2 | 89.7 ± 0.3 | 64.7 ± 0.2 | 55.6 ± 0.2 | 35.9 ± 0.3 | 63.6 ± 0.1 |
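The two spectral properties claimed above can be checked numerically. The following is a minimal sketch, not the paper's code: it builds a synthetic representation matrix whose rows cluster around one prototype direction per subclass (an assumption standing in for contrastive pre-training), then verifies via SVD that (i) roughly one prominent singular value appears per subclass and (ii) each top left singular vector aligns strongly with the indicator vector of one subclass. All names (`Z`, `y`, `prototypes`) are illustrative.

```python
# Sketch: spectral structure of a subclass-clustered representation matrix.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 1000, 128, 10                       # samples, feature dim, subclasses
y = rng.integers(0, k, size=n)                # hypothetical clean subclass labels

# Hypothetical representations: one shared unit direction per subclass plus
# small noise, mimicking the structure attributed to contrastive pre-training.
prototypes = rng.standard_normal((k, d))
prototypes /= np.linalg.norm(prototypes, axis=1, keepdims=True)
Z = prototypes[y] + 0.05 * rng.standard_normal((n, d))

# (i) Spectrum: about k prominent singular values, the rest much smaller.
U, S, _ = np.linalg.svd(Z, full_matrices=False)
print("top-k singular values: ", np.round(S[:k], 2))
print("next singular values:  ", np.round(S[k:k + 3], 2))

# (ii) Alignment: each subclass indicator correlates strongly with one of the
# top-k left singular vectors.
for c in range(k):
    indicator = (y == c).astype(float)
    indicator /= np.linalg.norm(indicator)
    best = np.max(np.abs(U[:, :k].T @ indicator))
    print(f"subclass {c}: max alignment with a top singular vector = {best:.2f}")
```

Under this toy setup, the printed spectrum drops sharply after the k-th singular value and each alignment score is close to 1, which is the regime in which a linear layer trained on the representations can fit the clean labels quickly while resisting the noise.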