2017
DOI: 10.48550/arXiv.1712.09936
Preprint

Gradient Regularization Improves Accuracy of Discriminative Models

Abstract: Regularizing the gradient norm of the output of a neural network with respect to its inputs is a powerful technique, rediscovered several times. This paper presents evidence that gradient regularization can consistently improve classification accuracy on vision tasks, using modern deep neural networks, especially when the amount of training data is small. We introduce our regularizers as members of a broader class of Jacobian-based regularizers. We demonstrate empirically on real and synthetic data that the le…
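
The regularizer described in the abstract can be sketched in a few lines. The snippet below penalizes the squared norm of the gradient of the training loss with respect to the inputs, one simple member of the Jacobian-based family the paper discusses; it is a minimal PyTorch illustration, and the helper name loss_with_gradient_penalty and the weight lam are placeholders rather than names or values from the paper.

```python
import torch
import torch.nn.functional as F

def loss_with_gradient_penalty(model, x, y, lam=0.01):
    """Cross-entropy loss plus a penalty on the squared norm of the
    gradient of the loss with respect to the inputs (one member of the
    broader class of Jacobian-based regularizers). `lam` is an
    illustrative regularization weight, not a value from the paper."""
    x = x.clone().requires_grad_(True)
    logits = model(x)
    ce = F.cross_entropy(logits, y)

    # Gradient of the data-fit term with respect to the inputs;
    # create_graph=True lets the penalty itself be backpropagated
    # through during training (double backpropagation).
    (grad_x,) = torch.autograd.grad(ce, x, create_graph=True)
    penalty = grad_x.pow(2).flatten(start_dim=1).sum(dim=1).mean()

    return ce + lam * penalty
```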

Cited by 14 publications (21 citation statements)
References 9 publications

“…When considering such an approximate algorithm, one naively must trade off efficiency against accuracy for computing the Jacobian, which ultimately trades computation time for robustness. Prior work by Varga et al. [2017] briefly considers an approach based on random projection, but without providing any analysis of the quality of the Jacobian approximation. Here, we describe our algorithm, analyze theoretical convergence guarantees, and verify empirically that there is only a negligible difference in model solution quality between training with the exact computation of the Jacobian as compared to training with the approximate algorithm, even when using a single random projection (see Figure 2).…”
Section: Efficient Approximate Algorithm
confidence: 99%
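
The random-projection approximation referenced in the snippet above can be sketched briefly: for a random vector v with E[v vᵀ] = I, the expectation of ‖vᵀJ‖² equals the squared Frobenius norm of the Jacobian J, so each projection costs only one extra backward pass. The PyTorch sketch below is an illustration under that assumption; the function name jacobian_frobenius_penalty and the default num_proj=1 are chosen here and are not taken from either paper.

```python
import torch

def jacobian_frobenius_penalty(model, x, num_proj=1):
    """Monte-Carlo estimate of ||df/dx||_F^2 via random projections.

    For v ~ N(0, I), E ||v^T J||^2 = ||J||_F^2, so each projection
    needs only one vector-Jacobian product (one backward pass).
    Names and defaults are illustrative, not from the cited papers."""
    x = x.clone().requires_grad_(True)
    out = model(x)                     # shape (batch, num_classes)
    estimate = 0.0
    for _ in range(num_proj):
        v = torch.randn_like(out)      # random projection of the outputs
        (vjp,) = torch.autograd.grad(out, x, grad_outputs=v,
                                     create_graph=True, retain_graph=True)
        estimate = estimate + vjp.pow(2).flatten(start_dim=1).sum(dim=1).mean()
    return estimate / num_proj
```
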
“…Finally, our work is in line with recent research (Liu et al., 2019a; Santurkar et al., 2018) that emphasizes the benefit of analyzing gradients to understand neural networks and devise potential improvements to their training. We share elements with Drucker & Le Cun (1991) and, more recently, Varga et al. (2017) in that we propose explicit regularization methods for gradients.…”
Section: Related Work
confidence: 99%
“…We note that the above penalty can also be thought of as a network regularization. Similar gradient penalties are used in machine learning to improve generalization ability and robustness to adversarial attacks [36]. The use of a gradient penalty is observed to be qualitatively equivalent to penalizing the norm of the weights of the network.…”
Section: B Distance/network Regularization
confidence: 95%
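
The qualitative equivalence mentioned above between penalizing input gradients and penalizing weight norms is exact for a linear model, since the Jacobian of f(x) = Wx + b with respect to x is W itself. The toy check below, with dimensions picked only for illustration, verifies this in PyTorch.

```python
import torch

# For a linear map f(x) = x @ W.T + b, the Jacobian of f w.r.t. x is W
# for every input, so the input-gradient (Jacobian) penalty coincides
# with the squared Frobenius norm of the weights. Dimensions are arbitrary.
lin = torch.nn.Linear(4, 3)
x = torch.randn(8, 4, requires_grad=True)
out = lin(x)

# Recover the Jacobian row by row via one backward pass per output unit.
rows = [torch.autograd.grad(out[:, j].sum(), x, retain_graph=True)[0][0]
        for j in range(out.shape[1])]
jacobian = torch.stack(rows)           # shape (3, 4), same for every sample

assert torch.allclose(jacobian, lin.weight)
print(jacobian.pow(2).sum().item(), lin.weight.pow(2).sum().item())  # equal
```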