Rethinking Softmax with Cross-Entropy: Neural Network Classifier as Mutual Information Estimator

Qin, Zhenyue; Kim, Dongwoo

doi:10.48550/arxiv.1911.10688

Cited by 25 publications

(22 citation statements)

References 11 publications


“…First, we compare the generalization performance of the proposed method against baselines by training classifiers on CIFAR-100 (Krizhevsky et al, 2009), Tiny-ImageNet (Chrabaszcz et al, 2017), ImageNet (Deng et al, 2009), and the Google commands speech dataset (Warden, 2017). Next, we test the localization performance of classifiers following the evaluation protocol of Qin and Kim (2019). We also measure calibration error (Guo et al, 2017) of classifiers to verify Co-Mixup successfully alleviates the over-confidence issue by Zhang et al (2018).…”

Section: Methods (mentioning)

confidence: 99%

Co-Mixup: Saliency Guided Joint Mixup with Supermodular Diversity

Kim, Choo, Jeong et al. 2021

Preprint

While deep neural networks show great performance on fitting to the training distribution, improving the networks' generalization performance to the test distribution and robustness to the sensitivity to input perturbations still remain as a challenge. Although a number of mixup based augmentation strategies have been proposed to partially address them, it remains unclear as to how to best utilize the supervisory signal within each input data for mixup from the optimization perspective. We propose a new perspective on batch mixup and formulate the optimal construction of a batch of mixup data maximizing the data saliency measure of each individual mixup data and encouraging the supermodular diversity among the constructed mixup data. This leads to a novel discrete optimization problem minimizing the difference between submodular functions. We also propose an efficient modular approximation based iterative submodular minimization algorithm for efficient mixup computation per each minibatch suitable for minibatch based neural network training. Our experiments show the proposed method achieves the state of the art generalization, calibration, and weakly supervised localization results compared to other mixup methods. The source code is available at https://github.com/snu-mllab/Co-Mixup.

“…Setting the bounding box The output of g is map M in range 0 to 1, obtained by the sigmoid activation function. In order to derive a bounding box from this map, we follow the method of [6,7,29]. First, a threshold τ is calculated as…”

Section: Methods II (Siamese Network) (mentioning)

confidence: 99%

“…Many algorithms were proposed for the task of WSOL. The Class Activation Map (CAM) explainability method [51] and its variants [29] identify the salient pixels that lead to the classification. A multi-task loss function proposed by [24] takes shape into consideration.…”

Section: Related Work (mentioning)

confidence: 99%

Learning a Weight Map for Weakly-Supervised Localization

Shaharabany, Wolf 2021

Preprint

In the weakly supervised localization setting, supervision is given as an image-level label. We propose to employ an image classifier f and to train a generative network g that outputs, given the input image, a per-pixel weight map that indicates the location of the object within the image. Network g is trained by minimizing the discrepancy between the output of the classifier f on the original image and its output given the same image weighted by the output of g. The scheme requires a regularization term that ensures that g does not provide a uniform weight, and an early stopping criterion in order to prevent g from over-segmenting the image. Our results indicate that the method outperforms existing localization methods by a sizable margin on the challenging fine-grained classification datasets, as well as a generic image recognition dataset. Additionally, the obtained weight map is also state-of-the-art in weakly supervised segmentation in fine-grained categorization datasets.

“…where Y are labels of target examples and Z = g(X) are features of them extracted by the pre-trained feature extractor g. Based on the theory in [20], Proposition 1 shows that TransRate provides an upper bound to the log-likelihood of the model h* ∘ g. Detailed proofs can be found in Appendix C.…”

Section: Computation-efficient Transferability Estimation (mentioning)

confidence: 99%

Frustratingly Easy Transferability Estimation

Huang, Wei, Rong et al. 2021

Preprint

Transferability estimation has been an essential tool in selecting a pre-trained model and the layers of it to transfer, so as to maximize the performance on a target task and prevent negative transfer. Existing estimation algorithms either require intensive training on target tasks or have difficulties in evaluating the transferability between layers. We propose a simple, efficient, and effective transferability measure named TransRate. With single pass through the target data, TransRate measures the transferability as the mutual information between the features of target examples extracted by a pre-trained model and labels of them. We overcome the challenge of efficient mutual information estimation by resorting to coding rate that serves as an effective alternative to entropy. TransRate is theoretically analyzed to be closely related to the performance after transfer learning. Despite its extraordinary simplicity in 10 lines of codes, TransRate performs remarkably well in extensive evaluations on 22 pre-trained models and 16 downstream tasks.
