Cross-entropy is the de facto loss function in modern classification tasks that involve distinguishing hundreds or even thousands of classes. To design better loss functions for new machine learning tasks, it is critical to understand what makes a loss function suitable for a problem. For instance, what makes cross-entropy better than alternatives such as quadratic loss? In this work, we discuss the role of loss functions in learning tasks with a large number of classes. We hypothesize that different loss functions can vary widely in how difficult they are to optimize, and that ease of training is a key catalyst for better test-time performance. Our intuition draws from the success of over-parameterization in deep learning: the more parameters a model has, the faster it trains and the higher the test accuracy it achieves. We argue that cross-entropy loss effectively results in a much more over-parameterized problem than quadratic loss, thanks to its emphasis on the correct class (the one associated with the label). Such over-parameterization drastically simplifies the training process and ends up boosting test performance. For separable mixture models, we provide a separation result: cross-entropy loss can always achieve small training loss, whereas quadratic loss has diminishing benefit as the number of classes and the class correlations increase. Numerical experiments with CIFAR-100 corroborate our results. We show that accuracy with quadratic loss degrades disproportionately as the number of classes grows; however, encouraging quadratic loss to focus on the correct class drastically improves performance.
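To make the "emphasis on the correct class" concrete, the following minimal sketch contrasts the two losses on a single example. It is an illustration under stated assumptions, not the setup used in this work: the function names are hypothetical, cross-entropy is taken over a softmax of the logits, and quadratic loss is taken between the raw outputs and a one-hot label vector.

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    # Depends only on the probability assigned to the correct class.
    return -math.log(softmax(logits)[label])

def quadratic_terms(outputs, label):
    # Assumption: squared error against a one-hot target, split into the
    # correct-class term and the sum of the K-1 incorrect-class terms.
    correct = (outputs[label] - 1.0) ** 2
    incorrect = sum(o ** 2 for k, o in enumerate(outputs) if k != label)
    return correct, incorrect

def quadratic_loss(outputs, label):
    correct, incorrect = quadratic_terms(outputs, label)
    return correct + incorrect

# Hypothetical example: K = 100 classes, output 0.9 on the correct class
# and 0.1 on every other class.
K = 100
outputs = [0.9] + [0.1] * (K - 1)
correct, incorrect = quadratic_terms(outputs, label=0)
print("correct-class term:", correct)
print("incorrect-class terms:", incorrect)
print("cross-entropy on the same values as logits:", cross_entropy(outputs, 0))
```

In this toy setting the quadratic loss is dominated by the K-1 incorrect-class terms, and their share grows with K, whereas cross-entropy is a function of the correct-class probability alone.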