Recently, a parameterized class of loss functions called α-loss, α ∈ [1, ∞], has been introduced for classification. This family, which includes the log-loss and the 0-1 loss as special cases, comes with compelling properties, including an equivalent margin-based form that is classification-calibrated for all α. We introduce a generalization of this family to the entire range α ∈ (0, ∞] and establish how the parameter α enables the practitioner to choose among a host of operating conditions that are important in modern machine learning tasks. We prove that smaller α values are more conducive to faster optimization; in fact, α-loss is convex for α ≤ 1 and quasi-convex for α > 1. Moreover, we establish bounds that quantify the degradation of the local quasi-convexity of the optimization landscape as α increases, and we show that this translates directly into a computational slowdown. On the other hand, our theoretical results also suggest that larger α values lead to better generalization performance. This is a consequence of the ability of α-loss to limit the effect of less likely data as α increases from 1, thereby facilitating robustness to outliers and noise in the training data. We provide strong evidence supporting this assertion with several experiments on benchmark datasets that establish the efficacy of α-loss for α > 1 in robustness to errors in the training data. Of equal interest is the fact that, for α < 1, our experiments show that the decreased robustness seems to counteract class imbalances in the training data.
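To make the family concrete, the following is a minimal sketch of the margin-based α-loss, assuming the standard parameterization ℓ_α(z) = (α/(α−1))·(1 − σ(z)^((α−1)/α)) with σ the sigmoid of the margin z, which recovers the log-loss as α → 1 and a sigmoid-type (soft 0-1) loss as α → ∞; the function name and exact form are illustrative, not taken from the paper's code.

```python
import numpy as np

def alpha_loss(z, alpha):
    """Margin-based alpha-loss at margin z = y * f(x), with labels y in {-1, +1}.

    Assumed form: (alpha / (alpha - 1)) * (1 - sigmoid(z)^((alpha - 1) / alpha)).
    """
    p = 1.0 / (1.0 + np.exp(-z))            # sigmoid of the margin
    if np.isinf(alpha):
        return 1.0 - p                       # soft 0-1 loss limit (alpha -> infinity)
    if np.isclose(alpha, 1.0):
        return np.log1p(np.exp(-z))          # log-loss limit (alpha -> 1)
    return (alpha / (alpha - 1.0)) * (1.0 - p ** ((alpha - 1.0) / alpha))

# Illustration: alpha < 1 penalizes low-confidence (small-margin) points more
# aggressively, while alpha > 1 saturates and limits the effect of outliers.
z = np.linspace(-4.0, 4.0, 9)
for a in (0.5, 1.0, 2.0, np.inf):
    print(f"alpha={a}:", np.round(alpha_loss(z, a), 3))
```

As a quick sanity check under these assumptions, α = 0.5 yields 1/σ(z) − 1, which grows rapidly for negative margins, while α = ∞ yields 1 − σ(z), which is bounded by 1 even for badly misclassified points.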