There has been a growing interest in the expressivity of deep neural networks. However, most of the existing work on this topic focuses only on specific activation functions such as the ReLU or sigmoid. In this paper, we investigate the approximation ability of deep neural networks with a broad class of activation functions that includes most of the frequently used ones. We derive the depth, width, and sparsity required for a deep neural network to approximate any Hölder smooth function up to a given approximation error for this large class of activation functions. Based on our approximation error analysis, we derive the minimax optimality of deep neural network estimators with general activation functions in both regression and classification problems.

There has been a growing interest in the expressivity of deep neural networks, i.e., their ability to approximate a rich class of functions efficiently. The well-known classical result on this topic is the universal approximation theorem, which states that every continuous function can be approximated arbitrarily well by a neural network [11,15,12,5,20]. These results, however, do not specify the numbers of layers and nodes a neural network needs to achieve a given approximation accuracy.

Recently, several results on how the numbers of layers and nodes of a deep neural network affect its expressivity have been reported. They provide upper bounds on the numbers of layers and nodes required for neural networks to uniformly approximate all functions of interest. Examples of such function classes include the space of rational functions of polynomials [30], the Hölder space [33,27,2,21], Besov and mixed Besov spaces [29], and even a class of discontinuous functions [25,16].

The nonlinear activation function is what distinguishes a neural network from a linear model: a neural network reduces to a linear function if the linear activation function is used. The choice of activation function therefore substantially influences performance and computational efficiency. Numerous activation functions have been suggested to improve neural network learning [3,6,4,26,18,32]; we refer to [13,26] for an overview of this topic.

There are also many recent theoretical studies of the approximation ability of deep neural networks. However, most of them focus on a specific activation function such as the ReLU [33,27,25,16,29], or on small classes of activation functions such as sigmoidal functions with additional monotonicity, continuity, and/or boundedness conditions [24,9,8,10,7] and m-admissible functions, which are sufficiently smooth and bounded [2]. For the definitions of sigmoidal and m-admissible functions, see [9] and [2], respectively. Thus a unified theoretical framework is still lacking.

In this paper, we investigate the approximation ability of deep neural networks with a quite general class of activation functions. We derive the numbers of layers and nodes of a deep neural network required to approximate any Hölder smooth function…
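For reference, the Hölder class appearing in these approximation results is typically defined as follows; this is the standard textbook definition, not text taken from the paper:

```latex
% Hölder ball of smoothness \alpha > 0 and radius R > 0 on [0,1]^d,
% the function class targeted by the approximation results above.
\[
\mathcal{H}^{\alpha, R}\big([0,1]^d\big)
  = \Big\{ f : [0,1]^d \to \mathbb{R} \;:\; \|f\|_{\mathcal{H}^{\alpha}} \le R \Big\},
\]
% where \lfloor\alpha\rfloor denotes the largest integer strictly smaller
% than \alpha, as is common in this literature:
\[
\|f\|_{\mathcal{H}^{\alpha}}
  = \max_{|\beta| \le \lfloor \alpha \rfloor} \big\| \partial^{\beta} f \big\|_{\infty}
  + \max_{|\beta| = \lfloor \alpha \rfloor}
    \sup_{x \neq y}
    \frac{\big| \partial^{\beta} f(x) - \partial^{\beta} f(y) \big|}
         {\| x - y \|_{\infty}^{\alpha - \lfloor \alpha \rfloor}}.
\]
```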
We derive fast convergence rates of a deep neural network (DNN) classifier with the rectified linear unit (ReLU) activation function learned using the hinge loss. We consider three cases for the true model: (1) a smooth decision boundary, (2) a smooth conditional class probability, and (3) the margin condition (i.e., the probability of inputs near the decision boundary is small). We show that the DNN classifier learned using the hinge loss achieves fast convergence rates in all three cases provided that the architecture (i.e., the number of layers, the number of nodes, and the sparsity) is carefully selected. An important implication is that DNN architectures are very flexible and can be used in various cases without much modification. In addition, we consider a DNN classifier learned by minimizing the cross-entropy and show that it achieves a fast convergence rate under the condition that the conditional class probabilities of most data are sufficiently close to either one or zero. This assumption is not unusual for image recognition, since human beings are extremely good at recognizing most images. To confirm our theoretical explanation, we present the results of a small numerical study comparing the hinge loss and the cross-entropy.
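As a concrete illustration of the two surrogate losses compared above (a minimal sketch written for this summary, not the authors' code or their numerical study), assuming labels y in {-1, +1} and a real-valued network output f(x):

```python
# Minimal illustration: the hinge loss and the logistic (cross-entropy)
# loss as functions of the margin y * f(x), for labels y in {-1, +1}.
import numpy as np

def hinge_loss(y, score):
    """Hinge loss: max(0, 1 - y * f(x))."""
    return np.maximum(0.0, 1.0 - y * score)

def cross_entropy_loss(y, score):
    """Logistic (cross-entropy) loss: log(1 + exp(-y * f(x)))."""
    return np.log1p(np.exp(-y * score))

if __name__ == "__main__":
    margins = np.linspace(-2.0, 2.0, 9)   # values of y * f(x)
    y = np.ones_like(margins)             # fix y = +1 and vary the score
    print("margin   hinge   cross-entropy")
    for m, h, c in zip(margins, hinge_loss(y, margins),
                       cross_entropy_loss(y, margins)):
        print(f"{m:6.2f}  {h:6.2f}  {c:13.2f}")
```

The hinge loss is exactly zero once the margin y·f(x) reaches one, whereas the cross-entropy remains positive for every finite score.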
Recent theoretical studies have proved that deep neural network (DNN) estimators obtained by minimizing empirical risk under a certain sparsity constraint can attain optimal convergence rates for regression and classification problems. However, the sparsity constraint requires knowing certain properties of the true model that are not available in practice. Moreover, computation is difficult due to the discrete nature of the sparsity constraint. In this letter, we propose a novel penalized estimation method for sparse DNNs that resolves these problems with the sparsity constraint. We establish an oracle inequality for the excess risk of the proposed sparse-penalized DNN estimator and derive convergence rates for several learning tasks. In particular, we prove that the sparse-penalized estimator can adaptively attain minimax convergence rates for various nonparametric regression problems. For computation, we develop an efficient gradient-based optimization algorithm that guarantees monotonic reduction of the objective function.
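Schematically, the proposed estimator minimizes a penalized empirical risk of the following form (a generic sketch based on the description above; the specific sparsity-inducing penalty and tuning sequences used in the paper are not reproduced here):

```latex
% Schematic form of a sparse-penalized DNN estimator: empirical risk plus a
% sparsity-inducing penalty J on the network parameters \theta, with tuning
% parameter \lambda_n.
\[
\hat{\theta}_n \in \operatorname*{arg\,min}_{\theta \in \Theta}
  \left\{ \frac{1}{n} \sum_{i=1}^{n} \ell\big( y_i, f_{\theta}(x_i) \big)
        + \lambda_n \, J(\theta) \right\},
\]
% where f_\theta is the network with parameters \theta and \ell is the loss
% for the learning task at hand (e.g., squared error for regression).
```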