2019
DOI: 10.3390/e21070627

Smooth Function Approximation by Deep Neural Networks with General Activation Functions

Abstract: There has been growing interest in the expressivity of deep neural networks. However, most of the existing work on this topic focuses only on specific activation functions such as the ReLU or sigmoid. In this paper, we investigate the approximation ability of deep neural networks with a broad class of activation functions. This class includes most of the frequently used activation functions. We derive the required depth, width and sparsity of a deep neural network to approximate any Hölder …
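
The abstract's central quantities (depth, width, sparsity |θ|_0, and a general activation function) can be made concrete with a small sketch. The PyTorch code below, with illustrative names and sizes that are not taken from the paper, builds a plain fully connected network with a pluggable activation and reports its sparsity.

```python
# Minimal sketch (assumed names/sizes): a deep fully connected network with a
# pluggable activation, plus a count of nonzero parameters |theta|_0.
import torch
import torch.nn as nn


def make_mlp(in_dim, width, depth, activation=nn.ReLU()):
    """depth = number of hidden layers; any nn.Module activation can be plugged in."""
    layers = [nn.Linear(in_dim, width), activation]
    for _ in range(depth - 1):
        layers += [nn.Linear(width, width), activation]
    layers.append(nn.Linear(width, 1))
    return nn.Sequential(*layers)


def sparsity(model):
    """|theta|_0: number of nonzero parameters in the network."""
    return sum(int((p != 0).sum()) for p in model.parameters())


if __name__ == "__main__":
    # "General" activations in the sense of a broad class, not the paper's exact list.
    for act in (nn.ReLU(), nn.Sigmoid(), nn.Tanh(), nn.ELU()):
        net = make_mlp(in_dim=3, width=32, depth=4, activation=act)
        print(type(act).__name__, "depth=4 width=32 |theta|_0 =", sparsity(net))
```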

Cited by 67 publications (52 citation statements)
References 30 publications
“…However, Theorem 4.1 in Ohn and Kim (2019), which is stated for the reader's convenience in Appendix B, provides support for our observation, presented in Section 5, that sparsifying the network (i.e., splitting) increases the approximation error. Hence, in what follows, we also consider the so-called soft constraints approach using a fully connected network, where the static no-arbitrage conditions (2) are favored by penalization rather than imposed to hold exactly, as in the previous hard constraints approach.…”
Section: Soft Constraints Approach (supporting)
confidence: 70%
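
The soft-constraints idea in the quoted passage, favoring the constraints by penalization instead of enforcing them exactly, can be sketched as a penalty term added to the data-fit loss. The residual used below (monotonicity of the output in its first input) and the weight lam are placeholder assumptions, not the cited paper's actual conditions (2) or implementation.

```python
# Hedged sketch: soft constraints as a penalty on constraint violations.
# `constraint_residuals` stands in for the static no-arbitrage conditions of the
# cited paper; its exact form here is an assumption for illustration only.
import torch


def constraint_residuals(model, x):
    """Placeholder: quantities that should be >= 0 under the constraints.
    Here: the output's derivative w.r.t. the first input (a monotonicity stand-in)."""
    x = x.clone().requires_grad_(True)
    y = model(x)
    grads = torch.autograd.grad(y.sum(), x, create_graph=True)[0]
    return grads[:, 0]


def soft_constrained_loss(model, x, target, lam=10.0):
    # model is assumed to map (batch, d) inputs to (batch, 1) outputs.
    fit = torch.mean((model(x).squeeze(-1) - target) ** 2)   # data-fit term
    violation = torch.relu(-constraint_residuals(model, x))  # positive part of violations
    penalty = torch.mean(violation ** 2)                     # quadratic penalty
    return fit + lam * penalty
```

A fully connected network such as the one sketched earlier could be trained by minimizing this loss with a standard optimizer; larger lam pushes the network toward satisfying the constraints without enforcing them architecturally.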
“…Let the function being approximated, p ∈ H^{α,R}([0,1]^i), be Hölder smooth with parameters α > 0 and R > 0, where H^{α,R}(Ω) := {p ∈ H^α(Ω) : ‖p‖_{H^α(Ω)} ≤ R}. Then Theorem 4.1 in Ohn and Kim (2019) states the existence of positive constants L_0, N_0, Σ_0, B_0, depending only on i, α, R and ς, such that for any ε > 0, the neural network […] Figure A1 shows the upper bound Σ on the network sparsity, |θ|_0 ≤ Σ, as a function of the error tolerance ε and the Hölder smoothness α of the function being approximated.…”
Section: Discussion (mentioning)
confidence: 99%
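
One way to make the sparsity constraint |θ|_0 ≤ Σ from the quoted theorem concrete is magnitude pruning to a budget Σ, as sketched below. The theorem itself is an existence statement and does not prescribe any pruning procedure, so this is only an illustration of the constraint, not of the paper's construction.

```python
# Hedged sketch: imposing a sparsity budget |theta|_0 <= Sigma by magnitude pruning.
import torch


def prune_to_budget(model, sigma):
    """Zero out all but the `sigma` largest-magnitude parameters (ties may keep a few extra)."""
    flat = torch.cat([p.detach().abs().flatten() for p in model.parameters()])
    if sigma >= flat.numel():
        return model
    threshold = torch.topk(flat, sigma).values.min()
    with torch.no_grad():
        for p in model.parameters():
            p.mul_((p.abs() >= threshold).to(p.dtype))
    return model
```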
“…Previous works have demonstrated that locally non-linear activation functions consistently outperform multilinear activation functions across different network architectures on visual recognition tasks. This is supported by the recent theoretical study of Ohn and Kim (2019), which shows that a locally non-linear region can promote better expressivity and non-linear approximation capability.…”
Section: Locally Non-linear (mentioning)
confidence: 93%
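
A minimal way to see why some local non-linearity matters for expressivity: with a purely linear activation, any stack of layers collapses to a single linear map, whereas a locally non-linear activation such as ReLU does not. The NumPy check below illustrates this general point only; it is not a reproduction of the cited analysis.

```python
# Hedged sketch: a depth-3 network with identity activation is exactly linear,
# so it collapses to one linear map; swapping in ReLU breaks that collapse.
import numpy as np

rng = np.random.default_rng(0)
Ws = [rng.standard_normal((4, 4)) for _ in range(3)]   # random layer weights, no biases


def forward(x, act):
    h = x
    for W in Ws:
        h = act(W @ h)
    return h


identity = lambda z: z
relu = lambda z: np.maximum(z, 0.0)

x1, x2 = rng.standard_normal(4), rng.standard_normal(4)

# Additivity f(x1 + x2) == f(x1) + f(x2) holds for the identity activation...
print(np.allclose(forward(x1 + x2, identity), forward(x1, identity) + forward(x2, identity)))  # True
# ...but generally fails once the activation is locally non-linear (ReLU).
print(np.allclose(forward(x1 + x2, relu), forward(x1, relu) + forward(x2, relu)))              # typically False
```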
“…Actually, the Rectified Linear Unit (ReLU) activation function is the most popular choice in practical use of neural networks [12]. For this reason, most of the recent results on universal approximation theory concern the ReLU network [5,13–20]. Cohen et al. [13] exhibited a deep convolutional neural network with the ReLU activation function that cannot be realized by a shallow network unless the number of nodes in its hidden layer exceeds an exponential bound.…”
Section: Introduction (mentioning)
confidence: 99%
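
The depth-separation results referenced here are often illustrated with the iterated ReLU "hat" function: depth L and O(L) ReLU units produce a sawtooth with 2^L linear pieces, while a single-hidden-layer ReLU network with n units can produce at most n + 1 pieces and so needs on the order of 2^L units to match it. The sketch below counts the pieces numerically; it is a standard illustration, not the convolutional construction of Cohen et al. [13].

```python
# Hedged sketch: iterated ReLU "hat"/sawtooth as a depth-efficiency example.
# Depth L with O(L) ReLU units yields 2**L monotone linear pieces; one hidden
# ReLU layer with n units can produce at most n + 1 pieces.
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def hat(x):
    # g(x) = 2*min(x, 1 - x) on [0, 1], expressed with a single ReLU
    return 2.0 * x - 4.0 * relu(x - 0.5)


def sawtooth(x, depth):
    for _ in range(depth):
        x = hat(x)
    return x


# Count monotone linear pieces numerically on a dyadic grid (breakpoints land on grid nodes).
x = np.linspace(0.0, 1.0, 2 ** 20 + 1)
for L in range(1, 7):
    y = sawtooth(x, L)
    signs = np.sign(np.diff(y))
    pieces = int(np.count_nonzero(signs[1:] != signs[:-1]) + 1)
    print(f"depth {L}: {pieces} linear pieces (expected {2 ** L})")
```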