2020
DOI: 10.1098/rspa.2020.0334
Locally adaptive activation functions with slope recovery for deep and physics-informed neural networks

Abstract: We propose two approaches to locally adaptive activation functions, namely layer-wise and neuron-wise locally adaptive activation functions, which improve the performance of deep and physics-informed neural networks. The local adaptation of the activation function is achieved by introducing a scalable parameter in each layer (layer-wise) or for every neuron (neuron-wise) separately, and then optimizing it using a variant of the stochastic gradient descent algorithm. In order to further increase the training speed, an…

Cited by 207 publications (92 citation statements)
References 15 publications
“…The last activation function is the adaptive swish function where a is an additional parameter and is optimized in the training process [14,15]. Overall, the DNN approximation of PDE solution can be written as…”
Section: Network Structure (mentioning)
confidence: 99%
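The adaptive swish mentioned in the excerpt above is straightforward to sketch: the activation is the standard swish applied to a rescaled pre-activation, with the slope parameter registered as a trainable parameter so the optimizer updates it alongside the network weights. The module below is a minimal illustration in PyTorch; the class name, the fixed scaling factor n, and the initial value of a are assumptions made for the example, not details of the cited implementations [14,15].

```python
import torch
import torch.nn as nn


class AdaptiveSwish(nn.Module):
    """Swish activation with a trainable slope parameter a.

    A minimal sketch: the effective activation is z * sigmoid(z) with
    z = n * a * x, where n is a fixed scaling factor and a is learned
    together with the weights (names and defaults are illustrative).
    """

    def __init__(self, n: float = 1.0, a_init: float = 1.0):
        super().__init__()
        self.n = n                                   # fixed, not trained
        self.a = nn.Parameter(torch.tensor(a_init))  # updated by the optimizer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.n * self.a * x
        return z * torch.sigmoid(z)                  # swish(z) = z * sigmoid(z)
```

Because a is an nn.Parameter, it already appears in model.parameters(), so no special optimizer setup is needed.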
“…Layer-wise introduction of the additional parameters a_k changes the slope of the activation function in each hidden layer, thereby increasing the training speed. Moreover, these activation slopes can also contribute to the loss function through the slope recovery term; see [21,22] for more details. Such locally adaptive activation functions enhance the learning capacity of the network, especially during the early training period.…”
Section: Mathematical Setup For Fully Connected Neural Network (mentioning)
confidence: 99%
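As a concrete illustration of the layer-wise variant and the slope recovery term described in this excerpt, the sketch below gives each hidden layer its own slope parameter a_k and adds to the loss the reciprocal of the mean of exp(a_k) over the hidden layers. The exact functional form of the slope recovery term in [21,22] may differ; treat this as an assumed, representative form rather than the authors' implementation, and the class and method names as hypothetical.

```python
import torch
import torch.nn as nn


class LayerwiseAdaptiveMLP(nn.Module):
    """Fully connected network with one trainable slope a_k per hidden layer."""

    def __init__(self, sizes, n: float = 1.0):
        super().__init__()
        self.n = n
        self.linears = nn.ModuleList(
            nn.Linear(sizes[i], sizes[i + 1]) for i in range(len(sizes) - 1)
        )
        # one slope parameter per hidden layer, initialised so that n * a_k = 1
        self.a = nn.ParameterList(
            nn.Parameter(torch.tensor(1.0 / n)) for _ in range(len(sizes) - 2)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for linear, a_k in zip(self.linears[:-1], self.a):
            x = torch.tanh(self.n * a_k * linear(x))   # slope-scaled activation
        return self.linears[-1](x)                     # linear output layer

    def slope_recovery(self) -> torch.Tensor:
        """Slope recovery term: 1 / mean_k exp(a_k); it decreases as the
        slopes grow, so adding it to the loss pushes the slopes upward."""
        return 1.0 / torch.mean(torch.stack([torch.exp(a_k) for a_k in self.a]))
```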
“…The gradient dynamics of the adaptive activation modifies the standard dynamics (fixed activation) by multiplying a conditioning matrix by the gradient and by adding the approximate second-order term. In this paper, we used the scaling factor n = 5 for all hidden layers and initialize n·a_k = 1, ∀k; see [22] for details.…”
Section: Mathematical Setup For Fully Connected Neural Network (mentioning)
confidence: 99%
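The initialization convention quoted here (scaling factor n = 5 with n·a_k = 1, i.e. a_k = 1/n) means the scaled activation starts out identical to the standard fixed activation, and the slopes only drift away from that point as training proceeds. Below is a hedged usage sketch that reuses the hypothetical LayerwiseAdaptiveMLP from the earlier example with a toy regression target; the layer sizes, learning rate, and target function are assumptions for illustration only.

```python
import math
import torch

# LayerwiseAdaptiveMLP is the illustrative class sketched above.
model = LayerwiseAdaptiveMLP(sizes=[1, 20, 20, 1], n=5.0)   # a_k starts at 1/5
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # weights and slopes

x = torch.linspace(0.0, 1.0, 100).unsqueeze(1)
y_true = torch.sin(2.0 * math.pi * x)                       # toy target

for step in range(1000):
    optimizer.zero_grad()
    data_loss = torch.mean((model(x) - y_true) ** 2)
    loss = data_loss + model.slope_recovery()                # add slope recovery term
    loss.backward()
    optimizer.step()
```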
“…Other examples include networks with an adaptive polynomial activation function [33], a slope-varying activation function [34], or a back-propagation modification resulting in AAF [34,35]. In parallel with our work, a neuron-wise and layer-wise adaptive activation function for physics-informed neural networks was presented in [36].…”
Section: Introduction (mentioning)
confidence: 99%