2019
DOI: 10.1016/j.neunet.2018.11.002
Kafnets: Kernel-based non-parametric activation functions for neural networks

Abstract: Neural networks are generally built by interleaving (adaptable) linear layers with (fixed) nonlinear activation functions. To increase their flexibility, several authors have proposed methods for adapting the activation functions themselves, endowing them with varying degrees of flexibility. None of these approaches, however, have gained wide acceptance in practice, and research in this topic remains open. In this paper, we introduce a novel family of flexible activation functions that are based on an inexpens…

Cited by 62 publications (77 citation statements)
References 37 publications
“…Also linear networks (i.e., when φ is the identity) are commonly studied, both for the proven impact on applications and the very interesting results that can be derived in closed-form 24,25,27,31,32 . Other activation functions have been proposed in the neural networks literature, including those that normalize the activation values on a closed hyper-surface 33 and those based on non-parametric estimation via composition of kernel functions 34 . The output y ∈ R N out is generated by the matrix W out , whose weights are learned: generally by ridge regression or lasso 3, 35 but also with online training mechanisms 36 .…”
Section: Echo State Network
mentioning
confidence: 99%
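The statement above notes that the ESN readout weights W_out are usually learned by ridge regression or lasso. Below is a minimal sketch of the ridge-regression case; the function name, array shapes, regularization value, and the random data in the usage example are illustrative assumptions, not details taken from the cited papers.

import numpy as np

def train_readout_ridge(states, targets, reg=1e-6):
    # Sketch: learn the ESN readout W_out in closed form by ridge regression.
    #   states:  (T, N_res)  reservoir states collected over T time steps
    #   targets: (T, N_out)  desired outputs
    # Returns W_out of shape (N_out, N_res) so that y_t = W_out @ x_t.
    S, Y = states, targets
    # Ridge solution: W_out^T = (S^T S + reg * I)^-1 S^T Y
    A = S.T @ S + reg * np.eye(S.shape[1])
    W_out = np.linalg.solve(A, S.T @ Y).T
    return W_out

# Hypothetical usage with random data, just to show the shapes involved:
rng = np.random.default_rng(0)
states = rng.standard_normal((500, 100))   # T=500 steps, N_res=100 reservoir units
targets = rng.standard_normal((500, 3))    # N_out=3 output dimensions
W_out = train_readout_ridge(states, targets)
print(W_out.shape)                         # (3, 100)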
“…As stated in the introduction, as a further contribution we also compare the standard NNs to the Kafnets [26], a class of neural networks using KAFs as activation functions. Briefly, in this architecture each activation function is allowed to change shape during training through a small number of adaptable coefficients, and the aim of this set of comparisons is to explore whether these additional degrees of freedom are beneficial in terms of CL.…”
Section: Architectures and Methods
mentioning
confidence: 99%
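For context, the KAFs of [26] model each neuron's activation function as a kernel expansion over a fixed dictionary, with learnable per-neuron mixing coefficients that let the nonlinearity change shape during training. The PyTorch module below is a minimal sketch of that idea under common assumptions (Gaussian kernel, uniform dictionary grid, bandwidth derived from the grid spacing); it is not the reference implementation, and the class and parameter names are illustrative.

import torch
import torch.nn as nn

class KAF(nn.Module):
    # Kernel activation function: f(s) = sum_i alpha_i * exp(-gamma * (s - d_i)^2),
    # with a fixed dictionary d_i and learnable coefficients alpha_i per neuron.
    def __init__(self, num_neurons, dict_size=20, bound=3.0):
        super().__init__()
        # Fixed dictionary: uniform grid in [-bound, bound], shared by all neurons.
        d = torch.linspace(-bound, bound, dict_size)
        self.register_buffer("dict_points", d.view(1, 1, -1))
        # Kernel bandwidth from the grid spacing (one common heuristic).
        self.gamma = (1.0 / (2.0 * (d[1] - d[0]) ** 2)).item()
        # Learnable mixing coefficients: one set per neuron.
        self.alpha = nn.Parameter(0.1 * torch.randn(1, num_neurons, dict_size))

    def forward(self, s):
        # s: (batch, num_neurons) pre-activations.
        diff = s.unsqueeze(-1) - self.dict_points    # (batch, neurons, dict)
        k = torch.exp(-self.gamma * diff ** 2)       # Gaussian kernel values
        return (self.alpha * k).sum(dim=-1)          # per-neuron mixture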
“…Each Kafnet used has the same architecture as its NN counterpart, but the size of each layer/kernel is reduced by 30% in order to have roughly the same number of adaptable parameters for both architectures. The hyper-parameters for the KAFs use the same values as in [26].…”
Section: Architectures and Methods
mentioning
confidence: 99%
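A rough way to see why shrinking the KAF layers helps match parameter counts: each KAF neuron carries its own dictionary coefficients on top of the usual weights and biases, so a smaller layer can carry roughly the same total number of adaptable parameters. The arithmetic below is purely illustrative; the layer sizes and dictionary size are assumptions, not values from the paper, and over several stacked layers the exact balance depends on depth and dictionary size.

def dense_params(n_in, n_out):
    # Standard fully connected layer: weight matrix plus biases.
    return n_in * n_out + n_out

def kaf_params(n_in, n_out, dict_size=20):
    # Same linear part, plus dict_size adaptable KAF coefficients per neuron.
    return dense_params(n_in, n_out) + n_out * dict_size

# Illustrative comparison: a 100-unit standard layer vs. a ~30% smaller KAF layer.
print(dense_params(100, 100))             # 10100
print(kaf_params(100, 70, dict_size=20))  # 8470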
“…None of the known approaches possesses all these properties simultaneously. For example, property p1 is not satisfied by any of the approaches discussed in Sections 2.1 and 2.3, while the approaches discussed in Section 2.2 fail to satisfy either property p1, as in [Agostinelli et al., 2014], or property p3, as in [Scardapane et al., 2018].…”
Section: Summarizing
mentioning
confidence: 99%