1999
DOI: 10.1109/72.737501

Structurally adaptive modular networks for nonstationary environments

Abstract: This paper introduces a neural network capable of dynamically adapting its architecture to realize time-variant nonlinear input-output maps. This network has its roots in the mixture of experts framework but uses a localized model for the gating network. Modules or experts are grown or pruned depending on the complexity of the modeling problem. The structural adaptation procedure addresses the model selection problem and typically leads to much better parameter estimation. Batch mode learning equations are ext…

Cited by 42 publications (12 citation statements)
References 24 publications

“…The divide and conquer approach includes many other specific methods such as local linear regression, CART/MARS, adaptive subspace models, etc. (Johansen and Foss 1992; Ramamurti and Ghosh 1999; Holmstrom et al 1997).…”
Section: Divide and Conquer
confidence: 99%
“…The division of the input space is done not via soft hyperplanes as before, but via soft hyper-ellipsoids in such a manner as to generate more localized regions of specialization to be assigned to the (linear or nonlinear) experts [24]. Each of the outputs of the gating module is now calculated as…”
Section: The Mixture of Experts Framework and Its Variants
confidence: 99%
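
The equation the quote trails off into is not reproduced here. For reference, the localized gating used in this family of models (soft hyper-ellipsoidal regions rather than soft hyperplanes) is usually written as a normalized Gaussian kernel response; the symbols below (mixing priors α_j, kernel means μ_j, and covariances Σ_j) are the conventional ones, not taken from the truncated quote:

    g_j(x) = \frac{\alpha_j \, \mathcal{N}(x; \mu_j, \Sigma_j)}{\sum_{k=1}^{m} \alpha_k \, \mathcal{N}(x; \mu_k, \Sigma_k)}, \qquad \alpha_j \ge 0, \quad \sum_{j=1}^{m} \alpha_j = 1.

Each g_j(x) is largest inside expert j's ellipsoidal region (determined by μ_j and Σ_j), and the gating weights still sum to one, so the combination remains a soft partition of the input space.
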
“…The canonical ME model, envisaged by Jacobs et al [14] and later developed upon by Jordan and Jacobs [15], employs a single-layer perceptron with soft-max activation function as gating module and expert modules with linear activation functions. Besides such canonical model, two other ME variants have been proposed and investigated more recently in the literature, namely the Localized Mixtures of Experts (LMEs), formulated by Xu et al [38] and later more scrutinized by Ramamurti and Ghosh [24], and the Gated Mixtures of Experts (GMEs), devised by Weigend et al [37] and later extended by Srivastava et al [31].…”
Section: Introduction
confidence: 99%
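
As a concrete point of reference for the canonical model described above, here is a minimal NumPy sketch of a mixture-of-experts forward pass with a single-layer softmax gate and linear experts; the array shapes and function names are illustrative assumptions, not code from the cited papers.

    import numpy as np

    def softmax(z):
        z = z - z.max(axis=-1, keepdims=True)   # subtract row max for numerical stability
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    def canonical_me(x, gate_W, expert_Ws):
        """Canonical mixture of experts: a softmax gate blends linear expert outputs.

        x: (n, d_in) inputs; gate_W: (d_in, m); expert_Ws: list of m (d_in, d_out) matrices.
        """
        g = softmax(x @ gate_W)                                  # (n, m) gating weights, sum to 1
        experts = np.stack([x @ W for W in expert_Ws], axis=1)   # (n, m, d_out) expert outputs
        return np.einsum('nm,nmd->nd', g, experts)               # per-sample convex combination

The LME variant keeps this combination but replaces the softmax of linear scores with the normalized Gaussian gate sketched earlier, so each expert specializes in a localized region of the input space.
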
“…and L is a large number [50,42,38]. Note that this exponential decay factor of (1 − 1/L) ensures that c t+1 converges from below to L. Thus, after the "cold start" period is over, the history maintained in the computation has an effective length L. The choice of L depends on the degree of non-stationarity, and a fundamental tradeoff between resolution and memory depth is encountered [40].…”
Section: Case Study: Balanced Clustering of Directional Data
confidence: 99%
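
A quick numerical check of the decay behaviour described in the quote. The update below, c_{t+1} = (1 - 1/L) * c_t + 1, is an assumption chosen to be consistent with the quoted description (decay factor 1 - 1/L, a cold start at zero, convergence to L from below); the cited papers may write the bookkeeping differently.

    L = 100.0
    c = 0.0                              # "cold start": no history accumulated yet
    for t in range(1000):
        c = (1.0 - 1.0 / L) * c + 1.0    # assumed update with decay factor (1 - 1/L)
    print(c)                             # ~99.996, approaching the effective memory length L from below

A larger L gives a longer effective history (finer resolution of slowly varying structure) but slower reaction to change, which is the resolution versus memory-depth tradeoff the quote refers to.
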