The canonical ME model, envisaged by Jacobs et al. [14] and later developed by Jordan and Jacobs [15], employs a single-layer perceptron with a softmax activation function as the gating module, together with expert modules that have linear activation functions. Besides this canonical model, two other ME variants have been proposed and investigated more recently in the literature, namely the Localized Mixtures of Experts (LMEs), formulated by Xu et al. [38] and later scrutinized further by Ramamurti and Ghosh [24], and the Gated Mixtures of Experts (GMEs), devised by Weigend et al. [37] and later extended by Srivastava et al. [31].
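For concreteness, the canonical model's output admits the following standard formulation (the symbols here, namely the number of experts $K$, the gating weight vectors $\mathbf{v}_i$, and the expert weight matrices $\mathbf{W}_i$, are our notation rather than that of [14, 15]):
\[
y(\mathbf{x}) \;=\; \sum_{i=1}^{K} g_i(\mathbf{x})\, \mathbf{W}_i \mathbf{x},
\qquad
g_i(\mathbf{x}) \;=\; \frac{\exp\!\left(\mathbf{v}_i^{\top}\mathbf{x}\right)}{\sum_{j=1}^{K} \exp\!\left(\mathbf{v}_j^{\top}\mathbf{x}\right)},
\]
where $g_i(\mathbf{x})$ is the softmax output of the single-layer gating network and $\mathbf{W}_i \mathbf{x}$ is the linear output of the $i$-th expert module.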