“…To deal with long time lags between relevant events, several sequence processing methods were proposed, including Focused BP based on decay factors for activations of units in RNNs (Mozer, 1989(Mozer, , 1992, Time-Delay Neural Networks (TDNNs) (Lang et al, 1990) and their adaptive extension (Bodenhausen and Waibel, 1991), Nonlinear AutoRegressive with eXogenous inputs (NARX) RNNs (Lin et al, 1996), certain hierarchical RNNs (Hihi and Bengio, 1996) (compare Sec. 5.10, 1991), RL economies in RNNs with WTA units and local learning rules (Schmidhuber, 1989b), and other methods (e.g., Ring, 1993Ring, , 1994Plate, 1993;de Vries and Principe, 1991;Sun et al, 1993a;Bengio et al, 1994).…”