“…We study the effect of initialization hyperparameters on signal propagation for a very broad class of recurrent architectures, which includes as special cases many state-of-the-art RNN cells, including the GRU (Cho et al., 2014), the LSTM (Hochreiter and Schmidhuber, 1997), and the peephole LSTM (Gers et al., 2002). The analysis is based on the mean field theory of signal propagation developed in a line of prior work (Schoenholz et al., 2016; Xiao et al., 2018; Chen et al., 2018; Yang et al., 2019), as well as the concept of dynamical isometry (Saxe et al., 2013; Pennington et al., 2017), which is necessary for stable gradient backpropagation and was shown to be crucial for training simpler RNN architectures (Chen et al., 2018). We perform a number of experiments to corroborate the results of these calculations and use them to motivate initialization schemes that outperform standard initialization approaches on a number of long-sequence tasks.…”
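To make the notion of dynamical isometry concrete, the following is a minimal NumPy sketch (not the paper's actual calculation): it forms the end-to-end state-to-state Jacobian of a simple tanh RNN at initialization and inspects its singular values. Dynamical isometry holds when those singular values concentrate near 1, so gradients backpropagated through time neither explode nor vanish. The width `N`, horizon `T`, and the two initialization choices compared here (scaled Gaussian vs. orthogonal recurrent weights) are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Illustrative sketch: singular values of the input-output Jacobian of a
# vanilla tanh RNN at initialization. N (width) and T (depth in time) are
# arbitrary choices for demonstration.
N, T = 128, 50
rng = np.random.default_rng(0)

def jacobian_singular_values(W, rng):
    """Singular values of prod_t D_t @ W, the Jacobian of h_T w.r.t. h_0.

    D_t = diag(tanh'(pre-activation)) is the per-step pointwise Jacobian
    of the tanh nonlinearity; inputs are taken to be zero for simplicity.
    """
    h = 0.1 * rng.standard_normal(N)   # small random initial state
    J = np.eye(N)
    for _ in range(T):
        pre = W @ h
        h = np.tanh(pre)
        D = np.diag(1.0 - np.tanh(pre) ** 2)  # derivative of tanh
        J = D @ W @ J                          # chain rule through one step
    return np.linalg.svd(J, compute_uv=False)

# Two common recurrent initializations to compare:
W_gauss = rng.standard_normal((N, N)) / np.sqrt(N)      # variance-scaled Gaussian
W_orth, _ = np.linalg.qr(rng.standard_normal((N, N)))   # random orthogonal

for name, W in [("gaussian", W_gauss), ("orthogonal", W_orth)]:
    s = jacobian_singular_values(W, np.random.default_rng(1))
    print(f"{name:10s} max sv = {s.max():.3e}, mean sv = {s.mean():.3e}")
```

A flat singular-value spectrum (all values near 1) is the signature of dynamical isometry; a spectrum whose maximum is many orders of magnitude above its mean signals unstable gradient propagation, which is what initialization schemes of the kind motivated in the text aim to avoid.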