Recurrent neural networks (RNNs) are known to be difficult to train due to the gradient vanishing and exploding problems, which makes it hard to learn long-term patterns and to construct deep networks. To address these problems, this paper proposes a new type of RNN with the recurrent connection formulated as a Hadamard product, referred to as the independently recurrent neural network (IndRNN), where neurons in the same layer are independent of each other and connected across layers. The gradient vanishing and exploding problems are solved in IndRNN, and thus long-term dependencies can be learned. Moreover, an IndRNN can work with non-saturated activation functions such as ReLU (rectified linear unit) and still be trained robustly. Different deeper IndRNN architectures, including the basic stacked IndRNN, the residual IndRNN and the densely connected IndRNN, have been investigated, all of which can be much deeper than existing RNNs. Furthermore, IndRNN reduces the computation at each time step and can be over 10 times faster than the commonly used long short-term memory (LSTM). Experimental results have shown that the proposed IndRNN is able to process very long sequences and construct very deep networks. Better performance has been achieved on various tasks with IndRNNs compared with the traditional RNN and LSTM.

Gated variants such as the long short-term memory (LSTM) and the gated recurrent unit (GRU) [15] have been proposed to address the gradient problems. However, the use of the hyperbolic tangent and the sigmoid functions as the activation function in these variants results in gradient decay over layers. Consequently, constructing and training a deep LSTM- or GRU-based RNN is practically difficult. On the other hand, the existing RNN models share the same recurrent component σ(Wx_t + Uh_{t−1} + b), where the recurrent connection connects all the neurons through time. This makes it hard to interpret and understand the role of each individual neuron (e.g., what patterns each neuron responds to) without considering the others. Moreover, with these recurrent connections, a matrix product is performed at each time step and the computation cannot be easily parallelized, leading to a very time-consuming process when dealing with long sequences.

In this paper, we propose a new type of RNN, referred to as the independently recurrent neural network (IndRNN). In the proposed IndRNN, the recurrent inputs are processed with the Hadamard product as h_t = σ(Wx_t + u ⊙ h_{t−1} + b), where u is a vector of per-neuron recurrent weights (a minimal sketch of this recurrence is given after the list below). This provides a number of advantages over the traditional RNNs, including:

• Able to process longer sequences: the gradient vanishing and exploding problems are solved by regulating the recurrent weights, and long-term memory can be kept in order to process long sequences. Experiments have demonstrated that an IndRNN can well process sequences of over 5000 steps.
• Able to construct deeper networks: multiple layers of IndRNNs can be efficiently stacked, especially with skip connections and dense connections, to increase the depth of the network. Examples of a 21-layer residual IndRNN and a deep densely connected IndRNN are demonstrated in the experiments.
...
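To make the element-wise recurrence concrete, below is a minimal NumPy sketch of a single IndRNN layer under the formulation above. The function name indrnn_layer, the weight shapes and the initialization are illustrative assumptions rather than the authors' reference implementation; the essential point is that the recurrent term is the element-wise product u * h instead of the matrix product U h, and that regulating the magnitude of u keeps each neuron's recurrent gradient under control.

```python
import numpy as np

def indrnn_layer(x_seq, W, u, b, activation=np.tanh):
    """Run a single IndRNN layer over a sequence of length T.

    x_seq : (T, input_size) array, the input sequence
    W     : (hidden_size, input_size) input-to-hidden weights
    u     : (hidden_size,) per-neuron recurrent weights (a vector, not a matrix)
    b     : (hidden_size,) bias
    """
    T = x_seq.shape[0]
    h = np.zeros(u.shape[0])
    outputs = np.empty((T, u.shape[0]))
    for t in range(T):
        # Recurrent term is an element-wise (Hadamard) product u * h, so each
        # neuron only depends on its own previous state, unlike U @ h in a
        # traditional RNN where all neurons are entangled through time.
        h = activation(W @ x_seq[t] + u * h + b)
        outputs[t] = h
    return outputs

# Illustrative usage with ReLU, which the paper states IndRNN can use robustly.
rng = np.random.default_rng(0)
T, input_size, hidden_size = 100, 8, 16
x = rng.standard_normal((T, input_size))
W = 0.1 * rng.standard_normal((hidden_size, input_size))
# Keeping |u| <= 1 is one way to regulate the recurrent weights so the
# per-neuron gradient through time stays bounded (an assumption of this sketch).
u = rng.uniform(-1.0, 1.0, hidden_size)
b = np.zeros(hidden_size)
h_seq = indrnn_layer(x, W, u, b, activation=lambda z: np.maximum(z, 0.0))
print(h_seq.shape)  # (100, 16)
```

Because the recurrence is element-wise, each hidden unit evolves independently of the others within a layer; deeper stacked, residual or densely connected variants would simply feed the output sequence of one such layer into the next.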