Recent work on video super-resolution (VSR) has relied heavily on neural networks and achieved remarkable performance. However, current neural network-based VSR methods have two limitations: (1) sophisticated models usually incur high computational overhead and require large caches to store intermediate feature maps and parameters; (2) bidirectional RNN-based methods cannot perform VSR until the complete video sequence is available, because hidden states must be propagated backward from later frames. Together, these shortcomings make it hard for existing VSR models to run in real time. To reduce the computational complexity of the network, the authors propose LERN, a lightweight, hardware-efficient recurrent neural network model that combines depthwise separable convolution with pixel unshuffle. An SRA block is also introduced to improve the model's VSR performance. To exploit inter-frame information without requiring backward propagation of hidden states, the authors introduce a new recurrent architecture optimized for video sequences. In addition, the authors design a quantization algorithm that finds the optimal fixed-point representation for each layer. Experimental results demonstrate that LERN achieves excellent performance with low overhead.
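As a rough illustration of why the two building blocks named above reduce cost, the following minimal Python sketch shows pixel unshuffle on a single-channel frame and compares weight counts for a standard versus a depthwise separable convolution. It is not the authors' implementation: the function names (`pixel_unshuffle`, `standard_conv_params`, `depthwise_separable_params`) are hypothetical, the channel ordering is assumed to follow the common convention used by deep learning frameworks, bias terms are ignored, and the layer sizes in the usage note are illustrative rather than LERN's actual configuration.

```python
def pixel_unshuffle(frame, r):
    """Rearrange an H x W frame into r*r channels of size (H/r) x (W/r).

    Channel index i*r + j collects the pixels at row offset i and column
    offset j inside every r x r block (assumed ordering convention).
    """
    h, w = len(frame), len(frame[0])
    if h % r or w % r:
        raise ValueError("frame height and width must be divisible by r")
    return [
        [[frame[y * r + i][x * r + j] for x in range(w // r)]
         for y in range(h // r)]
        for i in range(r)
        for j in range(r)
    ]


def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    frame = [[0, 1, 2, 3],
             [4, 5, 6, 7],
             [8, 9, 10, 11],
             [12, 13, 14, 15]]
    # Spatial size shrinks from 4 x 4 to 2 x 2; channel count grows from 1 to 4.
    print(pixel_unshuffle(frame, 2))
    print(standard_conv_params(64, 64, 3), depthwise_separable_params(64, 64, 3))
```

For an illustrative 64-to-64-channel layer with a 3 x 3 kernel, the standard convolution needs 36,864 weights while the depthwise separable version needs 4,672, roughly a 7.9-fold reduction. Pixel unshuffle loses no information; it trades spatial resolution for channels, so subsequent convolutions operate on smaller feature maps.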