“…LSTM is a widely applicable kind of RNN which contains feedback connections for both single data points and entire data sequences in deep learning [ 50 ]. The optimization task regarding accurate future image prediction has been a highlighted problem in artificial intelligence in recent several years [ 51 , 52 , 53 , 54 , 55 , 56 , 57 , 58 , 59 , 60 , 61 , 62 , 63 , 64 , 65 , 66 , 67 ]. Kalchbrenner et al have developed a video pixel network to predict the joint distribution of future image in pixel videos [ 60 ].…”