Computational color constancy (CCC) consists of estimating the color of the ambient scene illumination and using it to remove unwanted chromatic distortions. Much research has focused on illuminant estimation for CCC on single images, with few attempts at leveraging the temporal information intrinsic to sequences of correlated images (e.g., the frames in a video), a task known as temporal color constancy (TCC). The state of the art for TCC is the Temporal Color Constancy Network (TCCNet), a deep-learning architecture that uses a convolutional long short-term memory network (ConvLSTM) to aggregate the encodings produced by convolutional neural network submodules for each frame in a sequence. We extend this architecture with different models obtained by (i) substituting the TCCNet submodules with Cascading Convolutional Color Constancy (C4), the state-of-the-art method for CCC on single images, and (ii) adding a cascading strategy that iteratively refines the illuminant estimate. We tested our models on the recently released TCC benchmark and achieved results that surpass the state of the art. By analyzing how the number of frames used for illuminant estimation affects performance, we show that inference time can be reduced by training the models on a few selected frames per sequence while retaining comparable accuracy.
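
The following is a minimal sketch, not the authors' implementation, of the general TCC pipeline the abstract describes: a per-frame CNN encoder whose feature maps are aggregated over time by a ConvLSTM cell, followed by a head that regresses a normalized RGB illuminant estimate. All module names and sizes (`FrameEncoderStub`, `ConvLSTMCell`, `hidden_ch`, etc.) are illustrative assumptions and do not reflect the actual TCCNet or C4 configurations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvLSTMCell(nn.Module):
    """Single convolutional LSTM cell: all gates are computed with one convolution."""

    def __init__(self, in_ch: int, hidden_ch: int, kernel_size: int = 3):
        super().__init__()
        self.hidden_ch = hidden_ch
        self.gates = nn.Conv2d(in_ch + hidden_ch, 4 * hidden_ch,
                               kernel_size, padding=kernel_size // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c


class TemporalIlluminantEstimator(nn.Module):
    """Aggregates per-frame CNN encodings with a ConvLSTM and predicts one illuminant."""

    def __init__(self, hidden_ch: int = 64):
        super().__init__()
        # Lightweight stand-in for the per-frame backbone (the real models use
        # much larger pretrained submodules).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.cell = ConvLSTMCell(64, hidden_ch)
        self.head = nn.Conv2d(hidden_ch, 3, kernel_size=1)

    def forward(self, frames):  # frames: (batch, time, 3, H, W)
        b, t = frames.shape[:2]
        state = None
        for k in range(t):
            feat = self.encoder(frames[:, k])
            if state is None:
                zeros = torch.zeros(b, self.cell.hidden_ch, *feat.shape[-2:],
                                    device=feat.device)
                state = (zeros, zeros.clone())
            state = self.cell(feat, state)
        # Pool the final hidden state over space and L2-normalize the RGB estimate.
        est = self.head(state[0]).mean(dim=(2, 3))
        return F.normalize(est, dim=1)


if __name__ == "__main__":
    model = TemporalIlluminantEstimator()
    sequence = torch.rand(2, 5, 3, 128, 128)  # two sequences of five frames
    print(model(sequence).shape)  # torch.Size([2, 3])
```

The cascading strategy mentioned in item (ii) would wrap a stage like this in a loop, correcting the frames with the current illuminant estimate before re-estimating; that refinement step is omitted here for brevity.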