Objective stereo video quality assessment (SVQA) strives to agree with human visual perception while keeping the time and labor cost of evaluation low. Because video has both spatial and temporal dimensions, the volume of data to be processed grows sharply, which makes SVQA more challenging. To address the effect of distortion on the temporal domain of stereoscopic video, this paper proposes an SVQA method based on the temporal–spatial relation. Specifically, a temporal adaptive model (TAM) is established to describe the space–time domain of a video at both the local and global levels. The model can be easily embedded into any 2D CNN backbone network and, compared with improved models based on 3D CNNs, has a clear advantage in operating efficiency. Experimental results on the NAMA3DS1-COSPAD1 database, the WaterlooIVC 3D Video Phase I database, the QI-SVQA database, and the SIAT depth quality database show that the model achieves excellent performance.
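To make the local/global idea concrete, the following is a minimal NumPy sketch of one plausible form a temporal adaptive block could take. It is an illustration under stated assumptions, not the architecture of this paper: the function name `temporal_adaptive_block`, the weight shapes, the kernel-size-3 local filter, and the softmax-normalized global kernel are all hypothetical choices. The local branch gates each frame of each channel, and the global branch derives one temporal kernel per channel from the whole clip; both act on a 2D-CNN-style feature map with an added time axis.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_adaptive_block(x, w_local, w_global):
    """Hypothetical temporal adaptive block (illustrative assumption only).

    x        : features of shape (C, T, H, W)
    w_local  : (C, 3) per-channel temporal filter for the local branch
    w_global : (T, K) projection from the clip descriptor to a K-tap kernel
               (K is assumed odd so the zero padding is symmetric)
    """
    C, T, H, W = x.shape
    K = w_global.shape[1]
    pad = K // 2

    # Spatial average pooling gives a per-channel temporal descriptor (C, T).
    desc = x.mean(axis=(2, 3))

    # Local branch: size-3 depthwise temporal filter, then a sigmoid gate
    # that rescales every frame of every channel.
    desc_pad = np.pad(desc, ((0, 0), (1, 1)))
    local = sum(w_local[:, k:k + 1] * desc_pad[:, k:k + T] for k in range(3))
    x_local = x * sigmoid(local)[:, :, None, None]

    # Global branch: one normalized K-tap temporal kernel per channel,
    # computed from the descriptor of the whole clip.
    kernel = softmax(desc @ w_global, axis=1)  # (C, K)

    # Apply the per-channel kernel along time (zero-padded at the ends).
    x_pad = np.pad(x_local, ((0, 0), (pad, pad), (0, 0), (0, 0)))
    out = np.zeros_like(x_local)
    for k in range(K):
        out += kernel[:, k, None, None, None] * x_pad[:, k:k + T]
    return out

# Usage: the output keeps the input shape, so the block can sit between
# existing 2D convolution stages without changing the backbone.
rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 4, 5, 5))
y = temporal_adaptive_block(feats,
                            rng.standard_normal((8, 3)),
                            rng.standard_normal((4, 3)))
print(y.shape)  # (8, 4, 5, 5)
```

Because the block only adds spatial pooling, two small temporal filters, and element-wise products, its cost is small relative to full 3D convolutions, which is consistent with the efficiency argument made above.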