Traffic congestion detection systems help manage traffic in crowded cities by analyzing video of road traffic. Existing systems largely depend on texture and motion features. Such systems face several challenges, including illumination changes caused by varying weather conditions, scene complexity, vehicle occlusion, and the ambiguity introduced by stopped vehicles. To address these issues, this paper proposes a fast and reliable traffic congestion detection method that models video dynamics using deep residual learning and motion trajectories. The proposed method efficiently combines motion features with deep texture features to overcome the limitations of existing methods. Unlike methods that extract texture features from a single frame, we use an efficient representation learning method that captures the latent structure of traffic videos by modeling how texture features evolve over time. This representation noticeably improves detection results under various weather conditions. For motion features, we propose an algorithm that distinguishes stopped vehicles from background objects, an issue that most existing motion-based approaches do not address. Both feature types are then used to construct an ensemble classification model based on support vector machines (SVMs). The robustness of the proposed method is demonstrated on two benchmark datasets: the UCSD dataset and the NU1 video dataset. The proposed method achieves results competitive with state-of-the-art methods, with an accuracy of 97.64%.
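The abstract does not specify how the SVM ensemble combines the two feature types. As a minimal sketch, assuming one linear SVM per feature type and late fusion by summing their decision scores (both assumptions, not necessarily the authors' scheme), the idea could look like the code below. The toy feature vectors, the labels, and all function names (`train_linear_svm`, `decision`, `train_ensemble`, `predict_ensemble`) are hypothetical. The SVMs are trained with plain stochastic subgradient descent on the hinge loss rather than a production solver.

```python
"""Hypothetical sketch: two linear SVMs (texture, motion) fused by summing scores."""
from typing import List, Sequence, Tuple

Vector = List[float]
Model = Tuple[Vector, float]  # (weights w, bias b)


def train_linear_svm(xs: Sequence[Vector], ys: Sequence[int],
                     lr: float = 0.1, lam: float = 0.001,
                     epochs: int = 500) -> Model:
    """Fit a linear SVM by stochastic subgradient descent on the hinge loss.

    Labels must be +1 (congested) or -1 (free-flowing). The bias is not regularised.
    """
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):  # fixed order, so training is deterministic
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Gradient step on the L2 regulariser (lam / 2) * ||w||^2.
            w = [(1.0 - lr * lam) * wi for wi in w]
            # Hinge-loss subgradient step, taken only when the margin is violated.
            if margin < 1.0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def decision(model: Model, x: Vector) -> float:
    """Signed SVM score w.x + b; positive means 'congested'."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_ensemble(texture_xs: Sequence[Vector], motion_xs: Sequence[Vector],
                   ys: Sequence[int]) -> Tuple[Model, Model]:
    """Train one SVM per feature type on the same labels."""
    return train_linear_svm(texture_xs, ys), train_linear_svm(motion_xs, ys)


def predict_ensemble(models: Tuple[Model, Model],
                     texture_x: Vector, motion_x: Vector) -> int:
    """Late fusion: sum the two SVM scores and take the sign."""
    texture_model, motion_model = models
    score = decision(texture_model, texture_x) + decision(motion_model, motion_x)
    return 1 if score >= 0 else -1


# Toy, made-up data: 2-D texture descriptors and 1-D normalised mean trajectory speed.
TEXTURE = [[0.9, 0.8], [0.1, 0.2], [0.8, 0.9], [0.2, 0.1], [0.85, 0.7], [0.15, 0.3]]
MOTION = [[0.1], [0.9], [0.2], [0.8], [0.15], [0.85]]
LABELS = [1, -1, 1, -1, 1, -1]

if __name__ == "__main__":
    models = train_ensemble(TEXTURE, MOTION, LABELS)
    print(predict_ensemble(models, [0.9, 0.9], [0.05]))  # dense texture, slow motion
    print(predict_ensemble(models, [0.1, 0.1], [0.95]))  # sparse texture, fast motion
```

Summing raw scores gives both base classifiers equal weight; a weighted sum, or a majority vote over a larger pool of base SVMs, would be equally reasonable fusion rules under the same assumptions.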