Traffic congestion detection plays an important role in road management. However, it is difficult to report congestion automatically when it occurs in a large-scale road network. One of the key challenges in identifying early congestion rapidly and precisely is the large variation in appearance caused by illumination, weather, camera settings, and other traffic conditions. To address this, we propose a traffic-oriented model that classifies congestion from a large dataset of ultra-low-frame-rate video captured by a traffic surveillance system. The proposed deeply supervised traffic congestion detector has two modules: an attention proposal module and a deeply supervised inception network. Specifically, within the shallow layers, the attention proposal module uses binary edge/corner density features to generate the region of interest (ROI) mask automatically. This strategy keeps the training process focused on congestion features and free of disturbances. Following the attention proposal module, a very deep structure based on the inception network extracts rich and discriminative features and then detects traffic congestion. The approach was tested on a self-established dataset based on empirical data, which contains images captured from 14470 surveillance cameras monitoring 5,215 km of freeway in Shaanxi Province, China. The experimental results show that the proposed method reaches an accuracy of 95.77% under various disturbances, conditions, and other limitations, which is an improvement over unsupervised networks.
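To make the idea of an ROI mask driven by binary edge density more concrete, the following is a minimal sketch and not the detector's actual attention proposal module. It assumes a grayscale frame, uses a thresholded gradient magnitude as a stand-in for the binary edge map (corner features are omitted), and marks fixed-size blocks whose edge density exceeds a threshold. The function name `edge_density_roi_mask` and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def edge_density_roi_mask(gray, edge_thresh=0.1, block=8, density_thresh=0.2):
    """Return a boolean ROI mask with the same shape as `gray`.

    Hypothetical illustration: all thresholds and the block size are
    assumptions, not values taken from the paper.
    """
    gray = np.asarray(gray, dtype=np.float64)

    # Binary edge map: threshold the gradient magnitude (assumed edge detector).
    gy, gx = np.gradient(gray)
    edges = np.hypot(gx, gy) > edge_thresh

    # Block-wise edge density: fraction of edge pixels in each block.
    h, w = edges.shape
    hb, wb = h // block, w // block
    cropped = edges[: hb * block, : wb * block]
    density = cropped.reshape(hb, block, wb, block).mean(axis=(1, 3))

    # Keep blocks with enough edge pixels and expand back to pixel resolution.
    block_mask = density > density_thresh
    mask = np.zeros((h, w), dtype=bool)
    mask[: hb * block, : wb * block] = np.repeat(
        np.repeat(block_mask, block, axis=0), block, axis=1
    )
    return mask


# Synthetic frame: flat left half, vertically striped right half.
img = np.zeros((32, 32))
img[:, 16:] = np.tile([0.0, 0.0, 1.0, 1.0], 4)
mask = edge_density_roi_mask(img)
```

In this toy frame the flat half produces no edges and is excluded, while the striped half is kept; the mask could then be multiplied with the input image or a feature map before classification.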