Traffic congestion detection based on surveillance video is increasingly used in intelligent transportation systems (ITS). Owing to challenges such as changing weather, vehicle occlusion, camera jitter, and varying camera installation locations, current methods struggle to balance real‐time performance and accuracy. Here, a new real‐time and robust traffic congestion detection framework and a vision‐based multi‐dimensional congestion detection model are proposed. First, the framework combines an object detector based on a lightweight convolutional neural network (CNN) with an efficient multi‐object IoU‐like tracker to obtain dynamic traffic information in real time. Then, traffic density, traffic velocity, and the duration of instantaneous congestion are defined, and a multi‐dimensional congestion detection model is established on these quantities. Furthermore, an adaptive updating strategy for the dynamic parameters is investigated. Finally, multiple groups of comparative experiments verify that the framework is applicable to a variety of lightweight CNN detectors and IoU‐like trackers. The multi‐dimensional congestion model reaches a precision of 95.1% and a recall of 92.1% at 43 FPS. The comparative results show that the proposed method runs in real time, is more robust and accurate than the compared methods, and can be employed for online traffic congestion detection based on surveillance video.
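
As an illustration of the tracking step, the following minimal sketch shows how an IoU‐based tracker can associate detections across consecutive frames. It is not the tracker used in this work: the function names `iou` and `update_tracks`, the greedy matching rule, and the threshold `iou_min = 0.5` are assumptions chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Axis-aligned bounding box as (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def update_tracks(
    tracks: Dict[int, Box],
    detections: List[Box],
    next_id: int,
    iou_min: float = 0.5,  # hypothetical threshold, not taken from the paper
) -> Tuple[Dict[int, Box], int]:
    """Greedily match each existing track to its best-overlapping detection.

    Matched tracks keep their ID, unmatched detections start new tracks,
    and tracks without a match are dropped.
    """
    updated: Dict[int, Box] = {}
    unmatched = list(range(len(detections)))
    for track_id, box in tracks.items():
        if not unmatched:
            break
        best = max(unmatched, key=lambda j: iou(box, detections[j]))
        if iou(box, detections[best]) >= iou_min:
            updated[track_id] = detections[best]
            unmatched.remove(best)
    for j in unmatched:
        updated[next_id] = detections[j]
        next_id += 1
    return updated, next_id
```

Calling `update_tracks` once per frame with the detector output yields persistent vehicle IDs, from which per‐vehicle dynamic information can be accumulated over time.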