Population growth in large cities has increased the number of vehicles, leading to traffic congestion. Incompetent traffic supervision can waste a considerable number of man-hours and may lead to fatal consequences. Intelligent traffic surveillance systems have therefore come to play an increasingly significant role in highway monitoring and traffic management. Although vehicle detection and classification methods have evolved rapidly, they still lack high-level reasoning, and accurate, precise vehicle recognition and classification alone are insufficient to build an intelligent and reliable traffic system. There is a need to increase confidence in image understanding and to extract image content in a way that conforms to human perception, without human intervention. This paper reviews several methods that semantically extract and analyze traffic density using image processing techniques. The three methods selected for discussion are semantic analysis of traffic video using image understanding, mining semantic context details of traffic scenes, and integrating vision and language in the semantic description of traffic events from image sequences. Each method is discussed in detail, and its outstanding issues are examined.