Computer vision researchers are actively studying the use of video in traffic monitoring. TrafficMonitor, for example, uses a stationary calibrated camera to automatically track and classify vehicles on roadways. In practical applications such as autonomous driving, semantic video segmentation remains difficult because of stringent accuracy requirements, the high computational cost of convolutional neural networks (CNNs), and the need for low latency. To meet these performance and latency challenges, we develop an efficient machine learning pipeline. Using deep learning architectures such as SegNet and FlowNet2.0 on the CamVid dataset, this pipeline performs pixel-wise semantic segmentation of video while maintaining low latency. In this work, we also discuss state-of-the-art approaches to estimating vehicle speed, locating vehicles, and tracking objects. Because the pipeline exploits both the SegNet and FlowNet topologies, it is well suited to real-world applications. A decision network determines whether an image frame should be processed by the segmentation network or the optical flow network based on a predicted confidence score. Combined with adaptive scheduling of key frames, this decision-making mechanism speeds up processing. With the ResNet50 SegNet model, we observed a mean IoU of 54.27% at an average of 19.57 frames per second. Adding FlowNet2.0 together with the decision network and adaptive key frame scheduling increased throughput to 30.19 frames per second on a GPU, with a mean IoU of 47.65%. This performance improvement demonstrates that the speed of the video semantic segmentation network can be increased without sacrificing quality.
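The decision-network routing described above can be sketched in a few lines. The following is a minimal illustrative sketch, not the paper's implementation: `segment`, `estimate_flow`, `warp`, and `confidence` are hypothetical stand-ins for SegNet, FlowNet2.0, flow-based label propagation, and the decision network's confidence score, and the toy similarity-based confidence is an assumption made only so the example runs.

```python
# Hypothetical sketch of the key-frame decision loop: a frame is either
# fully segmented (new key frame) or its labels are propagated from the
# last key frame via optical flow, depending on a confidence score.

def segment(frame):
    # Placeholder for a full SegNet forward pass (expensive).
    return {"labels": f"seg({frame})"}

def estimate_flow(key_frame, frame):
    # Placeholder for FlowNet2.0 optical flow (cheap relative to SegNet).
    return (key_frame, frame)

def warp(key_labels, flow):
    # Placeholder: propagate key-frame labels along the flow field.
    return {"labels": f"warp({key_labels['labels']})"}

def confidence(key_frame, frame):
    # Placeholder decision-network score; here, a toy similarity measure
    # that decays as frames drift away from the key frame.
    return 1.0 - abs(frame - key_frame) / 10.0

def process_video(frames, threshold=0.7):
    """Route each frame to segmentation or flow-based propagation."""
    outputs, used_segnet = [], []
    key_frame, key_labels = None, None
    for frame in frames:
        if key_frame is None or confidence(key_frame, frame) < threshold:
            # Low confidence: pay for a full segmentation, set a new key frame.
            key_frame, key_labels = frame, segment(frame)
            outputs.append(key_labels)
            used_segnet.append(True)
        else:
            # High confidence: warp the key-frame labels via optical flow.
            flow = estimate_flow(key_frame, frame)
            outputs.append(warp(key_labels, flow))
            used_segnet.append(False)
    return outputs, used_segnet

outputs, used_segnet = process_video([0, 1, 2, 9, 10])
print(used_segnet)  # → [True, False, False, True, False]
```

Frames close to the current key frame reuse cheap flow propagation, while a large drift triggers a fresh (expensive) segmentation; lowering `threshold` trades accuracy for throughput, which mirrors the speed/IoU trade-off reported in the abstract.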