Nowadays, traffic monitoring systems are at the frontline of smart city movement, and traffic density estimation is useful to a traffic monitoring system. The system of this work estimates traffic density using a five-layered CNN with a variety of input feature maps and filter sizes. There are 64, 64, 96, 96, and 96 feature maps for each pair of convolutions and max-pooling layers, and each pair's corresponding filter sizes are 5*5, 3*3, 5*5, 3*3, and 3*3. The proposed system divides the traffic into three categories: High, Medium, and Low traffic based on images that are taken from traffic videos that are recorded by traffic surveillance cameras. To test the system, we used the WSDT (Washington State Department of Traffic Transportation) Dataset of recorded video footage from the highway CCTVs in cities Seattle and Washington. The model is evaluated using parameters such as 0.99 precision, 0.99 recall, 0.99 f1-score, and model accuracy of 99.6 %. By examining the dataset, we have trained the model in such a manner to produce better results.