Fire accidents in residential, commercial, and industrial environments are a major concern because they cause considerable damage to infrastructure and loss of human life. Moreover, the risk of fire is growing along with the number of urban buildings. Existing fire detection techniques based on smoke sensors are difficult to apply over large areas. Furthermore, during a fire, smoke obscures the evacuation paths, which makes it difficult to evacuate people from the building. To overcome these challenges, we propose a vision-based fire detection system that identifies fire events and counts the number of people inside the building. In this study, deep neural network (DNN) models, i.e., MobileNet SSD and ResNet101, are embedded in a vision node along with a Kinect sensor to detect fire accidents and count the people inside the building. A web application is developed and integrated with the vision node through a local server to visualize real-time fire and people-counting events in the building. Finally, a real-time experiment is performed to evaluate the accuracy of the proposed system for smoke detection and people density estimation.