Rising crime rates around the world have heightened security concerns. Closed-Circuit Television (CCTV) cameras have been installed worldwide to reduce crime and improve public safety, and their use helps to increase crime detection accuracy significantly. These cameras record a considerable amount of data every day. Detecting and recognizing culprits in the recorded footage is challenging because manual review is time-consuming and requires human involvement. There is therefore a need for a system that detects and tracks humans in real time. This paper proposes a deep-learning-based human detection and tracking system that assigns a unique ID to each person who enters the video scene. The system uses Multi-Task Cascaded Convolutional Neural Networks (MTCNN) and FaceNet. The MTCNN model is trained on the WIDER FACE dataset to perform human detection, and FaceNet, trained on the LFW dataset, is used for human identification. The proposed system was evaluated on 50 video sequences captured in different environments and achieved an average accuracy of 97%.
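The ID-assignment step described above can be illustrated with a minimal sketch. It assumes that each detected face has already been converted to an embedding vector (for example by FaceNet) and that two embeddings closer than a fixed Euclidean distance belong to the same person. The class name `IdentityRegistry`, the `assign` method, the threshold value, and the toy 2-D vectors are all hypothetical and are not taken from the paper; the actual matching rule used by the proposed system may differ.

```python
import math


class IdentityRegistry:
    """Hypothetical helper: maps face embeddings to stable integer IDs."""

    def __init__(self, threshold=1.0):
        # Assumed distance threshold; the paper does not state one.
        self.threshold = threshold
        self.known = []   # list of (person_id, reference_embedding)
        self.next_id = 1

    def assign(self, embedding):
        """Return the ID of the closest known person, or register a new one."""
        best_id, best_dist = None, float("inf")
        for person_id, reference in self.known:
            dist = math.dist(embedding, reference)  # Euclidean distance
            if dist < best_dist:
                best_id, best_dist = person_id, dist

        # Reuse the ID only if the closest match is within the threshold.
        if best_id is not None and best_dist < self.threshold:
            return best_id

        # Otherwise treat this as a new person entering the scene.
        new_id = self.next_id
        self.next_id += 1
        self.known.append((new_id, list(embedding)))
        return new_id


# Toy 2-D vectors stand in for real face embeddings.
registry = IdentityRegistry(threshold=1.0)
frames = [[0.0, 0.0], [0.1, 0.0], [3.0, 4.0], [0.0, 0.05]]
ids = [registry.assign(e) for e in frames]
print(ids)  # [1, 1, 2, 1]
```

In this sketch the first embedding seen for a person is kept as the reference; a practical system would more likely average several embeddings per person and tune the threshold on held-out data.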