In surveillance, closed-circuit television (CCTV) cameras are used to watch for unknown individuals. Because such individuals appear unpredictably, the footage requires continuous human monitoring, which is error-prone and resource-intensive. Automating this process with face recognition could ease these burdens, provided the system delivers high precision and fast decisions. This study presents an automated human recognition and verification surveillance system based on a max-voting ensemble method. The approach combines five feature extraction models (VGGFace, FaceNet, FaceNet-512, Dlib, and ArcFace), with a support vector machine used for classification. The system was tested on the AT&T, faces94, Grimace, Georgia Tech, and FaceScrub datasets, achieving 100% accuracy on AT&T, faces94, and Grimace, and 99.3% and 98% on Georgia Tech and FaceScrub, respectively. A re-verification technique further improved performance, enabling fast and precise prediction of unknown individuals in real time. The study thus contributes a tool for efficient, accurate human recognition in automated surveillance.
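The max-voting step can be illustrated with a minimal sketch. It assumes that each of the five feature extractors, paired with its classifier, has already produced one identity label for a face; the function name `max_vote`, the `min_votes` threshold, and the `"unknown"` fallback are hypothetical choices made for illustration and are not taken from the study.

```python
from collections import Counter


def max_vote(predictions, min_votes=None):
    """Return the label that the most models agree on.

    predictions: dict mapping a model name to its predicted identity label.
    min_votes:   hypothetical threshold (an assumption, not from the study);
                 if the winning label has fewer votes, return "unknown".
    Ties resolve to the tied label that appears first in `predictions`.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")

    # Count how many models voted for each label.
    counts = Counter(predictions.values())
    label, votes = counts.most_common(1)[0]

    # Optionally reject weak majorities as an unknown person.
    if min_votes is not None and votes < min_votes:
        return "unknown"
    return label


# Illustrative labels only; these are not results from the study.
example_votes = {
    "VGGFace": "alice",
    "FaceNet": "alice",
    "FaceNet-512": "bob",
    "Dlib": "alice",
    "ArcFace": "bob",
}
winner = max_vote(example_votes)
strict = max_vote(example_votes, min_votes=4)
```

Here three of the five models agree, so `winner` is that majority label, while the stricter call with `min_votes=4` falls back to `"unknown"`. How the study itself handles ties or weak majorities is not stated in the abstract.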