Face recognition (FR) technology is widely used in domains such as information security, access control, and surveillance. Achieving real-time face detection (FD) performance is challenging, especially when several algorithms that each require both high accuracy and fast execution (high frame rate) must run together on an embedded system-on-chip (SoC). This study proposes a comprehensive methodology and system implementation for concurrent face detection, landmark extraction, quality assessment, and face recognition directly at the edge, without relying on external resources. The approach uses the Extended YOLO model for face detection and the ArcFace model for feature extraction, both optimized for deployment on embedded devices. Combined with a dedicated recognition database and an efficient software architecture, these models allow the system to achieve high accuracy and real-time processing. A critical aspect of the methodology is tailoring model optimization to SoC environments, focusing on the YOLO face detection model and the ArcFace feature extraction model, to improve computational efficiency while preserving accuracy. The software architecture is equally important: it integrates the multiple components on the embedded device and applies optimization techniques that minimize overhead and maximize performance. By offering a detailed framework and implementation strategy, this research contributes to the development of high-performance, accurate real-time face recognition systems for embedded devices.