Real-time face recognition has attracted great interest over the last decade because of its wide variety of critical applications, which include biometrics, security in public places, and identification in login systems. This has encouraged researchers to design fast and accurate embedded and portable systems capable of detecting and recognizing a large number of faces at close to video frame rate. As the volume of reference faces has grown, traditional general-purpose computing engines, such as those based on Intel's Pentium processors, have proven inadequate, and various dedicated hardware accelerators based on Graphics Processing Units (GPUs), Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), or even multi-core Central Processing Units (CPUs) have emerged. Earlier review papers on face detection and recognition have discussed algorithmic enhancements that improve detection accuracy. However, none of them has reviewed the hardware accelerators used for this application. Accordingly, this paper aims to provide a comprehensive review of the most recent face recognition algorithms and the associated embedded hardware systems targeting real-time performance. A detailed comparison between neural network-based and non-neural network-based algorithms in terms of accuracy and processing time is provided. Their suitability for implementation on parallel hardware architectures, such as Single Instruction Multiple Thread (SIMT) or Single Instruction Multiple Data (SIMD), is also discussed.