Human pose estimation (HPE) has become a prevalent research topic in computer vision. The technology can be applied in many areas, such as video surveillance, medical assistance, and sport motion analysis. Owing to growing demand, many HPE libraries have been developed over the last 20 years. In the last 5 years, a growing number of skeleton-based HPE algorithms have been developed and packaged into libraries for ease of use by researchers. The performance of these libraries therefore matters when researchers intend to integrate them into real-world applications. However, a comprehensive performance comparison of these libraries has yet to be conducted. This paper therefore investigates the strengths and weaknesses of four popular state-of-the-art skeleton-based HPE libraries for human pose detection: OpenPose, PoseNet, MoveNet, and MediaPipe Pose. A comparative analysis of these libraries on images and videos is presented. The percentage of detected joints (PDJ) was used as the evaluation metric in all comparative experiments. MoveNet showed the best performance in detecting different human poses in both static images and videos.
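The abstract does not spell out how PDJ is computed. A minimal sketch follows, assuming the commonly used definition in which a joint counts as detected when its predicted position lies within a fixed fraction of the torso diameter from the ground truth. The function name `pdj`, the default threshold of 0.2, and the sample coordinates are illustrative assumptions, not values taken from the paper.

```python
import numpy as np


def pdj(pred, truth, torso_diameter, threshold=0.2):
    """Percentage of detected joints (assumed definition, see lead-in).

    pred, truth: arrays of shape (num_joints, 2) holding pixel coordinates.
    torso_diameter: reference length in pixels used to normalise the error.
    threshold: fraction of the torso diameter allowed as error (assumption).
    """
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    # Euclidean distance between each predicted joint and its ground truth.
    dists = np.linalg.norm(pred - truth, axis=-1)
    # A joint is "detected" if its error is within the allowed radius.
    detected = dists <= threshold * torso_diameter
    return 100.0 * detected.mean()


# Hypothetical four-joint example: three joints fall within 2 px, one does not.
truth = [[0, 0], [10, 0], [0, 10], [10, 10]]
pred = [[1, 0], [10, 0], [0, 30], [10, 11]]
print(pdj(pred, truth, torso_diameter=10))  # 75.0
```

Normalising by a body-relative length keeps the score comparable across subjects that appear at different scales in the frame.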
Three-dimensional digital images are gaining more attention in the field of pattern recognition. Most of the literature, however, focuses only on the theoretical framework of two-dimensional moment invariants, which are implemented only on two-dimensional images. Consequently, these invariants lack the flexibility to support three-dimensional objects. In this paper, we introduce three-dimensional scale invariants of Legendre moments. They are algebraically derived directly from Legendre polynomials. Simulated experiments using three-dimensional binary images are carried out to verify the validity of the proposed invariants.
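For orientation, the quantities that such invariants are built from can be sketched in code. The example below computes plain three-dimensional Legendre moments of a voxel volume using the standard definition on the cube [-1, 1]^3, approximated with a midpoint rule. It does not implement the scale invariants derived in the paper. The function name `legendre_moment_3d` and the grid mapping are illustrative assumptions.

```python
import numpy as np
from numpy.polynomial import legendre


def legendre_moment_3d(volume, p, q, r):
    """Approximate the 3D Legendre moment of order (p, q, r).

    The volume is mapped onto [-1, 1]^3 and the integral is approximated
    by sampling each voxel at its centre (midpoint rule, an assumption).
    """
    volume = np.asarray(volume, dtype=float)
    nx, ny, nz = volume.shape

    def centres(n):
        # Voxel centres of n equal cells covering [-1, 1].
        return -1.0 + (2.0 * np.arange(n) + 1.0) / n

    def poly(order, n):
        # Evaluate the Legendre polynomial P_order at the voxel centres.
        coeffs = np.zeros(order + 1)
        coeffs[order] = 1.0
        return legendre.legval(centres(n), coeffs)

    px, py, pz = poly(p, nx), poly(q, ny), poly(r, nz)
    # Normalisation constant of the orthogonal Legendre basis in 3D.
    norm = (2 * p + 1) * (2 * q + 1) * (2 * r + 1) / 8.0
    # Volume of one voxel in the mapped coordinate system.
    dv = (2.0 / nx) * (2.0 / ny) * (2.0 / nz)
    return norm * dv * np.einsum("i,j,k,ijk->", px, py, pz, volume)


# A fully filled binary cube: the zeroth-order moment equals 1.
cube = np.ones((4, 4, 4))
print(legendre_moment_3d(cube, 0, 0, 0))
```

Rescaling the object inside the volume changes these raw moments, which is the motivation for deriving scale-invariant combinations of them.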
Digital signage is widely used in digital-out-of-home (DOOH) advertising for marketing and business. Recently, the combination of digital cameras and digital signage has enabled advertisers to gather audience demographics for audience measurement. Audience measurement helps advertisers understand audience behavior and improve their business strategies. When an audience member faces the digital display, the vision-based DOOH system processes that person's face and broadcasts a personalized advertisement. Most digital signage is deployed in uncontrolled public environments. This poses two main challenges for a vision-based DOOH system that tracks audience movement: multiple adjacent faces and occlusion by passers-by. In this paper, a new framework is proposed that combines digital signage with a depth camera to track multiple faces in a three-dimensional (3D) environment. The proposed framework extracts each audience member's face centroid position (x, y) and depth information (z) and plots them on an aerial map to simulate audience movement corresponding to the real-world environment. The advertiser can then measure advertising effectiveness through the audience's behavior.
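The abstract does not describe how the centroid and depth are projected onto the aerial map. One plausible sketch, assuming a standard pinhole camera model, converts the horizontal pixel coordinate and the depth into a top-down position in metres, dropping the vertical coordinate. The function `to_aerial` and the intrinsics `fx` and `cx` are hypothetical and not taken from the paper.

```python
def to_aerial(x_px, depth_m, fx, cx):
    """Map a face centroid to a top-down (lateral, forward) position.

    x_px: horizontal pixel coordinate of the face centroid.
    depth_m: depth reported by the depth camera, in metres.
    fx, cx: focal length and principal point in pixels (assumed intrinsics).
    """
    # Pinhole back-projection: lateral offset grows with depth.
    lateral_m = (x_px - cx) * depth_m / fx
    # In the aerial view, the forward axis is the camera depth itself.
    return (lateral_m, depth_m)


# Hypothetical 640-pixel-wide camera with fx = 600 and cx = 320.
print(to_aerial(320, 2.0, fx=600, cx=320))  # (0.0, 2.0)
print(to_aerial(620, 2.0, fx=600, cx=320))  # (1.0, 2.0)
```

Plotting successive outputs for each tracked face would trace that person's path on the floor plan in front of the display.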