Convolutional neural networks (CNNs) have been widely utilized in many computer vision tasks. However, CNNs have a fixed reception field and lack the ability of long-range perception, which is crucial to human pose estimation. Due to its capability to capture long-range dependencies between pixels, transformer architecture has been adopted to computer vision applications recently and is proven to be a highly effective architecture. We are interested in exploring its capability in human pose estimation, and thus propose a novel model based on transformer architecture, enhanced with a feature pyramid fusion structure. More specifically, we use pre-trained Swin Transformer as our backbone and extract features from input images, we leverage a feature pyramid structure to extract feature maps from different stages. By fusing the features together, our model predicts the keypoint heatmap. The experiment results of our study have demonstrated that the proposed transformer-based model can achieve better performance compared to the state-of-the-art CNN-based models.
In modern security situations, tracking multiple human objects in real-time within challenging urban environments is a critical capability for enhancing situational awareness, minimizing response time, and increasing overall operational effectiveness. Tracking multiple entities enables informed decision-making, risk mitigation, and the safeguarding of civil-military operations to ensure safety and mission success. This paper presents a multi-modal electro-optical/infrared (EO/IR) and radio frequency (RF) fused sensing (MEIRFS) platform for real-time human object detection, recognition, classification, and tracking in challenging environments. By utilizing different sensors in a complementary manner, the robustness of the sensing system is enhanced, enabling reliable detection and recognition results across various situations. Specifically designed radar tags and thermal tags can be used to discriminate between friendly and non-friendly objects. The system incorporates deep learning-based image fusion and human object recognition and tracking (HORT) algorithms to ensure accurate situation assessment. After integrating into an all-terrain robot, multiple ground tests were conducted to verify the consistency of the HORT in various environments. The MEIRFS sensor system has been designed to meet the Size, Weight, Power, and Cost (SWaP-C) requirements for installation on autonomous ground and aerial vehicles.
Human pose estimation has enabled numerous applications by classifying and tracking body movements. Although a few open datasets have emerged to facilitate the evaluation of pose detection methods, they are too generic to benefit domain specific applications such as physical therapy which has quantitative clinical metrics and requires precise differentiation and measurement. To address this issue, we construct the first human keypoints detection dataset for physical therapy, in particular lower body rehabilitation. The dataset consists of 1,885,637 distinctive human poses for 31 lower body rehab exercises, which are performed by 20 actors under the guidance of a licensed physical therapist. Their motion are captured with state of art of motion tracking system in both 3D and 2D to establish the ground truth. Furthermore, to optimize a number of deep learning models applied on this unique dataset, we utilize Extremely Efficient Spatial Pyramid (EESP) and attention mechanism to reduce the models' computational complexity. Our experiment results show that the optimized models achieve comparable performance with nearly 4x reduction in complexity.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.