Bo Wang scite author profile

Estimating 3D poses from a monocular video is still a challenging task, despite the significant progress that has been made in the recent years. Generally, the performance of existing methods drops when the target person is too small/large, or the motion is too fast/slow relative to the scale and speed of the training data. Moreover, to our knowledge, many of these methods are not designed or trained under severe occlusion explicitly, making their performance on handling occlusion compromised. Addressing these problems, we introduce a spatio-temporal network for robust 3D human pose estimation. As humans in videos may appear in different scales and have various motion speeds, we apply multi-scale spatial features for 2D joints or keypoints prediction in each individual frame, and multi-stride temporal convolutional networks (TCNs) to estimate 3D joints or keypoints. Furthermore, we design a spatio-temporal discriminator based on body structures as well as limb motions to assess whether the predicted pose forms a valid pose and a valid movement. During training, we explicitly mask out some keypoints to simulate various occlusion cases, from minor to severe occlusion, so that our network can learn better and becomes robust to various degrees of occlusion. As there are limited 3D ground truth data, we further utilize 2D video data to inject a semi-supervised learning capability to our network. Experiments on public data sets validate the effectiveness of our method, and our ablation studies show the strengths of our network's individual submodules.

show abstract

Ultrasound Speckle Reduction via Super Resolution and Nonlinear Diffusion

Wang

Cao

Dai

et al. 2010

View full text Add to dashboard Cite

Data-Driven Rank Aggregation with Application to Grand Challenges

Fishbaugh

Prastawa

Wang

et al. 2017

View full text Add to dashboard Cite

Modeling 4D pathological changes by leveraging normative models

Wang

Prastawa

Irimia

et al. 2016

Computer Vision and Image Understanding

View full text Add to dashboard Cite

With the increasing use of efficient multimodal 3D imaging, clinicians are able to access longitudinal imaging to stage pathological diseases, to monitor the efficacy of therapeutic interventions, or to assess and quantify rehabilitation efforts. Analysis of such four-dimensional (4D) image data presenting pathologies, including disappearing and newly appearing lesions, represents a significant challenge due to the presence of complex spatio-temporal changes. Image analysis methods for such 4D image data have to include not only a concept for joint segmentation of 3D datasets to account for inherent correlations of subject-specific repeated scans but also a mechanism to account for large deformations and the destruction and formation of lesions (e.g., edema, bleeding) due to underlying physiological processes associated with damage, intervention, and recovery. In this paper, we propose a novel framework that provides a joint segmentation-registration framework to tackle the inherent problem of image registration in the presence of objects not present in all images of the time series. Our methodology models 4D changes in pathological anatomy across time and and also provides an explicit mapping of a healthy normative template to a subject’s image data with pathologies. Since atlas-moderated segmentation methods cannot explain appearance and locality pathological structures that are not represented in the template atlas, the new framework provides different options for initialization via a supervised learning approach, iterative semisupervised active learning, and also transfer learning, which results in a fully automatic 4D segmentation method. We demonstrate the effectiveness of our novel approach with synthetic experiments and a 4D multimodal MRI dataset of severe traumatic brain injury (TBI), including validation via comparison to expert segmentations. However, the proposed methodology is generic in regard to different clinical applications requiring quantitative analysis of 4D imaging representing spatio-temporal changes of pathologies.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Bo Wang

Occlusion-Aware Networks for 3D Human Pose Estimation in Video

3D Human Pose Estimation Using Spatio-Temporal Networks with Explicit Occlusion Training

Ultrasound Speckle Reduction via Super Resolution and Nonlinear Diffusion

Data-Driven Rank Aggregation with Application to Grand Challenges

Modeling 4D pathological changes by leveraging normative models

Contact Info

Product

Resources

About