In this paper we consider the problem of human pose estimation from a single still image. We propose a novel approach in which each location in the image votes for the position of each keypoint using a convolutional neural net. The voting scheme allows us to utilize information from the whole image, rather than rely on a sparse set of keypoint locations. Using dense, multi-target votes not only produces good keypoint predictions, but also enables us to compute image-dependent joint keypoint probabilities by looking at consensus voting. This differs from most previous methods, where joint probabilities are learned from relative keypoint locations and are independent of the image. Finally, we combine the keypoint votes and joint probabilities in order to identify the optimal pose configuration. We demonstrate competitive performance on the MPII Human Pose and Leeds Sports Pose datasets.
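As a rough illustration of the dense voting idea, the following minimal Python sketch accumulates weighted votes cast by image locations for a single keypoint and then picks the location with the most support. The vote tuples, the function names `accumulate_votes` and `argmax_location`, and the toy grid are assumptions made for this example. In the approach described above the votes would be produced by a convolutional neural net, and the consensus-based joint probabilities are not modeled here.

```python
from typing import List, Tuple

# Hypothetical vote format: the voter sits at (x, y) and votes, with some
# weight, that the keypoint lies at (x + dx, y + dy).
Vote = Tuple[int, int, int, int, float]


def accumulate_votes(votes: List[Vote], width: int, height: int) -> List[List[float]]:
    """Sum the vote weights landing on each target cell of a height x width grid."""
    heatmap = [[0.0] * width for _ in range(height)]
    for x, y, dx, dy, weight in votes:
        tx, ty = x + dx, y + dy
        # Votes that point outside the image are ignored.
        if 0 <= tx < width and 0 <= ty < height:
            heatmap[ty][tx] += weight
    return heatmap


def argmax_location(heatmap: List[List[float]]) -> Tuple[int, int]:
    """Return the (x, y) cell with the largest accumulated vote mass."""
    best = max(
        ((value, x, y) for y, row in enumerate(heatmap) for x, value in enumerate(row)),
        key=lambda item: item[0],
    )
    return best[1], best[2]


# Toy example: three voters agree on cell (2, 1), one votes for (3, 0).
toy_votes: List[Vote] = [
    (0, 0, 2, 1, 0.5),
    (4, 3, -2, -2, 0.4),
    (1, 1, 1, 0, 0.3),
    (3, 0, 0, 0, 0.6),
]
toy_heatmap = accumulate_votes(toy_votes, width=5, height=4)
toy_keypoint = argmax_location(toy_heatmap)
```

In this toy setup the single strongest vote (weight 0.6) loses to three weaker votes that agree with each other, which is the intuition behind aggregating evidence from many locations rather than from a sparse set of detections.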
The human visual system recognizes objects and their constituent parts rapidly and with high accuracy. Standard models of recognition by the visual cortex use feed-forward processing, in which an object's parts are detected before the complete object. However, parts are often ambiguous on their own and require the prior detection and localization of the entire object. We show how a cortical-like hierarchy obtains recognition and localization of objects and parts at multiple levels nearly simultaneously by a single feed-forward sweep from low to high levels of the hierarchy, followed by a feedback sweep from high- to low-level areas.
Keywords: computer vision | object recognition | parts interpretation | cortical hierarchy | feedback processing
In the course of visual object recognition, we quickly recognize not only complete objects but also parts and subparts at different levels of detail. Hierarchical models of the visual cortex (1-3) typically perform recognition in a feed-forward manner in which recognition proceeds from the detection of simple features to more complex parts to the full object. However, the recognition of local parts is often ambiguous and depends on the object's context (Fig. 1), which is not available during feed-forward processing.
Psychological studies have also shown that the identification of a global shape and its local components proceed at similar speeds. Depending on the configuration, the global shape can either precede or follow the recognition of its local parts, and both contribute to final recognition (4, 5).
Event-related potential (ERP) (6) and magnetoencephalography (MEG) (7) recordings have shown fast responses to both objects and parts, and physiological studies found that shape selectivity at different cortical levels emerges quickly and can sometimes further increase over a short time interval (8-11).
We show below how objects and their multilevel components can be detected by the cortical hierarchy efficiently and almost simultaneously, even when the local parts on their own are highly ambiguous. Unlike in feed-forward models, the basic computation is a particular bottom-up (BU) and top-down (TD) cycle. Past modeling has shown that feed-forward recognition produces fast, effective top-level recognition. However, we show that even when correct recognition is obtained by the BU pass, frequent errors occur at the parts level. A single TD pass is sufficient to correct almost all errors made during the BU pass, and the full cycle obtains not only object recognition but also a detailed interpretation of the entire figure at multiple levels of detail. We first describe below the computational model used for object and part recognition and then report testing results on natural images.
Bidirectional Hierarchical Model. In this section, we consider the problem of detecting an object C together with a set P of parts