Coarse-to-fine online learning for hand segmentation in egocentric video

Zhao, Ying; Luo, Zhiwei; Quan, Changqin

doi:10.1186/s13640-018-0262-1

Cited by 4 publications

(5 citation statements)

References 29 publications

“…Usually, methods for online hand segmentation made assumptions on the hand motion [55], [56], [57], [58] and/or required the user to perform a calibration with pre-defined hand movements [59]. In this way, the combination of color and motion features facilitates the detection of hand pixels, in order to train segmentation models online.…”

Section: Lack Of Pixel-level Annotations (mentioning)

confidence: 99%

“…These matches were estimated using RANSAC [61] and after being removed, those left were assumed to belong to the hands and used to locate the seed point for region growing. Zhao et al [57], [58] based their approach on the typical motion pattern during actions involving the hands: preparatory phase (i.e., the hands move from the lower part of the frame to the image center) and interaction phase. During the preparatory phase they used a motion-based segmentation, computing the TV-L1 optical flow [62].…”

Section: Lack Of Pixel-level Annotations (mentioning)

confidence: 99%

“…In these cases, a pre-filtering step that prevents from processing frames without any hands is necessary. This approach allows determining whether an image contains hands and it is usually followed by a hand segmentation step responsible for locating the hand region [9], [29], [32], [57], [58].…”

Section: Hand Detection As Image Classification (mentioning)

confidence: 99%

“…To solve this issue, the authors proposed a dynamic Bayesian network (DBN) to smooth the classification results of the SVM and improve the prediction performance [72]. Zhao et al [57], [58] detected the presence of hands within each frame exploiting the typical interaction cycle of the hands (i.e., preparatory phase - interaction - hands out of the frame). Based on this observation, they defined an ego-saliency metric related to the probability of having hands within a frame.…”

Section: Hand Detection As Image Classification (mentioning)

confidence: 99%

Analysis of the Hands in Egocentric Vision: A Survey

Bandini

Zariffa

2023

IEEE Trans. Pattern Anal. Mach. Intell.

Egocentric vision (a.k.a. first-person vision, FPV) applications have thrived over the past few years, thanks to the availability of affordable wearable cameras and large annotated datasets. The position of the wearable camera (usually mounted on the head) allows recording exactly what the camera wearers have in front of them, in particular hands and manipulated objects. This intrinsic advantage enables the study of the hands from multiple perspectives: localizing hands and their parts within the images; understanding what actions and activities the hands are involved in; and developing human-computer interfaces that rely on hand gestures. In this survey, we review the literature that focuses on the hands using egocentric vision, categorizing the existing approaches into: localization (where are the hands or part of them?); interpretation (what are the hands doing?); and application (e.g., systems that used egocentric hand cues for solving a specific problem). Moreover, a list of the most prominent datasets with hand-based annotations is provided.

“…Beside body joints, keypoints can be extended to refer to the small visual units with semantic information indicating the compositions, shapes and poses of the target objects, such as finger joints or key positions of any other objects. Therefore, accurate keypoint detection in unconstrained environments brings benefit to other more detailed visual understanding tasks, including semantic segmentation [1,2,3], saliency object segmentation [4,5,6], hand segmentation [7] and pose estimation [8,9], viewpoint estimation [10,11,12,13], salient object detection [14,15,16], attention prediction [17] and 3D reconstruction [18,19,20]. Similar as many computer vision tasks, the progress on human pose estimation problem is significantly improved by deep convolutional neural networks.…”

Section: Introduction (mentioning)

confidence: 99%

Cluster-wise learning network for multi-person pose estimation

Zhao

Luo

Quan

et al. 2020

Pattern Recognition

Self Cite

Lightweight real-time hand segmentation leveraging MediaPipe landmark detection

Sánchez-Brizuela

Cisnal

de la Fuente-López

et al. 2023

Virtual Reality

Real-time hand segmentation is a key process in applications that require human–computer interaction, such as gesture recognition or augmented reality systems. However, the infinite shapes and orientations that hands can adopt, their variability in skin pigmentation and the self-occlusions that continuously appear in images make hand segmentation a truly complex problem, especially with uncontrolled lighting conditions and backgrounds. The development of robust, real-time hand segmentation algorithms is essential to achieve immersive augmented reality and mixed reality experiences by correctly interpreting collisions and occlusions. In this paper, we present a simple but powerful algorithm based on the MediaPipe Hands solution, a highly optimized neural network. The algorithm processes the landmarks provided by MediaPipe using morphological and logical operators to obtain the masks that allow dynamic updating of the skin color model. Different experiments were carried out comparing the influence of the color space on skin segmentation, with the CIELab color space chosen as the best option. An average intersection over union of 0.869 was achieved on the demanding Ego2Hands dataset running at 90 frames per second on a conventional computer without any hardware acceleration. Finally, the proposed segmentation procedure was implemented in an augmented reality application to add hand occlusion for improved user immersion. An open-source implementation of the algorithm is publicly available at https://github.com/itap-robotica-medica/lightweight-hand-segmentation.
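
The abstract above reports segmentation quality as intersection over union (IoU). As a reminder of how that metric is defined, here is a minimal sketch in plain Python. It assumes binary masks stored as equally sized nested lists of 0/1 values; the function name, the toy masks, and the convention of returning 1.0 for two empty masks are illustrative assumptions, not taken from the cited paper or its repository.

```python
def intersection_over_union(pred, truth):
    """Return the IoU of two binary masks given as nested lists of 0/1."""
    same_rows = len(pred) == len(truth)
    if not same_rows or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("masks must have the same shape")

    intersection = 0  # pixels labelled as hand in both masks
    union = 0         # pixels labelled as hand in at least one mask
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1

    # Assumed convention: two empty masks count as perfect agreement.
    return intersection / union if union else 1.0


# Hypothetical 2x3 masks, for illustration only.
predicted = [[0, 1, 1],
             [0, 1, 0]]
reference = [[0, 1, 0],
             [1, 1, 0]]

iou = intersection_over_union(predicted, reference)
```

Here two pixels are marked as hand in both masks and four are marked in at least one, so `iou` is 0.5. An average IoU over a dataset, as quoted in the abstract, would be the mean of this value across frames.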
