The purpose of this paper is twofold. First, we introduce our Microsoft Kinect-based video dataset of American Sign Language (ASL) signs, designed for body part detection and tracking research. Because the dataset includes scene depth information, it allows researchers to experiment with more than two-dimensional (2D) color video in gesture recognition projects. Depth not only makes it easier to locate body parts such as the hands; it also helps resolve an inherent ambiguity of 2D video, in which two completely different gestures that share a similar 2D trajectory projection can be difficult to distinguish from one another. Second, because an accurate hand locator is a critical element in any automated gesture or sign language recognition tool, this paper assesses the efficacy of one popular open-source user skeleton tracker by examining its performance on randomly selected signs from this dataset. We compare the hand positions reported by the skeleton tracker to ground-truth positions obtained from manual hand annotations of each video frame. This study is intended to establish a benchmark for assessing more advanced detection and tracking methods that utilize scene depth data. For illustration, we compare one of the single-hand detection methods previously developed in our lab against this benchmark.
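To make the evaluation concrete, the sketch below shows one way to score a tracker against per-frame manual annotations using mean Euclidean distance in the image plane. This is a minimal illustration with hypothetical variable names, not the paper's implementation; in particular, whether error is measured in pixels or in 3D world coordinates, and how tracker misses are handled, are assumptions made here.

```python
# Minimal sketch of per-frame tracker evaluation (assumed setup, not the
# authors' code): `tracked` and `annotated` are lists of (x, y) pixel
# coordinates for one hand, one entry per video frame. Frames where the
# tracker reports no hand are marked with None.
import math

def mean_hand_error(tracked, annotated):
    """Mean Euclidean distance (in pixels) between tracker output and
    manual ground-truth annotations, skipping frames the tracker missed."""
    errors = []
    for est, truth in zip(tracked, annotated):
        if est is None:
            continue  # tracker failure on this frame; could be counted separately
        dx, dy = est[0] - truth[0], est[1] - truth[1]
        errors.append(math.hypot(dx, dy))
    return sum(errors) / len(errors) if errors else float("nan")

# Example: three frames, with one tracker miss on the second frame.
tracked = [(120, 84), None, (131, 90)]
annotated = [(118, 86), (125, 88), (130, 92)]
print(mean_hand_error(tracked, annotated))  # ~2.5 px
```

A fuller evaluation would typically also report the miss rate (fraction of frames with no tracker output) alongside the positional error, since averaging only over detected frames can flatter an unreliable tracker.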