2022
DOI: 10.3390/s22031133
Beyond the Edge: Markerless Pose Estimation of Speech Articulators from Ultrasound and Camera Images Using DeepLabCut

Abstract: Automatic feature extraction from images of speech articulators is currently achieved by detecting edges. Here, we investigate the use of pose estimation deep neural nets with transfer learning to perform markerless estimation of speech articulator keypoints using only a few hundred hand-labelled images as training input. Midsagittal ultrasound images of the tongue, jaw, and hyoid and camera images of the lips were hand-labelled with keypoints, trained using DeepLabCut and evaluated on unseen speakers and syst…

Cited by 18 publications (11 citation statements)
References 33 publications
“…Still ultrasound images were aligned to the audio recording using pulses generated by the Articulate Instruments PStretch unit, recorded in AAA alongside the speech signal. Tongue splines were automatically fit with DeepLabCut (Mathis et al, 2018;Nath et al, 2019) using the MobileNet1.0-based neural network implemented in AAA (Wrench and Balch-Tomes, 2022). Tongue coordinates were rotated to a common horizontal plane (Lawson et al, 2019;Scobbie et al, 2011) by visually estimating the orientation of the ultrasound probe and camera from the side-view lip video data.…”
Section: Discussion (mentioning)
confidence: 99%
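The rotation to a common horizontal plane described in the excerpt above amounts to applying a 2-D rotation matrix to each tongue-spline coordinate. A minimal numpy sketch of that step — the function name and the 15-degree probe angle are illustrative assumptions, not values from the cited study:

```python
import numpy as np

def rotate_to_horizontal(points, probe_angle_deg):
    """Rotate (N, 2) occlusal-space coordinates so the visually
    estimated probe axis becomes the horizontal plane."""
    theta = np.deg2rad(-probe_angle_deg)  # undo the estimated probe tilt
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    return points @ rot.T  # apply the rotation to every row at once

# Example: a unit vector tilted 15 degrees is brought back level.
spline = np.array([[0.0, 0.0],
                   [np.cos(np.deg2rad(15.0)), np.sin(np.deg2rad(15.0))]])
level = rotate_to_horizontal(spline, 15.0)
```

After the rotation, `level[1]` lies on the horizontal axis at (1, 0), so all speakers' splines can be compared in a shared coordinate frame.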
“…Human pose estimation technology such as Convolutional Pose Machines (CPM) and convolutional neural network (CNN) based methods, which allow extraction of human movement information directly from video clips, have been repeatedly tested by researchers [85, 86], while human pose estimation applications for analyzing movement in disease populations were reported to be useful by the studies in our review [14, 16–25, 27, 29, 32–38, 41, 44, 50–55, 57, 66, 71–74]. Given that such trajectory extraction methods are in rapid evolution and are becoming more mature for promising identification of posture [87–89], using a hand-held camera or smartphone as the MMC system would be especially beneficial for understanding the motor performance of individuals in their daily living tasks, hence providing valuable information on levels of impairment and on the constraints that patients might encounter in their activities of daily living in their real-life environment. It is understandable that individuals, particularly young children and older people, might behave differently when they are placed for motion capture in an unfamiliar laboratory or a simulated environment, thus risking the possibility that the motion analysis might not truly reflect the individuals’ actual movement patterns [90].…”
Section: Discussion (mentioning)
confidence: 99%
“…While DLC has been used extensively to track animal and human features during movements, its ability to track features in US videos has been minimally explored. [46] used DLC to track the upper surface of the tongue and compared it to other US contour estimators, concluding that DLC requires significantly less training data to perform with the same level of accuracy. [47] used DLC to track the gastrocnemius muscle–tendon junction, observing the morphology of the lower leg longitudinally.…”
Section: Methods (mentioning)
confidence: 99%
“…The manually labeled data of both Group 1 and Group 2 was restructured into the appropriate file types for DLC to use for training. The training data was augmented using the imgaug method ( https://github.com/aleju/imgaug ) and a 50-layer ResNet network was re-trained using this data for 500k iterations, where error commonly plateaus [46] for ResNet50.…”
Section: Methods (mentioning)
confidence: 99%
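The excerpt above augments training images with imgaug before retraining ResNet50. imgaug itself offers a large menu of chained transforms; as a rough toy illustration of what such augmentation does (this is not the study's actual pipeline, and the transform choices here are assumptions), a numpy version of two common transforms:

```python
import numpy as np

def augment(image, rng):
    """Toy augmentation: random horizontal flip plus mild additive
    Gaussian noise, two transforms of the kind imgaug provides.
    `image` is a 2-D grayscale array with values in [0, 1]."""
    out = image.copy()
    if rng.random() < 0.5:      # flip left-right half the time
        out = out[:, ::-1]
    out = out + rng.normal(0.0, 0.02, size=out.shape)  # sensor-like noise
    return np.clip(out, 0.0, 1.0)  # stay in valid intensity range

rng = np.random.default_rng(0)
frame = np.linspace(0.0, 1.0, 16).reshape(4, 4)      # stand-in "image"
batch = [augment(frame, rng) for _ in range(4)]      # 4 augmented copies
```

Each labelled frame yields several perturbed copies, which is how a few hundred hand-labelled images can supply enough variation to fine-tune a 50-layer network.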