2020
DOI: 10.1109/taslp.2020.3036182

Vocal Tract Contour Tracking in rtMRI Using Deep Temporal Regression Network

Abstract: Recent advances in real-time Magnetic Resonance Imaging (rtMRI) provide an invaluable tool to study speech articulation. In this paper, we present an effective deep learning approach for supervised detection and tracking of vocal tract contours in a sequence of rtMRI frames. We train a single input multiple output deep temporal regression network (DTRN) to detect the vocal tract (VT) contour and the separation boundary between different articulators. The DTRN learns the non-linear mapping from an overlapping f…

Citations: cited by 8 publications (5 citation statements)
References: 31 publications
“…Previous methodologies have implemented greater automation by leveraging deep learning techniques. However, these rely on training data sourced from a relatively narrow sample of behaviours, imaging centres, and pulse sequences (e.g., Asadiabadi & Erzin, 2020; Bresch & Narayanan, 2009; Eslami et al, 2020; Labrunie et al, 2018; van Leeuwen et al, 2019; Mannem & Ghosh, 2021; Pandey & Sabbir Arif, 2021; Ruthven et al, 2021; Silva & Teixeira, 2015; Somandepalli et al, 2017; Takemoto et al, 2019; Valliappan et al, 2019). While our pipeline is more labour-intensive, we have demonstrated that it generalises beyond the dataset for which it was designed (see Belyk et al, 2022).…”
Section: Discussion (citation type: mentioning)
confidence: 99%
“…A promising avenue for development has come from the application of machine learning, and deep learning algorithms in particular (Goodfellow et al, 2016). This is a field of statistics that continues to develop rapidly, and several implementations have been proposed for applications to rtMRI (e.g., Asadiabadi & Erzin, 2020; Bresch & Narayanan, 2009; Eslami et al, 2020; Labrunie et al, 2018; van Leeuwen et al, 2019; Mannem & Ghosh, 2021; Pandey & Sabbir Arif, 2021; Ruthven et al, 2021; Silva & Teixeira, 2015; Somandepalli et al, 2017; Takemoto et al, 2019; Valliappan et al, 2019). Deep-learning-based approaches can yield machine-generated traces of the vocal tract which are sufficiently accurate to be useful for scientific measurements.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…Previous methodologies have implemented greater automation by leveraging powerful machine learning techniques. However, these make strong assumptions about the underlying data or rely on models which are trained from a narrow sample which reduces their generalisability to new research [12][13][14][15][16]. We have demonstrated that our pipeline generalises beyond the dataset for which it was designed.…”
Section: Discussion (citation type: mentioning)
confidence: 99%
“…Furthermore, those approaches which are available have not yet been demonstrated to generalise beyond the individual datasets for which they were developed. These methodologies typically involve automated or machine learning processes which are trained and tested against a narrow range of data, typically composed of a small number of speakers scanned at a single imaging centre [12][13][14][15][16][17][18]. The development of these techniques has sampled disproportionally from a single image repository [6].…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…Initially proposed for tracking the tongue contour in ultrasound images with autoencoders [31], these techniques have been used for MRI images with the help of U-Net [32] as we did [21]. Other methods like Deep Temporal Regression Network (DTRN), which take into account a series of images, present the advantage of integrating the movement for tracking several articulators [33].…”
Section: Tracking Articulators (citation type: mentioning)
confidence: 99%