2021
DOI: 10.1016/j.media.2021.102131

A deep joint sparse non-negative matrix factorization framework for identifying the common and subject-specific functional units of tongue motion during speech

citations
Cited by 11 publications
(7 citation statements)
references
References 43 publications
“…We then computed a sequence of voxel-level motion fields during the speech tasks from tagged MRI. 1,14 The data for this work were collected using a Siemens 3.0T TIM Trio system equipped with a 12-channel head coil and a 4-channel neck coil, using a segmented gradient echo sequence. 15,16 The 4D motion fields x has the size of 3 × 128 × 128 × 128 × 26.…”
Section: Results (mentioning)
confidence: 99%
“…This can help us associate tongue and oropharyngeal muscle deformation with its corresponding acoustic information. Internal tissue point tracking data from three-dimensional (3D) tagged MRI 1 sequences contain far more information about the tongue and oropharyngeal motion than does the more conventional two-dimensional (2D) mid-sagittal image sequences obtained from cine-MRI 2 and tagged MRI. 3 Yet, associating these four-dimensional (4D) deformation fields with speech audio waveforms poses the following challenges: 1) efficient feature extraction from complex and high-dimensional tongue and oropharyngeal deformation and 2) heterogeneous data representations between 4D motion fields and high-frequency one-dimensional (1D) audio waveforms.…”
Section: Introduction (mentioning)
confidence: 99%
“…We generated a sequence of voxel-level motion fields during the speech tasks from tagged MRI. 13,14 The resulting 4D motion fields x consist of 26 frames with a size of 3 × [(128 × 128 × 128) × 26], where each voxel of the motion fields has three channels to represent 3D directions. As for the corresponding audio waveforms, their lengths range from 21,832 to 24,175.…”
Section: Methods (mentioning)
confidence: 99%
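
To give a sense of scale for the motion-field dimensions quoted above, the element count can be worked out directly. This is a rough sketch only: the shape tuple is taken from the quoted statement, while the 4-byte (float32) storage size is an assumption made for illustration and is not stated in the source.

```python
from math import prod

# Shape from the quoted statement: 3 direction channels,
# a 128 x 128 x 128 voxel grid, and 26 time frames.
shape = (3, 128, 128, 128, 26)

voxels_per_frame = prod(shape[1:4])   # 128 * 128 * 128
n_values = prod(shape)                # total scalar values in the 4D field

# Assumption (not from the source): each value is stored as a 4-byte float32.
BYTES_PER_VALUE = 4
size_mib = n_values * BYTES_PER_VALUE / 2**20

print(f"voxels per frame: {voxels_per_frame:,}")
print(f"total values:     {n_values:,}")
print(f"approx. size:     {size_mib:.0f} MiB (float32 assumed)")
```

Under that assumption a single sequence occupies several hundred MiB, which is consistent with the "efficient feature extraction" challenge raised in the Introduction statement above.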
“…The definition of deep unrolling was proposed by Gregor and LeCun [69], who unrolled the iterative shrinkage/thresholding algorithm (ISTA) to solve the optimization problem for sparse coding and achieved a nearly 20-fold improvement in time efficiency. Recently, by providing the neural network interpretability of iterative sparse coding with fewer layers and faster convergence, the ISTA-based deep unrolling algorithm has achieved great success in solving inverse problems for biomedical imaging [70], exploiting multimodal side information for image superresolution [71], and implementing nonnegative matrix factorization for functional unit identification [72].…”
Section: Interpretable Deep Algorithm Unrolling (mentioning)
confidence: 99%
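
For readers unfamiliar with ISTA, the iteration that the statement above refers to can be sketched in a few lines. This is a minimal, generic version of plain ISTA for the sparse coding problem min_z 0.5·||Dz − y||² + λ·||z||₁, not the cited framework: the names `D`, `y`, `lam`, and `n_iter` are hypothetical, and no weights are learned. A deep-unrolled variant would instead fix a small number of these iterations as network layers and learn their parameters from data.

```python
import numpy as np

def soft_threshold(v, tau):
    """Element-wise shrinkage: the proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def objective(D, y, z, lam):
    """Sparse coding objective: 0.5 * ||D z - y||^2 + lam * ||z||_1."""
    return 0.5 * np.sum((D @ z - y) ** 2) + lam * np.sum(np.abs(z))

def ista(D, y, lam=0.1, n_iter=200):
    """Plain ISTA with fixed step size 1/L (illustrative, no learned weights)."""
    # L is the Lipschitz constant of the gradient of the quadratic term,
    # i.e. the squared spectral norm of D.
    L = np.linalg.norm(D, 2) ** 2
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ z - y)                   # gradient step on the data term
        z = soft_threshold(z - grad / L, lam / L)  # shrinkage step on the L1 term
    return z

# Small synthetic example (made-up data, for illustration only).
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))
z_true = np.zeros(50)
z_true[[3, 17, 42]] = [1.5, -2.0, 1.0]
y = D @ z_true

z_hat = ista(D, y, lam=0.1, n_iter=200)
print("objective at zero:", objective(D, y, np.zeros(50), 0.1))
print("objective at z_hat:", objective(D, y, z_hat, 0.1))
```

Each loop iteration is one gradient step followed by one shrinkage step; this pair is the unit that unrolling turns into a layer.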