2022
DOI: 10.3390/s22228601

Optimizing the Ultrasound Tongue Image Representation for Residual Network-Based Articulatory-to-Acoustic Mapping

Abstract: Within speech processing, articulatory-to-acoustic mapping (AAM) methods can use ultrasound tongue imaging (UTI) as input. (Micro)convex transducers are most commonly used, and they provide a wedge-shaped visual image. However, this image is optimized for visual inspection by the human eye, and the signal is often post-processed by the equipment. With newer ultrasound equipment, it is now possible to access the raw scanline data (i.e., the ultrasound echo return) without any internal post-processing. In th…
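The abstract contrasts the wedge-shaped image produced by (micro)convex transducers with the raw scanline data. To make that relationship concrete, here is a minimal sketch of nearest-neighbour scan conversion, which maps a (scanlines × samples) array onto a Cartesian wedge. It assumes a hypothetical fan geometry (a virtual apex at the origin, scanlines evenly spaced in angle over `fov_deg`, sample k at radius `r0 + k * dr`); `scan_convert` and all of its parameters are illustrative, and the abstract does not say how any particular device post-processes its images.

```python
import numpy as np


def scan_convert(scanlines, fov_deg=90.0, r0=10.0, dr=0.5, out_shape=(128, 128)):
    """Nearest-neighbour scan conversion of raw scanline data to a wedge image.

    scanlines: array of shape (n_lines, n_samples), one row per scanline.
    Hypothetical geometry: scanlines fan out from a virtual apex at the
    origin, evenly spaced in angle across fov_deg, and sample k of every
    line lies at radius r0 + k * dr (arbitrary length units).
    """
    n_lines, n_samples = scanlines.shape
    half_fov = np.deg2rad(fov_deg) / 2.0
    r_max = r0 + (n_samples - 1) * dr

    # Cartesian grid that encloses the wedge: x is lateral, z is depth.
    h, w = out_shape
    x_extent = r_max * np.sin(half_fov)
    z_min = r0 * np.cos(half_fov)
    x, z = np.meshgrid(np.linspace(-x_extent, x_extent, w),
                       np.linspace(z_min, r_max, h))

    # Inverse mapping: every output pixel -> (radius, angle) in the fan.
    r = np.hypot(x, z)
    theta = np.arctan2(x, z)  # 0 rad points straight along the probe axis

    # Pixels that actually fall inside the imaged wedge.
    inside = (np.abs(theta) <= half_fov) & (r >= r0) & (r <= r_max)

    # Nearest scanline and nearest depth sample for each pixel.
    line_idx = np.rint((theta + half_fov) / (2 * half_fov) * (n_lines - 1)).astype(int)
    sample_idx = np.rint((r - r0) / dr).astype(int)

    # Pixels outside the wedge stay 0; inside pixels copy the nearest raw sample.
    image = np.zeros(out_shape, dtype=scanlines.dtype)
    image[inside] = scanlines[line_idx[inside], sample_idx[inside]]
    return image


if __name__ == "__main__":
    raw = np.random.default_rng(0).random((64, 200))  # stand-in for raw echo data
    wedge = scan_convert(raw)
    print(raw.shape, "->", wedge.shape)  # prints (64, 200) -> (128, 128)
```

Because every output pixel is looked up from the nearest raw sample, the wedge image is a resampling of the scanline data rather than new information; interpolating scan converters (e.g. bilinear) are a common alternative to the nearest-neighbour choice made here.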

Cited by 9 publications (9 citation statements)
References: 54 publications

“…The use of articulatory information in speech technology is less mature than standard speech recognition or speech synthesis; those methods using articulatory signals are currently at the basic research level and do not yet have applications for everyday people. Most related research deals with how articulatory information can be used to extend speech technology, for example as the input or output of the system, such as articulatory-to-acoustic mapping [9], [15], [16] or acoustic-to-articulatory inversion [17], [27].…”
Section: Discussion (mentioning)
confidence: 99%
“…SSI systems represent a revolutionary direction in speech technology, in which silent articulatory movements are captured by a device and speech is automatically generated from them while the original speaker does not make a sound [3]. In most previous research on SSI, only a few speakers have been studied [3]-[9]. Although the results of these studies are encouraging, further research is needed to develop session- and speaker-independent SSI systems [10].…”
Section: A. The Relationship Between Articulatory Movement and Speech ... (mentioning)
confidence: 99%
“…2.1 The layers of the 2D and 3D CNNs in the Keras implementation, along with their most important parameters.…”
Section: List of Tables (mentioning)
confidence: 99%
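The listed table names 2D and 3D CNNs but not how their inputs are arranged. One common arrangement, assumed here rather than taken from the cited work, feeds a 2D CNN one ultrasound frame at a time and a 3D CNN a short block of consecutive frames. With Keras' default `channels_last` layout, `Conv2D` expects inputs shaped `(batch, height, width, channels)` and `Conv3D` expects `(batch, depth, height, width, channels)`. The helper names and the `n_context` window length below are hypothetical.

```python
import numpy as np


def frames_for_2d_cnn(frames):
    """One training sample per frame: (T, H, W) -> (T, H, W, 1)."""
    return frames[..., np.newaxis]


def frames_for_3d_cnn(frames, n_context=5):
    """Overlapping blocks of n_context consecutive frames.

    (T, H, W) -> (T - n_context + 1, n_context, H, W, 1); block t covers
    frames t .. t + n_context - 1, so the time axis plays the role of depth.
    """
    n_frames = frames.shape[0]
    if n_frames < n_context:
        raise ValueError("need at least n_context frames")
    blocks = np.stack([frames[t:t + n_context] for t in range(n_frames - n_context + 1)])
    return blocks[..., np.newaxis]


if __name__ == "__main__":
    video = np.zeros((20, 64, 128))  # hypothetical: 20 frames of 64 x 128 pixels
    print(frames_for_2d_cnn(video).shape)  # prints (20, 64, 128, 1)
    print(frames_for_3d_cnn(video).shape)  # prints (16, 5, 64, 128, 1)
```

The trade-off this illustrates: the 3D input lets the network see short-term tongue motion across neighbouring frames, at the cost of an n_context-fold larger input per sample.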
“…In addition to using the voice activity detection (VAD) implementation available from WebRTC [1], we trained a CNN to perform VAD directly from ultrasound images. […] involved using a simple frame-by-frame approach, where a single image was used as input and a 2D-CNN was applied to classify each frame as either silence or speech (SI/SP).…”
Section: Conv3d (mentioning)
confidence: 99%
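The frame-by-frame approach treats every ultrasound image independently. As a minimal NumPy sketch of that structure (NumPy 1.20+ for `sliding_window_view`), each frame below passes through a toy 2D CNN (one convolutional layer with ReLU, global average pooling, and a sigmoid output) and is labelled SI or SP by thresholding the speech probability. The layer sizes, the 0.5 threshold, the function names, and the hand-set weights in the usage example are all hypothetical; a real model would learn its weights (for instance in Keras), and the citing text does not specify its architecture at this point.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d_valid(image, kernel):
    """Single-channel 'valid' 2D cross-correlation, as computed by CNN conv layers."""
    windows = sliding_window_view(image, kernel.shape)  # (H-kh+1, W-kw+1, kh, kw)
    return np.einsum("ijkl,kl->ij", windows, kernel)


def classify_frame(frame, kernels, biases, w_out, b_out):
    """Toy 2D CNN: conv + ReLU -> global average pooling -> sigmoid P(speech)."""
    pooled = [np.maximum(conv2d_valid(frame, k) + b, 0.0).mean()
              for k, b in zip(kernels, biases)]
    logit = np.dot(w_out, pooled) + b_out
    return 1.0 / (1.0 + np.exp(-logit))


def frame_by_frame_vad(frames, params, threshold=0.5):
    """Classify each ultrasound frame independently as silence (SI) or speech (SP)."""
    return ["SP" if classify_frame(f, **params) >= threshold else "SI" for f in frames]


if __name__ == "__main__":
    # Hand-set toy weights (hypothetical, untrained): brighter frames -> speech.
    params = dict(kernels=[np.ones((3, 3)) / 9.0], biases=[0.0],
                  w_out=np.array([10.0]), b_out=-5.0)
    frames = np.stack([np.zeros((16, 16)), np.ones((16, 16))])
    print(frame_by_frame_vad(frames, params))  # prints ['SI', 'SP']
```

Since no frame sees its neighbours, isolated misclassifications are possible; that limitation is what models with temporal context, such as the 3D CNNs mentioned above, are meant to address.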