Optimizing the Ultrasound Tongue Image Representation for Residual Network-Based Articulatory-to-Acoustic Mapping

Csapó, Tamás Gábor; Gosztolya, Gábor; Tóth, László; Shandiz, Amin Honarmandi; Markó, Alexandra

doi:10.3390/s22228601

Cited by 9 publications

(9 citation statements)

References 54 publications


“…The use of articulatory information in speech technology is less mature than standard speech recognition or speech synthesis; those methods using articulatory signals are currently at the basic research level and do not yet have applications for everyday people. Most related research deals with the way how articulatory information can be used to extend speech technology, for example as input or output of the system, like articulatory-to-acoustic mapping [9], [15], [16] or acoustic-to-articulatory inversion [17], [27].…”

Section: Discussion (mentioning)

confidence: 99%

“…SSI systems represent a revolutionary direction in speech technology, where silent articulatory movements are captured by some device and from this speech is automatically generated while the original speaker does not make a sound [3]. In most previous research on SSI, only a few speakers have been studied [3]-[9]. Although the results of these studies are encouraging, further research is needed to develop session- and speaker-independent SSI systems [10].…”

Section: A the Relationship Between Articulatory Movement And Speech ... (mentioning)

confidence: 99%

“…The ultrasound tongue images were used as 8-bit grayscale pixels in the raw ultrasound form of the "Micro" system. The images, originally 64x842 pixels, were resized to 64x128 pixels as this does not cause significant loss of information [9]. The ultrasound image is relatively redundant and can therefore be compressed efficiently, which can be an advantage in subsequent processing, as we only need to work with data of smaller dimensions.…”

Section: Preprocessing the Articulation Data (mentioning)

confidence: 99%
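The resizing step quoted above (64x842 to 64x128) can be illustrated with a minimal sketch. The citing paper does not say which interpolation it used, so the linear interpolation here is an assumption, as are the function name `resize_width`, the synthetic test frame, and the reading of a frame as 64 rows of 842 samples. No anti-aliasing is applied, which a production pipeline would likely want.

```python
def resize_width(image, new_width):
    """Resize every row of a 2D image (list of rows) to new_width samples.

    Uses plain linear interpolation between neighbouring samples.
    This is an illustrative choice, not the method of the cited work.
    """
    old_width = len(image[0])
    if old_width < 2 or new_width < 2:
        raise ValueError("both widths must be at least 2")
    # Map output index j to a fractional position in the input row.
    scale = (old_width - 1) / (new_width - 1)
    resized = []
    for row in image:
        new_row = []
        for j in range(new_width):
            x = j * scale
            left = int(x)
            right = min(left + 1, old_width - 1)
            frac = x - left
            value = (1.0 - frac) * row[left] + frac * row[right]
            new_row.append(int(round(value)))  # keep 8-bit style integers
        resized.append(new_row)
    return resized


# Synthetic 8-bit "frame": 64 rows of 842 samples (hypothetical data).
frame = [[(r + c) % 256 for c in range(842)] for r in range(64)]
small = resize_width(frame, 128)
print(len(small), len(small[0]))  # prints: 64 128
```

The output keeps all 64 rows and reduces each one from 842 to 128 samples, so each frame shrinks to roughly 15% of its original size.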


Is Dynamic Time Warping of speech signals suitable for articulatory signal comparison using ultrasound tongue images?

Csapó

2023

1st Workshop on Intelligent Infocommunication Networks, Systems and Services


In speech technology, the examination of speaker dependency is vital, that is, whether methods developed for one speaker can be adapted to another speaker or not. In the case of text-to-speech synthesis, well-usable speaker adaptation methods are already available, but they cannot be used directly for articulatory data (movement of the tongue, lips, etc., during speech production). In this research, we investigate the above question and analyze the speaker dependency of the articulatory movement, using audio signal and ultrasound tongue imaging (UTI) recorded in parallel during speech production. For the comparison, we use the well-known Dynamic Time Warping (DTW) procedure of speech technology. DTW of the speech signal has already been successfully applied 1) with UTI, for within-speaker comparisons, 2) with electromagnetic articulography (EMA), for the analysis of inter-speaker differences, 3) with EMA and electrocorticography (ECoG), also for inter-speaker comparisons. However, there has been no previous research yet on the application of DTW on speech signals with ultrasound tongue images for different speakers. In the present research, we examine the applicability of DTW for comparing speakers' speech and articulatory data on a few Hungarian and English examples, and visually analyze them. In the long term, we plan to use the results for speech-based brain-computer interfaces, so that we can supplement the brain signal with ultrasound-based articulation information.
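The DTW procedure mentioned in the abstract can be sketched as follows. This is the textbook dynamic-programming form, not the authors' implementation. The toy one-dimensional "acoustic" sequences, the absolute-difference distance, and the names `dtw`, `speaker_a` and `speaker_b` are assumptions made for illustration. The idea is that an alignment path computed on the speech signals could then be used to pair up the ultrasound frames recorded in parallel.

```python
def dtw(seq_a, seq_b, dist):
    """Return (total_cost, path) aligning seq_a to seq_b with classic DTW.

    path is a list of (index_in_a, index_in_b) pairs from start to end.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning the first i and j frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # step in a only
                                 cost[i][j - 1],      # step in b only
                                 cost[i - 1][j - 1])  # step in both
    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [(cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1)]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path


# Toy per-frame acoustic features for two speakers (hypothetical values).
speaker_a = [0.0, 1.0, 2.0, 3.0]
speaker_b = [0.0, 1.0, 1.0, 2.0, 3.0]
total, path = dtw(speaker_a, speaker_b, lambda x, y: abs(x - y))
print(total)  # prints: 0.0
print(path)   # prints: [(0, 0), (1, 1), (1, 2), (2, 3), (3, 4)]
```

Each pair in `path` names one frame from each speaker. If ultrasound frames are synchronized with the audio frames, the same index pairs select the tongue images to compare across speakers.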


“…2.1 The layers of the 2D and 3D CNNs in the Keras implementation, along with their most important parameters.…”

Section: List Of Tables (mentioning)

confidence: 99%

“…In addition to using the VAD implementation available from WebRTC [1], we trained a CNN to perform VAD directly from ultrasound images. […] involved using a simple frame-by-frame approach, where a single image was used as input and a 2D-CNN was applied to classify each frame as either silence or speech (SI/SP).…”

Section: Conv3d (mentioning)

confidence: 99%

Improvements of Silent Speech Interface Algorithms

Honarmandi Shandiz


Gammatone filter features are another type of speech feature extraction method that is based on modeling the human auditory system. They are calculated by filtering the speech signal with a bank of gammatone filters, which are modeled after the tuning of the auditory system's hair cells. The output of each filter is then rectified and low-pass filtered, and the resulting signals are then used as features. […] This function is commonly used in ANNs as an activation function for hidden layers. […] It is able to produce speech with natural-sounding intonation and prosody.
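The gammatone feature pipeline described above (filter bank, rectification, low-pass filtering) can be sketched in plain Python. This is a rough illustration, not the thesis implementation. It assumes the common fourth-order gammatone impulse response with ERB-based bandwidths, half-wave rectification, and a one-pole low-pass filter. The centre frequencies, the 50 Hz cutoff, the 25 ms impulse-response length and all function names are chosen for the example only.

```python
import math


def erb_bandwidth(center_hz):
    """Equivalent rectangular bandwidth (Glasberg and Moore approximation)."""
    return 24.7 * (4.37 * center_hz / 1000.0 + 1.0)


def gammatone_impulse_response(center_hz, sample_rate, duration_s=0.025, order=4):
    """Sampled gammatone impulse response, normalized to a peak of 1."""
    bandwidth = 1.019 * erb_bandwidth(center_hz)
    taps = []
    for n in range(int(duration_s * sample_rate)):
        t = n / sample_rate
        taps.append(t ** (order - 1)
                    * math.exp(-2.0 * math.pi * bandwidth * t)
                    * math.cos(2.0 * math.pi * center_hz * t))
    peak = max(abs(v) for v in taps)
    return [v / peak for v in taps]


def fir_filter(signal, taps):
    """Causal FIR filtering by direct convolution (output length = input length)."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k in range(min(n + 1, len(taps))):
            acc += taps[k] * signal[n - k]
        out.append(acc)
    return out


def envelope(band_signal, sample_rate, cutoff_hz=50.0):
    """Half-wave rectify, then smooth with a one-pole low-pass filter."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    state, out = 0.0, []
    for v in band_signal:
        state += alpha * (max(v, 0.0) - state)
        out.append(state)
    return out


def gammatone_features(signal, sample_rate, center_freqs):
    """One smoothed envelope per filter-bank channel."""
    return [envelope(fir_filter(signal,
                                gammatone_impulse_response(fc, sample_rate)),
                     sample_rate)
            for fc in center_freqs]


sample_rate = 16000
# 50 ms of a 1 kHz test tone (synthetic input, not speech).
tone = [math.sin(2.0 * math.pi * 1000.0 * n / sample_rate) for n in range(800)]
features = gammatone_features(tone, sample_rate, [250.0, 1000.0, 4000.0])
band_means = [sum(band) / len(band) for band in features]
print(band_means.index(max(band_means)))  # prints: 1 (the 1 kHz channel)
```

For the 1 kHz test tone, the channel centred at 1 kHz carries by far the largest envelope, which is the frequency selectivity the filter bank is meant to provide.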


Automated Identification of Failure Cases in Organ at Risk Segmentation Using Distance Metrics: A Study on CT Data

Shandiz, Rádics, Tamada et al.

2024

Lecture Notes in Computer Science
