Adaptation of Tacotron2-based Text-To-Speech for Articulatory-to-Acoustic Mapping using Ultrasound Tongue Imaging

Zainkó, Csaba; Tóth, László; Shandiz, Amin Honarmandi; Gosztolya, Gábor; Markó, Alexandra; Németh, Géza; Csapó, Tamás Gábor

doi:10.21437/ssw.2021-10

Cited by 3 publications

(3 citation statements)

References 0 publications


“…While this simple arrangement already performs reasonably well, significant improvement can be achieved by involving the input context, that is, by using a block of video frames as input instead of just one image. Several network architectures have been proposed to process 3D blocks of input data, for video processing in general [24,25,26], and for ultrasound input in particular [12,14,27,16,28]. In the experimental section we will experiment both with 2D and 3D Convolutional Neural Networks (CNNs) for the mapping task.…”

Section: The UTI-to-speech Framework (mentioning)

confidence: 99%
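The idea in the statement above, feeding a block of consecutive video frames instead of a single image, can be sketched as a sliding window over the frame sequence. This is only an illustrative sketch: the function name `frame_blocks`, the block size of 3, and the stride are assumptions made here, not values taken from the cited work.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def frame_blocks(
    frames: Sequence[Frame], block_size: int, stride: int = 1
) -> List[List[Frame]]:
    """Group consecutive frames into (possibly overlapping) blocks.

    Each block holds `block_size` neighbouring frames, so a model that
    receives one block sees temporal context rather than a single image.
    `block_size` and `stride` are hypothetical parameters for this sketch.
    """
    if block_size < 1 or stride < 1:
        raise ValueError("block_size and stride must be positive")
    last_start = len(frames) - block_size
    # No block is produced when the sequence is shorter than block_size.
    return [
        list(frames[start:start + block_size])
        for start in range(0, last_start + 1, stride)
    ]


# Six dummy "frames"; each is a 2x2 image filled with its time index.
frames = [[[t, t], [t, t]] for t in range(6)]
blocks = frame_blocks(frames, block_size=3)
print(len(blocks), "blocks of", len(blocks[0]), "frames each")
```

Under these assumptions, a 2D CNN would consume one frame at a time, while a 3D CNN would consume one such block, with the frame index acting as the extra (time) dimension.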

“…Ideally, these interfaces would record the articulation and synthesize speech based on the movement of the organs - without the user of the device actually producing any sound. The typical input of AAM can be a video of the lip movements [3,4,5,6,7,8], ultrasound tongue imaging (UTI) [3,9,10,11,12,13,14,15,16,17], or several other modalities (e.g., MRI, EMA, PMA, EOS, radar, multimodal, etc.). All of the articulatory tracking devices are highly sensitive to 1) the alignment of the recording equipment across sessions, 2) the actual speaker's anatomy.…”

Section: Introduction (mentioning)

confidence: 99%


Legal education at the crossroads of social responsibility and the development of individual competencies – experience at the University of Szeged (Hungary)

Tóth, Kálmán (2021), Acta Iuris Stetinensis


The authors describe the role of clinical legal education as an instrument for community empowerment through pro bono legal counseling, an easily understandable e-compilation (Vademecum) of legal terms, and extended legal practice for law students, together with the development of their professional competencies. The new requirements in legal education, as determined by the government in 2016, focus on labour market needs, but academics and leaders of the University of Szeged have created an amalgam of tools for access to justice for local residents and NGOs in a less wealthy social environment, thereby introducing changes.

Footnote 1: The research for this study was carried out with the support of the programs of the Hungarian Ministry of Justice enhancing the standards of legal education.



“…While this simple arrangement already performs reasonably well, significant improvement can be achieved by involving the input context, that is, by using a block of video frames as input instead of just one image. Several network architectures have been proposed to process 3D blocks of input data, for video processing in general [24,48,104], and for ultrasound input in particular [53,86,102,113]. In the experimental section we will experiment both with 2D and 3D Convolutional Neural Networks (CNNs) for the mapping task.…”

Section: The UTI-to-speech Framework (mentioning)

confidence: 99%

Improvements of Silent Speech Interface Algorithms

Honarmandi Shandiz


Gammatone filter features are another type of speech feature extraction method, based on modeling the human auditory system. They are calculated by filtering the speech signal with a bank of gammatone filters, which are modeled after the tuning of the auditory system's hair cells. The output of each filter is then rectified and low-pass filtered, and the resulting signals are used as features. This function is commonly used in ANNs as an activation function for hidden layers. It is able to produce speech with natural-sounding intonation and prosody.


A systematic review of the application of machine learning techniques to ultrasound tongue imaging analysis

Xia, Yuan, Cao et al. (2024), The Journal of the Acoustical Society of America


B-mode ultrasound has emerged as a prevalent tool for observing tongue motion in speech production, gaining traction in speech therapy applications. However, the effective analysis of ultrasound tongue image frame sequences (UTIFs) encounters many challenges, such as the presence of high levels of speckle noise and obscured views. Recently, the application of machine learning, especially deep learning techniques, to UTIF interpretation has shown promise in overcoming these hurdles. This paper presents a thorough examination of the existing literature, focusing on UTIF analysis. The scope of our work encompasses four key areas: a foundational introduction to deep learning principles, an exploration of motion tracking methodologies, a discussion of feature extraction techniques, and an examination of cross-modality mapping. The paper concludes with a detailed discussion of insights gleaned from the comprehensive literature review, outlining potential trends and challenges that lie ahead in the field.

