2021 IEEE Spoken Language Technology Workshop (SLT) 2021
DOI: 10.1109/slt48900.2021.9383619

TaL: A Synchronised Multi-Speaker Corpus of Ultrasound Tongue Imaging, Audio, and Lip Videos

Abstract: We present the Tongue and Lips corpus (TaL), a multi-speaker corpus of audio, ultrasound tongue imaging, and lip videos. TaL consists of two parts: TaL1 is a set of six recording sessions of one professional voice talent, a male native speaker of English; TaL80 is a set of recording sessions of 81 native speakers of English without voice talent experience. Overall, the corpus contains 24 hours of parallel ultrasound, video, and audio data, of which approximately 13.5 hours are speech. This paper describes the …

Citations: cited by 27 publications (25 citation statements)
References: 33 publications
“…A few frames were rejected if they had no discernible features leaving a total of 520 test frames. The recordings comprised: A total of 10 recordings from 6 TaL Corpus [32] adult speakers (Micro system, 90° FOV, 64-element 3 MHz, 20 mm radius convex depth 80, 81 fps). These recordings were the first few recordings from the corpus and not specially selected.…”
Section: Methods (mentioning)
confidence: 99%
“…We experimented with four English male (03mn, 04me, 05ms, 07me) and four female subjects (01fi, 02fe, 06fe, and 09fe) from the UltraSuite-TaL80 database [19] (https://ultrasuite.github.io/data/tal_corpus/). In parallel with speech (digitized at 48 kHz), the tongue movement was recorded in midsagittal orientation using the "Micro" ultrasound system of Articulate Instruments Ltd. at 81.5 fps.…”
Section: Data (mentioning)
confidence: 99%
“…The use of large, open vocabulary continuous speech recognition (LVCSR) to substitute human listening evaluations is a recent innovation. For instance, an open source LVCSR system available from [15] was also used to evaluate TTS intelligibility in [16]. Previously, only closed vocabulary ASR had been used for transcription tasks, as in [17].…”
Section: Introduction (mentioning)
confidence: 99%