2019
DOI: 10.1109/access.2019.2952406

End-to-End-Based Tibetan Multitask Speech Recognition

Abstract: To date, speech recognition technology for majority languages has been applied in wireless communication devices successfully. However, as a minority language, Tibetan has very limited resources for conventional automatic speech recognition. It lacks enough data, sub-word units, lexicons, and word inventories for some dialects. In this paper, we present a multitask end-to-end model to perform simultaneous Tibetan speech content recognition, dialect identification, and speaker recognition. This model avoids …

Cited by 17 publications
(6 citation statements)
References 28 publications
“…It was seen that the journals "IEEE Transaction Speech Audio Processing" and "International Journal of Speech and Technology" given the highest number of papers in the study. [62], [63], [64], [65], [66], [67], [68], [69], [70], [44], [71], [72], [73], [74], [75], [76], [77], [78], [79], [80], [47], [81], [82], [83], [84], [85], [53], [86], [87], [5] Conferences 33 45.2 [88], [63], [89], [90], [91], [92], [93], [94], [95], [96], [97], [98], [99], [100], [22], [101], [102], [103], [104],…”
Section: A Rq1 Aimed To Find the Various Type Of Research Papers Use (citation type: mentioning)
confidence: 99%
“…In 2019, Song Wang of Northwest University for Nationalities used the method of combining long short-term memory networks and connection timing classification to carry out end-to-end acoustic modeling and speech recognition of the Lhasa dialect within the U-Tsang dialect [6]. Yue Zhao and others at the Central University for Nationalities used a framework consisting of end-to-end speech recognition and the WaveNet-connectionist temporal classification (CTC) method to train a multitask system that can complete dialect recognition, speech recognition, and speaker recognition tasks simultaneously [7][8][9].…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…However, in our work and [12], the WaveNet model is used for the generation of waveform sample with the input of predicted Mel spectrogram for speech synthesis. In the work [24] about speech recognition, the WaveNet model is used for the generation of text sequence and the input is MFCC features. The work [12] achieved the speech synthesis for Tibetan Lhasa-Ü-Tsang by using end-to-end model.…”
Section: Introduction (citation type: mentioning)
confidence: 99%