2020
DOI: 10.1109/taslp.2020.3037457
Subspace-Based Representation and Learning for Phonotactic Spoken Language Recognition

Abstract: Phonotactic constraints can be employed to distinguish languages by representing a speech utterance as a multinomial distribution of phone events. In the present study, we propose a new learning mechanism based on subspace-based representation, which can extract concealed phonotactic structures from utterances, for language verification and dialect/accent identification. The framework mainly involves two successive parts. The first part involves subspace construction. Specifically, it decodes each utterance int…
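
The abstract is truncated here, but the core idea, representing each utterance by a subspace derived from its decoded phone statistics, can be illustrated with a minimal sketch. The function below is an assumption-laden toy (posteriorgram input, SVD basis, fixed rank), not the paper's actual construction:

```python
import numpy as np

def utterance_subspace(posteriorgram: np.ndarray, rank: int = 5) -> np.ndarray:
    """Orthonormal basis for the dominant subspace of a phone
    posteriorgram (phones x frames); a toy stand-in for the paper's
    subspace construction, not its actual method."""
    # Center each phone dimension over time before factorization.
    centered = posteriorgram - posteriorgram.mean(axis=1, keepdims=True)
    # Leading left singular vectors span the phone-space directions
    # carrying most of the utterance's phonotactic variation.
    u, _, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :rank]

# Utterance subspaces can then be compared via principal angles,
# e.g., scipy.linalg.subspace_angles(basis_a, basis_b).
```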

Cited by 11 publications (4 citation statements); references 73 publications.
“…Hung-Shin Lee et al. [2] report that the proposed technique achieved relative error-rate reductions of 52%, 46%, 56%, and 27% compared with the sequence-based PPR-LM, PPR-VSM, and PPR-IVEC methods and the lattice-based PPR-LM method. They offer a novel learning technique for language verification and dialect/accent recognition, based on subspace representation, that can extract hidden phonotactic features from utterances.…”
Section: Related Work
confidence: 99%
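
For reference, the figures quoted above follow the standard definition of relative error-rate reduction; a minimal sketch, with hypothetical EER values chosen only to illustrate the arithmetic:

```python
def relative_reduction(baseline_err: float, system_err: float) -> float:
    """Relative error-rate reduction, in percent."""
    return 100.0 * (baseline_err - system_err) / baseline_err

# Hypothetical illustration: a baseline EER of 10.0% brought down to
# 4.8% is a 52% relative reduction, matching the first figure above.
print(relative_reduction(10.0, 4.8))  # 52.0
```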
“…We decided to explore English language detection (ELD) based on ASR output for two reasons: 1) about 20% of the speech is non-English, so running ASR on all the data does not add a large overhead (compared to filtering out non-English before the ASR block); 2) standard acoustic-based language identification (LID) can be too data-hungry [2,3,4], and its accuracy suffers in bilingual code-switching environments [5]. ASR-based ELD systems share similarities with phonotactic LID systems [6,7,8,9]: the former rely on word confusion networks, the latter on phoneme sequences.…”
Section: Motivation
confidence: 99%
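
The phonotactic LID systems referenced here (e.g., PPR-LM) classify a language by scoring a decoded phone sequence under per-language phone n-gram models. As a toy sketch of that recipe, assuming placeholder phone inventories and training data rather than any cited system's implementation:

```python
import math
from collections import Counter

def train_bigram_lm(phone_seqs, alpha=1.0):
    """Collect add-alpha-smoothed phone bigram statistics for one language."""
    bigrams, histories = Counter(), Counter()
    for seq in phone_seqs:
        padded = ["<s>"] + list(seq) + ["</s>"]
        histories.update(padded[:-1])
        bigrams.update(zip(padded[:-1], padded[1:]))
    return bigrams, histories, alpha

def log_likelihood(lm, seq, vocab_size):
    """Score a decoded phone sequence under one language's bigram LM."""
    bigrams, histories, alpha = lm
    padded = ["<s>"] + list(seq) + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + alpha) /
                 (histories[a] + alpha * vocab_size))
        for a, b in zip(padded[:-1], padded[1:])
    )

# Decision rule: pick the language whose phone LM assigns the decoded
# sequence the highest log-likelihood (the PPR-LM idea in miniature).
```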
“…We used a vocabulary of 28.4k unique tokens. Of these, 15.3k tokens are 5-letter way-points from [38], and 5.2k tokens are airline designators for call-signs (footnotes 6, 7). Pronunciations were generated using the Phonetisaurus tool [39] with a G2P model trained on the LibriSpeech lexicon [40].…”
Section: Speech-to-text
confidence: 99%
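
As a minimal sketch of the lexicon-building step described above, with a hypothetical g2p callable standing in for a trained Phonetisaurus model (the actual tool is driven from its own command line, not this interface):

```python
from typing import Callable, Iterable, List

def write_lexicon(words: Iterable[str],
                  g2p: Callable[[str], List[str]],
                  path: str) -> None:
    """Write one 'word phone1 phone2 ...' line per vocabulary token.

    `g2p` is a hypothetical stand-in for a grapheme-to-phoneme model
    such as one trained with Phonetisaurus on the LibriSpeech lexicon.
    """
    with open(path, "w", encoding="utf-8") as f:
        for word in sorted(set(words)):
            f.write(f"{word} {' '.join(g2p(word))}\n")
```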
“…Artificial intelligence has achieved new goals in developing intelligent algorithms for processing language (both text and audio), making human interaction with such systems more natural. Considerable effort has gone into computationally modelling many languages [1][2][3]. In the early days of speech and speaker recognition, speech signals were presented to systems as plain inputs (smoothed features without first- and second-order derivatives).…”
Section: Introduction
confidence: 99%