2018
DOI: 10.1109/taslp.2017.2764271

Phonetic Temporal Neural Model for Language Identification

Abstract: Deep neural models, particularly the LSTM-RNN model, have shown great potential for language identification (LID). However, the use of phonetic information has been largely overlooked by most existing neural LID methods, although this information has been used very successfully in conventional phonetic LID systems. We present a phonetic temporal neural model for LID, which is an LSTM-RNN LID system that accepts phonetic features produced by a phone-discriminative DNN as the input, rather than raw acoustic feat…

Cited by 58 publications (30 citation statements)
References 35 publications
“…Tang et al [10] identified languages using acoustic level feature. They combined three different methods like HMM states and Gaussians to for this purpose.…”
Section: Literature Survey (mentioning)
confidence: 99%
“…Generally prosodic features are used combination with acoustic features to improve accuracy of LID systems as these MFCC feature vectors carry the information about phonemes and discriminate occurrence of frequency of phonemes among languages. MFCC feature vectors are best features involved to design language identification system but some Indian languages like Telugu, Pitch and energy are also show more significant variation with other languages so that pitch and energy are suitable features to classify the Indian languages in order to increase the accuracy of system [10]. We choose 13 dimensional MFCC and 1-pitch as feature vectors and concatenated to form hybrid feature vectors with 14 dimensionalities.…”
Section: Prosodic Features (mentioning)
confidence: 99%
“…Applying ASR methods to SLI, e.g. by training language classifiers on phoneme embeddings extracted from a phoneme recognizer, has shown to work very well [19,20,21,22]. While end-to-end SLI performed directly on labeled speech features is usually outperformed by models that utilize phoneme level information, it is sometimes possible to reach good performance also with end-to-end models [6,23].…”
Section: End-to-end Deep Learning SLI Toolkit (mentioning)
confidence: 99%
“…1 (a) and (b) respectively. The third one is based on the recently proposed phonetic temporal neural (PTN) model [22], where an auxiliary phonetic model produces phonetic feature, and an RNN LID model is used to identify the language. The architecture is shown in Fig.…”
Section: B. DNN Systems (mentioning)
confidence: 99%