Phonetic Temporal Neural Model for Language Identification

Tang, Zhiyuan; Wang, Dong; Chen, Yixiang; Li, Lantian; Abel, Andrew

doi:10.1109/taslp.2017.2764271

Cited by 58 publications

(30 citation statements)

References 35 publications

“…Tang et al [10] identified languages using acoustic level feature. They combined three different methods like HMM states and Gaussians for this purpose.…”

Section: Literature Survey (mentioning)

confidence: 99%

“…Generally prosodic features are used in combination with acoustic features to improve the accuracy of LID systems, as MFCC feature vectors carry information about phonemes and discriminate the frequency of occurrence of phonemes among languages. MFCC feature vectors are the best features for designing a language identification system, but in some Indian languages like Telugu, pitch and energy also show more significant variation relative to other languages, so pitch and energy are suitable features for classifying Indian languages and increasing the accuracy of the system [10]. We choose 13-dimensional MFCC and 1-dimensional pitch as feature vectors and concatenate them to form 14-dimensional hybrid feature vectors.…”

Section: Prosodic Features (mentioning)

confidence: 99%

HMM Based Language Identification from Speech Utterances of Popular Indic Languages Using Spectral and Prosodic Features

Sadanandam

2021

A language identification (LID) system automatically recognises the language of a short, unknown utterance of human speech. It extracts discriminative features and reveals the language to which the utterance belongs. In this paper, we concatenate Mel Frequency Cepstral Coefficient (MFCC) and pitch feature vectors to design an LID system. Using a Hidden Markov Model (HMM), we build one reference model for each language from the 14-dimensional feature vectors, then evaluate each test sample against the reference models of all listed languages. The language of the unknown utterance is decided from the likelihood of the test sample's feature vectors under each reference model. We consider seven Indian languages in the experimental setup and evaluate the performance of the system. The average performance of the system is 89.31% for three-state HMMs and 90.63% for four-state HMMs on 3 s test speech utterances; the four-state HMM thus gives significant results on 3 s test speech even though the procedure is simple.
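The pipeline this abstract describes (frame-level MFCC and pitch concatenated into 14-dimensional vectors, one HMM per language, decision by the highest likelihood) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the HMMs are taken as already trained (training, e.g. with Baum-Welch, is omitted), each state is assumed to emit a single diagonal-covariance Gaussian, and the left-to-right topology, the `stay` probability, and all function names (`hybrid_features`, `left_to_right`, `log_likelihood`, `identify`) are hypothetical.

```python
import numpy as np


def hybrid_features(mfcc, pitch):
    """Concatenate (T, 13) MFCC frames with a length-T pitch track -> (T, 14)."""
    mfcc = np.asarray(mfcc, dtype=float)
    pitch = np.asarray(pitch, dtype=float).reshape(-1, 1)
    if mfcc.shape[0] != pitch.shape[0]:
        raise ValueError("MFCC and pitch must have the same number of frames")
    return np.hstack([mfcc, pitch])


def left_to_right(n_states, means, variances, stay=0.6):
    """Hypothetical left-to-right HMM: each state loops or advances to the next."""
    trans = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        trans[i, i], trans[i, i + 1] = stay, 1.0 - stay
    trans[-1, -1] = 1.0  # last state is absorbing
    start = np.zeros(n_states)
    start[0] = 1.0  # always begin in the first state
    return {"start": start, "trans": trans,
            "means": np.asarray(means, float), "variances": np.asarray(variances, float)}


def _log_gauss_diag(x, means, variances):
    """Per-frame, per-state log N(x_t; mu_j, diag(var_j)) -> shape (T, N)."""
    diff = x[:, None, :] - means[None, :, :]
    log_norm = np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (N,)
    return -0.5 * (log_norm[None, :] + np.sum(diff ** 2 / variances[None, :, :], axis=2))


def log_likelihood(x, model):
    """log p(x | model) via the forward algorithm in the log domain."""
    log_b = _log_gauss_diag(x, model["means"], model["variances"])
    with np.errstate(divide="ignore"):  # log(0) -> -inf for forbidden transitions
        log_pi = np.log(model["start"])
        log_a = np.log(model["trans"])
    alpha = log_pi + log_b[0]
    for t in range(1, x.shape[0]):
        # alpha_t(j) = log sum_i exp(alpha_{t-1}(i) + log a_ij) + log b_j(x_t)
        alpha = np.logaddexp.reduce(alpha[:, None] + log_a, axis=0) + log_b[t]
    return float(np.logaddexp.reduce(alpha))


def identify(x, models):
    """Score x against every language's reference HMM; return the best language."""
    scores = {lang: log_likelihood(x, m) for lang, m in models.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_states, dim = 3, 14
    models = {  # toy "trained" models that differ only in their state means
        "lang_a": left_to_right(n_states, np.zeros((n_states, dim)), np.ones((n_states, dim))),
        "lang_b": left_to_right(n_states, np.full((n_states, dim), 2.0), np.ones((n_states, dim))),
    }
    x = hybrid_features(rng.normal(size=(300, 13)), rng.normal(size=300))
    best, scores = identify(x, models)
    print(best, scores)
```

Scoring runs the forward algorithm in the log domain (`np.logaddexp.reduce`), which avoids the numerical underflow that multiplying per-frame probabilities across a 3 s utterance would otherwise cause.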

“…Applying ASR methods to SLI, e.g. by training language classifiers on phoneme embeddings extracted from a phoneme recognizer, has been shown to work very well [19,20,21,22]. While end-to-end SLI performed directly on labeled speech features is usually outperformed by models that utilize phoneme level information, it is sometimes possible to reach good performance also with end-to-end models [6,23].…”

Section: End-to-end Deep Learning SLI Toolkit (mentioning)

confidence: 99%

Releasing a Toolkit and Comparing the Performance of Language Embeddings Across Various Spoken Language Identification Datasets

Lindgren, Jauhiainen, Kurimo

2020

Interspeech 2020

In this paper, we propose a software toolkit for easier end-to-end training of deep learning based spoken language identification models across several speech datasets. We apply our toolkit to implement three baseline models, one speaker recognition model, and three x-vector architecture variations, which are trained on three datasets previously used in spoken language identification experiments. All models are trained separately on each dataset (closed task) and on a combination of all datasets (open task), after which we compare whether open task training yields better language embeddings. We begin by training all models end-to-end as discriminative classifiers of spectral features, labeled by language. Then, we extract language embedding vectors from the trained end-to-end models, train separate Gaussian Naive Bayes classifiers on the vectors, and compare which model provides the best language embeddings for the back-end classifier. Our experiments show that the open task condition leads to improved language identification performance on only one of the datasets. In addition, we discovered that increasing x-vector model robustness with random frequency channel dropout significantly reduces its end-to-end classification performance on the test set, while not affecting the back-end classification performance of its embeddings. Finally, we note that two baseline models consistently outperformed all other models.
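The back-end step in this abstract (training a Gaussian Naive Bayes classifier on extracted language embeddings) can be sketched with NumPy. scikit-learn's `GaussianNB` is the usual off-the-shelf choice; the hand-written version below only makes the model explicit: one diagonal Gaussian per language plus a class prior, with prediction by the largest joint log-likelihood. The class name `GaussianNaiveBayes`, the `var_floor` parameter, and the synthetic embeddings in the usage example are assumptions for illustration, not the toolkit's code.

```python
import numpy as np


class GaussianNaiveBayes:
    """Naive Bayes with one diagonal Gaussian per class (language)."""

    def __init__(self, var_floor=1e-9):
        self.var_floor = var_floor  # keeps variances strictly positive

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.means_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        self.vars_ = np.stack([X[y == c].var(axis=0) for c in self.classes_]) + self.var_floor
        self.log_priors_ = np.log([np.mean(y == c) for c in self.classes_])
        return self

    def joint_log_likelihood(self, X):
        """log p(c) + sum_d log N(x_d; mu_cd, var_cd) for every row and class."""
        X = np.asarray(X, dtype=float)
        diff = X[:, None, :] - self.means_[None, :, :]
        log_norm = np.sum(np.log(2.0 * np.pi * self.vars_), axis=1)
        ll = -0.5 * (log_norm[None, :] + np.sum(diff ** 2 / self.vars_[None, :, :], axis=2))
        return ll + self.log_priors_[None, :]

    def predict(self, X):
        """Return the class label with the highest joint log-likelihood per row."""
        return self.classes_[np.argmax(self.joint_log_likelihood(X), axis=1)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic 8-dim "language embeddings" for two hypothetical languages.
    X_train = np.vstack([rng.normal(0.0, 1.0, (50, 8)), rng.normal(5.0, 1.0, (50, 8))])
    y_train = np.array(["lang_a"] * 50 + ["lang_b"] * 50)
    clf = GaussianNaiveBayes().fit(X_train, y_train)
    print(clf.predict(rng.normal(5.0, 1.0, (3, 8))))
```

Because the back end only sees fixed-length embedding vectors, the same classifier can be fitted on embeddings from any of the compared models, which is what makes the per-model comparison of embedding quality possible.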

“…1 (a) and (b) respectively. The third one is based on the recently proposed phonetic temporal neural (PTN) model [22], where an auxiliary phonetic model produces phonetic features, and an RNN LID model is used to identify the language. The architecture is shown in Fig.…”

Section: B. DNN Systems (mentioning)

confidence: 99%
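The data flow of the PTN architecture described in this excerpt (an auxiliary phonetic model turns acoustic frames into phonetic features, and a recurrent LID model consumes those features) can be sketched as an untrained NumPy forward pass. Everything concrete here is an assumption for illustration: the single tanh layer standing in for the phonetic network, the plain tanh RNN (the cited system may use a different recurrent unit), the feature sizes, and the decision rule of averaging per-frame language posteriors. The names `PhoneticModel`, `RNNLID`, and `ptn_forward` are hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - np.max(z, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)


class PhoneticModel:
    """Hypothetical stand-in for the auxiliary phonetic model: one tanh layer
    whose activations play the role of frame-level phonetic features."""

    def __init__(self, in_dim, feat_dim, rng):
        self.W = rng.normal(0.0, 1.0 / np.sqrt(in_dim), size=(in_dim, feat_dim))
        self.b = np.zeros(feat_dim)

    def __call__(self, acoustic):  # (T, in_dim) -> (T, feat_dim)
        return np.tanh(acoustic @ self.W + self.b)


class RNNLID:
    """Plain tanh RNN over phonetic features with a per-frame language softmax."""

    def __init__(self, in_dim, hid_dim, n_langs, rng):
        self.Wx = rng.normal(0.0, 1.0 / np.sqrt(in_dim), size=(in_dim, hid_dim))
        self.Wh = rng.normal(0.0, 1.0 / np.sqrt(hid_dim), size=(hid_dim, hid_dim))
        self.bh = np.zeros(hid_dim)
        self.Wo = rng.normal(0.0, 1.0 / np.sqrt(hid_dim), size=(hid_dim, n_langs))
        self.bo = np.zeros(n_langs)

    def __call__(self, feats):  # (T, in_dim) -> (n_langs,)
        h = np.zeros_like(self.bh)
        frame_posteriors = []
        for x_t in feats:
            h = np.tanh(x_t @ self.Wx + h @ self.Wh + self.bh)  # recurrent update
            frame_posteriors.append(softmax(h @ self.Wo + self.bo))
        return np.mean(frame_posteriors, axis=0)  # utterance-level posterior


def ptn_forward(acoustic, phonetic_model, lid_model):
    """Chain the two stages: acoustic frames -> phonetic features -> language posterior."""
    return lid_model(phonetic_model(acoustic))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pm = PhoneticModel(in_dim=40, feat_dim=16, rng=rng)
    lid = RNNLID(in_dim=16, hid_dim=32, n_langs=5, rng=rng)
    print(ptn_forward(rng.normal(size=(50, 40)), pm, lid))
```

In a working system the phonetic model would typically be trained first on a phone-discrimination task and the recurrent model trained on language labels; the sketch shows only how the two stages are chained.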

AP17-OLR challenge: Data, plan, and baseline

Tang, Wang, Chen et al. 2017

2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)

Self Cite

We present the data profile and the evaluation plan of the second oriental language recognition (OLR) challenge, AP17-OLR. Compared to last year's event (AP16-OLR), the new challenge involves more languages and focuses more on short utterances. The data is offered by SpeechOcean and the NSFC M2ASR project. Two types of baselines are constructed to assist the participants: one based on the i-vector model and the other based on various neural networks. We report the baseline results evaluated with various metrics defined by the AP17-OLR evaluation plan and demonstrate that the combined database is a reasonable data resource for multilingual research. All the data is free for participants, and the Kaldi recipes for the baselines have been published online.
