2021
DOI: 10.11591/eei.v10i4.2893

Spoken language identification using i-vectors, x-vectors, PLDA and logistic regression

Abstract: In this paper, i-vectors and x-vectors are used to extract features from speech signals in local Indonesian languages, namely Javanese, Sundanese and Minang, to help the classifier identify the language spoken by the speaker. Probabilistic linear discriminant analysis (PLDA) is used as the baseline classifier, and logistic regression is used because prior studies show that it performs better than PLDA for classifying speech data. Once these features are extracted. The …

Cited by 9 publications (7 citation statements)
References 19 publications
“…From the above result, the decision boundary has been calculated for threshold of returning a probability score between 0 and 1 [36]. Finally, the cost function represents the optimization of convex function to minimize the cost value and finding the global minimum of logistic regression.…”
Section: Logistic Regression (mentioning)
confidence: 99%
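
The statement above mentions a probability score between 0 and 1, a decision threshold, and a convex cost function. A minimal sketch of those three pieces for binary logistic regression follows. It is not taken from the citing paper: the function names, the toy data, the 0.5 threshold, and the learning rate are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x, threshold=0.5):
    """Return (probability, label); label is 1 when probability >= threshold."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    p = sigmoid(z)
    return p, int(p >= threshold)


def cross_entropy_cost(weights, bias, X, y):
    """Mean binary cross-entropy, which is convex in (weights, bias)."""
    eps = 1e-12  # keeps log() away from 0
    total = 0.0
    for xi, yi in zip(X, y):
        p, _ = predict(weights, bias, xi)
        p = min(max(p, eps), 1.0 - eps)
        total -= yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total / len(X)


def fit(X, y, learning_rate=0.5, steps=500):
    """Plain batch gradient descent on the cross-entropy cost (hypothetical settings)."""
    n_features = len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(steps):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p, _ = predict(weights, bias, xi)
            error = p - yi  # derivative of the cost w.r.t. the score z
            for j in range(n_features):
                grad_w[j] += error * xi[j]
            grad_b += error
        weights = [w - learning_rate * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(X)
    return weights, bias


# Toy one-dimensional data (illustrative only, not from any paper).
X_toy = [[0.0], [1.0], [2.0], [3.0]]
y_toy = [0, 0, 1, 1]

initial_cost = cross_entropy_cost([0.0], 0.0, X_toy, y_toy)
w_fit, b_fit = fit(X_toy, y_toy)
final_cost = cross_entropy_cost(w_fit, b_fit, X_toy, y_toy)
```

Because the cost is convex, gradient descent with a small enough step size moves toward the global minimum rather than a local one. On this toy data `final_cost` ends up below `initial_cost`.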
“…In speaker verification, the universal background model (UBM) is a speaker model that represents broad attributes and characteristics that can be compared to the specific person being verified [1]. Later, i-vector and x-vector [2] based ASV systems were introduced to replace the gaussian mixture model-universal background model (GMM-UBM) based ASV systems. Deep learning approaches [3] such as recurrent neural network (RNN) [4] as a backend classifier was shown the capability in speaker verification with a low equal error rate (EER).…”
Section: Introduction (mentioning)
confidence: 99%
“…Many studies on language identification have been conducted, with various feature extraction and classification techniques being used. Several techniques are used to extract features from the audio data, including phone recognition followed by language modeling (PRLM) [5] and parallel phone recognition followed by language modeling (PPRLM) [5] for phonetic approach or perceptual linear prediction (PLP) [5], mel-frequency cepstral coefficient (MFCC) [6]-[8], i-vector [8], [9] and x-vector [10] for the acoustic approach. … neural networks [11], convolutional neural networks (CNN) [12], [13], logistic regression (LR) [8], PLDA [14], Gaussian mixture model (GMM) [15], [16], support vector machine [17], [18] are among techniques used to classify the language spoken.…”
Section: Introduction (mentioning)
confidence: 99%
“…These findings were obtained using a dataset of speech corpora in three Indonesian local languages (Javanese, Sundanese, and Minangkabau) that were independently recorded. Abdurrahman et al [8] used an acoustic approach with ivector and x-vector extraction features with probabilistic linear discriminant analysis (PLDA) and LR classifications to study three Indonesian local languages. As a result, the x-vector performs best when using PLDA, while the i-vector outperforms the x-vector when using LR.…”
Section: Introduction (mentioning)
confidence: 99%