Supervised domain adaptation for I-vector based speaker recognition

Garcia‐Romero, Daniel; McCree, Alan

doi:10.1109/icassp.2014.6854362

Cited by 121 publications

(129 citation statements)

References 5 publications

Supporting

Mentioning

126

Contrasting

Unclassified


“…The dataset 'SRE-1phn' contains audio from only a single telephone number per speaker and use of such a poor phone number diversity hinders the effective estimation of within-speaker variability of in-domain. In this case, the conventional approaches [13], [14] that estimate within-speaker variability from in-domain unlabeled dataset would fail, in spite of perfect speaker label estimation, due to insufficient channel information. Singer [25] also tackled same issue and suggested dataset selection criteria to prevent this situation in advance.…”

Section: Adaptation Under Insufficient Channel Information (mentioning)

confidence: 99%

“…State-of-the-art techniques from other studies are examined for comparison. For Garcia-Romero's Interpolated approach [13] referred to as system 5 in Table III, the true speaker label is used for ideal case rather than clustering with AHC algorithm. Then, WC and AC from SWB and SRE-1phn are interpolated, as indicated in Table III by "SWB + SRE-1phn".…”

Section: Performance Comparison To State-of-the-art Techniques (mentioning)

confidence: 99%

“…In this case, if a dataset domain is matched with operation domain, it is called in-domain dataset; if it is mismatched, it is called out-of-domain dataset. Garcia-Romero [7] found that training the UBM and total variability subspace on some types of domains have limited effects on performance improvement. Instead, the performance depends heavily on PLDA parameter estimation.…”

Section: Introduction (mentioning)

confidence: 99%

“…Villalba [12] introduced a variational Bayesian approach for adapting PLDA models from out-of-domain datasets to in-domain. Garcia-Romero [13] introduced a clustering approach for unlabeled in-domain datasets that uses well-known Agglomerative Hierarchical Clustering (AHC) to estimate Within-speaker Covariance (WC) and Across-speaker Covariance (AC) with the PLDA model. In this approach, the estimated in-domain WC and AC are interpolated from out-of-domain WC and AC [7], [13], [14].…”

Section: Introduction (mentioning)

confidence: 99%

“…Garcia-Romero [13] introduced a clustering approach for unlabeled in-domain datasets that uses well-known Agglomerative Hierarchical Clustering (AHC) to estimate Within-speaker Covariance (WC) and Across-speaker Covariance (AC) with the PLDA model. In this approach, the estimated in-domain WC and AC are interpolated from out-of-domain WC and AC [7], [13], [14]. Kanagasundaram [15] introduced another IDVC technique, called Inter-Dataset Variability (IDV), to capture the variability between out-of-domain and in-domain.…”

Section: Introduction (mentioning)

confidence: 99%
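
The statements above describe interpolating in-domain and out-of-domain WC and AC matrices. As a rough illustration only, the sketch below assumes the interpolation is a convex combination with a weight `alpha` in [0, 1]; the function name `interpolate_cov`, the weight value, and the toy 2x2 matrices are hypothetical and are not taken from the cited papers.

```python
def interpolate_cov(in_cov, out_cov, alpha):
    """Return alpha * in_cov + (1 - alpha) * out_cov, element by element.

    Assumption: the interpolation is a simple convex combination.
    Matrices are plain lists of rows so the sketch needs no libraries.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    same_shape = len(in_cov) == len(out_cov) and all(
        len(r_in) == len(r_out) for r_in, r_out in zip(in_cov, out_cov)
    )
    if not same_shape:
        raise ValueError("covariance matrices must have the same shape")
    return [
        [alpha * a + (1.0 - alpha) * b for a, b in zip(r_in, r_out)]
        for r_in, r_out in zip(in_cov, out_cov)
    ]


# Toy within-speaker covariances (hypothetical values, 2x2 for readability).
wc_in = [[2.0, 0.0], [0.0, 2.0]]   # estimated from in-domain data
wc_out = [[4.0, 1.0], [1.0, 4.0]]  # estimated from out-of-domain data

# Equal weighting of the two domains; the weight is an illustrative choice.
wc_adapted = interpolate_cov(wc_in, wc_out, 0.5)
```

Under this assumption, `alpha = 1.0` would rely only on the in-domain estimate and `alpha = 0.0` only on the out-of-domain one; the same function could be applied to the AC matrices, possibly with a different weight.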


Autoencoder Based Domain Adaptation for Speaker Recognition Under Insufficient Channel Information

Shon, Mun, Kim, et al. 2017

Interspeech 2017


In real-life conditions, mismatch between the development and test domains degrades speaker recognition performance. To address this, many researchers have explored domain adaptation approaches that use a matched in-domain dataset. However, adaptation is not effective if the dataset is insufficient to estimate the channel variability of the domain. In this paper, we explore the problem of performance degradation under such insufficient channel information. To exploit a limited in-domain dataset effectively, we propose an unsupervised domain adaptation approach using Autoencoder based Domain Adaptation (AEDA). The proposed approach combines an autoencoder with a denoising autoencoder to adapt a resource-rich development dataset to the test domain. The proposed technique is evaluated on the Domain Adaptation Challenge 13 experimental protocol, which is widely used in speaker recognition for domain-mismatched conditions. The results show significant improvements over baselines and results from prior studies.

