Adaptive Blind Audio Source Extraction Supervised By Dominant Speaker Identification Using X-Vectors

Janský, Jakub; Málek, Jiří; Cmejla, Jaroslav; Kounovsky, Tomas; Koldovsky, Zbynek; Žďánský, Jindřich

doi:10.1109/icassp40776.2020.9054693

Cited by 15 publications

(39 citation statements)

References 18 publications


“…The current paper extends the work [4] concerning the extraction of a moving SOI. Two contributions are discussed.…”

Section: Introduction (supporting)

confidence: 57%

“…The drawback of this block-wise approach lies in difficult tuning of the interval length or the recursion weight. An adaptive fast converging IVE algorithm for simple acoustic conditions was proposed in [4].…”

Section: Introduction (mentioning)

confidence: 99%

“…Piloting using voice activity detection was proposed in [11] for mixtures of a single speaker and background noise. For mixtures of multiple speakers, detection of SOI dominance relying on X-vectors [12] was presented in [4].…”

Section: Introduction (mentioning)

confidence: 99%

“…When utilized in speaker identification, it is usually assumed that the analyzed signal contains single speaker only. However, it was shown in [4] that, when two speakers are simultaneously active, the dominant one is identified reliably. This phenomenon can be used to control the IVE convergence towards SOI.…”

Section: Introduction (mentioning)

confidence: 99%

“…1) The extraction is performed using BSE method based on the novel CSV mixing model. Its applicability to longer mixture intervals results into more accurate SOI extraction compared to the block-wise approach from [4]. 2) The computation of reverberation/noise-robust X-vectors is discussed.…”

Section: Introduction (mentioning)

confidence: 99%

Blind Extraction of Moving Audio Source in a Challenging Environment Supported by Speaker Identification Via X-Vectors

Málek, Janský, Kounovsky, et al. 2021

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Self Cite

We propose a novel approach for semi-supervised extraction of a moving audio source of interest (SOI) applicable in reverberant and noisy environments. The blind part of the method is based on independent vector extraction (IVE) and uses the recently proposed constant separating vector (CSV) mixing model. This model allows for changes of mixing parameters within the processed interval of the mixture, which potentially leads to higher accuracy of SOI estimation. The supervised part of the method concerns a pilot signal, which is related to the SOI and ensures the convergence of the blind method towards the SOI. The pilot is based on robust detection of frames where SOI is dominant via speaker embeddings called X-vectors. Robustness of the detection is achieved through augmentation of the data for the supervised training of the X-vectors. The pilot-supported extraction yields significantly better performance compared to its unsupervised counterpart identifying SOI solely using the initialization.
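The abstract says the pilot is built from frames where the SOI is dominant, detected via X-vector speaker embeddings, but it does not say how the detection is turned into a pilot signal. The sketch below is only an illustration under stated assumptions: each frame already has an embedding, the SOI has a single enrollment embedding, and dominance is decided by thresholding cosine similarity. The function names, the threshold value, and the binary pilot are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embeddings (0.0 if either is all-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def dominance_mask(
    frame_embeddings: Sequence[Sequence[float]],
    soi_embedding: Sequence[float],
    threshold: float = 0.5,  # assumed value, would need tuning on real data
) -> List[bool]:
    """Mark frames whose embedding is close enough to the SOI enrollment embedding."""
    return [
        cosine_similarity(e, soi_embedding) >= threshold
        for e in frame_embeddings
    ]


def pilot_from_mask(mask: Sequence[bool]) -> List[float]:
    """Assumed binary pilot: 1.0 on SOI-dominant frames, 0.0 elsewhere."""
    return [1.0 if m else 0.0 for m in mask]


# Toy 2-D embeddings standing in for real X-vectors.
soi = [1.0, 0.0]
frames = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.7]]
mask = dominance_mask(frames, soi, threshold=0.8)
pilot = pilot_from_mask(mask)
```

With these toy vectors only the first frame exceeds the threshold. In a real system the embeddings would come from a trained X-vector extractor, and the decision rule could equally be a speaker-identification classifier output rather than a cosine score.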
