Interspeech 2021
DOI: 10.21437/interspeech.2021-516

Scenario-Dependent Speaker Diarization for DIHARD-III Challenge

Cited by 17 publications (6 citation statements); references 17 publications.
“…The deployment of i-vectors in TS-VAD, while successful to some extent, showed limitations when applied to multiscenario data [58]. These findings pave the way for the exploration of x-vectors as an alternative, but a simple swap of i-vectors for x-vectors did not yield an immediate boost in performance [19].…”
Section: Target Speaker Voice Activity Detection (mentioning)
confidence: 97%
“…The voice activity is recognized by the characteristic's speaker signal, as extracted by utilizing the Eq. (11).…”
Section: Voice Activity Detection (mentioning)
confidence: 99%
“…The major component of this differentiation is clustering, in which the segmentation involves the bottom-up, global optimization, neural network clustering, as well as up-down approach. The existing approaches for clustering contain Deep Neural Network (DNN), spectral clustering, and bottleneck-based methods [11]. The existing methods of the SD have traversed by clustering free methods or end-to-end approaches by multiple speakers' discussion [12].…”
Section: Introduction (mentioning)
confidence: 99%
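The bottom-up clustering named in the statement above can be illustrated with a minimal sketch. This is not the method of the DIHARD-III paper; it assumes each speech segment has already been mapped to a speaker embedding (e.g. an x-vector), and the function names `cosine` and `ahc` and the stopping `threshold` are hypothetical. Every segment starts as its own cluster, and the two clusters with the highest average cosine similarity are merged until no pair exceeds the threshold.

```python
import math
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def ahc(embeddings: List[List[float]], threshold: float) -> List[int]:
    """Hypothetical agglomerative (bottom-up) clustering with average linkage.

    Returns one cluster label per input segment embedding.
    """
    clusters = [[i] for i in range(len(embeddings))]

    def linkage(c1: List[int], c2: List[int]) -> float:
        # Average pairwise cosine similarity between two clusters.
        sims = [cosine(embeddings[i], embeddings[j]) for i in c1 for j in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        best, pair = -math.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = linkage(clusters[a], clusters[b])
                if s > best:
                    best, pair = s, (a, b)
        if best < threshold:
            break  # no pair is similar enough: stop merging
        a, b = pair
        clusters[a].extend(clusters[b])
        del clusters[b]

    labels = [0] * len(embeddings)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Four toy 2-D "embeddings": two near the x-axis, two near the y-axis.
print(ahc([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]], threshold=0.5))
```

The threshold plays the role of an implicit speaker-count estimate: raising it yields more clusters (speakers), lowering it yields fewer.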
“…Later, LSTM [63] and Transformer [20] modules are implemented along the speaker dimension of models to handle a variable number of speakers. On the other hand, the i-vectors used in TS-VAD are relatively domain-dependent, restricting the system performance on multi-scenario dataset [64]. This finding paves the way for exploring more discriminative speaker embeddings like x-vectors as an alternative.…”
Section: Target-speaker Voice Activity Detection (mentioning)
confidence: 99%
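A TS-VAD model, as described in the statements above, outputs a frame-level speech probability for each target speaker, so overlapping speech appears as two speakers being active in the same frame. The sketch below shows one plausible post-processing step that turns those probabilities into speaker segments; it is an illustration only, and the function name `probs_to_segments`, the 0.5 threshold, and the 10 ms frame shift are assumptions rather than settings from the paper.

```python
from typing import Dict, List, Tuple


def probs_to_segments(
    probs: Dict[str, List[float]],
    threshold: float = 0.5,     # assumed decision threshold
    frame_shift: float = 0.01,  # assumed frame shift in seconds
) -> List[Tuple[str, float, float]]:
    """Hypothetical helper: per-speaker frame probabilities -> (speaker, start, end)."""
    segments = []
    for speaker, frame_probs in probs.items():
        start = None
        for t, p in enumerate(frame_probs):
            active = p > threshold
            if active and start is None:
                start = t  # speech onset for this speaker
            elif not active and start is not None:
                segments.append((speaker, start * frame_shift, t * frame_shift))
                start = None
        if start is not None:
            # Speaker still active at the last frame: close the segment.
            segments.append((speaker, start * frame_shift, len(frame_probs) * frame_shift))
    # Order by start time, then speaker, like a diarization output listing.
    segments.sort(key=lambda s: (s[1], s[0]))
    return segments


# Two speakers overlapping in frame 1 (frame_shift=1.0 for readability).
print(probs_to_segments({"spk1": [0.9, 0.8, 0.1, 0.1],
                         "spk2": [0.2, 0.7, 0.9, 0.3]}, frame_shift=1.0))
```

Thresholding each speaker independently, instead of taking an argmax over speakers, is what lets this kind of output represent overlapped speech.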