Interspeech 2017
DOI: 10.21437/interspeech.2017-51

Speaker Diarization Using Convolutional Neural Network for Statistics Accumulation Refinement

Abstract: The aim of this paper is to investigate the benefit of information from a speaker change detection system based on a Convolutional Neural Network (CNN) when applied to the accumulation of statistics for i-vector generation. The investigation is carried out on the problem of diarization. In our system, the output of the CNN is the probability of a speaker change in a conversation for a given time segment. According to this probability, we cut the conversation into short segments that are then re…
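The segmentation step described in the abstract (cutting a conversation wherever the speaker-change probability is high) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function name `split_on_speaker_change`, the simple peak-picking rule, the default `threshold`, and the `frame_step` spacing between CNN outputs are all hypothetical and do not come from the paper.

```python
from typing import List, Tuple


def split_on_speaker_change(
    probs: List[float],
    threshold: float = 0.5,   # hypothetical value, not taken from the paper
    frame_step: float = 0.1,  # hypothetical spacing (seconds) between CNN outputs
) -> List[Tuple[float, float]]:
    """Cut a conversation at local maxima of the speaker-change probability.

    Returns a list of (start_time, end_time) pairs, one per segment.
    """
    if not probs:
        return []

    boundaries = [0]
    for i in range(1, len(probs) - 1):
        # A change point is a local peak that also exceeds the threshold.
        is_peak = probs[i] >= probs[i - 1] and probs[i] > probs[i + 1]
        if is_peak and probs[i] > threshold:
            boundaries.append(i)
    boundaries.append(len(probs))

    # Convert consecutive boundary indices into time intervals.
    return [
        (start * frame_step, end * frame_step)
        for start, end in zip(boundaries, boundaries[1:])
    ]


if __name__ == "__main__":
    example = [0.1, 0.2, 0.9, 0.3, 0.1, 0.7, 0.2, 0.1]
    print(split_on_speaker_change(example, threshold=0.5, frame_step=1.0))
```

Each resulting segment would then be passed on to whatever per-segment processing the diarization system uses; that part is not shown because the abstract is truncated at that point.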

Citations: cited by 29 publications (25 citation statements)
References: 26 publications
“…For the embedding extraction module, recent work [2,3,7] has shown that the diarization performance can be significantly improved by replacing i-vectors [5] with neural network embeddings, a.k.a. d-vectors [6,8].…”
Section: Introduction (mentioning)
confidence: 99%
“…The number of speakers is assumed to be known in advance. Figure. 8 shows the DER of 'TL 7' varies with k of soft prior (22). According to the variation trend, we choose k = 10 in our experiment.…”
Section: Experiments Results With TL (mentioning)
confidence: 99%
“…For comparison, we also show the results of the offline system (adapted from our previous works [8,18]). Table 1.…”
Section: Results (mentioning)
confidence: 99%
“…Decision threshold for the online approach was θ = 0.6. Offline results (except oracle) were adapted from [8,18]. The table shows that in the offline scenario, the naïve fixed length segmentation produces reasonable results (likely due to resegmentation [16]), although it is surpassed by the CNN-based approach.…”
Section: Results (mentioning)
confidence: 99%