2017
DOI: 10.1016/j.csl.2017.01.011

Generalized Viterbi-based models for time-series segmentation and clustering applied to speaker diarization

citations
Cited by 10 publications
(11 citation statements)
references
References 8 publications
“…There are 109, 9 and 2 conversations containing 2, 3 and 4 speakers, respectively. We follow the practice of [55] and [56], conversations with 2 speakers are examined. We use Switchboard P1-3/Cell and SRE04-06 to train UBM, T and PLDA parameters.…”
Section: Experiments Results With Callhome97 (mentioning)
confidence: 99%
“…To confirm this fact, we display in Figure 3 information similar to that in Figure 1 but with PMFs computed only on speech parts when Figure 4 proposes the PMFs of the nonspeech parts (both experiments are using the same VAD process). We apply a very simple energy VAD, using the same approach as in [31] and [32]. When there is almost no difference between genuine and spoofing speech PMFs for the speech part, a significant difference is observed for the non-speech part.…”
Section: Pmfs Of Genuine and Spoofing Speech (mentioning)
confidence: 99%
“…The histogram of waveform amplitude is determined by dividing the bin counts by the number of samples in the signal, thereby giving the signal amplitude probability mass function (PMF). The entropy of the signal amplitude PMF is then determined according to the standard approach given by: 16 (1)…”
Section: Entropy Validation (mentioning)
confidence: 99%
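
The statement above describes building an amplitude PMF from a normalized histogram and then taking its entropy. A minimal sketch of that computation follows. The equation itself is elided in the excerpt, so the Shannon form with a base-2 logarithm is an assumption here, as are the function names and the default of 256 bins, none of which come from the cited paper.

```python
import numpy as np


def amplitude_pmf(signal, n_bins=256):
    """Histogram of waveform amplitude, normalized by the sample count.

    n_bins is an assumed value; the excerpt does not state a bin count.
    """
    signal = np.asarray(signal, dtype=float)
    counts, _ = np.histogram(signal, bins=n_bins)
    # Dividing bin counts by the number of samples gives the PMF.
    return counts / signal.size


def pmf_entropy(pmf):
    """Shannon entropy in bits (assumed form; the source equation is elided)."""
    p = pmf[pmf > 0]  # empty bins contribute nothing to the sum
    return float(-np.sum(p * np.log2(p)))
```

For instance, a signal whose samples are spread evenly over four bins yields an entropy of 2 bits under these assumptions, while a constant signal yields 0.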
“…Accordingly, VAD was applied to TIMIT and RSR datasets, but not to the ASVspoof datasets. VAD is performed according to approach described in [15] and [16]. Every 100ms a norm-2 signal was calculated according to $E_n = \left(\sum_{k=0}^{1599} s^2(1600n + k)\right)^{0.5}$.…”
Section: Pre-processing (mentioning)
confidence: 99%
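
The pre-processing statement above computes a norm-2 value over consecutive blocks of 1600 samples, one per 100 ms (which implies a 16 kHz sampling rate). A rough sketch of that frame-level computation is given below. The excerpt is cut off before any decision rule appears, so the relative threshold in `energy_vad` is purely a hypothetical illustration, and all function and parameter names are assumptions rather than the cited authors' implementation.

```python
import numpy as np


def frame_l2_norms(signal, frame_len=1600):
    """Norm-2 of each non-overlapping frame: sqrt of the sum of squared samples.

    frame_len=1600 matches the 100 ms block in the excerpt; trailing samples
    that do not fill a whole frame are dropped.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = signal.size // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.sqrt(np.sum(frames ** 2, axis=1))


def energy_vad(signal, frame_len=1600, rel_threshold=0.1):
    """Hypothetical decision rule: a frame counts as speech when its norm
    exceeds rel_threshold times the largest frame norm. The threshold is an
    assumption for illustration, not taken from the source.
    """
    norms = frame_l2_norms(signal, frame_len)
    if norms.size == 0:
        return np.zeros(0, dtype=bool)
    return norms > rel_threshold * norms.max()
```

Calling `energy_vad` on a waveform returns one boolean per 100 ms frame, which can then be used to keep or discard the corresponding samples.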