Interspeech 2018
DOI: 10.21437/interspeech.2018-1749
BUT System for DIHARD Speech Diarization Challenge 2018

Abstract: This paper presents the approach developed by the BUT team for the first DIHARD speech diarization challenge, which is based on our Bayesian Hidden Markov Model with eigenvoice priors. Besides the description of the approach, we provide a brief analysis of different techniques and data processing methods tested on the development set. We also introduce a simple attempt at overlapped speech detection that we used for attaining cleaner speaker models and reassigning overlapped speech to multiple speakers…

Cited by 46 publications (44 citation statements). References 14 publications.
“…The speech enhancement module is used only for tracks 2 and 4 as a pre-processing front-end for the SAD pipeline as the diarization system did not show improvements using the enhanced audio. The scores obtained by the challenge baseline are quite high, with track 1 DER roughly in line with the performance of the best DIHARD I systems [14,15,25] and track 2 DER 5% higher than for DIHARD I (15% without enhancement), which we suspect reflects a combination of superior SAD components in those systems and the more careful segmentation for the child language and web video domains in DIHARD II. Error rates are noticeably higher for tracks 3 and 4, reaching 50.85% and 77.34% respectively, though, again, these rates are roughly in line with those observed for the best DIHARD I systems on the two most difficult domains in that challenge: restaurant and child language.…”
Section: Baseline Results (supporting)
confidence: 61%
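The DER figures quoted above (e.g., 50.85% and 77.34%) can be illustrated with a simplified frame-level scorer. This is a minimal sketch, not the challenge's scoring pipeline: it assumes non-overlapping single-speaker segments and an already-aligned speaker mapping, whereas the official tooling (NIST md-eval) also handles overlapped speech and a forgiveness collar. All names and the toy segments below are illustrative.

```python
# Simplified frame-level diarization error rate (DER) sketch.
# Illustrative assumptions: non-overlapping single-speaker segments,
# speaker labels already optimally mapped, no scoring collar.

def der(reference, hypothesis, frame=0.01, total=10.0):
    """reference/hypothesis: lists of (start_sec, end_sec, speaker) tuples."""
    n = int(total / frame)
    ref = [None] * n
    hyp = [None] * n
    for start, end, spk in reference:
        for i in range(int(start / frame), int(end / frame)):
            ref[i] = spk
    for start, end, spk in hypothesis:
        for i in range(int(start / frame), int(end / frame)):
            hyp[i] = spk
    # DER = (missed speech + false alarm + speaker confusion) / total speech
    missed = sum(1 for r, h in zip(ref, hyp) if r is not None and h is None)
    false_alarm = sum(1 for r, h in zip(ref, hyp) if r is None and h is not None)
    confusion = sum(1 for r, h in zip(ref, hyp)
                    if r is not None and h is not None and r != h)
    speech = sum(1 for r in ref if r is not None)
    return (missed + false_alarm + confusion) / speech

ref = [(0.0, 4.0, "A"), (4.0, 8.0, "B")]
hyp = [(0.0, 5.0, "A"), (5.0, 8.0, "B")]
print(round(der(ref, hyp), 3))  # 1 s of confusion over 8 s of speech -> 0.125
```

The same three error components (missed speech, false alarm, confusion) are what separate the easy and hard tracks: tracks 2 and 4 must also absorb SAD errors, which is why their DER is so much higher.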
“…These clustering-based diarization methods have shown themselves to be effective on various datasets (see the DIHARD Challenge 2018 activities, e.g., [23][24][25]). (The first author performed the work while at the Center for Language and Speech Processing, Johns Hopkins University, as a Visiting Scholar.)…”
Section: Introduction (mentioning)
confidence: 99%
“…It is compared with two conventional methods and two simple extensions of the conventional method for block processing: (i) PIT and (ii) RSAN applied to the whole mixture (called PIT batch and RSAN batch, hereafter), and extensions of RSAN that perform diarization in (iii) block-online and (iv) block-offline manners. These simple extensions are 2-stage methods similar to the conventional methods in [1,[10][11][12][13], which, based on NN, first separate the speakers, estimate the associated speaker embedding vectors, and then cluster the vectors to estimate the correct association of speaker identities across blocks. The methods (iii) and (iv) are referred to as online and offline clustering, hereafter.…”
Section: Methods (mentioning)
confidence: 99%
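The 2-stage pipeline described in this excerpt ends with clustering per-block speaker embeddings to link speaker identities across blocks. A minimal sketch of such a clustering step is below; it uses a greedy cosine-similarity centroid scheme purely for illustration (the function name and the 0.5 threshold are assumptions, not the cited systems' actual algorithm, which typically uses AHC or PLDA scoring on x-vectors).

```python
import numpy as np

def cluster_embeddings(embeddings, threshold=0.5):
    """Greedy online clustering of embeddings by cosine similarity.

    Each embedding joins the existing cluster whose running centroid
    is most similar (if similarity exceeds `threshold`), otherwise it
    starts a new cluster. Returns one integer label per embedding.
    """
    labels = [-1] * len(embeddings)
    centroids = []  # one running (unnormalized) centroid per cluster
    for i, e in enumerate(embeddings):
        e = e / np.linalg.norm(e)
        best, best_sim = None, threshold
        for c, mu in enumerate(centroids):
            sim = float(e @ (mu / np.linalg.norm(mu)))
            if sim > best_sim:
                best, best_sim = c, sim
        if best is None:
            centroids.append(e.copy())
            labels[i] = len(centroids) - 1
        else:
            centroids[best] += e
            labels[i] = best
    return labels

# Toy data: two embeddings near one direction, a third near the opposite.
rng = np.random.default_rng(0)
spk_a = rng.normal(size=8)
spk_b = -spk_a + rng.normal(scale=0.1, size=8)
embs = [spk_a, spk_a + rng.normal(scale=0.05, size=8), spk_b]
print(cluster_embeddings(embs))  # -> [0, 0, 1]
```

Because the centroids are updated as blocks arrive, this style of clustering can run block-online, which is the distinction the excerpt draws between methods (iii) and (iv).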