The Speaker and Language Recognition Workshop (Odyssey 2018) 2018
DOI: 10.21437/odyssey.2018-20

Low-latency speaker spotting with online diarization and detection

Abstract: This paper introduces a new task termed low-latency speaker spotting (LLSS). Related to security and intelligence applications, the task involves the detection, as soon as possible, of known speakers within multi-speaker audio streams. The paper describes differences to the established fields of speaker diarization and automatic speaker verification and proposes a new protocol and metrics to support exploration of LLSS. These can be used together with an existing, publicly available database to assess the perf…
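
To make the task description concrete, the following is a minimal sketch of the general idea of spotting a known speaker in a stream as early as possible. It is not the protocol, system, or metrics proposed in the paper: the per-chunk embeddings, the cosine-similarity score, the fixed threshold, the one-second chunk length, and the names `cosine` and `spot_speaker` are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def spot_speaker(chunks, target, threshold=0.8, chunk_seconds=1.0):
    """Scan a list of per-chunk embeddings in arrival order.

    Returns (detected, decision_time_seconds). The decision is made at the
    end of the first chunk whose similarity to the enrolled target embedding
    reaches the threshold, so an earlier match means lower latency.
    All parameters here are hypothetical, not taken from the paper.
    """
    for index, embedding in enumerate(chunks):
        if cosine(embedding, target) >= threshold:
            return True, (index + 1) * chunk_seconds
    return False, len(chunks) * chunk_seconds


# Toy stream: two chunks from another speaker, then the target speaker.
stream = [[0.0, 1.0], [0.1, 0.9], [1.0, 0.1], [0.9, 0.0]]
target = [1.0, 0.0]
detected, decision_time = spot_speaker(stream, target)
print(detected, decision_time)
```

In this toy stream the target only becomes similar enough in the third chunk, so the sketch reports a detection at the end of that chunk. A real system would presumably trade off the threshold against how early it is willing to decide.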

Cited by 10 publications (7 citation statements)
References 28 publications
“…Online speaker diarization, which outputs the diarization result right after the audio segment arrives, is not an easy task, since that future information is unavailable when analyzing the current segment [215]. In history, a number of online speaker diarization and speaker tracking solutions have been reported [216,217]. Here we focus on the deep learning based ones, which can be categorized to stage-wise online diarization and end-to-end online diarization methods.…”
Section: Online Speaker Diarization (mentioning)
confidence: 99%
“…guests) leading to an open-set identification task. Another related task is low-latency speaker spotting [3], where a previously registered target speaker has to be detected in an audio stream.…”
Section: Introduction (mentioning)
confidence: 99%
“…State-of-art speaker diarization systems mostly concentrate on integrating several components: voice activity detection, speaker change detection, feature representation, and clustering [11,12]. Current research focuses primarily on the speaker model or speaker embeddings, such as Gaussian mixture models (GMM) [8,13], i-vector [14][15][16], d-vector [17,18], and x-vector [19,20], and on a better clustering method such as agglomerative hierarchical clustering or spectral clustering [19,[21][22][23]. The issue with these methods is that they cannot directly minimize the diarization error because they are based on an unsupervised algorithm.…”
Section: Introduction (mentioning)
confidence: 99%