2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
DOI: 10.1109/waspaa.2017.8169991

Exploiting the intermittency of speech for joint separation and diarization

Abstract: Natural conversations are spontaneous exchanges involving two or more people speaking in an intermittent manner. Therefore one expects such conversation to have intervals where some of the speakers are silent. Yet, most (multichannel) audio source separation (MASS) methods consider the sound sources to be continuously emitting on the total duration of the processed mixture. In this paper we propose a probabilistic model for MASS where the sources may have pauses. The activity of the sources is modeled as a hid…

Cited by 6 publications (3 citation statements)
References 25 publications
“…There are also recent researches to jointly perform speech separation and speaker diarization. Kounades-Bastian et al [165,166] proposed to incorporate a speech activity model into speech separation based on the spatial covariance model with non-negative matrix factorization. They derived the EM algorithm to estimate separated speech and speech activity of each speaker from the multi-channel overlapped speech.…”
Section: Joint Speech Separation and Diarization (mentioning)
confidence: 99%
“…Very recently, joint processing of the two tasks have been proposed in [64] (for over-determined mixtures) and in [68,69]. The models in [68] and [69] combine a diarization state model (that encodes the combination of active sources within a given set of maximum size N) with the multichannel LGM+NMF model of [107] and with the full-rank spatial covariance matrix model of [37], respectively. In contrast to [108,58], modelling the activity of all sources jointly using a diarization state enables to exploit the potential correlations on speaker activity.…”
Section: Varying Number Of (Active) Sources (mentioning)
confidence: 99%
“…Here, block-offline processing is allowed to utilize future data, while the block-online processing is not. In [17], joint separation and diarization is attempted using spatial mixture models. This, however, requires multichannel input and does not exploit spectral information for speaker re-identification.…”
Section: Introduction (mentioning)
confidence: 99%