ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp43922.2022.9746394

Continuous Speech Separation with Recurrent Selective Attention Network

Cited by 7 publications (3 citation statements) | References 24 publications
“…A recent speech separation study [31] proposed the source-aggregated SDR, which computes the mean over the SDR values of the output channels; this works as long as the target speaker is present in at least one output channel. Other studies [36], [37] used a mean squared error loss on the estimated mask or spectrogram to avoid the absent-target-speaker problem, though it remains unclear whether this is preferable, since it does not directly optimize the signal quality of the extracted speech waveform. In another study, the VAD-SE network [75] jointly trains a target speaker voice activity detection (VAD) module and a speaker extraction network.…”
Section: B. Absent Target Speaker Speech Mixture With Audio Cue
confidence: 99%
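
The quoted statement describes the source-aggregated SDR as the mean of the per-channel SDR values. Below is a minimal sketch following that description only; the published definition in [31] may differ in detail (for example, by aggregating signal and distortion energies before the log), and the eps guard is an added assumption.

import numpy as np

def sdr(reference, estimate, eps=1e-8):
    """Plain signal-to-distortion ratio in dB for one output channel."""
    distortion = reference - estimate
    return 10 * np.log10(
        (np.sum(reference ** 2) + eps) / (np.sum(distortion ** 2) + eps)
    )

def source_aggregated_sdr(references, estimates):
    """Mean of the per-channel SDR values, as described in the quote.
    references and estimates are lists of 1-D waveforms, one per channel."""
    return np.mean([sdr(r, e) for r, e in zip(references, estimates)])

Without the eps guard, a silent reference channel makes the per-channel SDR undefined; an energy-aggregating variant sidesteps this as long as at least one channel is active, which is the condition the quote points to.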
“…Many methods have been proposed to improve different aspects of the CSS framework [3,4,5,6,7,8,9,10]. We introduced a modulation factor based on the segment overlap ratio to dynamically adjust the separation loss [3].…”
Section: Introduction
confidence: 99%
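
The modulation factor in the statement above is named but not defined, so the following is a hypothetical sketch of the general idea: weight a per-segment separation loss by a factor computed from that segment's overlap ratio. The linear form of the factor, alpha, and the MSE stand-in loss are all illustrative assumptions; [3] gives the actual formulation.

import numpy as np

def modulated_separation_loss(estimate, reference, overlap_ratio, alpha=1.0):
    """Hypothetical sketch: scale a per-segment separation loss by a factor
    derived from the segment's overlap ratio. The linear factor and alpha
    are assumptions, not the scheme defined in [3]."""
    base_loss = np.mean((estimate - reference) ** 2)  # stand-in separation loss
    factor = 1.0 + alpha * overlap_ratio              # assumed modulation form
    return factor * base_loss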
“…We introduced a modulation factor based on the segment overlap ratio to dynamically adjust the separation loss [3]. In [4], a recurrent selective attention network is used to separate one speaker at a time. The works in [5] and [6] proposed new training criteria that generalize PIT to capture long speech contexts.…”
Section: Introduction
confidence: 99%
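
The one-speaker-at-a-time scheme attributed to [4] lends itself to a short illustrative loop: estimate a mask for a single speaker, remove that speaker from a residual mask, and stop once little mixture energy remains. Everything here (estimate_mask, the residual bookkeeping, the stopping rule) is a hedged stand-in, not the published architecture.

import numpy as np

def separate_one_speaker_at_a_time(mixture_spec, estimate_mask,
                                   stop_fraction=0.05, max_speakers=4):
    """Illustrative sketch of one-speaker-at-a-time separation as described
    above [4]. mixture_spec is a magnitude spectrogram (frequency x time);
    estimate_mask is a hypothetical callable standing in for the network."""
    power = mixture_spec ** 2
    residual = np.ones(mixture_spec.shape)  # fraction of the mixture left unexplained
    sources = []
    for _ in range(max_speakers):
        mask = estimate_mask(mixture_spec, residual)   # hypothetical network call
        sources.append(mask * mixture_spec)
        residual = np.clip(residual - mask, 0.0, 1.0)  # peel off the extracted speaker
        if np.sum(residual * power) < stop_fraction * np.sum(power):
            break  # assumed stopping rule: little mixture energy remains
    return sources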