2019
DOI: 10.48550/arxiv.1905.06286
Preprint

End-to-End Multi-Channel Speech Separation

Rongzhi Gu,
Jian Wu,
Shi-Xiong Zhang
et al.

Abstract: The end-to-end approach for single-channel speech separation has been studied recently and shown promising results. This paper extended the previous approach and proposed a new end-to-end model for multi-channel speech separation. The primary contributions of this work include 1) an integrated waveform-in waveform-out separation system in a single neural network architecture. 2) We reformulate the traditional short time Fourier transform (STFT) and inter-channel phase difference (IPD) as a function of time-doma…

Citations: cited by 25 publications (46 citation statements)
References: 22 publications
“…Because the source code of TSB is not publicly available, we implemented this algorithm in [26] and the reproduced results shown in "TSB (our), TSB-IPD (our)" are slightly better than the ones in [9]. Instead of using IPD to exploit the multi-channel spatial information, the parallel encoder (Para-Enc) [16] was proposed [Table 1 caption: SDR/SI-SDR (dB) performance of different target speech extraction systems. "SA" represents the speaker adaptation is also performed on the parallel encoder output.]…”
Section: Results in SDR/SI-SDR (mentioning)
confidence: 99%
“…This decorrelation is performed on each dimension of all the multi-channel encoder representations of input mixtures, it is used to extract the inter-channel differential spatial information to learn difference between individual source signals of input mixture. Results in [15] have already shown that our original CD significantly improved the TSE performance over IPD features, however, the performance gains over the parallel encoder architecture [16] are still limited.…”
Section: Introduction (mentioning)
confidence: 90%
“…In our model, both STFT and iSTFT are implemented by convolution [17]. [Figure caption: Architecture of the proposed HGCN.] So, the input to the encoder is the noisy complex spectrum, denoted as S = Cat(S_r, S_i) ∈ ℝ^(T×2F), where S_r and S_i represent the real and imaginary parts of the spectrum respectively.…”
Section: Coarse Enhancement Module (mentioning)
confidence: 99%
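
The statement above describes computing the STFT with convolution and stacking real and imaginary parts into a T × 2F input. The following is a minimal NumPy sketch of that idea, not the cited authors' implementation: it assumes a Hann window, no padding, and illustrative values for `n_fft` and `hop`, and the function names are hypothetical. Each frequency bin is a fixed windowed cosine or sine kernel, and sliding those kernels over the waveform with stride `hop` is the same operation a strided 1-D convolution layer performs.

```python
import numpy as np


def stft_conv_kernels(n_fft, window):
    """Build fixed real/imaginary DFT kernels, one row per frequency bin."""
    n = np.arange(n_fft)
    k = np.arange(n_fft // 2 + 1)[:, None]
    angle = 2.0 * np.pi * k * n / n_fft
    real_kernel = np.cos(angle) * window    # shape (F, n_fft)
    imag_kernel = -np.sin(angle) * window   # shape (F, n_fft)
    return real_kernel, imag_kernel


def stft_by_convolution(x, n_fft=64, hop=16):
    """Return Cat(S_r, S_i) with shape (T, 2F) for a 1-D waveform x.

    Assumed settings (not from the source): Hann window, no padding.
    """
    window = np.hanning(n_fft)
    real_kernel, imag_kernel = stft_conv_kernels(n_fft, window)
    n_frames = 1 + (len(x) - n_fft) // hop
    # Slide a length-n_fft window over x with stride `hop`: shape (T, n_fft).
    frames = np.stack([x[t * hop:t * hop + n_fft] for t in range(n_frames)])
    # Applying every kernel to every frame is a strided 1-D correlation.
    s_r = frames @ real_kernel.T            # real part, shape (T, F)
    s_i = frames @ imag_kernel.T            # imaginary part, shape (T, F)
    return np.concatenate([s_r, s_i], axis=1)
```

Because the kernels are ordinary arrays, the same construction can be used to initialise a convolution layer in a deep-learning framework, which is what makes the transform usable inside a waveform-in, waveform-out network.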
“…As is illustrated in Figure 2, three types of audio features including the complex spectrum, the inter-microphone phase differences (IPDs) [25] and location-guided angle feature (AF) [26,27] are adopted as the audio inputs. The complex spectrum of all the microphone array channels are first computed through short-time Fourier transform (STFT).…”
Section: Audio Inputs (mentioning)
confidence: 99%
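
As an illustration of the inter-microphone phase difference mentioned above, the sketch below computes the IPD from per-channel complex spectra. It is a hedged example rather than the cited systems' code: the function name `ipd_features`, the `(C, T, F)` array layout, and the choice of microphone pairs are assumptions made here for illustration.

```python
import numpy as np


def ipd_features(spec, pairs):
    """Compute inter-channel phase differences.

    spec  : complex array of shape (C, T, F), one STFT per microphone channel
            (layout assumed for this sketch).
    pairs : list of (i, j) channel-index pairs (hypothetical selection).
    Returns an array of shape (len(pairs), T, F) with values in (-pi, pi].
    """
    feats = []
    for i, j in pairs:
        # angle(X_i * conj(X_j)) equals phase(X_i) - phase(X_j), wrapped.
        feats.append(np.angle(spec[i] * np.conj(spec[j])))
    return np.stack(feats)


# Usage: two channels where channel 1 lags channel 0 by 0.5 rad in every bin.
rng = np.random.default_rng(0)
ch0 = rng.standard_normal((10, 33)) + 1j * rng.standard_normal((10, 33))
ch1 = ch0 * np.exp(-1j * 0.5)
ipd = ipd_features(np.stack([ch0, ch1]), pairs=[(0, 1)])
```

Taking the angle of the cross-spectrum rather than subtracting two phase arrays keeps the result wrapped to one period, which avoids discontinuities when the raw difference exceeds π.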