2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2016
DOI: 10.1109/icassp.2016.7471664

Neural network based spectral mask estimation for acoustic beamforming

Cited by 422 publications (348 citation statements)
References 14 publications
“…In the case of competing speakers, it is necessary that the network be given some additional information to identify the target. In previous works, the input of the network was either the magnitude spectrum of the mixture [7] or of the mixture processed with a simple delay-and-sum beamformer [13]. We propose to combine the magnitude spectra of the mixture observed at the omnidirectional channel W, x_W(t, f), and of the output of the HOA beamformer pointing toward the target, ŝ(t, f):…”
Section: Structure Of the Solution (mentioning)
confidence: 99%
“…The application of deep neural networks (DNNs) to source separation has allowed for drastic improvement of ASR accuracy in real-world conditions [7]. DNNs were originally applied to single-channel inputs to derive a single-channel filter, a.k.a.…”
Section: Introduction (mentioning)
confidence: 99%
“…All these algorithms require the knowledge of either the direction of arrival (DOA) or the speech activity to compute the filters and are sensitive to signal mismatches [14] or detection errors [6]. Deep learning-based approaches have been proposed to estimate accurately these quantities through the prediction of a time-frequency (TF) mask [15,16,17] or of the spectrum of the desired signals [18]. Although often used in a multichannel context, most of these solutions use single-channel data as input of their deep neural networks (DNNs).…”
Section: Introduction (mentioning)
confidence: 99%
“…For robust ASR of a single (i.e., not overlapped) speaker, mask-based adaptive beamforming [1]-[3] has recently turned out to be highly effective. This approach was employed in the best-performing system [2], [4] in CHiME-3 [5] and CHiME-4.…”
Section: Introductionmentioning
confidence: 99%