ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp40776.2020.9054481
3-D Acoustic Modeling for Far-Field Multi-Channel Speech Recognition

Abstract: Automatic speech recognition in multi-channel reverberant conditions is a challenging task. The conventional way of suppressing reverberation artifacts involves a beamforming-based enhancement of the multi-channel speech signal, which is used to extract spectrogram-based features for a neural network acoustic model. In this paper, we propose to extract features directly from the multi-channel speech signal using a multivariate autoregressive (MAR) modeling approach, where the correlations among all the th…
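To make the MAR idea concrete, below is a minimal sketch of fitting a multivariate autoregressive model across channels by ordinary least squares. The function name fit_mar, the model order, and the toy data are illustrative assumptions; the paper's actual pipeline derives sub-band features from such a model rather than using the raw fit.

```python
# Minimal sketch: least-squares fit of a multivariate autoregressive (MAR)
# model x[n] = sum_k A_k @ x[n-k] + e[n] over the channels of a
# multi-channel signal. Illustrative only, not the paper's full pipeline.
import numpy as np

def fit_mar(x, order):
    """x: (num_samples, num_channels). Returns A (order, C, C) and residual e."""
    n, c = x.shape
    # Regressors: lagged frames x[n-1], ..., x[n-order], stacked columnwise.
    lags = np.hstack([x[order - k - 1:n - k - 1] for k in range(order)])
    target = x[order:]
    coeffs, *_ = np.linalg.lstsq(lags, target, rcond=None)
    a = coeffs.T.reshape(c, order, c).transpose(1, 0, 2)  # (order, C, C)
    return a, target - lags @ coeffs

# Toy usage: two correlated random-walk channels.
rng = np.random.default_rng(0)
sig = np.cumsum(rng.standard_normal((1000, 2)), axis=0)
A, e = fit_mar(sig, order=4)
print(A.shape, e.shape)  # (4, 2, 2) (996, 2)
```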

Cited by 3 publications (7 citation statements); references 36 publications. Citation types: 0 supporting, 7 mentioning, 0 contrasting.
“…This is the same frequency decomposition used in the FDLP and FDLP-dereverberation experiments. The acoustic model is the 2-D CLSTM network described in Purushothaman et al. (2020).…”
Section: Experiments and Results (citation class: mentioning; confidence: 99%)
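For readers unfamiliar with FDLP, the sketch below shows the textbook frequency-domain linear prediction recipe these features build on: take the DCT of the signal, run linear prediction over a band of DCT coefficients, and read the sub-band temporal envelope off the resulting AR model. The function fdlp_envelope, the band boundaries, and the model order are assumptions for illustration; the cited systems add mel-spaced bands and further processing.

```python
# Minimal sketch of frequency-domain linear prediction (FDLP): an AR model
# fit to a band of DCT coefficients approximates that sub-band's temporal
# (Hilbert) envelope. Simplified relative to the cited systems.
import numpy as np
from scipy.fft import dct
from scipy.linalg import solve_toeplitz

def fdlp_envelope(signal, band, order=20, num_points=200):
    """Approximate the temporal envelope of one sub-band via FDLP."""
    coeffs = dct(signal, type=2, norm='ortho')[band]  # sub-band of DCT coeffs
    # Autocorrelation method of linear prediction on the DCT coefficients.
    r = np.correlate(coeffs, coeffs, mode='full')[len(coeffs) - 1:]
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    lpc = np.concatenate(([1.0], -a))        # A(z) = 1 - sum_k a_k z^-k
    gain = r[0] - a @ r[1:order + 1]         # prediction-error power
    # The AR power spectrum of the DCT coefficients traces the squared
    # temporal envelope of the sub-band signal (FDLP time/frequency duality).
    w = np.linspace(0, np.pi, num_points)
    denom = np.abs(np.polyval(lpc[::-1], np.exp(1j * w))) ** 2  # |A|^2 on unit circle
    return gain / denom

env = fdlp_envelope(np.random.randn(16000), band=slice(100, 400))
```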
“…We use FDLP features (Purushothaman et al., 2020) for far-field speech. This paper extends the prior work done in (Purushothaman et al., 2020b) by proposing a joint neural dereverberation approach, which forms an elegant neural learning framework.…”
Section: Related Prior Work (citation class: mentioning; confidence: 99%)
“…We use FDLP features [31] for far-field speech. This paper extends the prior work done in [32] by proposing a joint neural dereverberation approach, which forms an elegant neural learning framework.…”
Section: Related Prior Work (citation class: mentioning; confidence: 99%)
“…The architecture of the acoustic model is based on convolutional long short-term memory (CLSTM) networks (Figure 1). The acoustic model corresponds to the 2-D CLSTM network described in [31], consisting of 4 layers of CNN, a layer of LSTM with 1024 units performing recurrence over frequency, and 3 fully connected layers with batch normalization.…”
Section: Acoustic Model (citation class: mentioning; confidence: 99%)
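The quoted description maps onto a compact model definition. The sketch below is a minimal PyTorch rendering of that layout (4 CNN layers, an LSTM with 1024 units recurring over the frequency axis, 3 fully connected layers with batch normalization); kernel sizes, channel counts, and the output dimension are assumptions not specified in the quote.

```python
# Minimal PyTorch sketch of a 2-D CLSTM acoustic model matching the quoted
# layout. Channel counts, kernel sizes, and num_senones are assumptions.
import torch
import torch.nn as nn

class CLSTMAcousticModel(nn.Module):
    def __init__(self, num_senones=2000):
        super().__init__()
        # 4 convolutional layers over the (time, frequency) plane.
        chans = [1, 32, 32, 64, 64]
        self.cnn = nn.Sequential(*[
            layer
            for i in range(4)
            for layer in (nn.Conv2d(chans[i], chans[i + 1], 3, padding=1),
                          nn.ReLU())
        ])
        self.lstm = nn.LSTM(input_size=64, hidden_size=1024, batch_first=True)
        # 3 fully connected layers with batch normalization.
        self.dnn = nn.Sequential(
            nn.Linear(1024, 1024), nn.BatchNorm1d(1024), nn.ReLU(),
            nn.Linear(1024, 1024), nn.BatchNorm1d(1024), nn.ReLU(),
            nn.Linear(1024, num_senones),
        )

    def forward(self, x):
        # x: (batch, 1, time, freq)
        h = self.cnn(x)                        # (batch, 64, time, freq)
        b, c, t, f = h.shape
        # Recurrence over frequency: fold time into the batch axis and
        # treat frequency bins as the LSTM's sequence dimension.
        h = h.permute(0, 2, 3, 1).reshape(b * t, f, c)
        h, _ = self.lstm(h)                    # (batch*time, freq, 1024)
        h = h[:, -1]                           # state after the last frequency bin
        return self.dnn(h).view(b, t, -1)      # per-frame senone scores

model = CLSTMAcousticModel()
scores = model(torch.randn(2, 1, 50, 36))      # 50 frames, 36 sub-bands
print(scores.shape)                            # torch.Size([2, 50, 2000])
```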