Tensor-To-Vector Regression for Multi-Channel Speech Enhancement Based on Tensor-Train Network

Qi, Jun; Hu, Hu; Wang, Yannan; Yang, Chao-Han Huck; Siniscalchi, Sabato Marco; Lee, Chin‐Hui

doi:10.1109/icassp40776.2020.9052938

Cited by 19 publications

(8 citation statements)

References 25 publications


“…The matrix associated with a DNN hidden layer corresponds to two matrices given the ranks, and the DNN input vector is reshaped into a higher-order input tensor. We have shown that the TT decomposition can keep the representation power of DNN [17]. In [17], we have also demonstrated that for a tensor-to-vector function…”

Section: DNN-TT Based Tensor-to-Vector Regression (mentioning)

confidence: 73%
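The quoted statement describes replacing a hidden-layer weight matrix with smaller factors of a given rank and reshaping the input vector into a higher-order tensor. The following is a minimal NumPy sketch of that idea, not the authors' implementation: it assumes exactly two TT cores, and the names `tt_layer_forward`, `tt_to_full`, `core1`, `core2` and the core shapes are hypothetical choices made for illustration.

```python
import numpy as np


def tt_layer_forward(x, core1, core2):
    """Apply a two-core TT-format weight to x without forming the full matrix.

    Assumed shapes (illustrative only):
      core1: (m1, n1, r), core2: (r, m2, n2), x: (n1 * n2,)
    Returns y with shape (m1 * m2,).
    """
    m1, n1, r = core1.shape
    r2, m2, n2 = core2.shape
    assert r == r2, "the TT rank must match between the two cores"
    # Reshape the input vector into a second-order tensor.
    x_tensor = x.reshape(n1, n2)
    # y[i, k] = sum over j, l, a of core1[i, j, a] * core2[a, k, l] * x[j, l]
    y_tensor = np.einsum("ija,akl,jl->ik", core1, core2, x_tensor)
    return y_tensor.reshape(m1 * m2)


def tt_to_full(core1, core2):
    """Rebuild the dense (m1*m2) x (n1*n2) matrix, only to check the sketch."""
    m1, n1, _ = core1.shape
    _, m2, n2 = core2.shape
    w = np.einsum("ija,akl->ikjl", core1, core2)
    return w.reshape(m1 * m2, n1 * n2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    core1 = rng.standard_normal((16, 16, 4))
    core2 = rng.standard_normal((4, 16, 16))
    x = rng.standard_normal(16 * 16)
    y = tt_layer_forward(x, core1, core2)
    dense_params = (16 * 16) * (16 * 16)
    tt_params = core1.size + core2.size
    print(y.shape, dense_params, tt_params)
```

In this sketch the dense layer stores m1·m2·n1·n2 weights, while the two cores store r·(m1·n1 + m2·n2), which is where the parameter reduction comes from when the rank r is small.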


“…Besides, TT-DNN is a compact representation for a fully-connected (FC) layers of DNN into a tensor-train (TT) format [16]. In [17], we were the first to attempt a tensor-train deep neural network (TT-DNN) to tackle the multi-channel speech enhancement task and also demonstrate that the TT representation of a DNN does not cause the quality degradation of the enhanced speech, and it also results in a significant reduction of the model parameters. More importantly, the quality of speech enhancement can be improved over the DNN counterpart by allowing the TT-DNN parameters to grow.…”

Section: Introduction (mentioning)

confidence: 99%


Exploring Deep Hybrid Tensor-to-Vector Network Architectures for Regression Based Speech Enhancement

Wang et al. 2020

Interspeech 2020

Self Cite


This paper investigates different trade-offs between the number of model parameters and enhanced speech qualities by employing several deep tensor-to-vector regression models for speech enhancement. We find that a hybrid architecture, namely CNN-TT, is capable of maintaining a good quality performance with a reduced model parameter size. CNN-TT is composed of several convolutional layers at the bottom for feature extraction to improve speech quality and a tensor-train (TT) output layer on the top to reduce model parameters. We first derive a new upper bound on the generalization power of the convolutional neural network (CNN) based vector-to-vector regression models. Then, we provide experimental evidence on the Edinburgh noisy speech corpus to demonstrate that, in single-channel speech enhancement, CNN outperforms DNN at the expense of a small increment of model sizes. Besides, CNN-TT slightly outperforms the CNN counterpart by utilizing only 32% of the CNN model parameters. Besides, further performance improvement can be attained if the number of CNN-TT parameters is increased to 44% of the CNN model size. Finally, our experiments of multi-channel speech enhancement on a simulated noisy WSJ0 corpus demonstrate that our proposed hybrid CNN-TT architecture achieves better results than both DNN and CNN models in terms of better-enhanced speech qualities and smaller parameter sizes.


“…Since it has been shown that RNNs overfit very quickly [11], various regularization methods, such as early stopping or small and under-specified models [12], have to be used during the RNN training stage. Although dropout is normally taken as a simple and effective regularization to overcome the problem of overfitting in deep neural networks [13,14], it has been concluded that the naive dropout regularization to recurrent weights in RNNs cannot reliably solve the RNN overfitting problem because noise added in the recurrent connections leads to model instabilities [15].…”

Section: Introduction (mentioning)

confidence: 99%

Variational Inference-Based Dropout in Recurrent Neural Networks for Slot Filling in Spoken Language Understanding

Qi, Liu, Tejedor 2020

Preprint

Self Cite


This paper proposes to generalize the variational recurrent neural network (RNN) with variational inference (VI)-based dropout regularization employed for the long short-term memory (LSTM) cells to more advanced RNN architectures like gated recurrent unit (GRU) and bi-directional LSTM/GRU. The new variational RNNs are employed for slot filling, which is an intriguing but challenging task in spoken language understanding. The experiments on the ATIS dataset suggest that the variational RNNs with the VI-based dropout regularization can significantly improve the naive dropout regularization RNNs-based baseline systems in terms of F-measure. Particularly, the variational RNN with bi-directional LSTM/GRU obtains the best F-measure score.


“…Thus, this paper aims at bridging this gap. In particular, we investigate MAE and MSE in terms of performance error bounds and robustness against various noises in the context of the deep neural network (DNN) based vector-to-vector regression, since DNNs offer better representation power and generalization capability in large-scale regression problems, such as those addressed in [18]-[21].…”

Section: Introduction (mentioning)

confidence: 99%

On Mean Absolute Error for Deep Neural Network Based Vector-to-Vector Regression

Qi, Du, Siniscalchi et al. 2020

Preprint

Self Cite


In this paper, we exploit the properties of mean absolute error (MAE) as a loss function for the deep neural network (DNN) based vector-to-vector regression. The goal of this work is two-fold: (i) presenting performance bounds of MAE, and (ii) demonstrating new properties of MAE that make it more appropriate than mean squared error (MSE) as a loss function for DNN based vector-to-vector regression. First, we show that a generalized upper-bound for DNN-based vector-to-vector regression can be ensured by leveraging the known Lipschitz continuity property of MAE. Next, we derive a new generalized upper bound in the presence of additive noise. Finally, in contrast to conventional MSE commonly adopted to approximate Gaussian errors for regression, we show that MAE can be interpreted as an error modeled by Laplacian distribution. Speech enhancement experiments are conducted to corroborate our proposed theorems and validate the performance advantages of MAE over MSE for DNN based regression.
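The abstract's link between loss functions and error distributions can be checked numerically. The sketch below is not taken from the paper: it uses the standard result that the average negative log-likelihood under a Laplacian error model equals MAE up to scale and offset, and under a Gaussian error model equals MSE up to scale and offset. The function names and the unit scale parameters `b` and `sigma` are assumptions made for illustration.

```python
import numpy as np


def mae(y, y_hat):
    """Mean absolute error."""
    return np.mean(np.abs(y - y_hat))


def mse(y, y_hat):
    """Mean squared error."""
    return np.mean((y - y_hat) ** 2)


def laplace_nll(y, y_hat, b=1.0):
    """Average negative log-likelihood of y under Laplace(loc=y_hat, scale=b)."""
    return np.mean(np.abs(y - y_hat) / b + np.log(2.0 * b))


def gaussian_nll(y, y_hat, sigma=1.0):
    """Average negative log-likelihood of y under Normal(mean=y_hat, std=sigma)."""
    return np.mean(
        (y - y_hat) ** 2 / (2.0 * sigma**2) + 0.5 * np.log(2.0 * np.pi * sigma**2)
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.standard_normal(1000)
    y_hat = y + rng.laplace(scale=0.5, size=1000)
    # With b = 1, the Laplacian NLL is MAE plus the constant log(2).
    print(np.isclose(laplace_nll(y, y_hat), mae(y, y_hat) + np.log(2.0)))
    # With sigma = 1, the Gaussian NLL is MSE / 2 plus the constant 0.5 * log(2 * pi).
    print(np.isclose(gaussian_nll(y, y_hat), 0.5 * mse(y, y_hat) + 0.5 * np.log(2.0 * np.pi)))
```

Because the additive constants do not depend on the prediction, minimizing MAE is equivalent to maximum-likelihood fitting under Laplacian errors, and minimizing MSE is equivalent to the same under Gaussian errors.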
