Interspeech 2018
DOI: 10.21437/interspeech.2018-1417

Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks

Abstract: Time Delay Neural Networks (TDNNs), also known as one-dimensional Convolutional Neural Networks (1-d CNNs), are an efficient and well-performing neural network architecture for speech recognition. We introduce a factored form of TDNNs (TDNN-F) which is structurally the same as a TDNN whose layers have been compressed via SVD, but is trained from a random start with one of the two factors of each matrix constrained to be semi-orthogonal. This gives substantial improvements over TDNNs and performs about as well a…
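To make the semi-orthogonality constraint in the abstract concrete, here is a minimal pure-Python sketch. It treats a wide matrix M as semi-orthogonal when M Mᵀ = I, and nudges a matrix toward that condition with the iteration M ← M − ½(M Mᵀ − I)M. The abstract does not state how the constraint is enforced during training, so this update rule, the helper names, the example matrix, and the layer sizes are all illustrative assumptions rather than the authors' procedure.

```python
def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def orthogonality_error(m):
    """Largest absolute entry of M M^T - I (zero iff M is semi-orthogonal)."""
    p = matmul(m, transpose(m))
    n = len(p)
    return max(abs(p[i][j] - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))


def semi_orthogonal_step(m):
    """One assumed update M <- M - 0.5 * (M M^T - I) M.

    It moves each singular value s of M to 1.5*s - 0.5*s**3, which
    approaches 1 when s starts between 0 and sqrt(3).
    """
    p = matmul(m, transpose(m))
    n = len(p)
    q = [[p[i][j] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    qm = matmul(q, m)
    return [[m[i][j] - 0.5 * qm[i][j] for j in range(len(m[0]))]
            for i in range(len(m))]


def factored_param_count(d_in, d_out, rank):
    """Parameters in a rank-`rank` factorization of a d_out x d_in matrix."""
    return rank * (d_in + d_out)


# Hypothetical 2 x 3 factor (more columns than rows), not taken from the paper.
m = [[0.9, 0.1, 0.0], [0.0, 0.8, 0.3]]
start_error = orthogonality_error(m)
for _ in range(20):
    m = semi_orthogonal_step(m)
final_error = orthogonality_error(m)

# Hypothetical layer sizes, chosen only to show the parameter saving.
full_params = 1024 * 1024
low_rank_params = factored_param_count(1024, 1024, 128)
```

The parameter count illustrates why a factored layer is cheaper than the full matrix it replaces: the cost grows with the bottleneck rank rather than with the product of the input and output dimensions.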

Cited by 415 publications (283 citation statements)
References: 15 publications
“…The mouth ROI of the target speaker is fed into the LipNet to generated the visual features. The RecogNet is a TDNN network with factored time-delay neural network (TDNN-F) [33] components, which has been shown to be effective in modeling long range temporal dependencies [33]. In our experiments, the hybrid TDNN AVSR system trained with LF-MMI criterion demonstrates the state-of-the-art performance on the LRS2 dataset.…”
Section: Audio-visual Speech Recognition
Citation type: mentioning
confidence: 83%
“…Two types of neural network-based acoustic model architectures were evaluated: (1) the recently proposed TDNN-F models [21], which have been shown to be effective in underresourced scenarios, and (2) TDNN-F with added convolutional layers (CNN-TDNN-F). It has recently been shown that the locality, weight sharing and pooling properties of the convolutional layers have potential to improve the performance of ASR [26].…”
Section: Acoustic Modelling
Citation type: mentioning
confidence: 99%
“…The initial baseline system [11] of the CHiME-5 challenge uses a Time Delay Neural Network (TDNN) acoustic model (AM). However, recently it has been shown that introducing factorized layers into the TDNN architecture facilitates training deeper networks and also improves the ASR performance [25]. This architecture has been employed in the new baseline system for the challenge [10].…”
Section: Acoustic Model
Citation type: mentioning
confidence: 99%