Mingyu Cui scite author profile

Deep neural network (DNN) based automatic speech recognition (ASR) systems are often designed using expert knowledge and empirical evaluation. In this paper, a range of neural architecture search (NAS) techniques are used to automatically learn two types of hyperparameters of state-of-the-art factored time delay neural networks (TDNNs): i) the left and right splicing context offsets; and ii) the dimensionality of the bottleneck linear projection at each hidden layer. These include the DARTS method integrating architecture selection with lattice-free MMI (LF-MMI) TDNN training; Gumbel-Softmax and pipelined DARTS reducing the confusion over candidate architectures and improving the generalization of architecture selection; and Penalized DARTS incorporating resource constraints to adjust the trade-off between performance and system complexity. Parameter sharing among candidate architectures allows efficient search over up to 7^28 different TDNN systems. Experiments conducted on the 300-hour Switchboard corpus suggest the auto-configured systems consistently outperform the baseline LF-MMI TDNN systems using manual network design or random architecture search after LHUC speaker adaptation and RNNLM rescoring. Absolute word error rate (WER) reductions of up to 1.0% and a relative model size reduction of 28% were obtained. Consistent performance improvements were also obtained on a UASpeech disordered speech recognition task using the proposed NAS approaches.
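The selection mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the candidate bottleneck dimensions, the cost values, the penalty weight `eta`, and all function names are assumptions made for illustration. It shows a softmax over architecture logits (DARTS-style weighting), a Gumbel-Softmax sample that sharpens toward a single candidate at low temperature, and a penalized loss that adds an expected-complexity term.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over architecture logits."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax(logits, temperature, rng):
    """Draw a relaxed one-hot sample: add Gumbel(0, 1) noise, then softmax."""
    noisy = []
    for l in logits:
        u = rng.uniform(1e-12, 1.0 - 1e-12)  # keep u strictly inside (0, 1)
        noisy.append(l - math.log(-math.log(u)))
    return softmax(noisy, temperature)


def penalized_loss(task_loss, weights, costs, eta):
    """Task loss plus eta times the expected cost under the selection weights."""
    expected_cost = sum(w * c for w, c in zip(weights, costs))
    return task_loss + eta * expected_cost


# Hypothetical setup: 7 candidate bottleneck dimensions for one hidden layer.
candidate_dims = [25, 50, 80, 100, 120, 160, 200]
arch_logits = [0.0] * len(candidate_dims)  # uniform start (assumption)

rng = random.Random(0)
darts_weights = softmax(arch_logits)
sampled_weights = gumbel_softmax(arch_logits, temperature=0.5, rng=rng)

# Use the dimension itself as a stand-in cost (assumption, not from the paper).
loss = penalized_loss(task_loss=1.0, weights=darts_weights,
                      costs=candidate_dims, eta=0.001)
```

In this sketch a larger `eta` pushes the selection weights toward smaller candidates, which mirrors the performance versus complexity trade-off the abstract attributes to Penalized DARTS.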

A Metaheuristic for No-wait Flowshops with Variable Processing Times

Shi et al. 2018

Exploiting Cross Domain Acoustic-to-Articulatory Inverted Features for Disordered Speech Recognition

Hu, Liu, Xie et al. 2022

Neural Architecture Search for LF-MMI Trained Time Delay Neural Networks

Xie, Liu et al. 2022

IEEE/ACM Trans. Audio Speech Lang. Process.

Recent Progress in the CUHK Dysarthric Speech Recognition System
