2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
DOI: 10.1109/asru46091.2019.9003805

Analyzing Large Receptive Field Convolutional Networks for Distant Speech Recognition

Abstract: Despite significant efforts over the last few years to build robust automatic speech recognition (ASR) systems for different acoustic settings, the performance of current state-of-the-art technologies degrades significantly in noisy, reverberant environments. Convolutional Neural Networks (CNNs) have been used successfully to achieve substantial improvements in many speech processing applications, including distant speech recognition (DSR). However, standard CNN architectures were not efficient in capturing…

Cited by 5 publications (6 citation statements) · References 38 publications
“…Furthermore, different methods have been proposed to train deeper yet better-generalizing models, namely L1 and L2 regularization, batch normalization, dropout layers, and data-side solutions such as data augmentation [20,21]. Owing to their ability to efficiently capture intricate features of data, CNNs have been widely deployed in many complex machine learning tasks, from image processing to speech recognition [22][23][24]. Figure 2 demonstrates our feature extraction and CNN pipeline, with a relatively shallow architecture.…”
Section: Rule-based Machine Learning Algorithms
confidence: 99%
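The statement above names the standard regularizers (L1/L2 weight penalties, batch normalization, dropout, data augmentation) alongside a shallow CNN. A minimal, hypothetical PyTorch sketch of how those pieces typically combine; the ShallowCNN class, layer sizes, and input shape are illustrative assumptions, not details from the cited work:

import torch
import torch.nn as nn

class ShallowCNN(nn.Module):
    """A relatively shallow CNN with batch normalization and dropout."""
    def __init__(self, n_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.BatchNorm2d(16),   # batch normalization
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(p=0.5),    # dropout layer
            # assumes 1x40x100 inputs: 32 channels x 10 x 25 after two poolings
            nn.Linear(32 * 10 * 25, n_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

model = ShallowCNN()
# L2 regularization enters through the optimizer's weight_decay term.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
out = model(torch.randn(4, 1, 40, 100))  # e.g. a batch of spectrogram patches
print(out.shape)  # torch.Size([4, 10])

Data augmentation, the remaining technique named in the quote, would act on the training batches rather than on the model itself.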
“…Classical EEG oscillations range from below 1 Hz to around 100 Hz and have been extensively studied for various medical applications. These classical frequency bands include delta (0.5-4 Hz), theta (4-8 Hz), alpha (8-12 Hz), beta (13-30 Hz), and gamma (30-100 Hz) [5], each of which exhibits unique underlying physiological mechanisms. EEG is used to evaluate several brain disorders, such as epilepsy, Alzheimer's disease, and brain lesions, which manifest as seizures or unusually slow waves in the EEG.…”
Section: Introduction
confidence: 99%
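The quoted band boundaries translate directly into band-pass filters. A minimal sketch using SciPy, assuming a NumPy array of raw samples; the 250 Hz sampling rate, the filter order, and the synthetic signal are illustration-only assumptions:

import numpy as np
from scipy.signal import butter, filtfilt

FS = 250.0  # assumed sampling rate in Hz
BANDS = {   # classical EEG bands quoted above, in Hz
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 12.0),
    "beta": (13.0, 30.0),
    "gamma": (30.0, 100.0),
}

def bandpass(signal, low, high, fs=FS):
    """Zero-phase 4th-order Butterworth band-pass filter."""
    nyq = fs / 2.0
    b, a = butter(4, [low / nyq, high / nyq], btype="band")
    return filtfilt(b, a, signal)

eeg = np.random.randn(int(10 * FS))  # 10 s of synthetic "EEG"
band_power = {name: float(np.mean(bandpass(eeg, lo, hi) ** 2))
              for name, (lo, hi) in BANDS.items()}
print(band_power)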
“…The human brain is capable of focusing on a single talker in a multi-speaker environment, recognizing both the identity of the talker and the content of the speech. However, the performance of speech analysis technologies such as speaker diarization, speaker identification, and Automatic Speech Recognition (ASR) is adversely affected in the presence of co-channel speech [2,3,4]. In speaker diarization, the existence of overlapping speech in the training dataset leads to impure speaker models, which increases diarization error [5].…”
Section: Introduction
confidence: 99%
“…Current state-of-the-art neural network-based ASR systems have advanced to nearly human performance in several evaluation settings [1,2]; however, these systems perform poorly for domains that are not included in the original training data [3,4,5,6]. For example, if we train an ASR system using a U.S. English dataset, the performance of the system significantly degrades for other English accents (e.g., Australian, Indian, and Hispanic).…”
Section: Introduction
confidence: 99%