2022
DOI: 10.1121/10.0009405

Deep neural architectures for dialect classification with single frequency filtering and zero-time windowing feature representations

Abstract: The goal of this study is to investigate advanced signal processing approaches [single frequency filtering (SFF) and zero-time windowing (ZTW)] with modern deep neural networks (DNNs) [convolution neural networks (CNNs), temporal convolution neural networks (TCN), time-delay neural network (TDNN), and emphasized channel attention, propagation and aggregation in TDNN (ECAPA-TDNN)] for dialect classification of major dialects of English. Previous studies indicated that SFF and ZTW methods provide higher spectro-…
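The abstract names single frequency filtering (SFF) only in passing. As a rough orientation, the sketch below implements the basic SFF idea: demodulate the signal to each analysis frequency and pass it through a single-pole filter whose pole lies close to the unit circle, the magnitude of the filter output being the amplitude envelope at that frequency. The pole radius, frequency grid, and test signal are illustrative choices, not the settings used in the paper.

```python
import numpy as np
from scipy.signal import lfilter

def sff_envelopes(x, fs, freqs, r=0.995):
    """Minimal single frequency filtering (SFF) sketch.

    For each analysis frequency f_k, the signal is demodulated by a
    complex exponential and passed through a single-pole IIR filter
    with its pole close to the unit circle; the magnitude of the
    filter output is the amplitude envelope at that frequency.
    Parameter values here are illustrative, not taken from the paper.
    """
    n = np.arange(len(x))
    envelopes = np.empty((len(freqs), len(x)))
    for k, f in enumerate(freqs):
        # Demodulate the signal to the analysis frequency f_k.
        shifted = x * np.exp(-1j * 2 * np.pi * f * n / fs)
        # Single-pole filter y[m] = r * y[m-1] + shifted[m], pole radius r.
        y = lfilter([1.0], [1.0, -r], shifted)
        envelopes[k] = np.abs(y)
    return envelopes

# Example: SFF envelopes of a 1 s test tone on a coarse frequency grid.
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t)
env = sff_envelopes(x, fs, freqs=np.linspace(100, 4000, 40))
print(env.shape)  # (40, 16000): one envelope per analysis frequency
```

Stacking the envelopes over the frequency grid gives a spectro-temporal representation from which frame-level features can then be derived.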

Cited by 7 publications (6 citation statements) | References 51 publications

“…The study in [17] suggests a set of deep neural models to classify the most known English dialects. The deep classifiers used are the time-delay neural network (TDNN), the convolution neural network (CNN), the temporal convolution neural network (TCN), and the TDNN with emphasized channel attention (ECAPA-TDNN).…”
Section: Related Work (mentioning)
confidence: 99%
“…Many feature sets have been proposed with statistical and deep learning-based classifiers. A few widely used feature sets are as follows: Mel frequency cepstrum coefficients (MFCCs); inverse MFCCs (IMFCCs) [15]; linear frequency cepstrum coefficients (LFCCs); constant Q cepstrum coefficients (CQCCs) [16]; log-power spectrum using discrete Fourier transform (DFT) [17]; Gammatonegram; group delay over the frame, referred to as GD-gram [18]; modified group delay; All-Pole Group Delay [19]; Cochlear Filter Cepstral Coefficient—Instantaneous Frequency [20]; cepstrum coefficients using single-frequency filtering [21, 22]; Zero-Time Windowing (ZTW) [23]; Mel-frequency cepstrum using ZTW [24]; and polyphase IIR filters [25]. The human ear uses Fourier transform magnitude and neglects the phase information [26].…”
Section: Related Work (mentioning)
confidence: 99%
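Since the feature list above starts from MFCCs as the most common baseline, a minimal extraction sketch using librosa is shown here for orientation; the file name, sampling rate, and frame settings are placeholders rather than values taken from any of the cited works.

```python
import librosa

# Placeholder audio file; any mono speech recording works.
y, sr = librosa.load("utterance.wav", sr=16000)

# 13 Mel-frequency cepstral coefficients per 25 ms frame with a 10 ms hop.
mfcc = librosa.feature.mfcc(
    y=y, sr=sr, n_mfcc=13,
    n_fft=int(0.025 * sr), hop_length=int(0.010 * sr),
)
print(mfcc.shape)  # (13, number_of_frames)
```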
“…[1] used TDNN to predict the active power demand on a P4 bus in Presidente Prudente. Experimental results demonstrated its validity [11]. [21] used TDNN as a facial expression classifier for an intelligent robot to establish command laws by analyzing and recognizing facial expressions to translate expressions into robot-recognizable language.…”
Section: Introduction (mentioning)
confidence: 99%
“…The paper [11] investigated the performance of deep neural networks, convolutional neural networks, temporal convolutional neural networks, and TDNN for English dialect classification. The results showed that TDNN and ECAPA-TDNN classifiers capture a wider temporal context, further improving the performance of the classification models.…”
Section: Introduction (mentioning)
confidence: 99%
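The "wider temporal context" attributed to TDNN and ECAPA-TDNN comes from stacking dilated 1-D convolutions over frame-level features, so that deeper layers see progressively longer spans of frames. The sketch below illustrates that mechanism only; the channel sizes, dilations, and pooling are placeholders, not the ECAPA-TDNN configuration evaluated in the cited paper.

```python
import torch
import torch.nn as nn

class TinyTDNN(nn.Module):
    """Illustrative TDNN-style frame-level encoder.

    Each layer is a dilated 1-D convolution over time; increasing the
    dilation widens the temporal context seen by deeper layers.
    Channel sizes and dilations are placeholders, not the ECAPA-TDNN
    configuration from the cited paper.
    """

    def __init__(self, feat_dim=40, hidden=256, n_dialects=5):
        super().__init__()
        self.frame_layers = nn.Sequential(
            nn.Conv1d(feat_dim, hidden, kernel_size=5, dilation=1, padding=2),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, dilation=2, padding=2),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, dilation=3, padding=3),
            nn.ReLU(),
        )
        # Mean pooling over time gives a fixed-size utterance embedding.
        self.classifier = nn.Linear(hidden, n_dialects)

    def forward(self, feats):          # feats: (batch, feat_dim, frames)
        frame_out = self.frame_layers(feats)
        utt_embedding = frame_out.mean(dim=-1)
        return self.classifier(utt_embedding)

# Example: 8 utterances, 40-dim features, 300 frames each.
logits = TinyTDNN()(torch.randn(8, 40, 300))
print(logits.shape)  # torch.Size([8, 5])
```

ECAPA-TDNN builds on this frame-level structure with squeeze-and-excitation blocks, Res2Net-style multi-scale convolutions, and attentive statistics pooling in place of plain mean pooling.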