2023
DOI: 10.1109/taslp.2023.3237167

Multi-Stream Acoustic Modelling Using Raw Real and Imaginary Parts of the Fourier Transform

Cited by 7 publications (4 citation statements)
References 69 publications
“…For acoustic models, a multi-stream network based on the Fourier transform uses two independent branches to process the real and imaginary parts and then fuses them [54]. SHC features are used as inputs to this framework, referred to as MS.…”
Section: Baselines (mentioning)
confidence: 99%
“…Finally, to put the reported numbers in context, Table 3 demonstrates the performance of the proposed systems along with previous studies on TORGO [20,24,33].…”
Section: Multi-stream ADSR Systems (mentioning)
confidence: 99%
“…Raw signal representations such as the raw waveform [14][15][16][17], raw magnitude [18], raw phase [19], raw real and imaginary parts [20], and raw source and filter components [21] have recently been applied in acoustic modelling for typical speech. Compared with task-blind hand-crafted features such as MFCCs, these raw representations are richer in information.…”
Section: Introduction (mentioning)
confidence: 99%
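The raw real and imaginary parts mentioned in this statement come from a short-time Fourier transform of the speech signal. The sketch below is a minimal illustration, not the cited authors' front end: the frame length, hop size, and Hamming window are assumptions chosen for readability, and the helper names are hypothetical. It also shows why the real/imaginary pair is a lossless alternative to magnitude plus phase: both of those can be recomputed from the two streams.

```python
import cmath
import math


def hamming(n):
    """Symmetric Hamming window of length n (assumed analysis window)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def frame_dft(frame):
    """Naive O(N^2) DFT; keeps only the N//2 + 1 non-negative-frequency bins,
    because the spectrum of a real-valued frame is conjugate-symmetric."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n // 2 + 1)
    ]


def raw_real_imag(signal, frame_len=8, hop=4):
    """Frame and window the signal (hypothetical sizes), then return two streams:
    real[t][k] = Re X_t(k) and imag[t][k] = Im X_t(k)."""
    win = hamming(frame_len)
    real, imag = [], []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], win)]
        spec = frame_dft(frame)
        real.append([c.real for c in spec])
        imag.append([c.imag for c in spec])
    return real, imag


def magnitude_phase(real, imag):
    """Magnitude and phase are both recoverable from the two raw streams."""
    mag = [[math.hypot(r, i) for r, i in zip(rr, ii)] for rr, ii in zip(real, imag)]
    phase = [[math.atan2(i, r) for r, i in zip(rr, ii)] for rr, ii in zip(real, imag)]
    return mag, phase
```

For example, a 32-sample sinusoid with `frame_len=8` and `hop=4` yields 7 frames of 5 bins per stream. The naive DFT is used only to keep the example dependency-free; a practical front end would use an FFT such as `numpy.fft.rfft`.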
“…Their experimental results on the TORGO database showed that parametric CNNs outperform non-parametric CNNs, with the average WER on dysarthric speech reaching 35.9%. Loweimi et al. [103] used the raw real and imaginary parts of the Fourier transform of speech signals to investigate the multi-stream acoustic modelling approach. In their framework, the real and imaginary parts are treated as two streams of information, pre-processed by separate convolutional networks, and combined at an optimal level of abstraction, followed by further post-processing via recurrent and fully connected layers.…”
Section: Deep Learning Technologies of ASR for Dysarthric Speech (mentioning)
confidence: 99%
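The multi-stream framework described in this statement (separate convolutional branches for the real and imaginary streams, a fusion point, then recurrent and fully connected layers) can be sketched as an untrained NumPy forward pass. This is an illustration under loudly stated assumptions, not the authors' model: 1-D convolution along frequency with max-pooling, fusion by concatenation directly after the convolutional stage, a plain tanh (Elman) recurrent layer in place of whatever recurrent unit the paper uses, and all sizes (kernel count, kernel width, hidden size, 10 output classes) are hypothetical, as are the function names.

```python
import numpy as np


def conv_branch(x, kernels):
    """One stream: 1-D convolution along frequency per frame, ReLU, then
    max-pooling over frequency. x: (T, F); kernels: (C, K). Returns (T, C)."""
    K = kernels.shape[1]
    windows = np.lib.stride_tricks.sliding_window_view(x, K, axis=1)  # (T, F-K+1, K)
    feats = np.maximum(windows @ kernels.T, 0.0)  # (T, F-K+1, C)
    return feats.max(axis=1)


def rnn(x, W_in, W_h, b):
    """Elman RNN over time (assumed stand-in for the recurrent layers)."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in x:
        h = np.tanh(W_in @ x_t + W_h @ h + b)
        states.append(h)
    return np.stack(states)  # (T, H)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def init_params(seed=0, n_kernels=8, kernel_size=5, hidden=16, n_classes=10):
    """Random, untrained weights with hypothetical sizes."""
    rng = np.random.default_rng(seed)
    return {
        "k_real": rng.standard_normal((n_kernels, kernel_size)) * 0.1,
        "k_imag": rng.standard_normal((n_kernels, kernel_size)) * 0.1,
        "W_in": rng.standard_normal((hidden, 2 * n_kernels)) * 0.1,
        "W_h": rng.standard_normal((hidden, hidden)) * 0.1,
        "b": np.zeros(hidden),
        "W_out": rng.standard_normal((n_classes, hidden)) * 0.1,
        "b_out": np.zeros(n_classes),
    }


def multi_stream_forward(real, imag, p):
    """real, imag: (T, F) raw STFT streams. Returns per-frame posteriors (T, n_classes)."""
    r = conv_branch(real, p["k_real"])      # stream 1: real part, its own kernels
    i = conv_branch(imag, p["k_imag"])      # stream 2: imaginary part, its own kernels
    fused = np.concatenate([r, i], axis=1)  # fusion point (assumed: after the conv stage)
    h = rnn(fused, p["W_in"], p["W_h"], p["b"])
    return softmax(h @ p["W_out"].T + p["b_out"])  # fully connected output layer
```

In the described framework the fusion level is chosen as the best-performing level of abstraction; in this sketch, fusing later would mean giving each branch more of its own layers before the `np.concatenate` call.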