ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
DOI: 10.1109/icassp49357.2023.10096553

Neural Speech Phase Prediction Based on Parallel Estimation Architecture and Anti-Wrapping Losses

Abstract: This paper presents a novel neural speech phase prediction model which predicts wrapped phase spectra directly from amplitude spectra. The proposed model is a cascade of a residual convolutional network and a parallel estimation architecture. The parallel estimation architecture is a core module for direct wrapped phase prediction. This architecture consists of two parallel linear convolutional layers and a phase calculation formula, imitating the process of calculating the phase spectra from the real and imag…
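
The parallel estimation idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `parallel_phase_estimate`, the weight matrices `w_real` and `w_imag`, and the tensor shapes are hypothetical, and each "linear convolutional layer" is assumed to be a kernel-size-1 convolution, which reduces to a matrix product over the channel axis. The two branches produce pseudo real and pseudo imaginary parts, and the two-argument arctangent turns them into a phase that is already wrapped.

```python
import numpy as np

def parallel_phase_estimate(features, w_real, w_imag):
    """Predict a wrapped phase spectrum from hidden features.

    features: (channels, frames) output of some upstream network (assumed).
    w_real, w_imag: (freq_bins, channels) weights of two parallel linear
        layers (hypothetical; a kernel-size-1 convolution without
        activation is just a matrix product over the channel axis).
    """
    pseudo_real = w_real @ features   # branch 1: pseudo real part
    pseudo_imag = w_imag @ features   # branch 2: pseudo imaginary part
    # Same formula used to read the phase off a complex spectrum:
    # the two-argument arctangent of (imaginary, real).
    return np.arctan2(pseudo_imag, pseudo_real)

# Toy example with random values, only to show shapes and value range.
rng = np.random.default_rng(0)
features = rng.standard_normal((8, 5))   # 8 channels, 5 frames
w_real = rng.standard_normal((4, 8))     # 4 frequency bins
w_imag = rng.standard_normal((4, 8))
phase = parallel_phase_estimate(features, w_real, w_imag)
```

Because `np.arctan2` always returns values in [-π, π], the output needs no separate wrapping step, which is the point of imitating the real/imaginary phase calculation.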

Cited by 10 publications (2 citation statements)
References 35 publications
“…On the DNS Challenge dataset, as seen in Table 4, our model improved PESQ without significantly increasing the number of model parameters and maintained an advanced level in other objective metrics. On the Voicebank + DEMAND dataset, as shown in Table 5, except for slight inferiority to DEMUCS [30] in CSIG, our model similarly shows significant improvements in all metrics. Compared to other studies, the advantage of our proposed method lies in considering the crucial role of harmonic information under noise masking for speech restoration.…”
Section: Comparison With Other Models (mentioning)
confidence: 76%
“…The resulting output is up sampled to obtain the predicted CIRM spectrum's real part output M_r. To avoid issues of non-structural and wrapping phase jumps [30], the Imag branch employs a phase decoder with dilated DenseNet [31]. After the deconvolution block, parallel dual 2D convolution layers output pseudo-components, and a dual-parameter arctangent function activates these two components to obtain the predicted CIRM imaginary spectrum output M_i, where instance normalization layers are connected to standardize the network's intermediate features.…”
Section: The Harmonic Repair Large Frame Enhancement Model Hrlf-net (mentioning)
confidence: 99%
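
The "wrapping phase jumps" mentioned in the statement above can be made concrete with a small NumPy sketch. It is a generic illustration and not the loss function of either paper: the helper name `wrap_to_principal` and the sample values are assumptions. Two phases on opposite sides of the ±π boundary are numerically far apart even though the angles are almost identical, so a raw difference overstates the error, while mapping the difference back to its principal value does not.

```python
import numpy as np

def wrap_to_principal(x):
    """Map an angle difference to its principal value in [-pi, pi]."""
    return x - 2.0 * np.pi * np.round(x / (2.0 * np.pi))

# Hypothetical predicted and target phases, in radians.
predicted = np.array([3.1, 0.5, -3.0])
target = np.array([-3.1, 0.4, 3.0])

# The raw difference treats 3.1 and -3.1 as 6.2 rad apart.
naive_error = np.abs(predicted - target)

# After wrapping, the same pair is only 2*pi - 6.2 rad apart.
wrapped_error = np.abs(wrap_to_principal(predicted - target))
```

Only the first and third pairs straddle the boundary, so only their errors change; the second pair is unaffected by wrapping.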