Improved CEM for Speech Harmonic Enhancement in Single Channel Noise Suppression

Song, Yanjue; Madhu, Nilesh

doi:10.1109/taslp.2022.3190725

Cited by 4 publications

(14 citation statements)

References 19 publications


“…Then, according to the source-filter model, the enhanced signal is decomposed into the excitation signal $R_l(m)$ and the envelope $H_l(m)$, and each component can be enhanced individually. The enhancement of the speech excitation signal has been discussed in References [4, 5, 23], showing that the idealised excitation signal $R_l(m)$ brings the benefit of recovering the weak or lost harmonics in the initial speech estimate.…”

Section: Speech Enhancement Framework (mentioning, confidence: 99%)
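The source-filter split described in this excerpt is commonly carried out in the cepstral domain. The Python sketch below illustrates that idea under stated assumptions: a Hann-windowed frame, a real cepstrum, and a low-quefrency lifter of hypothetical length `n_lifter` that assigns the first coefficients to the envelope (cf. $H_l(m)$) and the remainder to the excitation (cf. $R_l(m)$). The function name and default values are illustrative choices, not taken from the cited works.

```python
import numpy as np


def cepstral_decompose(frame, n_fft=512, n_lifter=20):
    """Split one frame into a spectral envelope and an excitation (fine structure).

    Assumptions (illustrative only): Hann window, real cepstrum, and a
    low-quefrency lifter keeping the first `n_lifter` coefficients for the envelope.
    """
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)), n=n_fft)
    log_mag = np.log(np.abs(spectrum) + 1e-12)       # small floor avoids log(0)
    cepstrum = np.fft.irfft(log_mag, n=n_fft)         # real cepstrum, length n_fft

    # Symmetric lifter: c[0..n_lifter-1] and their mirrored counterparts.
    lifter = np.zeros(n_fft)
    lifter[:n_lifter] = 1.0
    lifter[n_fft - n_lifter + 1:] = 1.0

    env_cep = cepstrum * lifter                       # low quefrencies -> envelope
    exc_cep = cepstrum - env_cep                      # remainder -> excitation

    # Back to the log-spectral domain; both lifted cepstra are even, so rfft is real.
    envelope = np.exp(np.fft.rfft(env_cep).real)      # smooth envelope, cf. H_l(m)
    excitation = np.exp(np.fft.rfft(exc_cep).real)    # harmonic structure, cf. R_l(m)
    return envelope, excitation, np.abs(spectrum)
```

Because the two log-spectra add up to the original log-magnitude, `envelope * excitation` reproduces the magnitude spectrum (up to the small floor), which is the multiplicative source-filter model expressed in the frequency domain.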

“…While the excitation signal can be modeled by straightforward mathematical equations due to its periodic nature in the voiced frames with the largest energy [4, 5, 23], data-driven methods are more common in the estimation of the speech envelopes as in References [10, 11, 12, 13]. If the underlying clean-speech envelope can be accurately estimated from the distorted or noisy signal envelope, it should improve the final speech estimate.…”

Section: Speech Enhancement Framework (mentioning, confidence: 99%)
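The periodicity mentioned here shows up in the cepstrum as a peak at the quefrency of the pitch period. The following sketch, loosely in the spirit of cepstral excitation manipulation but not the published CEM algorithm of [4, 5], picks that peak within an assumed 60 to 400 Hz F0 search range and scales it by a hypothetical `gain` to emphasise the harmonic structure. All function names and defaults are illustrative assumptions.

```python
import numpy as np


def real_cepstrum(frame, n_fft=1024):
    """Real cepstrum of a Hann-windowed frame (inverse FFT of the log magnitude)."""
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)), n=n_fft)
    return np.fft.irfft(np.log(np.abs(spectrum) + 1e-12), n=n_fft)


def pitch_from_cepstrum(cepstrum, fs, f0_min=60.0, f0_max=400.0):
    """Locate the strongest cepstral peak inside the assumed F0 search range."""
    q_lo = int(fs / f0_max)          # shortest pitch period in samples
    q_hi = int(fs / f0_min)          # longest pitch period in samples
    q_peak = q_lo + int(np.argmax(cepstrum[q_lo:q_hi + 1]))
    return fs / q_peak, q_peak       # (F0 estimate in Hz, quefrency index)


def emphasize_pitch_peak(cepstrum, q_peak, gain=2.0):
    """Scale the pitch-related coefficient and its mirror by a hypothetical gain."""
    out = cepstrum.copy()
    out[q_peak] *= gain
    out[len(out) - q_peak] *= gain   # keep the cepstrum symmetric (real spectrum)
    return out
```

For a synthetic frame with harmonics of 200 Hz sampled at 16 kHz, the peak lies at a quefrency of about 80 samples, i.e. one pitch period.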

“…Then, according to the source-filter model, the enhanced signal is decomposed into the excitation signal $R_l(m)$ and the envelope $H_l(m)$, and each component can be enhanced individually. The enhancement of the speech excitation signal has been discussed in References [4, 5, 23], showing that the idealised excitation signal $R_l(m)$ brings the benefit of recovering the weak or lost harmonics in the initial speech estimate. While the excitation signal can be modeled by straightforward mathematical equations due to its periodic nature in the voiced frames with the largest energy [4, 5, 23], data-driven methods are more common in the estimation of the speech envelopes as in References [10, 11, 12, 13].…”

Section: Speech Enhancement Framework (mentioning, confidence: 99%)

“…The enhancement of the speech excitation signal has been discussed in References [4, 5, 23], showing that the idealised excitation signal $R_l(m)$ brings the benefit of recovering the weak or lost harmonics in the initial speech estimate. While the excitation signal can be modeled by straightforward mathematical equations due to its periodic nature in the voiced frames with the largest energy [4, 5, 23], data-driven methods are more common in the estimation of the speech envelopes as in References [10, 11, 12, 13]. If the underlying clean-speech envelope can be accurately estimated from the distorted or noisy signal envelope, it should improve the final speech estimate.…”

Section: Speech Enhancement Framework (mentioning, confidence: 99%)

“…In our recent work [4], for example, we improve the speech harmonic recovery method termed cepstral excitation manipulation (CEM) [5] using the source-filter model of speech production to highlight its periodic structure. In this model, the speech signal is decomposed into an excitation and an envelope component in order to represent the excitation source and the vocal tract filter, respectively.…”

Section: Introduction (mentioning, confidence: 99%)


Investigations on the Optimal Estimation of Speech Envelopes for the Two-Stage Speech Enhancement

Song, Madhu (2023). Sensors. Self-citation.


Using the source-filter model of speech production, clean speech signals can be decomposed into an excitation component and an envelope component that is related to the phoneme being uttered. Therefore, restoring the envelope of degraded speech during speech enhancement can improve the intelligibility and quality of the output. As the number of phonemes in spoken speech is limited, they can be adequately represented by a correspondingly limited number of envelopes. This can be exploited to improve the estimation of speech envelopes from a degraded signal in a data-driven manner. The improved envelopes are then used in a second stage to refine the final speech estimate. Envelopes are typically derived from the linear prediction coefficients (LPCs) or from the cepstral coefficients (CCs). The improved envelope is obtained either by mapping the degraded envelope onto pre-trained codebooks (classification approach) or by directly estimating it from the degraded envelope (regression approach). In this work, we first investigate the optimal features for envelope representation and codebook generation by a series of oracle tests. We demonstrate that CCs provide a better envelope representation than LPCs. Further, we demonstrate that a unified speech codebook is advantageous compared to the typical codebook that manually splits speech and silence into separate entries. Next, we investigate low-complexity neural network architectures to map degraded envelopes to the optimal codebook entry in practical systems. We confirm that simple recurrent neural networks yield good performance with low complexity and a small number of parameters. We also demonstrate that, with a careful choice of features and architecture, a regression approach can further improve the performance at a lower computational cost.
However, as also seen from the oracle tests, the benefit of the two-stage framework is now chiefly limited by the statistical noise floor estimate, leading to only a limited improvement in extremely adverse conditions. This highlights the need for further research on joint estimation of speech and noise for optimum enhancement.
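The classification approach in the abstract maps a degraded envelope onto a pre-trained codebook. As a minimal sketch, assuming cepstral envelope vectors as features, a codebook trained with plain k-means, and a nearest-neighbour (Euclidean) lookup standing in for the neural-network classifier the abstract describes, the two steps could look like this. Function names, the codebook size and the iteration count are hypothetical.

```python
import numpy as np


def train_codebook(clean_ceps, n_entries=8, n_iter=20, seed=0):
    """Plain k-means over clean cepstral envelope vectors (frames x coefficients)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(clean_ceps), size=n_entries, replace=False)
    codebook = clean_ceps[idx].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance of every frame to every codebook entry.
        d = ((clean_ceps[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        for k in range(n_entries):
            members = clean_ceps[labels == k]
            if len(members):  # keep the previous entry if a cluster empties
                codebook[k] = members.mean(axis=0)
    return codebook


def classify_envelopes(degraded_ceps, codebook):
    """Classification approach: replace each frame by its nearest codebook entry."""
    d = ((degraded_ceps[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    labels = d.argmin(axis=1)
    return labels, codebook[labels]
```

The nearest-neighbour step is only a placeholder: a noisy envelope need not lie closest to its matching clean entry, which is why a trained mapping from degraded features to codebook indices (or a direct regression) is used instead.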



Investigations on the Optimal Estimation of Speech Envelopes for the Two-Stage Speech Enhancement

Song, Madhu (2023). Sensors. Self-citation.


Source number of single-channel signals intelligent estimation via signal reconstruction

Zhang, Gao (2023). Digital Signal Processing.


Aiding Speech Harmonic Recovery in DNN-Based Single Channel Noise Reduction Using Cepstral Excitation Manipulation (CEM) Components

Song, Madhu (2023). ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Self-citation.


Weak harmonics of voiced speech segments are often lost during noise suppression, especially at low SNRs. This distorts the harmonic structure and causes an accompanying loss in quality. In this paper, inspired by previous work on speech harmonic enhancement using statistical methods, we present a loss function component we term cepstral excitation manipulation (CEM) loss, which is constructed from the fundamental-frequency-related cepstral coefficients. This component can be introduced into the training of state-of-the-art architectures; its benefit is benchmarked here on CRUSE. Experiments show that the proposed loss function component complements standard loss functions and that the harmonic structure is better preserved. On average, the best system improves PESQ by 0.4 and DNSMOS by 0.47 compared to the noisy input. Substantial improvements occur primarily at low SNRs (−5 dB to 5 dB), the range in which harmonic recovery is most needed.
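One way to picture a loss term built from F0-related cepstral coefficients is sketched below. This is a hypothetical NumPy formulation, not the CEM loss of the ICASSP paper: it locates the pitch quefrency in the clean reference cepstrum (assumed 60 to 400 Hz search range) and penalises the squared difference of the estimated cepstrum at that index. For actual training, the same computation would need to be written in a differentiable framework, and unvoiced frames would usually be masked out; both are omitted here.

```python
import numpy as np


def cepstra_from_magnitudes(mag, eps=1e-8):
    """Real cepstrum per frame from one-sided magnitude spectra (frames x bins)."""
    n_fft = 2 * (mag.shape[-1] - 1)
    return np.fft.irfft(np.log(mag + eps), n=n_fft, axis=-1)


def f0_cepstral_loss(est_mag, clean_mag, fs, f0_min=60.0, f0_max=400.0):
    """Hypothetical F0-related cepstral loss term (not the published CEM loss)."""
    c_est = cepstra_from_magnitudes(est_mag)
    c_ref = cepstra_from_magnitudes(clean_mag)
    q_lo, q_hi = int(fs / f0_max), int(fs / f0_min)
    # Pitch quefrency per frame, taken from the clean reference cepstrum.
    q = q_lo + np.argmax(c_ref[:, q_lo:q_hi + 1], axis=1)
    rows = np.arange(len(q))
    # Mean squared error restricted to the pitch-related coefficient of each frame.
    return float(np.mean((c_est[rows, q] - c_ref[rows, q]) ** 2))
```

An estimate whose harmonics have been smeared out (in the extreme, a flat spectrum) loses its cepstral pitch peak and therefore receives a positive penalty, while an estimate identical to the reference yields zero.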

