A uniform phase representation for the harmonic model in speech synthesis applications

Degottex, Gilles; Erro, Daniel

doi:10.1186/s13636-014-0038-1

Cited by 50 publications

(63 citation statements)

References 59 publications


“…4, the clean spectral phase exhibits a low variance with a visible harmonic structure compared to the large variance in the noisy speech. In particular, as reported in [25], at voiced frames, the shape of the glottal pulse changes smoothly hence a low variance of phase is observed.…”

Section: Harmonic Structure In Phase and Motivation (supporting)

confidence: 66%
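The contrast in this excerpt, low phase variance for clean voiced speech versus high variance for noisy speech, can be quantified with circular statistics, because phase is an angle and an ordinary variance is distorted by wrapping at ±π. The sketch below uses the circular variance 1 − |mean(e^{jφ})|. It assumes the linear phase contributed by the fundamental frequency has already been removed, so a clean voiced track is close to constant. Both tracks are synthetic and chosen only for illustration; they are not data from [25].

```python
import numpy as np


def circular_variance(phases, axis=-1):
    """Circular variance 1 - |mean(exp(j*phase))|, a value in [0, 1].

    0 means every phase is identical; values near 1 mean the phases are
    spread roughly uniformly around the unit circle.
    """
    return 1.0 - np.abs(np.mean(np.exp(1j * phases), axis=axis))


rng = np.random.default_rng(0)
n_frames = 200

# Synthetic stand-in for a clean voiced harmonic after linear-phase removal:
# small jitter around a constant value.
clean = 0.5 + 0.05 * rng.standard_normal(n_frames)
# Synthetic stand-in for a noise-dominated track: uniform on [-pi, pi).
noisy = rng.uniform(-np.pi, np.pi, n_frames)

print(f"clean track circular variance: {circular_variance(clean):.3f}")
print(f"noisy track circular variance: {circular_variance(noisy):.3f}")
```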

“…The harmonic structure in the clean spectral phase across time or frequency as well as across harmonics at voiced frames captured by the low variance of phase [25], inspires us to propose a time-frequency smoothing filtering approach and apply it at speech harmonics at least for voiced speech segments in order to obtain enhanced phase estimates at harmonics. In this work, we propose a method to smooth the harmonic phase across time and frequency to reduce the variance of the noisy phase at the signal harmonics.…”

Section: Harmonic Structure In Phase and Motivation (mentioning)

confidence: 99%
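A minimal sketch of time-frequency smoothing of phase, assuming a plain box window over a (frames × frequency points) array. Averaging is done on the unit phasors e^{jψ} and the angle of the sum is returned, which is what makes the average valid for phases near ±π. This is a generic circular smoother for illustration, not the SNR-dependent filter of the citing paper; the function names and the synthetic test signal are hypothetical.

```python
import numpy as np


def circular_smooth(psi, half_t=2, half_f=1):
    """Box-window smoothing of a (frames x frequency) phase array.

    Averages the unit phasors exp(j*psi) over a (2*half_t+1) x (2*half_f+1)
    neighbourhood and returns the angle of the sum, so phases near +/-pi are
    averaged correctly (an arithmetic mean of the raw angles would not be).
    Borders are handled by repeating edge values.
    """
    psi = np.asarray(psi, dtype=float)
    z = np.exp(1j * psi)
    zp = np.pad(z, ((half_t, half_t), (half_f, half_f)), mode="edge")
    n_t, n_f = psi.shape
    acc = np.zeros_like(z)
    for dt in range(2 * half_t + 1):
        for df in range(2 * half_f + 1):
            acc += zp[dt:dt + n_t, df:df + n_f]
    return np.angle(acc)


def mean_wrapped_error(a, b):
    """Mean absolute phase error after wrapping the difference to (-pi, pi]."""
    return np.mean(np.abs(np.angle(np.exp(1j * (a - b)))))


rng = np.random.default_rng(1)
# Hypothetical target: a constant phase close to pi (so noise makes many
# observations wrap around), over 100 frames x 5 frequency points.
true_phase = np.full((100, 5), 2.9)
noise = 0.6 * rng.standard_normal(true_phase.shape)
noisy_phase = np.angle(np.exp(1j * (true_phase + noise)))
smoothed = circular_smooth(noisy_phase, half_t=2, half_f=1)

print(f"error before: {mean_wrapped_error(noisy_phase, true_phase):.3f} rad")
print(f"error after:  {mean_wrapped_error(smoothed, true_phase):.3f} rad")
```

Larger windows lower the variance further but blur genuine phase changes, which is one reason an adaptive (for example SNR-driven) window is attractive in practice.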

“…For phase decomposition purpose, we consider the pitch-synchronous signal segmentation proposed in [28] and reported as a favorable choice for speech analysis/synthesis using a harmonic model [25]:…”

Section: A Speech Harmonic Model (mentioning)

confidence: 99%
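The segmentation scheme of [28] is not reproduced in this excerpt. As a generic illustration of what "pitch-synchronous" means, the sketch below places analysis instants one local pitch period apart, reading the local f0 from the nearest frame of an f0 contour. The fixed step used in unvoiced regions (`unvoiced_f0`) and the function name are hypothetical choices, not taken from [28] or [25].

```python
import numpy as np


def pitch_synchronous_instants(f0_times, f0_values, fs, duration, unvoiced_f0=100.0):
    """Place analysis instants one local pitch period apart.

    f0_times, f0_values: f0 contour (seconds, Hz); 0 marks unvoiced frames.
    The local f0 is taken from the nearest contour frame. Unvoiced stretches
    use a fixed step of 1/unvoiced_f0 (a hypothetical default).
    Returns the instants as integer sample indices.
    """
    f0_times = np.asarray(f0_times, dtype=float)
    f0_values = np.asarray(f0_values, dtype=float)
    t, instants = 0.0, []
    while t < duration:
        instants.append(t)
        f0 = f0_values[np.argmin(np.abs(f0_times - t))]
        t += 1.0 / (f0 if f0 > 0 else unvoiced_f0)
    return np.round(np.array(instants) * fs).astype(int)


fs = 16000
times = np.arange(0, 0.1, 0.005)      # contour every 5 ms
f0 = np.full_like(times, 200.0)       # steady 200 Hz voiced segment
marks = pitch_synchronous_instants(times, f0, fs, duration=0.1)
print(marks[:5], np.diff(marks)[:5])
```

At 200 Hz and 16 kHz the instants are 80 samples apart; each frame is then centred on one instant, so every frame covers a whole number of periods regardless of how f0 varies.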


Harmonic Phase Estimation in Single-Channel Speech Enhancement Using Phase Decomposition and SNR Information

Mowlaee

Kulmer

2015

IEEE/ACM Trans. Audio Speech Lang. Process.


In conventional single-channel speech enhancement, typically the noisy spectral amplitude is modified while the noisy phase is used to reconstruct the enhanced signal. Several recent attempts have shown the effectiveness of utilizing an improved spectral phase for phase-aware speech enhancement and consequently its positive impact on the perceived speech quality. In this paper, we present a harmonic phase estimation method relying on fundamental frequency and signal-to-noise ratio (SNR) information estimated from noisy speech. The proposed method relies on SNR-based time-frequency smoothing of the unwrapped phase obtained from the decomposition of the noisy phase. To incorporate the uncertainty in the estimated phase due to unreliable voicing decision and SNR estimate, we propose a binary hypothesis test assuming speech-present and speech-absent classes representing high and low SNRs. The effectiveness of the proposed phase estimation method is evaluated for both phase-only enhancement of noisy speech and in combination with an amplitude-only enhancement scheme. We show that enhancing the noisy phase improves both perceived speech quality and speech intelligibility, as predicted by the instrumental metrics and confirmed by subjective listening tests.
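The abstract's "decomposition of the noisy phase" can be illustrated with a common formulation that we assume here: the phase of harmonic h at frame l is split into a linear part, h·Σ_{l'≤l} ω0(l')·S (ω0 in rad/sample, S the hop size in samples), and a remainder that carries the slowly varying, so-called unwrapped phase. The sketch below only performs the split and its inverse on synthetic data; it does not implement the SNR-based smoothing or the hypothesis test, and all names are hypothetical.

```python
import numpy as np


def wrap(x):
    """Wrap angles to (-pi, pi]."""
    return np.angle(np.exp(1j * x))


def linear_phase(f0_hz, n_harm, hop, fs):
    """Linear phase h * sum_{l' <= l} omega0(l') * hop for frames l and harmonics h.

    f0_hz: per-frame fundamental frequency in Hz; hop: frame shift in samples.
    Returns an array of shape (n_frames, n_harm).
    """
    omega0 = 2.0 * np.pi * np.asarray(f0_hz, dtype=float) / fs  # rad/sample
    h = np.arange(1, n_harm + 1)
    return np.cumsum(omega0 * hop)[:, None] * h[None, :]


def decompose(phi, f0_hz, hop, fs):
    """Split observed harmonic phases phi (frames x harmonics) into the
    linear part and the wrapped remainder (the 'unwrapped phase' term)."""
    lin = linear_phase(f0_hz, phi.shape[1], hop, fs)
    return lin, wrap(phi - lin)


def recompose(psi, lin):
    """Inverse of decompose: add the linear part back and wrap."""
    return wrap(lin + psi)


rng = np.random.default_rng(2)
n_frames, n_harm, hop, fs = 50, 6, 80, 8000
f0 = 120.0 + 10.0 * np.sin(np.linspace(0, np.pi, n_frames))  # slowly varying f0
psi_true = rng.uniform(-np.pi, np.pi, n_harm)                 # fixed per-harmonic term
phi = wrap(linear_phase(f0, n_harm, hop, fs) + psi_true)      # synthetic observations

lin, psi = decompose(phi, f0, hop, fs)
print(np.max(np.abs(wrap(psi - psi_true))))
```

Once the rapidly advancing linear part is removed, the remainder changes slowly over a voiced segment, which is what makes smoothing it across frames meaningful.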


“…One of the issues with current day speech synthesizers is generation of excitation source signal, which is mainly manifested in the phase spectrum of the speech signal. Attempts to incorporate phase of speech signals into synthesis systems include usage of complex cepstrum [66], adding instantaneous phase randomness features to HMM based synthesis [67] and compensating phase mismatches in concatenative synthesis [68]. Alternatively, the proposed AP filter, which models the phase spectral characteristics of speech signal, can be used for speech synthesis.…”

Section: Speech Synthesis (mentioning)

confidence: 99%
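The excerpt cites the complex cepstrum [66] as one way of bringing phase into synthesis. A minimal FFT-based sketch follows, modelled on the textbook procedure: unwrap the phase, remove an integer linear-phase (delay) term so the log spectrum is Hermitian, then take the inverse FFT. It assumes an even FFT length, a positive DC value, and no spectral zeros on or near the unit circle; it is not the method of [66], and the function names are hypothetical.

```python
import numpy as np


def complex_cepstrum(x, n_fft=1024):
    """Complex cepstrum via the FFT, with linear-phase (delay) removal.

    Assumes an even n_fft, a positive DC value, and no spectral zeros on
    (or very close to) the unit circle, so the phase can be unwrapped.
    Returns the cepstrum and the integer delay that was removed.
    """
    X = np.fft.fft(x, n_fft)
    phase = np.unwrap(np.angle(X))
    half = n_fft // 2                                   # Nyquist bin
    n_delay = int(np.round(phase[half] / np.pi))
    phase -= np.pi * n_delay * np.arange(n_fft) / half  # remove linear phase
    log_X = np.log(np.abs(X)) + 1j * phase
    return np.fft.ifft(log_X).real, n_delay


def inverse_complex_cepstrum(c, n_delay):
    """Rebuild the signal: exponentiate the log spectrum and restore the delay."""
    n_fft = len(c)
    k = np.arange(n_fft)
    log_X = np.fft.fft(c) + 1j * np.pi * n_delay * k / (n_fft // 2)
    return np.fft.ifft(np.exp(log_X)).real


# Mixed-phase example: one zero inside and one outside the unit circle.
x = np.array([0.2, 1.0, 0.3])
c, n_delay = complex_cepstrum(x)
x_rec = inverse_complex_cepstrum(c, n_delay)
print(n_delay, np.round(x_rec[:4], 6))
```

Because the cepstrum keeps the unwrapped phase, the round trip recovers the original samples, including their mixed-phase character, which a magnitude-only (real) cepstrum cannot do.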

Analysis of Phase Spectrum of Speech Signals Using Allpass Modeling

Vijayan

Murty

2015

IEEE/ACM Trans. Audio Speech Lang. Process.


The phase spectrum of the Fourier transform has received less prominence than its magnitude counterpart in speech processing. In this paper, we propose a method for parametric modeling of the phase spectrum, and discuss its applications in speech signal processing. The phase spectrum is modeled as the response of an allpass (AP) filter, whose coefficients are estimated from knowledge of the speech production process, especially the impulse-like nature of the excitation source. A signal retaining only the phase spectral component of the speech signal is derived by suppressing the magnitude spectral component, and is modeled as the output of an AP filter excited with a sequence of impulses. The entropy of the energy of the input signal is minimized to estimate the coefficients of the AP filter. The resulting objective function, being nonconvex, is minimized using particle swarm optimization. The group delay response of the estimated AP filters can be used for accurate analysis of resonances of the vocal-tract system (VTS). The error signal associated with AP modeling provides unambiguous evidence about the instants of significant excitation of the VTS. The applications of the proposed AP modeling include, but are not limited to, formant tracking, extraction of glottal closure instants, speaker verification and speech synthesis.
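To illustrate why the group delay of an allpass filter reveals resonances, the sketch below evaluates a single second-order real allpass section, H(z) = z^{-N}A(z^{-1})/A(z), whose magnitude is exactly 1, and computes its group delay from the unwrapped phase. The pole radius and angle are hypothetical values, and the coefficients are set by hand; the paper's coefficient estimation (entropy minimization with particle swarm optimization) is not reproduced.

```python
import numpy as np


def allpass_response(a, n_points=4096):
    """Frequency response on [0, pi] of the real allpass filter z^-N A(1/z) / A(z).

    a: coefficients [1, a1, ..., aN] of A(z) (assumed to have all roots inside
    the unit circle). Reversing them gives the numerator, which forces |H| = 1.
    """
    a = np.asarray(a, dtype=float)
    w = np.linspace(0.0, np.pi, n_points)
    E = np.exp(-1j * np.outer(w, np.arange(len(a))))  # E[i, k] = e^{-j w_i k}
    return w, (E @ a[::-1]) / (E @ a)


def group_delay(w, H):
    """Group delay -d(phase)/dw in samples, from the unwrapped phase."""
    return -np.gradient(np.unwrap(np.angle(H)), w)


# Hypothetical second-order section: pole pair at radius r, angle theta.
r, theta = 0.9, 0.3 * np.pi
a = [1.0, -2.0 * r * np.cos(theta), r ** 2]
w, H = allpass_response(a)
tau = group_delay(w, H)
print(f"max |H| deviation: {np.max(np.abs(np.abs(H) - 1)):.1e}")
print(f"group delay peak at {w[np.argmax(tau)]:.3f} rad (pole angle {theta:.3f})")
```

Although the magnitude response is flat, the group delay peaks near the pole angle, and the peak sharpens as r approaches 1; this is the property that lets the phase spectrum alone locate vocal-tract resonances.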


“…relative phase shift [7], group delay [8], phase dispersion [9], phase distortion [10] and the complex cepstrum [6] for speech synthesis. For example, in [6] and [11], complex cepstra or a cepstrum-like representation calculated from the standard deviation of phase distortion have been modelled, respectively, using an additional independent stream in HMM-based statistical parametric speech synthesis (SPSS) to improve the quality of the vocoded speech.…”

Section: Introduction (mentioning)

confidence: 99%
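The excerpt lists relative phase shift [7] and phase distortion [10] among the phase features used for synthesis. The sketch below assumes the definitions commonly used in the harmonic-model literature, RPS_h = φ_h − h·φ_1 and PD_h = φ_{h+1} − φ_h − φ_1 (consult [7] and [10] for the exact formulations). It checks numerically that both cancel the linear phase term, so they do not depend on where the analysis instant falls within the period. The phases and the time shift are synthetic, hypothetical values.

```python
import numpy as np


def wrap(x):
    """Wrap angles to (-pi, pi]."""
    return np.angle(np.exp(1j * x))


def relative_phase_shift(phi):
    """RPS_h = phi_h - h * phi_1 for harmonic phases phi[..., h-1], h = 1..H."""
    h = np.arange(1, phi.shape[-1] + 1)
    return wrap(phi - h * phi[..., :1])


def phase_distortion(phi):
    """PD_h = phi_{h+1} - phi_h - phi_1 for h = 1..H-1."""
    return wrap(phi[..., 1:] - phi[..., :-1] - phi[..., :1])


rng = np.random.default_rng(3)
n_harm = 8
phi = rng.uniform(-np.pi, np.pi, n_harm)        # hypothetical harmonic phases
h = np.arange(1, n_harm + 1)
omega0, shift = 2 * np.pi * 150 / 16000, 37.0   # f0 = 150 Hz at 16 kHz; 37-sample shift
phi_shifted = wrap(phi + h * omega0 * shift)    # same frame, different time origin

# Both measures cancel the linear phase term, so the shift has no effect.
print(np.max(np.abs(wrap(relative_phase_shift(phi) - relative_phase_shift(phi_shifted)))))
print(np.max(np.abs(wrap(phase_distortion(phi) - phase_distortion(phi_shifted)))))
```

Under these definitions PD is the finite difference of RPS across harmonics, PD_h = RPS_{h+1} − RPS_h (mod 2π).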

Initial investigation of speech synthesis based on complex-valued neural networks

Yamagishi

Richmond

et al. 2016

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


Although frequency analysis often leads us to a speech signal in the complex domain, the acoustic models we frequently use are designed for real-valued data. Phase is usually ignored or modelled separately from spectral amplitude. Here, we propose a complex-valued neural network (CVNN) for directly modelling the results of the frequency analysis in the complex domain (such as the complex amplitude). We also introduce a phase encoding technique to map real-valued data (e.g. cepstra or log amplitudes) into the complex domain so we can use the same CVNN processing seamlessly. In this paper, a fully complex-valued neural network, namely a neural network where all of the weight matrices, activation functions and learning algorithms are in the complex domain, is applied for speech synthesis. Results show its ability to model both complex-valued and real-valued data.
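The abstract mentions a phase encoding that maps real-valued data into the complex domain but does not give its formula here. The sketch below therefore uses a hypothetical encoding, a linear map of a bounded real value onto a unit phasor with phase in [−π/2, π/2], and feeds it through one fully complex dense layer (complex weights, complex bias, complex tanh). Only the forward pass is shown; complex-valued training is not. The feature range and layer sizes are assumptions.

```python
import numpy as np


def phase_encode(x, lo, hi):
    """Hypothetical encoding: map real x in [lo, hi] to a unit phasor whose
    phase lies in [-pi/2, pi/2]."""
    u = (np.asarray(x, dtype=float) - lo) / (hi - lo)
    return np.exp(1j * np.pi * (u - 0.5))


def phase_decode(z, lo, hi):
    """Inverse of phase_encode."""
    return lo + (np.angle(z) / np.pi + 0.5) * (hi - lo)


class ComplexDense:
    """Fully complex dense layer: complex weights and bias, complex tanh."""

    def __init__(self, n_in, n_out, rng, scale=0.1):
        shape = (n_out, n_in)
        self.W = scale * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
        self.b = np.zeros(n_out, dtype=complex)

    def __call__(self, z):
        # np.tanh accepts complex input; small weights keep W z away from the
        # poles of tanh at j*(pi/2 + k*pi).
        return np.tanh(z @ self.W.T + self.b)


rng = np.random.default_rng(4)
lo, hi = -5.0, 5.0                          # assumed feature range
x = rng.uniform(lo, hi, size=(4, 10))       # e.g. 4 frames of 10 real features
z = phase_encode(x, lo, hi)
layer = ComplexDense(n_in=10, n_out=6, rng=rng)
y = layer(z)
print(y.shape, y.dtype)
```

Encoding real features as phasors lets the same complex layer consume both genuinely complex inputs (such as complex amplitudes) and real ones, which is the motivation the abstract gives for the encoding step.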
