The DYPSA algorithm for estimation of glottal closure instants in voiced speech

Kounoudes, Anastasis; Naylor; Brookes

doi:10.1109/icassp.2002.1005748

Cited by 33 publications

(13 citation statements)

References 6 publications

“…Finally, the method is especially promising for application to the "classic" low-frequency problem of the inversion to the vocal tract shape from the speech signal, [45] although further consideration must be given to the deconvolution of the glottal wave form [46,47] and to calibration of the scale factor.…”

Section: Discussion (mentioning)

confidence: 98%

Inverse potential scattering in duct acoustics

Forbes, Pike, Sharp et al. 2006

The Journal of the Acoustical Society of America

The inverse problem of the noninvasive measurement of the shape of an acoustical duct in which one-dimensional wave propagation can be assumed is examined within the theoretical framework of the governing Klein-Gordon equation. Previous deterministic methods developed over the last 40 years have all required direct measurement of the reflectance or input impedance but now, by application of the methods of inverse quantum scattering to the acoustical system, it is shown that the reflectance can be algorithmically derived from the radiated wave. The potential and area functions of the duct can subsequently be reconstructed. The results are discussed with particular reference to acoustic pulse reflectometry.

“…A similar idea for evaluating epoch extraction methods focused on determining GCIs was introduced in [33], being used later in [22] and [35]. However, other works, such as [30] and [43], propose to first align the marks before conducting the comparison. Nevertheless, the misalignments may lead to inaccurate evaluation results.…”

Section: Evaluation Measure (mentioning)

confidence: 99%

“…Pitch marking algorithms are focused on determining the temporal position of the frame periods of voiced speech [30], according to a predefined local criterion, e.g., 1) the maximum positive/negative peak [18], [28], [31], [32], 2) the minimum before the zero crossing [10], 3) the GCI estimated from the speech signal [26], [30], [33], its wavelet transform [34], [35], or the EGG signal [10], [11], [25], [28], among others.…”

Section: Towards Reliable Pitch Marking (mentioning)

confidence: 99%

“…As a first step towards obtaining reliable pitch marks, most current pitch markers incorporate a post-processing step for global correction, generally based on dynamic programming (DP) (following the methodology introduced in [37]). DP is used to find the sequence of candidate pitch marks that minimizes a cost function, which weights candidates reliability (e.g., [18], [28], [30], [33]). Nevertheless, these cost functions are quite complex stands for pitch marks located at local maxima.…”

Section: Towards Reliable Pitch Marking (mentioning)

confidence: 99%

Reliable Pitch Marking of Affective Speech at Peaks or Valleys Using Restricted Dynamic Programming

Alías, Munné 2010

IEEE Trans. Multimedia

The affective communication channel plays a key role in multimodal human-computer interaction. In this context, the generation of realistic talking-heads expressing emotions both in appearance and speech is of great interest. The synthetic speech of talking-heads is generally obtained from a text-to-speech (TTS) synthesizer. One of the dominant techniques for achieving high-quality synthetic speech is unit-selection TTS (US-TTS) synthesis. Affective US-TTS systems are driven by affective annotated speech databases. Since affective speech involves higher acoustic variability than neutral speech, achieving trustworthy speech labeling is a more challenging task. To that effect, this paper introduces a methodology for achieving reliable pitch marking on affective speech. The proposal adjusts the pitch marks at the signal peaks or valleys after applying a three-stage restricted dynamic programming algorithm. The methodology can be applied as a post-processing of any pitch determination and pitch marking algorithm (with any local criterion for locating pitch marks), or their merging. The experiments show that the proposed methodology significantly improves the results of the input state-of-the-art markers on affective speech.

Index Terms: Affective speech, dynamic programming, pitch marking, speech analysis, unit-selection text-to-speech synthesis.
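The dynamic-programming correction mentioned in the citation statements above (choosing the sequence of candidate pitch marks that minimizes a cost function) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the cited papers: the function name `select_pitch_marks`, the reliability scores in [0, 1], the quadratic penalty on deviation from an expected period, and the requirement that the path starts at the first candidate and ends at the last.

```python
import numpy as np

def select_pitch_marks(times, reliability, period, max_gap):
    """Pick a subset of candidate marks by dynamic programming.

    Hypothetical cost (illustrative only):
      local cost      = 1 - reliability[j]
      transition cost = ((gap - period) / period) ** 2
    Consecutive selected marks may be at most max_gap apart.
    """
    times = np.asarray(times, dtype=float)
    reliability = np.asarray(reliability, dtype=float)
    n = len(times)
    best = np.full(n, np.inf)        # best[j]: minimal cost of a path ending at j
    prev = np.full(n, -1, dtype=int)  # back-pointers for path recovery
    best[0] = 1.0 - reliability[0]

    for j in range(1, n):
        for i in range(j):
            gap = times[j] - times[i]
            if gap > max_gap or not np.isfinite(best[i]):
                continue
            cost = best[i] + ((gap - period) / period) ** 2 + (1.0 - reliability[j])
            if cost < best[j]:
                best[j] = cost
                prev[j] = i

    if not np.isfinite(best[-1]):
        raise ValueError("no admissible path reaches the last candidate")

    # Backtrack from the last candidate to the first.
    path, j = [], n - 1
    while j != -1:
        path.append(int(j))
        j = prev[j]
    return path[::-1]

# Candidates at 4 and 14 are unreliable and break the 10-unit periodicity.
marks = select_pitch_marks(
    times=[0.0, 4.0, 10.0, 14.0, 20.0, 30.0],
    reliability=[0.9, 0.2, 0.9, 0.3, 0.9, 0.9],
    period=10.0,
    max_gap=15.0,
)
print(marks)
```

The quadratic search over predecessors is kept for readability; a practical marker would restrict the inner loop to candidates within `max_gap` and would use a cost function tuned to the local marking criterion.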

“…A dynamic programming projected phase-slope algorithm (DYPSA) for automatic estimation of glottal closure instants in voiced speech was presented in Kounoudes et al (2002) and Naylor et al (2007). The candidates for GCI were obtained from the zero-crossings of the phase-slope function derived from the energy weighted group-delay, and were refined by employing a dynamic programming algorithm.…”

Section: DYPSA Algorithm for Epoch Extraction (mentioning)

confidence: 99%

Epoch-based analysis of speech signals

Yegnanarayana, Gangashetty 2011

Sadhana

Speech analysis is traditionally performed using short-time analysis to extract features in time and frequency domains. The window size for the analysis is fixed somewhat arbitrarily, mainly to account for the time varying vocal tract system during production. However, speech in its primary mode of excitation is produced due to impulse-like excitation in each glottal cycle. Anchoring the speech analysis around the glottal closure instants (epochs) yields significant benefits for speech analysis. Epoch-based analysis of speech helps not only to segment the speech signals based on speech production characteristics, but also helps in accurate analysis of speech. It enables extraction of important acoustic-phonetic features such as glottal vibrations, formants, instantaneous fundamental frequency, etc. Epoch sequence is useful to manipulate prosody in speech synthesis applications. Accurate estimation of epochs helps in characterizing voice quality features. Epoch extraction also helps in speech enhancement and multispeaker separation. In this tutorial article, the importance of epochs for speech analysis is discussed, and methods to extract the epoch information are reviewed. Applications of epoch extraction for some speech applications are demonstrated.

Significance of epochs in speech analysis

Speech is the output of a time-varying vocal tract system excited by a time-varying excitation. In the resulting speech signal, the information of the speech production system is embedded as relations in the sequence of values of the signal at different instants of sampling the signal. The main objective of speech signal processing is to extract the information of the time varying characteristics of the speech production system. The information is represented in the form of parameters or features derived from the signal. Knowledge at different levels, such as acoustic-phonetic, prosody, lexical, syntactic, etc., is used to interpret the message in the speech signal from the sequence of parameter or feature vectors. Thus, an algorithmic way of extracting the information in the speech signal involves operations of representation (in terms of extracted parameters or features) and processing (to extract the information or message), in that order.
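The candidate-generation step described in the DYPSA citation statement above (GCI candidates taken from zero crossings of a phase-slope function derived from the energy-weighted group delay) can be illustrated with a simplified sketch. It assumes the centre-of-gravity form of the energy-weighted group delay over a sliding window and takes the phase slope as its negative. It is applied directly to the input signal and omits any residual computation, the projection step implied by the algorithm's name, and the dynamic programming refinement. The function name `gci_candidates` and the odd `win_len` requirement are illustrative choices, not part of the published algorithm.

```python
import numpy as np

def gci_candidates(x, win_len):
    """Return sample indices where the phase-slope function crosses zero upward.

    Simplified sketch: the energy-weighted group delay of each window is taken
    as the energy centre of gravity relative to the window centre.
    """
    if win_len % 2 == 0:
        raise ValueError("win_len must be odd so the window has a centre sample")
    energy = np.asarray(x, dtype=float) ** 2
    k = np.arange(win_len, dtype=float)

    # Sliding sums: num[n] = sum_k k * e[n + k], den[n] = sum_k e[n + k]
    num = np.correlate(energy, k, mode="valid")
    den = np.correlate(energy, np.ones(win_len), mode="valid")

    half = (win_len - 1) // 2
    slope = np.zeros_like(den)       # windows with no energy are left at zero
    active = den > 1e-12
    slope[active] = half - num[active] / den[active]  # negative group delay

    # Positive-going zero crossings: slope[i] < 0 and slope[i + 1] >= 0
    start = np.flatnonzero((slope[:-1] < 0) & (slope[1:] >= 0)) + 1
    return (start + half).tolist()   # shift from window start to window centre

# Idealised excitation: unit impulses every 50 samples.
x = np.zeros(200)
x[[50, 100, 150]] = 1.0
print(gci_candidates(x, win_len=41))
```

On real speech the crossings are noisy and some are missed or spurious, which is why the quoted description adds a refinement stage after candidate generation.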
