Speech analysis is traditionally performed using short-time analysis to extract features in the time and frequency domains. The window size for the analysis is fixed somewhat arbitrarily, mainly to account for the time-varying vocal tract system during production. However, in its primary mode of excitation, speech is produced by an impulse-like excitation in each glottal cycle. Anchoring the analysis around the glottal closure instants (epochs) yields significant benefits. Epoch-based analysis helps not only to segment speech signals according to speech production characteristics, but also to analyze speech accurately. It enables extraction of important acoustic-phonetic features such as glottal vibrations, formants, and instantaneous fundamental frequency. The epoch sequence is useful for manipulating prosody in speech synthesis applications. Accurate estimation of epochs helps in characterizing voice quality features. Epoch extraction also helps in speech enhancement and multispeaker separation. In this tutorial article, the importance of epochs for speech analysis is discussed, and methods to extract the epoch information are reviewed. The use of epoch extraction in some speech applications is demonstrated.
Significance of epochs in speech analysis
Speech is the output of a time-varying vocal tract system excited by a time-varying excitation. In the resulting speech signal, the information about the speech production system is embedded as relations among the signal values at different sampling instants. The main objective of speech signal processing is to extract the information about the time-varying characteristics of the speech production system. This information is represented in the form of parameters or features derived from the signal. Knowledge at different levels, such as acoustic-phonetic, prosodic, lexical, and syntactic, is used to interpret the message in the speech signal from the sequence of parameter or feature vectors. Thus, an algorithmic way of extracting the information in the speech signal involves operations of representation (in terms of extracted parameters or features) and processing (to extract the information or message), in that order.
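The excitation-plus-system view described above can be illustrated with a minimal sketch. It is a simplification and not a method from this article: the excitation is idealized as a unit impulse at each epoch, and the vocal tract is reduced to a single fixed (not time-varying) two-pole resonance. The function names, the 8 kHz sampling rate, the 100 Hz pitch, and the 500 Hz resonance with 100 Hz bandwidth are all assumed values chosen for illustration.

```python
import math

def impulse_train(n_samples, period):
    """Idealized excitation: a unit impulse at each epoch, zero elsewhere."""
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def two_pole_resonator(x, freq_hz, bandwidth_hz, fs):
    """Pass x through one fixed resonance: y[n] = x[n] + a1*y[n-1] + a2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / fs)   # pole radius set by the bandwidth
    theta = 2.0 * math.pi * freq_hz / fs         # pole angle set by the centre frequency
    a1 = 2.0 * r * math.cos(theta)
    a2 = -r * r
    y, y1, y2 = [], 0.0, 0.0
    for xn in x:
        yn = xn + a1 * y1 + a2 * y2
        y.append(yn)
        y2, y1 = y1, yn
    return y

fs = 8000        # assumed sampling rate in Hz
period = 80      # assumed pitch period in samples (100 Hz at 8 kHz)
excitation = impulse_train(400, period)
speech = two_pole_resonator(excitation, 500.0, 100.0, fs)

# In this toy signal the epochs are known by construction.
epochs = [n for n, e in enumerate(excitation) if e != 0.0]
print(epochs)
```

In this synthetic case the epoch locations are known exactly; in natural speech they have to be estimated from the signal, where the response of the vocal tract system tends to obscure the impulse-like excitation.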