Voiced speech is produced by excitation of the vocal tract system with the quasiperiodic vibrations of the vocal folds at the glottis. These excitations become significantly stronger when the vocal folds are fully open or about to close. This work focuses on estimating these instants of significant excitation using the temporal phase periodicity present in the speech signal. The quasiperiodic vibrations of the vocal folds are modeled as a slowly varying sinusoid, whose phase is computed from the phase of the first frequency component of the discrete Fourier transform (DFT). At the peaks of the speech signal, i.e., at the locations of the significant instants, the phase of this component is expected to be zero. The temporal phase function is evaluated by moving the analysis window sample by sample, and the instants at which this phase function crosses zero are taken as the significant instants in the speech signal. To analyze the performance of this technique, 30 seconds of speech from the TIMIT speech corpus, uttered by both male and female speakers, is used. The estimated instants are compared with manually marked instants of significant excitation, and the results are found to be promising. The effectiveness of the technique for different analysis window durations is also discussed.
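The sliding-window phase computation described above can be illustrated with a minimal sketch. This is not the authors' implementation. It makes several assumptions that the abstract does not state: the window is rectangular, the window is anchored at its first sample, the window length is close to one pitch period, and only negative-to-positive zero crossings are counted (so that the wrap from +π to −π is ignored). The function names `temporal_phase` and `significant_instants` are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def temporal_phase(x, win_len):
    """Phase of the first DFT bin (k = 1) for every window start position.

    Assumption: rectangular window anchored at its first sample.
    """
    x = np.asarray(x, dtype=float)
    # Complex exponential that defines DFT bin k = 1 for a window of win_len samples.
    kernel = np.exp(-2j * np.pi * np.arange(win_len) / win_len)
    # One row per window position; the window advances one sample per row.
    windows = sliding_window_view(x, win_len)
    return np.angle(windows @ kernel)


def significant_instants(x, win_len):
    """Sample indices where the temporal phase crosses zero from below.

    Assumption: only rising crossings are kept, which skips the +pi to -pi wrap.
    """
    phase = temporal_phase(x, win_len)
    rising = (phase[:-1] < 0) & (phase[1:] >= 0)
    return np.flatnonzero(rising) + 1


if __name__ == "__main__":
    # Synthetic stand-in for voiced speech: a cosine with an 80-sample period
    # (100 Hz at an assumed 8 kHz sampling rate) whose peaks fall at 20, 100, 180, ...
    period, first_peak = 80, 20
    n = np.arange(800)
    x = np.cos(2 * np.pi * (n - first_peak) / period)
    print(significant_instants(x, win_len=period))
```

For a pure cosine whose period equals the window length, the phase of the first bin grows linearly as the window slides and passes through zero when the window starts at a peak, which is why the rising crossings line up with the peaks in this toy case. Real speech would need a pitch-dependent window length, and the abstract notes that the choice of window duration affects the results.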