A new approach for text-independent speech segmentation is proposed. The novelty consists in a preprocessing based on critical-band perceptual analysis and an original algorithm for the individuation of phoneme boundaries. The results are promising since the method gives -74% of correct segmentation without presenting over-segmentation.
The article compares two approaches to the description of ultrasound vocal tract images for application in a "silent speech interface," one based on tongue contour modeling, and a second, global coding approach in which images are projected onto a feature space of Eigentongues. A curvaturebased lip profile feature extraction method is also presented. Extracted visual features are input to a neural network which learns the relation between the vocal tract configuration and line spectrum frequencies (LSF) contained in a one-hour speech corpus. An examination of the quality of LSF's derived from the two approaches demonstrates that the eigentongues approach has a more efficient implementation and provides superior results based on a normalized mean squared error criterion.Index Terms-image processing, speech synthesis, neural network applications, communication systems, silent speech interface
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.