This paper presents a method for, and performance of, phoneme segmentation by an expert system utilizing spectrogram reading strategy and knowledge. The expert system detects phonemes in a spectrogram and determines their boundaries as well as their coarse categories. To simulate a human expert spectrogram reading process, the system performs assumption-based inference with certainty factors, and top-down acoustic feature extraction under phonetic context hypotheses. The system, into which Japanese consonant segmentation knowledge is incorporated, is able to detect about 90% of the phonemes correctly. In particular, the phoneme boundaries detected by the system are as accurate as those detected by human experts. The result is that the phonemes obtained by the expert system can be identified using a stochastic phoneme recognition method.
The Voice Activity Detection (VAD) problem is placed into a decision theoretic framework, and the Gaussian VAD model of Sohn et al. is then shown to fit well with the framework. It is argued that the Gaussian model can be made more robust to correlation and expected spectral shapes of speech and noise by using a differential spectral representation. Such a model is formulated theoretically. The differential spectral VAD is then shown by experiment to be consistently superior to the basic Gaussian VAD in a speech recognition setting, especially for noisy environments.
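As an illustration of the kind of Gaussian likelihood-ratio test the abstract refers to, the following is a minimal sketch, not the authors' implementation. It assumes each spectral bin is modelled as zero-mean complex Gaussian under both the noise-only and speech-plus-noise hypotheses, with known noise and speech variances per bin. The function names, the default threshold of 0.0, and the example values are hypothetical; in a decision-theoretic setting the threshold would be chosen from the priors and error costs. The differential spectral variant described in the abstract is not shown.

```python
import math

def log_likelihood_ratio(power, noise_var, speech_var):
    """Log likelihood ratio for one spectral bin.

    Assumes zero-mean complex Gaussian models:
      noise only:     variance noise_var
      speech + noise: variance noise_var + speech_var
    """
    gamma = power / noise_var      # a posteriori SNR
    xi = speech_var / noise_var    # a priori SNR
    return gamma * xi / (1.0 + xi) - math.log(1.0 + xi)

def is_speech(powers, noise_vars, speech_vars, threshold=0.0):
    """Declare speech when the mean per-bin log likelihood ratio exceeds the threshold.

    The threshold value is an assumption made for this sketch.
    """
    llrs = [
        log_likelihood_ratio(p, n, s)
        for p, n, s in zip(powers, noise_vars, speech_vars)
    ]
    return sum(llrs) / len(llrs) > threshold

# Hypothetical frames with four bins each.
noise_vars = [1.0, 1.0, 1.0, 1.0]
speech_vars = [4.0, 4.0, 4.0, 4.0]
quiet_frame = [1.0, 1.0, 1.0, 1.0]   # observed power close to the noise variance
loud_frame = [5.0, 5.0, 5.0, 5.0]    # observed power close to noise + speech variance

print(is_speech(quiet_frame, noise_vars, speech_vars))
print(is_speech(loud_frame, noise_vars, speech_vars))
```

Averaging the per-bin log ratios treats the bins as independent, which is the simplification the abstract argues a differential spectral representation can make more robust.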