SUMMARY This paper addresses the problem of voice activity detection (VAD) in noisy environments. The proposed VAD method takes a statistical model approach and estimates the statistical models sequentially, without a priori knowledge of the noise. Specifically, the method constructs a clean speech/silence state transition model beforehand and, as the signal is observed, sequentially adapts the model to the noisy environment by using a switching Kalman filter. We carried out two evaluations. In the first, we observed that the proposed method significantly outperforms conventional methods in VAD accuracy in simulated noise environments. In the second, we evaluated the proposed method on a VAD evaluation framework, CENSREC-1-C. The results revealed that the proposed method significantly outperforms the CENSREC-1-C baseline in VAD accuracy in real environments. In addition, we confirmed that the proposed method helps to improve the accuracy of concatenated speech recognition in real environments.
key words: voice activity detection, statistical model, switching Kalman filter, noisy environment, CENSREC-1-C
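The speech/silence state transition idea in the summary above can be illustrated with a much simpler stand-in. The sketch below is not the paper's method: it omits the switching Kalman filter and the sequential noise adaptation entirely, and runs only a two-state forward filter over per-frame log energies. The Gaussian parameters, the `p_stay` transition probability, and the function names are all hypothetical values chosen for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def speech_posteriors(log_energies, silence=(0.0, 1.0), speech=(5.0, 1.0), p_stay=0.9):
    """Return the filtered P(speech | frames so far) for each frame.

    Assumption: each state emits log energy from a fixed Gaussian
    (mean, variance). The paper instead adapts its models to the noise
    with a switching Kalman filter, which is not reproduced here.
    """
    params = [silence, speech]  # state 0 = silence, state 1 = speech
    # transition[i][j] = P(next state is j | current state is i)
    transition = [[p_stay, 1.0 - p_stay], [1.0 - p_stay, p_stay]]
    belief = [0.5, 0.5]  # uninformative prior over the two states
    posteriors = []
    for x in log_energies:
        # Predict: push the current belief through the transition model.
        predicted = [sum(belief[i] * transition[i][j] for i in range(2)) for j in range(2)]
        # Update: weight each state by the likelihood of the observation.
        unnorm = [predicted[j] * gaussian_pdf(x, *params[j]) for j in range(2)]
        total = sum(unnorm)
        # If both likelihoods underflow to zero, keep the prediction.
        belief = predicted if total == 0.0 else [u / total for u in unnorm]
        posteriors.append(belief[1])
    return posteriors


def detect_speech(log_energies, threshold=0.5, **model):
    """Label a frame as speech when its filtered posterior exceeds threshold."""
    return [p > threshold for p in speech_posteriors(log_energies, **model)]
```

A call such as `detect_speech([0.1, -0.2, 0.3, 5.2, 4.8, 5.1, 0.0, -0.1])` labels the three high-energy frames as speech under these assumed parameters. With fixed emission models the detector degrades as soon as the noise level shifts, which is the gap that sequential model adaptation is meant to close.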
This paper describes a longitudinal analysis of the vowel development of two Japanese infants in terms of spectral resonant peaks. The study aims to investigate when and how the two infants become able to produce categorically separated vowels, and covers the ages of 4 to 60 months in order to provide detailed findings on the developmental process of speech production. The two lowest spectral peaks were estimated from vowels extracted from natural spontaneous speech produced by the infants. Two analyses were conducted: one on phoneme-labeled data and one on transcription-independent unlabeled data. The labeled data analysis revealed longitudinal trends in the developmental change, which correspond to the articulation positions of the tongue and the rapid enlargement of the articulatory organs. In addition, the distribution of the two spectral peaks demonstrates the vowel space expansion that occurs with age. An unlabeled data analysis technique derived from linear discriminant analysis was introduced to measure the vowel space expansion quantitatively. It revealed that the infants' vowel space becomes similar to that of an adult in the early stages. Taken together, the labeled and unlabeled analyses suggest that infants become capable of producing categorically separated vowels by 24 months.