The Takemaru-kun system is a real-world speech-oriented guidance system located at the Ikoma-city North Community Center. The system has been in daily operation since November 2002, providing visitors with a speech interface for information retrieval. It also serves as a field test of a speech interface and as a means of collecting actual utterance data. Analysis and evaluation of the collected utterances revealed the need for flexible processing according to the user's age group: when such a system is installed in a public place, the growing number of child users can no longer be disregarded. This paper proposes an automatic method, based on statistical learning, that discriminates between adult and child speakers, enabling flexible spoken dialogue with both groups. The parameter vectors used for machine learning consist of acoustic and linguistic properties extracted from the log-likelihood scores of speech recognition. Whereas GMM-based recognition uses only acoustic properties, this method can also take linguistic properties into account. In experiments with SVM-based screening, we obtained a discrimination rate of 92.4% on actual users' utterances, and the advantage of using linguistic properties is also shown. This paper also gives an overview of the Takemaru-kun system and the status of data collection in the field test, and evaluates child speech recognition performance using the collected utterances.
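The age-group discrimination step can be pictured as a binary classifier over per-utterance score vectors. The following is a minimal sketch, not the authors' implementation. It assumes each utterance is summarized by two hypothetical features: the difference between log-likelihood scores obtained with child and adult acoustic models, and the same difference for child and adult language models. It then trains a linear SVM with Pegasos-style stochastic subgradient descent in pure Python; the abstract does not specify the kernel or solver. The names `train_linear_svm`, `predict`, and `discrimination_rate`, and the toy data, are illustrative only.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical per-utterance features (an assumption, not the paper's definition):
#   x[0]: acoustic score   = LL(child acoustic model) - LL(adult acoustic model)
#   x[1]: linguistic score = LL(child language model) - LL(adult language model)
Feature = Tuple[float, ...]
CHILD, ADULT = 1, -1


def _augment(x: Feature) -> List[float]:
    """Append a constant 1.0 so the bias is learned as an ordinary weight."""
    return list(x) + [1.0]


def train_linear_svm(xs: Sequence[Feature], ys: Sequence[int],
                     lam: float = 0.01, epochs: int = 200,
                     seed: int = 0) -> List[float]:
    """Train a linear SVM with Pegasos-style stochastic subgradient descent."""
    rng = random.Random(seed)
    data = [_augment(x) for x in xs]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # step size decays as 1/t
            x, y = data[i], ys[i]
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            # Regularization: scale w by (1 - eta * lam), i.e. (1 - 1/t).
            w = [(1.0 - 1.0 / t) * wj for wj in w]
            if margin < 1.0:
                # Hinge loss is active for this example: step toward y * x.
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def predict(w: List[float], x: Feature) -> int:
    """Return CHILD (+1) or ADULT (-1) from the sign of the decision value."""
    score = sum(wj * xj for wj, xj in zip(w, _augment(x)))
    return CHILD if score >= 0.0 else ADULT


def discrimination_rate(w: List[float], xs: Sequence[Feature],
                        ys: Sequence[int]) -> float:
    """Fraction of utterances whose age group is predicted correctly."""
    hits = sum(1 for x, y in zip(xs, ys) if predict(w, x) == y)
    return hits / len(xs)


if __name__ == "__main__":
    # Toy, linearly separable data for illustration only (not from the paper).
    child = [(2.0, 1.5), (1.5, 2.5), (3.0, 2.0), (2.5, 1.0)]
    adult = [(-2.0, -1.5), (-1.0, -2.5), (-3.0, -2.0), (-2.5, -1.0)]
    xs = child + adult
    ys = [CHILD] * len(child) + [ADULT] * len(adult)
    w = train_linear_svm(xs, ys)
    print(discrimination_rate(w, xs, ys))  # prints 1.0
```

The bias is folded into the weight vector as a constant feature. This keeps the update rule short, at the cost of also regularizing the bias. Dropping the linguistic column from the feature vector gives an acoustic-only baseline for the kind of comparison the abstract describes.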
Intelligent robots will give us more opportunities to use computers in daily life. We implemented a humanoid robot, ASKA, at our university's reception desk to provide computerized university guidance. ASKA recognizes a user's spoken question and answers it with its text-to-speech voice, accompanied by hand gestures and head movements. This paper describes the speech-related parts of ASKA. ASKA can handle a wide task domain with a 20k-word vocabulary, using a word trigram model and an elaborated speaker-independent acoustic model. ASKA can also generate responses by detecting keywords and key phrases in the N-best speech recognition results. The word recognition rate is 90.9% for the reception task and 78.9% for the out-of-domain task, and the correct response rate for the reception task is 61.7%. Users can enjoy question answering with ASKA.
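Response selection from the N-best list can be illustrated as follows. This is a minimal sketch under assumptions not stated in the abstract. The keyword table, the reciprocal-rank weighting (a hit in the r-th hypothesis adds 1/r), and the rejection threshold `min_score` are all hypothetical choices. `select_response` is not ASKA's actual interface.

```python
from typing import Dict, List, Optional

# Hypothetical keyword/key-phrase table: response ID -> trigger phrases.
RESPONSES: Dict[str, List[str]] = {
    "library_hours": ["library", "open", "closing time"],
    "cafeteria": ["cafeteria", "lunch", "eat"],
}


def select_response(nbest: List[str],
                    table: Dict[str, List[str]],
                    min_score: float = 1.0) -> Optional[str]:
    """Return the response ID whose key phrases best match the N-best list.

    A phrase found in the rank-r hypothesis (1-based) adds 1/r to that
    response's score. Returns None (e.g. ask the user to rephrase) when no
    response reaches min_score.
    """
    scores: Dict[str, float] = {}
    for resp_id, phrases in table.items():
        score = 0.0
        for rank, hyp in enumerate(nbest, start=1):
            padded = " " + hyp.lower() + " "  # pad so phrases match whole words
            for phrase in phrases:
                if " " + phrase + " " in padded:
                    score += 1.0 / rank
        scores[resp_id] = score
    if not scores:
        return None
    best = max(scores, key=lambda r: scores[r])  # ties: first in table order
    return best if scores[best] >= min_score else None


if __name__ == "__main__":
    nbest = [
        "when does the library open",
        "when does the library close",
        "when does a library open",
    ]
    print(select_response(nbest, RESPONSES))  # prints library_hours
```

Rank weighting lets a key phrase that appears consistently across several hypotheses outweigh a single spurious hit, which is one reason to look beyond the 1-best result. Matching here is plain substring search on space-delimited, lowercased text, so punctuation would need to be stripped first.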