Abstract. In this paper we present a system that automatically predicts prosodic breaks in synthesized speech using the Random Forests classifier. In our experiments, the classifier is trained on a large dataset of audiobooks that is automatically labeled with phone, word, and pause labels. A rule-based part-of-speech (POS) tagger provides POS tags for the text. We use cross-validation so that we can examine not only the results for a specific subset of the data but also the system's reliability across the dataset. The experimental results show that the system performs well and consistently on the audiobook database; the results are poorer and less robust on a smaller database of read speech, even though part of that database was labeled manually.
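The role of cross-validation described above can be illustrated with a minimal sketch: the mean of the per-fold scores reflects overall quality, while their spread indicates how consistent the system is across the dataset. Everything in the block is an assumption made for illustration, not the paper's implementation: the function names, the contiguous fold layout, the toy labels (1 = break, 0 = no break), and in particular the majority-class baseline, which stands in for the Random Forests classifier only to keep the example self-contained.

```python
import statistics
from typing import Callable, List, Sequence, Tuple

# Hypothetical helper: contiguous k-fold split over item indices.
def k_fold_splits(n_items: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs covering 0..n_items-1."""
    if not 2 <= k <= n_items:
        raise ValueError("k must satisfy 2 <= k <= n_items")
    base, extra = divmod(n_items, k)
    splits = []
    start = 0
    for fold in range(k):
        # The first `extra` folds get one additional item.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        splits.append((train, test))
        start += size
    return splits

def cross_validate(
    labels: Sequence[int],
    k: int,
    fit_predict: Callable[[List[int], List[int]], List[int]],
) -> List[float]:
    """Return the accuracy obtained on each of the k held-out folds."""
    scores = []
    for train, test in k_fold_splits(len(labels), k):
        predicted = fit_predict(train, test)
        correct = sum(1 for i, p in zip(test, predicted) if labels[i] == p)
        scores.append(correct / len(test))
    return scores

def summarize(scores: Sequence[float]) -> Tuple[float, float]:
    """Mean shows overall quality; standard deviation shows consistency."""
    return statistics.mean(scores), statistics.pstdev(scores)

# Stand-in model (NOT a Random Forest): predicts the majority training label.
def majority_baseline(labels: Sequence[int]) -> Callable[[List[int], List[int]], List[int]]:
    def fit_predict(train: List[int], test: List[int]) -> List[int]:
        ones = sum(labels[i] for i in train)
        majority = 1 if ones * 2 > len(train) else 0
        return [majority] * len(test)
    return fit_predict

if __name__ == "__main__":
    toy_labels = [0, 0, 1, 0, 0, 0, 1, 0, 0, 1]  # toy data, not from the paper
    fold_scores = cross_validate(toy_labels, 5, majority_baseline(toy_labels))
    mean, spread = summarize(fold_scores)
    print(fold_scores, round(mean, 3), round(spread, 3))
```

In such a setup, a low spread across folds would correspond to the "consistent" behavior reported for the audiobook database, and a high spread to the less robust behavior on the smaller database. In practice the baseline would be replaced by the actual classifier, and accuracy might be replaced by a metric better suited to imbalanced break labels.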