In this paper we address the task of modeling phoneme duration for Greek speech synthesis. In particular, we apply well-established machine learning approaches to the WCL-1 prosodic database to predict segmental durations from shallow morphosyntactic and prosodic features. We employ decision trees, instance-based learning and linear regression. Trained on a 5500-word database, the CART and linear regression models proved the most effective for the task, with root mean square errors of 0.0252 and 0.0251 respectively.
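As a minimal sketch of the evaluation metric named above, the root mean square error between predicted and observed durations can be computed as follows. The function name `rmse` and the sample values are hypothetical illustrations; they are not taken from the WCL-1 database, and the unit (assumed here to be seconds) is not stated in the abstract.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical phoneme durations (illustrative only, assumed to be in seconds).
observed = [0.080, 0.120, 0.060, 0.100]
predicted = [0.090, 0.110, 0.060, 0.120]

score = rmse(predicted, observed)
print(f"RMSE: {score:.4f}")
```

A lower value indicates predictions closer to the observed durations, which is the basis on which the models above are compared.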
We provide large-sample distribution theory for support vector regression (SVR) with the ℓ1-norm, along with error bars for the SVR regression coefficients. Although a classical Wald confidence interval follows from our theory, its implementation inherently depends on the choice of a tuning parameter that scales the variance estimate and thus the width of the error bars. We address this shortcoming by proposing an alternative large-sample inference method, based on inverting a novel test statistic, that displays competitive power properties and does not depend on the choice of a tuning parameter.
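For orientation, the textbook form of a Wald confidence interval for a single coefficient is sketched below. This is the generic construction, not the specific variance estimator derived in the paper; the symbols $\hat{\beta}_j$ (coefficient estimate), $\hat{\sigma}_j$ (estimated asymptotic standard deviation), $n$ (sample size) and $\alpha$ (significance level) are assumptions introduced here for illustration.

\[
\hat{\beta}_j \;\pm\; z_{1-\alpha/2}\,\frac{\hat{\sigma}_j}{\sqrt{n}}
\]

Here $z_{1-\alpha/2}$ is the standard normal quantile. In the setting described above, the tuning parameter would presumably enter through $\hat{\sigma}_j$, which is why it affects the width of the interval.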