The precise simulation of voice production is a challenging task, especially when real-time performance is sought. To meet real-time constraints, most articulatory vocal synthesizers rely on highly simplified acoustic and anatomical models, based on 1D wave propagation and on vocal tract area functions. In this work, we present a 2D propagation model, designed to simulate the air flow traveling through the midsagittal contour of the vocal tract. Building on the work by Allen and Raghuvanshi [Andrew Allen and Nikunj Raghuvanshi, “Aerophones in flatland: Interactive wave simulation of wind instruments,” ACM Trans. Graph. 34, Article 134 (2015)], we leverage OpenGL and GPU parallelism for a precise real-time 2D airwave simulation. The domain is divided into cells according to a Finite-Difference Time-Domain scheme and coupled with a self-oscillating two-mass vocal fold model. To investigate the system’s ability to simulate the physiology of the vocal tract and its aerodynamics, two studies are presented. First, using area functions, we compare the vowel-production performance of our 2D approach with that of 1D wave propagation systems from the literature. Subsequently, this case is extended by replacing area functions with 2D vocal tract contours derived from 3D MRI data.
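As a rough illustration of the cell-based Finite-Difference Time-Domain idea mentioned above, the following minimal CPU sketch advances the 2D scalar wave equation with a leapfrog update on a small rectangular grid. It is not the authors' OpenGL/GPU implementation and omits the vocal tract geometry and the two-mass vocal fold coupling. The function names, grid size, sound speed, cell size, and the pressure-release (zero pressure) border are all assumptions chosen for illustration.

```python
import math


def fdtd_step(p_prev, p_curr, courant2):
    """One leapfrog update of the 2D scalar wave equation.

    Border cells are left at zero pressure (an assumed, simplistic boundary).
    courant2 is (c * dt / dx) ** 2 and must be <= 0.5 for stability in 2D.
    """
    ny, nx = len(p_curr), len(p_curr[0])
    p_next = [[0.0] * nx for _ in range(ny)]
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            # Five-point discrete Laplacian (already scaled by dx^2)
            lap = (p_curr[j][i - 1] + p_curr[j][i + 1]
                   + p_curr[j - 1][i] + p_curr[j + 1][i]
                   - 4.0 * p_curr[j][i])
            p_next[j][i] = 2.0 * p_curr[j][i] - p_prev[j][i] + courant2 * lap
    return p_next


def simulate(nx=32, ny=16, steps=50, c=350.0, dx=1e-3):
    """Propagate an impulse for a number of steps; all parameters are hypothetical."""
    # Time step set to 90% of the 2D stability limit dx / (c * sqrt(2))
    dt = 0.9 * dx / (c * math.sqrt(2.0))
    courant2 = (c * dt / dx) ** 2
    p_prev = [[0.0] * nx for _ in range(ny)]
    p_curr = [[0.0] * nx for _ in range(ny)]
    p_curr[ny // 2][nx // 2] = 1.0  # impulse excitation in the middle of the grid
    for _ in range(steps):
        p_prev, p_curr = p_curr, fdtd_step(p_prev, p_curr, courant2)
    return p_curr
```

Each cell update depends only on its four neighbors at the previous time steps, which is what makes this kind of scheme a natural fit for per-cell GPU parallelism.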
Fast, high-quality articulatory speech synthesis requires a balance between the simplicity and speed of lumped-element vocal fold models and the completeness and complexity of continuum models. We develop and implement a novel self-oscillating vocal fold model, composed of a 1D unsteady fluid model loosely coupled with a 2D FEM structural model. The flow model robustly handles irregular geometries, different boundary conditions, closure of the glottis, and unsteady flow states. We provide a method for a fast decoupled solution of the flow equations that does not require computing the Jacobian. The model is coupled with a 2D real-time finite-difference wave solver for simulating vocal tract acoustics and a 1D wave-reflection analog representation of the trachea. The simulation results are shown to agree with existing data in the literature and give realistic pressure-velocity distributions, glottal width, and glottal flow values. In addition, the model runs more than an order of magnitude faster than comparable 2D Navier-Stokes fluid solvers, while capturing transitional flow better than simple Bernoulli-based flow models. The vocal fold model provides an alternative to simple lumped-element models for faster, higher-quality articulatory speech synthesis.
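To make the 1D wave-reflection analog concrete, here is a minimal sketch of the standard Kelly-Lochbaum style scattering junction between adjacent tube sections. It is a generic textbook formulation for pressure waves, not the paper's trachea model; the function names and the example area values are hypothetical, and the sign convention (reflection coefficient defined for a wave traveling from section k into section k+1) is an assumption stated in the comments.

```python
def reflection_coefficients(areas):
    """Pressure reflection coefficient at each junction between tube sections.

    Assumed convention: r_k = (A_k - A_{k+1}) / (A_k + A_{k+1}) for a wave
    traveling from section k into section k+1.
    """
    return [(a - b) / (a + b) for a, b in zip(areas, areas[1:])]


def scatter(f_in, b_in, r):
    """Scatter pressure waves at one junction.

    f_in: forward wave arriving from section k
    b_in: backward wave arriving from section k+1
    Returns (f_out, b_out): forward wave entering section k+1 and
    backward wave returning into section k.
    """
    f_out = (1.0 + r) * f_in - r * b_in
    b_out = r * f_in + (1.0 - r) * b_in
    return f_out, b_out


# Hypothetical cross-sectional areas (cm^2) of three consecutive sections
areas = [1.0, 1.0, 3.0]
coeffs = reflection_coefficients(areas)
```

With equal neighboring areas the coefficient is zero and waves pass through unchanged; an expansion gives a negative coefficient, so part of the forward pressure wave returns with inverted sign.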