Since the 1970s, researchers have endeavoured to recreate speech algorithmically on a digital avatar. Sparked by applications in human-robot interaction, virtual secretaries, web navigation assistance, and e-learning, and driven by the ever-increasing presence of virtual characters in video games and film, speech synthesis has become an area of growth. Owing to major challenges such as viewers' sensitivity to subtle nuances of speech and the complexity of mouth anatomy, however, realistic speech synthesis has yet to be realized. A realistic speech synthesis tool could be used in dynamic therapeutic applications and could revolutionize the animation pipeline in the entertainment industry. In this thesis, we propose a data-driven speech synthesis method that uses Viseme Transition Units (3D animation data describing the transition between mouth shapes) in place of the static visemes used in classic data-driven speech synthesis methods. To test this method, viseme transitions were recorded using optical flow and blob tracking algorithms, analyzed, and imported into Autodesk Maya to dynamically animate a custom mouth rig based on user input.
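To make the Viseme Transition Unit idea concrete, the following is a minimal, hypothetical sketch: rather than interpolating between static viseme poses, playback is assembled by concatenating recorded transition clips keyed by (start viseme, end viseme) pairs. All names, mappings, and keyframe values here are illustrative toy data, not the actual implementation or capture data from this thesis.

```python
# Toy phoneme-to-viseme mapping (real mappings are larger and
# language-dependent; these groupings are only illustrative).
PHONEME_TO_VISEME = {
    "m": "MBP", "b": "MBP", "p": "MBP",
    "a": "AH", "o": "OH",
    "f": "FV", "v": "FV",
}

# Toy library of transition units: each (from, to) viseme pair maps to a
# short sequence of mouth-shape keyframes. A scalar "openness" value
# stands in for the full 3D rig/blendshape data described in the thesis.
TRANSITION_UNITS = {
    ("MBP", "AH"): [0.0, 0.3, 0.7, 1.0],
    ("AH", "MBP"): [1.0, 0.6, 0.2, 0.0],
    ("AH", "OH"):  [1.0, 0.9, 0.8],
    ("OH", "MBP"): [0.8, 0.4, 0.0],
}

def visemes_for(phonemes):
    """Map a phoneme sequence to visemes, collapsing consecutive duplicates."""
    out = []
    for p in phonemes:
        v = PHONEME_TO_VISEME[p]
        if not out or out[-1] != v:
            out.append(v)
    return out

def assemble_curve(phonemes):
    """Concatenate transition units into one continuous animation curve."""
    vs = visemes_for(phonemes)
    curve = []
    for a, b in zip(vs, vs[1:]):
        unit = TRANSITION_UNITS[(a, b)]
        # Drop the first frame of each unit after the first, so that the
        # shared endpoint pose is not duplicated at each join.
        curve.extend(unit if not curve else unit[1:])
    return curve
```

For example, `assemble_curve(["m", "a", "o", "b"])` walks the viseme sequence MBP → AH → OH → MBP and returns one stitched curve. In practice, each keyframe would drive rig controls (e.g. via Maya's scripting interface) rather than a scalar.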