“…This is in contrast to deep learning approaches, which can jointly model timbre and F0 from a set of natural songs. Other machine learning approaches, such as those based on Hidden Markov Models (HMMs) (Saino et al., 2006; Oura et al., 2010; Nakamura et al., 2014; Pucher et al., 2016; Li & Wang, 2016), offer similar flexibility in this respect, but tend to have other drawbacks. Notably, HMM-based methods suffer from several limitations inherent to the model itself, such as modeling each phoneme as a small number of discrete states with constant statistics.…”