Text-to-speech (TTS) synthesis systems are nowadays most commonly trained on phonetic input. This is mostly due to the poor letter-to-sound (L2S) mapping performed by end-to-end TTS, in particular for languages with opaque orthography: the empirical distribution of the words sampled in the training corpus alone cannot compete with pronunciation dictionaries. Taylor and Richmond [1] in fact reported error rates close to 10% for the letter-to-sound mapping implicitly performed by end-to-end systems from raw text input. This paper nevertheless shows that speakers produce lawful phonological variations and that end-to-end TTS systems accepting text input, once trained adequately, can capture these variations of pronunciation, which are strong markers of sociolinguistic features. We illustrate such variations with liaisons and schwas in French and r-linking in British English. We therefore advocate restoring text input for TTS, so that the many aspects of style variation (speaker-dependent as well as stylistic) encoded by suprasegmental features can also be reflected in actual variations of pronunciation.