“…Popular text-to-spectrogram models include Tacotron2, Transformer-TTS (Li et al., 2019), FastSpeech2 (Ren et al., 2020), FastPitch (Łańcucki, 2021), and Glow-TTS. In terms of voice quality, Tacotron2 remains competitive with these models and is less prone to overfitting in low-resource settings (Favaro et al., 2021; Abdelali et al., 2022; García et al., 2022; Finkelstein et al., 2022). There are likewise multiple options for the vocoder, such as ClariNet (Ping et al., 2018), WaveGlow (Prenger et al., 2019), MelGAN (Kumar et al., 2019), HiFi-GAN, StyleMelGAN (Mustafa et al., 2021), and Parallel WaveGAN (Yamamoto et al., 2020).…”