“…While the speech processing community explores end-to-end methods to detect and control the personal and emotional aspects of speech, including fine-grained features such as pitch, tone, speech rate, cadence, and accent (Valle et al., 2020), applied linguists and digital humanists still rely on rule-based tools (Plecháč, 2020; Anttila and Heuser, 2016; Kraxenberger and Menninghaus, 2016), some of which have limited generality (Navarro-Colorado, 2018; Navarro et al., 2016) or lack proper evaluation (Bobenhausen, 2011). Other approaches to computational prosody rely on lexical resources with stress annotation, such as the CMU dictionary (Hopkins and Kiela, 2017; Ghazvininejad et al., 2016), operate on words in prose rather than syllables in poetry (Talman et al., 2019; Nenkova et al., 2007), require an aligned audio signal (Rosenberg, 2010; Rösiger and Riester, 2015), or model only narrow domains such as iambic pentameter (Greene et al., 2010; Hopkins and Kiela, 2017; Lau et al., 2018) or Middle High German (Estes and Hench, 2016).…”
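To illustrate what lexicon-based stress annotation provides, the following is a minimal Python sketch. It assumes pronunciations in CMU-style ARPAbet notation, in which each vowel carries a stress digit (0 = unstressed, 1 = primary, 2 = secondary). The three-entry `LEXICON` and the helpers `stress_pattern`, `line_pattern`, and `to_meter` are hypothetical illustrations. They are not taken from the CMU dictionary distribution or from any of the cited systems.

```python
import re

# Hypothetical mini-lexicon in CMU-style ARPAbet notation (illustrative only).
# Each vowel phone ends in a stress digit: 0 = none, 1 = primary, 2 = secondary.
LEXICON = {
    "shall": "SH AE1 L",
    "compare": "K AH0 M P EH1 R",
    "summer": "S AH1 M ER0",
}


def stress_pattern(pron: str) -> str:
    """Return the stress digits of one pronunciation, one digit per syllable."""
    # In ARPAbet, digits occur only as stress markers on vowels.
    return "".join(re.findall(r"[012]", pron))


def line_pattern(words, lexicon=LEXICON) -> str:
    """Concatenate per-word stress digits; unknown words raise KeyError."""
    return "".join(stress_pattern(lexicon[w.lower()]) for w in words)


def to_meter(pattern: str) -> str:
    """Map digits to a binary scansion: 0 -> 'x' (weak), 1/2 -> '/' (strong)."""
    return "".join("x" if d == "0" else "/" for d in pattern)


if __name__ == "__main__":
    pat = line_pattern(["Compare", "summer"])
    print(pat, to_meter(pat))
```

The lookup is context-free: each word receives the same pattern wherever it occurs in a line, and words missing from the lexicon cannot be scanned at all.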