“…Word-based truecasing has been the dominant approach since Lita et al. (2003) introduced the task. Word-based models can be further divided into generative models such as HMMs (Lita et al., 2003; Gravano et al., 2009; Beaufays and Strope, 2013; Nebhi et al., 2015) and discriminative models such as Maximum-Entropy Markov Models (Chelba and Acero, 2004), Conditional Random Fields (Wang et al., 2006), and, most recently, Transformer neural network models (Nguyen et al., 2019; Rei et al., 2020; Sunkara et al., 2020). Word-based models need to refine the class of mixed-case words because the number of possible case patterns for a word is combinatorial (e.g., LaTeX).…”
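
To illustrate why the mixed-case class is the awkward one, the following minimal sketch assigns each word a coarse case class and counts how many casings a word could take in principle. The four-label set (`LOWER`, `UPPER`, `CAPITALIZED`, `MIXED`) and the function names `case_class` and `num_casings` are assumptions made for this illustration; they are not taken from any of the cited systems.

```python
def _cased_letters(word: str) -> list[str]:
    """Return the characters of `word` that have distinct upper/lower forms."""
    return [c for c in word if c.lower() != c.upper()]


def case_class(word: str) -> str:
    """Map a word to a coarse case class (hypothetical label set)."""
    letters = _cased_letters(word)
    if not letters or all(c.islower() for c in letters):
        return "LOWER"
    if all(c.isupper() for c in letters):
        return "UPPER"
    if letters[0].isupper() and all(c.islower() for c in letters[1:]):
        return "CAPITALIZED"
    # Everything else (LaTeX, iPhone, McDonald) falls into one catch-all class.
    return "MIXED"


def num_casings(word: str) -> int:
    """Count possible casings: each cased letter is independently upper or lower."""
    return 2 ** len(_cased_letters(word))


if __name__ == "__main__":
    for w in ["hello", "NASA", "Paris", "LaTeX", "iPhone"]:
        print(w, case_class(w), num_casings(w))
```

Under this sketch, the first three classes each determine the surface form of a word exactly, whereas `MIXED` does not: a five-letter word such as LaTeX has 2^5 = 32 possible casings, so a word-based model has to store or predict the specific pattern in addition to the class label.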