“…In these approaches, a single neural network is trained to recognize graphemes or even words from speech directly. Especially, the model using semantically meaningful units, such as words or sub-word (Sennrich et al, 2015), rather than graphemes have been showing promising results (Audhkhasi et al, 2017b;Li et al, 2018;Soltau et al, 2016;Zenkel et al, 2017;Palaskar and Metze, 2018;Sanabria and Metze, 2018;Rao et al, 2017;Zeyer et al, 2018).…”