Abstract: Modern automatic speech recognition systems handle large vocabularies of words, making it infeasible to collect enough repetitions of each word to train individual word models. Instead, large-vocabulary recognizers represent each word in terms of sub-word units. Typically the sub-word unit is the phone, a basic speech sound such as a single consonant or vowel. Each word is then represented as a sequence, or several alternative sequences, of phones specified in a pronunciation dictionary. Other choices of sub-word units have been studied as well. Choosing the sub-word units, and choosing how the recognizer represents words as combinations of those units, together make up the problem of sub-word modeling. Different sub-word models may be preferable in different settings, such as high-variability conversational speech, high-noise conditions, low-resource settings, or multilingual speech recognition. This article reviews past, present, and emerging approaches to sub-word modeling. To make clean comparisons among the many approaches, the review uses the unifying language of graphical models.