The performance of state-of-the-art automatic speech recognition (ASR) systems based on deep neural networks (DNNs) has recently been approaching that of the human auditory system. Meanwhile, although measuring the intelligibility of enhanced speech signals is important for developing auditory algorithms and devices, current measurement methods rely mainly on subjective listening experiments. It would therefore be preferable to employ an ASR system to predict the subjective speech intelligibility (SI) of enhanced speech. In this study, we evaluate the SI prediction performance of DNN-based ASR systems using phone accuracy. We find that an ASR system with multicondition training achieves the best SI prediction accuracy for enhanced speech, outperforming conventional methods (STOI, HASPI) and a recently proposed technique (GEDI). In addition, because our ASR system uses only a phone-level language model, it can predict intelligibility without prior knowledge of words.
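
As a rough illustration of the idea only (not the paper's actual pipeline or data), the sketch below computes phone accuracy from ASR phone hypotheses and correlates it with subjective SI scores. The phone sequences, listener scores, and the helper `phone_accuracy` are hypothetical placeholders; the paper's DNN acoustic model and phone language model are not reproduced here.

```python
# Minimal sketch (hypothetical data): estimating speech intelligibility (SI)
# from ASR phone accuracy and checking its correlation with listener scores.
import numpy as np

def phone_accuracy(ref, hyp):
    """Phone accuracy = 1 - (edit distance / reference length), floored at 0."""
    d = np.zeros((len(ref) + 1, len(hyp) + 1), dtype=int)
    d[:, 0] = np.arange(len(ref) + 1)
    d[0, :] = np.arange(len(hyp) + 1)
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i, j] = min(d[i - 1, j] + 1,          # deletion
                          d[i, j - 1] + 1,          # insertion
                          d[i - 1, j - 1] + cost)   # substitution
    return max(0.0, 1.0 - d[len(ref), len(hyp)] / len(ref))

# Hypothetical example: reference phone sequences, ASR hypotheses for enhanced
# speech, and subjective intelligibility scores (% correct) from a listening test.
refs = [["k", "a", "z", "e"], ["h", "a", "r", "u"], ["y", "a", "m", "a"]]
hyps = [["k", "a", "z", "e"], ["h", "a", "r", "o"], ["y", "a", "m"]]
subjective_si = np.array([95.0, 80.0, 70.0])  # hypothetical listener scores

predicted_si = np.array([100.0 * phone_accuracy(r, h) for r, h in zip(refs, hyps)])

# Pearson correlation between predicted and subjective SI as the evaluation metric.
corr = np.corrcoef(predicted_si, subjective_si)[0, 1]
print(f"Predicted SI: {predicted_si}, correlation with subjective SI: {corr:.3f}")
```

In practice, such a predictor would be evaluated over many enhancement conditions and listeners; the correlation (or prediction error) against subjective SI is what would be compared with measures such as STOI, HASPI, and GEDI.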