We propose a method that extends the phoneme set using a large amount of broadcast data to improve the performance of Korean spontaneous speech recognition. In the proposed method, we first extract variable-length phoneme-level segments from broadcast data and then convert them into fixed-length embedding vectors using a long short-term memory architecture. We use decision tree-based clustering to find acoustically similar embedding vectors and then build new acoustic subword units by gathering the clustered vectors. To update the lexicon of a speech recognizer, we build a lookup table between the tri-phone units and the units derived from the decision tree. The proposed lexicon is then obtained by updating the original phoneme-based lexicon with reference to the lookup table. To verify the performance of the proposed unit, we compare it with previous units obtained by segment-based k-means clustering or frame-based decision-tree clustering. The proposed unit outperforms the previous units in both spontaneous and read Korean speech recognition tasks.

In spontaneous speech recognition, the phoneme unit suffers from low acoustic discrimination. More specifically, the phoneme unit in spontaneous speech has a smaller inter-unit distance and a larger variance than the phoneme unit in read speech, which is one of the major factors contributing to the decrease in recognition accuracy [7,8]. In general, the implicit method, which uses a decision tree, achieves higher recognition accuracy when the segmentation is based on acoustically discriminative units. This is consistent with the observation that a speech recognizer for read speech performs better when the segmentation is based on the phoneme unit rather than the grapheme unit.
Thus, if we build an acoustically discriminative unit by clustering common spectral patterns from spontaneous speech, we can expect an improvement in the performance of spontaneous speech recognition.

We propose a method to improve the performance of spontaneous speech recognition by extending the phoneme set with a large amount of Korean broadcast data. The proposed unit is derived in three steps. First, we extract variable-length phoneme-level segments. Second, we convert them into fixed-length latent vectors using a long short-term memory (LSTM) architecture [9]. Finally, we use the decision tree-based clustering algorithm [4,10] to cluster acoustically similar latent vectors and build a new acoustic subword unit by gathering the clustered vectors. In the unit derivation experiments, we compare the proposed and previous approaches [9,11] in terms of the fixed-length vector extraction and the clustering algorithm. The proposed unit outperforms the acoustic subword units obtained by previous methods in both spontaneous and read speech recognition tasks.

This paper is an extension of our previous conference paper [9] that improves the clustering method from a k-means clusteri...