Biometrics such as retina scans, fingerprints, and voice are now widely used for authentication. Within speech-based biometrics, text-independent speaker identification has been studied extensively. Short speech duration remains one of its main challenges, because accuracy becomes a major concern as utterances get shorter. In addition, an identification system must be general enough to handle different languages and dialects, each with its own characteristics depending on ethnic group and region. This study therefore presents a multilingual speaker identification system for short utterances covering regional languages (Sundanese and Javanese), Indonesian, and English. Mel-frequency cepstral coefficients (MFCC) are used to extract voice features, and a convolutional neural network (CNN) serves as the classification model. Two kinds of datasets are used: public datasets for the regional languages and English, and a privately collected dataset for Indonesian. The private dataset consists of voice recordings of 18 speakers of different genders, each reading several paragraphs of Indonesian text. The public regional-language dataset comprises 80 speakers, 41 Sundanese and 39 Javanese. For English, 126 male and 125 female speakers were taken from LibriSpeech. Tests are carried out separately for each language and speech duration: about 3 seconds for English and the regional languages, and 1 and 3 seconds for Indonesian. The best accuracy obtained is 95% on the regional dataset, 94% on the English dataset, and 98% on the private Indonesian dataset.
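To make the feature-extraction step concrete, the following is a minimal sketch of MFCC computation for a short utterance, written with NumPy only. It is not the implementation used in this study. The function names (`hz_to_mel`, `mel_to_hz`, `mel_filterbank`, `mfcc`) are hypothetical, and the parameter values (16 kHz sampling rate, 25 ms frames, 10 ms hop, 512-point FFT, 26 mel filters, 13 coefficients) are common defaults assumed for illustration, not settings reported here. Pre-emphasis and the CNN classifier are omitted.

```python
import numpy as np


def hz_to_mel(hz):
    """Convert frequency in Hz to the mel scale."""
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Convert mel-scale values back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Build triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            bank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            bank[i - 1, k] = (right - k) / (right - center)
    return bank


def mfcc(signal, sample_rate=16000, frame_ms=25, hop_ms=10,
         n_fft=512, n_filters=26, n_coeffs=13):
    """Return an (n_frames, n_coeffs) MFCC matrix (assumed defaults, see text)."""
    signal = np.asarray(signal, dtype=float)
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")

    # Slice the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)

    # Power spectrum per frame (frames are zero-padded to n_fft).
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft

    # Mel filterbank energies, then log compression.
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)

    # Unnormalized DCT-II over the filter axis; keep the first n_coeffs.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_coeffs)[:, None] * (2 * n + 1) / (2 * n_filters))
    return log_energies @ basis.T


# Usage: a synthetic 3-second signal stands in for a real utterance.
rng = np.random.default_rng(0)
three_seconds = rng.standard_normal(3 * 16000)
features = mfcc(three_seconds)
print(features.shape)  # (298, 13)
```

Under these assumed settings, a 3-second utterance yields a 298 x 13 feature matrix and a 1-second utterance a 98 x 13 matrix. A matrix of this kind is the sort of two-dimensional input a CNN classifier can consume, which is why shorter utterances give the model less material to work with.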