This study is based on the premise that it is possible to train computers to predict the language of a word (textual or audio) by learning from its character n‐gram pattern, without recourse to the language's dictionary. With the growth of multilingual collections and a need for automatic means of cleaning textual datasets, this paper presents a strategy for language identification of individual words in a body of texts. This strategy is suitable for resource‐scarce languages that do not have large electronic datasets that are required for machine learning and natural language processing studies and whose dictionaries may not be available. In this study, we focused on three African languages, namely Hausa, Igbo, and Yoruba. A training corpus in each of these languages was used to obtain the probabilities of character trigrams in the language. Given that English is a common language that is often mixed with these resource‐scarce languages in texts, we also obtained the probabilities of trigrams in an English training corpus. These probabilities were then used in identifying the language of each word in test corpora containing bilingual texts. Our strategy achieved average precision, recall and F1 values of about 97%, 91% and 94% respectively.