Massive Open Online Courses (MOOCs) have great potential for sustainable education. Millions of learners annually enrol on MOOCs designed to meet the needs of an increasingly diverse and international student population. Participants’ backgrounds vary by factors including age, education, location, and first language. MOOC authors address the consequent needs by ensuring courses are well-organised: learning is structured into discrete steps, clear communication is prioritised, and video components incorporate subtitles. Variability in participants’ language abilities nevertheless creates barriers to learning, a problem most extreme for those studying in a language which is not their first. This paper investigates how to identify English as a Second Language (ESL) participants and how best to predict factors associated with their course completion. This study proposes a novel method, using natural language processing, for automatically categorising 25,598 participants studying the FutureLearn “Understanding Language: Learning and Teaching” MOOC into three groups: English as Primary and Official Language; English as Official but not Primary Language; and English as a Second Language. We compared algorithms’ performance when extracting discernible features in participants’ engagement. Engagement in discussions at the end of the first week was one of the strongest predictive features, while overall, learner behaviours in the first two weeks were identified as the most strongly predictive features.