This study assesses the reliability of the LLAMA aptitude tests (Meara, 2005). The LLAMA tests were designed as shorter, free, language-neutral tests loosely based on the MLAT (Carroll & Sapon, 1959). They contain four sub-components: vocabulary acquisition, sound recognition, sound-symbol correspondence and grammatical inferencing. Granena (2013) and Rogers et al. (2016) provided initial results regarding factors which might influence LLAMA test scores. This paper develops that work by examining some of the issues raised with a larger cohort, and focuses on four research questions (RQ1-4). Data were collected from 240 participants aged 10-75 for RQ1-3. We found no significant differences in terms of language background (RQ1), but instructed second language learners significantly outperformed monolinguals (RQ2). For RQ3, we found that the younger groups were outperformed by all the other groups. For RQ4, we investigated how much variance in LLAMA test results six individual background factors could explain. We combined data from Rogers et al. (2016) with the data from this study, giving 404 participants in total. Using a multiple regression analysis, we found that prior L2 instruction explained more of the variance (6%) than any other factor. We suggest that when using the LLAMA tests, researchers should consider controlling for language learning experience. This study scrutinises the components of the LLAMA tests with a large set of data. We conclude that the results are robust across a range of individual differences but suggest that different norms may be needed for younger age groups and for those who have received prior L2 instruction.