The current study presents two meta‐analyses to explore what underlies the assessment and teaching of comprehensible and nativelike pronunciation among English‐as‐a‐Second‐Language (ESL) speakers. In Study 1, listener studies (n = 37) were retrieved that examined the influence of segmental, prosodic, and temporal features on listeners’ intuitive judgements of comprehensibility and nativelikeness/accentedness across different listener backgrounds (expert, mixed, L2). In Study 2, training studies (n = 17) were retrieved that examined the effects of segmental‐, prosodic‐, and temporal‐based instruction on ESL learners’ pronunciation. The results showed that (a) comprehensibility judgements were related to a range of segmental, prosodic, and temporal features; (b) accentedness judgements were strongly tied to participants’ correct pronunciation of consonants and vowels; and (c) instruction led to larger gains in comprehensibility than in nativelikeness. Moderator analyses demonstrated that expert listeners were more reliant on phonological information. The advantage of instruction for comprehensibility over nativelikeness was most evident when the treatment targeted prosodic accuracy. The findings suggest that ESL practitioners should prioritize suprasegmental practice to help students achieve comprehensible L2 pronunciation. The attainment of nativelike pronunciation, by contrast, may require an exclusive focus on the refinement of segmental accuracy, which is resistant to the influence of instruction.