This work tests the relative roles of perception- and production-based predictors, and the relationship between them, in the imitation of artificial accents varying in voice onset time (VOT), using a paradigm designed to target distinct sub-processes of imitation. We examined how explicit imitation of sentences differing systematically in VOT was influenced by the type of VOT manipulation (lengthened vs. shortened) and by the presence vs. absence of voice-related variability in exposure. In contrast to previous work, participants imitated shortened as well as lengthened VOT, albeit with both qualitative and quantitative differences across the two manipulation types. The presence of voice-related variability inhibited imitation, but this inhibition was mitigated by a preceding session with no voice-related variability (i.e., one in which sentences were acoustically identical except for VOT). We then tested the extent to which individual performance on the accent imitation task was related to performance on three other tasks: 1) discrimination of the target accents, 2) imitation of isolated words drawn from a VOT continuum, and 3) discrimination of these same words. Performance on the accent discrimination task and on the word-level imitation task, but not on the word-level discrimination task, independently predicted accent imitation. Results are consistent with a conceptualization of explicit imitation as the sum of automatic phonetic convergence processes and, overlaid on these, distinct controlled perceptual and articulatory factors that pattern differently across individuals. Phonetic imitation should not be considered a monolithic skill, and models predicting variation in imitative ability must consider not only the potential sources of individual variability, but also the level at which these sources exert their influence.