In several applied contexts (e.g., earwitness testimony), the accurate recognition of unfamiliar voices can be a critical part of the person identification process. However, recognising unfamiliar voices is prone to error. While such errors could be reduced by testing the proficiency of listeners, the established tests of unfamiliar voice matching (the Bangor Voice Matching Test, BVMT) and voice memory (the Glasgow Voice Memory Test, GVMT) may be limited by their choice of stimuli (i.e., vowel sounds) and, in the case of the GVMT, by their design (i.e., using identical sounds at learning and test). Here, we examine whether these sound‐based tests predict performance on more naturalistic speech‐based tasks, and whether performance is consistent across task domain (matching/memory) and task modality (voices/faces). The findings show that while the BVMT was a robust predictor of speech‐based voice matching, the GVMT did not robustly predict speech‐based voice memory. In addition, we provide evidence for a potential common person recognition factor ‘p’. The theoretical and applied implications are discussed.