Children perform more poorly than adults on a wide range of masked speech perception paradigms, but this effect is particularly pronounced when the masker itself is also composed of speech. The present study evaluated two factors that might contribute to this effect: the ability to perceptually isolate the target from masker speech, and the ability to recognize target speech based on sparse cues (glimpsing). Speech reception thresholds (SRTs) were estimated for closed-set, disyllabic word recognition in children (5-16 years) and adults in a one-or two-talker masker. Speech maskers were 60 dB sound pressure level (SPL), and they were either presented alone or in combination with a 50-dB-SPL speech-shaped noise masker. There was an age effect overall, but performance was adult-like at a younger age for the one-talker than the two-talker masker. Noise tended to elevate SRTs, particularly for older children and adults, and when summed with the one-talker masker. Removing timefrequency epochs associated with a poor target-to-masker ratio markedly improved SRTs, with larger effects for younger listeners; the age effect was not eliminated, however. Results were interpreted as indicating that development of speech-in-speech recognition is likely impacted by development of both perceptual masking and the ability recognize speech based on sparse cues.