Human infants are predisposed to rapidly acquire their native language. The nature of these predispositions is poorly understood, but is crucial to our understanding of how infants unpack their speech input to recover the fundamental word-like units, assign them referential roles, and acquire the rules that govern their organization. Previous researchers have demonstrated the role of general distributional computations in prelinguistic infants' parsing of continuous speech. We extend these findings to more naturalistic conditions, and find that 6-mo-old infants can simultaneously segment a nonce auditory word form from prosodically organized continuous speech and associate it with a visual referent. Crucially, however, this mapping occurs only when the word form is aligned with a prosodic phrase boundary. Our findings suggest that infants are predisposed very early in life to hypothesize that words are aligned with prosodic phrase boundaries, thus facilitating the word learning process. Further, and somewhat paradoxically, we observed successful learning in a more complex context than previously studied, suggesting that learning is enhanced when the language input is well matched to the learner's expectations.

Acquiring a language includes learning mappings from sounds (or signs) to meanings. However, words, the principal units of meaning, are not given directly in the input but are embedded in a speech signal whose structure is governed by grammatical processes operating at multiple levels. One of the primary steps in language acquisition, therefore, is to discover the sound sequences that define words. However, as any adult confronted with a foreign language can attest, it is hard to perceive unfamiliar speech as sequences of words. The language learner must also discover what the words refer to, a particularly tricky problem given the innumerable possible referential features in the world (1, 2).
Nevertheless, by 6 mo of age, infants have spontaneously extracted and begun to understand their first words, including highly frequent items such as "no", "Mommy", and the child's own name (3). Here we provide evidence that 6-mo-olds can rapidly extract a statistically defined, novel auditory word form from running speech and simultaneously map it onto a visual referent in an array of objects. Moreover, we find this dual process of word segmentation and referent mapping only when the statistically defined words are aligned with phrasal prosodic constituents, a universal structural property of natural languages. These findings build on three key results from past research: (i) 7- to 8-mo-old infants can extract statistically defined syllable sequences from fluent speech as candidate auditory word forms (4, 5), (ii) by 14 mo, infants can reliably map isolated auditory word forms onto visual referents (6-8), and (iii) by 17 mo, toddlers can extract auditory word forms on the basis of syllable statistics and subsequently map them onto candidate visual referents (9). We demonstrate all of these behaviors simulta...