Fundamental frequency (ƒ0), perceived as pitch, is the first and arguably most salient auditory component humans are exposed to since the beginning of life. It carries multiple linguistic (e.g., word meaning) and paralinguistic (e.g., speakers’ emotion) functions in speech and communication. The mappings between these functions and ƒ0 features vary within a language and differ cross-linguistically. For instance, a rising pitch can be perceived as a question in English but a lexical tone in Mandarin. Such variations mean that infants must learn the specific mappings based on their respective linguistic and social environments. To date, canonical theoretical frameworks and most empirical studies do not view or consider the multi-functionality of ƒ0, but typically focus on individual functions. More importantly, despite the eventual mastery of ƒ0 in communication, it is unclear how infants learn to decompose and recognize these overlapping functions carried by ƒ0. In this paper, we review the symbioses and synergies of the lexical, intonational, and emotional functions that can be carried by ƒ0 and are being acquired throughout infancy. On the basis of our review, we put forward the Learnability Hypothesis that infants decompose and acquire multiple ƒ0 functions through native/environmental experiences. Under this hypothesis, we propose representative cases such as the synergy scenario, where infants use visual cues to disambiguate and decompose the different ƒ0 functions. Further, viable ways to test the scenarios derived from this hypothesis are suggested across auditory and visual modalities. Discovering how infants learn to master the diverse functions carried by ƒ0 can increase our understanding of linguistic systems, auditory processing and communication functions.