Experienced listeners of a particular acoustic cue in either speech or music appear to have an advantage when perceiving a similar cue in the other domain (i.e., they exhibit cross-domain transfer). One explanation for cross-domain transfer relates to the acquisition of the foundations of speech and music: if acquiring pitch-based elements in speech or music results in heightened attention to pitch in general, then cross-domain transfer of pitch may be observed, which may explain the cross-domain phenomenon seen among listeners of a tone language and listeners with musical training. Here, we investigate this possibility in naïve adult learners, who were trained to acquire pitch-based elements using a distributional learning paradigm, to provide a proof-of-concept for the explanation. Learners were exposed to a stimulus distribution spanning either a Thai lexical tone minimal pair or a novel musical chord minimal pair. Within each domain, the distribution highlights pitch to facilitate learning of two different sounds (Bimodal distribution) or the distribution minimizes pitch so that the input is inferred to be from a single sound (Unimodal distribution). Learning was assessed before and after exposure to the distribution using discrimination tasks with both Thai tone and musical chord minimal pairs. We hypothesize: (i) distributional learning for learners in both the tone and the chord distributions, that is, pre-to-post improvement in discrimination after exposure to the Bimodal but not the Unimodal distribution; and (ii) for both the tone and chord conditions, learners in the Bimodal conditions but not those in the Unimodal conditions will show cross-domain transfer, as indexed by improvement in discrimination of test items in the domain other than what they were trained on. The results support both hypotheses, suggesting that distributional learning is not only used to acquire the foundations of speech and music, but may also play a role in cross-domain transfer: as a result of learning primitives based on a particular cue, learners show heightened attention to that cue in any auditory signal.