Language evolution studies are hindered by the fact that language does not literally fossilize. Nonetheless, various proxies for, or windows into, language evolution have been proposed, including so-called linguistic “fossils,” i.e., aspects of present-day languages that can be regarded as approximations of early forms and uses of language. Among these, ideophones stand out as a particularly promising window because of their distinctive features, including their sound-symbolic nature, ample use of reduplication, reliance on the simplest possible combinatorial processes, attachment to emotional content, and presumed bootstrapping effects on language acquisition. Of special relevance is that these features, including the reported co-occurrence of ideophones with gestures, highlight their continuity with primate communication systems. In addition to this continuity with other species, our proposal focuses on the role of ideophones in cross-modality and multi-modality, and on their interaction with the evolution of the human brain as envisioned under the human self-domestication (HSD) hypothesis. According to this hypothesis, humans evolved traits similar to those found in animal domesticates, including decreased reactive aggression. Our framework implicates cortico-striatal brain networks, whose enhanced connectivity is a mechanism both for suppressing reactive aggression, via enhanced cortical control of subcortical regions, and for cross-modality and language processing more generally, via enhanced dialogue among distant brain regions. Our main claim is that, within this evolutionary framework, ideophones can be regarded not only as informative windows into the early stages of the evolution of human languages, but also as scaffolds supporting the evolution of more complex forms of language and cognition, just as they serve as scaffolds in child language acquisition.