Monkeys can easily form lasting central representations of visual and tactile stimuli, yet they seem unable to do the same with sounds. Humans, by contrast, are highly proficient in auditory long-term memory (LTM). These mnemonic differences within and between species raise the question of whether the human ability is supported in some way by speech and language, e.g., through subvocal reproduction of speech sounds and by covert verbal labeling of environmental stimuli. If so, the explanation could be that storing rapidly fluctuating acoustic signals requires assistance from the motor system, which is uniquely organized to chain-link rapid sequences. To test this hypothesis, we compared the ability of normal participants to recognize lists of stimuli that can be easily reproduced, labeled, or both (pseudowords, nonverbal sounds, and words, respectively) versus their ability to recognize a list of stimuli that can be reproduced or labeled only with great difficulty (reversed words, i.e., words played backward). Recognition scores after 5-min delays filled with articulatory-suppression tasks were relatively high (75-80% correct) for all sound types except reversed words; the latter yielded scores that were not far above chance (58% correct), even though these stimuli were discriminated nearly perfectly when presented as reversed-word pairs at short intrapair intervals. The combined results provide preliminary support for the hypothesis that participation of the oromotor system may be essential for laying down the memory of speech sounds and, indeed, that speech and auditory memory may be so critically dependent on each other that they had to coevolve.

evolution | mimic | arcuate fasciculus

The proficiency with which monkeys perform tests of both visual and tactile recognition does not extend to auditory recognition.
In vision and touch, monkeys master the rule for one-trial recognition memory extremely rapidly, within several daily sessions (1, 2); and once they have learned the rule, it can be shown that they have stimulus-retention thresholds (performance at 75% accuracy) of 10-20 min after viewing or palpating a novel stimulus for only 1-2 s (3, 4). In audition, by contrast, monkeys acquire the rule for one-trial memory exceedingly slowly, requiring a full year or two of training before they can master it, if they succeed at all; and if they do succeed, their stimulus-retention thresholds are found to extend no longer than 30-40 s after stimulus presentation (5). This marked disparity in mnemonic ability across sensory modalities suggests that, in audition alone, monkeys seem unable to store stimulus representations in long-term memory (LTM) and, consequently, appear to be limited mnemonically to the time period covered by short-term memory. Humans, on the other hand, are highly proficient at storing lasting representations of auditory stimuli, such as words and tunes, thereby enabling their later recognition. What accounts for these striking mnemonic differences between audition and other sensory modalities in...