This study investigated the encoding of the surface form of spoken words using a continuous recognition memory task. The purpose was to compare and contrast three sources of stimulus variability (talker, speaking rate, and overall amplitude) to determine the extent to which each source of variability is retained in episodic memory. In Experiment 1, listeners judged whether each word in a list of spoken words was "old" (had occurred previously in the list) or "new." Listeners were more accurate at recognizing a word as old if it was repeated by the same talker and at the same speaking rate; however, there was no recognition advantage for words repeated at the same overall amplitude. In Experiment 2, listeners first judged whether each word was old or new, as before, and then explicitly judged whether it was repeated by the same talker, at the same rate, or at the same amplitude. On the first task, listeners again showed an advantage in recognition memory for words repeated by the same talker and at the same speaking rate, but no advantage occurred for the amplitude condition. However, in all three conditions, listeners were able to explicitly detect whether an old word was repeated by the same talker, at the same rate, or at the same amplitude. These data suggest that although information about all three properties of spoken words is encoded and retained in memory, each source of stimulus variation differs in the extent to which it affects episodic memory for spoken words.

A long-standing problem for theories of speech perception and spoken word recognition has been perceptual constancy in the face of a highly variable speech signal. Listeners extract stable linguistic percepts from an acoustic speech signal that varies substantially due to idiosyncratic differences in the size and shape of individual talkers' vocal tracts, as well as to differences within and among talkers in factors such as speaking rate, dialect, speaking style, and vocal effort. Traditionally, researchers have adopted an abstractionist approach to the problem of perceptual constancy, assuming that variability in the speech signal is perceptual "noise" that must be "stripped away" during perception to arrive at a series of abstract, canonical linguistic units (see Pisoni, 1997). Research has typically focused either on searching for sets of acoustic, articulatory, or relational invariants hypothesized to allow access to phoneme- and ultimately word-sized units (e.g., Kewley-Port, 1983; Stevens & Blumstein, 1978) or on normalization algorithms and processes that would successfully filter out stimulus variation to arrive at the abstract units thought to underlie further