Both multisensory and lexical information are known to influence the perception of speech. However, an open question remains: is either source more fundamental to perceiving speech? In this perspective, we review the literature and argue that multisensory information plays a more fundamental role in speech perception than lexical information. Three sets of findings support this conclusion: first, reaction times and electroencephalographic signal latencies indicate that the effects of multisensory information on speech processing seem to occur earlier than the effects of lexical information. Second, non-auditory sensory input influences the perception of features that differentiate phonetic categories; thus, multisensory information determines what lexical information is ultimately processed. Finally, there is evidence that multisensory information helps form some lexical information as part of a phenomenon known as sound symbolism. These findings support a framework of speech perception that, while acknowledging the influential roles of both multisensory and lexical information, holds that multisensory information is more fundamental to the process.