Using a cross-modal word identification task and an eye-tracking visual-world experiment, we investigated the importance of phonological context in the recovery of tap variants of /t/- and /d/-final words in American English. In Experiment 1, listeners were less accurate when they heard a tap variant of a /t/ word in a non-licensing environment (before a consonant) than when they heard it in a licensing environment (before an unstressed vowel). In contrast, there was no difference in accuracy for tap variants of /d/ words across contexts. Similarly, in Experiment 2, listeners looked less often at the target word when they heard tap variants of /t/ words in a mismatching context than in a matching one. A mismatching context, however, did not result in fewer looks to the target for tap variants of /d/ words. Importantly, both accuracy and the proportion of looks to the target word were higher in the mismatching phonological context than when listeners were presented with mispronounced forms. Our results contrast with previous findings on tap variants of /t/. These findings also suggest that contextual information is less important when a surface form is a closer perceptual match to the lexical representation (as with canonical stops and tap variants of /d/). Thus, a model of word recognition must take into account both the frequency of a variant in context and the perceptual distance between a variant and its lexical representation.