The Intermodal Preferential Looking (IPL) paradigm provides a sensitive measure of a child's online word comprehension. To complement existing recommendations (Fernald, Zangl, Portillo, & Marchman, 2008), the present study evaluates how experimental noise generated by two aspects of the visual stimuli affects the robustness of familiar word recognition with and without mispronunciations: the presence of a central fixation point and the level of visual noise in the pictures (as measured by luminance saliency). Twenty-month-old infants were presented with a classic word recognition IPL procedure in three conditions: without a fixation stimulus (No Fixation, the noisiest condition), with a fixation stimulus before trial onset (Fixation, intermediate), and with a fixation stimulus, a neutral background, and equally salient images (Fixation Plus, the least noisy condition). Data were systematically analyzed across a range of data selection criteria and dependent variables (proportion of looking time towards the target, longest look, and time-course analysis). Critically, the expected interaction between pronunciation and naming was found only in the Fixation Plus condition. We discuss how data selection criteria and the choice of dependent variable modulate these effects across the different conditions.
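As an illustration of two of the dependent variables named above, the following is a minimal sketch of how proportion of target looking and longest look might be computed from frame-coded gaze data. The coding scheme ("T" for target, "D" for distracter, "-" for looks away), the 40 ms frame duration, and all function names are assumptions made for this example and are not taken from the study.

```python
from itertools import groupby


def proportion_target_looking(gaze):
    """Share of on-picture frames spent on the target ('T').

    Frames coded as anything other than 'T' or 'D' (e.g. looks away)
    are excluded from the denominator. Returns None if the child never
    looked at either picture.
    """
    target = gaze.count("T")
    distracter = gaze.count("D")
    total = target + distracter
    return target / total if total else None


def longest_look(gaze, label, frame_ms=40):
    """Duration in ms of the longest uninterrupted run of `label`.

    frame_ms is a hypothetical frame duration (40 ms = 25 frames/s).
    """
    runs = [len(list(group)) for key, group in groupby(gaze) if key == label]
    return max(runs, default=0) * frame_ms


# Hypothetical gaze record for one trial window, one code per frame.
gaze = list("TTTDDTTTTT-DD")

ptl = proportion_target_looking(gaze)
ll_target = longest_look(gaze, "T")
ll_distracter = longest_look(gaze, "D")
```

In practice these measures are typically computed separately for the pre-naming and post-naming phases of each trial, so that a naming effect can be assessed as the change between phases.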