Visual statistical learning (VSL) is a form of incidental learning that reflects learning of temporal or spatial stimulus co-occurrence. Real-world stimuli over which VSL may take place typically have rich interrelationships, such as similarity or category membership. In the present work, we asked whether the similarity of constituent items affects VSL. Participants were shown creature stimuli composed of distinct features (e.g., head orientation), each of which had two possible options (e.g., head facing up or head facing to the right). These discrete features allowed for systematic manipulation of the similarity between paired items in terms of the number of shared and distinct features. Participants viewed stimuli one at a time, in a stream composed of temporally paired items that were either similar (sharing a majority of features) or dissimilar (sharing few features). In a test phase, participants performed a forced-choice recognition task, choosing between a previously presented target pair and a similarity-matched foil pair made by recombining previously presented items. Across three experiments, similar pairs were recognized at a higher rate than dissimilar pairs. These results provide evidence that inter-item similarity affects VSL and may play a strong role in determining the outcomes of everyday VSL.