Ramscar, Yarlett, Dye, Denny, and Thorpe (2010) showed that, consistent with the predictions of error‐driven learning models, the order in which stimuli are presented during training can affect category learning. Specifically, learners exposed to artificial language input in which objects preceded their labels learned the discriminating features of categories better than learners exposed to input in which labels preceded objects. We sought to replicate this finding in two online experiments that used the same tests as the original study: a four pictures test (match a label to one of four pictures) and a four labels test (match a picture to one of four labels). In our study, only the findings from the four pictures test were consistent with the original result. Additionally, the observed effect sizes were smaller than those originally reported, and participants over‐generalized high‐frequency category labels more than participants in the original study did. We suggest that although Ramscar et al.'s (2010) feature‐label order predictions were derived from error‐driven learning, they failed to consider that this same mechanism also predicts that performance in any training paradigm will inevitably be influenced by participants' prior experience. We consider our findings in light of these factors and discuss their implications for the generalizability and replication of training studies.