High phonological neighborhood density has been associated with both advantages and disadvantages in early word learning. High density may support the formation and fine-tuning of new word sound memories, a process termed lexical configuration (e.g., Storkel, 2004). However, new high-density words are also more likely to be misunderstood as instances of known words, and may therefore fail to trigger the learning process (e.g., Swingley & Aslin, 2007). To examine these apparently contradictory effects, we trained an autoencoder neural network on 587,954 word tokens (5,497 types, including mono- and multisyllabic words of all grammatical classes) spoken by 279 caregivers to English-speaking children aged 18-24 months. We then simulated a communicative development inventory administration and compared network performance to that of 2,292 children aged 18-24 months. We argue that autoencoder performance illustrates concurrent density advantages and disadvantages, in contrast to prior behavioral and computational literature that treats such effects independently. Low network error rates signal a configuration advantage for high-density words, while high network error rates signal a triggering advantage for low-density words. This interpretation is consistent with the use of autoencoders in academic research and industry for simultaneous feature extraction (i.e., configuration) and anomaly detection (i.e., triggering). Autoencoder simulation therefore illustrates how apparently contradictory density and distinctiveness effects can emerge from a common learning mechanism.
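
The dual reading of reconstruction error described above can be illustrated with a minimal sketch. This is not the network used in the study: it assumes a linear autoencoder solved in closed form by PCA, and the data are hypothetical toy vectors, not phonological representations of real words. The names `fit_linear_autoencoder` and `reconstruction_error`, the 10-dimensional vectors, and the 50-to-1 split between "dense" and "sparse" items are all illustrative assumptions.

```python
import numpy as np

def fit_linear_autoencoder(X, k):
    """Closed-form linear autoencoder (PCA): returns the mean and top-k components."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by variance explained.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def reconstruction_error(X, mean, components):
    """Squared error per item after encoding to k dimensions and decoding back."""
    centered = X - mean
    codes = centered @ components.T      # encode (feature extraction)
    recon = codes @ components           # decode
    return np.sum((centered - recon) ** 2, axis=1)

# Hypothetical toy data (assumption): 50 "high-density" items that share
# structure along one direction, plus 1 "low-density" item that lies apart.
rng = np.random.default_rng(0)
dim = 10
u = np.zeros(dim); u[0] = 1.0            # direction shared by dense items
v = np.zeros(dim); v[1] = 1.0            # direction of the isolated item
dense = rng.normal(size=(50, 1)) * u + 0.05 * rng.normal(size=(50, dim))
sparse = v[None, :]
X = np.vstack([dense, sparse])

mean, comps = fit_linear_autoencoder(X, k=1)
err = reconstruction_error(X, mean, comps)
dense_err, sparse_err = err[:50], err[50]

print("mean error, dense items:", dense_err.mean())
print("error, isolated item:   ", sparse_err)
```

In this toy setting, the items that share structure are reconstructed with low error (analogous to a configuration advantage), while the isolated item yields a high error that could serve as an anomaly signal (analogous to triggering). Both outcomes come from the same fitted model.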