Are similar, or even identical, mechanisms used in the computational modeling of speech segmentation, serial image processing, and music processing? We address this question by exploring how TRACX2, a recognition‐based, recursive connectionist autoencoder model of chunking and sequence segmentation, which has successfully simulated speech and serial‐image processing, might be applied to elementary melody perception. The model, a three‐layer autoencoder that recognizes “chunks” of short sequences of intervals that have been frequently encountered on input, is trained on the tone intervals of melodically simple French children's songs. It dynamically incorporates the internal representations of these chunks into new input. Its internal representations cluster in a manner that is consistent with “human‐recognizable” melodic categories. TRACX2 is sensitive to both contour and proximity information in the musical chunks that it encounters in its input. It shows the “end‐of‐word” superiority effect demonstrated by Saffran et al. (1999) for short musical phrases. The overall findings suggest that the recursive autoassociative chunking mechanism, as implemented in TRACX2, may be a general segmentation and chunking mechanism, underlying not only word‐ and image‐chunking, but also elementary melody processing.
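As a rough illustration of the ingredients named above, the following minimal Python sketch shows how a melody might be converted to tone intervals, how its contour can be read off those intervals, and how an internal (hidden) representation could be blended into the next input. It is a sketch under stated assumptions, not the model's actual implementation: the function names, the use of MIDI note numbers, the `alpha` parameter, and in particular the tanh-weighted blending rule are illustrative choices that this abstract does not specify, and the autoencoder itself (weights and training) is omitted.

```python
import math


def intervals(pitches):
    """Successive pitch differences in semitones.

    Assumption: pitches are given as MIDI note numbers.
    """
    return [b - a for a, b in zip(pitches, pitches[1:])]


def contour(ivals):
    """Sign of each interval: +1 rising, -1 falling, 0 repeated note."""
    return [(i > 0) - (i < 0) for i in ivals]


def blend_next_left(hidden, right, error, alpha=1.0):
    """Build the next left-hand input from the current step.

    Assumed rule (hypothetical, for illustration): weight the raw
    right-hand item by tanh(alpha * error) and the hidden
    representation by the remainder. A low reconstruction error
    (a familiar chunk) passes on mostly the hidden representation;
    a high error (an unfamiliar pair) passes on mostly the raw item.
    """
    w = math.tanh(alpha * error)
    return [(1 - w) * h + w * r for h, r in zip(hidden, right)]


# A hypothetical four-note phrase: C4, D4, E4, C4.
phrase = [60, 62, 64, 60]
ivals = intervals(phrase)
print(ivals)           # [2, 2, -4]
print(contour(ivals))  # [1, 1, -1]
```

With `error` near zero, `blend_next_left` returns the hidden representation unchanged, which is how a well-learned chunk would be carried forward as a single unit; with a large `error`, it returns essentially the raw right-hand item, so no chunk is formed.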