Abstract: Humans can often learn high-level features of a piece of music, such as its beats, from only a few seconds of audio. If robots could obtain this information just as rapidly, they could engage in musical interaction without long lead times to learn the music. The presence of robot ego noise, however, makes accurate music analysis more difficult. In this paper, we focus on the task of learning musical beats, which humans can often identify even in noisy environments such as bars. Learning beats would not only help robots synchronize their responses to music, but could also lead to learning other aspects of musical audio, such as other repeated events, timbral characteristics, and more. We introduce a novel algorithm that combines stacked spectrograms, in which each column contains frequency bins from multiple instants in time, with Probabilistic Latent Component Analysis (PLCA) to learn beats in noisy audio. The stacked spectrograms expose time-varying spectral characteristics of acoustic components, and PLCA is used to learn and separate those components and identify the ones containing beats. We demonstrate that this system can learn musical beats even when provided with only a few seconds of noisy audio.
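To make the two ingredients named in the abstract concrete, the sketch below shows (a) how a "stacked" spectrogram can be formed by concatenating several consecutive frames into each column, and (b) a basic two-dimensional PLCA decomposition fit by expectation-maximization. This is a minimal illustration, not the authors' implementation: the stack depth `n_stack`, the number of latent components `n_components`, and the iteration count are illustrative assumptions, and the paper's method for picking out the beat-carrying components is not reproduced here.

```python
# Minimal sketch (assumptions noted above): stacked spectrogram + basic PLCA.
import numpy as np

def stack_spectrogram(S, n_stack=4):
    """Concatenate n_stack consecutive frames into each column so that a
    column captures short-term temporal evolution rather than one instant.
    S: (n_freq, n_frames) magnitude spectrogram. n_stack is an assumption."""
    n_freq, n_frames = S.shape
    n_cols = n_frames - n_stack + 1
    V = np.empty((n_freq * n_stack, n_cols))
    for t in range(n_cols):
        # Column-major flatten: frame t's bins first, then frame t+1, ...
        V[:, t] = S[:, t:t + n_stack].reshape(-1, order="F")
    return V

def plca(V, n_components=8, n_iter=100, seed=0):
    """Basic 2-D PLCA via EM: model P(f, t) = sum_z P(z) P(f|z) P(t|z)."""
    rng = np.random.default_rng(seed)
    P = V / V.sum()  # treat the nonnegative input as a joint distribution
    n_f, n_t = P.shape
    Pz = np.full(n_components, 1.0 / n_components)
    Pf_z = rng.random((n_f, n_components)); Pf_z /= Pf_z.sum(axis=0)
    Pt_z = rng.random((n_t, n_components)); Pt_z /= Pt_z.sum(axis=0)
    for _ in range(n_iter):
        # E-step: posterior over the latent component z for each (f, t) cell
        joint = Pf_z[:, None, :] * Pt_z[None, :, :] * Pz      # (f, t, z)
        post = joint / np.maximum(joint.sum(-1, keepdims=True), 1e-12)
        # M-step: reweight by the observed distribution, then renormalize
        W = P[:, :, None] * post
        Pz = W.sum(axis=(0, 1))
        Pf_z = W.sum(axis=1) / np.maximum(Pz, 1e-12)
        Pt_z = W.sum(axis=0) / np.maximum(Pz, 1e-12)
    return Pz, Pf_z, Pt_z

# Usage on a noisy magnitude spectrogram S (e.g., from a short STFT):
# V = stack_spectrogram(S, n_stack=4)
# Pz, Pf_z, Pt_z = plca(V, n_components=8)
# Each column of Pf_z is a time-varying spectral template (n_stack frames),
# and the corresponding column of Pt_z gives its activation over time; a
# component whose activations recur periodically is a candidate beat
# component (one plausible selection criterion, not taken from the paper).
```

Because each latent component's spectral template spans several stacked frames, PLCA on this representation can model events with internal temporal structure, such as a drum hit's attack and decay, which a single-frame decomposition would split apart.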