Enthusiasm plays an important role in engaging communication. It enables speakers to be distinguished and remembered, creating an emotional bond that inspires and motivates their addressees to act, listen, and coordinate (Bettencourt et al., 1983). Although people can easily identify enthusiasm, doing so remains difficult for machines because of the lack of resources and models that can help them understand or generate enthusiastic behavior. We introduce Entheos, the first multimodal dataset for studying enthusiasm, composed of video, audio, and text. We present several baseline models and an ablation study over different features, showing the importance of pitch, loudness, and discourse relation parsing in distinguishing enthusiastic communication.