In this article we introduce Mementos: the first multimodal corpus for computational modeling of affect and memory processing in response to video content. The corpus was collected online via crowdsourcing and comprises 1995 individual responses from 297 unique viewers to 42 different segments of music videos. In addition to webcam recordings of viewers' upper-body behavior (totaling 2012 minutes) and self-reports of their emotional experience, it contains detailed descriptions of the occurrence and content of 989 personal memories triggered by the video content. Finally, the dataset includes self-report measures of individual differences in participants' background and situation (Demographics, Personality, and Mood), thereby facilitating the exploration of important contextual factors in research using the dataset. We 1) describe the construction and contents of the corpus itself, 2) analyze the validity of its contents by investigating biases and consistency with existing research on affect and memory processing, 3) review previously published work demonstrating the usefulness of the corpus's multimodal data for research on automated detection and prediction tasks, and 4) provide suggestions for how the dataset can be used in future research on modeling Video-Induced Emotions, Memory-Associated Affect, and Memory Evocation.