Recognizing the emotional reactions of movie audiences to affective movie content is a challenging task in affective computing. Previous research on induced emotion recognition has mainly focused on audiovisual movie content. However, the relationship between the perception of affective movie content (perceived emotions) and the emotions evoked in audiences (induced emotions) remains unexplored. In this work, we study the relationship between the perceived and induced emotions of movie audiences. We also investigate multimodal modelling approaches for predicting movie-induced emotions from movie-content-based features, as well as from the physiological and behavioral reactions of movie audiences. To analyze induced and perceived emotions, we first extend an existing database for movie affect analysis by annotating perceived emotions in a crowd-sourced manner. We find that perceived and induced emotions are not always consistent with each other. In addition, we show that perceived emotions, movie dialogues, and aesthetic highlights are discriminative for movie-induced emotion recognition, alongside spectators' physiological and behavioral reactions. Our experiments also reveal that induced emotion recognition can benefit from including temporal information and performing multimodal fusion. Finally, our work examines the gap between affective content analysis and induced emotion recognition by providing insight into the relationships among aesthetic highlights, induced emotions, and perceived emotions.