The problem of continuous emotion recognition has been the subject of several studies. The proposed affective computing approaches employ sequential machine learning algorithms for improving the classification stage, accounting for the time ambiguity of emotional responses. Modeling and predicting the affective state over time is not a trivial problem because continuous data labeling is costly and not always feasible. This is a crucial issue in real-life applications, where data labeling is sparse and possibly captures only the most important events rather than the typical continuous subtle affective changes that occur. In this work, we introduce a framework from the machine learning literature called Multiple Instance Learning, which is able to model time intervals by capturing the presence or absence of relevant states, without the need to label the affective responses continuously (as required by standard sequential learning approaches). This choice offers a viable and natural solution for learning in a weakly supervised setting, taking into account the ambiguity of affective responses. We demonstrate the reliability of the proposed approach in a gold-standard scenario and towards real-world usage by employing an existing dataset (DEAP) and a purposely built one (Consumer). We also outline the advantages of this method with respect to standard supervised machine learning algorithms.