There is a growing tendency to embed functionality for recognizing emotions from user-generated media content in automated systems such as call center operations, recommendation systems, and assistive technologies, yielding richer and more informative user and content profiles. To date, however, adding this functionality has been tedious, costly, and time-consuming, requiring the identification and integration of diverse tools with diverse interfaces for each use case. The MixedEmotions Toolbox addresses this need by providing tools for text, audio, video, and linked data processing within an easily integrable plug-and-play platform. These functionalities include: (i) for text processing: emotion and sentiment recognition; (ii) for audio processing: emotion, age, and gender recognition; (iii) for video processing: face detection and tracking, emotion recognition, facial landmark localization, head pose estimation, face alignment, and body pose estimation; and (iv) for linked data: knowledge graph integration. Moreover, the MixedEmotions Toolbox is open-source and free. In this article, we present the toolbox in the context of the existing landscape and provide a range of detailed benchmarks on standard test-beds showing its state-of-the-art performance. Furthermore, three real-world use cases demonstrate its effectiveness: emotion-driven smart TV, call center monitoring, and brand reputation analysis.