The development of anticipatory user interfaces is a key issue in human-centred computing. Building systems that allow humans to communicate with a machine in the same natural and intuitive way as they would with each other requires detecting and interpreting the user's affective and social signals. These signals are expressed in various and often complementary ways, including gestures, speech and facial expressions. Implementing fast and robust recognition engines is a necessary but also challenging task. In this article, we introduce our Social Signal Interpretation (SSI) tool, a framework dedicated to supporting the development of such online recognition systems. The paper discusses the processing of four modalities, namely audio, video, gesture and biosignals, with a focus on affect recognition, and explains various approaches to fusing the extracted information into a final decision.