This study proposes a music-aided framework for affective interaction of service robots with humans. The framework consists of three systems, respectively, for perception, memory, and expression on the basis of the human brain mechanism. We propose a novel approach to identify human emotions in the perception system. The conventional approaches use speech and facial expressions as representative bimodal indicators for emotion recognition. But, our approach uses the mood of music as a supplementary indicator to more correctly determine emotions along with speech and facial expressions. For multimodal emotion recognition, we propose an effective decision criterion using records of bimodal recognition results relevant to the musical mood. The memory and expression systems also utilize musical data to provide natural and affective reactions to human emotions. For evaluation of our approach, we simulated the proposed human-robot interaction with a service robot, iRobiQ. Our perception system exhibited superior performance over the conventional approach, and most human participants noted favorable reactions toward the music-aided affective interaction.