Abstract. This position paper provides an overview of current research endeavors and existing solutions in multimodal music retrieval, where the term "multimodal" refers to two aspects. The first is taking into account the music context of a piece of music or an artist; the second is addressing the user context. The music context comprises all information pertaining to the music that is not directly extractable from the audio signal (such as editorial or collaboratively assembled metadata, lyrics in textual form, the cultural background of an artist, or images of album covers). The user context, in contrast, is defined by the various external factors that influence how a listener perceives music. It is therefore strongly related to user modeling and personalization, two facets of music information research that have so far received little attention from the MIR community. However, we are confident that adding personalization aspects to existing music retrieval systems (such as playlist generators, recommender systems, or visual browsers) is key to the future of MIR. In this vein, this contribution aims at laying the foundation for future research directions and applications related to multimodal music information systems.