This article presents an empirical investigation into how spectators interpret multimodal artworks. Specifically, it explores the perception of artworks in two sensory modalities, sight and sound, and the effect of their interaction. We selected four abstract paintings and created four acousmatic musical pieces composed to reflect the content of the paintings. We then ran a between-subjects experimental study with three conditions: visual only, music only, and a combination of both. A total of 48 participants completed an online survey in which they were asked to report their interpretations of the artworks presented. Following a thematic analysis of the collected data, we clustered participants’ interpretations into two main categories: reflective and perceptive. The combination of sight and sound increased spectators’ attention to the artworks, affected the temporality of the artworks, and created richer understandings of the multimodal works. The study provides knowledge for the inclusion of multimodal experiences in the presentation of art, expanding the possibilities for inter-sensory dialogue in the arts.