Studies have suggested that visual imagery forms an important part of the listening experience and might be one of the mechanisms by which music induces emotions in a listener. However, little is known about the content, prevalence and functions of visual imagery during music listening. To address this gap, an online survey was constructed to explore music-related visual imagery. The survey included 24 statements about visual imagery, based on prior research, and an open question about the content of the inner images. Several standardized questionnaires (VVIQ, Gold-MSI) were also included to investigate the links to visual imagery in general and to musical training. In total, 669 participants responded to the survey. A factor analysis of the music-related visual imagery statements yielded a 3-factor structure consisting of vivid, soothing and disruptive visual imagery, although the factor structure was not identical for musically trained and untrained respondents. Separate factor analyses for musically trained and untrained participants yielded a more parsimonious structure consisting of vivid and soothing visual imagery. These two factors exhibited consistently different weights across the items; for musically trained participants, vivid imagery was more strongly related to modulating arousal. The general ability to conjure up vivid visual imagery was only weakly related to music-related visual imagery. A content analysis of the open question revealed common themes that related to a mixture of concrete visual imagery (landscapes, images of people, scenes from past events) and abstract visual imagery (shapes, objects and colours). Implications of these findings for further studies on music-induced emotions are discussed, with a focus on a recent constructionist account of emotional meanings in music.