The objective of this study was to improve user experience when appreciating visual artworks with soundscape music chosen by a deep neural network based on weakly supervised learning. We also propose a multi-faceted approach to measuring ambiguous concepts, such as the subjective fitness, implicit senses, immersion, and availability. We showed improvements in appreciation experience, such as the metaphorical and psychological transferability, time distortion, and cognitive absorption, with in-depth experiments involving 70 participants. Our test results were similar to those of “Bunker de Lumières: van Gogh”, which is an immersive media artwork directed by Gianfranco lannuzzi; the fitness scores of our system and “Bunker de Lumières: van Gogh” were 3.68/5 and 3.81/5, respectively. Moreover, the concordance of implicit senses between artworks and classical music was measured to be 0.88%, and the time distortion and cognitive absorption improved during the immersion. Finally, the proposed method obtained a subjective satisfaction score of 3.53/5 in the evaluation of its usability. Our proposed method can also help spread soundscape-based media art by supporting traditional soundscape design. Furthermore, we hope that our proposed method will help people with visual impairments to appreciate artworks through its application to a multi-modal media art guide platform.