In recent decades, there has been growing interest in crossmodal correspondences, including those involving temperature. However, only a few studies have explicitly examined the mechanisms underlying temperature-related correspondences. In the present study, we investigated the relative roles of affective and semantic mechanisms in crossmodal correspondences between visual textures and temperature concepts using an associative learning paradigm. We conducted two online experiments using visual textures previously shown to be associated with low (i.e., crystalline) and high (i.e., furry) temperatures (Experiment 1; N = 300), and visual textures (i.e., stained, wrinkled) with no prior associations with temperature concepts (Experiment 2; N = 300). In both experiments, participants completed a speeded categorisation task before and after an associative learning task in which they learned mappings between the visual textures and specific affective (e.g., sad vs. happy facial expressions) or semantic (e.g., fur vs. metal) stimuli related to low and high temperatures. Across the two experiments, both the affective and the semantic mappings shifted explicit temperature categorisation responses in the corresponding direction, but did not affect reaction times. Moreover, the effect of learning semantic mappings was larger than that of affective mappings in both experiments. These results suggest that a semantic mechanism carries more weight than an affective mechanism in the formation of crossmodal associations between visual textures and temperature concepts. We advance research on temperature-related crossmodal correspondences by using, for the first time, a learning paradigm to investigate the relative contributions of the mechanisms behind these associations, and we demonstrate that such associations can be both created and strengthened through learning that targets those mechanisms.