Virtual reality enables the creation of personalized user experiences that bring together people of different cultures and ethnicities. We consider a novel concept of virtual reality innovation in museums, cognitively grounded and based on data and their semantics, that enables users to share their experiences and to assume the perspectives of other users, with the ultimate goal of increasing social cohesion. Implementing this scenario requires an autonomous artificial system capable of detecting emotions and values from a dialogue in which museum visitors express their personal points of view, listen to the points of view of other visitors, and assume others' perspectives. An important feature of such a system is the ability to detect similarity and dissimilarity between the perspectives that users express through speech when exposed to artworks. This ability helps define an effective strategy for sharing diverse user perspectives to increase social cohesion, and it enables an unbiased quantification of the success of the interaction in terms of the change in a user's perspective. Building on results from previous work, we employ the Ekman emotion model and the Haidt moral value model to detect emotional and moral value profiles from user descriptions of artworks. We propose a novel method for measuring the similarity between user perspectives by comparing their emotional and moral value profiles. Our results show that unsupervised text classification models are a promising research direction for this task.
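As a minimal sketch of how such a profile comparison could be realized, the snippet below represents each visitor as the concatenation of an Ekman emotion profile and a Haidt moral value profile and compares the two vectors with cosine similarity. The category orderings, score ranges, helper names, and the choice of cosine similarity are illustrative assumptions, not the paper's method, since the abstract does not fix these details.

```python
import numpy as np

# Hypothetical fixed orderings for Ekman's six basic emotions and
# Haidt's five moral foundations (illustrative assumptions).
EKMAN_EMOTIONS = ["anger", "disgust", "fear", "joy", "sadness", "surprise"]
HAIDT_FOUNDATIONS = ["care", "fairness", "loyalty", "authority", "purity"]

def profile_vector(emotion_scores: dict, value_scores: dict) -> np.ndarray:
    """Concatenate emotion and moral value scores into one profile vector."""
    emotions = np.array([emotion_scores.get(e, 0.0) for e in EKMAN_EMOTIONS])
    values = np.array([value_scores.get(v, 0.0) for v in HAIDT_FOUNDATIONS])
    return np.concatenate([emotions, values])

def perspective_similarity(p1: np.ndarray, p2: np.ndarray) -> float:
    """Cosine similarity between two perspective profiles, in [-1, 1]."""
    denom = np.linalg.norm(p1) * np.linalg.norm(p2)
    return float(p1 @ p2 / denom) if denom > 0 else 0.0

# Example: profiles extracted from two visitors' descriptions of an artwork.
visitor_a = profile_vector({"joy": 0.7, "surprise": 0.3},
                           {"care": 0.8, "fairness": 0.2})
visitor_b = profile_vector({"sadness": 0.6, "fear": 0.4},
                           {"authority": 0.5, "purity": 0.5})
print(f"perspective similarity: {perspective_similarity(visitor_a, visitor_b):.3f}")
```

A low similarity score under a measure of this kind could flag visitor pairs whose perspectives diverge most, which is the information a perspective-sharing strategy would need.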