In this paper, we present a novel framework that produces a visual description of a tourist attraction by choosing, from community-contributed datasets, the most diverse pictures that describe different details of the queried location. The main strength of the proposed approach is its flexibility. It makes it possible to filter out non-relevant images and to obtain a reliable set of diverse and relevant images. It does this by first clustering similar images according to their textual descriptions and their visual content, and then extracting images from different clusters according to a measure of user credibility. Clustering is a two-step process: textual descriptions are used first, and the resulting clusters are then refined according to the visual features. The degree of diversification can be further increased by exploiting users' judgments on the results produced by the proposed algorithm. This is done through a novel approach in which users provide not only relevance feedback but also diversity feedback. Experimental results on the MediaEval 2015 "Retrieving Diverse Social Images" dataset show that the proposed framework achieves very good performance, both for the automatic retrieval of diverse images and when users' feedback is exploited. The effectiveness of the proposed approach has also been confirmed by a small case study involving a number of real users.
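The final step, extracting images from different clusters according to user credibility, can be illustrated with a minimal sketch. It assumes that every image already carries a cluster label from the text-then-visual clustering and a credibility score for the user who uploaded it. The `Image` class, the `select_diverse` function, and the round-robin visiting policy are hypothetical illustrations, not the exact procedure of the proposed framework.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Image:
    image_id: str
    cluster: int        # assumed: label produced by text clustering refined by visual features
    credibility: float  # assumed: credibility score of the uploading user


def select_diverse(images: List[Image], k: int) -> List[str]:
    """Hypothetical round-robin selection: one image per cluster per round,
    preferring images from more credible users."""
    # Group images by cluster and sort each cluster by descending user credibility.
    clusters: Dict[int, List[Image]] = {}
    for img in images:
        clusters.setdefault(img.cluster, []).append(img)
    for members in clusters.values():
        members.sort(key=lambda im: im.credibility, reverse=True)

    # Visit clusters in order of their most credible image.
    order = sorted(clusters, key=lambda c: clusters[c][0].credibility, reverse=True)

    selected: List[str] = []
    rank = 0  # position within each cluster taken in the current round
    while len(selected) < k:
        added = False
        for c in order:
            if rank < len(clusters[c]):
                selected.append(clusters[c][rank].image_id)
                added = True
                if len(selected) == k:
                    break
        if not added:  # every cluster is exhausted
            break
        rank += 1
    return selected


if __name__ == "__main__":
    imgs = [
        Image("a", 0, 0.9),
        Image("b", 0, 0.5),
        Image("c", 1, 0.7),
        Image("d", 2, 0.2),
    ]
    print(select_diverse(imgs, 3))  # ['a', 'c', 'd']
```

Taking one image per cluster before returning to any cluster a second time favours diversity. Sorting within each cluster by credibility is one simple way to let the credibility measure decide which representative is shown.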