The synthesis of document content and its visualization play a fundamental role in knowledge representation and retrieval. Most existing methods for tag cloud generation rely on the textual content of documents; some also use statistical or semantic information to enrich the document summary, but valuable information carried by multimedia content is often neglected. In this paper we present a document summarization and visualization technique based on both statistical and semantic analysis of textual and visual content. The output of our framework is a Visual Semantic Tag Cloud, which highlights the relevant terms of a document through visual features (font size, color, etc.) that convey the importance of each term relative to the others. The semantic information is derived from a knowledge base in which concepts are represented by several multimedia items. The Visual Semantic Tag Cloud can be used not only to summarize a single document but also to represent a set of documents grouped by category, using a topic detection technique based on textual and visual analysis of multimedia features. Our work aims to demonstrate that semantic analysis, combined with textual and visual features, can improve user knowledge acquisition through a synthesized visualization. The whole strategy has been evaluated against a ground truth and compared with similar approaches. Experimental results show the effectiveness of our approach, which combines visual and semantic information and outperforms state-of-the-art algorithms in topic detection.