Abstract-The size and importance of visual multimedia collections grew rapidly over the last years, creating a need for sophisticated multimedia analytics systems enabling large-scale, interactive, and insightful analysis. These systems need to integrate the human's natural expertise in analyzing multimedia with the machine's ability to process large-scale data. The paper starts off with a comprehensive overview of representation, learning, and interaction techniques from both the human's and the machine's point of view. To this end, hundreds of references from the related disciplines (visual analytics, information visualization, computer vision, multimedia information retrieval) have been surveyed. Based on the survey, a novel general multimedia analytics model is synthesized. In the model, the need for semantic navigation of the collection is emphasized and multimedia analytics tasks are placed on the explorationsearch axis. The axis is composed of both exploration and search in a certain proportion which changes as the analyst progresses towards insight. Categorization is proposed as a suitable umbrella task realizing the exploration-search axis in the model. Finally, the pragmatic gap, defined as the difference between the tight machine categorization model and the flexible human categorization model is identified as a crucial multimedia analytics topic.Index Terms-Multimedia (image/video/music) visualization, machine learning.