Many people take photos and videos with smartphones, and more recently with 360° cameras, at popular places and events, and share them on social media. Such visual content is produced in large volumes in urban areas, and it is a source of information that online users could exploit to learn what has attracted the interest of the general public on the streets of the cities where they live or that they plan to visit. A key step in providing users with that information is to identify the k most popular spots in specified areas. In this paper, we propose a clustering and incremental sampling (C&IS) approach that trades off accuracy of the top-k results for detection speed. It uses clustering to determine areas with a high density of visual content, and incremental sampling, controlled by stopping criteria, to limit the amount of computational work. It leverages spatial metadata, which represent the scenes in the visual content, to rapidly detect the hotspots, and uses a recently proposed Gaussian probability model to describe the capture intention distribution in the query area. We evaluate the approach with metadata derived from a non-synthetic, user-generated dataset, for both regular mobile and 360° visual content. Our results show that the C&IS approach offers 2.8× to 19× reductions in processing time over an optimized baseline, while in most cases correctly identifying 4 of the top 5 locations.
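To make the interplay of clustering and incremental sampling concrete, the following is a minimal illustrative sketch, not the C&IS implementation itself. It rests on several assumptions that are not taken from the paper: a uniform grid stands in for the clustering step, each item is reduced to a single 2D point, the stopping criterion is simply "the top-k set has not changed for `patience` consecutive batches", and the Gaussian capture intention model is omitted entirely. The names `cell_of`, `top_k_hotspots`, `cell_size`, `batch_size`, and `patience` are all hypothetical.

```python
import random
from collections import Counter


def cell_of(point, cell_size):
    """Map a 2D point to a grid cell (assumed stand-in for a cluster)."""
    x, y = point
    return (int(x // cell_size), int(y // cell_size))


def top_k_hotspots(points, k, cell_size=1.0, batch_size=100, patience=3, seed=0):
    """Estimate the k densest cells from incrementally drawn samples.

    Hypothetical stopping criterion: stop once the set of top-k cells has
    stayed the same for `patience` consecutive batches.
    Returns (cells ordered by sampled count, number of points processed).
    """
    rng = random.Random(seed)
    order = list(points)
    rng.shuffle(order)  # random order, so each batch is a sample without replacement

    counts = Counter()
    previous = None
    stable = 0
    processed = 0

    for start in range(0, len(order), batch_size):
        batch = order[start:start + batch_size]
        counts.update(cell_of(p, cell_size) for p in batch)
        processed += len(batch)

        current = frozenset(cell for cell, _ in counts.most_common(k))
        stable = stable + 1 if current == previous else 0
        previous = current

        if stable >= patience:
            break  # top-k set looks stable; skip the remaining points

    return [cell for cell, _ in counts.most_common(k)], processed


if __name__ == "__main__":
    # Synthetic points for illustration only; not data from the paper.
    demo = [(0.5, 0.5)] * 500 + [(5.5, 5.5)] * 300 + [(9.5, 9.5)] * 100
    cells, used = top_k_hotspots(demo, k=2, batch_size=50)
    print(cells, used)
```

In this sketch, a smaller `patience` stops earlier and processes fewer points, at the cost of a higher chance that the reported top-k set differs from the exact one, which mirrors the accuracy-for-speed trade-off described above.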