Recording videos with smartphones at large-scale events such as concerts and festivals is very common nowadays. The se videos register the atmosphere of the event as it is experienced by the crowd and offer a perspective that is hard to capture by the professional cameras installed throughout the venue. In this article, we present a framework to collect videos from smartphones in the public and blend these into a mosaic that can be readily mixed with professional camera footage and shown on displays during the event. The video upload is prioritized by matching requests of the event director with video metadata, while taking into account the available wireless network capacity. The proposed framework's main novelty is its scalability, supporting the real-time transmission, processing and display of videos recorded by hundreds of simultaneous users in ultra-dense Wi-Fi environments, as well as its proven integration in commercial production environments. The framework has been extensively validated in a controlled lab setting with up to 1 000 clients as well as in a field trial where 1 183 videos were collected from 135 participants recruited from an audience of 8 050 people. 90 % of those videos were uploaded within 6.8 minutes.