Smartphones used as vibration measurement instruments form a large-scale, citizen-induced, and mobile wireless sensor network (WSN) for system identification and structural health monitoring (SHM) applications. Crowdsourcing-based SHM is possible with a decentralized system that grants citizens operational responsibility and control. Yet citizen initiatives introduce device mobility, which drastically changes SHM results because of uncertainties in the time and space domains. This paper proposes a modal identification strategy that fuses spatiotemporally sparse SHM data collected by smartphone-based WSNs. Multichannel data sampled independently in time and space are used to compose modal identification parameters such as frequencies and mode shapes. Structural response time histories are gathered by smartphone accelerometers and converted into Fourier spectra by the processor units. Timestamps, data length, and energy-to-power conversion address temporal variation, whereas spatial uncertainties are reduced by geolocation services or by determining node identity via QR code labels. The parameters collected from each distributed network component can then be extended to global behavior to deduce modal parameters without the need for a centralized and synchronous data acquisition system. The proposed method is tested on a pedestrian bridge and compared with a conventional reference monitoring system. The results show that spatiotemporally sparse mobile WSN data can be used to infer modal parameters despite non-overlapping sensor operation schedules.
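The role of the energy-to-power conversion can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`power_spectrum`, `peak_frequency`, `band_power`), the band half-width, and the use of band-integrated power as a mode shape amplitude proxy are all assumptions made for illustration. The idea shown is that the energy in a Fourier spectrum grows with record duration, so dividing by duration yields a quantity that can be compared across records of different lengths taken at different times.

```python
import numpy as np


def power_spectrum(accel, fs):
    """Return (freqs, power) for one acceleration record sampled at fs Hz.

    The energy spectrum |X(f)|^2 is divided by the record duration so that
    records of different lengths are expressed on a common power scale.
    (Illustrative scaling; one-sided doubling is deliberately omitted.)
    """
    accel = np.asarray(accel, dtype=float)
    accel = accel - accel.mean()            # remove the static offset (e.g. gravity)
    n = accel.size
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    energy = (np.abs(np.fft.rfft(accel)) / fs) ** 2
    duration = n / fs
    return freqs, energy / duration


def peak_frequency(freqs, power, f_min=0.5):
    """Frequency of the largest spectral peak above f_min (hypothetical cutoff)."""
    mask = freqs >= f_min
    return freqs[mask][np.argmax(power[mask])]


def band_power(freqs, power, f_center, half_width=0.2):
    """Power integrated over a narrow band around f_center.

    Integrating over frequency removes the dependence on frequency
    resolution, which differs between records of different lengths.
    """
    df = freqs[1] - freqs[0]
    mask = np.abs(freqs - f_center) <= half_width
    return power[mask].sum() * df
```

Under these assumptions, two smartphones that record at different times and for different durations can each report a peak frequency and a band power for that peak. The square root of the band power ratio between two nodes then gives a relative amplitude estimate at those locations, without the phase (sign) information that a synchronous system would provide.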