Crowdsensing allows citizens to contribute to the monitoring of their living environment using the sensors embedded in their mobile devices, e.g., smartphones. However, crowdsensing at scale involves significant communication, computation, and financial costs due to the dependence on cloud infrastructures for the analysis (e.g., interpolation and aggregation) of spatio-temporal data. This limits the adoption of crowdsensing by activists, even though it is sorely needed to inform our knowledge of the environment. As an alternative to the centralized analysis of crowdsensed observations, this paper introduces a fully distributed interpolation-mediated aggregation approach running on smartphones. To do so efficiently, we model the interpolation as a distributed tensor completion problem, and we introduce a lightweight aggregation strategy that anticipates the likelihood of future encounters according to the quality of the interpolation. Our approach thus shifts the centralized post-processing of crowdsensed data to distributed pre-processing on the move, based on opportunistic encounters of crowdsensors through state-of-the-art device-to-device (D2D) networking. The evaluation, using a dataset of quantitative environmental measurements collected from 550 crowdsensors over 1 year, shows that our solution significantly reduces (and may even eliminate) the dependence on the cloud infrastructure, while incurring a limited resource cost on end devices. Meanwhile, the overall data accuracy remains comparable to that of the centralized approach.