Abstract. Under the Copernicus programme, an operational CO2 monitoring system (CO2MVS) is being developed and will exploit data from future satellites monitoring the amount of CO2 within the atmosphere. Methods for estimating CO2 emissions from significant local emitters (hotspots, i.e. cities or power plants) can greatly benefit from the availability of such satellite images, which display atmospheric plumes of CO2. Indeed, local emissions are strongly correlated with the size, shape and concentration distribution of the corresponding plume, which is the visible consequence of the emission. The estimation of emissions from a given source can therefore directly benefit from the detection of its associated plumes in the satellite image. In this study, we address the problem of plume segmentation, i.e. the problem of finding all pixels in an image that constitute a city or power plant plume. This is a significant challenge: the signal from CO2 plumes induced by emissions from cities or power plants is inherently difficult to detect, since it rarely exceeds a few ppm and is perturbed by variable regional CO2 background signals and observation errors. To address this key issue, we investigate the potential of deep learning methods, and in particular convolutional neural networks, to learn to distinguish plume-specific spatial features from background or instrument features. Specifically, a U-net algorithm, an image-to-image convolutional neural network, with a state-of-the-art encoder, is used to transform an XCO2 field into an image representing the positions of the targeted plume. Our models are trained on hourly simulated XCO2 fields at 1 km resolution in the regions of Paris, Berlin and several German power plants. Each field comprises the plume of the hotspot, a background consisting of the signal of anthropogenic and biogenic CO2 surface fluxes near or far from the targeted source, and the simulated satellite observation errors. 
The performance of the deep learning method is then evaluated and compared with a plume segmentation technique based on thresholding in two contexts: the first where the model is trained and tested on data from the same region, and the second where the model is trained on one region and tested on a different one. In both contexts, our method outperforms the usual segmentation technique based on thresholding and demonstrates its ability to generalise in various cases: city plumes, power plant plumes, and areas with multiple plumes. Although the algorithm is less accurate in the second context than in the first, it extrapolates convincingly to new geographical data, paving the way to a universal segmentation model, trained on a well-chosen sample of power plants and cities, and able to detect the majority of the plumes from all of them. Finally, the highly accurate segmentation results suggest a significant potential of convolutional neural networks for estimating local emissions from spaceborne imagery.
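
To make the thresholding baseline and its evaluation more concrete, the following is a minimal sketch and not the method used in the study. It assumes a toy XCO2 field built from a smooth background, an elongated Gaussian plume and Gaussian noise, a threshold set at the field median plus a multiple of a robust standard deviation, and intersection over union as the score. All function names, amplitudes and parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def make_synthetic_xco2(shape=(64, 64), seed=0):
    """Build a toy XCO2 field (ppm): background + plume + noise.

    All amplitudes are illustrative assumptions, not values from the study.
    """
    rng = np.random.default_rng(seed)
    ny, nx = shape
    y, x = np.mgrid[0:ny, 0:nx]
    # Hypothetical regional background: about 410 ppm with a gentle gradient.
    background = 410.0 + 0.01 * x
    # Hypothetical plume: elongated Gaussian enhancement peaking at 2 ppm.
    plume = 2.0 * np.exp(
        -((x - 32) ** 2 / (2 * 12.0 ** 2) + (y - 32) ** 2 / (2 * 3.0 ** 2))
    )
    # Hypothetical observation error: white Gaussian noise, 0.3 ppm.
    noise = rng.normal(0.0, 0.3, size=shape)
    # Pixels labelled as plume in this toy setup (enhancement above 0.5 ppm).
    truth = plume > 0.5
    return background + plume + noise, truth


def threshold_segmentation(xco2, k=2.0):
    """Flag pixels exceeding median + k * robust standard deviation."""
    median = np.median(xco2)
    # 1.4826 * MAD estimates the standard deviation for Gaussian data.
    robust_std = 1.4826 * np.median(np.abs(xco2 - median))
    return xco2 > median + k * robust_std


def intersection_over_union(pred, truth):
    """Score a boolean mask against a reference mask (1.0 if both are empty)."""
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    return inter / union if union > 0 else 1.0


if __name__ == "__main__":
    field, truth = make_synthetic_xco2()
    mask = threshold_segmentation(field, k=2.0)
    print("IoU of thresholding on the toy field:",
          intersection_over_union(mask, truth))
```

A pixel-wise threshold of this kind uses no spatial context, so background gradients and noise can push isolated pixels above the threshold; a convolutional network, by contrast, can learn to use spatial structure.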