OpenStreetMap (OSM) is a well-known example of volunteered geographic information. It has evolved into one of the most widely used geographic databases. As the data quality of OSM is heterogeneous both in space and across thematic domains, data quality assessment is of high importance for potential users of OSM data. Because use cases differ in their requirements, what matters to the user is not data quality per se but fitness for purpose. We investigate the fitness for purpose of OSM for deriving land-use and land-cover labels for remote sensing-based classification models. To this end, we evaluated OSM land-use and land-cover information using two approaches: (1) assessment of OSM fitness for purpose for samples in relation to intrinsic data quality indicators at the scale of individual OSM objects and (2) assessment of OSM-derived multi-labels at the scale of remote sensing patches ($$1.22 \times 1.22$$
km) in combination with deep learning approaches. The first approach was applied to 1000 randomly selected relevant OSM objects. The quality score for each sampled OSM object was combined with a large set of intrinsic quality indicators (such as the experience of the mapper, the number of mappers in a region, and the number of edits made to the object) and auxiliary information about the location of the OSM object (such as the continent or the ecozone). Intrinsic indicators were derived with a newly developed tool based on the OSHDB (OpenStreetMap History DataBase). Afterward, supervised and unsupervised shallow learning approaches were used to identify relationships between the indicators and the quality score. Overall, the investigated OSM land-use objects were of high quality: both geometry and attribute information were mostly accurate. However, areas without any land-use information in OSM existed even in well-mapped regions such as Germany. The regression analysis at the level of individual OSM objects revealed associations between intrinsic indicators and quality, but also strong variability. Although more experienced mappers tend to produce higher quality, and objects that underwent multiple edits tend to be of higher quality, an inexperienced mapper may still map a perfect land-use polygon. This result indicates that it is hard to predict the data quality of individual land-use objects purely from intrinsic data quality indicators. The second approach employed a label-noise-robust deep learning method on remote sensing data with OSM labels. As the quality of the OSM labels was manually assessed beforehand, it was possible to control the amount of noise in the dataset during the experiment. The addition of artificial noise allowed for an even more fine-grained analysis of the effect of noise on prediction quality. The noise-tolerant deep learning method was capable of identifying correct multi-labels even in situations with significant levels of added noise.
The method was also used to identify areas where input labels were likely wrong. Such areas of concern can be flagged, which makes it possible to provide feedback to the OSM community.
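To illustrate what adding artificial noise to multi-labels can look like, the following is a minimal sketch in Python. It assumes each patch carries a binary multi-label vector (one entry per land-use or land-cover class) and that noise is injected by flipping each entry independently with a fixed probability. This symmetric flipping scheme, the function names `add_label_noise` and `observed_flip_fraction`, and the `noise_rate` and `seed` parameters are illustrative assumptions, not the noise model or code used in the study.

```python
import random
from typing import List


def add_label_noise(labels: List[List[int]], noise_rate: float, seed: int = 0) -> List[List[int]]:
    """Return a copy of binary multi-labels with entries flipped at random.

    Assumption (hypothetical noise model): every entry is flipped
    independently with probability `noise_rate`.
    """
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be in [0, 1]")
    rng = random.Random(seed)  # seeded for reproducible experiments
    return [
        [1 - value if rng.random() < noise_rate else value for value in row]
        for row in labels
    ]


def observed_flip_fraction(clean: List[List[int]], noisy: List[List[int]]) -> float:
    """Fraction of label entries that differ between clean and noisy labels."""
    total = sum(len(row) for row in clean)
    flipped = sum(
        a != b
        for clean_row, noisy_row in zip(clean, noisy)
        for a, b in zip(clean_row, noisy_row)
    )
    return flipped / total if total else 0.0


# Three hypothetical patches with four classes each.
clean_labels = [[1, 0, 0, 1], [0, 1, 0, 0], [1, 1, 0, 0]]
noisy_labels = add_label_noise(clean_labels, noise_rate=0.25, seed=42)
print(observed_flip_fraction(clean_labels, noisy_labels))
```

Because the clean labels are kept alongside the noisy ones, the realized noise level can be measured directly and prediction quality can be compared across several values of `noise_rate`.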