Abstract. Roadside noise barriers (RNBs) are important urban infrastructure for keeping cities liveable. However, the absence of accurate, large-scale geospatial data on RNBs has impeded progress in rational urban planning, sustainable cities, and healthy environments. To address this problem, this study creates a vectorized RNB dataset for China using street view imagery and a geospatial artificial intelligence framework. First, the road network of each city, taken from OpenStreetMap, is intensively sampled, and the sample points are used as the georeference for downloading 6 × 10⁶ Baidu Street View (BSV) images. Second, considering the prior geographic knowledge contained in street view images, convolutional neural networks incorporating image context information (IC-CNNs) and based on an ensemble learning strategy are developed to detect RNBs in the BSV images. The RNB dataset, represented as polylines, is then generated from the identified RNB locations and has a total length of 2667.02 km across 222 cities. Last, the quality of the RNB dataset is evaluated from two perspectives: (1) detection accuracy and (2) completeness and positional accuracy. For the first, four quantitative metrics are calculated on a randomly selected sample of 10 000 BSV images, giving an overall accuracy of 98.61 %, recall of 87.14 %, precision of 76.44 %, and F1 score of 81.44 %. For the second, a total of 254.45 km of roads in different cities is manually surveyed using BSV images to evaluate the mileage deviation and overlap level between the generated and surveyed RNBs. The root mean squared error of the mileage deviation is 0.08 km, and the intersection over union for the overlap level is 88.08 % ± 2.95 %. The evaluation results suggest that the generated RNB dataset is of high quality and can serve as an accurate and reliable dataset for a variety of large-scale urban studies, such as estimating the regional solar photovoltaic potential, developing 3D urban models, and designing rational urban layouts.
In addition, the benchmark dataset of labeled BSV images can support further work on RNB detection, such as developing more advanced deep learning algorithms, fine-tuning existing computer vision models, and analyzing geospatial scenes in BSV. The generated vectorized RNB dataset and the benchmark dataset of labeled BSV imagery are publicly available at https://doi.org/10.11888/Others.tpdc.271914 (Chen, 2021).
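
The quality metrics quoted in the abstract can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: all function names are hypothetical, and representing each RNB as a (start, end) interval in kilometres along a road is an assumption made here for illustration, since the abstract does not state how the overlap between generated and surveyed polylines is measured. The sketch shows the standard definitions of accuracy, precision, recall, and F1 score, the root mean squared error of length deviations, and an intersection over union for one-dimensional segments.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


def f1_from(precision, recall):
    """F1 score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def rmse(generated_km, surveyed_km):
    """Root mean squared error between paired length measurements (km)."""
    squared = [(g - s) ** 2 for g, s in zip(generated_km, surveyed_km)]
    return math.sqrt(sum(squared) / len(squared))


def merged_length(intervals):
    """Total length covered by (start, end) intervals, counting overlaps once."""
    total = 0.0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Close the previous run and open a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching interval: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def interval_iou(generated, surveyed):
    """IoU of two sets of 1-D segments (assumed along-road positions in km)."""
    union = merged_length(list(generated) + list(surveyed))
    if union == 0:
        return 0.0
    # |A ∩ B| = |A| + |B| - |A ∪ B|
    intersection = merged_length(generated) + merged_length(surveyed) - union
    return intersection / union


if __name__ == "__main__":
    # Consistency check using the precision and recall reported in the abstract.
    print(f"{f1_from(0.7644, 0.8714):.4f}")  # prints 0.8144
```

The final line confirms that the reported F1 score of 81.44 % follows from the reported precision and recall. For real polyline data, a geometry library with buffered overlays would likely be needed in place of the one-dimensional interval simplification used here.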