The auto-generation of urban roads can greatly improve efficiency and productivity in urban planning and design. However, it has also raised concerns among researchers over the past decade. In this paper, we present an image-based urban road network generation framework using conditional diffusion models. We first trained a diffusion model, conditioned on four context factors, to generate road images with characteristics similar to the ground truth. Then, we used the trained model as the generator to synthesize road images conditioned on geospatial context. Finally, we converted the generated road images into road networks through several post-processing steps. Experiments conducted in five cities in the United States showed that our model can generate reasonable road networks that maintain the layouts and styles of real examples. Moreover, our model can reflect the obstructive effect of geographic barriers on urban roads. Comparing models with different context factors as input, we find that the model that considers all four factors generally performs best. The most important factor in guiding the shape of road networks is intersections, implying that the development of urban roads is not only restricted by the natural environment but is more strongly influenced by human design.