The visual quality of the urban environment influences how people perceive and react to their surroundings; it therefore affects their mental states and can have detrimental societal effects. Understanding how people perceive the urban environment is thus necessary. This study used a deep learning-based approach to examine the relationship between effective spatial criteria and people’s visual perception, and to spatially model and map the potential of visual perception in urban environments. Dependent data on people’s visual perception of Tehran, Iran, were gathered through a questionnaire that contained information about 663 people, 517 pleasant places, and 146 unpleasant places. The independent data consisted of distances to industrial areas, public transport stations, recreational attractions, primary streets, secondary streets, local passages, billboards, restaurants, shopping malls, dilapidated areas, cemeteries, and religious places, together with traffic volume, population density, night light, air quality index (AQI), and normalized difference vegetation index (NDVI). A convolutional neural network (CNN) was used to create the potential maps. The potential visual perception maps were evaluated using the receiver operating characteristic (ROC) curve and the area under the curve (AUC), yielding AUC values of 0.877 and 0.823 for pleasant and unpleasant sights, respectively. The maps obtained using the CNN showed that the northern, northwestern, central, eastern, and some southern areas of the city have high potential for pleasant sights, whereas the southeastern, some central, and southern regions have high potential for unpleasant sights. The results of the OneR method demonstrated that distance to local passages, population density, and traffic volume are the most important criteria for both pleasant and unpleasant sights.
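To make the evaluation metric concrete, the following minimal sketch computes AUC as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, which is equivalent to the area under the ROC curve. The function name `roc_auc` and the labels and scores are hypothetical illustrations; they are not the study's data or its implementation.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = place reported as pleasant, 0 = not.
# Scores stand in for the potential values a model assigns to each place.
labels = [1, 1, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.7, 0.2, 0.6]

auc = roc_auc(labels, scores)
print(auc)  # 0.875 (14 of 16 positive/negative pairs ranked correctly)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the reported values of 0.877 and 0.823 indicate that the maps rank surveyed places well above chance.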