Reliable and accurate sidewalk inventories are time-consuming to produce, yet they help urban planners make decisions about infrastructure development. Recent advances in computer vision and machine learning have improved the reliability and accuracy of automated inventorying. This research uses a deep learning semantic segmentation model (HRNet + OCR) to automate sidewalk identification from Google Street View (GSV) images. The results show that retraining the model on local training images yields 114.16% and 178.11% higher intersection over union (IoU) than the models pretrained on the Cityscapes and Mapillary datasets, respectively. The retrained model predicted the presence of sidewalks in an image with high accuracy (0.9557), precision (0.9447), recall (0.9900), and F1-score (0.9668). It also outperformed EfficientNet, a computationally efficient image classification model, in predicting sidewalk presence at the image level. Therefore, combining publicly available training datasets with local training images that carry only the minimum required labels (in this study, roads, sidewalks, buildings, and walls) can improve the performance of a semantic segmentation model in extracting the required features (in this study, roads and sidewalks) from GSV images, especially in developing countries such as Bangladesh. This study also generates neighborhood-scale sidewalk maps, which researchers and practitioners can use as visualization tools to understand the existing pedestrian infrastructure and plan future improvements.
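
For readers unfamiliar with the metrics reported above, the following is a minimal Python sketch of how IoU for a single class (for example, sidewalk) and the F1-score are conventionally computed. It is an illustration under stated assumptions, not the study's evaluation code: the function names `binary_iou` and `f1_from_precision_recall` and the toy masks are hypothetical, and masks are assumed to be equally sized 2D lists of 0/1 values.

```python
def binary_iou(pred_mask, true_mask):
    """IoU for one class; masks are equally sized 2D lists of 0/1 (assumed format)."""
    intersection = 0
    union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            if p and t:
                intersection += 1  # pixel labeled as the class in both masks
            if p or t:
                union += 1  # pixel labeled as the class in at least one mask
    # IoU is undefined when the class is absent from both masks.
    return intersection / union if union else float("nan")


def f1_from_precision_recall(precision, recall):
    """F1-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Toy masks (hypothetical data, not from the study).
pred = [[1, 1, 0],
        [0, 1, 0]]
true = [[1, 0, 0],
        [0, 1, 1]]
toy_iou = binary_iou(pred, true)

# Consistency check using the precision and recall reported in the abstract.
reported_f1 = f1_from_precision_recall(0.9447, 0.9900)
print(toy_iou, round(reported_f1, 4))
```

Applied to the reported precision (0.9447) and recall (0.9900), the harmonic mean reproduces the reported F1-score of 0.9668 after rounding to four decimal places.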