Estimating urban rental house prices remains a challenging problem in the real estate market because existing algorithms are inadequate and models rarely take a comprehensive geographical perspective. Existing rental house price models based on either geographically weighted regression (GWR) or deep-learning methods struggle to produce satisfactory predictions, since rental house prices exhibit both complicated nonlinear characteristics and spatial heterogeneity. The linear GWR model cannot characterize the nonlinear complexity of rental house prices, while existing deep-learning methods cannot explicitly model spatial heterogeneity. This paper proposes a fully connected neural network–geographically weighted regression (FCNN–GWR) model that combines deep learning with GWR and addresses both problems. In addition, to characterize the geographical location of a house, we propose a set of locational and neighborhood variables based on the quantities of nearby points of interest (POIs). Compared with traditional locational and neighborhood variables, the proposed “quantity-based” variables cover more geographic objects and reflect the locational characteristics of a house from a more comprehensive geographical perspective. Taking four major Chinese cities (Wuhan, Nanjing, Beijing, and Xi’an) as study areas, we compare the proposed method with other commonly used methods and find that it yields more precise estimates of rental house prices. The proposed method may serve as a useful reference for individuals and enterprises in transactions involving rental houses, and for the government in policy making and in siting public rental housing.
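To make the idea of “quantity-based” locational and neighborhood variables concrete, the following is a minimal sketch that counts POIs of each category within a fixed distance of a house. It is an illustration under stated assumptions, not the implementation used in this paper: the 1000 m radius, the haversine distance, the category names, the coordinates, and the function names `haversine_m` and `poi_counts` are all hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def poi_counts(house, pois, radius_m=1000.0):
    """Count POIs per category within radius_m of the house.

    house: (lat, lon) tuple in degrees.
    pois: iterable of (category, lat, lon) tuples.
    radius_m: assumed search radius; the value is illustrative only.
    """
    counts = {}
    for category, lat, lon in pois:
        if haversine_m(house[0], house[1], lat, lon) <= radius_m:
            counts[category] = counts.get(category, 0) + 1
    return counts


# Hypothetical house and POIs (coordinates invented for illustration).
house = (30.5928, 114.3055)
pois = [
    ("subway", 30.5935, 114.3060),    # close to the house
    ("school", 30.5990, 114.3100),    # inside the assumed radius
    ("school", 30.5900, 114.3000),    # inside the assumed radius
    ("hospital", 30.7000, 114.5000),  # far outside the radius
]

print(poi_counts(house, pois))
```

Each per-category count would then become one explanatory variable for the house. Counting over all POI categories, rather than measuring the distance to a single nearest facility, is what allows such variables to cover many kinds of geographic objects at once.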