Flood nowcasting is the near-future prediction of flood status as an extreme weather event unfolds, with the aim of enhancing situational awareness. The objective of this study was to adopt and test a novel structured deep-learning model for urban flood nowcasting that integrates physics-based and human-sensed features. We present a new computational modeling framework comprising an attention-based spatial–temporal graph convolution network (ASTGCN) model and multiple data streams that are collected in real time, preprocessed, and fed into the model so that it can account for the spatial and temporal information and dependencies that improve flood nowcasting. The novelty of the framework is threefold. First, the model can account for spatial and temporal dependencies in inundation propagation through its spatial and temporal graph convolutional modules. Second, it captures the influence on flood nowcasting of heterogeneous temporal data streams that can signal flooding status, including physics-based features (e.g., rainfall intensity and water elevation) and human-sensed data (e.g., residents’ flood reports and fluctuations of human activity). Third, its attention mechanism enables the model to focus on the most influential features, whose importance varies dynamically. We demonstrate the framework with Harris County, Texas, as the study area and 2017 Hurricane Harvey as the flood event. Three categories of features are used for nowcasting the extent of flood inundation in different census tracts: (i) static features that capture the spatial characteristics of locations and influence the similarity of their flood status, (ii) physics-based dynamic features that capture changes in hydrodynamic variables, and (iii) heterogeneous human-sensed dynamic features that capture aspects of residents’ activities that can provide information about flood status.
Results indicate that the ASTGCN model provides superior performance for nowcasting urban flood inundation at the census-tract level, with a precision of 0.808 and a recall of 0.891, which is better than the other state-of-the-art models it was compared with. Moreover, the performance of the ASTGCN model improves when heterogeneous dynamic features are added to a model that relies solely on physics-based features, which demonstrates the promise of heterogeneous human-sensed data for flood nowcasting. Given the results of these model comparisons, the proposed modeling framework merits further investigation as data from more historical events become available, with the goal of developing a predictive tool that gives community responders improved predictions of flood inundation during urban floods.
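
For readers less familiar with the reported metrics, the following minimal sketch shows how precision and recall could be computed for tract-level nowcasts and how the two reported values combine into an F1 score. It assumes a binary flooded/not-flooded label per census tract, which the abstract does not specify; the function names and the toy labels are hypothetical and are not taken from the study's data or code. The F1 value is simply the harmonic mean of the reported precision and recall, not a figure reported by the authors.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = flooded)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = f1_from(precision, recall)
    return precision, recall, f1


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Toy example with hypothetical tract labels (not study data).
toy_true = [1, 1, 1, 0, 0, 1]
toy_pred = [1, 1, 0, 1, 0, 1]
toy_precision, toy_recall, toy_f1 = precision_recall_f1(toy_true, toy_pred)

# F1 implied by the precision and recall reported in the abstract.
reported_f1 = f1_from(0.808, 0.891)
print(f"toy: P={toy_precision:.3f} R={toy_recall:.3f} F1={toy_f1:.3f}")
print(f"F1 implied by reported P and R: {reported_f1:.3f}")
```

A recall higher than precision, as reported here, means the model misses relatively few flooded tracts at the cost of somewhat more false alarms.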