Geo-sensory time series, such as air quality and water distribution data, are collected from numerous sensors at different geospatial locations over the same time interval. Each sensor monitors multiple parameters and thus generates a multivariate time series. These series change over time and vary geographically; hence, geo-sensory time series contain multi-scale spatial-temporal correlations, namely inter-sensor and intra-sensor spatial-temporal correlations. Although various deep learning models have been developed to capture spatial-temporal correlations, few of them capture both kinds. To address this problem, we propose to capture the inter- and intra-sensor spatial-temporal correlations simultaneously by designing a joint network of non-linear graph attention and temporal attraction force (J-NGT), consisting of two graph attention mechanisms. The non-linear graph attention mechanism characterizes node affinities in order to adaptively select the relevant exogenous series and the relevant sensor series. The temporal attraction force mechanism weighs the effect of past values on current values to represent the temporal correlation. To assess the effectiveness of our model, we evaluate it on three real-world datasets from different fields. Experimental results show that our model achieves better prediction performance than eight state-of-the-art models, including statistical, machine learning, and deep learning models. Furthermore, we conducted experiments on capturing inter- and intra-sensor spatial-temporal correlations. The results indicate that our model significantly improves performance by capturing both, which demonstrates its advantage in geo-sensory time series prediction.
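As an illustration of what weighing the effect of past values on a current value can look like, the following is a minimal sketch of a generic softmax weighting over a history window. It is not the temporal attraction force formulation of J-NGT, which is not specified in this abstract. The scoring rule, the `decay` parameter, and the function names `temporal_weights` and `weighted_context` are all assumptions introduced here for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_weights(history, current, decay=0.1):
    """Return one weight per past value; the weights sum to 1.

    Hypothetical scoring rule (an assumption, not the paper's):
    a past value scores higher when it is close to the current
    value and when it is recent (small lag).
    """
    n = len(history)
    scores = [
        -abs(current - history[t]) - decay * (n - t)  # n - t is the lag in steps
        for t in range(n)
    ]
    return softmax(scores)


def weighted_context(history, current, decay=0.1):
    """Summarize the history as a weighted average of past values."""
    weights = temporal_weights(history, current, decay)
    return sum(w * h for w, h in zip(weights, history))


# Toy readings from a single sensor (made-up numbers).
history = [10.0, 12.0, 15.0, 14.0]
weights = temporal_weights(history, current=14.5)
context = weighted_context(history, current=14.5)
```

Here the most recent reading receives the largest weight because it is both close to the current value and has the smallest lag. A learned mechanism would replace the hand-written score with trainable parameters.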