Basin‐centric long short‐term memory (LSTM) network models have recently been shown to be exceptionally powerful tools for temporal prediction of stream temperature (Ts), that is, training in one period and predicting in another period at the same sites. However, spatial extrapolation is a well‐known challenge in modelling Ts, and it is uncertain how an LSTM‐based daily Ts model will perform in unmonitored or dammed basins. Here we compiled a new benchmark dataset consisting of >400 basins across the contiguous United States, grouped into data availability groups (DAGs, defined by daily sampling frequency) with and without major dams, and studied how to assemble suitable training datasets for predictions in basins with or without temperature monitoring. For prediction in unmonitored basins (PUB), LSTM produced a root‐mean‐square error (RMSE) of 1.129°C and an R² of 0.983. Although these metrics were weaker than those of LSTM's temporal prediction, they far surpassed traditional models' PUB values and were competitive with traditional models' temporal prediction at calibrated sites. Even for unmonitored basins with major reservoirs, we obtained a median RMSE of 1.202°C and an R² of 0.984. For temporal prediction, the most suitable training set was the DAG matching the basin's own data availability (for example, the 60% DAG was most suitable for a basin with 61% data availability). For PUB, however, a training dataset including all basins with data was consistently preferred. An input‐selection ensemble moderately mitigated attribute overfitting. Our results indicate that there are influential latent processes not sufficiently described by the inputs (e.g., geology, wetland cover), but temporal fluctuations can still be predicted well, and LSTM appears to be a highly accurate Ts modelling tool even for spatial extrapolation.
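
The distinction between temporal prediction and PUB comes down to how the data are split before training. The following is a minimal sketch of the two splitting schemes and of the RMSE metric. It is not the authors' code: the function names, the 70% training fraction, the five folds, and the random seed are all illustrative assumptions.

```python
import numpy as np


def rmse(obs, sim):
    """Root-mean-square error over days that have an observation (NaN = no sample)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    mask = ~np.isnan(obs)
    return float(np.sqrt(np.mean((sim[mask] - obs[mask]) ** 2)))


def temporal_split(n_days, train_fraction=0.7):
    """Temporal prediction: same basins, earlier days for training, later days for testing.

    The 0.7 default is an assumed value for illustration only.
    """
    cut = int(n_days * train_fraction)
    return np.arange(cut), np.arange(cut, n_days)


def pub_split(basin_ids, n_folds=5, seed=0):
    """PUB: hold out whole basins, so test basins are never seen during training.

    The fold count and seed are assumed values for illustration only.
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.asarray(basin_ids))
    folds = np.array_split(shuffled, n_folds)
    for k in range(n_folds):
        train = np.concatenate([f for i, f in enumerate(folds) if i != k])
        yield train, folds[k]
```

In the temporal scheme every basin contributes to both training and testing, whereas in the PUB scheme each basin is evaluated only by a model that never saw its observations, which is why PUB is the harder test of spatial extrapolation.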