Forecasting distribution shifts under novel environmental conditions is a major task for ecologists and conservationists. Researchers forecast distribution shifts using several tools, including predicting from an empirical relationship between a summary of distribution (population centroid) and an annual environmental time series (“annual regression,” AR), or fitting a habitat‐envelope model to historical distribution and forecasting from predictions of future environmental conditions (“habitat envelope,” HE). However, surprisingly little research has estimated forecast skill by fitting models to historical data, forecasting distribution shifts, and comparing these forecasts with subsequently observed shifts. I demonstrate the important role of retrospective skill testing by forecasting poleward movement over 1‐, 2‐ or 3‐year periods for 20 fish and crab species in the Eastern Bering Sea and comparing forecasts with observed shifts. Specifically, I introduce an alternative vector‐autoregressive spatio‐temporal (VAST) forecasting model, which can include species temperature responses, and compare forecast skill for AR, HE and VAST. Results show that the HE forecast has 30%–43% greater error variance than a “persistence” forecast, which predicts that future distribution is identical to the distribution estimated for the final year. Meanwhile, AR explains 2%–6% and VAST explains 8%–25% of variance in poleward movement, so both outperform a persistence forecast. HE and AR both generate forecast intervals that are too narrow, whereas VAST models with or without temperature produce forecast intervals of appropriate width. Retrospective skill testing for more regions and taxa should be used as a test bed to guide future improvements in methods for forecasting distribution shifts.
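To make the comparison against a persistence forecast concrete, the following is a minimal sketch of how skill and interval width could be scored in a retrospective experiment. It is an illustration under stated assumptions, not the estimation code used in this study: the `ForecastCase` record, the functions `skill_vs_persistence` and `interval_coverage`, and all numbers are hypothetical. It assumes each test case stores the centroid northing (km) estimated for the final fitted year, the centroid later estimated for the forecast year, a point forecast, and forecast-interval bounds. Because persistence predicts zero movement, its squared error equals the squared observed movement, so one minus the ratio of model to persistence squared error gives a "proportion of variance in poleward movement explained": zero matches persistence, and negative values (as for HE) indicate larger error variance than persistence.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ForecastCase:
    """One hypothetical retrospective test: fit through year t, forecast year t + h.

    All values are centroid northings (km); field names are illustrative assumptions.
    """
    last_estimate: float  # centroid estimated for the final fitted year (persistence forecast)
    observed: float       # centroid subsequently estimated for the forecast year
    forecast: float       # model's point forecast for the forecast year
    lower: float          # lower bound of the model's forecast interval
    upper: float          # upper bound of the model's forecast interval


def skill_vs_persistence(cases: Sequence[ForecastCase]) -> float:
    """Proportion of variance in observed movement explained by the forecast.

    Skill = 1 - SSE(model) / SSE(persistence). Persistence predicts zero movement,
    so its squared error is the squared observed movement.
    """
    sse_model = sum((c.observed - c.forecast) ** 2 for c in cases)
    sse_persist = sum((c.observed - c.last_estimate) ** 2 for c in cases)
    return 1.0 - sse_model / sse_persist


def interval_coverage(cases: Sequence[ForecastCase]) -> float:
    """Fraction of observed centroids that fall inside the forecast interval."""
    hits = sum(1 for c in cases if c.lower <= c.observed <= c.upper)
    return hits / len(cases)


# Hypothetical usage with two made-up test cases (not data from this study).
cases = [
    ForecastCase(last_estimate=0.0, observed=10.0, forecast=5.0, lower=-5.0, upper=15.0),
    ForecastCase(last_estimate=0.0, observed=-4.0, forecast=0.0, lower=-2.0, upper=2.0),
]
print(round(skill_vs_persistence(cases), 3))  # 1 - 41/116
print(interval_coverage(cases))               # 1 of 2 observations covered
```

With a nominal 90% interval, coverage well below 0.9 across many cases would indicate intervals that are too narrow, which is the sense in which HE and AR intervals are described as too narrow above.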