Decisions in humans and other organisms depend, in part, on learning and using models that capture the statistical structure of the world, including the long-run expected outcomes of their actions. One prominent approach to forecasting such long-run outcomes is the successor representation (SR), which predicts future states aggregated over multiple timesteps. Although much behavioral and neural evidence suggests that people and animals use such a representation, it remains unknown how they acquire it. The SR has frequently been assumed to be learned by temporal difference bootstrapping (SR-TD(0)), but this assumption has largely not been tested empirically or compared against alternatives such as eligibility traces (SR-TD(λ > 0)). Here we address this gap by leveraging trial-by-trial reaction times in graph sequence learning tasks, which are well suited to studying learning dynamics because their long prediction horizons differentiate the transient update dynamics of different learning rules. We compared the behavior of SR-TD(λ) with a number of alternatives on a probabilistic graph learning task and found that behavior was best explained by a hybrid model that learns via SR-TD(λ) while also maintaining an additional zeroth-order predictive model. The relatively large λ we estimate indicates a predominant role for eligibility trace mechanisms over the bootstrap-based chaining typically assumed. Our results provide insight into how humans learn predictive representations and demonstrate that people learn the SR simultaneously with lower-order predictions.
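To make the learning rule concrete, the following is a minimal tabular sketch of SR-TD(λ) with accumulating eligibility traces, not the authors' implementation; the environment, parameter values (alpha, gamma, lam), and identity initialization of the successor matrix are illustrative assumptions. Setting lam=0 recovers the pure bootstrapping rule SR-TD(0).

```python
# Sketch of SR-TD(lambda) for a tabular state space (illustrative, not the
# paper's code). M[s, s'] estimates the discounted expected future occupancy
# of state s' when starting from state s.
import numpy as np

def sr_td_lambda(transitions, n_states, alpha=0.1, gamma=0.9, lam=0.9):
    M = np.eye(n_states)      # common choice: each state initially predicts only itself
    e = np.zeros(n_states)    # eligibility trace over predecessor states
    prev = None
    for s in transitions:     # sequence of observed state indices
        if prev is not None:
            # Vector TD error for the transition prev -> s: one-hot occupancy
            # of s plus the bootstrapped prediction from s, minus the current
            # prediction from prev.
            onehot = np.zeros(n_states)
            onehot[s] = 1.0
            delta = onehot + gamma * M[s] - M[prev]
            # Decay all traces, mark prev as eligible, then credit every
            # traced predecessor with the current error (outer product).
            e *= gamma * lam
            e[prev] += 1.0
            M += alpha * np.outer(e, delta)
        prev = s
    return M

# Usage example on a hypothetical noisy 4-state ring (step of +1 or +2 mod 4).
rng = np.random.default_rng(0)
seq = [0]
for _ in range(500):
    seq.append((seq[-1] + rng.choice([1, 2])) % 4)
print(np.round(sr_td_lambda(seq, n_states=4), 2))
```

With lam > 0, a single transition updates the predictions of all recently visited states at once, rather than propagating information one bootstrap step at a time; this is the transient-dynamics difference that long-horizon graph tasks can expose.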