One of the key promises of model-based reinforcement learning is the ability to generalize using an internal model of the world to make predictions in novel environments and tasks. However, the generalization ability of model-based agents is not well understood because existing work has focused on model-free agents when benchmarking generalization. Here, we explicitly measure the generalization ability of model-based agents in comparison to their model-free counterparts. We focus our analysis on MuZero [60], a powerful model-based agent, and evaluate its performance on both procedural and task generalization. We identify three factors of procedural generalization (planning, self-supervised representation learning, and procedural data diversity) and show that by combining these techniques, we achieve state-of-the-art generalization performance and data efficiency on Procgen [9]. However, we find that these factors do not always provide the same benefits for the task generalization benchmarks in Meta-World [74], indicating that transfer remains a challenge and may require different approaches from those used for procedural generalization. Overall, we suggest that building generalizable agents requires moving beyond the single-task, model-free paradigm and towards self-supervised model-based agents that are trained in rich, procedural, multi-task environments.