Contemporaneous collecting of the publicly available web has provided researchers with an invaluable source with which to interpret various aspects of the recent past. Millions of websites have been gathered, stored and made accessible in national web archives over the past 25 years. This article argues for the need to reflect upon, and respond to, the biases, inequalities and silences that exist in these vast repositories. It presents a research agenda for web archivists and web historians to think broadly, together, about the social, material and technical dimensions that shape what is included in web archives and what is excluded. A key challenge for this effort is that various complexities and contingencies of archival formation are obscured. These include wider social inequalities, the entanglement of human and machine decision-making in the archiving process, changing dynamics of power over information online and the environmental impact of technical systems. Accounting for the social, material and technical factors that shape the formation of web archives provides opportunities to develop and use archives in ways that better acknowledge both the strengths and limitations of national web archives as a proxy for the web’s past.