Introduction
Therapeutic options for type 2 diabetes mellitus (T2DM) have expanded over the last decade with the emergence of novel cardioprotective agents, but comparable cardiovascular data are lacking for older drugs, leaving a critical gap in our understanding of the relative effects of T2DM agents on cardiovascular risk.

Methods and analysis
The Large-scale Evidence Generation and Evaluation across a Network of Databases for T2DM (LEGEND-T2DM) initiative is a series of systematic, large-scale, multinational, real-world comparative cardiovascular effectiveness and safety studies of all four major second-line anti-hyperglycaemic agent classes: sodium–glucose co-transporter-2 inhibitors, glucagon-like peptide-1 receptor agonists, dipeptidyl peptidase-4 inhibitors and sulfonylureas. LEGEND-T2DM will leverage the Observational Health Data Sciences and Informatics (OHDSI) community, which provides access to a global network of administrative claims and electronic health record data sources representing 190 million patients in the USA and about 50 million internationally. LEGEND-T2DM will identify all adult patients with T2DM who newly initiate a traditionally second-line T2DM agent. Using an active comparator, new-user cohort design, LEGEND-T2DM will execute all pairwise class-versus-class and drug-versus-drug comparisons in each data source to examine the relative risk of cardiovascular and safety outcomes, producing extensive study diagnostics that assess reliability and generalisability through cohort balance and equipoise. The primary outcomes include a composite of major adverse cardiovascular events and a series of safety outcomes.
The study will pursue data-driven, large-scale propensity adjustment for measured confounding and a large set of negative control outcome experiments to address unmeasured and systematic bias.

Ethics and dissemination
The study ensures data safety through a federated analytic approach and follows research best practices, including prespecification and full disclosure of results. LEGEND-T2DM is dedicated to open science and transparency and will publicly share all analytic code, from reproducible cohort definitions through turn-key software, enabling other research groups to leverage our methods, data and results to verify and extend our findings.
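
To give a sense of the scale of the class-versus-class comparisons described under Methods and analysis, the short Python sketch below enumerates every pairwise comparison among the four drug classes. It is illustrative only and is not the LEGEND-T2DM analytic code: the abbreviated class labels, the `pairwise_comparisons` helper and the decision to treat each pair as unordered are assumptions made for this example.

```python
from itertools import combinations

# The four second-line drug classes named in the protocol.
# The abbreviated labels are an assumption of this sketch.
DRUG_CLASSES = [
    "SGLT2 inhibitor",
    "GLP-1 receptor agonist",
    "DPP-4 inhibitor",
    "sulfonylurea",
]


def pairwise_comparisons(items):
    """Return every unordered (target, comparator) pair from items.

    Hypothetical helper: each pair is assumed to be analysed once.
    """
    return list(combinations(items, 2))


class_comparisons = pairwise_comparisons(DRUG_CLASSES)

for target, comparator in class_comparisons:
    print(f"{target} vs {comparator}")

# Four classes give 4 * 3 / 2 = 6 unordered class-level comparisons.
print(len(class_comparisons))
```

The same helper applied to a list of individual drugs would enumerate the drug-versus-drug comparisons. Each resulting pair would then be evaluated separately in every data source, which is why the number of analyses grows quickly.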