Background

The available evidence on the benefits and harms of novel drugs and therapeutic biologics at the time of approval is reported in publicly available documents provided by the US Food and Drug Administration (FDA). We aimed to create a comprehensive database containing the information needed to systematically analyze and assess this early evidence in meta-epidemiological research.

Methods

We designed a modular and flexible database of systematically collected data. We identified all novel cancer drugs and therapeutic biologics approved by the FDA between 2000 and 2016, recorded their regulatory characteristics, acquired the corresponding FDA approval documents, identified all clinical trials reported therein, and extracted trial design characteristics and treatment effects. Herein, we describe the rationale and design of the data collection process, particularly the organization of the data capture, the identification and eligibility assessment of clinical trials, and the data extraction activities.

Discussion

We established a comprehensive database on the comparative effects of drugs and therapeutic biologics approved by the FDA over a period of 17 years for the treatment of cancer (solid tumors and hematological malignancies). The database provides information on the clinical trial evidence available at the time of approval of novel cancer treatments. The modular structure of the database and of the data collection processes allows it to be updated, expanded, and adapted for continuous meta-epidemiological analysis of novel drugs.

The database allows us to systematically evaluate the benefits and harms of novel drugs and therapeutic biologics. It provides a useful basis for meta-epidemiological research on the comparative effects of innovative cancer treatments and for continuous evaluation of regulatory developments.

Electronic supplementary material

The online version of this article (10.1186/s13063-018-2877-z) contains supplementary material, which is available to authorized users.