With so much genomics data being produced, it is worth pausing to consider what purpose these data can or should serve. Some data sets improve annotations, others predict molecular interactions, but few add directly to existing knowledge. This is because sequence annotations do not always imply function, and molecular interactions are often irrelevant to a cell's or an organism's survival or propagation. Merely correlative relationships found in big data fail to answer the Why questions of human biology. Instead, those answers are expected from methods that causally link DNA changes to downstream effects without being confounded by reverse causation. These approaches require the controlled measurement of the consequences of DNA variants: for example, either variants introduced into single cells using CRISPR/Cas9 genome editing or those already present across the human population. Causal relationships inferred between genetic variation and cellular phenotypes or disease promise to rapidly grow and underpin our knowledge base.

Single-gene studies in model or cellular systems have substantially advanced knowledge in the life sciences. Progress has relied on scientific acumen and on technological advances that provide detailed insights into processes at the atomic, molecular, multisubunit-complex, cellular and sometimes organismal levels. These many successes, however, should not blind us to how incomplete and error-prone our knowledge is. Virtually all (99.85%) protein sequences have no associated experimental evidence at the protein level, and for 52% the annotations are flagged as possibly containing errors (www.ebi.ac.uk/uniprot/TrEMBLstats).
Furthermore, scientific knowledge from targeted studies has been gained unevenly: of all human brain-expressed genes, for example, science has focused on very few, with the top 5% of such genes being the subject of 70% of the literature [1].

Whole-genome experiments seek to address these deficiencies of uneven coverage and incompleteness. They are aided by technological innovations that inexorably generate ever larger data sets. Critically, however, big data analysis per se reveals not mechanistic causes but correlations and patterns, and leaves questions starting with Why unanswered [2]. Even when subsequent experiments exploit these data to address more narrowly defined hypotheses, they too often fail to determine causality. Correlations and patterns may describe a data set well, but they need to be supplemented by causal inferences in order to predict phenomena reliably. The transformation of large, unstructured data sets into insights (Figure 1) and predictive biology is challenging and rarely attained.

In human genomics, data and annotations have grown rapidly. The 3.2 billion base reference genome is currently partitioned into 20 338 protein-coding and 22 521 non-protein-coding gene annotations that are transcribed into 200 310 transcripts (www.ensembl.org/Homo_sapiens/Info/Annotation) that start from 308 214 locations [3]. Binding sites, often...