Studying cancer genomes

The human genome contains the instructions to build, maintain and reproduce a human organism, and is encoded in deoxyribonucleic acid (DNA). DNA is a polymer of nucleotides (adenine (A), cytosine (C), thymine (T) and guanine (G)) whose two complementary strands coil together to form a double helix. The average human genome consists of roughly 6 billion nucleotides that are organized over 23 pairs of chromosomes. About 1-2% of the genome codes for the ~20,000 genes, most of which are eventually transcribed into mRNA and subsequently translated into proteins. The remainder of the genome (at least in part) plays an important role in regulating the expression of genes [1].

In normal cells, growth and division (together called cell proliferation) are tightly controlled by hundreds of genes. However, mutations in these genes (or their regulatory elements) can lead to the outgrowth of cells that are able to proliferate uncontrollably and avoid cell death; in other words, cancer. Cancerous cells may eventually escape their original environment and colonize other tissues (i.e. metastasize), a process that can be driven by mutations but also by other factors such as environmental pressures [2]. The mutations that contribute to cancer are primarily those acquired over the lifetime of an individual (somatic mutations), though inherited germline mutations can also increase cancer risk [3].

DNA sequencing is commonly used to study mutations in cancer. The first generation of sequencing technology (Sanger sequencing) was used to determine the full sequence of the human genome, but was limited to sequencing one DNA fragment of up to 1,000 bases at a time. The next-generation sequencing (NGS) technologies that emerged shortly after allowed for massively parallel sequencing of thousands to millions of short DNA fragments (tens to hundreds of bases each).
These next-generation 'short-read' sequencing techniques typically involve breaking the DNA of a sample into random short fragments that are amplified and then sequenced to produce 'reads'. Mapping assembly is then performed, whereby reads are compared ('mapped') to a reference genome and pieced together to form a continuous genomic sequence of the sample, which allows for the detection of genetic differences (mutations) [4]. NGS technology enabled fast and cost-effective whole genome sequencing (WGS), which has recently led to large-scale pan-cancer studies involving thousands of cancer patients [5,6].

Typically, to study the full spectrum of mutations in cancer patients, a tumor sample and a normal sample (commonly blood or nearby healthy tissue) are both sequenced and mapped to a human reference genome. Note that the human reference genome is an aggregate of the full genome sequences across multiple donors and thus does not represent the genome of one individual [7]. Differences in nucleotide sequence between the sample and the reference genome are considered 'variants'. Variants found in the normal sample are considered germline mutations, and these are subtracted from those found in the tumor t...