Background
Low levels of sample contamination can have disastrous effects on the accurate identification of somatic variation in tumor samples. Contamination in DNA samples is generally detected by observing low-frequency variants, which suggest that DNA from more than one source is present. This strategy works for standard DNA samples but is problematic for solid tumor FFPE samples, in which allele frequencies (AF) can vary widely because of large copy number gains and losses across the genome. These highly variable allele frequencies make contamination difficult to detect. An approach that does not rely on individual AFs is needed to determine accurately whether a sample is contaminated and to what degree.

Methods
We used microhaplotypes to determine whether sample contamination is present. Microhaplotypes are sets of variants that lie on the same sequencing read and can therefore be unambiguously phased. Instead of measuring AF, we determine the number and frequency of microhaplotypes. Contamination detection thus rests on fundamental genomic properties, linkage disequilibrium (LD) and the diploid nature of human DNA, rather than on variant frequencies: because a diploid individual carries at most two haplotypes at any locus, additional haplotypes indicate a second source of DNA. We optimized the microhaplotype content using 164 single nucleotide variant (SNV) sets located in genes already sequenced by a cancer panel. Contamination detection therefore uses existing sequence data and does not require sequencing any additional regions. The content was selected using LD data from the 1000 Genomes Project to be ancestry agnostic, providing the same sensitivity for contamination detection in samples from individuals of African, East Asian, and European ancestry.

Results
Contamination at levels of 1% and below can be detected using this design.
The methods described here can also be extended to other DNA mixtures, such as forensic and non-invasive prenatal testing samples, in which a secondary DNA source present at 1% or less can be detected in the same way.

Conclusions
The microhaplotype method allows sensitive detection of DNA contamination in FFPE tumor samples. These methods provide a foundation for examining DNA mixtures in a variety of contexts. With appropriate panels and high sequencing depth, low levels of secondary DNA can be detected, which can be valuable in many applications.
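The haplotype-counting idea described under Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each read has already been reduced to a haplotype string covering every SNV in a microhaplotype locus, with reads that do not span all sites excluded. The input format, the functions `distinct_haplotypes` and `screen_sample`, and the noise floor `min_frac = 0.002` are all hypothetical. Because an uncontaminated diploid sample carries at most two haplotypes per locus, the sketch flags loci that show more than two haplotypes above the floor and sums the frequency of the extra haplotypes.

```python
from collections import Counter
from typing import Dict, List, Tuple


def distinct_haplotypes(read_haplotypes: List[str], min_frac: float) -> Dict[str, float]:
    """Return {haplotype: fraction of reads} for haplotypes at or above min_frac.

    Hypothetical input: one string per read, e.g. "ACG" = alleles observed at
    the locus's SNVs on that read. min_frac is an assumed noise floor meant to
    discard sequencing/PCR errors.
    """
    counts = Counter(read_haplotypes)
    total = sum(counts.values())
    return {h: c / total for h, c in counts.items() if c / total >= min_frac}


def screen_sample(
    loci: Dict[str, List[str]],
    min_frac: float = 0.002,  # assumed floor, not taken from the source
    max_haplotypes: int = 2,  # diploid: at most two haplotypes per locus
) -> Tuple[float, Dict[str, float]]:
    """Flag loci carrying more haplotypes than a single diploid source allows.

    Returns (fraction of loci flagged, {locus: summed frequency of the
    haplotypes beyond the max_haplotypes most abundant ones}).
    """
    flagged: Dict[str, float] = {}
    for locus, reads in loci.items():
        haps = distinct_haplotypes(reads, min_frac)
        if len(haps) > max_haplotypes:
            # Frequencies sorted high to low; everything past the top two is "extra".
            extra = sorted(haps.values(), reverse=True)[max_haplotypes:]
            flagged[locus] = sum(extra)
    fraction = len(flagged) / len(loci) if loci else 0.0
    return fraction, flagged


if __name__ == "__main__":
    # Toy data: mh1 shows a third haplotype at 0.5% of reads; mh3's minor
    # haplotype (0.1%) falls below the assumed noise floor.
    toy = {
        "mh1": ["ACG"] * 500 + ["TCA"] * 495 + ["AGG"] * 5,
        "mh2": ["GT"] * 600 + ["CT"] * 400,
        "mh3": ["AA"] * 999 + ["TT"] * 1,
    }
    print(screen_sample(toy))
```

The floor has to sit above the sequencing error rate but below the expected frequency of contaminant haplotypes. Loci where the sample is homozygous, or where the contaminant shares a haplotype with the sample, contribute less signal; this sketch does not model those cases.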