Linking epigenetic marks to clinical outcomes improves insight into molecular processes, disease prediction, and therapeutic target identification. Here, a statistical approach is presented to infer the epigenetic architecture of complex disease, determine the variation captured by epigenetic effects, and estimate phenotype-epigenetic probe associations jointly. Implicitly adjusting for probe correlations, data structure (cell-count or relatedness), and single-nucleotide polymorphism (SNP) marker effects improves association estimates. In 9,448 individuals, 75.7% (95% CI 71.7–79.3) of body mass index (BMI) variation and 45.6% (95% CI 37.3–51.9) of cigarette consumption variation were captured by whole blood methylation array data. Pathway-linked probes of blood cholesterol, lipid transport and sterol metabolism for BMI, and xenobiotic stimuli response for smoking, showed >1.5 times larger associations with >95% posterior inclusion probability. Prediction accuracy improved by 28.7% for BMI and 10.2% for smoking over a LASSO model, with age- and tissue-specificity, implying that the associations are a phenotypic consequence rather than causal.
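The two summary quantities quoted above, posterior inclusion probability and the proportion of phenotypic variation captured by the probes, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the convention that an excluded probe has an effect of exactly zero in a posterior sample, and the toy numbers are all assumptions made for illustration only.

```python
from statistics import pvariance


def posterior_inclusion_probability(effect_samples):
    """Share of posterior samples in which each probe has a non-zero effect.

    effect_samples: list of posterior draws, each a list with one effect per probe.
    Assumption: an effect of exactly 0.0 means the probe was excluded in that draw.
    """
    n_samples = len(effect_samples)
    n_probes = len(effect_samples[0])
    return [
        sum(1 for draw in effect_samples if draw[j] != 0.0) / n_samples
        for j in range(n_probes)
    ]


def variance_captured(probe_matrix, effects, phenotype):
    """Variance of the probe-predicted values divided by the phenotypic variance.

    probe_matrix: one row per individual, one column per probe (hypothetical data).
    effects: one posterior draw of probe effects.
    """
    predicted = [
        sum(value * effect for value, effect in zip(row, effects))
        for row in probe_matrix
    ]
    return pvariance(predicted) / pvariance(phenotype)


# Toy posterior draws for two probes (invented numbers, not study results).
draws = [[0.5, 0.0], [0.4, 0.0], [0.6, 0.1], [0.5, 0.0]]
pip = posterior_inclusion_probability(draws)

# Toy probe values and phenotype for four individuals.
probes = [[1.0, 0.0], [-1.0, 0.0], [1.0, 1.0], [-1.0, -1.0]]
phenotype = [0.5, -0.5, 0.5, -0.5]
share = variance_captured(probes, draws[0], phenotype)
```

In this toy example the first probe is included in every draw and the second in one draw out of four, so only the first would pass a 95% inclusion threshold. Applying `variance_captured` to every draw, rather than to a single one, gives a posterior distribution from which an interval such as a 95% CI can be read.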
Highlights
• Bronze Age (BA) Helladic, Cycladic, and Minoan genomes from the Aegean were sequenced
• 3,000 BCE Aegeans are homogeneous and derive ancestry mainly from Neolithic farmers
• Neolithic Caucasus-like and BA Pontic-Caspian Steppe-like gene flow shaped the Aegean
• Present-day Greeks are genetically similar to 2,000 BCE Aegeans from Northern Greece
Summary
We introduce mapache, a flexible, robust, and scalable pipeline to map, quantify, and impute ancient and present-day DNA in a reproducible way. Mapache is implemented in the workflow manager Snakemake and is optimized for low space consumption, making it efficient to (re)map large data sets (such as reference panels and multiple extracts and libraries per sample) to one or several genomes. Mapache can easily be customized or combined with other Snakemake tools.
Availability
Mapache is freely available on GitHub (https://github.com/sneuensc/mapache).
Supplementary information
The list of software (and their references) used in the pipeline and the benchmark results can be found in the Supplementary files. An extensive manual is provided at https://github.com/sneuensc/mapache/wiki.