We used a deeply sequenced dataset of 910 individuals, all of African descent, to construct a set of DNA sequences present in these individuals but missing from the reference human genome. We aligned 1.19 trillion reads from the 910 individuals to the reference genome (GRCh38), collected all reads that failed to align, and assembled these reads into contiguous sequences (contigs). We then compared all contigs to one another to identify a set of unique sequences representing regions of the African pan-genome missing from the reference genome. Our analysis revealed 296,485,284 bp in 125,715 distinct contigs present in the African-descended populations, demonstrating that the African pan-genome contains ~10% more DNA than the current human reference genome. Although the functional significance of nearly all of this sequence is unknown, 387 of the novel contigs fall within 315 distinct protein-coding genes while the rest appear to be intergenic.
The African Diaspora in the Western Hemisphere represents one of the largest forced migrations in history and had a profound impact on genetic diversity in modern populations. To date, the fine-scale population structure of descendants of the African Diaspora remains largely uncharacterized. Here we present genetic variation from deeply sequenced genomes of 642 individuals from North and South American, Caribbean and West African populations, substantially increasing the lexicon of human genomic variation and suggesting much variation remains to be discovered in African-admixed populations in the Americas. We summarize genetic variation in these populations, quantifying the postcolonial sex-biased European gene flow across multiple regions. Moreover, we refine estimates on the burden of deleterious variants carried across populations and how this varies with African ancestry. Our data are an important resource for empowering disease mapping studies in African-admixed individuals and will facilitate gene discovery for diseases disproportionately affecting individuals of African ancestry.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.