BACKGROUND
The 1000 Genomes Project provides a database of genomic variants from whole‐genome sequencing of 2504 individuals across five continental superpopulations. This database can expand our knowledge of the worldwide geographic distribution of blood group variants and help identify novel variants of potential clinical significance.
STUDY DESIGN AND METHODS
The 1000 Genomes database was analyzed to 1) expand knowledge about continental distributions of known blood group variants, 2) identify novel variants with antigenic potential and their geographic association, and 3) establish a baseline scaffold of chromosomal coordinates to translate next‐generation sequencing output files into a predicted red blood cell (RBC) phenotype.
RESULTS
Forty‐two genes were investigated. A total of 604 known variants were mapped to the GRCh37 assembly; 120 of these were reported by 1000 Genomes in at least one superpopulation. All queried variants, including the ACKR1 promoter silencing mutation, were located within exon pull‐down boundaries. The analysis yielded 41 novel population distributions for 34 known variants, as well as 12 novel blood group variants that warrant further validation and study. Four prediction algorithms collectively flagged 79 of 109 (72%) known antigenic or enzymatically detrimental blood group variants, whereas 4 of 12 (33%) known variants that do not alter the RBC phenotype were also flagged as deleterious.
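Read as screening metrics (our framing, not necessarily the study's own terminology), the algorithm results correspond to a sensitivity‐like proportion among known phenotype‐altering variants and a false‐positive‐like proportion among known non‐altering variants. The arithmetic behind the reported percentages is:

```latex
\text{flagged, known altering variants} = \frac{79}{109} \approx 0.7248 \;(\approx 72\%)
\qquad
\text{flagged, known non-altering variants} = \frac{4}{12} \approx 0.3333 \;(\approx 33\%)
```

Because the second denominator is only 12 variants, the 33% figure is an imprecise estimate of how often the algorithms over‐call benign variants.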
CONCLUSION
Next‐generation sequencing has recognized potential for high‐throughput, extended RBC phenotype prediction; a database of GRCh37 and GRCh38 chromosomal coordinates for 120 worldwide blood group variants is provided as a basis for this clinical application.
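As an illustration of how a coordinate database could serve as a scaffold for phenotype prediction, the sketch below matches records from a VCF file against a small table of catalogued variants and reports the allele dosage of each match. It is a minimal sketch under stated assumptions, not the study's pipeline: the gene names, variant labels, and positions in `CATALOGUE` are hypothetical placeholders (not values from the study's table), only the first sample column is read, and GRCh38‐aligned data would need a GRCh38 table instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BloodGroupVariant:
    gene: str    # HYPOTHETICAL gene name
    label: str   # HYPOTHETICAL variant/antigen label
    chrom: str   # chromosome without "chr" prefix
    pos: int     # 1-based position on the assembly of the input VCF
    ref: str
    alt: str


# HYPOTHETICAL placeholder entries; not real coordinates from the study.
CATALOGUE = [
    BloodGroupVariant("GENE_A", "variant_1", "1", 1000, "T", "C"),
    BloodGroupVariant("GENE_B", "variant_2", "7", 2000, "G", "A"),
]


def parse_vcf_line(line):
    """Return (chrom, pos, ref, alts, genotype) for the first sample column."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt = fields[:5]
    gt = "./."  # treat sites-only records as missing genotype
    if len(fields) > 9:
        fmt = fields[8].split(":")
        sample = fields[9].split(":")
        gt = dict(zip(fmt, sample)).get("GT", "./.")
    return chrom.removeprefix("chr"), int(pos), ref, alt.split(","), gt


def match_variants(vcf_lines, catalogue=CATALOGUE):
    """List (gene, label, alt_allele_copies) for catalogued variants carried."""
    index = {(v.chrom, v.pos, v.ref, v.alt): v for v in catalogue}
    hits = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, ref, alts, gt = parse_vcf_line(line)
        alleles = gt.replace("|", "/").split("/")  # phased or unphased
        for allele_index, alt in enumerate(alts, start=1):
            variant = index.get((chrom, pos, ref, alt))
            if variant is None:
                continue
            copies = sum(1 for a in alleles if a == str(allele_index))
            if copies:
                hits.append((variant.gene, variant.label, copies))
    return hits
```

Keying the lookup on chromosome, position, reference, and alternate allele (rather than position alone) avoids calling a catalogued variant when a different substitution occurs at the same site; translating the resulting dosages into antigen phenotypes would be a separate, gene‐specific step.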