Approximately 5% of the human genome consists of structural variants, which are enriched for genes involved in the immune response and cell-cell interactions. A well-established region of extensive structural variation is the glycophorin gene cluster, comprising three tandemly-repeated regions about 120 kb in length, carrying the highly homologous genes GYPA, GYPB and GYPE. Glycophorin A and glycophorin B are glycoproteins present at high levels on the surface of erythrocytes, and they have been suggested to act as decoy receptors for viral pathogens. They act as receptors for invasion of a causative agent of malaria, Plasmodium falciparum. A particular complex structural variant (DUP4) that creates a GYPB/GYPA fusion gene is known to confer resistance to malaria. Many other structural variants exist, and remain poorly characterised. Here, we analyse sequences from 6466 genomes from across the world for structural variation at the glycophorin locus, confirming 15 variants in the 1000 Genomes project cohort, discovering 9 new variants, and characterising a selection using fibre-FISH and breakpoint mapping. We identify variants predicted to create novel fusion genes and a common inversion duplication variant at appreciable frequencies in West Africans. We show that almost all variants can be explained by unequal crossover events (non-allelic homologous recombination, NAHR) and, by comparing the structural variant breakpoints with recombination hotspot maps, show the importance of a particular meiotic recombination hotspot on structural variant formation in this region.
Glycophorin A and glycophorin B are red blood cell surface proteins and are both receptors for the parasite Plasmodium falciparum, which is the principal cause of malaria in sub-Saharan Africa. DUP4 is a complex structural genomic variant that carries extra copies of a glycophorin A-glycophorin B fusion gene and has a dramatic effect on malaria risk by reducing the risk of severe malaria by up to 40%. Using fiber-FISH and Illumina sequencing, we validate the structural arrangement of the glycophorin locus in the DUP4 variant and reveal somatic variation in copy number of the glycophorin B-glycophorin A fusion gene. By developing a simple, specific, PCR-based assay for DUP4, we show that the DUP4 variant reaches a frequency of 13% in the population of a malaria-endemic village in south-eastern Tanzania. We genotype a substantial proportion of that village and demonstrate an association of DUP4 genotype with hemoglobin levels, a phenotype related to malaria, using a family-based association test. Taken together, we show that DUP4 is a complex structural variant that may be susceptible to somatic variation and show that DUP4 is associated with a malarial-related phenotype in a longitudinally followed population.
Glycophorin A and glycophorin B are red blood cell surface proteins that are both receptors for the parasite Plasmodium falciparum, which is the principal cause of malaria in sub-Saharan Africa. DUP4 is a complex structural genomic variant that carries extra copies of a glycophorin A-glycophorin B fusion gene, and has a dramatic effect on malaria risk by reducing the risk of severe malaria by up to 40%. Using fiber-FISH and Illumina sequencing, we validate the structural arrangement of the glycophorin locus in the DUP4 variant, and reveal somatic variation in copy number of the glycophorin A-glycophorin B fusion gene. By developing a simple, specific, PCR-based assay for DUP4, we show that the DUP4 variant reaches a frequency of 13% in a village in south-eastern Tanzania. We genotype a substantial proportion of that village and demonstrate an association of DUP4 genotype with hemoglobin levels, a phenotype related to malaria, using a family-based association test. Taken together, we show that DUP4 is a complex structural variant that may be susceptible to somatic variation, and show that it is associated with a malarial-related phenotype in a non-hospitalized population.

Significance statement: Previous work has identified a human complex genomic structural variant called DUP4, which includes two novel glycophorin A-glycophorin B fusion genes and is associated with a profound protection against severe malaria. In this study, we present data showing the molecular basis of this complex variant. We also show evidence of somatic variation in the copy number of the fusion genes. We develop a simple, robust assay for this variant and demonstrate that DUP4 is at an appreciable population frequency in Tanzania and that it is associated with higher hemoglobin levels in a malaria-endemic village. We suggest that DUP4 is therefore protective against malarial anemia.
Background: Approximately 5% of the human genome shows common structural variation, which is enriched for genes involved in the immune response and cell-cell interactions. A well-established region of extensive structural variation is the glycophorin gene cluster, comprising three tandemly-repeated regions about 120 kb in length and carrying the highly homologous genes GYPA, GYPB and GYPE. Glycophorin A (encoded by GYPA) and glycophorin B (encoded by GYPB) are glycoproteins present at high levels on the surface of erythrocytes, and they have been suggested to act as decoy receptors for viral pathogens. They are receptors for the invasion of the protist parasite Plasmodium falciparum, a causative agent of malaria. A particular complex structural variant, called DUP4, creates a GYPB-GYPA fusion gene known to confer resistance to malaria. Many other structural variants exist across the glycophorin gene cluster, and they remain poorly characterised.

Results: Here, we analyse sequences from 3234 diploid genomes from across the world for structural variation at the glycophorin locus, confirming 15 variants in the 1000 Genomes project cohort, discovering 9 new variants, and characterising a selection of these variants using fibre-FISH and breakpoint mapping at the sequence level. We identify variants predicted to create novel fusion genes and a common inversion duplication variant at appreciable frequencies in West Africans. We show that almost all variants can be explained by non-allelic homologous recombination and, by comparing the structural variant breakpoints with recombination hotspot maps, confirm the importance of a particular meiotic recombination hotspot on structural variant formation in this region.

Conclusions: We identify and validate large structural variants in the human glycophorin A-B-E gene cluster which may be associated with different clinical aspects of malaria.
Structural variation in the human genome can affect risk of disease. An example is a complex structural variant of the human glycophorin gene cluster, called DUP4, which is associated with a clinically significant level of protection against severe malaria. The human glycophorin gene cluster harbours at least 23 distinct structural variants, and accurate genotyping of this complex structural variation remains a challenge. Here, we use a polymerase chain reaction-based strategy to genotype structural variation at the human glycophorin gene cluster, including the alleles responsible for the U– blood group. We validate our approach, based on a triplex paralogue ratio test, on publicly available samples from the 1000 Genomes project. We then genotype 574 individuals from a longitudinal birth cohort (Tori-Bossito cohort) using small amounts of DNA at low cost. Our approach readily identifies known deletions and duplications, and can potentially identify novel variants for further analysis. It will allow exploration of genetic variation at the glycophorin locus, and investigation of its relationship with malaria, in large sample sets at minimal cost, using standard molecular biology equipment.