Introduction: Patients with sickle cell disease (SCD) are at increased risk of alloimmunization. Platelet refractoriness is a serious known complication, often seen in SCD patients who are heavily transfused and/or in the bone marrow transplantation (BMT) setting. Next generation sequencing (NGS) is an emerging and promising genotyping strategy in the context of blood typing, due to its high throughput and its ability to detect both known and novel variants in the patient and donor population. Here we describe an algorithm to predict common and rare human platelet antigens (HPA) from NGS data, and its validation through Sanger sequencing. Design/Methods: Whole genome sequencing (WGS) was performed on stored blood samples from 621 SCD patients enrolled in 2 IRB-approved clinical studies. Our open-source software RyLAN (Red Cell and Lymphocyte Antigen prediction from NGS) was used to translate WGS data into predicted RBC and platelet phenotypes. The 29 genomic variants interpreted by RyLAN in 6 HPA genes were correlated with Sanger sequencing. Results: Our study cohort consisted of 621 SCD patients (485 HbSS, 21 HbSβ0, 29 HbSβ+, 84 HbSC, 1 HbS O Arab, and 1 HbSD). The mean age was 34.3 years, and 46% were male. Previous red cell transfusions were recorded in 62% of patients, and 3% were documented as never transfused. RyLAN software was executed as a Singularity container in multithreaded mode, completing analysis of all 621 .bam WGS files in 18 hours. RyLAN predicted 237 unique extended platelet phenotype combinations in this cohort, with an average read depth of 33 in genomic areas of interest. Predictions for 10 platelet antigens in 26 participants, including rare phenotypes like HPA-25bw+ and HPA-13bw+, were confirmed by bidirectional Sanger sequencing. Conclusions: We describe an efficient, open-source algorithm used to interpret 6 HPA genes from WGS in a large SCD cohort.
WGS, in conjunction with the RyLAN algorithm, demonstrated 100% accuracy in predicting common and rare HPA genomic variants. Future studies are needed to refine WGS algorithms in SCD, and to examine the possible value of this technology in HPA alloantibody identification workups, optimal platelet product allocation, and donor recruitment. Disclosures No relevant conflicts of interest to declare.
Introduction: Accurate typing of patient and donor red blood cell (RBC) antigens is critical for safe transfusion practice. Although blood typing is traditionally accomplished by serology, genotyping methods to predict RBC antigens have proven valuable in a growing number of situations such as recently transfused patients, scarcity of typing reagents, and indeterminate serologic results. However, current RBC genotyping assays address a limited number of blood group genes and associated variants, and may not detect novel genetic changes and certain rare but clinically significant variants. Next generation sequencing (NGS) technology provides an appealing alternative, allowing the user to examine a patient's entire genome or exome in a high-throughput manner. Whereas efforts are underway in multiple fields to apply exome sequencing (ES) for diagnostic, prognostic, and treatment purposes, Transfusion Medicine, with its extensive clinical genomic database, should find ready application for this approach. We describe here the creation of an algorithm to interpret NGS data into a predicted extended RBC phenotype, and its application to analyze ES data from 245 participants of the ClinSeq® sequencing cohort. Methods: RyLAN (Red Cell and Lymphocyte Antigen prediction from NGS) was created as an open-source Python application that takes a sorted NGS binary alignment map (.bam) file and its index as input. The software interacts with a non-relational database that encodes genomic blood group coordinates and phenotype interpretation rules, and yields a predicted extended RBC phenotype and quality parameters. Hard filters for mapping quality, depth, vcf QUAL, and fraction of alternate allele can be modified per individual genomic coordinate. The output is provided as a MongoDB document to facilitate advanced bulk queries and statistical analysis.
We employed RyLAN to analyze 245 ES NGS files from the ClinSeq® cohort, using a database of 176 known antigenic, null, and weak blood group single nucleotide variants in 27 blood group genes as input. Results: The cohort consisted of 115 females and 130 males; 89% of participants self-described as white race, non-Hispanic ethnicity. Three percent of participants self-described as Hispanic or Latino, 4% as Asian, 2% with African ancestry, and the remainder as mixed or unknown race. Of the 176 genomic positions analyzed, 160 were not addressed by current commercially available RBC genotyping platforms. The average read depth for the positions of interest was 78.2, and the average vcf QUAL value was 968. The highest variant nucleotide frequency was observed at the Fya/Fyb and Jka/Jkb loci (275 and 223 total haplotype variant calls, respectively). Among other phenotypes, RyLAN predicted 4 instances of heterozygosity for the KEL*02N.17 allele, 5 heterozygous individuals for the weak FY*X allele, 32 total heterozygous samples for various weak Kidd alleles, 2 homozygous individuals for weak Kidd expression, 1 heterozygosity for Lu6/Lu9, 1 SC:1,2 case, 1 Co(a-b+) predicted phenotype, and a total of 19 RHAG*01.04 and 47 KLF1*BGM12 alleles. Limited areas of the BCAM, KLF1, KEL, FUT7, ERMAP and CR1 genes failed quality filters repeatedly, and careful review indicated that these regions were not captured in the ES libraries. The ACKR1 promoter GATA-binding site variant was present in every sample and predicted all cases of self-reported African ancestry. Conclusions: We describe a new, open-source informatics tool to translate NGS data into a predicted extended RBC phenotype, and demonstrate its application through the analysis of 245 ClinSeq® ES files. Most predicted antigen frequencies were as expected for the ethnic composition of our cohort.
We detected a higher frequency of the RHAG p.V270I and KLF1 p.S102P variants than expected, findings that are in agreement with the 1000 Genomes Project and warrant further study. Our analysis also corroborates the relative frequency of the JK*01W.01 allele, and the presence of the JK*01W.03 and JK*01W.04 alleles in the Caucasian population, which can lead to serologic discrepancies in other genotyping platforms. Serologic confirmation of these findings is being conducted. Further study of genomic data across multiple ethnic groups can help refine knowledge of blood group gene polymorphisms and their clinical association. Disclosures No relevant conflicts of interest to declare.
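The Methods above state that hard filters for mapping quality, depth, vcf QUAL, and alternate-allele fraction can be modified per genomic coordinate. The following is a minimal Python sketch of that idea only, not RyLAN's actual code: the class names, threshold values, the coordinate label "chrN:12345", and the function failed_filters are all hypothetical, and the choice to apply the alternate-allele fraction filter only when alternate reads are present is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Illustrative default cut-offs; the abstract does not report the real values.
    min_mapq: float = 20.0
    min_depth: int = 8
    min_qual: float = 30.0
    min_alt_fraction: float = 0.2


@dataclass(frozen=True)
class SiteCall:
    coordinate: str  # hypothetical label such as "chrN:12345"
    mapq: float      # mapping quality at the site
    depth: int       # total reads covering the site
    qual: float      # vcf QUAL value
    alt_reads: int   # reads supporting the alternate allele


DEFAULTS = Thresholds()
# Per-coordinate overrides: this made-up site demands deeper coverage.
OVERRIDES = {"chrN:12345": Thresholds(min_depth=15)}


def failed_filters(call, overrides=OVERRIDES, defaults=DEFAULTS):
    """Return the names of the hard filters that this call fails."""
    t = overrides.get(call.coordinate, defaults)
    failures = []
    if call.mapq < t.min_mapq:
        failures.append("mapping_quality")
    if call.depth < t.min_depth:
        failures.append("depth")
    if call.qual < t.min_qual:
        failures.append("qual")
    # Assumption: the fraction filter only applies when alternate reads exist.
    alt_fraction = call.alt_reads / call.depth if call.depth else 0.0
    if call.alt_reads > 0 and alt_fraction < t.min_alt_fraction:
        failures.append("alt_fraction")
    return failures
```

A call with 12 reads passes the default depth filter but fails at the overridden coordinate, which is the behavior a per-coordinate threshold table is meant to provide.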
Introduction: Red blood cell (RBC) transfusions are central in the management of sickle cell disease (SCD), an inherited hemoglobinopathy characterized by hemolysis, acute pain, and multi-systemic complications. Extended matching of patient and donor RBC antigens is an established strategy to minimize alloimmunization, which can make provision of compatible blood difficult and can result in severe, even lethal hemolytic transfusion reactions. While RBC genotype matching has proven valuable in SCD transfusion practice, current technologies are often limited in throughput and focus on selected blood groups and known variants. Limited information is available comparing whole genome sequencing (WGS) with other blood typing platforms in SCD. Design/Methods: WGS was performed on stored blood samples from 621 SCD patients recruited into two clinical studies. We used our open-source Python application (RyLAN) to translate WGS data into a predicted extended RBC and platelet phenotype. The 467 genomic variants interpreted by RyLAN in 41 genes were correlated with clinical and laboratory data in the immunohematology and electronic health records (Figure 1). Results: The 621 patients included 485 HbSS, 21 HbSβ0, 29 HbSβ+, 84 HbSC, 1 HbS O Arab, and 1 HbSD. The mean age was 34.3 ± 12.1 years, and 54% were female. Health records indicated that 383 (62%) patients had previously received RBC transfusions and 17 (3%) had never been transfused; the status of the remaining 221 was unknown. RyLAN software was executed as a Singularity container in multithreaded mode, completing the analysis of all 621 .bam WGS files in 8.5 hours (8 CPUs and 16 GB of memory per file). The average read depth for genomic positions of interest was 33 and the average QUAL value was 644. The highest variant allele frequency was detected at the Fyb, ACKR1 promoter, and the KCAM- loci (94%, 86% and 82%, respectively).
Each of the 621 participants demonstrated a unique extended blood group genotype through WGS. RyLAN predicted 237 unique extended platelet phenotype combinations in this cohort, including HPA-25bw and HPA-13bw positive patients. Blood antigen WGS predictions were correlated with other typing methods in 112 individuals: 192 total serologic reactions for 8 antigens; 55 documented alloantibodies; 25 genomic variants in 71 participants by probe-elongation array; and PCR with sequence-specific primers for 8 variants in 13 individuals (Figure 1). Two instances of heterozygosity (Jka/Jkb and Doa/Dob) were undetected due to low read depth, and 8 unresolved discrepancies were identified: 2 with serology, 1 with a reported historical alloantibody, 2 with probe-elongation array determinations, and 1 with the PCR method. WGS detected multiple weak blood group variants, surpassing the sensitivity of serology in one complex case, as well as rare phenotypes including 4 Yka-, 5 Kna/Knb, 1 FORS1-, and 2 Jra- cases. The algorithm correctly predicted an Sla-negative RBC phenotype in a patient with documented anti-Sla alloantibody. Conclusion: We describe an efficient, open-source algorithm used to interpret 35 minor blood group and 6 platelet antigen genes from WGS in a large SCD cohort. Eight unresolved discrepancies were identified from 2126 correlation events with serology, alloimmunization history, and other genotyping methods in a subset of 112 individuals. WGS demonstrated higher sensitivity for weak antigen detection compared with serology, and a capacity to detect rare phenotypes not readily determined by other methods. Sanger resequencing is currently in progress to validate rare phenotype predictions and resolve remaining discrepancies. Future studies are needed to refine WGS algorithms in SCD, and determine the value of this technology for alloantibody identification, optimal blood group allocation, and donor recruitment. Figure 1.
Disclosures No relevant conflicts of interest to declare.
Background: The 1000 Genomes Project provides a database of over 80 million genomic variants found across 2504 individuals from 26 populations. A current priority of the genomics field is to design information systems to translate this knowledge into clinical significance and patient care. The applications and advantages of red blood cell (RBC) antigen prediction through genotyping are widely accepted in transfusion medicine. Current technologies address a limited number of single nucleotide polymorphisms (SNPs) in 12 blood group genes, and our background knowledge of RBC phenotype distribution is often limited to a few populations. We analyzed the 1000 Genomes database with 4 objectives: 1) determine allele distributions of 46 blood group-related genes across the 5 genotyped superpopulations: Africa, East Asia, Europe, South Asia and the Americas; 2) identify possible new blood group alleles and their geographic association; 3) determine the feasibility of blood group genotyping by NGS; and 4) establish a scaffold of chromosomal coordinates to interpret NGS output files into a predicted RBC phenotype. Results: From the initial list of 46 blood group-related genes, we eliminated the five genes with known rearrangements and focused only on regions that met the strict criteria for accessibility through short, paired-end NGS reads (77% of 80.4kb). We mapped over 800 known alleles in coding and non-coding regions, and documented the 80 variants that were both present in the 1000 Genomes database and met the strict accessibility criteria. Sixty-four of these 80 variants are not addressed by current RBC genotyping technology. All 80 variants, including the ACKR1 promoter silencing mutation, are located within exon pull-down boundaries. The average low-coverage sequencing depth was 18,424x, with exome-sequencing confirmation at 65.7x depth. 
Twenty-three alleles had at least one novel population distribution, such as documentation of the Kpa allele for the first time in Africa and South Asia. From a total of 30 novel blood group continental frequencies, 14 correspond to a newfound presence in South Asia. 1000 Genomes identified a total of 926 missense mutations in blood group genes that met strict NGS mapping criteria, as well as multiple deletions. Two novel missense mutations in ERMAP and SLC14A1 are classified as likely antigenic, since they target the same amino acids responsible for the SCER- and Cr(a-) alleles. Six novel deletions involving the Lewis, H, Cromer, Indian and OK systems are also classified as likely deleterious after careful analysis. For example, a novel in-frame 24 bp deletion in SLC14A1 eliminates part of the intracytoplasmic tail, which is required for membrane localization and includes the 28G residue that defines JK*01W.03. Thus, this novel deletion is predicted to alter Kidd protein expression. The 8 novel alleles are distributed throughout the five superpopulations but are most frequently found in Africa. Four standard bioinformatics programs (SIFT, PolyPhen-2, Mutation Taster, and Mutation Assessor) failed to detect half of the known control blood group alleles and thus are not adequate for the analysis of novel blood group variants in the transfusion medicine context. Conclusions: NGS can allow comprehensive, fast, and high-throughput RBC antigen prediction. All queried blood group alleles are amenable to targeted exome sequencing, and 77% of blood group coding sequences can be addressed with a short, paired-end NGS strategy. Based on 1000 Genomes, we created a database of the worldwide distribution of 80 known and 8 novel blood group variants, along with their chromosomal coordinates in the hg19 and GRCh38 assemblies.
This database is the scaffold for the creation of a new transfusion medicine bioinformatics pipeline that will translate NGS .vcf output files into a predicted RBC phenotype. New algorithms that focus on exposed peptides and antigenicity are required for the analysis of novel variants identified by NGS in the immunohematology context. Disclosures No relevant conflicts of interest to declare.
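The abstract above describes a scaffold of chromosomal coordinates used to translate NGS genotype calls into a predicted RBC phenotype. As a rough illustration of how such a lookup could work, the Python sketch below maps a genotype at one coordinate to a pair of antithetical antigens. It is not the authors' pipeline: the coordinate "chrN:1000", the antigen names "X1" and "X2", the rule table layout, and the function predict_phenotype are hypothetical. It also assumes that every rule coordinate has an explicit genotype in the input, so a missing entry is treated as a no-call.

```python
# Hypothetical rule table: one entry per coordinate, naming the reference and
# alternate nucleotides and the antigen each is assumed to encode.
RULES = {
    "chrN:1000": {
        "ref": "G",
        "alt": "A",
        "ref_antigen": "X1",
        "alt_antigen": "X2",
    },
}


def predict_phenotype(genotypes, rules=RULES):
    """Predict antigen status from genotype calls.

    genotypes maps a coordinate to a tuple of two alleles, e.g. ("G", "A").
    A coordinate missing from genotypes is treated as a no-call, and both
    antigens at that site are reported as "indeterminate".
    """
    phenotype = {}
    for coordinate, rule in rules.items():
        alleles = genotypes.get(coordinate)
        if alleles is None:
            phenotype[rule["ref_antigen"]] = "indeterminate"
            phenotype[rule["alt_antigen"]] = "indeterminate"
            continue
        # An antigen is predicted positive if at least one allele encodes it.
        phenotype[rule["ref_antigen"]] = "+" if rule["ref"] in alleles else "-"
        phenotype[rule["alt_antigen"]] = "+" if rule["alt"] in alleles else "-"
    return phenotype
```

This sketch covers only simple antithetical pairs. Null and weak alleles, which can silence or reduce an antigen regardless of the antigenic variant, would need additional rules that are not modeled here.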