Background: A close match of the HLA alleles between donor and recipient is an important prerequisite for successful unrelated hematopoietic stem cell transplantation. To increase the chances of finding an unrelated donor, registries recruit hundreds of thousands of volunteers each year. In the past, many registries with limited resources have had to trade off cost against the resolution and extent of typing for newly recruited donors. Therefore, we have taken advantage of recent improvements in NGS to develop a workflow for low-cost, high-resolution HLA typing. Results: We have established a straightforward three-step workflow for high-throughput HLA typing: Exons 2 and 3 of HLA-A, -B, -C, -DRB1, -DQB1 and -DPB1 are amplified by PCR on Fluidigm Access Array microfluidic chips. Illumina sequencing adapters and sample-specific tags are directly incorporated during PCR. After pooling and cleanup, 384 samples are sequenced in a single Illumina MiSeq run. We developed “neXtype” for streamlined data analysis and HLA allele assignment. The workflow was validated with 1140 samples typed at 6 loci. All neXtype results were concordant with the Sanger sequences, demonstrating error-free typing of more than 6000 HLA loci. Current capacity in routine operation is 12,000 samples per week. Conclusions: The workflow presented proved to be a cost-efficient alternative to Sanger sequencing for high-throughput HLA typing. Despite the focus on cost efficiency, resolution exceeds the current standards of Sanger typing for donor registration.
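The sample-specific tags are what allow 384 samples to be pooled in one run and the reads assigned back to their samples afterwards. The following is a minimal sketch of that idea only, not the neXtype implementation: the function name, the tag sequences, the assumption that the tag sits at the start of each read, and exact-match lookup are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def demultiplex(reads, tag_to_sample, tag_len):
    """Group reads by the sample tag found in their first tag_len bases.

    Hypothetical helper: assumes an inline tag at the read start and
    exact matching, which a real pipeline may handle differently.
    """
    by_sample = defaultdict(list)
    for read in reads:
        tag, insert = read[:tag_len], read[tag_len:]
        sample = tag_to_sample.get(tag)
        if sample is None:
            # Keep reads with an unknown tag so they can be inspected later.
            by_sample["unassigned"].append(read)
        else:
            by_sample[sample].append(insert)
    return dict(by_sample)


# Hypothetical 4-base tags and toy reads, for illustration only.
tags = {"ACGT": "sample_001", "TGCA": "sample_002"}
reads = ["ACGTGGCTCC", "TGCAGGCTCA", "ACGTGGCTCC", "NNNNGGCTCC"]
result = demultiplex(reads, tags, tag_len=4)
```

Here `result` maps each sample name to its tag-stripped reads, with reads carrying an unrecognised tag collected under `"unassigned"`.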
The high‐throughput department of DKMS Life Science Lab encounters novel human leukocyte antigen (HLA) alleles on a daily basis. To characterise these alleles, we have developed a system to sequence the whole gene from 5′‐ to 3′‐UTR for the HLA loci A, B, C, DQB1 and DPB1 for submission to the European Molecular Biology Laboratory – European Nucleotide Archive (EMBL‐ENA) and the IPD‐IMGT/HLA Database. Our workflow is based on a dual redundant sequencing strategy. Using shotgun sequencing on an Illumina MiSeq instrument and single molecule real‐time (SMRT) sequencing on a PacBio RS II instrument, we are able to achieve highly accurate HLA full‐length consensus sequences. Remaining conflicts are resolved using the R package DR2S (Dual Redundant Reference Sequencing). Given the relatively high throughput of this strategy, we have developed the semi‐automated web service TypeLoader to aid in the submission of sequences to the EMBL‐ENA and the IPD‐IMGT/HLA Database. In the IPD‐IMGT/HLA Database release 3.24.0 (April 2016; prior to the submission of the sequences described here), only 5.2% of all known HLA alleles had been fully characterised together with intronic and UTR sequences. So far, we have applied our strategy to characterise and submit 1056 HLA alleles, thereby more than doubling the number of fully characterised alleles. Given the increasing application of next‐generation sequencing (NGS) for full gene characterisation in clinical practice, extending the HLA database concomitantly is highly desirable. Therefore, we propose this dual redundant sequencing strategy as a workflow for submission of novel full‐length alleles and characterisation of sequences that are as yet incomplete. This would help to mitigate the predominance of partially known alleles in the database.
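The core of a dual redundant strategy is that two independently derived consensus sequences are compared and any disagreement is flagged for resolution. A minimal sketch of that comparison step follows. It is not DR2S (which is an R package) and makes simplifying assumptions: both consensus sequences are already aligned to the same length, and the function name and toy sequences are hypothetical.

```python
def find_conflicts(short_read_consensus, long_read_consensus):
    """Return (position, short_read_base, long_read_base) for each mismatch.

    Positions are 1-based. Hypothetical helper: assumes the two
    consensus sequences are already aligned and of equal length.
    """
    if len(short_read_consensus) != len(long_read_consensus):
        raise ValueError("consensus sequences must be aligned to the same length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(
            zip(short_read_consensus, long_read_consensus), start=1
        )
        if a != b
    ]


# Toy sequences for illustration only; they differ at position 7.
illumina = "ATGGCCGTCA"
pacbio = "ATGGCCATCA"
conflicts = find_conflicts(illumina, pacbio)
```

An empty list means the two technologies agree at every position; any entries are the remaining conflicts that would need to be resolved before submission.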
Recent years have seen a rapid increase in the discovery of novel allelic variants of the human leukocyte antigen (HLA) genes. Commonly, only the exons encoding the peptide binding domains of novel HLA alleles are submitted. As a result, the IPD-IMGT/HLA Database lacks sequence information outside those regions for the majority of known alleles. This has implications for the application of the new sequencing technologies, which often deliver sequence data covering the complete gene. As these technologies simplify the characterization of the complete gene regions, it is desirable for novel alleles to be submitted as full-length sequences to the database. However, the manual annotation of full-length alleles and the generation of specific formats required by the sequence repositories are error-prone and time-consuming. We have developed TypeLoader to address both of these problems. With only the full-length sequence as a starting point, TypeLoader performs automatic sequence annotation and subsequently handles all steps involved in preparing the specific formats for submission with very little manual intervention. TypeLoader is routinely used at the DKMS Life Science Lab and has aided in the successful submission of more than 900 novel HLA alleles as full-length sequences to the European Nucleotide Archive repository and the IPD-IMGT/HLA Database with a 95% reduction in the time spent on annotation and submission when compared with handling these processes manually. TypeLoader is implemented as a web application and can be easily installed and used on a standalone Linux desktop system or within a Linux client/server architecture. TypeLoader is downloadable from http://www.github.com/DKMS-LSL/typeloader.
HLA‐E is a member of the nonclassical HLA class Ib genes. Even though it is structurally highly similar to the classical HLA class Ia genes, it is less diverse, and only 45 alleles and 12 proteins were known in December 2019 (IPD‐IMGT/HLA, release 3.38.0). Since 2017, we have genotyped over 3 million voluntary stem cell donors for HLA‐E by sequencing the most relevant allele‐determining bases of exons 2 and 3. As expected, most donors harbor the two predominant alleles HLA‐E*01:01 and/or HLA‐E*01:03. However, in 1666 (0.05%) of our samples we detected 345 distinct novel HLA‐E sequences. The most frequent one was identified in 162 samples and has by now been named HLA‐E*01:114. To characterize these novel alleles at full length, we used both short‐read Illumina and long‐read PacBio sequencing to obtain fully phased and highly accurate sequences. This resulted in 234 submissions to IPD‐IMGT/HLA comprising 170 novel HLA‐E alleles, which encode 93 novel HLA‐E proteins, as well as 64 confirmations or sequence extensions. Consequently, the number of HLA‐E alleles in the database (release 3.42.0) has now increased to 256 HLA‐E alleles and 110 HLA‐E proteins.