Amplicon-based next-generation sequencing (NGS) of immunoglobulin (IG) and T-cell receptor (TR) gene rearrangements for clonality assessment, marker identification and quantification of minimal residual disease (MRD) in lymphoid neoplasms has been the focus of intense research, development and application. However, standardization and validation in a scientifically controlled multicentre setting are still lacking. Therefore, IG/TR assay development and design, including bioinformatics, were performed within the EuroClonality-NGS working group and validated for MRD marker identification in acute lymphoblastic leukaemia (ALL). Five EuroMRD ALL reference laboratories performed IG/TR NGS on 50 diagnostic ALL samples and compared the results with those generated by routine IG/TR Sanger sequencing. A central polytarget quality control (cPT-QC) was used to monitor primer performance, and a central in-tube quality control (cIT-QC) was spiked into each sample as a library-specific quality control and calibrator. NGS identified 259 clonal sequences (average 5.2/sample, range 0–14), versus 248 (average 5.0/sample, range 0–14) identified by Sanger sequencing. The NGS primers covered the possible IG/TR rearrangement types more completely than local multiplex PCR sets and enabled sequencing of bi-allelic rearrangements and weak PCR products. The cPT-QC showed high reproducibility across all laboratories. These validated, reproducible and quality-controlled EuroClonality-NGS assays can be used for standardized NGS-based identification of IG/TR markers in lymphoid malignancies.
High-throughput sequencing data sets are usually deposited in public repositories (e.g., the European Nucleotide Archive) to ensure reproducibility. Because the amount of data has reached petabyte scale, repositories do not allow one to perform online sequence searches, although such a feature would be highly useful to investigators. Toward this goal, several computational approaches have been introduced in the last few years to index and query large collections of data sets. Here, we propose an accessible survey of these approaches, which are generally based on representing data sets as sets of k-mers. We review their properties, introduce a classification, and present their general intuition. We summarize their performance and highlight their current strengths and limitations.