FireCloud, one of three NCI Cloud Pilots, is a collaborative genome analysis platform built on a cloud computing infrastructure. FireCloud aims to solve the many challenges presented by the increasingly large data sets and computing requirements employed in cancer research. However, cost uncertainty associated with cloud computing's pay-as-you-go model is proving to be a barrier to adoption of cloud computing. In this paper we present guidelines for optimizing workflows to minimize cost and reduce latency. Our guidelines include: (i) dynamic disk sizing to efficiently utilize virtual disks; (ii) tuned provisioning of virtual machines (VMs) using a performance monitoring tool; (iii) taking advantage of steep price discounts of preemptible VMs; and (iv) utilizing the optimal parallelization of a task's workload.
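Guidelines (i) and (iii) above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `output_ratio` and `padding_gb` parameters, the `retry_overhead` factor, and any prices passed in are assumptions made for illustration only.

```python
import math


def disk_size_gb(input_bytes, output_ratio=1.0, padding_gb=10):
    """Size a task's disk from its inputs instead of using a fixed value.

    input_bytes: sizes in bytes of the task's input files.
    output_ratio: assumed output size as a fraction of total input size.
    padding_gb: assumed fixed headroom for temporary files.
    """
    total_input_gb = sum(input_bytes) / 1e9
    needed_gb = total_input_gb * (1.0 + output_ratio) + padding_gb
    return math.ceil(needed_gb)  # round up to whole gigabytes


def expected_vm_cost(hourly_price, hours, retry_overhead=0.0):
    """Estimate VM cost for a task.

    retry_overhead: assumed fraction of extra runtime lost to
    preemptions and restarts (0.0 for a standard VM).
    """
    return hourly_price * hours * (1.0 + retry_overhead)
```

For example, `disk_size_gb([50_000_000_000, 2_500_000_000], output_ratio=0.5)` sizes a disk for 52.5 GB of input plus assumed output and headroom. Comparing `expected_vm_cost` for a standard price against a discounted price with a nonzero `retry_overhead` shows whether preemptible VMs remain cheaper once restarts are accounted for.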
Background We have previously demonstrated that cerebrospinal fluid-derived B cells from early relapsing-remitting multiple sclerosis (RRMS) patients that express a VH4 gene accumulate specific replacement mutations that can be quantified as a score that identifies such patients as having or likely to convert to RRMS. Furthermore, we showed that next generation sequencing is an efficient method for obtaining the sequencing information required by this mutation scoring tool, originally developed using the less clinically viable single-cell Sanger sequencing. Objective To determine the accuracy of MSPrecise, the diagnostic test that identifies the presence of the RRMS-enriched mutation pattern from patient cerebrospinal fluid B cells. Methods Cerebrospinal fluid cell pellets were obtained from RRMS and other neurological disease (OND) patient cohorts. VH4 gene segments were amplified, sequenced by next generation sequencing and analyzed for mutation score. Results The diagnostic test showed a sensitivity of 75% on the RRMS cohort and a specificity of 88% on the OND cohort. The accuracy of the test in identifying RRMS patients or patients that will develop RRMS is 84%. Conclusion MSPrecise exhibits good performance in identifying patients with RRMS irrespective of time with RRMS.
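The sensitivity, specificity, and accuracy figures reported above follow from standard confusion-matrix definitions, sketched below in Python. The function name is hypothetical, and any counts passed to it are illustrative; the abstract does not report the underlying patient counts.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion counts.

    tp: diseased patients the test calls positive
    fn: diseased patients the test calls negative
    tn: non-diseased patients the test calls negative
    fp: non-diseased patients the test calls positive
    """
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy
```

Because accuracy pools both cohorts, it always falls between sensitivity and specificity, weighted by the relative sizes of the diseased and non-diseased groups.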
Background The genes that produce antibodies and the immune receptors expressed on lymphocytes are not germline encoded; rather, they are somatically generated in each developing lymphocyte by a process called V(D)J recombination, which assembles specific, independent gene segments into mature composite genes. The full set of composite genes in an individual at a single point in time is referred to as the immune repertoire. V(D)J recombination is the distinguishing feature of adaptive immunity and enables effective immune responses against an essentially infinite array of antigens. Characterization of immune repertoires is critical in both basic research and clinical contexts. Recent technological advances in repertoire profiling via high-throughput sequencing have resulted in an explosion of research activity in the field. This has been accompanied by a proliferation of software tools for analysis of repertoire sequencing data. Despite the widespread use of immune repertoire profiling and analysis software, there is currently no standardized format for output files from V(D)J analysis. Researchers utilize software such as IgBLAST and IMGT/High V-QUEST to perform V(D)J analysis and infer the structure of germline rearrangements. However, each of these software tools produces results in a different file format, and can annotate the same result using different labels. These differences make it challenging for users to perform additional downstream analyses. Results To help address this problem, we propose a standardized file format for representing V(D)J analysis results. The proposed format, VDJML, provides a common standardized format for different V(D)J analysis applications to facilitate downstream processing of the results in an application-agnostic manner. 
The VDJML file format specification is accompanied by a support library, written in C++ and Python, for reading and writing the VDJML file format. Conclusions The VDJML suite will allow users to streamline their V(D)J analysis and facilitate the sharing of scientific knowledge within the community. The VDJML suite and documentation are available from https://vdjserver.org/vdjml/. We welcome participation from the community in developing the file format standard, as well as code contributions.
Introduction Diffuse large B-cell lymphoma (DLBCL) subtypes can be identified based on immunohistochemistry, somatic mutation and gene expression profiles. These cell-of-origin (COO) subtypes have distinct biological and pathogenic characteristics. In addition, studies have shown the association of COO with drug response such as with rituximab plus cyclophosphamide, doxorubicin, vincristine, and prednisone (R-CHOP) as well as targeted therapy. Therefore, proper assessment of COO subgroup is an important step in treatment selection and outcome. In this study, we sought to develop predictive COO models using RNA-Seq based gene expression profiling and plasma proteomic data, focusing on the two defined major DLBCL subtypes - germinal center B cell-like (GCB) and activated B cell-like (ABC). Methods COO subgroups of patient samples were assigned by the Hans algorithm. Data from archival formalin-fixed paraffin-embedded (FFPE) tissues were obtained using the Illumina HiSeq platform (RNA-Seq). A subset of samples were used as a training set to select differentially expressed genes (DEGs) in ABC vs. GCB lymphomas to build support vector machine (SVM) classification models. The model with best leave-one-out cross validation (LOOCV) on the training set was applied to the remaining samples to assess its initial predictive power. Gene set enrichment analysis (GSEA, Broad Institute) and key pathway analysis (KPA, Clarivate Analytics) were also utilized to further explore the underlying biology of each COO subtype. Protein expression data using the Olink Proteomics platform was obtained from baseline patient plasma samples. Protein biomarkers to differentiate ABC and GCB subgroups were identified from a set of training samples and evaluated in independent cohorts. Due to notable batch effect, batch information was included and specified as a random factor in the model. Results Genes identified by Scott et al. 
(Blood 2014) for COO assignment were first tested in our RNA-Seq training data of 6 GCB and 8 ABC samples. Thirteen of 15 gene markers showed significant differences between the ABC and GCB subgroups. From these markers, we further selected 6 to build machine learning models based on fold change, false discovery rate and entropy. This 6-gene signature includes 3 markers relatively up-regulated in the ABC subtype and 3 up-regulated in the GCB subtype. An SVM model with these genes achieved 100% LOOCV accuracy on the training data and correctly predicted COO of 20/22 samples in the validation cohort, with 1 GCB and 1 ABC sample misclassified. These two samples were also misclassified if a larger panel of signature genes from Scott et al. (Blood 2014) was used. KPA on the DEGs from ABC vs. GCB predicted the activation of NFKB1 and STAT4/5 transcription factors as key elements upstream of the DEGs, indicating promoted signaling of NFκB and STAT pathways in the ABC subgroup. On the other hand, REST was predicted as an inhibited upstream regulator of some DEGs. RCOR1, a corepressor of REST, has a significantly lower expression level in the ABC subgroup in our data. These findings may imply inhibition of the REST/RCOR1 pathway in ABC patients. Plasma protein data from two studies were used to form a training set with 21 GCB and 6 ABC samples. A set of differentially expressed analytes from ABC vs. GCB was identified, which included several targets of the NFκB pathway. In an independent cohort containing 5 GCB and 4 ABC plasma samples, many of these same plasma proteins showed differential expression profiles between ABC and GCB, making them potential blood-based biomarkers for COO determination. Conclusions In this study, we built an SVM model with a subset of genes from Scott et al. (Blood 2014) to accurately predict COO of refractory DLBCL from archival FFPE tissue. Further analyses of the RNA-Seq data disclosed alterations in key transcriptional hubs between the different COO subgroups. 
Olink plasma data from independent cohorts demonstrated potential protein markers for a plasma-based differentiation of the ABC and GCB subtypes. These biomarkers and machine learning models are being further validated using additional datasets. Disclosures Liu: Incyte Research Institute: Employment, Equity Ownership. Lu: Incyte Research Institute: Employment, Equity Ownership. Dong: Incyte Research Institute: Employment, Equity Ownership. Liu: Incyte Research Institute: Employment, Equity Ownership. Salinas: Incyte Research Institute: Employment, Equity Ownership. Owens: Incyte Research Institute: Employment, Equity Ownership. Pratta: Incyte Research Institute: Employment, Equity Ownership. Smith: Incyte Research Institute: Employment, Equity Ownership. Tada: Incyte Research Institute: Employment, Equity Ownership. Newton: Incyte Research Institute: Employment, Equity Ownership. Burn: Incyte Research Institute: Employment, Equity Ownership.
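The leave-one-out cross validation (LOOCV) used for model selection in the DLBCL abstract above can be sketched in plain Python. To keep the sketch dependency-free, a nearest-centroid classifier stands in for the SVM the authors used; it is a different and simpler model, and all function names here are hypothetical. Only the LOOCV loop itself reflects the procedure named in the abstract.

```python
def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose class mean is closest to x.

    Stand-in classifier (NOT an SVM), used only to make the
    cross-validation loop runnable without external libraries.
    """
    sums, counts = {}, {}
    for features, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1

    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        centroid = [s / counts[label] for s in acc]
        # Squared Euclidean distance from x to the class centroid
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(xs, ys, predict=nearest_centroid_predict):
    """Hold out each sample once, train on the rest, and score the hit rate."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)
```

Here `xs` would hold one expression vector per sample (for instance, values for a small gene signature) and `ys` the corresponding "ABC" or "GCB" labels. With a training set as small as the one described, LOOCV uses every sample for both training and evaluation, which is why it is a common choice in this setting.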