Background: Modern genomic and proteomic profiling methods produce large amounts of data from tissue and blood-based samples that are of potential utility for improving patient care. However, designing precision medicine tests for unmet clinical needs from this information, using the small cohorts available for test discovery, remains a challenging task. Obtaining reliable performance assessments at the earliest stages of test development can also be problematic. We describe a novel approach to classifier development designed to create clinically useful tests together with reliable estimates of their performance. The method incorporates elements of traditional and modern machine learning to facilitate the use of cohorts where the number of samples is less than the number of measured patient attributes. It is based on a hierarchy of classification and information abstraction and combines boosting, bagging, and strong dropout regularization.
Results: We apply this dropout-regularized combination approach to two clinical problems in oncology using mRNA expression and associated clinical data and compare performance with other methods of classifier generation, including Random Forest. Performance of the new method is similar to or better than that of Random Forest in the two classification tasks used for comparison. The dropout-regularized combination method also generates an effective classifier in a classification task with a known confounding variable. Most importantly, it provides a reliable estimate of test performance from a relatively small development set of samples.
Conclusions: The flexible dropout-regularized combination approach is able to produce tests tailored to particular clinical questions and mitigate known confounding effects. It allows the design of molecular diagnostic tests addressing particular clinical questions together with reliable assessment of whether test performance is likely to be fit-for-purpose in independent validation at the earliest stages of development.
Electronic supplementary material: The online version of this article (10.1186/s12859-019-2922-2) contains supplementary material, which is available to authorized users.
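The combination of weak classifiers, dropout, and bagging described in this abstract can be illustrated with a small sketch. This is not the authors' implementation: the single-feature threshold classifiers, the gradient-descent logistic regression, the weight averaging, and every function name and parameter value below are assumptions chosen only to keep the example self-contained. It shows how a performance estimate can come from samples left out of each bag rather than from the training samples themselves.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_stump(X, y, j):
    """Weak single-feature classifier: threshold midway between the class means."""
    m0 = sum(r[j] for r, t in zip(X, y) if t == 0) / y.count(0)
    m1 = sum(r[j] for r, t in zip(X, y) if t == 1) / y.count(1)
    thr, sign = (m0 + m1) / 2.0, (1.0 if m1 >= m0 else -1.0)
    return lambda row: 1.0 if sign * (row[j] - thr) > 0 else 0.0

def fit_logistic(M, y, steps=100, lr=0.5):
    """Plain gradient-descent logistic regression on weak-classifier outputs."""
    w, b, n = [0.0] * len(M[0]), 0.0, len(y)
    for _ in range(steps):
        gw, gb = [0.0] * len(w), 0.0
        for row, t in zip(M, y):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, row))) - t
            gb += err
            for i, xi in enumerate(row):
                gw[i] += err * xi
        b -= lr * gb / n
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
    return w, b

def dropout_combine(stumps, X, y, rng, n_iter=20, keep=3):
    """Fit on a few randomly kept stumps per iteration, then average the weights."""
    M = [[s(row) for s in stumps] for row in X]
    w_sum, b_sum, k = [0.0] * len(stumps), 0.0, min(keep, len(stumps))
    for _ in range(n_iter):
        idx = rng.sample(range(len(stumps)), k)  # every other stump is dropped
        w, b = fit_logistic([[m[i] for i in idx] for m in M], y)
        for i, wi in zip(idx, w):
            w_sum[i] += wi
        b_sum += b
    w_avg, b_avg = [v / n_iter for v in w_sum], b_sum / n_iter
    return lambda row: sigmoid(b_avg + sum(wi * s(row) for wi, s in zip(w_avg, stumps)))

def stratified_split(y, train_frac, rng):
    """Pick training indices class by class so both classes are always present."""
    train = []
    for cls in (0, 1):
        idx = [i for i, t in enumerate(y) if t == cls]
        rng.shuffle(idx)
        train += idx[:max(1, int(len(idx) * train_frac))]
    return train

def bagged_oob_scores(X, y, n_bags=10, train_frac=0.67, seed=0):
    """Average each sample's score over the bags in which it was left out."""
    rng, n = random.Random(seed), len(y)
    sums, counts = [0.0] * n, [0] * n
    for _ in range(n_bags):
        train = set(stratified_split(y, train_frac, rng))
        Xt, yt = [X[i] for i in sorted(train)], [y[i] for i in sorted(train)]
        stumps = [make_stump(Xt, yt, j) for j in range(len(X[0]))]
        clf = dropout_combine(stumps, Xt, yt, rng)
        for i in range(n):
            if i not in train:
                sums[i] += clf(X[i])
                counts[i] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]

# Hypothetical cohort: 24 samples, 40 features, only the first 3 informative.
rng = random.Random(42)
y_demo = [0] * 12 + [1] * 12
X_demo = [[rng.gauss(2.0 * t if j < 3 else 0.0, 1.0) for j in range(40)] for t in y_demo]
oob = bagged_oob_scores(X_demo, y_demo)
scored = [(s, t) for s, t in zip(oob, y_demo) if s is not None]
acc = sum((s > 0.5) == (t == 1) for s, t in scored) / len(scored)
print(f"out-of-bag accuracy on {len(scored)} samples: {acc:.2f}")
```

Because each reported score comes only from bags that never saw the sample, the accuracy printed here is an out-of-bag estimate. A sample that happens to be in the training part of every bag gets `None` and is excluded from the estimate.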
Background: Modern molecular profiling techniques are yielding vast amounts of data from patient samples that could be utilized with machine learning methods to provide important biological insights and improvements in patient outcomes. Unsupervised methods have been successfully used to identify molecularly-defined disease subtypes. However, these approaches do not take advantage of potential additional clinical outcome information. Supervised methods can be implemented when training classes are apparent (e.g., responders or non-responders to treatment). However, training classes can be difficult to define when assessing relative benefit of one therapy over another using gold standard clinical endpoints, since it is often not clear how much benefit each individual patient receives.
Results: We introduce an iterative approach to binary classification tasks based on the simultaneous refinement of training class labels and classifiers towards self-consistency. As training labels are refined during the process, the method is well suited to cases where training class definitions are not obvious or are noisy. Clinical data, including time-to-event endpoints, can be incorporated into the approach to enable the iterative refinement to identify molecular phenotypes associated with a particular clinical variable. Using synthetic data, we show how this approach can be used to increase the accuracy of identification of outcome-related phenotypes and their associated molecular attributes. Further, we demonstrate that the advantages of the method persist in real-world genomic datasets, allowing the reliable identification of molecular phenotypes and estimation of their association with outcome that generalizes to validation datasets. We show that at convergence of the iterative refinement, there is a consistent incorporation of the molecular data into the classifier yielding the molecular phenotype and that this allows a robust identification of associated attributes and the underlying biological processes.
Conclusions: The consistent incorporation of the structure of the molecular data into the classifier helps to minimize overfitting and facilitates not only good generalization of classification and molecular phenotypes, but also reliable identification of biologically relevant features and elucidation of underlying biological processes.
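The idea of refining training labels and classifier together until they agree can be sketched in a few lines. This is an illustration under stated assumptions, not the method of the abstract: a leave-one-out nearest-centroid classifier stands in for whatever classifier the authors use, the starting labels come from a median split of hypothetical survival times, and all names and data values are invented for the example.

```python
from statistics import median

def centroid(rows):
    """Component-wise mean of a non-empty list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def loo_predict(X, labels):
    """Classify each sample with centroids built from all the other samples."""
    preds = []
    for i, row in enumerate(X):
        groups = {0: [], 1: []}
        for j, other in enumerate(X):
            if j != i:
                groups[labels[j]].append(other)
        if not groups[0] or not groups[1]:
            preds.append(labels[i])  # one class is empty: keep the current label
            continue
        d0 = sq_dist(row, centroid(groups[0]))
        d1 = sq_dist(row, centroid(groups[1]))
        preds.append(0 if d0 <= d1 else 1)
    return preds

def refine_labels(X, initial_labels, max_rounds=20):
    """Replace labels by leave-one-out predictions until they stop changing."""
    labels = list(initial_labels)
    for round_no in range(1, max_rounds + 1):
        new_labels = loo_predict(X, labels)
        if new_labels == labels:  # labels and classifier are self-consistent
            return labels, round_no
        labels = new_labels
    return labels, max_rounds

# Hypothetical molecular data: two well-separated groups of four samples each.
X_demo = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]]
# Hypothetical survival times; two samples have outcomes atypical for their group.
times = [30, 28, 25, 9, 22, 8, 6, 5]
cut = median(times)
initial = [1 if t < cut else 0 for t in times]  # 1 = shorter survival
refined, rounds = refine_labels(X_demo, initial)
print("initial:", initial)
print("refined:", refined, "after", rounds, "rounds")
```

The median split assigns two samples to the group that does not match their molecular profile. The refinement moves them because the leave-one-out classifier, which never sees a sample's own label, consistently places them with their molecular neighbours. A time-to-event endpoint enters here only through the starting labels, which is one simple way to incorporate it.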
The remarkable success of immune checkpoint inhibitors (ICIs) has given hope of a cure for some patients with advanced cancer; however, the fraction of responding patients is 15–35%, depending on tumor type, and the proportion of durable responses is even smaller. Identification of biomarkers with strong predictive potential remains a priority. Until now, most efforts have focused on biomarkers associated with the assumed mechanism of action of ICIs, such as levels of expression of programmed death-ligand 1 (PD-L1) and mutation load in tumor tissue as a proxy for immunogenicity; however, their performance is unsatisfactory. Several assays designed to capture the complexity of the disease by measuring the immune response in the tumor microenvironment show promise but still need validation in independent studies. The circulating proteome contains an additional layer of information characterizing tumor–host interactions that can be integrated into multivariate tests using modern machine learning techniques. Here we describe several validated serum-based proteomic tests and their utility in the context of ICIs. We discuss test performance, demonstrate the tests' independence from currently used biomarkers, and discuss various aspects of the associated biological mechanisms. We propose that serum-based multivariate proteomic tests add a missing piece to the puzzle of predicting benefit from ICIs.
Mass-spectrometry-based analyses have identified a variety of candidate protein biomarkers that might be crucial for epithelial ovarian cancer (EOC) development and therapy response. Comprehensive validation studies of the biological and clinical implications of these proteomic findings are needed to advance them toward clinical use. Using the Deep MALDI method of mass spectrometry, we developed and independently validated (development cohort: n = 199, validation cohort: n = 135) a blood-based proteomic classifier that stratifies EOC patients into good and poor survival groups. We also found that the prognostic performance of this classifier depends on age, and our protein set enrichment analysis showed that the good and poor proteomic phenotypes were associated with, respectively, lower and higher levels of complement activation, inflammatory response, and acute phase reactants. This work highlights that, just like molecular markers of the tumor itself, the systemic condition of a patient (partly reflected in proteomic patterns) also influences survival and therapy response in a subset of ovarian cancer patients and could therefore be integrated into future therapy planning.