The accuracy with which cancer phenotypes can be predicted by selecting and combining molecular features is compromised by the large number of potential features available. In an effort to design a robust prognostic model to predict breast cancer survival, we hypothesized that signatures consisting of genes that are coexpressed in multiple cancer types should correspond to molecular events that are prognostic in all cancers, including breast cancer. We previously identified several such signatures, called attractor metagenes, in an analysis of multiple tumor types. We then tested our attractor metagene hypothesis as participants in the Sage Bionetworks-DREAM Breast Cancer Prognosis Challenge. Using a rich training data set that included gene expression and clinical features for breast cancer patients, we developed a prognostic model that was independently validated in a newly generated patient data set. We describe our model, which was based on three attractor metagenes associated, respectively, with mitotic chromosomal instability, mesenchymal transition, and lymphocyte-based immune recruitment.

INTRODUCTION

Medical tests that incorporate molecular profiling of tumors for clinical decision-making (predictive tests) or prognosis (prognostic tests) are typically based on models that combine values associated with particular molecular features, such as the expression levels of specific genes. These genes are selected after analyzing rich gene expression data sets (acquired by profiling patient tumors) annotated with clinical phenotypes such as drug responses or survival times. The data sets used to define a model are referred to as "training data sets." A computational technique is typically used to identify a number of genes that, when properly combined, are associated with a phenotype of interest in a statistically significant manner.
The predictive power of the resulting model is later confirmed in independent "validation data sets."

There are, however, vast numbers (tens or hundreds of thousands) of potentially relevant molecular features to choose from when developing a model, making it difficult to precisely identify those at the core of the biological mechanisms responsible for the phenotype of interest. Spurious or suboptimal predictions may occur, and the end result may be a model that only partly reflects physiological reality. Such a model may still be clinically useful, but there is room for improvement.

One way to address this problem is to use molecular features preselected on the basis of prior knowledge. In such an approach, a training data set is used mainly to pinpoint the combination of preselected features that is most associated with the phenotype of interest. We used this approach during our participation in the Sage Bionetworks-DREAM Breast Cancer Prognosis Challenge, an open challenge to build computational models that accurately predict breast cancer survival (hereinafter referred to as the Challenge) (1). Specifically, we hypothesized that selected gene coexpression signatures present in multiple cancer...
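To make "associated with a phenotype of interest" concrete for survival data, the sketch below scores a model's risk predictions against censored survival times. It assumes Harrell's concordance index as the metric. That is a common choice for survival-prediction challenges, but the text above does not name it. `linear_risk` is a hypothetical combiner standing in for "properly combined" preselected features. It is not the authors' model.

```python
def concordance_index(times, events, risk):
    """Harrell's concordance index for right-censored survival data (assumed metric).

    times  -- follow-up time per patient
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    risk   -- model output; higher means worse predicted prognosis
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:  # i had the event first; tied times are skipped
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def linear_risk(samples, weights):
    """Hypothetical combiner: weighted sum of preselected feature values per patient."""
    return [sum(w * s[name] for name, w in weights.items()) for s in samples]
```

A value of 1.0 means every comparable pair is ordered correctly, 0.5 is chance level, and 0.0 is perfectly reversed. In the approach described above, the training set would be used to choose the weights or feature combination that maximizes such a score. The validation set then confirms the result.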
first draft of paper, designed experiments, performed statistical analyses, performed bioinformatics analyses, performed data visualisation. M.T. wrote first draft of paper, designed experiments, generated tools & reagents, performed statistical analyses, performed bioinformatics analyses, performed data visualisation. S.M.G.E. wrote first draft of paper, generated tools & reagents, performed bioinformatics analyses, performed data visualisation. A.G.D. wrote first draft of paper, designed experiments, generated tools & reagents, performed bioinformatics analyses. M.D. generated tools & reagents. S.D. generated tools & reagents. L.Y.L. generated tools & reagents. S.S. generated tools & reagents. H.Z. generated tools & reagents. K.Z. generated tools & reagents, performed bioinformatics analyses. T.O.Y. generated tools & reagents, performed bioinformatics analyses. J.M.C. generated tools & reagents. A.B. generated tools & reagents. C.M.L. generated tools & reagents. I.U. generated tools & reagents. B.L. generated tools & reagents. W.Z. generated tools & reagents. A.D.E. generated tools & reagents, supervised research. N.M.W. performed bioinformatics analyses, performed data visualisation. J.A.W. performed bioinformatics analyses. M.K.H.Z. performed bioinformatics analyses. C.V.A. performed bioinformatics analyses. C.P. performed data visualisation. J.T.S. supervised research. J.M.S. supervised research. D.A. supervised research. Y.G. supervised research. K.E. wrote first draft of paper, supervised research. D.C.W. designed experiments, supervised research. Q.D.M. wrote first draft of paper, designed experiments, generated tools & reagents, supervised research. P.V.L. wrote first draft of paper, designed experiments, supervised research. P.C.B. wrote first draft of paper, designed experiments, supervised research.
Mining gene expression profiles has proven valuable for identifying signatures that serve as surrogates of cancer phenotypes. However, the similarities of such signatures across different cancer types have not been strong enough to conclude that they represent a universal biological mechanism shared among multiple cancer types. Here we present a computational method that generates signatures through an iterative process converging to one of several precise attractors, each defining a signature that represents a biomolecular event, such as cell transdifferentiation or the presence of an amplicon. By analyzing rich gene expression datasets from different cancer types, we identified several such biomolecular events, some of which are present in nearly identical form in all tested cancer types. Although the method is unsupervised, we show that it often leads to attractors with strong phenotypic associations. We present several such multi-cancer attractors, focusing on three that are prominent and sharply defined in all cases: a mesenchymal transition attractor strongly associated with tumor stage, a mitotic chromosomal instability attractor strongly associated with tumor grade, and a lymphocyte-specific attractor.
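The iterative process described above can be sketched as a fixed-point iteration. Start from a seed gene and weight every gene by a power of its association with the current metagene. Recompute the metagene as the weighted average of all genes, and repeat until it stops changing. Published descriptions of attractor metagenes use a mutual-information-based association measure. This dependency-free sketch swaps in Pearson correlation clipped at zero. The exponent (`power=5.0`), tolerance, and all function names are assumptions for illustration.

```python
import math

def zscore(values):
    """Standardize one gene's expression across samples."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd if sd > 0 else 0.0 for v in values]

def pearson(x, y):
    """Pearson correlation (stand-in for mutual information in this sketch)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def find_attractor(expr, seed, power=5.0, tol=1e-7, max_iter=100):
    """Hypothetical helper: iterate metagene -> gene weights -> metagene.

    expr -- dict of gene name -> expression values (same samples for every gene)
    seed -- gene whose profile is the starting metagene
    Returns (metagene, weights) with weights[g] = max(corr(metagene, g), 0) ** power.
    """
    genes = {g: zscore(v) for g, v in expr.items()}
    metagene = genes[seed]
    weights = {}
    for _ in range(max_iter):
        # Association of each gene with the current metagene; negatives clipped to 0.
        weights = {g: max(pearson(metagene, v), 0.0) ** power for g, v in genes.items()}
        total = sum(weights.values())
        # New metagene: weighted average of the standardized gene profiles.
        new = [sum(weights[g] * genes[g][j] for g in genes) / total
               for j in range(len(metagene))]
        converged = max(abs(a - b) for a, b in zip(new, metagene)) < tol
        metagene = new
        if converged:
            break
    return metagene, weights
```

The high exponent sharpens the weights. Tightly coexpressed genes keep weights near 1, while loosely related genes are pushed toward 0. The genes with the largest weights at convergence define the attractor signature.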
Background: The winning model of the Sage Bionetworks/DREAM Breast Cancer Prognosis Challenge made use of several molecular features, called attractor metagenes, as well as another metagene defined by the average expression level of the two genes FGD3 and SUSD3. This is a follow-up study toward developing a breast cancer prognostic test derived from, and improving upon, that model.

Methods: We designed a feature selector that calculates the prognostic scores of combinations of features, including those we had used earlier as well as those used in existing breast cancer biomarker assays, to identify the optimal selection of features for the test.

Results: The resulting test, called BCAM (Breast Cancer Attractor Metagenes), is universally applicable to all clinical subtypes and stages of breast cancer and does not use breast cancer molecular subtype or hormonal status information, neither of which provided additional prognostic value. BCAM is composed of several molecular features: the breast cancer-specific FGD3-SUSD3 metagene, four attractor metagenes present in multiple cancer types (CIN, MES, LYM, and END), three additional individual genes (CD68, DNAJB9, and CXCL12), tumor size, and the number of positive lymph nodes.

Conclusions: Our analysis leads to the unexpected and remarkable suggestion that ER, PR, and HER2 status, or molecular subtype classification, do not provide additional prognostic value when the values of the FGD3-SUSD3 and attractor metagenes are taken into consideration.

Impact: Our results suggest that BCAM's prognostic predictions show potential to outperform those resulting from existing breast cancer biomarker assays. Cancer Epidemiol Biomarkers Prev; 23(12); 2850-6. ©2014 AACR.
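Two steps from the Background and Methods above can be sketched in code. The first builds a two-gene metagene as an average of expression values. The second is a feature selector that scores every combination of candidate features. Assumptions: each gene is z-scored before averaging, although the text says only "average expression level". The selector is an exhaustive search with the prognostic scoring function passed in as `score_fn`, for example a concordance index computed on training data. `average_metagene` and `best_combination` are hypothetical names, not the authors' code.

```python
import itertools
import math

def zscore(values):
    """Standardize one gene's expression across samples."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd if sd > 0 else 0.0 for v in values]

def average_metagene(expr, genes):
    """Per-sample average of z-scored expression across `genes` (assumed normalization)."""
    rows = [zscore(expr[g]) for g in genes]
    return [sum(col) / len(rows) for col in zip(*rows)]

def best_combination(candidates, score_fn, max_size=None):
    """Exhaustively score every non-empty subset of candidate feature names.

    score_fn(subset) -> float, higher is better (e.g. a training-set C-index).
    Returns (best_subset, best_score).
    """
    names = sorted(candidates)
    max_size = max_size or len(names)
    best, best_score = None, -math.inf
    for k in range(1, max_size + 1):
        for subset in itertools.combinations(names, k):
            score = score_fn(subset)
            if score > best_score:
                best, best_score = subset, score
    return best, best_score
```

Exhaustive search costs grow as 2^n in the number of candidates, so `max_size` caps the subset size. For a larger pool, a greedy forward selection would be the usual substitute.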