Extracting relevant information from large-scale data offers unprecedented opportunities in cancer research. We applied independent component analysis (ICA) to bladder cancer transcriptome data sets and interpreted the components using gene enrichment analysis and tumor-associated molecular, clinicopathological, and processing information. We identified components associated with biological processes of tumor cells or the tumor microenvironment, while other components revealed technical biases. Applying ICA to nine cancer types identified cancer-shared and bladder-cancer-specific components. We characterized the luminal and basal-like subtypes of muscle-invasive bladder cancers according to the components identified. The study of the urothelial differentiation component, specific to the luminal subtypes, showed that a molecular urothelial differentiation program was maintained even in those luminal tumors that had lost morphological differentiation. Study of the genomic alterations associated with this component, coupled with functional studies, revealed a protumorigenic role for PPARG in luminal tumors. Our results support the inclusion of ICA in the exploitation of multiscale data sets.
Biomarker discovery from high-dimensional data is a crucial problem with numerous applications in biology and medicine. It is also extremely challenging from a statistical viewpoint, yet surprisingly few studies have investigated the relative strengths and weaknesses of the many existing feature selection methods. In this study we compare feature selection methods on public gene expression datasets for breast cancer prognosis, in terms of the predictive performance, stability, and functional interpretability of the signatures they produce. We observe that the feature selection method has a significant influence on the accuracy, stability, and interpretability of signatures. Surprisingly, complex wrapper and embedded methods generally do not outperform simple univariate feature selection methods, and ensemble feature selection generally has no positive effect. Overall, a simple Student's t-test seems to provide the best results.
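To make the univariate approach mentioned above concrete, here is a minimal sketch of t-test-based feature selection in plain Python. It is an illustration under stated assumptions, not the study's actual pipeline: the function names (`student_t`, `select_top_k`), the pooled-variance form of the statistic, the ranking by absolute t value, and the toy expression matrix are all hypothetical choices made for this example.

```python
import math
from statistics import mean, variance


def student_t(a, b):
    """Two-sample Student's t statistic using the pooled variance.

    Assumes each group has at least two observations.
    """
    na, nb = len(a), len(b)
    diff = mean(a) - mean(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    if pooled == 0:
        # Both groups are constant: no difference gives 0, otherwise the
        # groups are perfectly separated and the statistic is unbounded.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(pooled * (1 / na + 1 / nb))


def select_top_k(X, y, k):
    """Rank features by |t| between classes 0 and 1; return the top-k indices.

    X is a list of samples (rows) and y holds a 0/1 label per sample.
    """
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        group0 = [row[j] for row, label in zip(X, y) if label == 0]
        group1 = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append(abs(student_t(group0, group1)))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Hypothetical toy data: 6 samples, 3 "genes"; only gene 0 differs by class.
X = [
    [1.0, 5.0, 2.0],
    [1.2, 4.0, 2.1],
    [0.9, 6.0, 1.9],
    [3.0, 5.5, 2.0],
    [3.1, 4.5, 2.2],
    [2.9, 5.0, 1.8],
]
y = [0, 0, 0, 1, 1, 1]

selected = select_top_k(X, y, k=1)
print(selected)
```

Each feature is scored independently of the others, which is what makes the method univariate. In practice the selection step would be repeated inside each cross-validation fold so that test samples do not influence which features are chosen.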