Motivation
The use of multi-omics data carrying comprehensive signals about a disease is strongly desirable for understanding and predicting disease progression, particularly for cancer, a serious disease with a high mortality rate. However, existing methods fail to effectively utilize multi-omics data for cancer survival prediction, which significantly limits the accuracy of survival prediction from omics data.
Results
In this work, we constructed a deep learning model with multimodal representation and integration to predict patient survival from multi-omics data. We first developed an unsupervised learning component to extract high-level feature representations from omics data of different modalities. We then employed an attention-based method to integrate these feature representations into a single compact vector, which we finally fed into fully connected layers for survival prediction. We trained the model on multimodal data to predict pan-cancer survival, and the results show that using multimodal data leads to higher prediction accuracy than using single-modality data. Furthermore, we compared our proposed method with current state-of-the-art methods using the concordance index and 5-fold cross-validation; our model achieves better performance on the majority of cancer types in our testing datasets.
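The attention-based integration step described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the modality names, embedding size, and the random stand-in for a learned attention vector are all assumptions. Each modality's encoder output is scored against a shared attention vector, the scores are softmax-normalized, and the weighted sum yields the single compact patient representation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical per-modality embeddings (modality names are illustrative),
# each already reduced to a d-dimensional vector by an unsupervised encoder.
d = 16
embeddings = {
    "mrna": rng.standard_normal(d),
    "mirna": rng.standard_normal(d),
    "methylation": rng.standard_normal(d),
}

# Stand-in for a learned attention parameter vector.
w = rng.standard_normal(d)

H = np.stack(list(embeddings.values()))  # (n_modalities, d)
scores = H @ w                           # one scalar score per modality
alpha = softmax(scores)                  # attention weights, sum to 1
fused = alpha @ H                        # compact fused vector, shape (d,)
```

In a trained model, `w` (or a small scoring network in its place) would be learned jointly with the survival head, so modalities that are more informative for a given patient receive larger weights.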
Availability
https://github.com/ZhangqiJiang07/MultimodalSurvivalPrediction
The advancement of omics technology has led to a surge in molecular and cell profiling data for mechanism studies. The large volume and complex structure of these data pose a great challenge to analysis. Modern machine learning methods such as deep learning are expected to take advantage of such big data for accurate disease prediction and related tasks. However, a large number of features may carry substantial redundant information and adversely affect the accuracy of a classifier. Feature selection methods can remove this redundancy and help a model achieve higher accuracy by retaining only informative features. In this paper, we propose a two-step deep learning-based feature selection method that combines stacked denoising autoencoders (SDAE) with SVM-RFE. We compared our method with other related methods, and the results show that our approach achieves better performance on the TCGA datasets.
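The SVM-RFE step of the two-step method can be sketched with scikit-learn's `RFE` wrapper around a linear SVM: fit the SVM, rank features by the magnitude of their weights, drop the weakest, and repeat until the target number of features remains. The synthetic data, feature counts, and signal structure below are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.feature_selection import RFE

rng = np.random.default_rng(0)

# Tiny synthetic stand-in for encoder outputs: 40 samples, 20 features,
# where only the first 3 features carry any class signal.
X = rng.standard_normal((40, 20))
y = (X[:, :3].sum(axis=1) > 0).astype(int)

# SVM-RFE: recursively eliminate the feature with the smallest |coef_|
# from a linear SVM until 5 features remain.
selector = RFE(SVC(kernel="linear"), n_features_to_select=5, step=1)
selector.fit(X, y)

selected = np.flatnonzero(selector.support_)  # indices of kept features
```

In the paper's pipeline, the input to this stage would be the SDAE's learned representation rather than raw features; the recursive elimination itself is unchanged.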