High-dimensional omics data contain intrinsic biomedical information that is crucial for personalised medicine. Nevertheless, it is challenging to capture this information from genome-wide data because of the large number of molecular features and the small number of available samples, a problem known in machine learning as “the curse of dimensionality”. To tackle this problem and pave the way for machine learning-aided precision medicine, we proposed a unified multi-task deep learning framework named OmiEmbed, which captures biomedical information from high-dimensional omics data with a deep embedding module and downstream task modules. The deep embedding module learnt an omics embedding that mapped multiple omics data types into a latent space with lower dimensionality. Based on this new representation of multi-omics data, different downstream task modules were trained simultaneously and efficiently with the multi-task strategy to predict the comprehensive phenotype profile of each sample. OmiEmbed supports multiple tasks for omics data, including dimensionality reduction, tumour type classification, multi-omics integration, demographic and clinical feature reconstruction, and survival prediction. The framework outperformed other methods on all three types of downstream tasks and achieved better performance with the multi-task strategy than when the tasks were trained individually. OmiEmbed is a powerful and unified framework that can be widely adapted to various applications of high-dimensional omics data and has great potential to facilitate more accurate and personalised clinical decision making.
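The multi-task idea described above can be illustrated with a minimal sketch: one shared low-dimensional embedding feeds several task heads, and their losses are summed into a single training objective. This is not the OmiEmbed implementation. The plain deterministic encoder, the two heads (`tumour_type`, `age`), the layer sizes, and the task weights are all assumptions chosen for illustration, and no gradient step is shown.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Hypothetical helper: small random weights and zero biases for a dense layer."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x, weights, bias):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax_cross_entropy(logits, label):
    """Cross-entropy of a softmax over `logits` against an integer class label."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def multi_task_loss(x, params, targets, task_weights):
    """Return (total loss, per-task losses, latent vector) for one sample."""
    # Shared embedding: high-dimensional input -> low-dimensional latent vector.
    z = [math.tanh(v) for v in linear(x, *params["encoder"])]
    # Embedding loss: mean squared reconstruction error (assumed choice).
    x_hat = linear(z, *params["decoder"])
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    # Classification head on the shared embedding.
    cls = softmax_cross_entropy(linear(z, *params["tumour_type"]), targets["tumour_type"])
    # Regression head on the same embedding (squared error on a scaled age).
    reg = (linear(z, *params["age"])[0] - targets["age"]) ** 2
    # Multi-task objective: embedding loss plus weighted downstream losses.
    total = recon + task_weights["tumour_type"] * cls + task_weights["age"] * reg
    return total, {"recon": recon, "tumour_type": cls, "age": reg}, z


rng = random.Random(0)
n_features, n_latent, n_classes = 20, 4, 3  # illustrative sizes only
params = {
    "encoder": init_layer(n_features, n_latent, rng),
    "decoder": init_layer(n_latent, n_features, rng),
    "tumour_type": init_layer(n_latent, n_classes, rng),
    "age": init_layer(n_latent, 1, rng),
}
x = [rng.random() for _ in range(n_features)]
targets = {"tumour_type": 1, "age": 0.6}
task_weights = {"tumour_type": 1.0, "age": 0.5}
total, parts, z = multi_task_loss(x, params, targets, task_weights)
print(len(z), round(total, 4))
```

Because every head reads the same latent vector, minimising the summed loss pushes the embedding to serve all tasks at once, which is the intuition behind training the downstream modules simultaneously rather than individually.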
The recent outbreak of the severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), which causes coronavirus disease 2019 (COVID-19), has led to a worldwide pandemic. One week after initial symptoms develop, a subset of patients progresses to severe disease, with high mortality and limited treatment options. To design novel interventions aimed at preventing spread of the virus and reducing progression to severe disease, detailed knowledge of the cell types and regulating factors driving cellular entry is urgently needed. Here we assess the expression patterns of genes required for SARS-CoV-2 entry into cells and replication, and their regulation by genetic, epigenetic and environmental factors, throughout the respiratory tract using samples collected from the upper (nasal) and lower airways (bronchi). Matched samples from the upper and lower airways show a clear increased expression of these genes in the nose compared to the bronchi and parenchyma. Cellular deconvolution indicates a clear association of these genes with the proportion of secretory epithelial cells. Smoking was found to increase the expression of the majority of COVID-19 related genes, including ACE2 and TMPRSS2, but only in the lower airways; this was associated with a significant increase in the predicted proportion of goblet cells in bronchial samples of current smokers. Both acute and second-hand smoke exposure were found to increase ACE2 expression in the bronchus. Inhaled corticosteroids decrease ACE2 expression in the lower airways. No significant effect of genetics on ACE2 expression was observed, but a strong association of DNA methylation with ACE2 and TMPRSS2 mRNA expression was identified in the bronchus.
The lack of explainability is one of the most prominent disadvantages of deep learning applications in omics. This ‘black box’ problem can undermine the credibility and limit the practical implementation of biomedical deep learning models. Here we present XOmiVAE, a variational autoencoder (VAE)-based interpretable deep learning model for cancer classification using high-dimensional omics data. XOmiVAE is capable of revealing the contribution of each gene and latent dimension to each classification prediction, and the correlation between each gene and each latent dimension. We also demonstrate that XOmiVAE can explain not only the supervised classification but also the unsupervised clustering results from the deep learning network. To the best of our knowledge, XOmiVAE is one of the first activation level-based interpretable deep learning models explaining novel clusters generated by VAE. The explanations generated by XOmiVAE were validated by both the performance of downstream tasks and biomedical knowledge. In our experiments, XOmiVAE explanations of deep learning-based cancer classification and clustering aligned with current domain knowledge, including biological annotation and academic literature, which shows great potential for novel biomedical knowledge discovery from deep learning models.
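To make "the contribution of each gene to a prediction" concrete, the sketch below uses a simple reference-substitution (occlusion) attribution: each feature is replaced by a reference value in turn, and the resulting drop in the class score is taken as that feature's contribution. This is a swapped-in illustrative technique, not XOmiVAE's method, which the text above describes only as activation level-based. The linear scorer `class_score`, the weights, and the sample values are hypothetical.

```python
def class_score(x, weights, bias):
    """Hypothetical stand-in for a trained model's score for one class."""
    return sum(w * v for w, v in zip(weights, x)) + bias


def occlusion_contributions(x, reference, score_fn):
    """Score change when each feature is individually reset to its reference value."""
    base = score_fn(x)
    contributions = []
    for i in range(len(x)):
        occluded = list(x)
        occluded[i] = reference[i]  # "switch off" feature i
        contributions.append(base - score_fn(occluded))
    return contributions


# Illustrative values only: four "genes", one sample, one reference profile.
weights = [0.8, -0.5, 0.0, 1.2]
bias = 0.1
sample = [2.0, 1.0, 3.0, 0.5]
reference = [1.0, 1.0, 1.0, 1.0]


def score_fn(v):
    return class_score(v, weights, bias)


contrib = occlusion_contributions(sample, reference, score_fn)
# Rank features by absolute contribution, largest first.
ranked = sorted(range(len(sample)), key=lambda i: abs(contrib[i]), reverse=True)
print(ranked, [round(c, 3) for c in contrib])
```

For a linear scorer the contributions add up exactly to the difference between the sample's score and the reference's score; for a deep network they generally do not, which is one reason dedicated attribution methods are used in practice.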