Background: Semantic similarity between Gene Ontology (GO) terms is a fundamental measure for many bioinformatics applications, such as determining functional similarity between genes or proteins. Most previous research exploited information content to estimate the semantic similarity between GO terms; more recently, some studies have exploited word embeddings to learn vector representations for GO terms from a large-scale corpus. In this paper, we propose a novel method, named GO2Vec, that exploits graph embeddings to learn vector representations for GO terms from the GO graph. GO2Vec combines the information from both the GO graph and GO annotations, and its learned vectors can be applied to a variety of bioinformatics applications, such as calculating functional similarity between proteins and predicting protein-protein interactions. Results: We conducted two kinds of experiments to evaluate the quality of GO2Vec: (1) functional similarity between proteins on the Collaborative Evaluation of GO-based Semantic Similarity Measures (CESSM) dataset and (2) prediction of protein-protein interactions on the Yeast and Human datasets from the STRING database. Experimental results demonstrate the effectiveness of GO2Vec over the information content-based measures and the word embedding-based measures. Conclusion: Our experimental results demonstrate the effectiveness of using graph embeddings to learn vector representations from undirected GO and GOA graphs. Our results also demonstrate that GO annotations provide useful information for computing the similarity between GO terms and between proteins.
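As a rough illustration of how learned GO term vectors could be turned into a protein-level functional similarity, the sketch below scores term pairs by cosine similarity and aggregates them with a best-match average. The abstract does not say which aggregation GO2Vec uses, so the best-match average is an assumption, and the term identifiers and two-dimensional vectors are made-up toy values.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def best_match_average(terms_a, terms_b, vectors):
    """Protein similarity from two sets of annotated GO terms.

    Assumed aggregation (not taken from the abstract): for each term of one
    protein take its best cosine match among the other protein's terms,
    average those best matches, and symmetrise over both directions.
    """
    def directed(src, dst):
        return sum(max(cosine(vectors[s], vectors[d]) for d in dst) for s in src) / len(src)
    return 0.5 * (directed(terms_a, terms_b) + directed(terms_b, terms_a))

# Hypothetical term identifiers and toy embeddings, for illustration only.
toy_vectors = {
    "GO:A": [1.0, 0.0],
    "GO:B": [0.0, 1.0],
    "GO:C": [1.0, 1.0],
}

if __name__ == "__main__":
    print(best_match_average(["GO:A", "GO:B"], ["GO:C"], toy_vectors))
```

Real embeddings would have many more dimensions, and each protein's term set would come from its GO annotations.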
Multi-omics data are increasingly being gathered for investigations of complex diseases such as cancer. However, high dimensionality, small sample size, and heterogeneity of different omics types pose huge challenges to integrated analysis. In this paper, we evaluate two network-based approaches for integration of multi-omics data in an application of clinical outcome prediction of neuroblastoma. As the first step, we derive Patient Similarity Networks (PSNs) for individual omics data by computing distances among patients from omics features. The fusion of different omics can be investigated in two ways: network-level fusion is achieved using the Similarity Network Fusion algorithm to fuse the PSNs derived for individual omics types, and feature-level fusion is achieved by fusing the network features obtained from individual PSNs. We demonstrate our methods on two high-risk neuroblastoma datasets from the SEQC project and the TARGET project. We propose Deep Neural Network and Machine Learning methods with Recursive Feature Elimination as the predictors of survival status of neuroblastoma patients. Our results indicate that network-level fusion outperformed feature-level fusion for integration of different omics data, whereas feature-level fusion is more suitable for incorporating different feature types derived from the same omics type. We conclude that the network-based methods are capable of handling heterogeneity and high dimensionality well in the integration of multi-omics.
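The first step described above, deriving a patient similarity network from omics features, and the feature-level fusion route can be sketched as follows. This is a minimal sketch under stated assumptions: the abstract names neither the distance nor the network features, so Euclidean distance, a Gaussian kernel with bandwidth `sigma`, and weighted degree as the single network feature are all illustrative choices. The Similarity Network Fusion algorithm itself is not implemented here.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def patient_similarity_network(profiles, sigma=1.0):
    """Dense symmetric similarity matrix over patients for one omics type.

    Assumption: distances are mapped to similarities with a Gaussian kernel,
    exp(-d^2 / (2 * sigma^2)); the abstract does not specify the mapping.
    """
    n = len(profiles)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            d = euclidean(profiles[i], profiles[j])
            sim[i][j] = sim[j][i] = math.exp(-d * d / (2.0 * sigma * sigma))
    return sim

def weighted_degree(sim):
    """One simple per-patient network feature: summed similarity to all other patients."""
    return [sum(w for j, w in enumerate(row) if j != i) for i, row in enumerate(sim)]

def feature_level_fusion(networks):
    """Concatenate per-patient network features across omics-specific networks."""
    per_network = [weighted_degree(sim) for sim in networks]
    return [list(features) for features in zip(*per_network)]

if __name__ == "__main__":
    # Toy data: three patients, two hypothetical omics types.
    omics_one = [[0.0, 0.0], [0.0, 0.0], [3.0, 4.0]]
    omics_two = [[1.0], [2.0], [1.5]]
    networks = [patient_similarity_network(p, sigma=5.0) for p in (omics_one, omics_two)]
    print(feature_level_fusion(networks))
```

The fused per-patient feature vectors would then be passed to a downstream classifier; network-level fusion would instead combine the similarity matrices before any features are extracted.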
For the past few decades, a key objective of rational drug discovery has been the design of specific and selective ligands for target proteins. Infectious diseases like malaria are continuously becoming resistant to traditional medicines, which creates a need for new approaches to design inhibitors for antimalarial targets. A novel method for ab initio design of multi-target-specific pharmacophores, using the interaction field maps of the active sites of multiple proteins, has been developed to design 'specificity' pharmacophores for aspartic proteases. The molecular interaction field grid maps of the active sites of aspartic proteases (plasmepsin II & IV from Plasmodium falciparum, plasmepsin from Plasmodium vivax, pepsin & cathepsin D from human) are calculated, and common pharmacophoric features for favourable binding spots in the active sites are extracted in the form of cliques of graphs using inductive logic programming (ILP). Two pharmacophore ensembles are constructed from the largest common cliques by imposing the size of the receptor active site (L) and domain-specific receptor-ligand information (S). The overlap of chemical space between the two ensembles and the results of virtual screening of an inhibitor database with known activities show that this method can design efficient pharmacophores with no prior ligand information.
De novo design of drugs uses the three-dimensional structure of a target protein (often called the receptor) to design molecules (or ligands) that could bind to the receptor and hence inhibit its functioning. Thus, unlike a ligand-based approach, this form of drug design does not require prior knowledge of inhibitors. In this paper, the three-dimensional structure of a receptor is used indirectly, in the form of molecular interaction fields of the receptor and small molecules (or probes). In addition, we use domain-specific constraints encoding basic geometric and pharmacological requirements imposed by the target. Interaction energies of one or more targets with a set of probes are used to identify three-dimensional constraints that occur in many (preferably all) targets. In a graph-theoretic sense, the constraints are (small, fixed-size) cliques in graphs whose labelled vertices represent probe-specific points of high interaction energy and whose edges are labelled by the three-dimensional distance between the corresponding points of interaction. Our interest is in the discovery of frequent cliques that satisfy domain-specific constraints. In the paper, the discovery of such patterns is done using an Inductive Logic Programming (ILP) engine. The case for the use of ILP stems primarily from the explicit ways of incorporating domain constraints, but any other technique capable of discovering frequent cliques from data can be used with some additional effort. The frequent cliques discovered are used to hypothesize pharmacophore-like structures on potential ligands. We test the utility of this approach by conducting a case study on the discovery of anti-malarials. Specifically, we test the approach on proteins belonging to the class of aspartic proteases. We are particularly interested in plasmepsin II, which is an enzyme in the haemoglobin degradation pathway of Plasmodium falciparum.
We assess the pharmacophore-like constraints using: (a) a database of known inhibitors and non-inhibitors of aspartic proteases; and (b) a database of decoys that are physico-chemically similar to the aspartic proteases. Our results suggest that the
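The graph-theoretic view in this abstract (labelled interaction points, distance-labelled edges, and small fixed-size cliques that recur across targets) can be illustrated with a brute-force enumeration. This sketch is not the ILP engine the paper uses; it swaps in plain exhaustive search, restricts cliques to size three, bins distances with an assumed `bin_width`, and uses hypothetical probe labels and coordinates. The order-independent signature also does not distinguish mirror-image triangles.

```python
import math
from collections import Counter
from itertools import combinations

def triangle_signatures(points, bin_width=1.0):
    """Order-independent signatures of all 3-cliques among labelled 3D points.

    points: list of (probe_label, (x, y, z)) tuples.
    Each edge is described by its sorted pair of probe labels and its binned
    length; a triangle's signature is the sorted tuple of its three edges.
    """
    signatures = set()
    for trio in combinations(points, 3):
        edges = []
        for (label_p, xyz_p), (label_q, xyz_q) in combinations(trio, 2):
            dist_bin = round(math.dist(xyz_p, xyz_q) / bin_width)
            edges.append((tuple(sorted((label_p, label_q))), dist_bin))
        signatures.add(tuple(sorted(edges)))
    return signatures

def frequent_triangles(targets, min_support, bin_width=1.0):
    """Signatures that occur in at least min_support of the given targets."""
    support = Counter()
    for points in targets.values():
        # A set is passed in, so each target counts a signature at most once.
        support.update(triangle_signatures(points, bin_width))
    return {sig for sig, count in support.items() if count >= min_support}

# Hypothetical targets with made-up probe labels and coordinates.
toy_targets = {
    "target_1": [("donor", (0, 0, 0)), ("acceptor", (3, 0, 0)), ("hydrophobic", (0, 4, 0))],
    "target_2": [("donor", (1, 1, 1)), ("acceptor", (4, 1, 1)), ("hydrophobic", (1, 5, 1))],
    "target_3": [("donor", (0, 0, 0)), ("acceptor", (10, 0, 0)), ("hydrophobic", (0, 10, 0))],
}

if __name__ == "__main__":
    for signature in frequent_triangles(toy_targets, min_support=2):
        print(signature)
```

Domain-specific constraints, such as a bound on active-site size, could be added as a filter on each candidate triangle before it is counted.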
Background: Cancers are genetically heterogeneous, so anticancer drugs show varying degrees of effectiveness on patients due to their differing genetic profiles. Knowledge of patients' responses to numerous cancer drugs is needed for personalized cancer treatment. Using molecular profiles of cancer cell lines available from the Cancer Cell Line Encyclopedia (CCLE) and anticancer drug responses available in the Genomics of Drug Sensitivity in Cancer (GDSC), we build computational models to predict anticancer drug responses from molecular features. Results: We propose a novel deep neural network model that integrates multi-omics data available as gene expressions, copy number variations, gene mutations, reverse phase protein array expressions, and metabolomics expressions, in order to predict cellular responses to known anti-cancer drugs. We employ a novel graph embedding layer that incorporates interactome data as prior information for prediction. Moreover, we propose a novel attention layer that effectively combines different omics features, taking their interactions into account. The network outperformed feedforward neural networks and reported $R^2$ values of 0.90 for prediction of drug responses from cancer cell line data available in CCLE and GDSC. Conclusion: The results of our experiments demonstrate that the proposed method is capable of capturing the interactions of genes and proteins, and integrating multi-omics features effectively. Furthermore, both the results of ablation studies and the investigations of the attention layer imply that gene mutation has a greater influence on the prediction of drug responses than other omics data types. Therefore, we conclude that our approach not only predicts the anti-cancer drug response precisely but also provides insights into reaction mechanisms of cancer cell lines and drugs.
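For readers unfamiliar with the metric quoted above, the coefficient of determination is commonly computed as one minus the ratio of the residual sum of squares to the total sum of squares. The helper below is a minimal sketch of that standard definition; the abstract does not state how its value was computed, and the function name and toy numbers are illustrative only.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    if ss_tot == 0.0:
        raise ValueError("R^2 is undefined when all observed values are identical")
    return 1.0 - ss_res / ss_tot

if __name__ == "__main__":
    # Toy drug-response values, made up for illustration.
    print(r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))
```

A value of 1.0 means the predictions match the observations exactly, and 0.0 means they do no better than always predicting the observed mean.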