Predicting multicellular function through multi-layer tissue networks

Žitnik, Marinka; Leskovec, Jure

doi:10.1093/bioinformatics/btx252

Cited by 464 publications

(317 citation statements)

References 52 publications

Mentioning: 317

“…Akin to LP [38,75,76], node embeddings also offer a convenient route to incorporating multiple networks into SL approaches. While methods such as SL-I and SL-A may require concatenating the original networks or integrating them into a single network before learning, recent work has shown that SL-E-based methods can embed information from multiple molecular/heterogeneous networks and learn gene classifiers in tandem [77][78][79][80][81][82][83][84][85]. However, none of these studies have compared the variety of SL-E methods to learning directly on the adjacency matrix.…”

Section: Discussion (mentioning)

confidence: 99%

Supervised-learning is an accurate method for network-based gene classification

Liu, Mancuso, Yannakopoulos, et al. 2019

Preprint

Background: Assigning every human gene to specific functions, diseases, and traits is a grand challenge in modern genetics. Key to addressing this challenge are computational methods such as supervised-learning and label-propagation that can leverage molecular interaction networks to predict gene attributes. In spite of being a popular machine learning technique across fields, supervised-learning has been applied only in a few network-based studies for predicting pathway-, phenotype-, or disease-associated genes. It is unknown how supervised-learning broadly performs across different networks and diverse gene classification tasks, and how it compares to label-propagation, the widely-benchmarked canonical approach for this problem. Results: In this study, we present a comprehensive benchmarking of supervised-learning for network-based gene classification, evaluating this approach and a state-of-the-art label-propagation technique on hundreds of diverse prediction tasks and multiple networks using stringent evaluation schemes. We demonstrate that supervised-learning on a gene's full network connectivity outperforms label-propagation and achieves high prediction accuracy by efficiently capturing local network properties, rivaling label-propagation's appeal for naturally using network topology. We further show that supervised-learning on the full network is also superior to learning on node-embeddings (derived using node2vec), an increasingly popular approach for concisely representing network connectivity. Conclusion: These results show that supervised-learning is an accurate approach for prioritizing genes associated with diverse functions, diseases, and traits and should be considered a staple of network-based gene classification workflows. The datasets and the code used to reproduce the results and add new gene classification methods have been made freely available.

“…To this aim, multiple sources of omics data encode different layers, representing a biological system as a network of networks. This integrated perspective allows for more predictive performances [17,18,19] and has been shown to better characterize the evolution of complex diseases such as cancer [20], as well as to better understand the response to genetic and metabolic perturbations in complex organisms like E. coli [21].…”

Section: Multi-omics (mentioning)

confidence: 99%

Multilayer network modeling of integrated biological systems

Domenico 2018

Physics of Life Reviews

Biological systems, from a cell to the human brain, are inherently complex. A powerful representation of such systems, described by an intricate web of relationships across multiple scales, is provided by complex networks. Recently, several studies are highlighting how simple networks (obtained by aggregating or neglecting temporal or categorical description of biological data) are not able to account for the richness of information characterizing biological systems. More complex models, namely multilayer networks, are needed to account for interdependencies, often varying across time, of biological interacting units within a cell, a tissue or parts of an organism. Gosak et al. [1] review the most recent advances in the application of multilayer networks for modeling complex biological systems, from molecular interactions within a cell to neuronal connectivity of the human brain.

“…A successful and intuitive application of multilayer gene networks is PARADIGM, a system that models the central dogma of biology (DNA–mRNA–protein) with multiple patient‐specific “omics” measurements, and uses probabilistic inference to identify altered protein activities in each patient. Another application regards the rewiring of protein–protein interactions in 107 human tissues by means of a multilevel interactome that was shown to capture tissue‐specific functions of the proteins …”

Section: Heterogeneous Network To Integrate All Of Biology (mentioning)

confidence: 99%

Formatting biological big data for modern machine learning in drug discovery

Duran‐Frigola, Fernández‐Torras, Bertoni, et al. 2018

WIREs Comput Mol Sci

Biological data is accumulating at an unprecedented rate, escalating the role of data‐driven methods in computational drug discovery. This scenario is favored by recent advances in machine learning algorithms, which are optimized for huge datasets and consistently beat the predictive performance of previous art, rapidly approaching human expert reasoning. The urge to couple biological data to cutting‐edge machine learning has spurred developments in data integration and knowledge representation, especially in the form of heterogeneous, multiplex and semantically‐rich biological networks. Today, thanks to the propitious rise in knowledge embedding techniques, these large and complex biological networks can be converted to a vector format that suits the majority of machine learning implementations. Here, we explain why this can be particularly transformative for drug discovery where, for decades, customary chemoinformatics methods have employed vector descriptors of compound structures as the standard input of their prediction tasks. A common vector format to represent biology and chemistry may push biological information into most of the existing steps of the drug discovery pipeline, boosting the accuracy of predictions and uncovering connections between small molecules and other biological entities such as targets or diseases. This article is categorized under: Computer and Information Science > Databases and Expert Systems; Computer and Information Science > Chemoinformatics.
