2019
DOI: 10.1101/721423
Preprint

Supervised-learning is an accurate method for network-based gene classification

Abstract: Background: Assigning every human gene to specific functions, diseases, and traits is a grand challenge in modern genetics. Key to addressing this challenge are computational methods such as supervised-learning and label-propagation that can leverage molecular interaction networks to predict gene attributes. In spite of being a popular machine learning technique across fields, supervised-learning has been applied only in a few network-based studies for predicting pathway-, phenotype-, or disease-associated gen…

citations
Cited by 12 publications
(31 citation statements)
references
References 90 publications
(121 reference statements)
“…Some examples of this general task are: 1) classifying uncharacterized genes in a functional interaction network to cellular functions they might participate in (Liu et al., 2020), and 3) classifying medical terms in a term co-occurrence network (mined from electronic health records) to semantic types (e.g. drug, disease, symptoms, etc.)…”
Section: Discussion (mentioning)
confidence: 99%
“…Large-scale molecular networks are powerful models that capture interactions between biomolecules (genes, proteins, metabolites) on a genome scale (McGillivray et al., 2018) and provide a basis for predicting novel associations between individual genes/proteins and various cellular functions, phenotypic traits, and complex diseases (Liu et al., 2020; Sharan et al., 2007). An area of research that has gained rapid adoption in network science across disciplines is learning low-dimensional numerical representations, or "embeddings", of nodes in a network for easily leveraging machine-learning (ML) algorithms to analyze large networks (Goyal and Ferrara, 2018; Cai et al., 2018; Hamilton et al., 2018).…”
Section: Introduction (mentioning)
confidence: 99%
“…Moreover, even though the individual STRING datasets considered in this study were natively structured as networks, we used their adjacency matrices as sources of feature vectors for traditional classification algorithms such as SVM, LR and DT. Although this feature-based representation has been recently shown to be more effective for (gene) classification [37], this assessment may need to be re-evaluated in the context of network integration. Additionally, in the current implementation of EI, we assigned all-zero feature vectors to proteins disconnected from an individual network.…”
Section: Discussion (mentioning)
confidence: 99%
“…For all the datasets, both individual and integrated, we used each protein's adjacency vector as its feature values vector for training and evaluating predictive models. This feature encoding has recently been shown to be effective for network-based gene classification [37].…”
Section: Intermediate Base Prediction Vectors Generated For Ensemble (mentioning)
confidence: 99%
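
The feature encoding described in the last two citation statements, where each node's row of the adjacency matrix is used as its feature vector for a supervised classifier, can be sketched roughly as follows. This is a minimal illustration only: the toy network, the labels, and every function name are hypothetical, and the hand-written logistic regression stands in for the SVM, LR and DT implementations mentioned in the quoted text, which are not specified here.

```python
import math

def adjacency_matrix(n_nodes, edges):
    """Build a symmetric 0/1 adjacency matrix from an undirected edge list."""
    adj = [[0.0] * n_nodes for _ in range(n_nodes)]
    for u, v in edges:
        adj[u][v] = 1.0
        adj[v][u] = 1.0
    return adj

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, x):
    """Probability that the node with feature vector x is a positive."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train_logistic_regression(features, labels, lr=0.5, epochs=500, l2=0.01):
    """L2-regularized logistic regression fitted by batch gradient descent."""
    n, n_features = len(features), len(features[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(features, labels):
            err = predict_proba(weights, bias, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        weights = [w - lr * (g / n + l2 * w) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Hypothetical toy network: two 4-node cliques joined by the edge (3, 4).
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7),
         (3, 4)]
adj = adjacency_matrix(8, edges)

# Labeled nodes: 0-2 are positives, 5-7 are negatives; 3 and 4 are held out.
train_nodes = [0, 1, 2, 5, 6, 7]
train_labels = [1, 1, 1, 0, 0, 0]

# Each node's adjacency row is its feature vector.
weights, bias = train_logistic_regression([adj[i] for i in train_nodes],
                                          train_labels)

p_node3 = predict_proba(weights, bias, adj[3])
p_node4 = predict_proba(weights, bias, adj[4])
print(f"P(positive | node 3) = {p_node3:.3f}")
print(f"P(positive | node 4) = {p_node4:.3f}")
```

In this toy setting the held-out node 3 shares most of its neighbors with the positive examples and node 4 with the negative ones, so the classifier separates them using connectivity patterns alone. A node with no edges would get an all-zero feature vector, which matches the handling of disconnected proteins described in the quoted Discussion.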