2008
DOI: 10.1186/1471-2105-9-136

Mining phenotypes for gene function prediction

Abstract: Background: Health and disease of organisms are reflected in their phenotypes. Often, a genetic component to a disease is discovered only after clearly defining its phenotype. In the past years, many technologies to systematically generate phenotypes in a high-throughput manner, such as RNA interference or gene knock-out, have been developed and used to decipher functions for genes. However, there have been relatively few efforts to make use of phenotype data beyond the single genotype-phenotype relationships.

Cited by 42 publications (28 citation statements)
References 45 publications
“…The researcher collects many variables on each study subject and then wants to identify the variables that have an influence on the outcome variable. This problem becomes especially pronounced with modern high-throughput experiments where the number of variables p is often much larger than the number of observations n (e.g., genomics, transcriptomics, proteomics, metabolomics, metabonomics and phenomics; see, [1-6]) or in complex modeling situations with many potential predictors, where the aim is to find a meaningful non-linear model (see e.g., [7]). One of the major aims in the analysis of these high-dimensional data sets is to detect the signal variables S, while controlling the number of selected noise variables N.…”
Section: Introduction (mentioning)
confidence: 99%
“…In our own early work on using text for characterizing gene's function, we have introduced the use of probabilistic topic models applied to PubMed abstracts for representing sets of genes sharing a common function [53]. Van Driel et al [16] later use a similar idea for grouping and characterizing genes, by identifying similarities among the text describing their respective phenotypes, obtained from OMIM; Groth et al [21,22] also approach phenotype-based study of genes by applying a clustering technique to the text descriptions of phenotypes, and associating text and keywords within it with GO categories. A text-based classification system by Stapley et al [57] used support vector machines to assign yeast proteins to subcellular locations; Nenadic et al [36] used a similar approach to annotate proteins with one of 11 biological process terms from the upper levels of the GO hierarchy.…”
Section: Introduction (mentioning)
confidence: 99%
“…This would seem equally important for ecotoxicology and should be encouraged. In the interim, approaches developed for phenotype clustering (phenoclustering) based on automated literature searching using semantic (text) clustering tools [116] may have some value for assisting in AOP development.…”
Section: Mining the Extant Literature For Relevant Information (mentioning)
confidence: 99%