2007
DOI: 10.1186/1471-2105-8-284

Exploring inconsistencies in genome-wide protein function annotations: a machine learning approach

Abstract: Background: Incorrectly annotated sequence data are becoming more commonplace as databases increasingly rely on automated techniques for annotation. Hence, there is an urgent need for computational methods for checking consistency of such annotations against independent sources of evidence and detecting potential annotation errors. We show how a machine learning approach designed to automatically predict a protein's Gene Ontology (GO) functional class can be employed to identify potential gene annotation errors.…

Citations: cited by 35 publications (28 citation statements)
References: 38 publications
“…The Naive Bayes k-grams (NB k-grams) method [73] uses a sliding window of size k along each sequence to generate a bag-of-k-grams representation of the sequence. Much like the Naive Bayes classifier described above, the Naive Bayes k-grams classifier treats each k-gram in the bag as independent of the others given the class label for the sequence.…”
Section: Methods
Citation type: mentioning
confidence: 99%
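
The citation statement above describes the NB k-grams idea only in outline. A minimal Python sketch of that outline follows; it is an illustration under stated assumptions, not the implementation from the cited work. The names `kgrams` and `NaiveBayesKGrams`, the default `k=3`, the Laplace smoothing parameter `alpha`, and the toy sequences in the usage note are all hypothetical and do not come from the source.

```python
from collections import Counter
import math


def kgrams(seq, k):
    """Slide a window of size k along seq and return the overlapping k-grams."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class NaiveBayesKGrams:
    """Bag-of-k-grams Naive Bayes classifier (illustrative sketch only)."""

    def __init__(self, k=3, alpha=1.0):
        self.k = k          # window size (assumed default)
        self.alpha = alpha  # Laplace smoothing; an assumption, not from the source

    def fit(self, sequences, labels):
        # Count how often each class and each (class, k-gram) pair occurs.
        self.class_counts = Counter(labels)
        self.kgram_counts = {c: Counter() for c in self.class_counts}
        for seq, label in zip(sequences, labels):
            self.kgram_counts[label].update(kgrams(seq, self.k))
        self.vocab = set()
        for counts in self.kgram_counts.values():
            self.vocab.update(counts)
        return self

    def log_scores(self, seq):
        n = sum(self.class_counts.values())
        v = len(self.vocab) + 1  # one extra slot for unseen k-grams
        scores = {}
        for c, n_c in self.class_counts.items():
            total = sum(self.kgram_counts[c].values())
            score = math.log(n_c / n)  # log prior of the class
            # Each k-gram is treated as independent given the class label,
            # so the log-likelihoods simply add up.
            for g in kgrams(seq, self.k):
                count = self.kgram_counts[c][g]
                score += math.log((count + self.alpha) / (total + self.alpha * v))
            scores[c] = score
        return scores

    def predict(self, seq):
        scores = self.log_scores(seq)
        return max(scores, key=scores.get)
```

With made-up toy data, `NaiveBayesKGrams(k=3).fit(["AAAAAA", "AAAAAC", "GGGGGG", "GGGGGT"], ["x", "x", "y", "y"])` assigns a query such as `"AAAA"` to class `"x"`, because its k-grams are far more frequent in the sequences labelled `"x"`.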
“…These range from the propagation of erroneous annotations to conflicts between the results of different annotation pipelines [11][12][13][14]. Databases such as the curated part of UniProt offer the benefit of providing evidence for why a particular gene function has been assigned [15]. This is important, as a particularly common source of error is the assignment of function based solely on sequence similarity.…”
Section: Supplementary Information
Citation type: mentioning
confidence: 99%
“…There have been multiple assessments of GO annotation correctness, often focusing on subsets of annotations (Andorf et al., 2007; Devos and Valencia, 2001; Naumoff et al., 2004; Park et al., 2005; Schnoes et al., 2009; Škunca et al., 2012). Assessment of GO's structure independent of annotation has tended to focus on issues of redundancy within the ontology structure; that is, using different names for the same concept or different concepts for the same name (Alterovitz et al., 2007; Onsongo et al., 2008).…”
Section: Introduction
Citation type: mentioning
confidence: 99%