GO PaD: the Gene Ontology Partition Database

Alterovitz, Gil; Xiang, Michael; Mohan, Mamta; Ramoni, Marco

doi:10.1093/nar/gkl799

Cited by 56 publications

(48 citation statements)

References 10 publications


“…A similar effect could occur if there are variations in coding practices within an institution, since a single diagnosis may be represented as several different ICD9 codes (which could get worse with usage of ICD-10). This is a limitation of LIMIT, which could be addressed if ICD9 codes were grouped, for example by using information theoretic approaches [45], into disease categories which have the same level of specificity.…”

Section: Discussion (mentioning)

confidence: 99%

An unsupervised learning method to identify reference intervals from a clinical database

Poole, Schroeder, Shah. 2016. Journal of Biomedical Informatics.


Reference intervals are critical for the interpretation of laboratory results. The development of reference intervals using traditional methods is time consuming and costly. An alternative approach, known as an a posteriori method, requires an expert to enumerate diagnoses and procedures that can affect the measurement of interest. We develop a method, LIMIT, to use laboratory test results from a clinical database to identify ICD9 codes that are associated with extreme laboratory results, thus automating the a posteriori method. LIMIT was developed using sodium serum levels, and validated using potassium serum levels, both tests for which harmonized reference intervals already exist. To test LIMIT, reference intervals for total hemoglobin in whole blood were learned, and were compared with the hemoglobin reference intervals found using an existing a posteriori approach. In addition, prescription of iron supplements were used to identify individuals whose hemoglobin levels were low enough for a clinician to choose to take action. This prescription data indicating clinical action was then used to estimate the validity of the hemoglobin reference interval sets. Results show that LIMIT produces usable reference intervals for sodium, potassium and hemoglobin laboratory tests. The hemoglobin intervals produced using the data driven approaches consistently had higher positive predictive value and specificity in predicting an iron supplement prescription than the existing intervals. LIMIT represents a fast and inexpensive solution for calculating reference intervals, and shows that it is possible to use laboratory results and coded diagnoses to learn laboratory test reference intervals from clinical data warehouses.


“…As an example of the former: any ontologically structured data point can be characterized with respect to information content (see e.g. Alterovitz et al, 2007; Lord et al, 2003a). Lord et al (2003b) found that this measure, in connection with sequence similarity, uncovered a number of genes in LocusLink that were manually mis-annotated (pp.…”

Section: Quantifying Quality Versus Quantifying Quantity (mentioning)

confidence: 99%

Manual curation is not sufficient for annotation of genomic databases

et al. 2007


“…Assessment of GO's structure independent of annotation has tended to focus on issues of redundancy within the ontology structure; that is, using different names for the same concept or different concepts for the same name (Alterovitz et al, 2007; Onsongo et al, 2008). To the extent assessment of GO and its annotations are considered together, it is almost exclusively in the context of gene group enrichment analyses (Gross et al, 2012; Grossmann et al, 2007; Jantzen et al, 2011; Yang et al, 2011).…”

Section: Introduction (mentioning)

confidence: 99%

Assessing identity, redundancy and confounds in Gene Ontology annotations over time

Gillis, Pavlidis. 2013. Bioinformatics.


Motivation: The Gene Ontology (GO) is heavily used in systems biology, but the potential for redundancy, confounds with other data sources and problems with stability over time have been little explored. Results: We report that GO annotations are stable over short periods, with 3% of genes not being most semantically similar to themselves between monthly GO editions. However, we find that genes can alter their 'functional identity' over time, with 20% of genes not matching to themselves (by semantic similarity) after 2 years. We further find that annotation bias in GO, in which some genes are more characterized than others, has declined in yeast, but generally increased in humans. Finally, we discovered that many entries in protein interaction databases are owing to the same published reports that are used for GO annotations, with 66% of assessed GO groups exhibiting this confound. We provide a case study to illustrate how this information can be used in analyses of gene sets and networks. Availability: Data available at
