2012
DOI: 10.1186/1471-2164-13-s7-s3

Bayesian prediction of bacterial growth temperature range based on genome sequences

Abstract: Background: The preferred habitat of a given bacterium can provide a hint of which types of enzymes of potential industrial interest it might produce. These might include enzymes that are stable and active at very high or very low temperatures. Being able to accurately predict this from a genomic sequence would thus allow for an efficient and targeted search for production organisms, reducing the need for culturing experiments. Results: This study found a total of 40 protein families useful for distinction bet…

citations
Cited by 26 publications
(24 citation statements)
references
References 24 publications
“…This approach resulted in a training dataset with 5,532 organisms annotated with OGT, as well as a dataset with 1,438 un-annotated organisms. The annotated training dataset comprises 4,974 bacteria, 222 archaea and 337 eukarya (Figure 1) and is much larger than those used in other approaches, such as 22 bacteria (30), 77 bacteria (33) or 204 prokaryotes (32). In the annotated dataset the number of proteins in each organism follows a normal distribution centered around 3,000 (Figure 1i).…”
Section: Results (citation type: mentioning)
confidence: 99%
“…Additionally, Zeldovich found that the sum fraction of the seven amino acids I, V, Y, W, R, E and L showed a correlation coefficient as high as 0.93 with OGT in a dataset consisting of 204 proteomes of archaea and bacteria (32). Jensen et al. developed a Bayesian classifier to distinguish three thermophilicity classes (thermophiles, mesophiles and psychrophiles) based on 77 bacteria with known OGT (33). Training datasets containing the OGTs for a large number of organisms have been hard to obtain, something which has prevented the development of state-of-the-art machine learning models for OGT prediction.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
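
The amino-acid fraction described in the statement above can be illustrated with a minimal sketch. It assumes a proteome is given as a list of protein sequences in one-letter code; the function name `ivywrel_fraction` and the toy sequences are hypothetical and are not taken from any of the cited papers.

```python
# Residues whose summed fraction the quoted statement relates to OGT.
IVYWREL = frozenset("IVYWREL")


def ivywrel_fraction(proteins):
    """Return the fraction of all residues in a proteome that are
    I, V, Y, W, R, E or L.

    `proteins` is an iterable of amino-acid sequences (one-letter code).
    """
    total = 0
    hits = 0
    for seq in proteins:
        seq = seq.upper()
        total += len(seq)
        hits += sum(1 for aa in seq if aa in IVYWREL)
    if total == 0:
        raise ValueError("proteome contains no residues")
    return hits / total


# Toy input (made up for illustration): 7 of the 10 residues are in the set.
toy_proteome = ["MIVYW", "RELAK"]
fraction = ivywrel_fraction(toy_proteome)
print(f"IVYWREL fraction: {fraction:.2f}")
```

Turning such a fraction into a temperature estimate would additionally require a regression fitted on organisms with known OGT, which this sketch does not attempt.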
“…In general, cellular processes speed up as temperature increases, but extremely high temperatures can also denature proteins and negatively affect biochemical reactions. Proteins function best within a specific temperature window that maximizes enzymatic reaction rate without denaturing the protein, and most enzymes have evolved within an optimal temperature range that is closely tied to environmental temperature (1). Few enzymes show optimal activity more than 10°C above or below the optimal growth temperature of the host organism (1,2).…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…The NBC is a relatively simple classification method, but it has been shown to be useful in a wide range of fields, such as prediction of bacterial thermophilicity (Jensen et al., 2012), diagnosis of classical swine fever (Geenen et al., 2011), and detection of clinical mastitis (Steeneveld et al., 2009). The NBC has advantages over comparable classification methods, such as artificial neural networks or logistic regression functions, because missing observations can be easily handled in an NBC by including only the observations that are available.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
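
The point about missing observations in the statement above can be sketched as follows. This is a generic naive Bayes classifier over binary presence/absence features with Laplace smoothing, not the implementation from Jensen et al. or any other cited work; the feature names `famA` and `famB`, the class labels, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def train_nbc(rows, labels, alpha=1.0):
    """Count class and per-class feature frequencies.

    Each row maps a feature name to 1, 0, or None (not observed).
    Missing values are skipped, so they add nothing to any count.
    """
    class_counts = defaultdict(int)
    present = defaultdict(lambda: defaultdict(int))   # rows with feature == 1
    observed = defaultdict(lambda: defaultdict(int))  # rows with feature seen
    for row, label in zip(rows, labels):
        class_counts[label] += 1
        for feature, value in row.items():
            if value is None:
                continue
            observed[label][feature] += 1
            present[label][feature] += int(value)
    return class_counts, present, observed, alpha


def predict_nbc(model, row):
    """Return (best_label, log_scores), using only the observed features."""
    class_counts, present, observed, alpha = model
    n_rows = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        log_p = math.log(count / n_rows)  # class prior
        for feature, value in row.items():
            if value is None:
                continue  # a missing observation contributes no factor
            # Laplace-smoothed estimate of P(feature == 1 | label)
            p_one = (present[label][feature] + alpha) / (
                observed[label][feature] + 2 * alpha
            )
            log_p += math.log(p_one if value else 1.0 - p_one)
        scores[label] = log_p
    return max(scores, key=scores.get), scores


# Toy training data (made up): presence/absence of two protein families.
rows = [
    {"famA": 1, "famB": 0},
    {"famA": 1, "famB": None},
    {"famA": 0, "famB": 1},
    {"famA": 0, "famB": 1},
]
labels = ["thermophile", "thermophile", "mesophile", "mesophile"]

model = train_nbc(rows, labels)
label, scores = predict_nbc(model, {"famA": 1, "famB": None})
print(label)
```

Because each feature contributes an independent factor to the class score, dropping the factor for an unobserved feature leaves a valid, if less informed, comparison between classes.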