2017
DOI: 10.1371/journal.pone.0174638

Selection of key sequence-based features for prediction of essential genes in 31 diverse bacterial species

Abstract: Genes that are indispensable for survival are essential genes. Many features have been proposed for computational prediction of essential genes. In this paper, the least absolute shrinkage and selection operator method was used to screen key sequence-based features related to gene essentiality. To assess the effects, the selected features were used to predict the essential genes from 31 bacterial species based on a support vector machine classifier. For all 31 bacterial objects (21 Gram-negative objects and te…

Citations: cited by 28 publications (49 citation statements)
References: 30 publications (26 reference statements)
“…We extracted features from gene nucleotide sequences and protein sequences. Several features derived from sequence data have been validated their usefulness in predicting gene essentiality in model organisms [10,16]. In this paper, we used the following sequence derived features: codon frequency, maximum relative synonymous codon usage (RSCUmax), codon adaptation index (CAI), gene length, GC content, amino acid frequency, and protein sequence length.…”
Section: Features Derived From Sequence Data
mentioning
confidence: 99%
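The citation statement above lists several sequence-derived features. As a rough illustration only, the following Python sketch computes three of them (gene length, GC content, and codon frequency) from a nucleotide sequence. The function names and the example sequence are hypothetical and are not taken from the cited papers; RSCUmax and CAI are omitted because they need a reference codon-usage table that is not given here.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def codon_frequency(seq: str) -> dict:
    """Relative frequency of each codon, reading in frame from position 0.

    Trailing bases that do not form a complete codon are ignored.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    total = len(codons)
    if total == 0:
        return {}
    return {codon: n / total for codon, n in Counter(codons).items()}


def sequence_features(seq: str) -> dict:
    """Bundle the three illustrative features for one gene sequence."""
    return {
        "gene_length": len(seq),
        "gc_content": gc_content(seq),
        "codon_frequency": codon_frequency(seq),
    }


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real gene.
    print(sequence_features("ATGGCCGCCTAA"))
```

In a prediction pipeline of the kind the abstract describes, a feature dictionary like this would be flattened into a numeric vector per gene before feature selection and classification.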
“…Our experiments showed that DeeplyEssential has better predictive performance both on down-sampled and clustered datasets. On the down-sampled dataset used in [23], DeeplyEssential showed an improvement of 12.8% in AUC compared to [23] and achieved a slightly better AUC on the network-based feature model [2]. In addition, DeeplyEssential produced significantly better sensitivity and precision than the three methods in Table 5, achieving 6.2% improved sensitivity and 137.4% improved precision compare to [2].…”
Section: Comparison With Methods That Address Orthologus Genes
mentioning
confidence: 93%
“…With the introduction of large gene database such as DEG, CEG and OGEE [4, 25, 40], researchers designed more complex prediction models using a wider set of features. These features can be broadly categorized into (i) sequence features, i.e., codon frequency, GC content, gene length [29, 35, 42], (ii) topological features, i.e., degree centrality, cluster coefficient [1, 6, 24, 31], and (iii) functional features, i.e., homology, gene expression, cellular localization, functional domain and molecular properties [5, 9, 23, 30, 39]. Sequence based features can be directly obtained from the primary DNA sequence of a gene and its corresponding protein sequence. Functional features such as network topology requires knowledge of protein-protein interaction network, e.g., STRING and HumanNET [15, 37].…”
mentioning
confidence: 99%