2013
DOI: 10.1093/bioinformatics/btt029

PreDNA: accurate prediction of DNA-binding sites in proteins by integrating sequence and geometric structure information

Abstract: Evaluated on a new non-redundant protein set with 224 chains, the method has 80.7% sensitivity and 82.9% specificity in the 5-fold cross-validation test. In addition, it predicts DNA-binding sites with 85.1% sensitivity and 85.3% specificity when tested on a dataset with 62 protein-DNA complexes. Compared with a recently published method, BindN+, our method predicts DNA-binding sites with a 7% better area under the receiver operating characteristic curve value when tested on the same dataset. Many important pr…
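
For readers unfamiliar with the two headline metrics in the abstract, the following is a minimal sketch of how sensitivity and specificity are conventionally computed from per-residue binary predictions. It is not the authors' evaluation code: the function name `sensitivity_specificity` and the toy labels are hypothetical and chosen only for illustration, with 1 assumed to mean a DNA-binding residue and 0 a non-binding residue.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Assumed encoding (illustrative only): 1 = binding residue, 0 = non-binding.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # binding, predicted binding
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # binding, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-binding, predicted non-binding
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-binding, predicted binding

    # Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    # Return NaN when a class is absent so the ratio is not silently misreported.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy labels, not data from the paper.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

In a k-fold cross-validation setting such as the 5-fold test mentioned in the abstract, these quantities would typically be computed on the held-out fold predictions; how the paper aggregates them across folds is not stated in the text above.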

Cited by 44 publications
(48 citation statements)
References 37 publications
“…As there are a number of combination methods and concatenation methods, we only consider the state-of-the-art works for the respective groups. Consequently, Ma et al’s work using combination method [56] and Li et al’s work [32] using the concatenation methods are used for comparison. In Ma et al’s work, it used PSSM with four physicochemical properties including the lone electron pairs, hydrophobicity, side chain pKa value and molecular mass are combined to calculate the feature representation for residues.…”
Section: Results (mentioning)
confidence: 99%
“…The similarity between any two proteins in PDNA-62 is less than 25%. The second benchmarking dataset, PDNA-224, is a recently developed dataset for DNA-binding residue prediction [32], which contains 224 protein sequences. The 224 protein sequences are extracted from 224 protein-DNA complexes retrieved from PDB [31] by using the cut-off pair-wise sequence similarity of 25%.…”
Section: Methods (mentioning)
confidence: 99%
“…SVM is one of the most common machine learning algorithm used for development of several bioinformatics prediction methods [15], [26][33]. SVM takes a set of feature vector attributes along with their real output as input.…”
Section: Methods (mentioning)
confidence: 99%
“…To further investigate the performance of JSD-based features proposed in this study, we analyzed two additional datasets, namely RBscore [2] and PreDNA datasets [37]. Although the RBscore and PreDNA datasets initially contain 381 and 224 DNA-binding proteins, respectively, we have eliminated a few proteins since they are either included in our training dataset or ineligible due to their MSAs.…”
Section: Feature (mentioning)
confidence: 99%