2023
DOI: 10.1093/bib/bbad192

emPDBA: protein-DNA binding affinity prediction by combining features from binding partners and interface learned with ensemble regression model

Abstract: Protein–deoxyribonucleic acid (DNA) interactions are important in a variety of biological processes. Accurately predicting protein-DNA binding affinity has been one of the most attractive and challenging issues in computational biology. However, the existing approaches still have much room for improvement. In this work, we propose an ensemble model for Protein-DNA Binding Affinity prediction (emPDBA), which combines six base models with one meta-model. The complexes are classified into four types based on the …
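
The "six base models with one meta-model" design in the abstract is the general pattern known as stacked regression. The sketch below illustrates that pattern only. The visible part of the abstract does not name the base learners, the meta-model, or the features, so every estimator chosen here, the helper name `build_stacked_regressor`, and the synthetic data are assumptions for illustration, not the emPDBA configuration. It assumes scikit-learn and NumPy are installed.

```python
import numpy as np
from sklearn.ensemble import (
    ExtraTreesRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR


def build_stacked_regressor(random_state=0):
    """Return a stacking ensemble with six base regressors and one meta-model.

    The six base learners are placeholders (assumption); the source abstract
    does not say which models emPDBA uses.
    """
    base_models = [
        ("rf", RandomForestRegressor(n_estimators=50, random_state=random_state)),
        ("gbr", GradientBoostingRegressor(random_state=random_state)),
        ("et", ExtraTreesRegressor(n_estimators=50, random_state=random_state)),
        ("knn", KNeighborsRegressor(n_neighbors=5)),
        ("svr", SVR()),
        ("ridge", Ridge(alpha=1.0)),
    ]
    # The meta-model is trained on out-of-fold predictions of the base models
    # (cv=5), which limits leakage from base-model overfitting.
    return StackingRegressor(
        estimators=base_models,
        final_estimator=LinearRegression(),
        cv=5,
    )


# Synthetic stand-in for (feature vector, binding affinity) pairs.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 8))
y = X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=120)

model = build_stacked_regressor()
model.fit(X, y)
predictions = model.predict(X[:5])
```

Using a simple linear meta-model keeps the combination step easy to inspect; whether emPDBA does the same is not stated in the text shown here.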

Cited by 6 publications (5 citation statements)
References 42 publications
“…Our previously proposed 60 × 4 residue–nucleotide pairwise preference was derived from the analysis of 1545 nonredundant protein–DNA complexes, which takes protein structure information into account. The eight kinds of protein secondary structures are calculated by DSSP software: α-helix (H), β-ladder (E), π-helix (I), β-bridge (B), turn (T), 3₁₀-helix (G), bend (S), uncertain structure (M).…”
Section: Methods (mentioning)
confidence: 99%
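
The eight secondary-structure codes listed in the statement above can be held in a small lookup table. The sketch below is a minimal illustration that takes the codes exactly as quoted, including `M` for "uncertain structure" (DSSP output itself may mark that state differently). The function name `secondary_structure_fractions` and the example string are hypothetical, and this is not the citing paper's feature pipeline.

```python
from collections import Counter

# One-letter codes as quoted in the citation statement above.
# 'M' for "uncertain structure" follows that text (assumption: the input
# string has already been mapped to this alphabet).
DSSP_STATES = {
    "H": "alpha-helix",
    "E": "beta-ladder",
    "I": "pi-helix",
    "B": "beta-bridge",
    "T": "turn",
    "G": "3-10 helix",
    "S": "bend",
    "M": "uncertain structure",
}


def secondary_structure_fractions(ss_string):
    """Return the fraction of residues in each of the eight states."""
    if not ss_string:
        raise ValueError("empty secondary-structure string")
    unknown = set(ss_string) - set(DSSP_STATES)
    if unknown:
        raise ValueError(f"unexpected state codes: {sorted(unknown)}")
    counts = Counter(ss_string)
    total = len(ss_string)
    # Report all eight states, including those absent from the input.
    return {code: counts.get(code, 0) / total for code in DSSP_STATES}


# Hypothetical per-residue assignment for an 11-residue fragment.
example = "HHHHTTEEEEM"
fractions = secondary_structure_fractions(example)
```

Returning all eight keys, even for absent states, gives a fixed-length vector that is convenient as model input.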
“…These features include residue interface preference from pairwise statistical potential, dynamic characteristics, and coevolutionary information. For the first one, it has been widely proven that the residue pairwise potential is of good robustness in a variety of applications such as molecular docking, folding, and function predictions due to its synthetical consideration of various physical interactions. In our previous study on protein–RNA interactions, we developed the residue–nucleotide pairwise potential, which has been successfully applied in protein–RNA complex interface and structure predictions. Recently, for protein–DNA interactions, we proposed the 60 × 4 residue–nucleotide pairwise preference which takes molecular secondary structure information into account, and successfully applied it to the protein–DNA binding affinity prediction. With regard to dynamic characteristics, they are closely related to biomolecular functions such as ligand binding, allostery, and catalysis.…”
Section: Introduction (mentioning)
confidence: 99%
“…For instance, physics-based simulations utilize principles of molecular mechanics and dynamics to simulate folding pathways, while homology modeling (e.g., algorithms such as PSI-BLAST, HHblits, and HMMER) leverages evolutionary relationships between proteins to infer structures [13–20]. Of recent further interest, machine learning techniques, particularly deep learning, have emerged as powerful tools for predicting protein structures by learning patterns from large datasets [4, 21–30]. Recent advancements in deep learning, exemplified by AlphaFold, have revolutionized protein structure prediction.…”
Section: Introduction (mentioning)
confidence: 99%
“…Some such examples include the PreDBA and PredPRBA, which are ML-based heterogeneous ensemble and gradient-boosted regression tree models, respectively, that require structure as input and are trained on a relatively small data set of 100 protein–DNA and 103 protein–RNA complexes, respectively. Another ML-based tool, SAMPDI-3D, which is a gradient-boosting decision tree (DT) ML method, was trained on a total of 883 entries with 419 protein and 463 DNA mutants and also uses structural data as input. Other tools, such as mCSM-NA and mmCSM-NA, which rely on graph-based structural signatures, and emPDBA, which is an ensemble regression model, were trained on 331 (including mutants), 155 experimentally solved complexes, and 340 entries, respectively. The data set contained approximately 700 P-DNA complexes; however, this was refined in order to train and test DNAffinity, while PDA-Pred was trained on 391 entries. Both of these models are ML-based approaches and derive features from structural and/or simulation data. Apart from ML-based tools, PremPDI uses optimized parameters from experimental sets of mutations (219 in total, with 49 unique protein–DNA complexes).…”
Section: Introduction (mentioning)
confidence: 99%