2020
DOI: 10.1093/bioinformatics/btaa196

Redundancy-weighting the PDB for detailed secondary structure prediction using deep-learning models

Abstract: Motivation: The Protein Data Bank (PDB), the ultimate source for data in structural biology, is inherently imbalanced. To alleviate biases, virtually all structural biology studies use nonredundant (NR) subsets of the PDB, which include only a fraction of the available data. An alternative approach, dubbed redundancy-weighting (RW), down-weights redundant entries rather than discarding them. This approach may be particularly helpful for machine-learning (ML) methods that use the PDB as their s…
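The abstract contrasts two ways of handling redundancy: discarding redundant entries (NR) versus down-weighting them (RW). The sketch below illustrates that difference under an assumption not stated in the visible abstract: entries are grouped into sequence-identity clusters, and each entry receives weight 1 / (size of its cluster). The paper's actual weighting scheme may differ. The function names and chain IDs are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def nonredundant_subset(entries: Sequence[str], cluster_of: Dict[str, Hashable]) -> List[str]:
    """NR approach: keep the first entry seen in each cluster and discard the rest."""
    seen = set()
    kept = []
    for entry in entries:
        cluster = cluster_of[entry]
        if cluster not in seen:
            seen.add(cluster)
            kept.append(entry)
    return kept


def redundancy_weights(entries: Sequence[str], cluster_of: Dict[str, Hashable]) -> Dict[str, float]:
    """RW approach (assumed scheme): weight each entry by 1 / size of its cluster.

    Every entry is kept, and each cluster contributes a total weight of 1.
    """
    sizes = Counter(cluster_of[e] for e in entries)
    return {e: 1.0 / sizes[cluster_of[e]] for e in entries}


def weighted_mean_loss(losses: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of per-example losses, as an ML training loop might use RW weights."""
    total = math.fsum(weights)
    return math.fsum(loss * w for loss, w in zip(losses, weights)) / total


if __name__ == "__main__":
    # Hypothetical chain IDs; the first three share one sequence-identity cluster.
    cluster_of = {"1abcA": 0, "1abcB": 0, "2xyzA": 0, "3defA": 1}
    entries = list(cluster_of)
    print(nonredundant_subset(entries, cluster_of))  # ['1abcA', '3defA']
    print(redundancy_weights(entries, cluster_of))
```

In this toy example, NR keeps only 2 of the 4 chains, while RW keeps all 4 and gives each of the three redundant chains a weight of 1/3, so the over-represented cluster counts no more than the singleton one.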

Cited by 7 publications (6 citation statements)
References: 35 publications
“…For example, for translocases (F-score: 0), we only have 228 experimental structures for training, validation, and testing. Furthermore, while experimental evidence is regarded as the ground truth when creating prediction models, the PDB has known issues such as redundancy [44] or enzyme-inhibitor complexes marked as enzyme-substrate complexes [45–47]. For serine endopeptidase La (EC: 3.4.21.53), none of the eight test PDBs are correctly predicted.…”
Section: Results (mentioning)
confidence: 99%
“…Other studies proposed novel architectures as well as input features, for example, [36], [56]. Besides the model’s architecture and features, several studies also proposed new training and test datasets that eliminate data redundancy such that the models trained using these datasets can achieve a good performance [13], [57], [58].…”
Section: General Description (mentioning)
confidence: 99%
“…• meshinr_dssp8Compatibility and meshirw_dssp8Compatibility_Weighted: two slightly different measures of the agreement between the secondary structure (8 states) of the decoy's residues (as measured by DSSP [55]) and their predicted secondary structure [49].…”
Section: Author Contributions Statement (mentioning)
confidence: 99%
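The statement above names two agreement measures between DSSP-assigned and predicted 8-state secondary structure but does not define them. The sketch below shows two plausible, hypothetical definitions: a hard score (fraction of residues whose predicted label matches DSSP) and a soft score (mean predicted probability of the observed DSSP state). These are illustrations only, not the formulas behind meshinr_dssp8Compatibility or meshirw_dssp8Compatibility_Weighted; the label strings and probabilities are invented.

```python
from typing import Mapping, Sequence


def hard_compatibility(observed: str, predicted: str) -> float:
    """Fraction of residues whose predicted 8-state label equals the DSSP label."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("need two non-empty label strings of equal length")
    matches = sum(o == p for o, p in zip(observed, predicted))
    return matches / len(observed)


def soft_compatibility(observed: str, probs: Sequence[Mapping[str, float]]) -> float:
    """Mean predicted probability assigned to each residue's observed DSSP state.

    States missing from a residue's probability map count as probability 0.
    """
    if not observed or len(observed) != len(probs):
        raise ValueError("need one probability map per residue")
    return sum(p.get(o, 0.0) for o, p in zip(observed, probs)) / len(observed)


if __name__ == "__main__":
    dssp = "HHHEC"  # hypothetical DSSP 8-state labels for a 5-residue decoy
    pred = "HHGEC"  # hypothetical argmax predictions
    print(hard_compatibility(dssp, pred))  # 0.8
    probs = [{"H": 0.9, "G": 0.1}, {"H": 0.8, "G": 0.2}, {"H": 0.4, "G": 0.6},
             {"E": 0.7, "C": 0.3}, {"C": 1.0}]
    print(soft_compatibility(dssp, probs))  # approximately 0.76
```

The soft variant rewards a predictor that is uncertain but leans toward the right state (residue 3 above still contributes 0.4), which the hard variant scores as a plain miss.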
“…These terms include pair-wise atomic potentials [30, 42, 43], torsion angles [44], hydrogen bonds, and hydrogen bond patterns [45], solvation terms, "meta" energy terms that consider the distribution of other terms within the protein atoms, an extended radius of gyration that takes into account different classes of amino acids (polar vs. non-polar, secondary structure elements vs. coil region, etc.), and compatibility of the decoys with solvent exposure prediction [46], and with 3-, 8-, and 13-class secondary structure predictions [47–49].…”
Section: Introduction (mentioning)
confidence: 99%
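The last statement refers to both 3- and 8-class secondary structure predictions. A common way to obtain 3-class labels is to collapse the eight DSSP states; the sketch below uses one widely used mapping (helices H, G, I to H; strand E and bridge B to E; everything else to coil C). This mapping is an assumption: other conventions exist, and the cited works may use a different reduction. The function name is hypothetical.

```python
# One widely used 8-to-3 reduction of DSSP states; other conventions exist
# (some treat G, I, or B as coil), so this mapping is an assumption.
EIGHT_TO_THREE = {
    "H": "H", "G": "H", "I": "H",  # alpha, 3-10, and pi helices -> H
    "E": "E", "B": "E",            # strand and isolated bridge -> E
    "T": "C", "S": "C", "C": "C",  # turn, bend, coil -> C
}


def reduce_to_three(dssp8: str) -> str:
    """Map an 8-state DSSP string to 3 states; unrecognized symbols (e.g. '-') become coil."""
    return "".join(EIGHT_TO_THREE.get(state, "C") for state in dssp8)


if __name__ == "__main__":
    print(reduce_to_three("HHGGIEEBTS-C"))  # HHHHHEEECCCC
```

Defaulting unknown symbols to coil handles the blank or '-' that DSSP output uses for residues with no assigned structure.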