2021
DOI: 10.1016/j.patter.2021.100329

Accurate prediction of B-form/A-form DNA conformation propensity from primary sequence: A machine learning and free energy handshake

Abstract: DNA carries the genetic code of life, with different conformations associated with different biological functions. Predicting the conformation of DNA from its primary sequence, although desirable, is a challenging problem owing to the polymorphic nature of DNA. We have deployed a host of machine learning algorithms, including the popular state-of-the-art LightGBM (a gradient boosting model), for building prediction models. We used the nested cross-validation strategy to address the issues of…
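
The nested cross-validation strategy named in the abstract separates hyperparameter tuning (an inner loop) from performance estimation (an outer loop), so the reported score never comes from data that was used for tuning. The Python sketch below illustrates only that structure and is not the authors' pipeline: the toy one-dimensional data, the k-nearest-neighbour classifier standing in for LightGBM, the accuracy metric, and the helper names `k_fold_indices`, `knn_accuracy` and `nested_cv` are all assumptions made for illustration.

```python
import random
from statistics import mean

# Hypothetical sketch: a toy k-NN classifier stands in for LightGBM; not the paper's code.

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def knn_accuracy(train, test, k):
    """Fit-free k-nearest-neighbour majority vote on 1-D (feature, label) pairs."""
    correct = 0
    for x, label in test:
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        votes = sum(lbl for _, lbl in nearest)
        pred = 1 if votes * 2 > len(nearest) else 0
        correct += pred == label
    return correct / len(test)

def nested_cv(data, param_grid, k_outer=5, k_inner=3, seed=0):
    """Return (outer-fold scores, hyperparameter chosen in each outer fold)."""
    outer_scores, chosen = [], []
    for i, test_idx in enumerate(k_fold_indices(len(data), k_outer, seed)):
        test_set = set(test_idx)
        train = [data[j] for j in range(len(data)) if j not in test_set]
        test = [data[j] for j in test_idx]
        # Inner loop: tune k using only the outer training split.
        inner_folds = k_fold_indices(len(train), k_inner, seed + i + 1)

        def inner_score(param):
            scores = []
            for val_idx in inner_folds:
                val_set = set(val_idx)
                fit = [train[j] for j in range(len(train)) if j not in val_set]
                val = [train[j] for j in val_idx]
                scores.append(knn_accuracy(fit, val, param))
            return mean(scores)

        best = max(param_grid, key=inner_score)
        # Outer loop: score the tuned model on data never seen during tuning.
        chosen.append(best)
        outer_scores.append(knn_accuracy(train, test, best))
    return outer_scores, chosen

# Usage on invented data: class 0 centred at 0, class 1 centred at 3.
rng = random.Random(42)
data = [(rng.gauss(0, 1), 0) for _ in range(30)] + [(rng.gauss(3, 1), 1) for _ in range(30)]
scores, params = nested_cv(data, param_grid=[1, 3, 5])
print(f"mean outer accuracy = {mean(scores):.2f}, chosen k per fold = {params}")
```

Because each outer fold tunes `k` independently, the chosen value can differ between folds; the mean of the outer scores is the performance estimate, while the spread of per-fold choices hints at how stable the tuning is.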

Cited by 10 publications (5 citation statements)
References: 37 publications
“…The number of features yielding the highest averaged AUC was chosen. Based on this selection, the best‐performing outer model judged by the highest AUC, was used for generating interpretable predictions using SHapley Additive exPlanations (SHAP) 26,27 . The average of the absolute Shapley values indicates the global feature importance for predicting recurrence.…”
Section: Methods (mentioning)
confidence: 99%
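
The quoted methods describe one selection step and one importance measure: choosing the feature count whose AUC, averaged over folds, is highest, and ranking features by the mean absolute Shapley value, I_j = (1/N) Σ_i |φ_ij|, over N samples. The Python sketch below computes both from precomputed numbers. It assumes the Shapley values are already available (in practice they would come from an explainer such as the `shap` package); the input numbers are invented placeholders, the function names are hypothetical, and AUC is computed with the pairwise (Mann-Whitney) formulation, one of several equivalent ways.

```python
from statistics import mean

# Hypothetical illustration; all input numbers are invented, not taken from the cited study.

def roc_auc(labels, scores):
    """AUC as the probability a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def pick_feature_count(auc_by_count):
    """Choose the feature count whose AUC, averaged over outer folds, is highest."""
    return max(auc_by_count, key=lambda n: mean(auc_by_count[n]))

def global_importance(shap_values, feature_names):
    """Mean absolute Shapley value per feature (column), sorted most to least important."""
    importance = {
        name: mean(abs(row[j]) for row in shap_values)
        for j, name in enumerate(feature_names)
    }
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75

# Per-fold AUCs for three candidate feature counts (invented).
auc_by_count = {5: [0.71, 0.69, 0.73], 10: [0.78, 0.74, 0.76], 20: [0.75, 0.77, 0.72]}
print(pick_feature_count(auc_by_count))  # 10

# Rows = samples, columns = features (invented Shapley values).
shap_values = [[0.2, -0.6, 0.0], [-0.4, 0.3, 0.05], [0.3, -0.3, -0.05]]
ranking = global_importance(shap_values, ["f1", "f2", "f3"])
print([name for name, _ in ranking])  # ['f2', 'f1', 'f3']
```

Taking the absolute value before averaging matters: a feature that pushes some predictions up and others down would otherwise average out to near zero despite being influential.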
“…Feature importance reflects the relative contribution degree as a continuous value when the factor is considered in the tree algorithm. Cross-validation testing was also included in the AI analysis because it improves the general applicability and prevents overfitting [ 16 , 17 ]. The age, sex, physical activity, smoking status, pharmacotherapies (for hypertension, diabetes, dyslipidemia), cardiovascular disease history, and alcohol consumption were added as confounding factors to the prediction models.…”
Section: Methods (mentioning)
confidence: 99%
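
The excerpt does not define how the tree algorithm's feature importance is computed. One common choice is gain-based importance: sum the loss reduction contributed by every split on a feature across all trees, then normalise so the scores add up to 1, which yields the kind of continuous relative contribution the text describes. The Python sketch below assumes that definition; the split list and gain values are invented, and `gain_importance` is a hypothetical helper (LightGBM users would typically obtain comparable, unnormalised totals from `Booster.feature_importance(importance_type="gain")`).

```python
from collections import defaultdict

# Hypothetical sketch of gain-based importance; split gains below are invented.

def gain_importance(splits, feature_names):
    """Sum split gains per feature across all trees, then normalise so values sum to 1."""
    totals = defaultdict(float)
    for feature, gain in splits:
        totals[feature] += gain
    grand_total = sum(totals.values())
    # Features never used in a split get 0.0; guard against an empty split list.
    return {f: totals[f] / grand_total if grand_total else 0.0 for f in feature_names}

# (feature, gain) pairs collected from every split of every tree (invented values).
splits = [("age", 4.0), ("smoking", 1.0), ("age", 2.0), ("alcohol", 3.0)]
imp = gain_importance(splits, ["age", "sex", "smoking", "alcohol"])
print(imp)  # {'age': 0.6, 'sex': 0.0, 'smoking': 0.1, 'alcohol': 0.3}
```

Normalising makes importances comparable across models of different sizes, but it also hides absolute magnitude; split-count importance is a cheaper alternative that can over-weight features used for many low-gain splits.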
“…There are over 1100 citations to the original NDB article. The type of research enabled by the NDB includes DNA conformational analyses [39], DNA structure prediction [40], RNA structure prediction [41], analyses of protein-nucleic acid interactions [42,43], and the creation of new specialty databases [44]. In our research, we have used the NDB to study a variety of aspects of nucleic acids.…”
Section: Research Enabled by the NDB (mentioning)
confidence: 99%