Hybridization is a key molecular process in biology and biotechnology, but to date there is no predictive model for accurately determining hybridization rate constants based on sequence information. Here we report a weighted neighbor voting (WNV) prediction algorithm, in which the hybridization rate constant of an unknown sequence is predicted based on its similarity to reactions with known rate constants. To construct this algorithm, we first performed 210 fluorescence kinetics experiments to observe the hybridization kinetics of 100 different DNA target and probe pairs (36 nt subsequences of the CYCS and VEGF genes) at temperatures ranging from 28 °C to 55 °C. Automated feature selection and weighting optimization resulted in a final 6-feature WNV model, which can predict hybridization rate constants of new sequences to within a factor of 3 with ≈91% accuracy, based on leave-one-out cross-validation. Accurate prediction of hybridization kinetics allows design of efficient probe sequences for genomics research.
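The general idea of weighted neighbor voting with leave-one-out cross-validation can be illustrated with a minimal Python sketch. This is not the authors' implementation: the weighted Euclidean distance, the exponential vote kernel, the `bandwidth` parameter, and the toy feature values are all assumptions made for illustration, and the function names `wnv_predict` and `loocv_within_factor` are hypothetical. Rate constants are handled as log10 values so that "within a factor of 3" becomes an absolute error of at most log10(3).

```python
import math

def wnv_predict(query, known, weights, bandwidth=1.0):
    """Predict log10(k) for `query` as a similarity-weighted average.

    known   : list of (feature_vector, log10_k) pairs with measured kinetics
    weights : per-feature weights (assumed to come from a separate optimization)
    """
    numerator = 0.0
    denominator = 0.0
    for features, log_k in known:
        # Weighted Euclidean distance in feature space (an assumed metric).
        dist = math.sqrt(sum(w * (q - f) ** 2
                             for w, q, f in zip(weights, query, features)))
        # Closer neighbors cast larger votes (assumed exponential kernel).
        vote = math.exp(-dist / bandwidth)
        numerator += vote * log_k
        denominator += vote
    return numerator / denominator

def loocv_within_factor(known, weights, factor=3.0, bandwidth=1.0):
    """Fraction of held-out reactions predicted to within `factor` of truth."""
    tolerance = math.log10(factor)
    hits = 0
    for i, (features, log_k) in enumerate(known):
        # Leave reaction i out and predict it from all the others.
        rest = known[:i] + known[i + 1:]
        predicted = wnv_predict(features, rest, weights, bandwidth)
        if abs(predicted - log_k) <= tolerance:
            hits += 1
    return hits / len(known)

if __name__ == "__main__":
    # Hypothetical one-feature dataset: (features, log10 of rate constant).
    toy = [([0.0], 6.0), ([0.1], 6.1), ([5.0], 7.5)]
    print(wnv_predict([0.05], toy, weights=[1.0]))
    print(loocv_within_factor(toy, weights=[1.0], factor=3.0))
```

In this toy dataset the isolated third reaction has no similar neighbors, so it is predicted poorly when held out, which is the kind of failure that a richer set of labeled instances would be expected to reduce.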
Hybridization is a key molecular process in biology and biotechnology, but to date there is no predictive model for accurately determining hybridization rate constants based on sequence information. To approach this problem systematically, we first performed 210 fluorescence kinetics experiments to observe the hybridization kinetics of 100 different DNA target and probe pairs (subsequences of the CYCS and VEGF genes) at temperatures ranging from 28 °C to 55 °C. Next, we rationally designed 38 features computable from sequence, each individually correlated with hybridization kinetics. These features are used in our implementation of a weighted neighbor voting (WNV) algorithm, in which the hybridization rate constant of an unknown sequence is predicted based on its similarity to reactions with known rate constants (a.k.a. labeled instances). Automated feature selection and weighting optimization resulted in a final 6-feature WNV model, which can predict hybridization rate constants of new sequences to within a factor of 2 with ≈74% accuracy and within a factor of 3 with ≈92% accuracy, based on leave-one-out cross-validation. Predictive understanding of hybridization kinetics allows more efficient design of nucleic acid probes, for example in allowing sparse hybrid-capture panels to more quickly and economically enrich desired regions from genomic DNA.
Hybridization of complementary DNA and RNA sequences is a fundamental molecular mechanism that underlies both biological processes [1-3] and nucleic acid analytic biotechnologies [4-7]. The thermodynamics of hybridization have been well studied, and algorithms based on the nearest-neighbor model of base stacking [8,9] predict minimum free energy structures and melting temperatures [10,11] with reasonably good accuracy.
In contrast, the kinetics of hybridization remain poorly understood, and to date no models or algorithms have been reported that accurately predict hybridization rate constants from sequence and reaction conditions (temperature, salinity). This knowledge deficiency has adversely impacted the research community by requiring either trial-and-error optimization of DNA primer and probe sequences for new genetic regions of interest, or brute-force use of thousands of DNA probes for target enrichment.
Predictive modeling of hybridization kinetics faces two main challenges. First, the hybridization of complementary sequences can follow many different pathways, rendering simple reaction models inaccurate for a large fraction of DNA sequences. It is not practical to construct a comprehensive model that considers every potential DNA hybridization mechanism, due to the large variety of possible DNA sequences. Second, there is a very limited number of DNA sequences whose kinetics have been carefully measured directly, either in bulk solution [12-14] or at the single-molecule level [15-17]. One reason for the relative lack of data is the requirement of fluorophore-functionalized DNA oligonucleotides, which at roughly $200 per sequence becomes cost-prohibitive f...