Hybridization is a key molecular process in biology and biotechnology, but to date there is no predictive model for accurately determining hybridization rate constants based on sequence information. Here we report a weighted neighbor voting (WNV) prediction algorithm, in which the hybridization rate constant of an unknown sequence is predicted based on its similarity to reactions with known rate constants. To construct this algorithm, we first performed 210 fluorescence kinetics experiments to observe the hybridization kinetics of 100 different DNA target and probe pairs (36 nt subsequences of the CYCS and VEGF genes) at temperatures ranging from 28 °C to 55 °C. Automated feature selection and weighting optimization resulted in a final 6-feature WNV model, which can predict hybridization rate constants of new sequences to within a factor of 3 with ≈91% accuracy, based on leave-one-out cross-validation. Accurate prediction of hybridization kinetics allows design of efficient probe sequences for genomics research.
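The idea of neighbor voting with leave-one-out scoring can be illustrated with a minimal sketch. This is not the authors' implementation: the inverse-distance vote, the weighted Euclidean distance, and the names `wnv_predict` and `loo_accuracy` are assumptions chosen for illustration, and the abstract does not specify the actual features or weighting scheme. Rate constants are handled as log10 values so that "within a factor of 3" becomes a fixed tolerance on the log scale.

```python
import math


def wnv_predict(query, known, weights, eps=1e-6):
    """Predict log10(k) for `query` as a distance-weighted average.

    `known` is a list of (feature_tuple, log10_k) pairs with measured
    rate constants. Inverse-distance voting is an illustrative
    assumption, not the scheme from the paper.
    """
    num = 0.0
    den = 0.0
    for feats, log_k in known:
        # Weighted Euclidean distance in feature space (assumed metric).
        dist = math.sqrt(
            sum(w * (q - f) ** 2 for w, q, f in zip(weights, query, feats))
        )
        vote = 1.0 / (dist + eps)  # closer neighbors get larger votes
        num += vote * log_k
        den += vote
    return num / den


def loo_accuracy(data, weights, factor=3.0):
    """Leave-one-out fraction of predictions within `factor` of truth."""
    tol = math.log10(factor)
    hits = 0
    for i, (feats, log_k) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # hold out reaction i
        pred = wnv_predict(feats, rest, weights)
        if abs(pred - log_k) <= tol:
            hits += 1
    return hits / len(data)
```

In this sketch each held-out reaction is predicted only from the remaining reactions, which mirrors the leave-one-out cross-validation named in the abstract; feature selection and weight optimization would sit in an outer loop that is not shown here.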
The detection of specific DNA sequences is central to precision medicine, from pathogen identification to the risk assessment of human genetic diseases to disease prognosis. As the understanding of the genomics of disease improves, translation of this scientific knowledge into actionable clinical practice will be facilitated by DNA diagnostic systems that are simultaneously fast, affordable, sensitive, massively multiplexed, quantitative and easy to operate.
Since the early 2000s, however, DNA-detection technologies have bifurcated into either massively multiplexed but slow systems (next-generation sequencing (NGS) 1,2 and microarrays 3,4), or rapid but lowly multiplexing assays (quantitative PCR (qPCR) 5,6 and isothermal amplification 7,8). Two notable exceptions to the slow but powerful or fast but limited trade-off are the Biofire FilmArray multiplex PCR system 9 and the Oxford Nanopore high-throughput sequencing system 10 (Table 1). However, these two technologies do not offer accurate quantification and are unable to reliably recognize single-nucleotide differences, which are relevant for genetic and metabolic risk assessments 11,12, pharmacogenetic drug dosing 13, cancer-therapy selection 14 and antimicrobial resistance to infectious disease 15,16.
Here we present the design and performance of a toroidal chamber for the detection of DNA via PCR. The device allows for scalable and massive multiplexing, rapid turnaround times, single-nucleotide discrimination and precise quantification in a portable, affordable and battery-powered instrument that uses closed consumables to minimize contamination risks (Table 1). The toroidal PCR system is enabled by two technologies: (1) reliable convection PCR using an annular reaction chamber, and (2) a pre-quenched microarray that allows multiplexed readout via spatial separation.
Convection PCR achieves thermal cycling of a PCR reaction mixture using passive movement of fluid owing to temperature-induced density differences. Convection PCR was initially proposed and experimentally demonstrated in 2002 using capillary tubes heated to 95 °C at the bottom 17. Although a number of publications have explored
Targeted high-throughput DNA sequencing is a primary approach for genomics and molecular diagnostics, and more recently as a readout for DNA information storage. Oligonucleotide probes used to enrich gene loci of interest have different hybridization kinetics, resulting in non-uniform coverage that increases sequencing costs and decreases sequencing sensitivities. Here, we present a deep learning model (DLM) for predicting Next-Generation Sequencing (NGS) depth from DNA probe sequences. Our DLM includes a bidirectional recurrent neural network that takes as input both DNA nucleotide identities as well as the calculated probability of the nucleotide being unpaired. We apply our DLM to three different NGS panels: a 39,145-plex panel for human single nucleotide polymorphisms (SNP), a 2000-plex panel for human long non-coding RNA (lncRNA), and a 7373-plex panel targeting non-human sequences for DNA information storage. In cross-validation, our DLM predicts sequencing depth to within a factor of 3 with 93% accuracy for the SNP panel, and 99% accuracy for the non-human panel. In independent testing, the DLM predicts the lncRNA panel with 89% accuracy when trained on the SNP panel. The same model is also effective at predicting the measured single-plex kinetic rate constants of DNA hybridization and strand displacement.
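As a rough illustration of the input described above, the sketch below builds a per-position feature row from the nucleotide identity and its unpaired probability. The one-hot encoding, the five-channel layout, and the name `encode_probe` are assumptions made for this example; the abstract states only which two kinds of information the model receives, not how they are encoded. The unpaired probabilities are taken as given inputs, since computing them requires a secondary-structure tool that is outside this sketch.

```python
NUCLEOTIDES = "ACGT"


def encode_probe(sequence, p_unpaired):
    """Return one feature row per position: one-hot base + P(unpaired).

    `p_unpaired` holds a precomputed probability for each position.
    The 4+1 channel layout is an illustrative assumption.
    """
    if len(sequence) != len(p_unpaired):
        raise ValueError("sequence and p_unpaired must have equal length")
    rows = []
    for base, prob in zip(sequence.upper(), p_unpaired):
        if base not in NUCLEOTIDES:
            raise ValueError(f"unsupported nucleotide: {base!r}")
        one_hot = [1.0 if base == n else 0.0 for n in NUCLEOTIDES]
        rows.append(one_hot + [float(prob)])
    return rows
```

A sequence of rows like this is the kind of position-by-position input a bidirectional recurrent network can read in both directions.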
Hybridization is a key molecular process in biology and biotechnology, but to date there is no predictive model for accurately determining hybridization rate constants based on sequence information. To approach this problem systematically, we first performed 210 fluorescence kinetics experiments to observe the hybridization kinetics of 100 different DNA target and probe pairs (subsequences of the CYCS and VEGF genes) at temperatures ranging from 28 °C to 55 °C. Next, we rationally designed 38 features computable based on sequence, each feature individually correlated with hybridization kinetics. These features are used in our implementation of a weighted neighbor voting (WNV) algorithm, in which the hybridization rate constant of an unknown sequence is predicted based on its similarity to reactions with known rate constants (a.k.a. labeled instances). Automated feature selection and weighting optimization resulted in a final 6-feature WNV model, which can predict hybridization rate constants of new sequences to within a factor of 2 with ≈74% accuracy and within a factor of 3 with ≈92% accuracy, based on leave-one-out cross-validation. Predictive understanding of hybridization kinetics allows more efficient design of nucleic acid probes, for example in allowing sparse hybrid-capture panels to more quickly and economically enrich desired regions from genomic DNA.
Hybridization of complementary DNA and RNA sequences is a fundamental molecular mechanism that underlies both biological processes [1-3] and nucleic acid analytic biotechnologies [4-7]. The thermodynamics of hybridization have been well studied, and algorithms based on the nearest-neighbor model of base stacking [8,9] predict minimum free energy structures and melting temperatures [10,11] with reasonably good accuracy.
In contrast, the kinetics of hybridization remain poorly understood, and to date no models or algorithms have been reported that accurately predict hybridization rate constants from sequence and reaction conditions (temperature, salinity). This knowledge deficiency has adversely impacted the research community by requiring either trial-and-error optimization of DNA primer and probe sequences for new genetic regions of interest, or brute-force use of thousands of DNA probes for target enrichment.
Predictive modeling of hybridization kinetics faces two main challenges. First, the hybridization of complementary sequences can follow many different pathways, rendering simple reaction models inaccurate for a large fraction of DNA sequences. It is not practical to construct a comprehensive model that considers every potential DNA hybridization mechanism, due to the large variety of possible DNA sequences. Second, there is a very limited number of DNA sequences whose kinetics have been carefully measured directly, either in bulk solution [12-14] or at the single-molecule level [15-17]. One reason for the relative lack of data is the requirement of fluorophore-functionalized DNA oligonucleotides, which at roughly $200 per sequence becomes cost-prohibitive f...
Current platforms for molecular analysis of DNA markers are either limited in multiplexing (qPCR, isothermal amplification), turnaround time (microarrays, NGS), quantitation accuracy (isothermal amplification, microarray, nanopore sequencing), or specificity against single-nucleotide differences (microarrays, nanopore sequencing). Here, we present the Donut PCR platform that features high multiplexing, rapid turnaround times, single nucleotide discrimination, and precise quantitation of DNA targets in a portable, affordable, and battery-powered instrument using closed consumables that minimize contamination. We built a bread-board instrument prototype and three assays/chips to demonstrate the capabilities of Donut PCR: (1) a 9-plex mammal identification panel, (2) a 15-plex bacterial identification panel, and (3) a 30-plex human SNP genotyping assay. The limit of detection of the platform is under 10 genomic copies in under 30 minutes, and the quantitative dynamic range is at least 4 logs. We envision that this platform would be useful for a variety of applications where rapid and highly multiplexed nucleic acid detection is needed at the point of care.