DNA-wrapped single-walled carbon nanotubes (SWCNTs) have demonstrated great versatility in their use as optical sensors. SWCNTs emit a near-infrared fluorescence that is responsive to even the slightest changes in the nanotube environment, enabling sensors that can respond to single-molecule fluctuations within the vicinity of their surfaces. The fluorescence response and surface interactions of these sensors are determined by the DNA wrapping sequence. However, the lack of information on the relationship between the DNA sequence and its effect on the SWCNT fluorescence remains a bottleneck for designing sequences that are specific to analytes of interest. We have recently demonstrated the use of directed evolution to control the fluorescence response of SWCNTs through DNA design. Iterative cycles of DNA mutation, screening, and selection allowed us to evolve sequences that yield DNA-wrapped SWCNT sensors with a desired fluorescence response to mycotoxins.
In this work, we apply the screening results of the DNA libraries used in this approach to train machine learning (ML) algorithms. Artificial neural network (ANN) and support vector machine (SVM) methods were used to predict the response of ssDNA-SWCNT sensors to a specific mycotoxin. The reliability of these models was further assessed through cross-validation. The cross-validated ANN and SVM models classified the DNA sequences as yielding either a high or low fluorescence response with accuracies of 73% and 81%, respectively. The models were further used to predict the performance of alternative DNA sequences outside the initial training dataset, using hierarchical and k-means++ clustering methods to evaluate the similarity and dissimilarity of each DNA sequence. Compared to the SVM model, the ANN model showed an improved ability to predict high responses for dissimilar DNA sequences. We further applied a combinatorial approach based on SVM and ANN models to design new DNA sequences for improving sensor performance. The success of this approach was validated experimentally, demonstrating the rational design of improved sensors with 95% prediction accuracy.
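As an illustration of the data handling such a workflow implies, the following minimal sketch shows one way DNA sequences could be turned into numeric features, compared for similarity, and split for cross-validation. The one-hot encoding, the Hamming distance as a dissimilarity measure, the contiguous fold layout, and all function names are assumptions made for illustration; they are not taken from the models described here, and the sketch assumes sequences of equal length.

```python
BASES = "ACGT"


def one_hot(seq):
    """Flatten a DNA sequence into a one-hot vector (4 entries per base).

    Assumed encoding for illustration; the feature representation used
    for the ANN and SVM models is not specified in this text.
    """
    vec = []
    for base in seq:
        if base not in BASES:
            raise ValueError(f"unexpected base: {base!r}")
        vec.extend(1 if base == b else 0 for b in BASES)
    return vec


def hamming(a, b):
    """Count mismatched positions between two equal-length sequences.

    A simple dissimilarity measure of the kind a clustering step could
    use; it is a stand-in, not the metric used in the study.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds over n items."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n) if i < start or i >= stop]
        yield train, test
        start = stop
```

In practice, the feature vectors and fold indices would be passed to an ANN or SVM implementation, and each fold's held-out sequences would be used to estimate classification accuracy.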
The application of ML algorithms to directed evolution libraries of DNA thus allows one to accurately map the performances of these sensors within a particular sequence space. The computational success of this mapping provides a framework for replacing current empirical approaches with the rational design of DNA sequences for SWCNT sensing.
Keywords: Nano Biosensor, DNA-SWCNT, Mycotoxin, Machine learning, Directed evolution method