In bioinformatics and DNA computing, simulated hybridization experiments can replace real molecular hybridization experiments to some extent, avoiding some disadvantages of actual experimental design. However, the core techniques employed by popular DNA simulation software are limited by the exponential computational complexity of the underlying combinatorial problems. As a result, it is infeasible to decide within acceptable time whether a specific hybridization among complex DNA molecules is effective. To address this common problem, we introduce a new method based on machine learning. First, a sample set is used to train the boosted tree algorithm, yielding a corresponding machine learning model. Second, this model is applied to predict the classification results of molecular hybridization for a given group of DNA molecular codes. The experimental results show that the new method has an average accuracy of 94.2% and an average efficiency 90 839 times higher than that of existing representative approaches. In particular, for the case study in this paper, the efficiency of the new method is 235 000, 250 000, and 990 000 times higher than that of the three existing methods, respectively. These results indicate that our approach can quickly and accurately determine the biological effectiveness of molecular hybridization for a given DNA design.
KEYWORDS
biological effectiveness, boosted tree algorithm, DNA design, specific hybridization
INTRODUCTION
Real molecular biological experiments are crucial in bioinformatics and DNA (deoxyribonucleic acid) computing. DNA code design is vital for avoiding the outcome uncertainties induced by various biological factors. Hybridization is a molecular biological mechanism often used in DNA algorithms, and its biological effectiveness is often directly related to the success or failure of DNA computing. However, the economic and time costs of real molecular biological experiments are too high, which restricts the application of DNA computing. Therefore, simulation modeling is often used instead of actual biological testing.
However, in simulating molecular hybridization experiments, the time complexity grows exponentially with the lengths of the molecules and the number of molecular species. Thus, simulating even slightly larger hybridizations is difficult. Unfortunately, this is determined by the inherent complexity of the problem, so no solution exists within the framework of hard computing. It is well known that computations in soft computing are characterized by uncertainty, imprecision, partial truth, low cost, and robustness.
Is there a better solution in soft computing that fundamentally circumvents the high complexity of simulated molecular hybridization? This remains an open question.
Motivated by this question, we propose a new approach based on the boosted tree (BT) algorithm to analyze DNA molecular specific hy...