Background
Viral outbreaks, including those caused by dengue, Zika, Ebola, and particularly SARS-CoV-2, have had significant global impacts and caused unprecedented loss of life. SARS-CoV-2 continues to be a leading cause of death worldwide and in the United States, with many individuals experiencing prolonged symptoms. In this study, we present a novel genomic surveillance approach that combines a stack-ensembled neural network with microarray genome resequencing by hybridization.
Results
The resequencing microarray features ~240,000 probes covering approximately 30,000 nucleotides per genomic sample. The data were derived from our previously reported cost-effective and rapid full-genome tiling array technology. We enhanced our base-calling algorithms with 48 input features per base position and multiple scanning exposure times. The training dataset comprised 570,000 data points, from which over 12,000 neural network models were developed. To assess the accuracy of our stack-ensembled models in base calling and variant identification, we used neural network and logistic regression meta-models to analyze genomic data from four clinical samples with a cycle threshold value ≤ 24.
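As an illustration of the stacking idea only, the following minimal sketch trains a logistic regression meta-model on the class probabilities produced by several base learners. It is not the pipeline used in this study: the data are synthetic, the base learners are simple hypothetical scorers standing in for the trained neural networks, and every function name and parameter other than the 48 features per position is an assumption made for the example.

```python
import math
import random

BASES = "ACGT"
N_FEATURES = 48  # per-position feature count quoted in the abstract

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def make_position(rng, noise=0.5):
    """Synthetic stand-in for one base position: (features, true base index)."""
    label = rng.randrange(4)
    feats = [(1.0 if j % 4 == label else 0.0) + rng.gauss(0.0, noise)
             for j in range(N_FEATURES)]
    return feats, label

def make_base_model(rng, n_used=12):
    """Hypothetical base learner: scores each base from a random feature subset."""
    used = rng.sample(range(N_FEATURES), n_used)
    def predict(feats):
        scores = [0.0] * 4
        for j in used:
            scores[j % 4] += feats[j]
        return softmax(scores)
    return predict

def stack_inputs(models, feats):
    """Concatenate every base model's class probabilities into one meta-feature vector."""
    return [p for model in models for p in model(feats)]

def meta_predict(weights, x):
    """Class probabilities from the meta-model (last weight in each row is the bias)."""
    return softmax([sum(w * v for w, v in zip(wk[:-1], x)) + wk[-1] for wk in weights])

def train_meta(rows, labels, epochs=30, lr=0.1):
    """Multinomial logistic regression meta-model trained by plain SGD."""
    dim = len(rows[0])
    weights = [[0.0] * (dim + 1) for _ in range(4)]
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            probs = meta_predict(weights, x)
            for k in range(4):
                grad = probs[k] - (1.0 if k == y else 0.0)
                for i in range(dim):
                    weights[k][i] -= lr * grad * x[i]
                weights[k][dim] -= lr * grad
    return weights

def call_base(models, weights, feats):
    """Return the base letter with the highest meta-model probability."""
    probs = meta_predict(weights, stack_inputs(models, feats))
    return BASES[max(range(4), key=probs.__getitem__)]

rng = random.Random(0)
models = [make_base_model(rng) for _ in range(5)]
train = [make_position(rng) for _ in range(400)]
test = [make_position(rng) for _ in range(200)]
weights = train_meta([stack_inputs(models, f) for f, _ in train], [y for _, y in train])
accuracy = sum(call_base(models, weights, f) == BASES[y] for f, y in test) / len(test)
print(f"held-out accuracy on synthetic data: {accuracy:.3f}")
```

Because the stand-in base learners here are not fitted to data, the meta-model can be trained on their outputs directly. With trained base models, the meta-model would normally be fitted on predictions for positions held out from base-model training, to avoid leaking training labels into the second stage.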
Conclusions
Our models demonstrated accuracies exceeding 99% and coverages comparable to existing standards. Microarray genome resequencing of clinical viral samples provides significant benefits in terms of cost-effectiveness, speed, and flexibility, allowing for the surveillance of diverse viral genomes without the need for extensive algorithm retraining.
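For readers who want to reproduce metrics of this kind, the sketch below shows one common way to score base calls against a reference sequence. The definitions are assumptions made for illustration and may differ from those used in this study: coverage is taken as the fraction of positions that received a call, accuracy as the fraction of called positions that match the reference, and `N` as the hypothetical no-call symbol.

```python
def score_calls(called, reference, no_call="N"):
    """Return (accuracy, coverage) for a called sequence against a reference.

    Assumed definitions: coverage = called positions / all positions;
    accuracy = matching calls / called positions.
    """
    if len(called) != len(reference):
        raise ValueError("sequences must have equal length")
    # Keep only positions where a base was actually called.
    made = [(c, r) for c, r in zip(called, reference) if c != no_call]
    coverage = len(made) / len(reference) if reference else 0.0
    accuracy = sum(c == r for c, r in made) / len(made) if made else 0.0
    return accuracy, coverage

# Toy example: one no-call and one miscall in ten positions.
accuracy, coverage = score_calls("ACGTNACGTA", "ACGTAACGTT")
print(f"accuracy={accuracy:.3f} coverage={coverage:.3f}")
```

Reporting the two numbers separately matters because a caller can raise its accuracy simply by declining to call difficult positions, which lowers coverage.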