While intense attention clouded the early stages of the coronavirus spread, the actual pathogenicity and origin of possible sub-strains remained unclear. From the Global Initiative on Sharing All Influenza Data (GISAID) database (https://www.gisaid.org/), a total of 8864 complete human SARS-CoV-2 genome sequences, collected between December 2019 and January 15, 2021, were processed by gender and analyzed across 6 continents (88 countries), Antarctica excluded. We hypothesized that the data speak for themselves and can reveal true and explainable patterns of the disease. Analysis of identical genome diversity and pattern correlates, performed using a hybrid of biotechnology and machine learning methods, corroborates the emergence of inter- and intra-SARS-CoV-2 sub-strain transmission and supports an increase in sub-strains within the various continents, with nucleotide mutations varying dynamically between individuals in close association with the virus as it adapts to its host/environment. Interestingly, some viral sub-strain patterns progressively transformed into new sub-strain clusters, indicating varying amino acid associations and strong nucleotide associations derived from the same lineage. A novel cognitive approach to knowledge mining aided the discovery of transmission routes and a seamless contact tracing protocol. Our classification results were better than those of state-of-the-art methods, indicating a more robust system for predicting emerging or new viral sub-strain(s). The results therefore offer explanations for the growing concerns about the virus and its next wave(s). A future direction of this work is the defuzzification of confusable pattern clusters for precise intra-country SARS-CoV-2 sub-strain analytics.
Background: The increased number of accessible genomes has prompted large-scale comparative studies for discerning evolutionary knowledge of infectious diseases, but challenges such as the non-availability of close reference sequence(s) and incompletely assembled or large numbers of genomes preclude real-time multiple sequence alignment and sub-strain discovery. This paper introduces a cooperatively inspired open-source framework for intelligent mining of severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) genomes. We situate this study within the African context, to advance the state of the art towards intelligent infectious disease characterization and prediction. The outcome is an enriched Knowledge Base, sufficient to provide a deep understanding of the viral sub-strain identification problem. We also open investigation by gender, which to the best of our knowledge has been ignored in related research. Data for the study came from the Global Initiative on Sharing All Influenza Data database (https://gisaid.org) and were processed for precise discovery of viral sub-strain transmission between and within African countries. To localize the transmission route(s) of each isolate extracted and provide appropriate links to similar isolate strain(s), a cognitive solution was imposed on the genome expression patterns discovered by unsupervised self-organizing map (SOM) component planes visualization. The Friedman-Nemenyi test was finally performed to validate our claim.

Results: Evidence of inter- and intra-genome diversity was observed. While some isolates (or genomes) clustered differently, implying a different evolutionary source (or high diversity), others clustered closely together, indicating a similar evolutionary source (or low diversity). SOM component planes analysis revealed multiple sub-strain patterns, strongly suggesting local or intra-community and country-to-country transmissions. Cognitive maps of both male and female isolates revealed multiple transmission routes. Statistical results indicate a significant difference between the various isolate groups at the 0.05 level of significance.

Conclusion: The proposed framework offers explanations for SARS-CoV-2 diversity and provides real-time identification of disease transmission routes, as well as rapid decision support for facilitating inter- and intra-country contact tracing of infected case(s). Intermediate data produced in this paper help to enrich the genome datasets for intelligent characterization and prediction of COVID-19 and related pandemics, as well as the construction of intelligent devices for accurate infectious disease monitoring.
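
As an illustration of the kind of significance check the abstract describes, the following is a minimal sketch of the Friedman rank test in plain Python. It is not the authors' implementation: the input values are hypothetical placeholders, the function names are invented for this example, no tie correction is applied to the statistic, and the Nemenyi post-hoc comparison is omitted. The p-value helper assumes exactly three groups (2 degrees of freedom), where the chi-square survival function reduces to exp(-Q/2); with so few blocks the chi-square approximation is crude and serves only to show the mechanics.

```python
import math


def average_ranks(row):
    """Rank the values in one block (1 = smallest), averaging ranks over ties."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks


def friedman_statistic(blocks):
    """Friedman chi-square statistic (no tie correction).

    blocks: list of n rows, each holding one measurement per group (k groups).
    """
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


def p_value_three_groups(q):
    """Chi-square survival function for df = 2 only (valid when k = 3)."""
    return math.exp(-q / 2.0)


# Hypothetical scores: 4 blocks (rows) measured under 3 isolate groups (columns).
scores = [
    [0.61, 0.72, 0.80],
    [0.58, 0.70, 0.83],
    [0.64, 0.69, 0.79],
    [0.60, 0.75, 0.81],
]
q = friedman_statistic(scores)
p = p_value_three_groups(q)
print(f"Q = {q:.3f}, p = {p:.4f}, significant at 0.05: {p < 0.05}")
```

A rejection at the 0.05 level only says that at least one group differs; identifying which pairs differ is the role of a post-hoc procedure such as the Nemenyi test named in the abstract.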