The Covid-19 pandemic, caused by the SARS-CoV-2 virus, has already led to the infection of more than 120 million people, of whom 70 million have recovered and 3 million have died. The high speed of infection has led to the rapid depletion of public health resources in most countries. RT-PCR is the reference diagnostic method for Covid-19. In this work we propose a new technique for representing DNA sequences: they are divided into smaller, overlapping subsequences in a pseudo-convolutional approach and represented by co-occurrence matrices. This technique eliminates the need for multiple sequence alignment. With the proposed method, it is possible to identify virus sequences in a large database: 347,363 virus DNA sequences from 24 virus families and SARS-CoV-2. When comparing SARS-CoV-2 with virus families with similar symptoms, we obtained $$0.97 \pm 0.03$$ for sensitivity and $$0.9919 \pm 0.0005$$ for specificity with an MLP classifier and 30% overlap. When SARS-CoV-2 is compared with other coronaviruses and healthy human DNA sequences, we obtained $$0.99 \pm 0.01$$ for sensitivity and $$0.9986 \pm 0.0002$$ for specificity with an MLP classifier and 50% overlap. Therefore, the molecular diagnosis of Covid-19 can be optimized by combining RT-PCR and our pseudo-convolutional method to identify DNA sequences for SARS-CoV-2 with greater specificity and sensitivity.
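For readers less familiar with the two metrics reported above, the following is a minimal sketch of how sensitivity and specificity are conventionally computed from binary predictions. The function name `sensitivity_specificity` and the label convention (1 for SARS-CoV-2, 0 for everything else) are assumptions made for illustration. The abstract does not describe the authors' evaluation code.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    Sensitivity = TP / (TP + FN): fraction of positive samples detected.
    Specificity = TN / (TN + FP): fraction of negative samples rejected.
    The label convention (positive=1) is an illustrative assumption.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    # Guard against classes that are absent from y_true.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy labels only; these are not data from the study.
    y_true = [1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 1]
    print(sensitivity_specificity(y_true, y_pred))
```

The values reported with a ± term would typically be a mean and spread over repeated runs or folds of such a computation. The abstract does not state which resampling scheme was used.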
The spread of the SARS-CoV-2 virus across the world has caused more than 250,000 deaths and over 4 million confirmed cases. The severity of Covid-19, the exponential rate at which the virus proliferates, and the rapid exhaustion of public health resources are critical factors. RT-PCR with virus DNA identification is still the benchmark diagnostic method for Covid-19. In this work we propose a new technique for representing DNA sequences: they are divided into smaller, overlapping subsequences in a pseudo-convolutional approach and represented by co-occurrence matrices. This technique analyzes the DNA sequences obtained by the RT-PCR method, eliminating the need for sequence alignment.
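The representation described above can be illustrated with a minimal sketch. The details here are assumptions, since the abstract does not specify them: the co-occurrence matrix is taken to be a 4×4 count of adjacent nucleotide pairs, the per-window matrices are summed and normalized into one feature vector, and the names `split_with_overlap`, `cooccurrence_matrix`, and `sequence_features` as well as the window length are hypothetical. It is not the authors' implementation.

```python
import numpy as np

NUCLEOTIDES = "ACGT"
INDEX = {base: i for i, base in enumerate(NUCLEOTIDES)}


def split_with_overlap(sequence, window, overlap):
    """Slide a fixed-length window over the sequence.

    `overlap` is the fraction of each window shared with the next one
    (e.g. 0.3 or 0.5), so the stride is window * (1 - overlap).
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    step = max(1, int(round(window * (1.0 - overlap))))
    last_start = max(len(sequence) - window, 0)
    return [sequence[s:s + window] for s in range(0, last_start + 1, step)]


def cooccurrence_matrix(subsequence):
    """Count adjacent nucleotide pairs (assumed definition of co-occurrence)."""
    counts = np.zeros((4, 4), dtype=np.int64)
    for first, second in zip(subsequence, subsequence[1:]):
        # Skip ambiguous symbols such as 'N'.
        if first in INDEX and second in INDEX:
            counts[INDEX[first], INDEX[second]] += 1
    return counts


def sequence_features(sequence, window=8, overlap=0.5):
    """Sum the per-window matrices and flatten into a 16-value vector."""
    total = np.zeros((4, 4), dtype=np.float64)
    for sub in split_with_overlap(sequence.upper(), window, overlap):
        total += cooccurrence_matrix(sub)
    norm = total.sum()
    # Normalize so sequences of different lengths are comparable.
    return (total / norm).ravel() if norm else total.ravel()


if __name__ == "__main__":
    # Toy sequence only; not taken from the study's database.
    print(sequence_features("ACGTACGTAC", window=4, overlap=0.5))
```

Because each sequence is reduced to a fixed-size vector regardless of its length, sequences can be passed to a classifier such as an MLP without first being aligned to one another, which is the property the abstract emphasizes.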