Highlights
Outgroup was not used due to the unknown source of SARS-CoV-2;
16373 SARS-CoV-2 genomes were included in the evolution analysis;
9 key specific sites of highly linkage and 4 major haplotypes were found;
Epidemic trends and possible earlier origins of SARS-CoV-2 were indicated.
In the past decade, treatments for tumors have made remarkable progress, such as the successful clinical application of targeted therapies. Nowadays, targeted therapies are based primarily on the detection of mutations, and next-generation sequencing (NGS) plays an important role in relevant clinical research. The mutation frequency is a major problem in tumor mutation detection and increasing sequencing depth is a widely used method to improve mutation calling performance. Therefore, it is necessary to evaluate the effect of different sequencing depth and mutation frequency as well as mutation calling tools. In this study, Strelka2 and Mutect2 tools were used in detecting the performance of 30 combinations of sequencing depth and mutation frequency. Results showed that the precision rate kept greater than 95% in most of the samples. Generally, for higher mutation frequency (≥20%), sequencing depth ≥200X is sufficient for calling 95% mutations; for lower mutation frequency (≤10%), we recommend improving experimental method rather than increasing sequencing depth. Besides, according to our results, although Strelka2 and Mutect2 performed similarly, the former performed slightly better than the latter one at higher mutation frequency (≥20%), while Mutect2 performed better when the mutation frequency was lower than 10%. Besides, Strelka2 was 17 to 22 times faster than Mutect2 on average. Our research will provide a useful and comprehensive guideline for clinical genomic researches on somatic mutation identification through systematic performance comparison among different sequencing depths and mutation frequency.
AbstractObjectivesTo reveal epidemic trend and possible origins of SARS-CoV-2 by exploring its evolution and molecular characteristics based on a large number of genomes since it has infected millions of people and spread quickly all over the world.MethodsVarious evolution analysis methods were employed.ResultsThe estimated Ka/Ks ratio of SARS-CoV-2 is 1.008 or 1.094 based on 622 or 3624 SARS-CoV-2 genomes, and the time to the most recent common ancestor (tMRCA) was inferred in late September 2019. Further 9 key specific sites of highly linkage and four major haplotypes H1, H2, H3 and H4 were found. The Ka/Ks, detected population size and development trends of each major haplotype showed H3 and H4 subgroups were going through a purify evolution and almost disappeared after detection, indicating H3 and H4 might have existed for a long time, while H1 and H2 subgroups were going through a near neutral or neutral evolution and globally increased with time. Notably the frequency of H1 was generally high in Europe and correlated to death rate (r>0.37).ConclusionsIn this study, the evolution and molecular characteristics of more than 16000 genomic sequences provided a new perspective for revealing epidemiology of SARS-CoV-2.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.