Here, we present the R package minSNPs. This is a re-development of a previously described Java application named Minimum SNPs. MinSNPs assembles resolution-optimised sets of single nucleotide polymorphisms (SNPs) from sequence alignments such as genome-wide orthologous SNP matrices. MinSNPs can derive sets of SNPs optimised for discriminating any user-defined combination of sequences from all others. Alternatively, SNP sets may be optimised to discriminate every sequence from every other sequence, i.e., to maximise diversity. MinSNPs encompasses functions that facilitate rapid and flexible SNP mining, and clear and comprehensive presentation of the results. The running time of minSNPs scales linearly with input data volume and with the numbers of SNPs and SNP sets specified in the output. MinSNPs was tested using a previously reported orthologous SNP matrix of Staphylococcus aureus and an orthologous SNP matrix of 3,279 genomes with 164,335 SNPs assembled from four S. aureus short read genomic data sets. MinSNPs was shown to be effective for deriving discriminatory SNP sets for potential surveillance targets and in identifying SNP sets optimised to discriminate isolates from different clonal complexes. MinSNPs was also tested with a large Plasmodium vivax orthologous SNP matrix. A set of five SNPs was derived that reliably indicated the country of origin within three south-east Asian countries. In summary, we report the capacity to assemble comprehensive SNP matrices that effectively capture microbial genomic diversity, and to rapidly and flexibly mine these entities for optimised marker sets.
Here we present the R package MinSNPs. It is designed to assemble resolution-optimised sets of single nucleotide polymorphisms (SNPs) from alignments such as genome-wide orthologous SNP matrices. We also demonstrate a pipeline for assembling such matrices from multiple bio-projects, so as to facilitate SNP set derivation from globally representative data sets. MinSNPs can derive sets of SNPs optimised for discriminating any user-defined combination of sequences from all others. Alternatively, SNP sets may be optimised to discriminate all from all, i.e., to maximise diversity. MinSNPs encompasses functions that facilitate rapid and flexible SNP mining, and clear and comprehensive presentation of the results. The running time of MinSNPs scales linearly with input data volume and with the numbers of SNPs and SNP sets specified in the output. MinSNPs was tested using a previously reported orthologous SNP matrix of Staphylococcus aureus and an orthologous SNP matrix of 3,279 genomes with 164,335 SNPs assembled from four S. aureus short read genomic data sets. MinSNPs demonstrated efficacy in deriving discriminatory SNP sets for potential surveillance targets and in identifying SNP sets optimised to discriminate isolates from different clonal complexes (CC). MinSNPs was also tested with a large Plasmodium vivax orthologous SNP matrix. A set of five SNPs was derived that reliably indicated the country of origin within three south-east Asian countries. In summary, we report the capacity to assemble comprehensive SNP matrices that effectively capture microbial genomic diversity, and to rapidly and flexibly mine these entities for optimised surveillance marker sets.
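To make the "discriminate all from all" objective concrete, the following is a minimal Python sketch of one way such a SNP set could be assembled. It is not the minSNPs API and is not taken from the package. The function names (`simpson_index`, `greedy_snp_set`), the use of Simpson's index of diversity as the resolution score, and the greedy one-SNP-at-a-time search are all assumptions made for illustration, and the four-sequence alignment is a toy example rather than real data.

```python
from collections import Counter


def simpson_index(sequences, positions):
    """Simpson's index of diversity for the allele patterns at `positions`.

    Returns the probability that two sequences drawn without replacement
    differ at one or more of the chosen positions.
    """
    n = len(sequences)
    if n < 2:
        return 0.0
    patterns = Counter(tuple(seq[p] for p in positions) for seq in sequences)
    same_pairs = sum(count * (count - 1) for count in patterns.values())
    return 1.0 - same_pairs / (n * (n - 1))


def greedy_snp_set(sequences, max_snps):
    """Hypothetical greedy search: add the position that most raises the score.

    Ties go to the lowest position; the search stops early once no remaining
    position improves the score.
    """
    length = len(sequences[0])
    chosen, best_score = [], 0.0
    for _ in range(max_snps):
        best_pos = None
        for pos in range(length):
            if pos in chosen:
                continue
            score = simpson_index(sequences, chosen + [pos])
            if score > best_score:
                best_pos, best_score = pos, score
        if best_pos is None:
            break
        chosen.append(best_pos)
    return chosen, best_score


# Toy alignment (illustrative only): four sequences, four positions.
alignment = ["AAGT", "AAGC", "ACGT", "GCGC"]
print(greedy_snp_set(alignment, max_snps=3))
```

On this toy alignment the search selects positions 1 and 3 (0-based), which together separate all four sequences, and then stops because no further position raises the score. Optimising for a user-defined group instead would mean swapping the scoring function for one that measures how well the chosen positions separate that group from all other sequences.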