Although genetic methods of species identification, especially DNA barcoding, are strongly debated, tests of these methods have been restricted to a few empirical cases for pragmatic reasons. Here we use simulation to test the performance of methods based on sequence comparison (BLAST and genetic distance) and tree topology over a wide range of evolutionary scenarios. Sequences were simulated on a range of gene trees spanning almost three orders of magnitude in tree depth and in coalescent depth; that is, deep or shallow trees with deep or shallow coalescences. When the query's conspecific sequences were included in the reference alignment, the rate of positive identification was related to the degree to which different species were genetically differentiated. The BLAST, distance, and liberal tree-based methods returned higher rates of correct identification than did the strict tree-based requirement that the query was within, but not sister to, a single-species clade. Under this more conservative approach, ambiguous outcomes occurred in inverse proportion to the number of reference sequences per species. When the query's conspecific sequences were not in the reference alignment, only the strict tree-based approach was relatively immune to making false-positive identifications. Thresholds affected the rates at which false-positive identifications were made when the query's species was unrepresented in the reference alignment but did not otherwise influence outcomes. A conservative approach using the strict tree-based method should be used initially in large-scale identification systems, with effort made to maximize sequence sampling within species. Once the genetic variation within a taxonomic group is well characterized and the taxonomy resolved, then the choice of method used should be dictated by considerations of computational efficiency. The requirement for extensive genetic sampling may render these techniques inappropriate in some circumstances.
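The distance-based approach with a threshold can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the function names `p_distance` and `identify`, the uncorrected p-distance measure, and the 0.03 threshold are all assumptions chosen for the example.

```python
from typing import List, Optional, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Compare only sites where both sequences carry an unambiguous base.
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def identify(query: str,
             references: List[Tuple[str, str]],
             threshold: float = 0.03) -> Optional[str]:
    """Return the species of the nearest reference sequence.

    references is a list of (species, aligned_sequence) pairs.
    Returns None when the nearest reference is farther than the
    (hypothetical) threshold, or when the nearest references tie
    across more than one species.
    """
    scored = sorted((p_distance(query, seq), species)
                    for species, seq in references)
    best = scored[0][0]
    if best > threshold:
        return None  # no reference close enough: decline to identify
    nearest_species = {sp for d, sp in scored if d == best}
    return nearest_species.pop() if len(nearest_species) == 1 else None
```

Declining to answer above the threshold is what limits false positives when the query's species is missing from the reference set; when conspecifics are present, the nearest match is usually well inside the threshold, so its exact value matters less.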
In recent years, research has shown that geographical variation in mitochondrial DNA of commensal rats provides a strong signal of human dispersal and migration. However, interpretation of genetic variation is complicated by the presence of multiple species of Rattus, especially in Island Southeast Asia, by the occurrence of some of these Rattus spp. as subfossils in archaeological and natural sites, and by the difficulty of osteological identification of these remains. Amplification of DNA from ancient sources usually yields only small fragments (∼200 bp). We assessed whether we could identify Rattus spp. reliably with DNA barcoding using cytochrome oxidase I (COI) sequences, or tree‐based methods using D‐loop, cytochrome b and COI sequences. Species forming well‐differentiated clades in a molecular phylogeny were accurately identified by both methods, even when we used short DNA fragments. Identification was less accurate for paraphyletic and polyphyletic species. We suggest that taxonomic revisions that recognize cryptic or polytypic species will lead to even greater accuracy of DNA‐based identification methods.
Species Delimitation is a plugin to the Geneious software to support the exploration of species boundaries in a gene tree. The user assigns taxa to putative species and the plugin computes statistics relating to the probability of the observed monophyly or exclusivity having occurred by chance in a coalescent process. It also assesses the within and between species genetic distances to infer the probability with which members of a putative species might be identified successfully with tree-based methods.
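The within- and between-species distance comparison can be illustrated with a short sketch. This is not the plugin's implementation; the function name `intra_inter_summary`, the input layout (one species label per sequence plus a square matrix of pairwise distances), and the reported fields are assumptions made for the example.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, List, Optional


def intra_inter_summary(labels: List[str],
                        dist: List[List[float]]) -> Dict[str, Dict[str, Optional[float]]]:
    """Summarize within- and between-species distances per putative species.

    labels[i] is the putative species of sequence i;
    dist[i][j] is the pairwise genetic distance between sequences i and j.
    """
    summary = {}
    for sp in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == sp]
        others = [i for i, lab in enumerate(labels) if lab != sp]
        intra = [dist[i][j] for i, j in combinations(members, 2)]
        inter = [dist[i][j] for i in members for j in others]
        max_intra = max(intra) if intra else None   # undefined for singletons
        min_inter = min(inter) if inter else None   # undefined with one species
        summary[sp] = {
            "mean_intra": mean(intra) if intra else None,
            "max_intra": max_intra,
            "min_inter": min_inter,
            # True when every conspecific pair is closer than any
            # pair involving another species.
            "separated": (max_intra is not None and min_inter is not None
                          and max_intra < min_inter),
        }
    return summary
```

A putative species whose largest within-species distance stays below its smallest between-species distance is the kind of group that tree-based identification would be expected to handle well; overlapping ranges suggest a higher risk of ambiguous or incorrect placement.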
Before the SARS outbreak only two human coronaviruses (HCoV) were known: HCoV-OC43 and HCoV-229E. With the discovery of SARS-CoV in 2003, a third family member was identified. Soon thereafter, we described the fourth human coronavirus (HCoV-NL63), a virus that has spread worldwide and is associated with croup in children. We report here the complete genome sequence of two HCoV-NL63 clinical isolates, designated Amsterdam 57 and Amsterdam 496. The genomes are 27,538 and 27,550 nucleotides long, respectively, and share the same genome organization. We identified two variable regions, one within the 1a and one within the S gene, whereas the 1b and N genes were most conserved. Phylogenetic analysis revealed that HCoV-NL63 genomes have a mosaic structure with multiple recombination sites. Additionally, employing three different algorithms, we assessed the evolutionary rate for the S gene of group Ib coronaviruses to be approximately 3 × 10⁻⁴ substitutions per site per year. Using this evolutionary rate we determined that HCoV-NL63 diverged in the 11th century from its closest relative HCoV-229E.
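How a substitution rate translates into a divergence date can be shown with a strict molecular clock. This is only a sketch of the general arithmetic; the abstract does not state which dating method was used, and both the function name `divergence_time_years` and the pairwise distance in the usage line are hypothetical.

```python
def divergence_time_years(pairwise_distance: float,
                          rate_per_site_per_year: float) -> float:
    """Time since two lineages split under a strict molecular clock.

    pairwise_distance is in substitutions per site between the two
    present-day sequences. Both lineages accumulate changes after the
    split, so the distance grows at twice the per-lineage rate.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    if pairwise_distance < 0:
        raise ValueError("distance must be non-negative")
    return pairwise_distance / (2.0 * rate_per_site_per_year)


# Hypothetical distance of 0.6 substitutions per site with the
# reported rate of 3e-4 gives a split on the order of 1000 years ago.
years = divergence_time_years(0.6, 3e-4)
```

The factor of two is the step most often missed: a rate measured along one lineage has to be doubled when it is applied to the distance between two tips.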