DNA barcoding involves sequencing a standard region of DNA as a tool for species identification. However, there has been no agreement on which region(s) should be used for barcoding land plants. To provide a community recommendation on a standard plant barcode, we have compared the performance of 7 leading candidate plastid DNA regions (atpF-atpH spacer, matK gene, rbcL gene, rpoB gene, rpoC1 gene, psbK-psbI spacer, and trnH-psbA spacer). Based on assessments of recoverability, sequence quality, and levels of species discrimination, we recommend the 2-locus combination of rbcL+matK as the plant barcode. This core 2-locus barcode will provide a universal framework for the routine use of DNA sequence data to identify specimens and contribute toward the discovery of overlooked species of land plants.

matK | rbcL | species identification

Large-scale standardized sequencing of the mitochondrial gene CO1 has made DNA barcoding an efficient species identification tool in many animal groups (1). In plants, however, low substitution rates of mitochondrial DNA have led to the search for alternative barcoding regions. From initial investigations of plastid regions (2-4), 7 leading candidates have emerged (5, 6). Four are portions of coding genes (matK, rbcL, rpoB, and rpoC1), and 3 are noncoding spacers (atpF-atpH, trnH-psbA, and psbK-psbI). Different research groups have proposed various combinations of these loci as their preferred plant barcodes, but no consensus has emerged (5-12). This lack of an agreed standard has impeded progress in plant barcoding.

Our aim here is to identify a standard DNA barcode for land plants. To achieve this goal, we have pooled data across laboratories including sequence data from 907 samples, representing 445 angiosperm, 38 gymnosperm, and 67 cryptogam species.
Using various subsets of these data, we evaluated the 7 candidate loci using criteria in the Consortium for the Barcode of Life's (CBOL) data standards and guidelines for locus selection (http://www.barcoding.si.edu/protocols.html). Universality: Which loci can be routinely sequenced across the land plants? Sequence quality and coverage: Which loci are most amenable to the production of bidirectional sequences with few or no ambiguous base calls? Discrimination: Which loci enable most species to be distinguished?
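
The discrimination criterion can be made concrete with a minimal sketch. This section does not state how discrimination was scored, so the rule below is an assumption chosen for illustration: a species counts as discriminated when its smallest uncorrected p-distance to any other species exceeds its largest within-species distance. The function names and the toy aligned sequences are hypothetical and are not data from this study.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def p_distance(seq_a, seq_b):
    """Uncorrected p-distance over aligned sites where both bases are unambiguous.

    Returns None when no site can be compared (e.g., all gaps or ambiguity codes).
    """
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in VALID_BASES and b in VALID_BASES:
            compared += 1
            differences += a != b
    return differences / compared if compared else None


def discriminated_species(samples):
    """Return the set of species separated from all others at one locus.

    samples: list of (species_name, aligned_sequence) pairs.
    Assumed rule (not taken from the text): min interspecific distance
    must be strictly greater than max intraspecific distance.
    """
    species = {name for name, _ in samples}
    max_intra = {name: 0.0 for name in species}
    min_inter = {name: float("inf") for name in species}

    for (name1, seq1), (name2, seq2) in combinations(samples, 2):
        d = p_distance(seq1, seq2)
        if d is None:
            continue  # no comparable sites for this pair
        if name1 == name2:
            max_intra[name1] = max(max_intra[name1], d)
        else:
            min_inter[name1] = min(min_inter[name1], d)
            min_inter[name2] = min(min_inter[name2], d)

    return {name for name in species if min_inter[name] > max_intra[name]}


# Hypothetical toy alignment: species C and D share an identical sequence,
# so neither can be told apart from the other at this locus.
toy_samples = [
    ("A", "ACGTACGT"),
    ("A", "ACGTACGT"),
    ("B", "ACGTACGA"),
    ("B", "ACGTACGA"),
    ("C", "TCGTACGT"),
    ("D", "TCGTACGT"),
]

resolved = discriminated_species(toy_samples)
fraction_resolved = len(resolved) / len({name for name, _ in toy_samples})
```

Under this assumed rule, a locus is compared with the others by the fraction of species it resolves. Species represented by a single sample have an intraspecific distance of zero by construction, which may overstate discrimination for sparsely sampled taxa.
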
Results

Universality. Direct universality assessments using a single primer pair for each locus in angiosperms resulted in 90%-98% PCR and sequencing success for 6 of the 7 regions. Success for the seventh region, psbK-psbI, was 77% (Fig. 1A). Greater problems were encountered in other land plant groups, with rpoB, matK, atpF-atpH, and psbK-psbI all showing <50% success in gymnosperms and/or cryptogams based on data compiled from several laboratories (Fig. 1A).

Sequence Quality. Evaluation of sequence quality and coverage from the candidate loci demonstrated that high quality bidirectional sequences were routinely obtained from rbcL, rpoC1, and rpoB (Fig. 1B, x axis). The remaining 4 loci required more manual editing and produced f...