We previously developed a web server, CPGAVAS, for annotation, visualization and GenBank submission of plastome sequences. Here, we upgrade the server into CPGAVAS2 to address the following challenges: (i) inaccurate annotation in the reference sequence, which likely causes the propagation of errors; (ii) difficulty in the annotation of small exons of genes petB, petD and rps16 and of the trans-splicing gene rps12; (iii) lack of annotation and visualization for other genome features, such as repeat elements; and (iv) lack of modules for diversity analysis of plastomes. In particular, CPGAVAS2 provides two reference datasets for plastome annotation. The first dataset contains 43 plastomes whose annotations have been validated or corrected by RNA-seq data. The second one contains 2544 plastomes curated with sequence alignment. Two new algorithms are also implemented to correctly annotate small exons and trans-splicing genes. Tandem and dispersed repeats are identified, and the results are displayed on a circular map together with the annotated genes. DNA-seq and RNA-seq data can be uploaded for identification of single-nucleotide polymorphism sites and RNA-editing sites. The results of two case studies show that CPGAVAS2 annotates better than several other servers. CPGAVAS2 will likely become an indispensable tool for plastome research and can be accessed from http://www.herbalgenomics.org/cpgavas2.
Ban-Lan-Gen, the root tissues derived from several morphologically indistinguishable plant species, has been used widely in traditional Chinese medicines for many years. The identification of reliable markers to distinguish the various source plant species is critical for the effective and safe use of products containing Ban-Lan-Gen. Here, we analyzed and characterized the complete chloroplast (cp) genome sequence of Strobilanthes cusia (Nees) Kuntze to identify high-resolution markers for the species determination of Southern Ban-Lan-Gen. Total DNA was extracted and subjected to next-generation sequencing. The cp genome was then assembled, and the gaps were filled using PCR amplification and Sanger sequencing. Genome annotation was conducted using the CpGAVAS web server. The genome was 144,133 bp in length, presenting a typical quadripartite structure of large (LSC; 91,666 bp) and small (SSC; 17,328 bp) single-copy regions separated by a pair of inverted repeats (IRs; 17,811 bp). The genome encodes 113 unique genes, including 79 protein-coding, 30 transfer RNA, and 4 ribosomal RNA genes. A total of 20 tandem, 2 forward, and 6 palindromic repeats were detected in the genome. A phylogenetic analysis based on 65 protein-coding genes showed that S. cusia was closely related to Andrographis paniculata and Ruellia breedlovei, which belong to the same family, Acanthaceae. One interesting feature is that the IR regions apparently undergo simultaneous contraction and expansion, resulting in the presence of single copies of rps19, rpl2, rpl23, and ycf2 in the LSC region and the duplication of the psbA and trnH genes in the IRs. This study provides the first complete cp genome in the genus Strobilanthes, containing critical information for the future classification of various Strobilanthes species. This study also provides the foundation for precisely determining the plant sources of Ban-Lan-Gen.
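To illustrate what a tandem repeat such as those counted above looks like computationally, the following is a minimal sketch of a perfect tandem repeat scan. It is not the method used by CpGAVAS or by the authors; the function name `find_tandem_repeats` and its parameters (`min_unit`, `max_unit`, `min_copies`) are hypothetical and chosen only for illustration, and real tools typically also tolerate mismatches and indels.

```python
def find_tandem_repeats(seq, min_unit=2, max_unit=6, min_copies=3):
    """Return (start, unit, copies) for perfect tandem repeats in seq.

    Illustrative only: exact matching, small unit sizes, greedy left-to-right scan.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    i = 0
    while i < n:
        best = None  # longest repeat starting at position i, if any
        for unit_len in range(min_unit, max_unit + 1):
            unit = seq[i:i + unit_len]
            if len(unit) < unit_len:
                break  # ran off the end of the sequence
            copies = 1
            # Count how many times the unit is repeated back to back.
            while seq[i + copies * unit_len:i + (copies + 1) * unit_len] == unit:
                copies += 1
            if copies >= min_copies:
                span = copies * unit_len
                if best is None or span > best[2] * len(best[1]):
                    best = (i, unit, copies)
        if best is not None:
            hits.append(best)
            i += len(best[1]) * best[2]  # skip past the reported repeat
        else:
            i += 1
    return hits


if __name__ == "__main__":
    # Toy sequence (not from the S. cusia genome): 'AT' repeated four times.
    print(find_tandem_repeats("GGATATATATCC"))
```

Each hit gives the 0-based start position, the repeated unit, and the copy number; applying such a scan to a full plastome would require larger unit sizes and approximate matching.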