Angelica dahurica is a widely grown plant species with multiple uses, especially in medicine. However, the frequent introduction of A. dahurica to new areas has made it difficult to distinguish between varieties. Simple sequence repeats (SSRs) detected through transcriptome analyses are very useful for constructing genetic maps and analyzing genetic diversity. They are also relevant for the molecular marker-assisted breeding of A. dahurica. We identified 33,724 genic SSR loci based on transcriptome sequencing data. A total of 114 primer pairs were designed for the SSR loci and tested for their specificity and diversity. Ten SSR loci in untranslated regions were ultimately selected. Subsequently, 56 A. dahurica ecotypes collected from different regions were analyzed. The SSR loci comprised 2–8 alleles, with a mean of 5.2 alleles per locus. The polymorphic information content value and Shannon’s information index were 0.2702–0.6274 (average of 0.4091) and 0.5618–1.3040 (average of 0.8475), respectively. The 10 novel SSRs identified in this study were largely in accordance with Hardy-Weinberg equilibrium and will be useful for analyzing A. dahurica genetic relationships. The results of this study confirm the potential value of transcriptome databases for the development of new SSR markers.
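The two diversity statistics reported above, polymorphic information content (PIC) and Shannon's information index, are both functions of the allele frequencies at a locus. The following is a minimal sketch of how they are commonly computed, assuming the widely used Botstein-style PIC formula and the natural-log form of Shannon's index; the abstract does not state which formula or software was used, and the function names and allele sizes below are hypothetical.

```python
import math
from collections import Counter


def allele_frequencies(alleles):
    """Return allele frequencies from a flat list of observed allele calls."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return [n / total for n in counts.values()]


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2 (assumed formula)."""
    homozygosity = sum(p * p for p in freqs)
    correction = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return 1 - homozygosity - correction


def shannon_index(freqs):
    """Shannon's information index I = -sum(p_i * ln p_i)."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


# Hypothetical allele sizes (bp) scored at one SSR locus; not data from the study.
calls = [152, 152, 156, 156]
freqs = allele_frequencies(calls)
print(pic(freqs))            # two equally frequent alleles give 0.375
print(shannon_index(freqs))  # equals ln(2) for two equally frequent alleles
```

For a locus with two equally frequent alleles, PIC is 0.375 and Shannon's index is ln 2; both rise as more alleles occur at more even frequencies.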
Single-cell RNA sequencing (scRNA-seq) has changed the research landscape by providing insights into heterogeneous, complex and rare cell populations. As more such data sets become available, robust models for accurate cell type annotation are needed. We therefore developed scAnno (scRNA-seq data annotation), an automated annotation tool for scRNA-seq data sets that works primarily at the single-cell cluster level, using a joint deconvolution strategy and logistic regression. To support scAnno, we constructed a reference profile for human (30 cell types and 50 human tissues) and a reference profile for mouse (26 cell types and 50 mouse tissues). scAnno obtains genes with high expression and specificity in a given cell type as cell type-specific genes (marker genes) by combining co-expression genes with seed genes as a core. Importantly, scAnno can accurately identify cell type-specific genes based on cell type reference expression profiles without any prior information. In the peripheral blood mononuclear cell data set, the marker genes identified by scAnno showed cell type-specific expression, and the majority matched exactly with those included in the CellMarker database. Besides validating the flexibility and interpretability of scAnno in identifying marker genes, we also showed that it outperforms other cell type annotation tools (SingleR, scPred, CHETAH and scmap-cluster) in internal validation of data sets (average annotation accuracy: 99.05%) and on cross-platform data sets (average annotation accuracy: 95.56%). Taken together, we established the first methodology that uses a deconvolution strategy for automated cell typing, and it may be broadly applicable in scRNA-seq analysis.
scAnno is available at https://github.com/liuhong-jia/scAnno.
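The idea of selecting marker genes that are both highly expressed and specific to one cell type in a reference profile can be illustrated with a simple score. The sketch below is a generic illustration only and is not scAnno's actual algorithm (which combines seed genes with co-expressed genes); the function names, thresholds and expression values are assumptions made up for the example.

```python
def specificity_scores(reference):
    """Score each (cell type, gene) pair by the cell type's share of that gene's
    total expression. `reference` maps cell type -> {gene: mean expression}."""
    genes = set()
    for profile in reference.values():
        genes.update(profile)
    scores = {}
    for gene in genes:
        total = sum(profile.get(gene, 0.0) for profile in reference.values())
        if total == 0:
            continue  # gene not expressed anywhere; no meaningful score
        for cell_type, profile in reference.items():
            scores[(cell_type, gene)] = profile.get(gene, 0.0) / total
    return scores


def candidate_markers(reference, cell_type, min_expr=1.0, min_spec=0.8):
    """Return genes that pass both an expression and a specificity threshold
    (both thresholds are arbitrary illustrative defaults)."""
    scores = specificity_scores(reference)
    profile = reference[cell_type]
    return sorted(
        gene
        for gene, expr in profile.items()
        if expr >= min_expr and scores.get((cell_type, gene), 0.0) >= min_spec
    )


# Toy reference profile with made-up expression values.
reference = {
    "T cell": {"CD3E": 9.0, "MS4A1": 0.5, "ACTB": 10.0},
    "B cell": {"CD3E": 0.5, "MS4A1": 9.5, "ACTB": 10.0},
}
print(candidate_markers(reference, "T cell"))  # ['CD3E']
print(candidate_markers(reference, "B cell"))  # ['MS4A1']
```

In this toy profile the housekeeping gene ACTB is highly expressed but evenly shared between the two cell types, so it fails the specificity threshold and is not reported as a marker.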