2020
DOI: 10.1093/nar/gkaa1105

RefSeq: expanding the Prokaryotic Genome Annotation Pipeline reach with protein family model curation

Abstract: The Reference Sequence (RefSeq) project at the National Center for Biotechnology Information (NCBI) contains nearly 200 000 bacterial and archaeal genomes and 150 million proteins with up-to-date annotation. Changes in the Prokaryotic Genome Annotation Pipeline (PGAP) since 2018 have resulted in a substantial reduction in spurious annotation. The hierarchical collection of protein family models (PFMs) used by PGAP as evidence for structural and functional annotation was expanded to over 35 000 protein profile …

Cited by 704 publications
(554 citation statements)
References 26 publications
“…For the public projects, EDGAR has to rely on findable and accessible annotations. To ensure maximal comparability, EDGAR3.0 uses the RefSeq annotations, as all genomes in RefSeq are re-annotated using the latest version of the PGAP pipeline (31). For private projects, EDGAR has to rely on the genomes provided by the users.…”
Section: Discussion (mentioning)
confidence: 99%
“…The majority was retrieved from NCBI RefSeq [16]. However, 80 datasets were only identified through a literature survey, as they did not have an associated genome assembly deposited. In total, 28 studies were identified spanning 16 countries (Figure 1A and B, Table S1).…”
Section: Results (mentioning)
confidence: 99%
“…No plasmid was detected. The sequence was annotated using the NCBI Prokaryotic Genome Annotation Pipeline (PGAP), version March 2021 (Li et al., 2021). The genome was found to contain 5 258 ORFs, among which there are 5 158 CDSs (CoDing Sequences), 74 tRNAs, 22 rRNAs, 3 ncRNAs and 1 tmRNA.…”
Section: Genome Announcement (mentioning)
confidence: 99%