2020
DOI: 10.1093/nar/gkaa1100

UniProt: the universal protein knowledgebase in 2021

Abstract: The aim of the UniProt Knowledgebase is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this article, we describe significant updates that we have made over the last two years to the resource. The number of sequences in UniProtKB has risen to approximately 190 million, despite continued work to reduce sequence redundancy at the proteome level. We have adopted new methods of assessing proteome completeness and quality. …

Cited by 5,492 publications (3,440 citation statements)
References: 50 publications
“…multiple sequence alignments (MSAs) and position-specific scoring matrices (PSSMs) computed by a combination of pairwise BLAST (24), PSI-BLAST (25), and MMseqs2 (11, 12) on query vs. PDB (26) and query vs. UniProt (1). For each residue in the query, the following per-residue predictions are assembled: secondary structure (RePROF/PROFsec (5, 27) and ProtBertSec (14)); solvent accessibility (RePROF/PROFacc); transmembrane helices and strands (TMSEG (28) and PROFtmb (29)); protein disorder (Meta-Disorder (30)); backbone flexibility (relative B-values; PROFbval (31)); disulfide bridges (DISULFIND (32)); sequence conservation (ConSurf/ConSeq (33–36)); protein-protein, protein-DNA, and protein-RNA binding residues (ProNA2020 (3)); PROSITE motifs (37); effects of sequence variation (single amino acid variants, SAVs; SNAP2 (38)).…”
Section: Methods (citation type: mentioning)
confidence: 99%
“…Sequence similarity and automatic assignment via UniRule suggest NCAP is RNA binding (binding with the viral genome), binding with the membrane protein M (UniProt identifier P0DTC5/VME1_SARS2), and is fundamental for virion assembly. goPredSim (19) transferred GO terms from other proteins for MFO (RNA-binding; GO:0003723; ECO:0000213) and CCO (compartments in the host cell and viral nucleocapsid; GO:0019013; GO:0044172; GO:0044177; GO:0044220; GO:0030430; ECO:0000255) matching annotations found in UniProt (1). While it missed the experimentally verified MFO term identical protein binding (GO:0042802), goPredSim predicted protein folding (GO:0006457) and protein ubiquitination (GO:0016567) suggesting the nucleoprotein to be involved in biological processes requiring protein binding.…”
Section: Use Case (citation type: mentioning)
confidence: 99%
“…Authors chose 3-mers amino acids for proteins and 5-mers for nucleic acids as words. Three datasets (Uniprot400k [81], RRM3k [82], and Homeo8k [83]) were used to pre-train the Fast-Bioseq protein embedding models, whereas RNA embedding models were trained directly from the RRM162 dataset [82]. In contrast, 8-mer frequency features were used for the DNA sequences in the Homeo215 dataset [84].…”
Section: Applications For Molecular Interactions (citation type: mentioning)
confidence: 99%
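
The k-mer "words" mentioned in the statement above can be illustrated with a minimal sketch. It assumes overlapping k-mers taken with a sliding window of stride 1; the quoted text does not say whether the authors used overlapping or non-overlapping windows, and the function name `kmer_words` and the example sequences are hypothetical.

```python
def kmer_words(sequence: str, k: int, stride: int = 1) -> list[str]:
    """Split a biological sequence into k-mer 'words'.

    Assumption: a sliding window with the given stride (1 = fully
    overlapping). This is an illustration, not the cited authors' code.
    """
    if k <= 0 or stride <= 0:
        raise ValueError("k and stride must be positive")
    # Stop at the last position where a full k-mer still fits.
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, stride)]


# Hypothetical toy sequences: 3-mers for a protein, 5-mers for a nucleic acid.
protein_words = kmer_words("MSDNGPQ", k=3)
dna_words = kmer_words("ATGTCTGA", k=5)
```

Setting `stride=k` would instead give non-overlapping words, which is the other common convention for this kind of tokenization.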
“…After obtaining the core compound targets, the gene symbol was converted into Ensembl ID through the Uniprot database (http://www.uniprot.org/), 22 imported into the website OmishareTools (http://www.omicshare.com/tools/index.php/) for GO enrichment function and KEGG enrichment analysis, and finally screened by the P value. GO enrichment mainly analyzed the biological process, cellular composition, and molecular function of the target, while KEGG enrichment could study the potential biological pathways and functions involved in the target.…”
Section: Enrichment Analysis Of GO and KEGG (citation type: mentioning)
confidence: 99%