The aim of the UniProt Knowledgebase is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this article, we describe significant updates made to the resource over the last two years. The number of sequences in UniProtKB has risen to approximately 190 million, despite continued work to reduce sequence redundancy at the proteome level. We have adopted new methods of assessing proteome completeness and quality. We continue to extract detailed annotations from the literature to add to reviewed entries and supplement these in unreviewed entries with annotations provided by automated systems such as the newly implemented Association-Rule-Based Annotator (ARBA). We have developed a credit-based publication submission interface to allow the community to contribute publications and annotations to UniProt entries. We describe how UniProtKB responded to the COVID-19 pandemic through expert curation of relevant entries that were rapidly made available to the research community through a dedicated portal. UniProt resources are available under a CC-BY (4.0) license via the web at https://www.uniprot.org/.
The Gene Ontology Consortium (GOC) provides the most comprehensive resource currently available for computable knowledge regarding the functions of genes and gene products. Here, we report the advances of the consortium over the past two years. The new GO-CAM annotation framework was notably improved, and we formalized the model with a computational schema to check and validate the rapidly growing repository of 2,838 GO-CAMs. In addition, we describe the impacts of several collaborations to refine GO and report a 10% increase in the number of GO annotations, a 25% increase in annotated gene products, and over 9,400 new scientific articles annotated. As the project matures, we continue our efforts to review older annotations in light of newer findings and to maintain consistency with other ontologies. As a result, 20,000 annotations derived from experimental data were reviewed, corresponding to 2.5% of experimental GO annotations. The website (http://geneontology.org) was redesigned for quick access to documentation, downloads and tools. To maintain an accurate resource and support traceability and reproducibility, we have made available a historical archive covering the past 15 years of GO data with a consistent format and file structure for both the ontology and annotations.
The aim of the UniProt Knowledgebase is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this publication, we describe enhancements made to our data processing pipeline and to our website to adapt to an ever-increasing information content. The number of sequences in UniProtKB has risen to over 227 million and we are working towards including a reference proteome for each taxonomic group. We continue to extract detailed annotations from the literature to update or create reviewed entries, while unreviewed entries are supplemented with annotations provided by automated systems using a variety of machine-learning techniques. In addition, the scientific community continues to contribute publications and annotations to UniProt entries of interest. Finally, we describe our new website (https://www.uniprot.org/), designed to enhance our users' experience and make our data easily accessible to the research community. This interface includes access to AlphaFold structures for more than 85% of all entries as well as improved visualisations for subcellular localisation of proteins.
The prevalence of hepatitis B virus (HBV) surface antigen (HBsAg) chronic carriage in west Africa is the highest in the world, but its molecular epidemiology remains relatively poorly investigated. Plasma samples from random asymptomatic carriers of HBsAg in Conakry, Guinea, were studied and the complete genome sequences of 81 strains were obtained. Three additional samples from Kumasi, Ghana, were also included in the analysis. Phylogenetic analyses confirmed the dominance of genotype E (95.1 %), including 8.6 % of strains (viral load, 5 × 10^3 to 2.6 × 10^8 IU ml^-1) comprising dominant variants with large deletions in the core region and minority wild-type variants. The presence of two different patterns of deletions in two and four donors suggested targeted genome fragility between nt 1979 and 2314. The remaining sequences included one subgenotype A3 (1 %) and six A/E recombinant forms (4-7 %). A/E strains with identical points of recombination in three donors strongly suggested that these recombinant HBV strains are circulating and being transmitted in the population. Recombination points were concentrated in the core gene. The detection of similar A/E recombinant strains in Ghana suggested a geographical extension of recombinant HBV to the region. The quasispecies of one additional Ghanaian strain sequenced in the pre-surface/surface region resolved into dominant clones of either the A or E genotype, but also three different patterns of A/E recombinant variants. The observation that both deletions of genotype E strains and A/E recombination points are mostly located in the core gene at specific positions indicates a region of the genome where genetic rearrangements preferentially take place.
Iran is a low to medium endemic country for hepatitis B virus (HBV), depending on the region, where genotype D is dominant. Samples from 170 asymptomatic HBsAg-positive blood donors were quantified; the median viral load was 6.7 × 10^2 IU/ml, with 10.6% of samples unquantifiable. Fifty complete genome sequences of these strains were characterized. Phylogenetic analysis identified 98% of strains as subgenotype D1 and 2% as D2. Deduced serotypes were ayw2 (94%), ayw1 (4%), and adw (2%). The nucleotide diversity of the complete genome subgenotype D1 Iranian strains was limited (2.8%), and comparison with D1 strains from Egypt and Tunisia revealed little variation between strains from these three countries (range 1.9-2.8%). The molecular analysis of the individual genes revealed that the G1896A mutation was present in 86.2% of the strains and in 26 strains (29.9%) this mutation was accompanied by the G1899A mutation. The double mutations A1762T/G1764A and G1764T/C1766G were found in 20.7% and 24.1% of the strains, respectively. The pre-C initiation codon was mutated in five strains (5.8%). One strain had a 2-amino acid (aa) insertion at position s111 and another had an sP120Q substitution, suggesting a vaccine escape mutant.
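The two kinds of summary figures above (a nucleotide diversity percentage and the proportion of strains carrying a point mutation such as G1896A) can be illustrated with a minimal sketch. The abstract does not state how diversity was computed, so this sketch assumes an uncorrected p-distance averaged over all sequence pairs (published analyses often use a substitution model instead), and it assumes the sequences are already aligned so that index `pos - 1` corresponds to position `pos` in the standard HBV genome numbering. The function names and toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped
    (assumption: pairwise deletion of gapped sites).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else 0.0


def mean_pairwise_diversity(seqs: list[str]) -> float:
    """Mean uncorrected p-distance over all unordered pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def carries(seq: str, pos: int, alt: str) -> bool:
    """True if 1-based position `pos` (reference numbering) holds base `alt`."""
    return seq[pos - 1].upper() == alt.upper()


def prevalence(seqs: list[str], pos: int, alt: str) -> float:
    """Fraction of all sequences carrying `alt` at position `pos`."""
    return sum(carries(s, pos, alt) for s in seqs) / len(seqs)


# Toy aligned sequences (hypothetical, 6 nt long for readability).
toy = ["ACGTAC", "ACGTAA", "ACTTAA", "ACGTAC"]
print(f"diversity: {mean_pairwise_diversity(toy):.3f}")
print(f"A at position 6: {prevalence(toy, 6, 'A'):.2f}")
```

On real data, a co-occurring pair such as G1896A with G1899A would be counted by checking `carries(s, 1896, "A") and carries(s, 1899, "A")` for each strain; whether the reported percentage uses all strains or only mutated strains as the denominator should be taken from the original article.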