Database resources of the National Center for Biotechnology Information

Agarwala, Richa; Barrett, Tanya; Beck, Jeffrey; Benson, D. A.; Bollin, Colleen J; Bolton, Evan; Bourexis, Devon; Brister, J. Rodney; Bryant, Stephen H.; Canese, Kathi; Cavanaugh, Mark; Charowhas, Chad; Clark, Karen; Dondoshansky, Ilya; Feolo, Michael; Fitzpatrick, Lawrence; Funk, Kathryn; Geer, Lewis Y.; Gorelenkov, Viatcheslav; Graeff, Alan S.; Hlavina, Wratko; Holmes, Brad; Johnson, Mark R.; Kattman, B; Khotomlianski, Viatcheslav; Kimchi, Avi; Kimelman, Michael; Kimura, Masato; Kitts, Paul; Klimke, William; Kotliarov, Alex; Krasnov, Sergey; Kuznetsov, Anatoliy; Landrum, Melissa; Landsman, David; Lathrop, Stacy; Lee, Jennifer M; Leubsdorf, Carl; Lu, Zhiyong; Madden, Thomas; Marchler‐Bauer, Aron; Malheiro, Adriana; Meric, Peter; Karsch‐Mizrachi, Ilene; Mnev, Anatoly; Murphy, Terence; Orris, Rebecca; Ostell, James; O’Sullivan, Christopher D.; Palanigobu, Vasuki; Panchenko, Anna R.; Phan, Lon; Pierov, Borys; Pruitt, Kim D.; Rodarmer, Kurt; Sayers, Eric W; Schneider, Valérie; Schoch, Conrad L.; Schuler, Gregory D.; Sherry, Stephen T.; Siyan, Karanjit S.; Soboleva, Alexandra; Soussov, Vladimir; Starchenko, Grigory; Tatusova, Tatiana; Thibaud‐Nissen, Françoise; Todorov, Kamen O.; Trawick, Barton W.; Vakatov, Denis; Ward, Minghong; Yaschenko, Eugene; Zasypkin, Aleksandr; Zbicz, Kerry

doi:10.1093/nar/gkx1095

Cited by 1,280 publications

(794 citation statements)

References 13 publications

Supporting

Mentioning

784

Contrasting

Unclassified


“…The peptidase family with most homologues from bacteria is M20 (40,159 sequences). This is also the family with homologues from the most bacterial species (10,173) and the greatest percentage of bacterial species with peptidase homologues (51.8%). Other families with more than 10,000 homologues are shown in Table 4.…”

Section: Peptidases Families From Bacteria

mentioning

confidence: 99%

“…A family of peptidases was assembled initially using the protein sequence of a well-characterized peptidase, for example bovine chymotrypsin, known as the "type example". To find homologues of a type example, sequence searches were conducted against either the UniProt knowledgebase [9] or the non-redundant protein sequence library at NCBI [10] using BlastP [11]. Within a family, other well characterized proteolytic enzymes with different substrate preferences were identified manually and each of these was designated a "holotype" for a particular set of substrate preferences.…”

Section: Detection and Classification Of Homologues

mentioning

confidence: 99%


Origins of peptidases

Rawlings

Bateman

2019

Biochimie


The distribution of all peptidase homologues across all phyla of organisms was analysed to determine within which kingdom each of the 271 families originated. No family was found to be ubiquitous and even peptidases thought to be essential for life, such as signal peptidase and methionyl aminopeptidase, are missing from some clades. There are 33 peptidase families common to archaea, bacteria and eukaryotes, and these are assumed to have originated in the last universal common ancestor (LUCA). These include peptidases with different catalytic types, exo- and endopeptidases, peptidases with different tertiary structures and peptidases from different families but with similar structures. This implies that the different catalytic types and structures pre-date LUCA. Other families have had their origins in the ancestors of viruses, archaea, bacteria, fungi, plants and animals, and a number of families have had their origins in the ancestors of particular phyla. The evolution of peptidases is compared to recent hypotheses about the evolution of organisms.


“…PubMed Central (PMC) is an online collection comprising over 3 million biomedical and biological articles gathered from thousands of journals [2]. PMC is maintained and curated by the National Library of Medicine (NLM) at the US National Institute of Health [3].…”

Section: Introduction

mentioning

confidence: 99%

Phenotype Instance Verification and Evaluation Tool (PIVET): A Scaled Phenotype Evidence Generation Framework Using Web-Based Medical Literature

Henderson

et al. 2018

J Med Internet Res


Background: Researchers are developing methods to automatically extract clinically relevant and useful patient characteristics from raw healthcare datasets. These characteristics, often capturing essential properties of patients with common medical conditions, are called computational phenotypes. Being generated by automated or semiautomated, data-driven methods, such potential phenotypes need to be validated as clinically meaningful (or not) before they are acceptable for use in decision making.

Objective: The objective of this study was to present Phenotype Instance Verification and Evaluation Tool (PIVET), a framework that uses co-occurrence analysis on an online corpus of publicly available medical journal articles to build clinical relevance evidence sets for user-supplied phenotypes. PIVET adopts a conceptual framework similar to the pioneering prototype tool PheKnow-Cloud that was developed for the phenotype validation task. PIVET completely refactors each part of the PheKnow-Cloud pipeline to deliver vast improvements in speed without sacrificing the quality of the insights PheKnow-Cloud achieved.

Methods: PIVET leverages indexing in NoSQL databases to efficiently generate evidence sets. Specifically, PIVET uses a succinct representation of the phenotypes that corresponds to the index on the corpus database and an optimized co-occurrence algorithm inspired by the Aho-Corasick algorithm. We compare PIVET’s phenotype representation with PheKnow-Cloud’s by using PheKnow-Cloud’s experimental setup. In PIVET’s framework, we also introduce a statistical model trained on domain expert–verified phenotypes to automatically classify phenotypes as clinically relevant or not. Additionally, we show how the classification model can be used to examine user-supplied phenotypes in an online, rather than batch, manner.

Results: PIVET maintains the discriminative power of PheKnow-Cloud in terms of identifying clinically relevant phenotypes for the same corpus with which PheKnow-Cloud was originally developed, but PIVET’s analysis is an order of magnitude faster than that of PheKnow-Cloud. Not only is PIVET much faster, it can be scaled to a larger corpus and still retain speed. We evaluated multiple classification models on top of the PIVET framework and found ridge regression to perform best, realizing an average F1 score of 0.91 when predicting clinically relevant phenotypes.

Conclusions: Our study shows that PIVET improves on the most notable existing computational tool for phenotype validation in terms of speed and automation and is comparable in terms of accuracy.


“…If you do not have privacy concerns but want to run more searches, you may opt to use the Web services of the European Bioinformatics Institute (Lopez, Cowley, Li, & McWilliam, 2014). Systems administration, database maintenance, and pipeline management are much less efficient on standalone workstations than on clusters.…”

mentioning

confidence: 99%

“…We recommend a standalone installation only if: (1) you are not familiar with the LINUX operation system, which runs almost all large-scale computing facilities; (2) have no access to such facilities; (3) the wait time for your compute jobs is prohibitively long on the shared resources; or (4) you have no more than hundreds of searches per day. If you do not have privacy concerns but want to run more searches, you may opt to use the Web services of the European Bioinformatics Institute (Lopez, Cowley, Li, & McWilliam, 2014). Such users typically integrate BLAST results into their own specific pipelines.…”

mentioning

confidence: 99%

Installing, Maintaining, and Using a Local Copy of BLAST for Compute Cluster or Workstation Use

Ladunga

2018

Current Protocols in Bioinformatics


The Basic Local Alignment Search Tool (BLAST) is the first resource to computationally characterize a novel amino acid or nucleic acid sequence. BLAST plays important roles in genomics, transcriptomics, and protein science. For numerous academic and commercial researchers, neither BLAST Web servers nor cloud resources satisfy the requirements of high-throughput comparative genomic pipelines or company policies. For such users, this unit describes how to install BLAST locally, either on a standalone workstation, or preferably on a compute cluster. We provide practical guidance for the planning and the installation under the LINUX, Windows, and Mac OS X operating systems. We propose strategies for downloading existing and generating new sequence databases in BLAST format. © 2018 by John Wiley & Sons, Inc.

