Extracytoplasmic function σ factors (ECFs) represent one of the major bacterial signal transduction mechanisms in terms of abundance, diversity and importance, particularly in mediating stress responses. Here, we performed a comprehensive phylogenetic analysis of this protein family by scrutinizing all proteins in the NCBI database. As a result, we identified an average of ∼10 ECFs per bacterial genome and 157 phylogenetic ECF groups that feature a conserved genetic neighborhood and a similar regulation mechanism. Our analysis expands previous classification efforts ∼50-fold, enriches many original ECF groups with previously unclassified proteins and identifies 22 entirely new ECF groups. The ECF groups are hierarchically related to each other and are further composed of subgroups with closely related sequences. This two-tiered classification allows for the accurate prediction of common promoter motifs and the inference of putative regulatory mechanisms across subgroups composing an ECF group. This comprehensive, high-resolution description of the phylogenetic distribution of the ECF family, together with the massive expansion of classified ECF sequences and an openly accessible data repository called ‘ECF Hub’ (https://www.computational.bio.uni-giessen.de/ecfhub), will serve as a powerful hypothesis-generator to guide future research in the field.
Owing greatly to the advancement of next-generation sequencing (NGS), the amount of NGS data is increasing rapidly. Although there are many NGS applications, one of the most commonly used techniques, RNA sequencing (RNA-seq), is rapidly replacing microarray-based techniques in laboratories around the world. As more and more of such techniques are standardized, allowing technicians to perform these experiments with minimal hands-on time and reduced experimental/operator-dependent biases, the bottleneck of such techniques becomes clearly visible: data analysis. Further complicating the matter, increasing evidence suggests that most of the genome is transcribed into RNA; however, the majority of these RNAs are not translated into proteins. These RNAs that do not become proteins are called noncoding RNAs (ncRNAs). Although some time has passed since the discovery of ncRNAs, their annotations remain poor, making analysis of RNA-seq data challenging. Here, we examine the current limitations of RNA-seq analysis using case studies focused on the detection of novel transcripts and examination of their characteristics. Finally, we validate the presence of novel transcripts using biological experiments, showing that novel transcripts can be accurately identified when a series of filters is applied. In conclusion, novel transcripts that are identified from RNA-seq must be examined carefully before proceeding to biological experiments.
Extracytoplasmic function σ factors (ECFs) represent one of the major bacterial signal transduction mechanisms in terms of abundance, diversity and importance, particularly in mediating stress responses. Here, we performed a comprehensive phylogenetic analysis of this protein family by scrutinizing all proteins in the NCBI database. As a result, we identified ~10 ECFs per bacterial genome on average and classified them into 157 phylogenetic ECF groups that feature a conserved genetic neighborhood and a similar regulation mechanism. Our analysis expands the number of unique ECF sequences ~50-fold relative to previous classification efforts, enriches many original ECF groups with previously unclassified proteins and identifies 22 entirely new ECF groups. The ECF groups are hierarchically related to each other and are further composed of subgroups with closely related sequences. This two-tiered classification allows for the accurate prediction of common promoter motifs and the inference of putative regulatory mechanisms across subgroups composing an ECF group. This comprehensive, high-resolution description of the phylogenetic distribution of the ECF family, together with the massive expansion of classified ECF sequences, enables the application of in silico tools for
Increasing evidence indicates that the presence of long noncoding RNAs (lncRNAs) is specific to various cell types. Although lncRNAs are speculated to be more numerous than protein-coding genes, the annotations of lncRNAs remain primitive due to the lack of well-structured schemes for their identification and description. Here, we introduce a new knowledge database “ANGIOGENES” (http://angiogenes.uni-frankfurt.de) to allow for in silico screening of protein-coding genes and lncRNAs expressed in various types of endothelial cells, which are present in all tissues. Using the latest annotations of protein-coding genes and lncRNAs, publicly available RNA-seq data were analyzed to identify transcripts that are expressed in endothelial cells of human, mouse and zebrafish. The analyzed data were incorporated into ANGIOGENES to provide a one-stop-shop for transcriptomics data to facilitate further biological validation. ANGIOGENES is an intuitive and easy-to-use database to allow in silico screening of expressed, enriched and/or specific endothelial transcripts under various conditions. We anticipate that ANGIOGENES will serve as a starting point for functional studies to elucidate the roles of protein-coding genes and lncRNAs in angiogenesis.