Publicly available genomes are crucial for phylogenetic and metagenomic studies, in which contaminating sequences can be the cause of major problems. This issue is expected to be especially important for Cyanobacteria because axenic strains are notoriously difficult to obtain and keep in culture. Yet, despite their great scientific interest, no data are currently available concerning the quality of publicly available cyanobacterial genomes. As reliably detecting contaminants is a complex task, we designed a pipeline combining six methods in a consensus strategy to assess the contamination level of 440 genome assemblies of Cyanobacteria. Two methods are based on published reference databases of ribosomal genes (SSU rRNA 16S and ribosomal proteins), one is indirectly based on a reference database of marker genes (CheckM), and three are based on complete genome analysis. Among those genome-wide methods, Kraken and DIAMOND blastx share the same reference database that we derived from Ensembl Bacteria, whereas CONCOCT does not require any reference database, instead relying on differences in DNA tetramer frequencies. Given that all the six methods appear to have their own strengths and limitations, we used the consensus of their rankings to infer that >5% of cyanobacterial genome assemblies are highly contaminated by foreign DNA (i.e., contaminants were detected by 5 or 6 methods). Our results will help researchers to check the quality of publicly available genomic data before use in their own analyses. Moreover, we argue that journals should make mandatory the submission of raw read data along with genome assemblies in order to facilitate the detection of contaminants in sequence databases.
Chitin is an important structural component of numerous fungal pathogens and parasitic nematodes. The human macrophage chitotriosidase (HCHT) is a chitinase that hydrolyses glycosidic bonds between the N-acetyl-D-glucosamine units of this biopolymer. HCHT belongs to the Glycoside Hydrolase (GH) superfamily and contains a well-characterized catalytic domain appended to a chitin-binding domain (ChBDCHIT1). Although its precise biological function remains unclear, HCHT has been described to be involved in innate immunity. In this study, the molecular basis for interaction with insoluble chitin as well as with soluble chito-oligosaccharides has been determined. The results suggest a new mechanism as a common binding mode for many Carbohydrate Binding Modules (CBMs). Furthermore, using a phylogenetic approach, we have analysed the modularity of HCHT and investigated the evolutionary paths of its catalytic and chitin binding domains. The phylogenetic analyses indicate that the ChBDCHIT1 domain dictates the biological function of HCHT and not its appended catalytic domain. This observation may also be a general feature of GHs. Altogether, our data have led us to postulate and discuss that HCHT acts as an immune catalyser.
The microbiome of honeybees ( Apis spp.) and bumblebees ( Bombus spp.) is highly conserved and represented by few phylotypes. This simplicity in taxon composition makes the bee’s microbiome an emergent model organism for the study of gut microbial communities.
TQMD is a tool which downloads, stores and produces lists of dereplicated prokaryotic genomes. It has been developed to counter the ever-growing number of prokaryotic genomes and their uneven taxonomic distribution. It is based on word-based alignment-free methods (k-mers), an iterative single-linkage approach and a divide-and-conquer strategy to remain both efficient and scalable. We studied the performance of TQMD by verifying the influence of its parameters and heuristics on the clustering outcome. We further compared TQMD to two other dereplication tools (dRep and Assembly-Dereplicator). Our results showed that TQMD is optimized to dereplicate at high taxonomic levels (phylum/class), whereas the other dereplication tools are optimized for lower taxonomic levels (species/strain), making TQMD complementary to the existing dereplicating tools. TQMD is available at <https://bitbucket.org/phylogeno/tqmd>.
TQMD is a tool for high-performance computing clusters which downloads, stores and produces lists of dereplicated prokaryotic genomes. It has been developed to counter the ever-growing number of prokaryotic genomes and their uneven taxonomic distribution. It is based on word-based alignment-free methods (k-mers), an iterative single-linkage approach and a divide-and-conquer strategy to remain both efficient and scalable. We studied the performance of TQMD by verifying the influence of its parameters and heuristics on the clustering outcome. We further compared TQMD to two other dereplication tools (dRep and Assembly-Dereplicator). Our results showed that TQMD is primarily optimized to dereplicate at higher taxonomic levels (phylum/class), as opposed to the other dereplication tools, but also works at lower taxonomic levels (species/strain) like the other dereplication tools. TQMD is available from source and as a Singularity container at [https://bitbucket.org/phylogeno/tqmd ].
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.