Background: The number of applications of deep learning algorithms in bioinformatics is increasing, as they usually achieve superior performance over classical approaches, especially when larger training datasets are available. In deep learning applications, discrete data, e.g. words or n-grams in language, or amino acids or nucleotides in bioinformatics, are generally represented as continuous vectors through an embedding matrix. Recently, learning this embedding matrix directly from the data as part of the iterative optimization of the model for the target prediction (a process called ‘end-to-end learning’) has led to state-of-the-art results in many fields. Although the use of embeddings is well described in the bioinformatics literature, the potential of end-to-end learning for single amino acids, as compared to more classical manually curated encoding strategies, has not been systematically addressed. To this end, we compared classical encoding matrices, namely one-hot, VHSE8 and BLOSUM62, to end-to-end learning of amino acid embeddings for two different prediction tasks using three widely used architectures, namely recurrent neural networks (RNN), convolutional neural networks (CNN), and the hybrid CNN-RNN. Results: Using different deep learning architectures, we show that end-to-end learning is on par with classical encodings for embeddings of the same dimension, even when limited training data is available, and might allow for a reduction in the embedding dimension without performance loss, which is critical when deploying the models to devices with limited computational capacity. We found that the embedding dimension is a major factor in controlling the model performance. Surprisingly, we observed that deep learning models are capable of learning from random vectors of appropriate dimension. Conclusion: Our study shows that end-to-end learning is a flexible and powerful method for amino acid encoding. Further, due to the flexibility of deep learning systems, amino acid encoding schemes should be benchmarked against random vectors of the same dimension to disentangle the information content provided by an encoding scheme from the distinguishability effect it provides.
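The role of an embedding matrix in the abstract above can be illustrated with a minimal Python sketch. It encodes a peptide by looking up one row per residue, once with a one-hot matrix and once with a random matrix of a chosen dimension, which is the kind of baseline the abstract recommends benchmarking against. The helper names (`one_hot_matrix`, `random_matrix`, `encode`), the Gaussian sampling, and the example peptide are assumptions made for illustration and are not taken from the study.

```python
import random

# The 20 standard amino acids, each mapped to a row index of the embedding matrix.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_matrix():
    """Return a 20 x 20 identity matrix: one distinct axis per amino acid."""
    n = len(AMINO_ACIDS)
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def random_matrix(dim, seed=0):
    """Return a 20 x dim matrix of Gaussian random values (hypothetical baseline).

    The rows carry no biochemical information; they only make the residues
    distinguishable from one another.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in AMINO_ACIDS]


def encode(seq, matrix):
    """Encode a sequence as a list of row vectors looked up in the matrix."""
    return [matrix[INDEX[aa]] for aa in seq]


peptide = "ACDW"  # arbitrary example sequence
one_hot = encode(peptide, one_hot_matrix())  # 4 vectors of length 20
random8 = encode(peptide, random_matrix(8))  # 4 vectors of length 8
```

In an end-to-end setting, a matrix like the one returned by `random_matrix` would typically serve as the starting point and be updated by the optimizer together with the other model weights; here it stays fixed, so the sketch only shows the lookup step.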
Short-chain alkanes play a substantial role in carbon and sulfur cycling in hydrocarbon-rich environments globally, yet, in contrast to methane (C1), few studies have examined the metabolism of ethane (C2), propane (C3), and butane (C4) in anoxic sediments. In hydrothermal vent systems, short-chain alkanes are formed over relatively short geological time scales via thermogenic processes and often exist at high concentrations. The sediment-covered hydrothermal vent systems at Middle Valley (MV, Juan de Fuca Ridge) are an ideal site for investigating the anaerobic oxidation of C1–C4 alkanes, given the elevated temperatures and dissolved hydrocarbon species characteristic of these metalliferous sediments. We examined whether MV microbial communities oxidized C1–C4 alkanes under mesophilic to thermophilic sulfate-reducing conditions. Here we present data from discrete temperature (25, 55, and 75°C) anaerobic batch reactor incubations of MV sediments supplemented with individual alkanes. Co-registered alkane consumption and sulfate reduction (SR) measurements provide clear evidence for C1–C4 alkane oxidation linked to SR over time and across temperatures. In these anaerobic batch reactor sediments, 16S ribosomal RNA pyrosequencing revealed that Deltaproteobacteria, particularly a novel sulfate-reducing lineage, were the likely phylotypes mediating the oxidation of C2–C4 alkanes. Maximum C1–C4 alkane oxidation rates occurred at 55°C, which reflects the mid-core sediment temperature profile and corroborates previous studies of rate maxima for the anaerobic oxidation of methane (AOM). Of the alkanes investigated, C3 was oxidized at the highest rate over time, followed by C4, C2, and C1. The implications of these results are discussed with respect to the potential competition between the anaerobic oxidation of C2–C4 alkanes and AOM for available oxidants, and the influence on the fate of C1 derived from these hydrothermal systems.
Summary: The extent to which differences in microbial community structure result in variations in organic matter (OM) degradation is not well understood. Here, we tested the hypothesis that distinct marine microbial communities from North Atlantic surface and bottom waters would exhibit varying compositional succession and functional shifts in response to the same pool of complex high molecular weight organic matter (HMW‐OM). We also hypothesized that microbial communities would produce a broader spectrum of enzymes upon exposure to HMW‐OM, indicating a greater potential to degrade these compounds than reflected by initial enzymatic activities. Our results show that community succession in amended mesocosms was congruent with cell growth, increased bacterial production and, most notably, substantial shifts in enzymatic activities. In all amended mesocosms, closely related taxa that were initially rare became dominant at time frames during which a broader spectrum of active enzymes was detected compared to initial timepoints, indicating a similar response among different communities. However, succession on the whole‐community level, and the rates, spectra and progression of enzymatic activities, reveal robust differences among distinct communities from discrete water masses. These results underscore the crucial role of rare bacterial taxa in ocean carbon cycling and the importance of bacterial community structure for HMW‐OM degradation.
Heterotrophic microbial communities use extracellular enzymes to initiate degradation of high molecular weight organic matter in the ocean. The potential of microbial communities to access organic matter, and the resultant rates of hydrolysis, affect the efficiency of the biological pump as well as the rate and location of organic carbon cycling in surface and deep waters. In order to investigate spatial- and depth-related patterns in microbial enzymatic capacities in the ocean, we measured hydrolysis rates of six high-molecular-weight polysaccharides and two low-molecular-weight substrate proxies at sites spanning 38°S to 10°N in the Atlantic Ocean, and at six depths ranging from surface to bottom water. In surface to upper mesopelagic waters, the spectrum of substrates hydrolyzed followed distinct patterns, with hydrolytic assemblages more similar vertically within a single station than at similar depths across multiple stations. Additionally, the proportion of total hydrolysis occurring above the pycnocline, and the spectrum of substrates hydrolyzed in mesopelagic and deep waters, were positively related to the strength of stratification at a site, while other physicochemical parameters were generally poor predictors of the measured hydrolysis rates. Spatial as well as depth-driven constraints on heterotrophic hydrolytic capacities result in broad variations in potential carbon-degrading activity in the ocean. The spectrum of enzymatic capabilities and rates of hydrolysis in the ocean, and the proportion of organic carbon hydrolyzed above the permanent thermocline, may influence the efficiency of the biological pump and net carbon export across distinct latitudinal and depth regions.