Motivation One of the many technical challenges that arise when scheduling bioinformatics analyses at scale is determining the appropriate amount of memory and processing resources. Both over- and under-allocation lead to inefficient use of computational infrastructure. Over-allocation locks up resources that could otherwise be used for other analyses. Under-allocation causes jobs to fail, so analyses must be repeated with a larger memory or runtime allowance. We address this challenge by using a historical dataset of bioinformatics analyses run on the Galaxy platform to demonstrate the feasibility of an online service for resource requirement estimation. Results Here we introduce the Galaxy job run dataset and test popular machine learning models on the task of resource usage prediction. We include three popular forest models (the extra trees regressor, the gradient boosting regressor and the random forest regressor) and find that random forests perform best in the runtime prediction task. We also present two methods of choosing walltimes for previously unseen jobs. Quantile regression forests are the more accurate of the two, and they allow performance to be tuned by changing the confidence level of the estimates. However, the widths of their prediction intervals vary and cannot be strictly bounded. Random forest classifiers address this problem by providing control over the size of the prediction intervals, with accuracy comparable to that of the regressor. We show that the memory requirements of a job can be estimated using the same methods, which, as far as we know, has not been done before. Such estimation can be highly beneficial for accurate resource allocation. Availability and implementation Source code, implemented in Python, is available at https://github.com/atyryshkina/algorithm-performance-analysis. Supplementary information Supplementary data are available at Bioinformatics online.
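The abstract names two ways of turning a forest into a walltime, but it does not show how the authors implemented either one. The sketch below is a minimal pure-Python illustration of both ideas. It is not the code from the linked repository.

The first idea is the quantile regression forest prediction rule as originally described by Meinshausen. Each tree spreads a weight of 1 over the historical jobs that share the new job's leaf, and the weights are averaged across trees. A walltime at confidence alpha is then the alpha-quantile of historical runtimes under those weights.

The second idea is the classifier approach. Runtimes are binned into fixed intervals chosen by the operator, and the upper edge of the predicted interval is requested as the walltime. This bounds the interval size by construction.

All function names are hypothetical. The leaf assignments are assumed to come from an already-fitted forest.

```python
from bisect import bisect_left
from typing import Sequence


def qrf_weights(train_leaves: Sequence[Sequence[int]],
                query_leaves: Sequence[int]) -> list[float]:
    """Quantile-regression-forest weights of historical jobs for one new job.

    train_leaves[t][i]: leaf of tree t that historical job i falls into.
    query_leaves[t]:    leaf of tree t that the new job falls into.
    """
    n_jobs = len(train_leaves[0])
    weights = [0.0] * n_jobs
    n_used = 0
    for leaves, q in zip(train_leaves, query_leaves):
        members = [i for i, leaf in enumerate(leaves) if leaf == q]
        if not members:  # does not happen for a fitted forest; guards toy input
            continue
        n_used += 1
        for i in members:
            weights[i] += 1.0 / len(members)  # each tree contributes total weight 1
    if n_used == 0:
        raise ValueError("new job shares no leaf with any historical job")
    return [w / n_used for w in weights]  # average over trees; sums to 1


def weighted_quantile(values: Sequence[float], weights: Sequence[float],
                      alpha: float) -> float:
    """Smallest value y whose cumulative weight (values <= y) reaches alpha."""
    ordered = sorted(zip(values, weights))
    cumulative = 0.0
    for value, weight in ordered:
        cumulative += weight
        if cumulative >= alpha - 1e-12:  # tolerance for float summation
            return value
    return ordered[-1][0]


def runtime_to_class(runtime: float, edges: Sequence[float]) -> int:
    """Class label k such that runtime lies in (edges[k-1], edges[k]]."""
    if runtime > edges[-1]:
        raise ValueError("runtime exceeds the largest bin edge")
    return bisect_left(edges, runtime)


def class_to_walltime(label: int, edges: Sequence[float]) -> float:
    """Request the upper edge of the predicted interval as the walltime."""
    return edges[label]
```

In practice, leaf indices could be obtained from scikit-learn's `RandomForestRegressor.apply`. It returns an array of shape (n_samples, n_estimators), so it would need transposing to match `train_leaves` here. The interval labels produced by `runtime_to_class` would be the targets for a `RandomForestClassifier`. Raising alpha requests more conservative walltimes. Narrowing the bin edges tightens the interval size in the classifier approach.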
The essential micronutrient selenium (Se) is co-translationally incorporated into proteins as selenocysteine. Selenoproteins contain one or more selenocysteines and are vital for optimal immunity. Interestingly, many pathogenic bacteria utilize Se for various biological processes, suggesting that Se may play a role in bacterial pathogenesis. A previous study speculated that Francisella tularensis, a facultative intracellular bacterium and the causative agent of tularemia, sequesters Se by upregulating Se-metabolism genes in type II alveolar epithelial cells. We therefore investigated the contributions of host- vs. pathogen-associated selenoproteins to bacterial disease, using F. tularensis as a model organism. We found that F. tularensis lacked any Se utilization traits: it neither incorporated elemental Se nor exhibited Se-dependent growth. However, 100% of Se-deficient mice (0.01 ppm Se), which express low levels of selenoproteins, succumbed to pulmonary challenge with the F. tularensis live vaccine strain. By contrast, 50% of mice on a Se-supplemented (0.4 ppm Se) diet and 25% of mice on a Se-adequate (0.1 ppm Se) diet succumbed to infection. Median survival time was 8 days post-infection for Se-deficient mice, compared with 11.5 and >14 days post-infection for Se-supplemented and Se-adequate mice, respectively. Ex vivo, Se-deficient macrophages permitted significantly higher intracellular bacterial replication than Se-supplemented macrophages, corroborating the in vivo observations. Since Francisella replicates in alveolar macrophages during the acute phase of pneumonic infection, we hypothesized that macrophage-specific host selenoproteins may restrict the replication and systemic spread of the bacteria. F. tularensis infection increased the expression of several macrophage selenoproteins, suggesting that they play a key role in limiting bacterial replication. Upon challenge with F. tularensis, mice lacking selenoproteins in macrophages (TrspM) displayed lower survival and increased bacterial burden in the lung and systemic tissues compared with WT littermate controls. Furthermore, macrophages from TrspM mice were unable to restrict bacterial replication ex vivo, unlike macrophages from littermate controls. We herein describe a novel function of host macrophage-specific selenoproteins in restricting intracellular bacterial replication. These data suggest that host selenoproteins may be considered novel targets for modulating the immune response to control bacterial infection.
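The ">14 days" figure for Se-adequate mice reflects how median survival is usually read off a Kaplan-Meier curve. The median is the first time at which estimated survival falls to 0.5 or below. If fewer than half of the animals die before follow-up ends (25% mortality in this group), the curve never reaches 0.5. The median can then only be reported as exceeding the last observation day, presumably day 14 here.

The abstract does not state which survival estimator the authors used. The sketch below is a generic Kaplan-Meier median in which animals alive at the end of the study are treated as censored. The function name `km_median` and the test inputs are hypothetical toy values, not data from the study.

Conventions differ when the curve sits at exactly 0.5; some packages report the midpoint of that flat segment. This sketch simply returns the first time at which S(t) <= 0.5.

```python
from typing import Optional, Sequence


def km_median(times: Sequence[float], died: Sequence[bool]) -> Optional[float]:
    """Median survival time from a Kaplan-Meier estimate.

    times[i]: day of death, or day of censoring (e.g. end of study), for animal i.
    died[i]:  True if animal i died at times[i], False if it was censored.
    Returns the first death time at which estimated survival S(t) <= 0.5,
    or None if the curve never falls that low (median "not reached").
    """
    survival = 1.0
    for t in sorted({ti for ti, d in zip(times, died) if d}):
        at_risk = sum(1 for ti in times if ti >= t)  # still under observation at t
        deaths = sum(1 for ti, d in zip(times, died) if d and ti == t)
        survival *= 1.0 - deaths / at_risk  # Kaplan-Meier product-limit step
        if survival <= 0.5 + 1e-12:  # tolerance for floating-point products
            return t
    return None
```

A group in which only one of four animals dies, with the rest censored at day 14, returns `None`. This is the situation that would be reported as a median of ">14 days".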
Recent studies have suggested that individual variants do not sufficiently explain the variable expressivity of phenotypes observed in complex disorders. For example, the 16p12.1 deletion is associated with developmental delay and neuropsychiatric features in affected individuals, but is inherited in >90% of cases from a mildly affected parent. While children with the deletion are more likely to carry additional "second-hit" variants than their parents, the mechanisms by which these variants contribute to phenotypic variability are unknown. We performed detailed clinical assessments, whole-genome sequencing, and RNA sequencing of lymphoblastoid cell lines for 32 individuals in five large families with multiple members carrying the 16p12.1 deletion. We found that the deletion dysregulates multiple autism and brain development genes such as FOXP1, ANK3, and MEF2. Carrier children also showed expression changes relative to their parents, both inherited and de novo, which matched 39/47 observed developmental phenotypes. We identified significant enrichments for 13/25 classes of "second-hit" variants in genes with expression changes. Of these, 7/25 variant classes, including missense SNVs and large deletions, were only enriched when inherited from the non-carrier parent. In 11 instances, including for ZEB2 and SYNJ1, gene expression was synergistically altered by both the deletion and inherited "second-hits" in carrier children. Finally, brain-specific interaction network analysis showed strong connectivity between genes carrying "second-hits" and genes with transcriptome alterations, including differential expression, alternative splicing, and allele-specific expression. Our study shows that family-based assessments of transcriptome data are highly relevant to understanding the genetic mechanisms associated with complex disorders.
Background Recent studies have suggested that individual variants do not sufficiently explain the variable expressivity of phenotypes observed in complex disorders. For example, the 16p12.1 deletion is associated with developmental delay and neuropsychiatric features in affected individuals, but is inherited in >90% of cases from a mildly affected parent. While children with the deletion are more likely to carry additional “second-hit” variants than their parents, the mechanisms by which these variants contribute to phenotypic variability are unknown. Methods We performed detailed clinical assessments, whole-genome sequencing, and RNA sequencing of lymphoblastoid cell lines for 32 individuals in five large families with multiple members carrying the 16p12.1 deletion. We identified contributions of the 16p12.1 deletion and “second-hit” variants to a range of expression changes in deletion carriers and their family members. These changes were assessed with differential expression, outlier expression, alternative splicing, allele-specific expression, and expression quantitative trait loci analyses. Results We found that the deletion dysregulates multiple autism and brain development genes such as FOXP1, ANK3, and MEF2. Carrier children also showed an average of 5323 gene expression changes compared with one or both parents, which matched 33/39 observed developmental phenotypes. We identified significant enrichments for 13/25 classes of “second-hit” variants in genes with expression changes. Of these, 4/25 variant classes, including loss-of-function SNVs and large duplications, were only enriched when inherited from the noncarrier parent. In 11 instances, including for ZEB2 and SYNJ1, gene expression was synergistically altered by both the deletion and inherited “second-hits” in carrier children. Finally, brain-specific interaction network analysis showed strong connectivity between genes carrying “second-hits” and genes with transcriptome alterations in deletion carriers.
Conclusions Our results suggest a potential mechanism for how “second-hit” variants modulate expressivity of complex disorders such as the 16p12.1 deletion: transcriptomic perturbation of gene networks important for early development. Our work further shows that family-based assessments of transcriptome data are highly relevant to understanding the genetic mechanisms associated with complex disorders.
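Both versions of the abstract report "significant enrichments" of second-hit variant classes among genes with expression changes, but neither names the statistical test. A common generic choice for this kind of question is a one-sided Fisher's exact test. Its p-value is the upper tail of a hypergeometric distribution: the chance that, drawing the variant-carrying genes at random from all genes tested, at least the observed number would land among the expression-changed genes.

The sketch below computes that tail from four counts. It illustrates the generic test and is not necessarily the authors' procedure. The function and argument names are hypothetical.

```python
from math import comb


def enrichment_p_value(n_genes: int, n_changed: int, n_with_variant: int,
                       n_overlap: int) -> float:
    """One-sided Fisher's exact (hypergeometric upper-tail) p-value.

    n_genes:        all genes tested
    n_changed:      genes with an expression change
    n_with_variant: genes carrying the variant class of interest
    n_overlap:      genes that are both changed and carry the variant
    Returns P(X >= n_overlap) for X ~ Hypergeometric(n_genes, n_changed, n_with_variant).
    """
    upper = min(n_changed, n_with_variant)
    total = comb(n_genes, n_with_variant)
    # Count the ways to get k or more overlapping genes, for each k in the tail.
    tail = sum(comb(n_changed, k) * comb(n_genes - n_changed, n_with_variant - k)
               for k in range(n_overlap, upper + 1))
    return tail / total  # exact integer counts until this final division
```

Because 25 variant classes were tested, a multiple-testing correction would normally be applied across them. The abstracts do not say which correction, if any, was used.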