Background: Machine learning offers strategies and algorithms that can assist bioinformatics analyses in data mining and knowledge discovery. In several applications, notably in the life sciences, it is often more important to understand how a prediction was obtained than to know what prediction was made. To this end, so-called interpretable machine learning has recently been advocated. In this study, we implemented an interpretable machine learning package based on rough set theory. An important aim of our work was to provide statistical properties of the models and their components.

Results: We present the R.ROSETTA package, an R wrapper of the ROSETTA framework. The original ROSETTA functions have been improved and adapted to the R programming environment. The package allows for building and analyzing non-linear interpretable machine learning models. R.ROSETTA gathers combinatorial statistics via rule-based modelling, yielding accessible and transparent results that are well suited for adoption within the wider scientific community. The package also provides statistics and visualization tools that help minimize analysis bias and noise. The R.ROSETTA package is freely available at https://github.com/komorowskilab/R.ROSETTA. To illustrate its usage, we applied it to a transcriptome dataset from an autism case–control study. Our tool provided hypotheses for potential co-predictive mechanisms among features that discerned phenotype classes. These co-predictors represented neurodevelopmental and autism-related genes.

Conclusions: R.ROSETTA provides new insights for interpretable machine learning analyses and knowledge-based systems. We demonstrated that our package facilitated the detection of dependencies among autism-related genes. Although the sample application of R.ROSETTA illustrates transcriptome data analysis, the package can be used to analyze any data organized in decision tables.
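The rough set idea behind rule-based models on decision tables can be illustrated with a minimal sketch. It assumes a tiny hypothetical decision table with discretized gene values (the attribute names `geneA`/`geneB` and the helper functions are invented for illustration and are not the R.ROSETTA API). Objects are grouped into indiscernibility classes, and the lower and upper approximations of a decision class are computed from those groups.

```python
from collections import defaultdict

# Hypothetical toy decision table: discretized condition attributes + decision.
TABLE = [
    ({"geneA": "high", "geneB": "low"}, "case"),
    ({"geneA": "high", "geneB": "low"}, "control"),
    ({"geneA": "low", "geneB": "low"}, "control"),
    ({"geneA": "high", "geneB": "high"}, "case"),
    ({"geneA": "low", "geneB": "high"}, "control"),
]


def indiscernibility_classes(table, attributes):
    """Group object indices that share identical values on `attributes`."""
    classes = defaultdict(set)
    for idx, (conditions, _decision) in enumerate(table):
        key = tuple(conditions[a] for a in attributes)
        classes[key].add(idx)
    return list(classes.values())


def approximations(table, attributes, decision):
    """Return (lower, upper) approximations of the objects labelled `decision`."""
    target = {i for i, (_c, d) in enumerate(table) if d == decision}
    lower, upper = set(), set()
    for eq_class in indiscernibility_classes(table, attributes):
        if eq_class <= target:  # class lies entirely inside the concept
            lower |= eq_class
        if eq_class & target:  # class overlaps the concept
            upper |= eq_class
    return lower, upper


if __name__ == "__main__":
    low, up = approximations(TABLE, ["geneA", "geneB"], "case")
    print("lower:", sorted(low))          # certainly "case"
    print("upper:", sorted(up))           # possibly "case"
    print("boundary:", sorted(up - low))  # inconsistent objects
```

Here object 3 alone is certainly a "case", supporting a certain rule such as IF geneA=high AND geneB=high THEN case, while objects 0 and 1 are indiscernible yet labelled differently and fall in the boundary region. Actual rule-induction software adds reduct computation and rule statistics on top of this basic step.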
During the last decade, numerous studies have been carried out to explore the complexity of the genomic and transcriptomic lesions driving acute myeloid leukemia (AML) initiation. These studies have helped improve risk classification and treatment options. Detailed molecular characterization of longitudinal AML samples is, however, sparse, whereas relapse and therapy resistance represent the main challenges in AML care. To address this gap, we performed transcriptome-wide RNA sequencing of longitudinal diagnosis, relapse and/or primary resistant samples from 47 adult and 23 pediatric AML patients with known mutational background. Gene expression analysis revealed an association of short event-free survival with overexpression of GLI2 and IL1R1, as well as with downregulation of ST18. Moreover, CR1 downregulation and DPEP1 upregulation were associated with AML relapse in both adults and children. Finally, machine learning and network-based analysis identified overexpressed CD6 and downregulated INSR as highly co-predictive genes depicting important relapse-associated characteristics among adult AML patients. Our findings point towards the importance of a tumor-promoting inflammatory environment in leukemia progression, as indicated by several of the differentially expressed genes identified here. Together, this knowledge lays a foundation for novel personalized drug targets and has the potential to maximize the benefit of current treatments and to improve cure rates in AML.
We present new results concerning probability distributions of times in the coalescence tree and expected allele frequencies for the coalescent with a large sample size. The results are obtained with computational methodologies that combine changes of the coalescence time scale with integral transform techniques and analytical formulae for infinite products. We show applications of the proposed methodologies to computing probability distributions of times in the coalescence tree and their limits, to evaluating the accuracy of approximate expressions for times in the coalescence tree and for expected allele frequencies, and to the analysis of a large human mitochondrial DNA dataset.
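As background for the large-sample limits mentioned above, the sketch below computes two textbook quantities of the standard Kingman coalescent: the expected time to the most recent common ancestor and the expected total branch length for a sample of size n. It assumes time measured in coalescent units and the standard exponential waiting times with rate k(k-1)/2 while k lineages remain. This is an illustration of classical results, not the authors' integral-transform methodology, and the function names are hypothetical.

```python
from fractions import Fraction


def expected_interval(k):
    """Expected waiting time while exactly k lineages remain.

    Kingman coalescent: the time to the next merger is exponential with
    rate k(k-1)/2, so its mean is 2 / (k(k-1)) in coalescent units.
    """
    return Fraction(2, k * (k - 1))


def expected_tmrca(n):
    """Expected time to the most recent common ancestor of n lineages."""
    return sum(expected_interval(k) for k in range(2, n + 1))


def expected_total_length(n):
    """Expected total branch length: k branches persist during interval k."""
    return sum(k * expected_interval(k) for k in range(2, n + 1))


if __name__ == "__main__":
    for n in (2, 10, 100, 1000):
        print(n, float(expected_tmrca(n)), float(expected_total_length(n)))
```

The sums telescope to E[T_MRCA] = 2(1 - 1/n), which tends to 2 as n grows, whereas the expected total branch length equals twice the (n-1)-th harmonic number and keeps growing roughly like 2 ln n. This contrast is one reason why large-sample limits of coalescence times call for careful analytical treatment.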