Summary
Most common human diseases are likely to have complex etiologies. Analysis methods that account for epistasis are of growing interest in the genetic dissection of complex diseases. By allowing for epistatic interactions between potential disease loci, we may identify genetic variants that would otherwise remain undetected. Here we aimed to assess the ability of logistic regression (LR) and two tree-based supervised learning methods, classification and regression trees (CART) and random forest (RF), to detect epistasis. Multifactor-dimensionality reduction (MDR) was also used for comparison. We first simulated datasets of autosomal, biallelic, unphased and unlinked single nucleotide polymorphisms (SNPs). Each dataset contained one two-locus interaction (the causal SNPs) and 98 'noise' SNPs. We modelled interactions under different scenarios of sample size, missing data and minor allele frequency (MAF), and under several penetrance models: three involving both (indistinguishable) marginal effects and interaction, and two simulating pure interaction effects. In total, we simulated 99 different scenarios. Although CART, RF and LR yield similar results in terms of detecting true associations, CART and RF perform better than LR with respect to classification error. MAF, penetrance model and sample size have a greater influence than the percentage of missing data on the ability of the different techniques to detect true associations. In pure interaction models, only RF detects association. In conclusion, tree-based methods and LR are important statistical tools for detecting unknown interactions among true risk-associated SNPs that have marginal effects, even in the presence of a substantial number of noise SNPs. In pure interaction models, RF performs reasonably well when sample sizes are large and the percentage of missing data is low. However, when the study design is suboptimal for detecting interaction (e.g. because of small sample size or low MAF), there is a high risk of detecting spurious associations.
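The difference between interaction with marginal effects and pure interaction can be made concrete with a two-locus penetrance table. The Python sketch below is illustrative only: the penetrance values, the MAF of 0.5, the noise-SNP MAF of 0.3, Hardy-Weinberg genotype frequencies and the rejection-sampling scheme are all assumptions, not the models used in the study. `PURE_EPISTASIS`, `marginal_penetrance` and `simulate_case_control` are hypothetical names. The code checks that an XOR-like table produces no marginal effect at either locus, then simulates case-control records with 2 causal and 98 noise SNPs.

```python
import random

# Hypothetical pure-interaction penetrance table: P(disease | genotype A = i, genotype B = j),
# where genotypes are coded 0, 1, 2 copies of the minor allele.
PURE_EPISTASIS = [
    [0.1, 0.0, 0.1],
    [0.0, 0.1, 0.0],
    [0.1, 0.0, 0.1],
]

def hwe_genotype_freqs(maf):
    """Genotype frequencies for 0, 1, 2 minor alleles under Hardy-Weinberg equilibrium."""
    p, q = maf, 1.0 - maf
    return [q * q, 2 * p * q, p * p]

def marginal_penetrance(table, maf_b):
    """P(disease | genotype at locus A), averaging over locus B (rows of table = locus A)."""
    fb = hwe_genotype_freqs(maf_b)
    return [sum(fb[j] * table[i][j] for j in range(3)) for i in range(3)]

def sample_genotype(rng, maf):
    """Two independent allele draws: unlinked locus in Hardy-Weinberg equilibrium."""
    return int(rng.random() < maf) + int(rng.random() < maf)

def simulate_case_control(n_cases, n_controls, table, maf_a, maf_b,
                          n_noise=98, noise_maf=0.3, seed=0):
    """Draw random individuals and keep them until both groups are full.

    Assumes the penetrances are not all 0 (or all 1); otherwise one group never fills.
    Each record is [causal SNP A, causal SNP B, noise SNP 1, ..., noise SNP n_noise].
    """
    rng = random.Random(seed)
    cases, controls = [], []
    while len(cases) < n_cases or len(controls) < n_controls:
        ga = sample_genotype(rng, maf_a)
        gb = sample_genotype(rng, maf_b)
        affected = rng.random() < table[ga][gb]
        group, quota = (cases, n_cases) if affected else (controls, n_controls)
        if len(group) < quota:
            noise = [sample_genotype(rng, noise_maf) for _ in range(n_noise)]
            group.append([ga, gb] + noise)
    return cases, controls

# Every single-locus penetrance is 0.05: locus A alone carries no signal at MAF 0.5.
print(marginal_penetrance(PURE_EPISTASIS, maf_b=0.5))
cases, controls = simulate_case_control(50, 50, PURE_EPISTASIS, 0.5, 0.5, seed=1)
```

Under this table a single-SNP test of either causal locus has no population-level signal. Only a method that models the joint genotype can pick it up, which is the setting in which the abstract reports that only RF detected association.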
We introduce a nonparametric estimator of the conditional survival function in the mixture cure model for right‐censored data when cure status is partially known. The estimator is developed for a single continuous covariate, but it can be extended to multiple covariates. It extends Beran's estimator, which ignores cure status information. We obtain an almost sure representation, from which we derive the strong consistency and asymptotic normality of the estimator. Asymptotic expressions for the bias and variance show that the variance is reduced relative to Beran's estimator. A simulation study shows that, if the bandwidth parameter is suitably chosen, our estimator outperforms competing estimators over a wide range of covariate values. A bootstrap bandwidth selector is proposed. Finally, the proposed estimator is applied to a real dataset on the survival of sarcoma patients.
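For reference, the baseline being extended is Beran's estimator, a kernel-weighted (conditional) Kaplan-Meier estimator. In the mixture cure model the population survival is $S(t \mid x) = 1 - p(x) + p(x)\,S_u(t \mid x)$, where $p(x)$ is the probability of being uncured, so $S(t \mid x)$ levels off at the cure fraction $1 - p(x)$. The Python sketch below implements only Beran's estimator, not the proposed estimator that uses partially known cure status. The Epanechnikov kernel, the fixed bandwidth `h` and the function names are assumptions for illustration. The bootstrap bandwidth selector is not attempted.

```python
def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2) for |u| <= 1, else 0."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def beran_survival(t, x, times, events, covariates, h, kernel=epanechnikov):
    """Beran's conditional Kaplan-Meier estimate of S(t | x).

    times:      observed times min(T, C)
    events:     1 if the time is uncensored, 0 if censored
    covariates: one continuous covariate value per subject
    h:          bandwidth (assumed fixed here; its choice drives bias and variance)
    """
    # Kernel weights are left unnormalised: the normalising constant cancels
    # in the ratio died / at_risk below.
    weights = [kernel((x - xi) / h) for xi in covariates]
    surv = 1.0
    # Multiply over distinct uncensored times up to t, in increasing order.
    event_times = sorted({ti for ti, di in zip(times, events) if di == 1 and ti <= t})
    for s in event_times:
        at_risk = sum(w for ti, w in zip(times, weights) if ti >= s)
        if at_risk == 0.0:
            continue  # no local information at this time: the factor is taken as 1
        died = sum(w for ti, di, w in zip(times, events, weights) if ti == s and di == 1)
        surv *= 1.0 - died / at_risk
    return surv

times, events = [1, 2, 3, 4], [1, 0, 1, 1]
# With identical covariates, all weights are equal and this reduces to Kaplan-Meier.
print(beran_survival(3.5, 0.0, times, events, [0, 0, 0, 0], h=1.0))
```

Grouping tied event times into one factor gives the same result as the usual ordered-sample formula when there are no ties.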
In this work, an alternative to the Arrhenius equation for thermogravimetric analysis (TGA) is presented. It is based on performing a logistic regression of the raw TGA data. The model assumes that more than one physical process may be involved in each mass loss step and that each physical process may extend over the entire experiment. The resulting logistic mixture describes the complete TGA trace, with as many mass loss steps as the experiment shows. The model reproduces the typical asymptotic behaviour of the mass loss steps. The model is discussed from a statistical point of view and compared with other classical models.