We hypothesized that DNA methylation is distributed in specific patterns in cancer cells and that these patterns reflect critical biological differences. We therefore examined the methylation profiles of 344 patients with acute myeloid leukemia (AML). Clustering the patients by their methylation data segregated them into 16 groups, five of which defined new AML subtypes that shared no other known feature. In addition, DNA methylation profiles segregated patients with CEBPA aberrations from other leukemia subtypes, defined four epigenetically distinct forms of AML with NPM1 mutations, and showed that the established AML1-ETO, CBFb-MYH11, and PML-RARA leukemia entities are associated with specific methylation profiles. We report a 15-gene methylation classifier that predicts overall survival in an independent patient cohort (p < 0.001, adjusted for known covariates).
Genome-wide association studies (GWAS) are a popular approach for identifying common genetic variants and epistatic effects associated with a disease phenotype. Traditional GWAS analysis assesses the association between each individual single-nucleotide polymorphism (SNP) and the observed phenotype. Recently, kernel machine-based tests for association between a SNP set (e.g., the SNPs in a gene) and the disease phenotype have been proposed as a useful alternative to this single-SNP approach. They allow flexible modeling of potentially complicated joint SNP effects within a set while adjusting for covariates. We extend the kernel machine framework to accommodate related subjects from multiple independent families and provide a score-based variance component test for the association of a given SNP set with a continuous phenotype. The test adjusts for additional covariates and accounts for within-family correlation. We illustrate the proposed method with simulation studies and an application to genetic data from the Genetic Epidemiology Network of Arteriopathy (GENOA) study.
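To make the score-based variance component idea concrete, here is a minimal sketch of the quadratic form at the core of such a test. It is not the authors' implementation, and several simplifying assumptions apply. It uses a linear kernel K = G G^T and an intercept-only null model with no extra covariates. It also takes the null covariance V as known and supplied by the caller; with families, V would be block-diagonal with one block per family. The names `solve` and `score_statistic` are hypothetical. Computing a p-value, for example from a mixture of chi-squares, is not shown.

```python
"""Sketch: kernel-machine score statistic for a SNP set with family data.

Assumptions (illustrative, not from the source): linear kernel K = G G^T,
intercept-only null model, and a known null covariance V, e.g.
sigma_e^2 * I + sigma_g^2 * (2 * kinship), block-diagonal across families.
"""


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular matrix")
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def score_statistic(y, genotypes, v):
    """Return Q = 0.5 * r^T V^-1 K V^-1 r with linear kernel K = G G^T.

    y: phenotype per subject; genotypes: one row of SNP codes per subject;
    v: subject-by-subject null covariance matrix.
    """
    n = len(y)
    # GLS intercept under the null of no SNP-set effect:
    # mu = (1^T V^-1 y) / (1^T V^-1 1), using symmetry of V.
    mu = sum(solve(v, y)) / sum(solve(v, [1.0] * n))
    resid = [yi - mu for yi in y]
    w = solve(v, resid)  # w = V^-1 r
    # With a linear kernel, r^T V^-1 G G^T V^-1 r = ||G^T w||^2,
    # so K never has to be formed explicitly.
    q = 0.0
    for j in range(len(genotypes[0])):
        s = sum(genotypes[i][j] * w[i] for i in range(n))
        q += s * s
    return 0.5 * q
```

Rewriting the quadratic form as ||G^T V^-1 r||^2 avoids building the n-by-n kernel matrix. Within-family correlation enters only through V.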
We present statistical methods for big data arising from online analytical processing, where large amounts of data arrive in streams and must be analyzed quickly without storing or accessing the historical data. In particular, we develop iterative estimation algorithms and statistical inference procedures for linear models and estimating equations that update as new data arrive. These algorithms are computationally efficient, require minimal storage, and allow for possible rank deficiencies in the subset design matrices caused by rare-event covariates. Within the linear model setting, the proposed online-updating framework leads to predictive residual tests that can be used to assess the goodness-of-fit of the hypothesized model. We also propose a new online-updating estimator for the estimating equation setting. The theoretical properties of the goodness-of-fit tests and the proposed estimators are examined in detail. In simulation studies and real data applications, our estimator compares favorably with competing approaches in the estimating equation setting.
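For the linear model case, online updating can be illustrated by accumulating sufficient statistics. Each arriving block adds X_k^T X_k and X_k^T y_k to running totals and is then discarded. This is a minimal sketch under that assumption only. The class name `OnlineLeastSquares` is hypothetical, and the paper's estimating-equation estimator and predictive residual tests are not reproduced. It does show why a rank-deficient block, such as one where a rare binary covariate is all zero, causes no trouble: only the cumulative X^T X must be invertible when an estimate is requested.

```python
"""Sketch: online-updating least squares from streamed data blocks.

Only p x p and p-length running totals are stored; raw rows are dropped.
"""


class OnlineLeastSquares:
    def __init__(self, p):
        self.p = p
        self.xtx = [[0.0] * p for _ in range(p)]  # running X^T X
        self.xty = [0.0] * p                      # running X^T y
        self.n = 0

    def update(self, x_block, y_block):
        """Fold one block into the running totals; the block is not kept."""
        for row, yi in zip(x_block, y_block):
            for a in range(self.p):
                self.xty[a] += row[a] * yi
                for b in range(self.p):
                    self.xtx[a][b] += row[a] * row[b]
            self.n += 1

    def estimate(self):
        """Solve (X^T X) beta = X^T y over all data seen so far."""
        p = self.p
        m = [self.xtx[i][:] + [self.xty[i]] for i in range(p)]
        for col in range(p):
            piv = max(range(col, p), key=lambda r: abs(m[r][col]))
            if abs(m[piv][col]) < 1e-12:
                raise ValueError("cumulative X^T X is singular")
            m[col], m[piv] = m[piv], m[col]
            pv = m[col][col]
            m[col] = [v / pv for v in m[col]]
            for r in range(p):
                if r != col:
                    f = m[r][col]
                    m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
        return [m[i][p] for i in range(p)]


# Usage: columns are (intercept, x, rare indicator). The first block never
# observes the rare event, so on its own it is rank deficient.
ols = OnlineLeastSquares(3)
ols.update([[1, 0, 0], [1, 1, 0], [1, 2, 0]], [1, 3, 5])
ols.update([[1, 0, 1], [1, 1, 1]], [4, 6])
beta = ols.estimate()  # close to [1, 2, 3] for this noise-free data
```

Calling `estimate()` after only the first block raises an error, because the rare-event column is still all zero. After the second block, the combined data identify all three coefficients.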
There is increasing interest in the joint analysis of multiple phenotypes in genome-wide association studies (GWASs), especially for analyzing multiple secondary phenotypes in case-control studies and for detecting pleiotropic effects. Multiple phenotypes often measure the same underlying trait, so exploiting their similarity can potentially increase statistical power in association analysis. Because continuous phenotypes are likely to be measured on different scales, we propose a scaled marginal model for testing and estimating the common effect of a single-nucleotide polymorphism (SNP) on multiple secondary phenotypes in case-control studies. The model borrows strength across outcomes through a one degree of freedom (1-DF) test and jointly estimates outcome-specific scales along with the SNP and covariate effects. As a result, it improves power over individual phenotype analysis and traditional multivariate analysis when the phenotypes are positively correlated and measure the underlying trait in the same direction (after transformation). To account for case-control ascertainment bias in the analysis of multiple secondary phenotypes, we propose weighted estimating equations for fitting scaled marginal models. This weighted estimating equation approach is robust to departures from normality of the continuous phenotypes and to misspecification of the within-individual correlation among them, and statistical power improves when that correlation is correctly specified. Simulation studies show that the proposed 1-DF common effect test outperforms several alternative methods. We apply the proposed method to investigate SNP associations with smoking behavior, measured by multiple secondary smoking phenotypes, in a lung cancer case-control GWAS and identify several SNPs of biological interest.
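The ascertainment correction can be illustrated with inverse-probability weighting, a standard way to build weighted estimating equations for secondary phenotypes in case-control samples. The sketch covers a single secondary phenotype and makes these assumptions:

- Case and control sampling fractions are known.
- Each subject is weighted by 1 / P(sampled | disease status).
- The weighted equation sum_i w_i x_i (y_i - x_i^T beta) = 0 is solved for an intercept and one SNP effect.

It does not implement the paper's scaled marginal model across several phenotypes or its variance estimator. The function names and sampling fractions are hypothetical.

```python
"""Sketch: inverse-probability-weighted SNP effect on one secondary phenotype.

Cases are usually oversampled relative to the population, so weighting by
1 / P(sampled | status) makes the fit reflect the source population rather
than the case-enriched sample. Illustrative only.
"""


def ipw_weights(is_case, frac_cases, frac_controls):
    """Return 1 / pi for each subject, pi being its status's sampling fraction."""
    return [1.0 / (frac_cases if c else frac_controls) for c in is_case]


def weighted_snp_effect(snp, phenotype, weights):
    """Solve the weighted normal equations for phenotype = b0 + b1 * snp."""
    sw = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, snp)) / sw
    y_bar = sum(w * y for w, y in zip(weights, phenotype)) / sw
    sxy = sum(w * (x - x_bar) * (y - y_bar)
              for w, x, y in zip(weights, snp, phenotype))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, snp))
    b1 = sxy / sxx          # weighted SNP effect
    b0 = y_bar - b1 * x_bar  # weighted intercept
    return b0, b1


# Usage with hypothetical sampling fractions: half of all cases but only
# 1% of controls were genotyped, so each control stands in for 100 people.
w = ipw_weights([True, False, True], frac_cases=0.5, frac_controls=0.01)
b0, b1 = weighted_snp_effect([0, 1, 2], [1.0, 3.0, 5.0], w)
```

With equal weights this reduces to ordinary least squares. Unequal weights shift the estimate toward the subjects who represent more of the population.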