Clustering is a challenging problem in unsupervised learning. In lieu of a gold standard, stability has become a valuable surrogate for performance and robustness. In this work, we propose a non-parametric bootstrapping approach to estimating the stability of a clustering method, which also captures the stability of the individual clusters and observations. This flexible framework enables different types of comparisons between clusterings and can be used in connection with two possible bootstrap approaches for stability. The first approach, scheme 1, can be used to assess confidence (stability) around the clustering from the original dataset based on bootstrap replications. A second approach, scheme 2, searches over the bootstrap clusterings for an optimally stable partitioning of the data. The two schemes accommodate different model assumptions that can be motivated by an investigator's trust (or lack thereof) in the original data and additional computational considerations. We propose a hierarchical visualization, extrapolated from the stability profiles, that gives insight into the separation of groups, and projected visualizations for inspecting the stability of individual observations. Our approaches show good performance in simulation and on real data. These approaches can be implemented using the R package bootcluster, which is available on the Comprehensive R Archive Network (CRAN).
Background: The metabolome is a collection of exogenous chemicals and metabolites from cellular processes that may reflect the body’s response to environmental exposures. Studies of air pollution and metabolomics are limited. Objectives: To explore changes in the human metabolome before, during, and after the 2008 Beijing Olympic Games, when air pollution was high, low, and high, respectively. Methods: Serum samples were collected before, during, and after the Olympics from 26 participants in an existing panel study. Gas and ultra-high performance liquid chromatography/mass spectrometry were used in metabolomics analysis. Repeated measures ANOVA, network analysis, and enrichment analysis methods were employed to identify metabolites and classes associated with air pollution changes. Results: A total of 886 molecules were measured in our metabolomics analysis. Network partitioning identified four modules with 65 known metabolites that significantly changed across the three time points. All known molecules in the first module (n=33) were lipids (e.g., eicosapentaenoic acid, stearic acid). The second module consisted primarily of dipeptides (n=24, e.g., isoleucylglycine) plus 8 metabolites from four other classes (e.g., hypoxanthine, 12-hydroxyeicosatetraenoic acid). Most of the metabolites in Modules 3 (19 of 23) and 4 (5 of 5) were unknown. Enrichment analysis of module-identified metabolites indicated significantly overrepresented pathways, including long- and medium-chain fatty acids, polyunsaturated fatty acids (n3 and n6), eicosanoids, lysolipid, dipeptides, fatty acid metabolism, and purine metabolism [(hypo)xanthine/inosine-containing pathways]. Conclusions: We identified two major metabolic signatures: one consisting of lipids, and a second that included dipeptides, polyunsaturated fatty acids, taurine, and xanthine. Metabolites in both groups decreased during the 2008 Beijing Olympics, when air pollution was low, and increased after the Olympics, when air pollution returned to normal (high) levels. https://doi.org/10.1289/EHP3705
The microbiome influences health and disease through complex networks of host genetics, genomics, microbes, and environment. Identifying the mechanisms of these interactions has remained challenging. Systems genetics in laboratory mice (Mus musculus) enables data-driven discovery of biological network components and mechanisms of host–microbial interactions underlying disease phenotypes. To examine the interplay among the whole host genome, transcriptome, and microbiome, we mapped QTL and correlated the abundance of cecal messenger RNA, luminal microflora, physiology, and behavior in a highly diverse Collaborative Cross breeding population. One such relationship, regulated by a variant on chromosome 7, was the association of Odoribacter (Bacteroidales) abundance and sleep phenotypes. In a test of this association in the BKS.Cg-Dock7m +/+ Leprdb/J mouse model of obesity and diabetes, known to have abnormal sleep and colonization by Odoribacter, treatment with antibiotics altered sleep in a genotype-dependent fashion. The many other relationships extracted from this study can be used to interrogate other diseases, microbes, and mechanisms.
Telomere length is a heritable marker of cellular age that is associated with morbidity and mortality. Poor sleep behaviors, which are also associated with adverse health events, may be related to leukocyte telomere length (LTL). We studied a subpopulation of 3,145 postmenopausal women (1,796 European-American (EA) and 1,349 African-American (AA)) enrolled in the Women’s Health Initiative in 1993–1998 with data on Southern blot-measured LTL and self-reported usual sleep duration and sleep disturbance. LTL-sleep associations were analyzed separately for duration and disturbance using weighted and confounder-adjusted linear regression models in the entire sample (AAs + EAs; adjusted for race/ethnicity) and in racial/ethnic strata, since LTL differs by ancestry. After adjustment for covariates, each additional daily hour of sleep beyond 5 hours, approximately, was associated with a 27-base-pair (95% confidence interval (CI): 6, 48) longer LTL in the entire sample. Associations between sleep duration and LTL were strongest among AAs (adjusted β = 37, 95% CI: 4, 70); a similar, nonsignificant association was observed for EAs (adjusted β = 20, 95% CI: −7, 48). Sleep disturbance was not associated with LTL in our study. Our models did not show departure from linearity (quadratic sleep terms: P ≥ 0.55). Our results suggest that longer sleep duration is associated with longer LTL in postmenopausal women.