Background: Untargeted mass spectrometry (MS)-based metabolomics data often contain missing values that reduce statistical power and can introduce bias in biomedical studies. However, a systematic assessment of the various sources of missing values and strategies to handle these data has received little attention. Missing data can occur systematically, e.g. from run day-dependent effects due to limits of detection (LOD); or it can be random as, for instance, a consequence of sample preparation.
Methods: We investigated patterns of missing data in an MS-based metabolomics experiment of serum samples from the German KORA F4 cohort (n = 1750). We then evaluated 31 imputation methods in a simulation framework and biologically validated the results by applying all imputation approaches to real metabolomics data. We examined the ability of each method to reconstruct biochemical pathways from data-driven correlation networks, and the ability of the method to increase statistical power while preserving the strength of established metabolic quantitative trait loci.
Results: Run day-dependent LOD-based missing data accounts for most missing values in the metabolomics dataset. Although multiple imputation by chained equations performed well in many scenarios, it is computationally and statistically challenging. K-nearest neighbors (KNN) imputation on observations with variable pre-selection showed robust performance across all evaluation schemes and is computationally more tractable.
Conclusion: Missing data in untargeted MS-based metabolomics data occur for various reasons. Based on our results, we recommend that KNN-based imputation is performed on observations with variable pre-selection since it showed robust results in all evaluation schemes.
Electronic supplementary material: The online version of this article (10.1007/s11306-018-1420-2) contains supplementary material, which is available to authorized users.
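The idea behind KNN imputation on observations with variable pre-selection can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the correlation threshold `min_abs_corr`, the default `k`, the root-mean-square distance over shared variables, and the unweighted neighbor mean are all assumptions made for illustration. The sketch also assumes the data matrix (samples in rows, metabolites in columns) has already been transformed and scaled so that distances are comparable across variables.

```python
import numpy as np


def pairwise_corr(a, b):
    """Pearson correlation using only samples where both values are observed."""
    ok = ~np.isnan(a) & ~np.isnan(b)
    if ok.sum() < 3:
        return 0.0
    a, b = a[ok], b[ok]
    if a.std() == 0 or b.std() == 0:
        return 0.0
    return float(np.corrcoef(a, b)[0, 1])


def knn_impute_obs(X, k=10, min_abs_corr=0.2):
    """Fill NaNs in X (samples x variables) from the k nearest samples.

    Hypothetical parameters: `k` neighbors, and `min_abs_corr` as the
    pre-selection cutoff on the absolute correlation with the target variable.
    Columns that are entirely missing are left untouched.
    """
    X = np.asarray(X, dtype=float)
    out = X.copy()
    n_samples, n_vars = X.shape
    for j in range(n_vars):
        missing = np.isnan(X[:, j])
        if not missing.any() or missing.all():
            continue
        col_mean = np.nanmean(X[:, j])
        # Variable pre-selection: keep only variables correlated with variable j.
        selected = [
            v for v in range(n_vars)
            if v != j and abs(pairwise_corr(X[:, j], X[:, v])) >= min_abs_corr
        ]
        # Donors are samples in which variable j was observed.
        donors = np.flatnonzero(~missing)
        for i in np.flatnonzero(missing):
            if not selected:
                out[i, j] = col_mean  # fallback when no variable passes the cutoff
                continue
            diff = X[np.ix_(donors, selected)] - X[i, selected]
            shared = ~np.isnan(diff)
            n_shared = shared.sum(axis=1)
            sq = np.where(shared, diff ** 2, 0.0).sum(axis=1)
            # Root-mean-square distance over the variables both samples share.
            dist = np.where(n_shared > 0, np.sqrt(sq / np.maximum(n_shared, 1)), np.inf)
            nearest = np.argsort(dist)[:k]
            nearest = nearest[np.isfinite(dist[nearest])]
            out[i, j] = X[donors[nearest], j].mean() if nearest.size else col_mean
    return out
```

Distances are always computed on the original matrix rather than on partially imputed values, so the result does not depend on the order in which variables are processed.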
Epigenetic regulation has been postulated to affect glucose metabolism, insulin sensitivity and the risk of type 2 diabetes. Therefore, we performed an epigenome-wide association study for measures of glucose metabolism in whole blood samples of the population-based Cooperative Health Research in the Region of Augsburg F4 study using the Illumina HumanMethylation 450 BeadChip. We identified a total of 31 CpG sites where methylation level was associated with measures of glucose metabolism after adjustment for age, sex, smoking, and estimated white blood cell proportions and correction for multiple testing using the Benjamini-Hochberg (B-H) method (four for fasting glucose, seven for fasting insulin, 25 for homeostasis model assessment-insulin resistance [HOMA-IR]; B-H-adjusted p-values between 9.2 × 10⁻⁵ and 0.047). In addition, DNA methylation at cg06500161 (annotated to ABCG1) was associated with all the aforementioned phenotypes and 2-hour glucose (B-H-adjusted p-values between 9.2 × 10⁻⁵ and 3.0 × 10⁻³). Methylation status of three additional CpG sites showed an association with fasting insulin only after additional adjustment for body mass index (BMI) (B-H-adjusted p-values = 0.047). Overall, effect strengths were reduced by around 30% after additional adjustment for BMI, suggesting that this variable has an influence on the investigated phenotypes. Furthermore, we found significant associations between methylation status of 21 of the aforementioned CpG sites and 2-hour insulin in a subset of samples, with seven significant associations persisting after additional adjustment for BMI. In a subset of 533 participants, methylation of the CpG site cg06500161 (ABCG1) was inversely associated with ABCG1 gene expression (B-H-adjusted p-value = 1.5 × 10⁻⁹). Additionally, we observed an enrichment of the top 1,000 CpG sites for diabetes-related canonical pathways using Ingenuity Pathway Analysis.
In conclusion, our study indicates that DNA methylation and diabetes-related traits are associated and that these associations are partially BMI-dependent. Furthermore, the interaction of ABCG1 with glucose metabolism is modulated by epigenetic processes.
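For readers unfamiliar with the Benjamini-Hochberg adjustment behind the reported p-values, the standard step-up procedure can be sketched as follows. The function name `bh_adjust` and the example p-values are illustrative assumptions; this is not the study's analysis code.

```python
import numpy as np


def bh_adjust(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by m / rank.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity: running minimum from the largest p-value downwards.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


# Illustrative p-values only, not values from the study.
print(bh_adjust([0.01, 0.04, 0.03, 0.005]))
```

A CpG site is then called significant at a false discovery rate of 5% when its adjusted p-value is below 0.05, which is consistent with the upper bound of 0.047 reported above.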