Large-scale population phenotyping for molecular epidemiological studies is subject to all the usual criteria of analytical chemistry. As part of a major phenotyping investigation we have used high-resolution 1H NMR spectroscopy to characterize 24-h urine specimens obtained from population samples in Aito Town, Japan (n = 259), Chicago, IL (n = 315), and Guangxi, China (n = 278). We have investigated analytical reproducibility, urine specimen storage procedures, interinstrument variability, and split specimen detection. Our data show that the multivariate analytical reproducibility of the NMR screening platform was >98% and that most classification errors were due to urine specimen handling inhomogeneity. Differences in metabolite profiles were then assessed for Aito Town, Chicago, and Guangxi population samples; novel combinations of biomarkers were detected that separated the population samples. These cross-population differences in urinary metabolites could be related to genetic, dietary, and gut microbial factors.
Rates of heart disease and stroke vary markedly between north and south China. A 1H NMR spectroscopy-based Metabolome-Wide Association approach was used to identify urinary metabolites that discriminate between southern and northern Chinese population samples, to investigate population biomarkers that might relate to the difference in cardiovascular disease risk. NMR spectra were acquired from two 24-hour urine specimens per person for 523 northern and 244 southern Chinese participants in the INTERMAP Study of macro/micronutrients and blood pressure (BP). Discriminating metabolites were identified using Orthogonal Partial Least Squares Discriminant Analysis and assessed for statistical significance with a conservative family-wise error rate <0.01 to minimise false positive findings. Urinary metabolites significantly (P <1.2×10^−16 to 2.9×10^−69) higher in northern than southern Chinese populations included dimethylglycine, alanine, lactate, branched-chain amino acids (isoleucine, leucine, valine), N-acetyls of glycoprotein fragments (including uromodulin), N-acetyl neuraminic acid, pentanoic/heptanoic acid, and methylguanidine; metabolites significantly (P <1.1×10^−12 to 2×10^−127) higher in the south were gut microbial co-metabolites (hippurate, 4-cresyl sulphate, phenylacetylglutamine; 2-hydroxyisobutyrate), succinate, creatine, scyllo-inositol, prolinebetaine, and trans-aconitate. These findings indicate the importance of environmental influences (e.g., diet), endogenous metabolism, and mammalian-gut microbial co-metabolism, which may help explain north-south China differences in cardiovascular disease risk.
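The family-wise error rate criterion mentioned above can be illustrated with a minimal sketch. It assumes a Bonferroni correction, which is one common way to bound the family-wise error rate; the abstract does not state which procedure the study used, so this is not a reconstruction of the authors' analysis. The function name and the p-values are hypothetical and chosen only for illustration.

```python
def bonferroni_significant(p_values, fwer=0.01):
    """Flag tests that pass a Bonferroni-corrected threshold.

    Assumption: Bonferroni is used here as a simple, conservative way to keep
    the family-wise error rate below `fwer`; the study may have used another
    procedure.
    """
    n_tests = len(p_values)
    if n_tests == 0:
        return []
    # Each individual test must beat fwer divided by the number of tests.
    threshold = fwer / n_tests
    return [p < threshold for p in p_values]


# Hypothetical p-values for four spectral variables (not data from the study).
example_p = [2.9e-69, 1.2e-16, 0.004, 0.2]
flags = bonferroni_significant(example_p, fwer=0.01)
print(flags)
```

With four tests and a target of 0.01, the per-test threshold is 0.0025, so a p-value of 0.004 that would pass an uncorrected 0.01 cutoff is no longer counted as significant.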
LC/MS is an analytical technique that, owing to its high sensitivity, has become increasingly popular for generating metabolic signatures in biological samples and for building metabolic databases. However, to create robust and interpretable (transparent) multivariate models for comparing many samples, the data must fulfil three criteria: (i) each sample is characterized by the same number of variables, (ii) each of these variables is represented across all observations, and (iii) a variable in one sample has the same biological meaning, or represents the same metabolite, in all other samples. In addition, the resulting models must be able to make predictions for, e.g., related and independent samples characterized in the same way as the model samples. The method presented here involves the construction of a representative data set, including automatic peak detection, alignment, setting of retention time windows, summing in the chromatographic dimension, and data compression by means of alternating regression, in which the relevant metabolic variation is retained for further modelling using multivariate analysis. This approach has the advantage of allowing the comparison of large numbers of samples based on their LC/MS metabolic profiles, and it also provides a means of interpreting the investigated biological system. This includes finding relevant systematic patterns among samples, identifying influential variables, verifying the findings in the raw data, and finally using the models for predictions. The presented strategy was applied here to a population study using urine samples from two cohorts, Shanxi (People's Republic of China) and Honolulu (USA). The results showed that evaluating the extracted data using partial least squares discriminant analysis (PLS-DA) provided a robust, predictive, and transparent model of the metabolic differences between the two populations.
The presented findings suggest that this is a general approach for data handling, analysis, and evaluation of large metabolic LC/MS data sets.
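The three data criteria listed in this abstract (same variables in every sample, every variable present in every observation, and a shared meaning per variable) can be sketched in a few lines. The example below is a simplified stand-in, assuming fixed-width m/z and retention-time windows with intensities summed inside each window; it does not implement the alternating-regression compression the abstract describes. The function name, window widths, and peak values are all hypothetical.

```python
from collections import defaultdict


def build_feature_matrix(samples, mz_width=1.0, rt_width=0.5):
    """Turn per-sample peak lists into one matrix with shared columns.

    samples: dict mapping sample name -> list of (mz, rt, intensity) peaks.
    Assumption: fixed-width windows stand in for the alignment and
    retention-time windowing steps; real pipelines are more careful.
    """
    binned = {}
    all_keys = set()
    for name, peaks in samples.items():
        sums = defaultdict(float)
        for mz, rt, intensity in peaks:
            # A window index pair identifies the same variable in every sample.
            key = (int(mz // mz_width), int(rt // rt_width))
            # Sum intensities that fall inside the same window.
            sums[key] += intensity
        binned[name] = sums
        all_keys.update(sums)
    feature_keys = sorted(all_keys)
    # Fill missing windows with 0.0 so every sample has every variable.
    matrix = {
        name: [sums.get(key, 0.0) for key in feature_keys]
        for name, sums in binned.items()
    }
    return feature_keys, matrix


# Hypothetical peak lists (not data from the study).
samples = {
    "urine_A": [(180.2, 3.1, 50.0), (180.6, 3.3, 25.0), (265.1, 7.8, 10.0)],
    "urine_B": [(180.4, 3.2, 40.0)],
}
feature_keys, matrix = build_feature_matrix(samples)
print(feature_keys)
print(matrix)
```

Each row of the resulting matrix has the same length and column order, which is the form a multivariate method such as PLS-DA expects. A known limitation of fixed windows is that a peak drifting across a window edge is split into two variables, which is one reason alignment is treated as its own step in the described method.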