Motivation
Biobanks are important infrastructures for life science research. Optimal handling of biological samples, e.g. during collection and processing, is highly complex, with many variables that could alter sample integrity. It becomes even more complex when multiple study centers are involved or when legacy samples with limited documentation on sample management are used. Novel means to understand and take into account such variability would enable high-quality research on archived samples.

Results
This study investigated whether pre-analytical sample variability could be predicted and reduced by modeling alterations in the plasma metabolome, measured by NMR, as a function of pre-centrifugation conditions (1–36 h pre-centrifugation delay time at 4 °C and 22 °C) in 16 individuals. Pre-centrifugation temperature and delay times were predicted using random forest modeling, and performance was validated on independent samples. Alterations in the metabolome were modeled at each temperature using a cluster-based approach, revealing reproducible effects of delay time on energy metabolism intermediates at both temperatures, although these were more pronounced at 22 °C. Moreover, pre-centrifugation delay at 4 °C resulted in large, specific variability at 3 h, predominantly of lipids. Correcting for pre-analytical sample handling error significantly improved data quality, particularly at 22 °C. This approach offers the possibility to predict pre-centrifugation delay temperature and time in biobanked samples before use in costly downstream applications. Moreover, the results suggest potential to decrease the impact of undesired, delay-induced variability. However, these findings need to be validated in multiple, large sample sets and with analytical techniques covering a wider range of the metabolome, such as LC-MS.

Availability and implementation
The sampleDrift R package is available at https://gitlab.com/CarlBrunius/sampleDrift.

Supplementary information
Supplementary data are available at Bioinformatics online.