Missing values in mass spectrometry based metabolomics: an undervalued step in the data processing pipeline

Hrydziuszko, Olga; Viant, Mark R.

doi:10.1007/s11306-011-0366-4

Cited by 189 publications

(208 citation statements)

References 38 publications

“…Missing values in mass spectrometry arise for multiple reasons including randomly due to technical issues but also due to the occurrence of compounds at concentrations below a pre-determined threshold or by truncation based on a signal-to-noise ratio. Webb-Robertson et al (2015) and Hrydziuszko and Viant (2011) both showed that the proportion of missing values increases with declining peak abundance suggesting some detection limit censoring. For missing values arising from detection limit censoring, the unobserved values are small and would decrease the mean if they had been observed.…”

Section: Test Statistics and Significance Determination (mentioning)

confidence: 99%

“…Hrydziuszko and Viant, 2011; Wang et al, 2012; Webb-Robertson et al, 2015) which presents a significant challenge for statistical analysis (see e.g. Clough et al, 2009).…”

Section: Introduction (mentioning)

confidence: 99%

“…This manipulation yields a complete dataset to which standard statistical methods can be applied. Several imputation techniques have been shown to perform adequately for single biospecimen analyses for up to about 20% missing data (Gromski et al, 2014; Hrydziuszko and Viant, 2011). However, for multiple biospecimen investigations, we recently showed that a wide range of imputation methods result in substantial changes to the between-biospecimen correlation and multivariate analysis of variance (MANOVA) inferential results, particularly when large amounts of missing data are present (Taylor et al, 2016).…”

Section: Introduction (mentioning)

confidence: 99%

Multivariate two-part statistics for analysis of correlated mass spectrometry data from multiple biological specimens

et al. 2016

Motivation: High-throughput mass spectrometry (MS) is now being used to profile small molecular compounds across multiple biological sample types from the same subjects with the goal of leveraging information across biospecimens. Multivariate statistical methods that combine information from all biospecimens could be more powerful than the usual univariate analyses. However, missing values are common in MS data and imputation can impact between-biospecimen correlation and multivariate analysis results. Results: We propose two multivariate two-part statistics that accommodate missing values and combine data from all biospecimens to identify differentially regulated compounds. Statistical significance is determined using a multivariate permutation null distribution. Relative to univariate tests, the multivariate procedures detected more significant compounds in three biological datasets. In a simulation study, we showed that multi-biospecimen testing procedures were more powerful than single-biospecimen methods when compounds are differentially regulated in multiple biospecimens but univariate methods can be more powerful if compounds are differentially regulated in only one biospecimen. Availability and Implementation: We provide R functions to implement and illustrate our method as supplementary information.
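The first citation statement from this paper notes that values lost to detection-limit censoring are small and would lower the mean had they been observed. The sketch below illustrates that effect on simulated data only; the log-normal intensities, the threshold of 5.0, and the helper names are assumptions made for illustration and do not come from any of the cited studies.

```python
import random
import statistics


def censor_below(values, limit):
    """Left-censor: entries below the detection limit become None (missing)."""
    return [v if v >= limit else None for v in values]


def observed_mean(values):
    """Mean of the entries that were actually observed (None is skipped)."""
    return statistics.fmean([v for v in values if v is not None])


rng = random.Random(0)
# Hypothetical peak intensities for one compound across 200 samples.
true_intensities = [rng.lognormvariate(2.0, 1.0) for _ in range(200)]
detection_limit = 5.0  # hypothetical threshold

censored = censor_below(true_intensities, detection_limit)
missing_fraction = sum(v is None for v in censored) / len(censored)
true_mean = statistics.fmean(true_intensities)
mean_after_censoring = observed_mean(censored)

print(f"missing fraction:         {missing_fraction:.2f}")
print(f"mean of all true values:  {true_mean:.2f}")
print(f"mean of observed values:  {mean_after_censoring:.2f}")
```

Because every censored value lies below every retained value, the mean of the observed entries is necessarily higher than the mean of the full set, which is why simply ignoring such missing values biases group means upward.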

“…Missing values were replaced after normalisation with half the detection limit, i.e. 50% of the minimum value found in the dataset (Xia et al 2009; Hrydziuszko and Viant 2012).…”

Section: Data Extraction Pre-processing and Normalisation (mentioning)

confidence: 99%

Untargeted urine metabolomics reveals a biosignature for muscle respiratory chain deficiencies

et al. 2014

Mitochondrial diseases are a heterogeneous group of disorders characterised by impaired mitochondrial oxidative phosphorylation system. Most often for mitochondrial disease, where no metabolic diagnostic biomarkers exist, a deficiency is diagnosed after analysing the respiratory chain enzymes (complexes I-IV) in affected tissues or by identifying one of an ever expanding number of DNA mutations. This presents a great challenge to identify cases to undergo the invasive diagnostic procedures required. An untargeted liquid chromatography mass spectrometry metabolomics approach was used to search for a metabolic biosignature that can distinguish respiratory chain deficient (RCD) patients from clinical controls (CC). A cohort of 37 ethnically diverse cases was used. Sample preparation, liquid chromatography time-of-flight mass spectrometry methods and data processing methods were standardised. Furthermore the developed methodology used reverse phase chromatography in conjunction with positive electrospray ionisation and hydrophilic interaction chromatography with negative electrospray ionisation. Urine samples of 37 patients representing two different experimental groups were analysed. The two experimental groups comprised of patients with confirmed RCDs and CC. After a variety of data mining steps and statistical analyses a list of 12 features were compiled with the ability to distinguish between patients with RCDs and CC. Although the features of the biosignature needs to be identified and the biosignature validated, this study demonstrates the value of untargeted metabolomics to identify a metabolic biosignature to possibly be applied in the selection criteria for RCDs.
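The citation statement for this study describes replacing missing values with 50% of the minimum value found in the dataset. A minimal sketch of that rule follows, assuming missing entries are represented as None and the minimum is taken over the whole matrix; the function name and the toy peak matrix are hypothetical and not taken from the study.

```python
def half_minimum_impute(matrix):
    """Replace None entries with 50% of the smallest observed value in the matrix."""
    observed = [v for row in matrix for v in row if v is not None]
    if not observed:
        raise ValueError("no observed values to derive a fill value from")
    fill = 0.5 * min(observed)
    return [[fill if v is None else v for v in row] for row in matrix]


# Toy peak matrix (rows = samples, columns = features); None marks a missing peak.
peaks = [
    [120.0, None, 35.0],
    [98.0, 8.0, None],
    [None, 12.0, 41.0],
]

imputed = half_minimum_impute(peaks)
for row in imputed:
    print(row)
```

Here the smallest observed value is 8.0, so every missing entry becomes 4.0. Some workflows compute the minimum per feature rather than over the whole dataset; that variant would only change how `fill` is derived.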

“…Mass spectrometry (MS) is one of the main techniques for metabolomics studies (Dettmer et al 2007). However, missing values, that certain compounds cannot be identified/quantified in certain samples, occur widely in MS-based metabolomics data due to technical and biological reasons (Bijlsma et al 2006; Hrydziuszko and Viant 2012). Generally, there are three types of missing values, missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR) (Gelman and Hill 2006; Little and Rubin 2002).…”

Section: Introduction (mentioning)

confidence: 99%

Missing Value Imputation Approach for Mass Spectrometry-based Metabolomics Data

Wei, Wang, et al. 2017

Preprint

Introduction: Missing values exist widely in mass-spectrometry (MS) based metabolomics data. Various methods have been applied for handling missing values, but the selection of methods can significantly affect following data analyses and interpretations. According to the definition, there are three types of missing values, missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). Objectives: The aim of this study was to comprehensively compare common imputation methods for different types of missing values using two separate metabolomics data sets (977 and 198 serum samples respectively) to propose a strategy to deal with missing values in metabolomics studies. Methods: Imputation methods included zero, half minimum (HM), mean, median, random forest (RF), singular value decomposition (SVD), k-nearest neighbors (kNN), and quantile regression imputation of left-censored data (QRILC). Normalized root mean squared error (NRMSE) and NRMSE-based sum of ranks (SOR) were applied to evaluate the imputation accuracy for MCAR/MAR and MNAR correspondingly. Principal component analysis (PCA)/partial least squares (PLS)-Procrustes sum of squared error were used to evaluate the overall sample distribution. Student's t-test followed by Pearson correlation analysis was conducted to evaluate the effect of imputation on univariate statistical analysis. Results: Our findings demonstrated that RF imputation performed the best for MCAR/MAR and QRILC was the favored one for MNAR. Conclusion: Combining with "modified 80% rule", we proposed a comprehensive strategy and developed a public-accessible web-tool for missing value imputation in metabolomics data.

bioRxiv preprint first posted online Aug. 17, 2017; doi: http://dx.doi.org/10.1101/171967
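The abstract above evaluates imputation accuracy with NRMSE after values have been removed under a known missingness mechanism. The sketch below shows the general shape of such an evaluation for the MCAR case with three of the simplest listed methods (zero, mean, median). It is an illustration under stated assumptions, not the preprint's procedure: the data are simulated, the helper names are hypothetical, and NRMSE is taken here as sqrt(mean squared error / variance of the true values) computed over all entries, which is one common definition and may differ from the one the authors used.

```python
import math
import random
import statistics


def mask_mcar(values, fraction, rng):
    """Hide a random subset of entries; every entry is equally likely to go missing."""
    n_missing = int(round(fraction * len(values)))
    hidden = set(rng.sample(range(len(values)), n_missing))
    return [None if i in hidden else v for i, v in enumerate(values)]


def impute_constant(values, fill):
    """Replace every missing entry with the same fill value."""
    return [fill if v is None else v for v in values]


def nrmse(truth, imputed):
    """sqrt(MSE / variance of truth); observed entries contribute zero error."""
    mse = statistics.fmean([(t - x) ** 2 for t, x in zip(truth, imputed)])
    return math.sqrt(mse / statistics.pvariance(truth))


rng = random.Random(1)
# Simulated intensities for one feature across 300 samples (hypothetical data).
truth = [rng.lognormvariate(2.0, 0.5) for _ in range(300)]
masked = mask_mcar(truth, 0.2, rng)
observed = [v for v in masked if v is not None]

scores = {
    "zero": nrmse(truth, impute_constant(masked, 0.0)),
    "mean": nrmse(truth, impute_constant(masked, statistics.fmean(observed))),
    "median": nrmse(truth, impute_constant(masked, statistics.median(observed))),
}
for method, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{method:>6}: NRMSE = {score:.3f}")
```

A lower NRMSE means the imputed values sit closer to the hidden true values. Model-based methods such as RF, kNN or QRILC would slot into the same loop in place of `impute_constant`, but they need a full sample-by-feature matrix rather than a single feature.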
