2021
DOI: 10.1038/s41598-021-84824-3

DBnorm as an R package for the comparison and selection of appropriate statistical methods for batch effect correction in metabolomic studies

Abstract: As a powerful phenotyping technology, metabolomics provides new opportunities in biomarker discovery through metabolome-wide association studies (MWAS) and the identification of metabolites having a regulatory effect in various biological processes. While mass spectrometry-based (MS) metabolomics assays offer high throughput and sensitivity, MWAS require long-term data acquisition, which generates an analytical signal drift over time that can hinder the uncovering of real, biologically relevant change…

Cited by 19 publications (14 citation statements)
References 57 publications
“…After the conversion of the raw data obtained by LC–HRMS to the appropriate mzXML format, an analysis by the XCMS online software 3.7.1 (The Scripps Research Institute, San Diego) allowed peak detection, chromatogram alignment, and isotope annotation. Missing data were estimated using the emvd function of the dbnorm R package. Preprocessed data were normalized using MS Total Useful Signal (MSTUS), a normalization method available in the NOREVA online software. Normalized data were log2-transformed, and linear models for microarray data (LIMMA) in R were applied to compare the different conditions of metal exposure: control versus 100 nM and control versus the IC50.…”
Section: Methods
confidence: 99%
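A minimal R sketch of the preprocessing chain this statement describes, under stated assumptions: the exact argument layout of dbnorm's emvd is assumed (check ?emvd), MSTUS is implemented inline since NOREVA is a web tool, and the group level names used in the contrast are placeholders.

## Sketch of the cited workflow: impute -> MSTUS-normalize -> log2 -> limma.
## Assumes `peaks` is a samples x features numeric matrix with NAs for
## missing values, and `group` is a factor of conditions per sample.
library(dbnorm)   # for emvd(); call shape assumed, see ?emvd
library(limma)

## 1) Estimate missing values with dbnorm's emvd (signature assumed)
imputed <- emvd(peaks)

## 2) MSTUS: scale each sample by its total signal over features
##    detected in every sample (the "total useful signal")
useful <- colSums(is.na(peaks)) == 0
tus    <- rowSums(imputed[, useful, drop = FALSE])
normed <- sweep(imputed, 1, tus / mean(tus), "/")

## 3) log2 transform
logged <- log2(normed)

## 4) LIMMA expects features in rows, samples in columns
design <- model.matrix(~ 0 + group)
fit    <- lmFit(t(logged), design)
## contrast names depend on the actual `group` levels (assumed here)
contr  <- makeContrasts(groupExposed - groupControl, levels = design)
fit2   <- eBayes(contrasts.fit(fit, contr))
topTable(fit2)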
“…When applying the analysis workflow, we processed one compound at a time, first with ‘pw_outlier()’, then with ‘pseudo_sdc()’ using batch 4 as the training batch, to estimate pseudoQC samples across all batches and to correct for signal drift and batch effects in the data. We tested two additional non-QC correction methods, specifically the ComBat [23, 24] and ber [25] methods implemented in the dbnorm R package [14], and compared the data corrections based on the maximum distance of trueQC points (maxDist) along the first two PCs calculated from the full peak area matrix. A smaller maxDist indicated that the trueQC samples had less variability and, thus, a better correction of technical errors.…”
Section: Methods
confidence: 99%
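The maxDist criterion described above can be reproduced in a few lines of base R. This is a sketch, not the authors' code: `areas` (the full peak-area matrix, samples in rows) and `is_trueqc` (a logical flag for trueQC injections) are assumed names.

## maxDist: maximum pairwise distance between trueQC samples in the
## space of the first two principal components of the peak area matrix.
max_dist_qc <- function(areas, is_trueqc) {
  keep   <- apply(areas, 2, sd) > 0               # drop zero-variance features
  pca    <- prcomp(areas[, keep, drop = FALSE], center = TRUE, scale. = TRUE)
  scores <- pca$x[is_trueqc, 1:2, drop = FALSE]   # trueQC on PC1/PC2
  max(dist(scores))                               # largest QC-to-QC spread
}

## Compare corrections: a smaller maxDist means a tighter trueQC cluster,
## i.e., better removal of signal drift and batch effects, e.g.:
## max_dist_qc(corrected_combat, is_trueqc)
## max_dist_qc(corrected_ber,    is_trueqc)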
“…While a high frequency of QC sample inclusion may be useful for thoroughly capturing systematic errors, these samples occupy LC–MS run slots that might otherwise be allocated to experimental samples, increasing the size of the experiment and, subsequently, the opportunity for error. To this end, non-QC-based correction approaches have been attempted [8, 13, 14]. However, the benefits of increased experimental throughput come at the expense of a potentially reduced ability to detect and correct for systematic errors.…”
Section: Introduction
confidence: 99%
“…Signal drift correction and batch effect removal aim to eliminate unwanted variation components, such as ion suppression, within- and between-batch variation, and changes in sample and instrument sensitivity over time in a long injection sequence. These methods can be rationally divided into several main approaches: QC (quality control) sample-based regression, model-based methods (which mainly fall into two categories: matrix factorization and location-scale-based methods), internal standards, and QC metabolite-based scaling. The sample normalization strategy aims to reduce the effect of total-amount variation, which occurs due to biological variation, on the quantification of individual metabolites; it is divided into postacquisition (data-driven) and preacquisition (data-free) approaches and their combination. Both procedures (correction and normalization) are directed at removing unwanted variation from various sources of error and solve similar problems. For this reason, correction and normalization are often not divided and are used together.…”
Section: Introduction
confidence: 99%
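As one concrete instance of the location-scale family mentioned in this excerpt, per-batch centering and scaling can be sketched in a few lines of R. This is an illustration of the general idea only, not the exact ber or ComBat estimator; `x` and `batch` are assumed names.

## Minimal location-scale batch correction: for each feature, align each
## batch's mean and standard deviation to the feature's overall mean/sd.
## Assumes `x` is a samples x features matrix and `batch` a factor.
location_scale_correct <- function(x, batch) {
  out <- x
  for (j in seq_len(ncol(x))) {
    mu_all <- mean(x[, j]); sd_all <- sd(x[, j])
    for (b in levels(batch)) {
      idx  <- batch == b
      mu_b <- mean(x[idx, j]); sd_b <- sd(x[idx, j])
      if (sd_b > 0)
        out[idx, j] <- (x[idx, j] - mu_b) / sd_b * sd_all + mu_all
    }
  }
  out
}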
“…For this reason, correction and normalization are often not divided and are used together. Probably the most extensive selection of correction methods is provided by NOREVA. Other solutions include NormalizeMets, a special section in MetaboAnalyst, and dbnorm. The most accurate evaluations of correction and normalization procedures are performed in dedicated studies. Another relatively new type of signal processing is adjustment for biological variation, which recalculates each signal according to its contribution to phenotype and/or experimental factors and can be implemented with linear and linear mixed modeling.…”
Section: Introduction
confidence: 99%
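The linear-modeling adjustment described in the last excerpt amounts to regressing each feature on the known factors and retaining only the effects of interest. A hedged sketch using limma's removeBatchEffect (a real limma function; the variable names `logged`, `batch`, and `phenotype` are assumptions):

## Remove known nuisance variation while protecting the phenotype effect.
## Assumes `logged` is a samples x features matrix on the log scale,
## `batch` a factor of analytical batches, and `phenotype` the biological
## factor of interest.
library(limma)

design   <- model.matrix(~ phenotype)      # effects to protect
adjusted <- removeBatchEffect(t(logged),   # limma expects features x samples
                              batch  = batch,
                              design = design)
adjusted <- t(adjusted)                    # back to samples x features

A linear mixed model (e.g., with batch as a random effect) is the natural extension when batches are numerous or unbalanced.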