Modern neuroimaging studies frequently combine data collected from multiple scanners and experimental conditions. Such data often contain substantial technical variability associated with image intensity scale (image intensity scales are not the same in different images) and scanner effects (images obtained from different scanners contain substantial technical biases). Here we evaluate and compare results of data analysis methods without any data transformation (RAW), with intensity normalization using RAVEL, with regional harmonization using ComBat, and with a combination of RAVEL and ComBat. Methods are evaluated on a unique sample of 16 study participants who were scanned on both 1.5T and 3T scanners a few months apart. Neuroradiological evaluation was conducted for 7 different regions of interest (ROIs) pertinent to Alzheimer's disease (AD). Results for cortical measures indicate that: (1) RAVEL substantially improved the reproducibility of image intensities; (2) ComBat is preferred over RAVEL and the RAVEL-ComBat combination for regional-level harmonization because it harmonizes more consistently across subjects and image-derived measures; (3) RAVEL and ComBat substantially reduced bias compared to analysis of RAW images, but RAVEL also resulted in larger variance; and (4) the larger root mean square deviation (RMSD) of RAVEL compared to ComBat is due mainly to its larger variance.
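The link between RMSD, bias, and variance in points (3) and (4) follows from the standard identity RMSD² = bias² + variance. The sketch below illustrates it in Python, under stated assumptions: the function name `bias_variance_rmsd` and the difference values are hypothetical and are not taken from the study.

```python
import math
from statistics import fmean


def bias_variance_rmsd(errors):
    """Return (bias, variance, rmsd) for a list of paired differences.

    bias     = mean difference
    variance = population variance of the differences around the bias
    rmsd     = root mean square of the differences
    """
    bias = fmean(errors)
    variance = fmean([(e - bias) ** 2 for e in errors])
    rmsd = math.sqrt(fmean([e * e for e in errors]))
    return bias, variance, rmsd


# Hypothetical 3T-minus-1.5T differences for one ROI measure (illustrative only).
diffs = [0.4, -0.1, 0.3, 0.2, -0.2, 0.6]
bias, variance, rmsd = bias_variance_rmsd(diffs)

# The squared RMSD splits exactly into squared bias plus variance.
print(f"bias^2 + variance = {bias ** 2 + variance:.6f}")
print(f"rmsd^2            = {rmsd ** 2:.6f}")
```

Under this identity, a method can remove bias and still have a larger RMSD than a competitor if its variance is larger.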
Human microbiome data from genomic sequencing technologies are accumulating rapidly, giving us insights into bacterial taxa that contribute to health and disease. Predictive modeling of such microbiota count data to classify human infection by parasitic worms, such as helminths, can help with detection and management across global populations. Real-world datasets from microbiome experiments are typically sparse, containing hundreds of measurements for bacterial species, of which only a few are detected in the bio-specimens that are analyzed. This sparsity means that more observations are needed for accurate predictive modeling, a challenge that has previously been addressed with different methods of feature reduction. To our knowledge, integrative methods, such as transfer learning, have not yet been explored in the microbiome domain as a way to deal with data sparsity by incorporating knowledge from different but related datasets. One way of incorporating this knowledge is by using a meaningful mapping among features of these datasets. In this paper, we claim that this mapping exists among the members of each individual cluster, where clusters are formed based on phylogenetic dependency among taxa and their association with the phenotype. We validate our claim by showing that models incorporating associations in such a grouped feature space result in no performance deterioration for the given classification task. We test our hypothesis using classification models that detect helminth infection in the microbiota of human fecal samples obtained from Indonesia and Liberia. In our experiments, we first learn binary classifiers for helminth infection detection using Naive Bayes, Support Vector Machines, Multilayer Perceptrons, and Random Forest methods.
In the next step, we add taxonomic modeling by using the SMART-scan module to group the data, and learn classifiers using the same four methods, to test the validity of the achieved groupings. We observed a 6% to 23% and 7% to 26% performance improvement based on the Area Under the receiver operating characteristic (ROC) Curve (AUC) and Balanced Accuracy (Bacc) measures, respectively, over 10 runs of 10-fold cross-validation. These results show that using phylogenetic dependency for grouping our microbiota data actually results in a noticeable improvement in classification performance for helminth infection detection. These promising results from this feasibility study demonstrate that methods such as SMART-scan can be utilized in the future for knowledge transfer from different but related microbiome datasets by phylogenetically-related functional mapping, to enable novel integrative biomarker discovery.
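For readers unfamiliar with the two reported measures, the following is a minimal Python sketch of how AUC and balanced accuracy can be computed for a binary classifier. The function names, labels, and scores are hypothetical illustrations, not the study's data or code; AUC is computed here through its pairwise-ranking interpretation, with ties counted as one half.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


def auc(y_true, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = infected) and classifier scores, illustrative only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.35]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

print(f"AUC  = {auc(y_true, scores):.3f}")
print(f"Bacc = {balanced_accuracy(y_true, y_pred):.3f}")
```

Balanced accuracy is less sensitive to class imbalance than plain accuracy because each class contributes equally to the score.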
Studying small effects or subtle neuroanatomical variation requires large sample sizes. As a result, combining neuroimaging data from multiple datasets is necessary. Variation in acquisition protocols, magnetic field strength, scanner build, and many other non-biological factors can introduce undesirable bias into studies. Hence, harmonization is required to remove the bias-inducing factors from the data. ComBat, introduced by Johnson et al. (2007), is one of the most common methods applied to features from structural images. ComBat models the data using a hierarchical Bayesian model and uses the empirical Bayes approach to infer the distribution of the unknown factors. The empirical Bayes harmonization method is computationally efficient and provides valid point estimates. However, it tends to underestimate uncertainty. This paper investigates a new approach, fully Bayesian ComBat, where Monte Carlo sampling is used for statistical inference. Our experiments show that our new fully Bayesian approach offers more accurate harmonization, unconstrained posterior distributions, and representative uncertainty quantification at the expense of higher computational cost for inference. This fully Bayesian approach generates a rich posterior distribution, which is also useful for generating simulated imaging features to improve classifier performance in a limited data setting. We show the generative capacity of our model for augmenting and improving the detection of patients with Alzheimer's disease. Posterior distributions for harmonized imaging measures can also be used for brain-wide uncertainty comparison and more principled downstream statistical analysis. Code for our new fully Bayesian ComBat extension is available at https://github.com/batmanlab/BayesComBat.
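To make the idea of removing site effects concrete, the sketch below shows a plain location-and-scale adjustment for a single imaging feature. It is a deliberately simplified illustration and not ComBat or the fully Bayesian extension described above: it omits covariates, empirical Bayes shrinkage, and any posterior sampling. The function name and values are hypothetical, and it assumes every site has at least two distinct values.

```python
from statistics import fmean, pstdev


def location_scale_harmonize(values, sites):
    """Align each site's mean and spread to pooled targets for one feature.

    Simplified stand-in for site harmonization: each value is standardized
    within its site, then mapped to the grand mean and the pooled
    within-site standard deviation.
    """
    groups = {}
    for v, s in zip(values, sites):
        groups.setdefault(s, []).append(v)
    # Per-site additive (mean) and multiplicative (std) effects.
    stats = {s: (fmean(g), pstdev(g)) for s, g in groups.items()}

    grand_mean = fmean(values)
    residuals = [v - stats[s][0] for v, s in zip(values, sites)]
    pooled_sd = fmean([r * r for r in residuals]) ** 0.5

    return [grand_mean + pooled_sd * (v - stats[s][0]) / stats[s][1]
            for v, s in zip(values, sites)]


# Hypothetical cortical thickness values (mm) from two sites, illustrative only.
values = [2.0, 2.2, 2.4, 3.0, 3.4, 3.8]
sites = ["A", "A", "A", "B", "B", "B"]
harmonized = location_scale_harmonize(values, sites)
print([round(h, 3) for h in harmonized])
```

Because this adjustment uses only per-site point estimates, it carries no notion of uncertainty, which is the gap the fully Bayesian formulation is described as addressing.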
Background: Inter‐scanner variability hinders the direct comparability of multi‐site/scanner MRI data for clinical research. The ComBat method, based on an empirical Bayes framework [1,2], is commonly used to reduce this variability by harmonizing the data at the feature level (e.g., region‐of‐interest measures). However, directly harmonizing scans at the voxel level using ComBat has been less explored. In this study, we investigated the performance of voxel‐wise ComBat. Going beyond voxels, we also proposed a new ComBat approach that operates on small groups of voxels called superpixels [3].
Method: Eighteen subjects (10 patients with Alzheimer's disease and 8 controls; age: 68.0 [9.3] years; 10 females) participated in this study. For each subject, T1‐weighted images were acquired on each of four 3T scanners with different manufacturers or models (i.e., GE, Philips, Siemens‐Prisma, Siemens‐Trio). After standard image preprocessing, including two‐step registration using Statistical Parametric Mapping (SPM12) [4], the unharmonized scans (Raw data) were aligned in the standard template space. To reduce the computational load of ComBat at the voxel level (Voxel‐ComBat), we used a three‐dimensional superpixel algorithm [3] to parcellate the images into hundreds of superpixels based on the study‐specific template, and ComBat was then applied at the superpixel level (Figure 1). Compared to Voxel‐ComBat operating on about half a million voxels (computation time >> 10,000 seconds), this superpixel ComBat (SP‐ComBat) operates on only a few hundred superpixels, substantially improving computational efficiency (computation time < 5 seconds) while maintaining harmonization performance. The harmonized scans were used to estimate cortical thickness by surface‐based morphometry [5], and the coefficients of variation of the thickness measures were calculated to evaluate harmonization performance.
Result: On visual inspection, the harmonized data showed more similar contrasts across scanners than the Raw images (Figure 2), and the distributions of tissue‐specific signal intensity were comparable between scanners for both Voxel‐ComBat and SP‐ComBat (Figure 3). Both methods also significantly reduced the inter‐scanner variation in cortical thickness measures (both p‐values < 0.001; Figure 4).
Conclusion: This study evaluated the feasibility and effectiveness of Voxel‐ComBat and proposed a new approach, SP‐ComBat, to improve the efficiency of ComBat harmonization at the voxel level.
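The evaluation metric named in the Method, the coefficient of variation (standard deviation divided by mean), can be sketched in Python as follows. The function name and the thickness values are hypothetical placeholders for one subject measured on four scanners; they are not results from the study.

```python
from statistics import fmean, stdev


def coefficient_of_variation(measurements):
    """Sample standard deviation divided by the mean (dimensionless)."""
    return stdev(measurements) / fmean(measurements)


# Hypothetical cortical thickness (mm) for one subject on four scanners.
raw = [2.31, 2.52, 2.44, 2.65]         # before harmonization (illustrative)
harmonized = [2.46, 2.50, 2.47, 2.51]  # after harmonization (illustrative)

cv_raw = coefficient_of_variation(raw)
cv_harmonized = coefficient_of_variation(harmonized)
print(f"CV raw        = {100 * cv_raw:.2f}%")
print(f"CV harmonized = {100 * cv_harmonized:.2f}%")
```

A lower coefficient of variation across scanners for the same subject indicates less inter‐scanner variation in the derived measure.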