Functional principal component analysis (FPCA) is a popular approach in functional data analysis for exploring the major sources of variation in a sample of random curves. These major sources of variation are represented by functional principal components (FPCs). Most existing FPCA approaches represent the FPCs with a set of flexible basis functions, such as a B-spline basis, and control the smoothness of the FPCs by adding roughness penalties. However, these flexible representations make the FPCs difficult for users to understand and interpret. In this article, we consider a variety of applications of FPCA and find that, in many situations, the shapes of the top FPCs are simple enough to be approximated by simple parametric functions. We propose a parametric approach to estimating the top FPCs that enhances their interpretability. Our parametric approach also circumvents the smoothing parameter selection required by conventional nonparametric FPCA methods. In addition, our simulation study shows that the proposed parametric FPCA is more robust when outlier curves are present. The parametric FPCA method is demonstrated by analyzing several datasets from a variety of applications.
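The idea that a top FPC can be simple enough for a parametric description can be illustrated with a minimal sketch. This is not the estimation procedure proposed in the article, which the abstract does not spell out. It assumes curves observed on a common grid, takes the first FPC from an ordinary SVD of the centered data, and then fits a low-degree polynomial to it. The function name `first_fpc_polynomial`, the polynomial family, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def first_fpc_polynomial(curves, t, degree=2):
    """Estimate the first FPC on grid t and approximate it by a polynomial.

    curves: array of shape (n_curves, n_grid_points), one curve per row.
    Returns the discretized FPC, the polynomial coefficients (highest
    power first), and the R^2 of the polynomial fit to the FPC.
    """
    centered = curves - curves.mean(axis=0)
    # Rows of vt are the principal directions of the discretized curves.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    fpc = vt[0]
    # Hypothetical parametric family: a polynomial of the given degree.
    coefs = np.polyfit(t, fpc, deg=degree)
    fitted = np.polyval(coefs, t)
    ss_res = np.sum((fpc - fitted) ** 2)
    ss_tot = np.sum((fpc - fpc.mean()) ** 2)
    return fpc, coefs, 1.0 - ss_res / ss_tot


# Simulated curves whose dominant mode of variation is a straight line.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
scores = rng.normal(size=(200, 1))
curves = scores * (t - 0.5) + 0.05 * rng.normal(size=(200, 50))

fpc, coefs, r2 = first_fpc_polynomial(curves, t, degree=1)
print("polynomial coefficients:", coefs)
print("R^2 of the parametric fit:", r2)
```

An R^2 close to 1 suggests that the two polynomial coefficients describe the component about as well as the full vector of grid values, which is the kind of simplification the abstract argues makes FPCs easier to interpret.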
Normality of metabolomic data is vital for many of the statistical analyses used to identify significantly different metabolic features. However, despite the thousands of metabolomic publications every year, studies of metabolomic data distributions are rare. Using large-scale metabolomic data sets, we performed a comprehensive study of metabolomic data distributions. We showed that metabolic features have diverse data distribution types, and that the majority of them cannot be normalized correctly using conventional data transformation algorithms, including log and square root transformations. To understand the various non-normal data distributions, we proposed fitting metabolomic data to nine beta distributions, each representing a unique data distribution. The results from three large-scale data sets consistently show that two low-normality types are very common. Next, we created the adaptive Box-Cox (ABC) transformation, a novel feature-specific data transformation approach for improving data normality. By tuning a power parameter based on a normality test result, the ABC transformation was made to work for various data distribution types, and it performed well in normalizing skewed metabolomic data. Tested on a series of simulated data in Monte Carlo simulations, the ABC transformation outperformed conventional data transformation approaches for both positively and negatively skewed data distributions. The ABC transformation was further demonstrated in a real metabolomic study composed of three pairwise comparisons. An additional 84, 44, and 57 significant metabolites were newly confirmed after ABC transformation, corresponding to respective increases of 70.6%, 13.4%, and 22.9% in significant metabolites compared to the conventional metabolomic workflow. Some of these newly discovered metabolites showed promising biological meanings. The ABC transformation was implemented in the R package ABCstats and is freely available on GitHub (https://github.com/HuanLab/ABCstats).
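The idea of tuning a power parameter against a normality measure, one feature at a time, can be sketched as follows. This is not the ABCstats implementation, and the abstract does not say which normality test or search strategy the package uses. The sketch assumes strictly positive intensities, uses the Jarque-Bera statistic as a stand-in normality measure, and searches a fixed grid of power values. The names `box_cox`, `jarque_bera`, and `adaptive_power_transform` are hypothetical.

```python
import math
import random


def box_cox(values, lam):
    """Box-Cox power transform for strictly positive values."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def jarque_bera(values):
    """Jarque-Bera statistic; values near 0 indicate a normal-looking sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return n / 6.0 * (skew ** 2 + excess_kurtosis ** 2 / 4.0)


def adaptive_power_transform(values, lambdas=None):
    """Pick the power parameter whose transform looks most normal.

    Assumption: grid search over lambda in [-2, 2]; lambda = 1 leaves the
    shape of the data unchanged, so the result is never less normal than
    the raw feature under this measure.
    """
    if lambdas is None:
        lambdas = [k / 10.0 for k in range(-20, 21)]
    best_lam = min(lambdas, key=lambda lam: jarque_bera(box_cox(values, lam)))
    return best_lam, box_cox(values, best_lam)


# One simulated, positively skewed feature (log-normal intensities).
random.seed(0)
feature = [random.lognormvariate(0.0, 1.0) for _ in range(500)]

best_lam, transformed = adaptive_power_transform(feature)
print("selected power parameter:", best_lam)
print("Jarque-Bera before:", jarque_bera(feature))
print("Jarque-Bera after:", jarque_bera(transformed))
```

In a feature-specific workflow, this selection would be repeated independently for each metabolic feature, so that each feature receives its own power parameter rather than a single transformation applied to the whole data set.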