The problem of performing functional linear regression when the response variable is represented as a probability density function (PDF) is addressed. PDFs are interpreted as functional compositions, which are objects carrying primarily relative information. In this context, the unit integral constraint makes it possible to single out one of the possible representations of a class of equivalent measures. On this basis, a function-on-scalar regression model with distributional response is proposed, relying on the theory of Bayes Hilbert spaces. The geometry of Bayes spaces captures the key inherent features of distributional data (e.g., scale invariance, relative scale). A B-spline basis expansion combined with a functional version of the centred log-ratio transformation is used for the actual computations. For this purpose, a new key result is proved that characterizes B-spline representations in Bayes spaces. The potential of the methodological developments is shown on simulated data and on a real case study dealing with metabolomics data. A bootstrap-based study is performed to quantify the uncertainty of the obtained estimates.
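The centred log-ratio (clr) transformation mentioned above can be illustrated on a discretized density. The following is a minimal sketch, assuming the common form clr(f)(t) = ln f(t) - (1/η) ∫ ln f(s) ds on an interval of length η; the grid, the example density, and the helper names (`trapezoid`, `clr`, `clr_inverse`) are illustrative assumptions and are not taken from the paper, which works with B-spline expansions rather than raw grid values.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal rule, written out so the sketch does not depend on a NumPy version."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def clr(density, grid):
    """Centred log-ratio of a strictly positive density sampled on `grid`."""
    log_f = np.log(density)
    length = grid[-1] - grid[0]
    return log_f - trapezoid(log_f, grid) / length

def clr_inverse(g, grid):
    """Map a clr-transformed function back to a density with unit integral."""
    f = np.exp(g)
    return f / trapezoid(f, grid)

# Hypothetical example: a bell-shaped density on [0, 1].
grid = np.linspace(0.0, 1.0, 201)
density = np.exp(-0.5 * ((grid - 0.4) / 0.15) ** 2)
density = density / trapezoid(density, grid)

clr_density = clr(density, grid)           # integrates to zero over the grid
recovered = clr_inverse(clr_density, grid)  # matches the original density
scaled_clr = clr(3.0 * density, grid)       # rescaling leaves the clr unchanged
```

The last line illustrates scale invariance: multiplying the density by a positive constant shifts its logarithm by a constant, which the centring step removes.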
Premise: Seed germination over time is characterized by a sigmoid curve, called a germination curve, in which the percentage (or absolute number) of seeds that have completed germination is plotted against time. A number of individual coefficients have been developed to characterize this germination curve. However, because germination is considered a qualitative developmental response of an individual seed that occurs at one time point, while individual seeds within a given treatment respond at different time points, it has proven difficult to develop a single index that satisfactorily incorporates both percentage and rate. The aim of this paper is to develop a new coefficient, the continuous germination index (CGI), which quantifies seed germination as a continuous process, and to compare the CGI with other commonly used indices. Methods: To create the new index, the germination curves were smoothed using nondecreasing splines and the CGI was derived as the area under the resulting spline. For the comparison of the CGI with other common indices, a regression model with functional response was developed. Results: Using both an experimentally obtained wild pea (Pisum sativum subsp. elatius) seed data set and a hypothetical data set, we showed that the CGI is able to characterize the germination process better than most other indices. The CGI captures the local behavior of the germination curves particularly well. Discussion: The CGI can be used advantageously for the characterization of the germination process. Moreover, the B‐spline coefficients obtained during its construction can be employed for further statistical processing of germination curves using functional data analysis methods.
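The idea of an area-under-the-curve index can be sketched as follows. This is only an illustration under stated assumptions: the paper smooths with nondecreasing splines, whereas the sketch substitutes SciPy's monotonicity-preserving `PchipInterpolator` as a stand-in; the germination counts are invented, and the raw (unnormalized) area is used because the abstract does not state how the index is scaled. The function name `area_under_germination_curve` is hypothetical.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

def area_under_germination_curve(days, cumulative_percent):
    """Fit a monotone interpolant and return it with the area beneath it.

    PCHIP is a stand-in for the nondecreasing smoothing splines of the paper.
    """
    curve = PchipInterpolator(days, cumulative_percent)
    area = float(curve.integrate(days[0], days[-1]))
    return curve, area

# Hypothetical cumulative germination percentages over one week.
days = np.arange(0.0, 8.0)
slow = np.array([0.0, 0.0, 5.0, 20.0, 55.0, 80.0, 90.0, 90.0])
fast = np.array([0.0, 10.0, 40.0, 75.0, 90.0, 90.0, 90.0, 90.0])

slow_curve, slow_area = area_under_germination_curve(days, slow)
fast_curve, fast_area = area_under_germination_curve(days, fast)

# Both lots reach the same final percentage, but the faster lot
# accumulates more area, so the index separates them.
fine = np.linspace(days[0], days[-1], 500)
slow_is_monotone = bool(np.all(np.diff(slow_curve(fine)) >= -1e-9))
```

Here the two hypothetical seed lots share a final germination percentage of 90, so a percentage-only index cannot distinguish them, while the area reflects the difference in timing.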
In functional data analysis, some regions of the domain of the functions can be of more interest than others owing to the quality of measurement, the relative scale of the domain, or some external reason (e.g. the interest of stakeholders). Weighting the domain is of particular interest for probability density functions (PDFs) derived from distributional data, which often aggregate measurements of different quality or are affected by scale effects. A weighting scheme can be embedded into the underlying sample space of PDFs when they are considered as continuous compositions, applying the theory of Bayes spaces. The origin of a Bayes space is determined by a given reference measure, and this can be easily changed through the well‐known chain rule. This work provides a formal framework for defining weights through a reference measure and uses it to develop a weighting scheme on the bounded domain of distributional data. The impact on statistical analysis is illustrated through an application to functional principal component analysis of income distribution data. Moreover, a novel centred log‐ratio transformation is proposed to map a weighted Bayes space into an unweighted space, enabling the use of most tools developed in functional data analysis (e.g. clustering and regression analysis) while accounting for the weighting scheme. The potential of the proposal is shown on a real case study using Italian income data.
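The change of reference measure through the chain rule can be sketched numerically. This is a minimal illustration, assuming a reference measure P with a strictly positive density p with respect to the Lebesgue measure, so that a density f becomes f/p with respect to P; the particular f and p, the grid, and the centring with respect to P are assumptions made for the example and do not reproduce the paper's novel transformation into an unweighted space.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal rule, written out so the sketch does not depend on a NumPy version."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

grid = np.linspace(0.0, 1.0, 401)

# Hypothetical density with respect to the Lebesgue measure on [0, 1].
f = 6.0 * grid * (1.0 - grid) + 0.05
f = f / trapezoid(f, grid)

# Hypothetical weight: density of the reference measure P, heavier near 1.
p = 0.5 + grid

# Chain rule: the density of the same distribution with respect to P.
f_P = f / p

# Integrating f_P against P recovers the unit total mass.
mass_P = trapezoid(f_P * p, grid)

# Centred log-ratio with the mean taken with respect to P (weighted centring).
total_P = trapezoid(p, grid)
log_f_P = np.log(f_P)
clr_P = log_f_P - trapezoid(log_f_P * p, grid) / total_P
```

Regions where p is large contribute more to the centring constant, which is how the weighting enters the geometry in this sketch.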