Background
In research areas such as resistance breeding against the Beet necrotic yellow vein virus, it is of interest to compare the virus concentrations of samples from different groups. The enzyme-linked immunosorbent assay (ELISA) is the standard tool for measuring virus concentrations. Simple methods of data analysis such as analysis of variance (ANOVA), however, are impaired by the non-normality of the resulting optical density (OD) values and by unequal variances among groups.
Methods
To understand the relationship between the OD values from an ELISA test and the virus concentration per sample, we used a large serial dilution and modelled its non-linear form using a five-parameter logistic regression model. Furthermore, we examined whether the quality of the model can be improved if one or several of the model parameters are fixed beforehand. Subsequently, we used the inverse of the best model to estimate the virus concentration for every measured OD value.
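As an illustration of the transformation step, the following is a minimal sketch of a five-parameter logistic curve and its analytic inverse. The parameterisation (a, b, c, d, g) is one common form and is an assumption here, as are the function names and the parameter values; the abstract does not state which form or estimates the authors used.

```python
def five_pl(x, a, b, c, d, g):
    """Five-parameter logistic curve (assumed parameterisation).

    a: response at zero concentration
    d: response at infinite concentration
    c: scale (concentration) parameter, c > 0
    b: slope parameter, b > 0
    g: asymmetry parameter, g > 0 (g = 1 gives the four-parameter curve)
    """
    return d + (a - d) / (1.0 + (x / c) ** b) ** g


def inverse_five_pl(y, a, b, c, d, g):
    """Solve five_pl(x) = y for x; y must lie strictly between a and d."""
    if not min(a, d) < y < max(a, d):
        raise ValueError("OD value lies outside the asymptotes of the curve")
    ratio = (a - d) / (y - d)          # > 1 whenever y is between a and d
    return c * (ratio ** (1.0 / g) - 1.0) ** (1.0 / b)


# Hypothetical parameter values, chosen only to demonstrate the round trip.
params = dict(a=0.05, b=1.2, c=10.0, d=2.5, g=0.8)
od = five_pl(3.0, **params)
estimated_concentration = inverse_five_pl(od, **params)
```

In practice the parameters would first be estimated from the serial dilution, for example with a non-linear least squares routine, and fixing one or more of them beforehand corresponds to passing constants instead of fitted values.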
Results
We show that the transformed data are essentially normally distributed but have unequal variances across groups. We therefore propose a generalised least squares model, which allows for unequal group variances, to analyse the transformed data.
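To make the idea of group-specific variances concrete: in a simple one-way layout, a generalised least squares fit with a separate variance per group yields the group means as estimates, each with a standard error based on that group's own variance. The sketch below computes one pairwise contrast in this way. It is a simplified illustration, not the authors' R script; the function names and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def group_summary(values):
    """Return mean, sample variance and size of one group."""
    return mean(values), variance(values), len(values)


def contrast(group_a, group_b):
    """Difference of two group means with a heteroscedastic standard error.

    Each group contributes its own variance, so no pooled variance
    (as in classical ANOVA) is assumed.
    """
    m_a, s2_a, n_a = group_summary(group_a)
    m_b, s2_b, n_b = group_summary(group_b)
    diff = m_a - m_b
    se = sqrt(s2_a / n_a + s2_b / n_b)
    return diff, se, diff / se


# Hypothetical transformed concentrations for two groups.
resistant = [1.0, 1.2, 0.8, 1.0]
susceptible = [3.0, 4.0, 2.0, 3.0]
diff, se, statistic = contrast(resistant, susceptible)
```

A full analysis with more than two groups or additional factors would use a dedicated GLS routine rather than this two-group shortcut.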
Conclusions
ANOVA requires normally distributed data as well as equal variances. Neither requirement is met by raw OD values from an ELISA test. A transformation with an inverse logistic function, however, makes it possible to use linear models for data analysis of virus concentrations. We conclude that this method can be applied in any trial where virus concentrations of samples from different groups are to be compared via OD values from an ELISA test. To encourage researchers to use this method in their studies, we provide an R script for data transformation as well as the data from our trial.
We propose a novel structure selection method for high-dimensional (d > 100) sparse vine copulas. Current sequential greedy approaches for structure selection require calculating spanning trees in hundreds of dimensions and fitting the pair copulas and their parameters iteratively throughout the structure selection process. Our method uses a connection between the vine and structural equation models (SEMs). The latter can be estimated very quickly using the Lasso, even in very high dimensions, to obtain sparse models. Thus, we obtain a structure estimate independently of the chosen pair copulas and parameters. Additionally, we define the novel concept of regularization paths for R-vine matrices. It relates sparsity of the vine copula model in terms of independence copulas to a penalization coefficient in the structural equation models. We illustrate our approach and provide many numerical examples. These include simulations and data applications in high dimensions, showing the superiority of our approach to other existing methods.
Modeling dependence in high-dimensional systems has become an increasingly important topic. Most approaches, such as statistical models on directed acyclic graphs (DAGs), rely on the assumption of a multivariate Gaussian distribution. They are based on modeling conditional independencies and are scalable to high dimensions. In contrast, vine copula models accommodate more elaborate features like tail dependence and asymmetry, as well as independent modeling of the marginals. This flexibility comes, however, at the cost of exponentially increasing complexity for model selection and estimation. We show a novel connection between DAGs with a limited number of parents and truncated vine copulas under sufficient conditions. This motivates a more general procedure exploiting the fast model selection and estimation of sparse DAGs while allowing for non-Gaussian dependence using vine copulas. We demonstrate in a simulation study and using a high-dimensional data application that our approach outperforms standard methods for vine structure estimation.