The statistically equivalent signature (SES) algorithm is a method for feature selection inspired by the principles of constraint-based learning of Bayesian networks. Most of the currently available feature selection methods return only a single subset of features, supposedly the one with the highest predictive power. We argue that in several domains multiple subsets can achieve close to maximal predictive accuracy, and that arbitrarily providing only one has several drawbacks. The SES method attempts to identify multiple predictive feature subsets whose performances are statistically equivalent. In that respect the SES algorithm subsumes and extends previous feature selection algorithms, like the max-min parents and children algorithm. The SES algorithm is implemented in a function of the same name included in the R package MXM, standing for mens ex machina, meaning 'mind from the machine' in Latin. The MXM implementation of SES handles several data analysis tasks, namely classification, regression and survival analysis. In this paper we present the SES algorithm and its implementation, and provide examples of use of the SES function in R. Furthermore, we analyze three publicly available data sets to illustrate the equivalence of the signatures retrieved by SES and to contrast SES against the state-of-the-art feature selection method LASSO. Our results provide initial evidence that the two methods perform comparably well in terms of predictive accuracy and that multiple, equally predictive signatures are actually present in real-world data.
Diverse molecular networks underlying plant growth and development are rapidly being uncovered. Integrating these data into the spatial and temporal context of dynamic organ growth remains a technical challenge. We developed 3DCellAtlas, an integrative computational pipeline that semiautomatically identifies cell types and quantifies both 3D cellular anisotropy and reporter abundance at single-cell resolution across whole plant organs. Cell identification is no less than 97.8% accurate and does not require transgenic lineage markers or reference atlases. Cell positions within organs are defined using an internal indexing system, generating cellular-level organ atlases in which data from multiple samples can be integrated. Using this approach, we quantified the organ-wide, cell-type-specific 3D cellular anisotropy driving Arabidopsis thaliana hypocotyl elongation. Analysis of the impact of ethylene on hypocotyl 3D cell anisotropy identified the preferential growth of the endodermis in response to this hormone. The spatiotemporal dynamics of the endogenous DELLA protein RGA, the expansin gene EXPA3, and cell expansion were quantified within distinct cell types of Arabidopsis roots. A significant regulatory relationship between RGA, EXPA3, and growth was present in the epidermis and endodermis. The use of single-cell analyses of plant development enables the dynamics of diverse regulatory networks to be integrated with 3D organ growth.
The purpose of the present paper is to assess the efficacy of confidence intervals for Rosenthal's fail-safe number. Although Rosenthal's estimator is widely used by researchers, its statistical properties are largely unexplored. First, we developed statistical theory that allowed us to produce confidence intervals for Rosenthal's fail-safe number. The theory distinguishes whether the number of studies analysed in a meta-analysis is fixed or random, and each case produces different variance estimators. For a given number of studies and a given distribution, we provide five variance estimators. Confidence intervals are examined with a normal approximation and a nonparametric bootstrap. The accuracy of the different confidence interval estimates was then tested by simulation under different distributional assumptions. The half normal distribution variance estimator has the best probability coverage. Finally, we provide a table of lower confidence intervals for Rosenthal's estimator.
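As a rough illustration of the quantity discussed above, the following sketch computes Rosenthal's fail-safe number in its standard form, (sum of Z_i)^2 / z_alpha^2 - k, and wraps it in a simple percentile bootstrap. The function names, the example Z-scores, and the percentile construction are assumptions made for illustration; they are not the variance estimators or the interval procedures developed in the paper.

```python
import random
from statistics import NormalDist

def fail_safe_number(z_scores, alpha=0.05):
    """Rosenthal's fail-safe number: (sum Z)^2 / z_alpha^2 - k (one-tailed alpha)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # about 1.645 for alpha = 0.05
    k = len(z_scores)
    return sum(z_scores) ** 2 / z_alpha ** 2 - k

def bootstrap_ci(z_scores, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval (illustrative, not the paper's method)."""
    rng = random.Random(seed)
    k = len(z_scores)
    # Resample the k study-level Z-scores with replacement and recompute.
    stats = sorted(
        fail_safe_number([rng.choice(z_scores) for _ in range(k)])
        for _ in range(n_boot)
    )
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return stats[lo_idx], stats[hi_idx]

# Hypothetical Z-scores from six studies (made-up values).
example_z = [2.1, 1.3, 2.8, 0.4, 1.9, 2.5]
estimate = fail_safe_number(example_z)
ci_low, ci_high = bootstrap_ci(example_z)
```

The point estimate depends only on the sum of the Z-scores and the number of studies, which is why treating the number of studies as fixed or random changes the variance of the estimator.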
We define a distribution on the unit sphere S^{d−1} called the elliptically symmetric angular Gaussian distribution. This distribution, which to our knowledge has not been studied before, is a subfamily of the angular Gaussian distribution closely analogous to the Kent subfamily of the general Fisher-Bingham distribution. Like the Kent distribution, it has ellipse-like contours, enabling modelling of rotational asymmetry about the mean direction, but it has the additional advantages of being simple and fast to simulate from, and having a density and hence likelihood that is easy and very quick to compute exactly. These advantages are especially beneficial for computationally intensive statistical methods, one example of which is a parametric bootstrap procedure for inference for the directional mean that we describe.
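The ease of simulation mentioned above follows from how any angular Gaussian variable arises: draw a multivariate normal vector and project it onto the unit sphere. The sketch below shows only that generic construction. The function name and the example parameters are hypothetical, and the example covariance is merely chosen to satisfy V mu = mu and det V = 1, which is an assumption about the subfamily's constraints that the abstract itself does not state.

```python
import math
import random

def sample_angular_gaussian(mu, chol, rng):
    """Draw one unit vector y = x / ||x|| with x ~ N(mu, L L^T).

    `chol` is a lower-triangular factor L of the covariance matrix.
    """
    d = len(mu)
    z = [rng.gauss(0.0, 1.0) for _ in range(d)]
    # x = mu + L z, using only the lower triangle of L
    x = [mu[i] + sum(chol[i][j] * z[j] for j in range(i + 1)) for i in range(d)]
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]

rng = random.Random(1)
mu = [0.0, 0.0, 2.0]                 # mean direction along the third axis
chol = [[1.5, 0.0, 0.0],             # V = diag(2.25, 1/2.25, 1): V mu = mu, det V = 1
        [0.0, 1.0 / 1.5, 0.0],
        [0.0, 0.0, 1.0]]
samples = [sample_angular_gaussian(mu, chol, rng) for _ in range(500)]
```

Because each draw costs one normal vector and one normalisation, repeated simulation of this kind is cheap, which is what makes resampling schemes such as a parametric bootstrap practical.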
The characteristic function of the folded normal distribution and its moment function are derived. The entropy of the folded normal distribution and the Kullback-Leibler divergence from the normal and half normal distributions are approximated using Taylor series. The accuracy of the results is also assessed using different criteria. The maximum likelihood estimates and confidence intervals for the parameters are obtained using asymptotic theory and the bootstrap method. The coverage of the confidence intervals is also examined.
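For orientation, the first two moments of the folded normal distribution Y = |X|, with X ~ N(mu, sigma^2), have well-known closed forms, and a minimal sketch can check them against simulation. The function name and the parameter values are chosen here for illustration only; the characteristic function, entropy approximations, and interval procedures of the paper are not reproduced.

```python
import math
import random

def folded_normal_mean_var(mu, sigma):
    """Closed-form mean and variance of |X| for X ~ N(mu, sigma^2)."""
    density_term = sigma * math.sqrt(2.0 / math.pi) * math.exp(-mu ** 2 / (2.0 * sigma ** 2))
    # 1 - 2 * Phi(-mu / sigma) equals erf(mu / (sigma * sqrt(2)))
    mean = density_term + mu * math.erf(mu / (sigma * math.sqrt(2.0)))
    var = mu ** 2 + sigma ** 2 - mean ** 2
    return mean, var

# Monte Carlo check with illustrative parameters mu = 1, sigma = 2.
rng = random.Random(42)
draws = [abs(rng.gauss(1.0, 2.0)) for _ in range(50000)]
simulated_mean = sum(draws) / len(draws)
theoretical_mean, theoretical_var = folded_normal_mean_var(1.0, 2.0)
```

With mu = 0 the formulas reduce to the half normal case, with mean sigma * sqrt(2 / pi) and variance sigma^2 * (1 - 2 / pi).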