The Hájek-Feldman dichotomy establishes that two Gaussian measures are either mutually absolutely continuous (so that each measure has a Radon-Nikodym density with respect to the other) or mutually singular. Unlike the case of finite-dimensional Gaussian measures, there are non-trivial examples of both situations when dealing with Gaussian stochastic processes. This paper provides: (a) Explicit expressions for the optimal (Bayes) rule and the minimal classification error probability in several relevant problems of supervised binary classification of mutually absolutely continuous Gaussian processes. The approach relies on some classical results in the theory of Reproducing Kernel Hilbert Spaces (RKHS). (b) An interpretation, in terms of mutual singularity, of the "near perfect classification" phenomenon described by Delaigle and Hall (2012). We show that the asymptotically optimal rule proposed by these authors can be identified with the sequence of optimal rules for an approximating sequence of classification problems in the absolutely continuous case. (c) A new model-based method for variable selection in binary classification problems, which arises in a very natural way from the explicit knowledge of the Radon-Nikodym derivatives and the underlying RKHS structure. Different classifiers can be built from the selected variables. In particular, the classical linear finite-dimensional Fisher rule turns out to be consistent under some standard conditions on the underlying functional model.
Technology is generating a huge and growing supply of observations of diverse nature. This big data is making data learning a central scientific discipline. It includes the collection, storage, preprocessing, visualization and, essentially, statistical analysis of enormous batches of data. In this paper, we discuss the role of statistics in some of the issues raised by big data in this new paradigm, and we propose the name "data learning" for all the activities that make it possible to obtain relevant knowledge from this new source of information.
The use of variable selection methods is particularly appealing in statistical problems with functional data. The obvious general criterion for variable selection is to choose the 'most representative' or 'most relevant' variables. However, it is also clear that a purely relevance-oriented criterion could lead to selecting many redundant variables. The mRMR (minimum Redundancy Maximum Relevance) procedure, proposed by Ding and Peng (2005) and Peng et al. (2005), is an algorithm to systematically perform variable selection, achieving a reasonable trade-off between relevance and redundancy. In its original form, this procedure is based on the use of the so-called mutual information criterion to assess relevance and redundancy. Keeping the focus on functional data problems, we propose here a modified version of the mRMR method, obtained by replacing the mutual information by the new association measure (called distance correlation) suggested by Székely et al. (2007). We have also performed an extensive simulation study, including 1600 functional experiments (100 functional models × 4 sample sizes × 4 classifiers) and three real-data examples aimed at comparing the different versions of the mRMR methodology. The results are quite conclusive in favor of the newly proposed alternative.
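The idea in the abstract above can be illustrated with a minimal sketch. It assumes that each curve has been evaluated on a common grid, so that the columns of `X` are the candidate variables, and that relevance and redundancy are combined through a simple difference criterion (relevance minus average redundancy with the variables already chosen). The function names `distance_correlation` and `mrmr_dcor`, and the difference criterion itself, are illustrative choices and are not taken from the paper, which may combine the two quantities differently.

```python
import numpy as np


def distance_correlation(x, y):
    """Sample distance correlation (Székely et al., 2007) of two 1-D samples."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    # Pairwise distance matrices.
    a = np.abs(x - x.T)
    b = np.abs(y - y.T)
    # Double centering: subtract row and column means, add back the grand mean.
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()
    dcov2 = (A * B).mean()                   # squared distance covariance
    dvar2 = (A * A).mean() * (B * B).mean()  # product of squared distance variances
    if dvar2 <= 0:
        return 0.0                           # a constant sample carries no association
    return float(np.sqrt(max(dcov2, 0.0) / np.sqrt(dvar2)))


def mrmr_dcor(X, y, n_select):
    """Greedy mRMR-style selection with distance correlation (difference criterion, assumed)."""
    X = np.asarray(X, dtype=float)
    n_vars = X.shape[1]
    # Relevance: association between each candidate variable and the class label.
    relevance = np.array([distance_correlation(X[:, j], y) for j in range(n_vars)])
    selected = [int(np.argmax(relevance))]
    redundancy_sum = np.zeros(n_vars)
    while len(selected) < min(n_select, n_vars):
        last = selected[-1]
        # Accumulate redundancy with the most recently selected variable.
        redundancy_sum += np.array(
            [distance_correlation(X[:, j], X[:, last]) for j in range(n_vars)]
        )
        score = relevance - redundancy_sum / len(selected)
        score[selected] = -np.inf            # never pick the same variable twice
        selected.append(int(np.argmax(score)))
    return selected
```

With a response `y` and a data matrix `X` of discretized curves, `mrmr_dcor(X, y, 5)` returns the column indices of five selected grid points, in the order in which they were chosen. An exact duplicate of an already selected column has redundancy 1 with it, so it is strongly penalized in later steps.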
Objectives: This paper focuses on the issue of intimate partner violence and, specifically, on the distribution of femicides over time and the existence of copycat effects. This is the subject of an ongoing debate, often triggered by the social alarm that follows multiple intimate partner homicides (IPHs) occurring in a short span of time. The aim of this research is to study the evolution of IPHs and provide a far-reaching answer by rigorously analyzing and searching for patterns in data on femicides. Methods: The study analyzes an official dataset, provided by the VioGén system of the Secretaría de Estado de Seguridad (Spanish State Secretariat for Security), including all the femicides that occurred in Spain in 2007-2017. A statistical methodology to identify temporal interdependencies in count time series is proposed and applied to the dataset. The same methodology can be applied to other contexts. Results: There has been a decreasing trend in the number of femicides per year. No interdependencies in the temporal distribution of femicides are observed. Therefore, according to the data, the existence of a copycat effect in femicides cannot be claimed. Conclusions: Around 2011 there was a clear change in the average number of femicides, which has not picked up since. The results allow for an informed answer to the debate on the copycat effect in Spanish femicides. The planning of femicide prevention activities should not be a reaction to a perceived increase in their occurrence. As a copycat effect is not detected in the studied time period, there is no evidence supporting the need to censor media reports on femicides.