Data depth is a concept that measures the centrality of a point in a given data cloud x_1, x_2, ..., x_n ∈ R^d or in a multivariate distribution P_X on R^d. Every depth defines a family of so-called trimmed regions: the α-trimmed region is the set of all points whose depth is at least α. Data depth has been used to define multivariate measures of location and dispersion as well as multivariate dispersion orders.

If the depth of a point can be represented as the minimum of its depths with respect to all unidimensional projections, we say that the depth satisfies the (weak) projection property. Many depths proposed in the literature can be shown to satisfy the weak projection property. A depth is said to satisfy the strong projection property if, for every α, the unidimensional projection of the α-trimmed region equals the α-trimmed region of the projected distribution.

After a short introduction to the general concept of data depth, we formally define the weak and the strong projection property and give necessary and sufficient criteria for the projection property to hold. We further show that the projection property facilitates the construction of depths from univariate trimmed regions. We discuss some of the depths proposed in the literature that possess the projection property and define a general class of projection depths, which are constructed from univariate trimmed regions by the above method.

Finally, algorithmic aspects of projection depths are discussed. We describe an algorithm that enables the approximate computation of depths satisfying the projection property.
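As a concrete illustration of such an approximate computation, the following Python sketch (the function names and the choice of the univariate halfspace depth as building block are illustrative assumptions, not the algorithm of the paper) approximates a depth satisfying the weak projection property by minimizing a univariate depth over many random unit directions.

```python
# Illustrative sketch only: approximating a depth with the weak projection
# property as the minimum of univariate depths over random unit directions.
import numpy as np

def univariate_halfspace_depth(z, sample):
    """Halfspace (Tukey) depth of the scalar z w.r.t. a univariate sample."""
    sample = np.asarray(sample)
    return min(np.sum(sample <= z), np.sum(sample >= z)) / len(sample)

def approx_projection_depth(x, data, n_directions=1000, seed=None):
    """Depth of x in R^d approximated as the minimum univariate depth of the
    projected point over randomly drawn unit directions."""
    rng = np.random.default_rng(seed)
    depth = 1.0
    for _ in range(n_directions):
        u = rng.standard_normal(data.shape[1])
        u /= np.linalg.norm(u)                      # random unit direction
        depth = min(depth, univariate_halfspace_depth(x @ u, data @ u))
    return depth

# Example: depth of the origin w.r.t. a standard normal cloud in R^3
cloud = np.random.default_rng(0).standard_normal((500, 3))
print(approx_projection_depth(np.zeros(3), cloud, n_directions=500))
```

Because the minimum over a finite set of directions can only overestimate the minimum over all directions, such an approximation is an upper bound on the exact depth; increasing the number of directions tightens it.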
A theoretical framework is suggested for computing the exact value of the halfspace depth of a point w.r.t. a data cloud of n points in arbitrary dimension. Based on this framework, a whole class of algorithms can be derived. In all of these algorithms the depth is calculated as the minimum over a finite number of depth values w.r.t. proper projections of the data cloud. Three variants of this class are studied in more detail. All of these algorithms can deal with data that are not in general position and even with data that contain ties. As simulations show, all proposed algorithms prove to be very efficient.
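To make the idea of a minimum over finitely many projections concrete, here is a minimal sketch for the bivariate case only; it is not the authors' general-dimension framework, but it shows how the halfspace depth in the plane can be computed exactly as a minimum over a finite candidate set of directions, including for data with ties or not in general position.

```python
# Minimal 2D illustration (not the paper's general algorithm): the halfspace
# depth of x is the minimum, over unit directions u, of the fraction of points
# in the closed halfplane {y : <y - x, u> >= 0}.  In the plane this minimum is
# attained on a finite set of candidate directions, so it can be computed
# exactly, even with ties or points not in general position.
import numpy as np

def halfspace_depth_2d(x, data):
    x = np.asarray(x, dtype=float)
    data = np.asarray(data, dtype=float)
    diff = data - x
    nonzero = diff[np.linalg.norm(diff, axis=1) > 0]
    n = len(data)
    if len(nonzero) == 0:                  # every data point coincides with x
        return 1.0
    # Critical angles: directions orthogonal to some x_i - x
    phi = np.arctan2(nonzero[:, 1], nonzero[:, 0])
    critical = np.unique(np.concatenate([phi + np.pi / 2, phi - np.pi / 2]) % (2 * np.pi))
    # One extra direction inside each open arc between consecutive critical angles
    gaps = np.diff(np.append(critical, critical[0] + 2 * np.pi))
    candidates = np.concatenate([critical, (critical + gaps / 2) % (2 * np.pi)])
    depth = 1.0
    for a in candidates:
        u = np.array([np.cos(a), np.sin(a)])
        count = np.sum(diff @ u >= -1e-12)          # closed halfplane through x
        depth = min(depth, count / n)
    return depth

pts = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5]])
print(halfspace_depth_2d([0.5, 0.5], pts))          # 0.6 for this configuration
```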
Following the seminal idea of Tukey (1975), data depth is a function that measures how close an arbitrary point of the space lies to an implicitly defined center of a data cloud. Having undergone theoretical and computational developments, it is now employed in numerous applications, with classification being the most popular one. The R package ddalpha is software designed to combine the user's experience with recent achievements in the area of data depth and depth-based classification.

ddalpha provides an implementation for exact and approximate computation of most reasonable and widely applied notions of data depth. These can further be used in the depth-based multivariate and functional classifiers implemented in the package, where the DDα-procedure is the main focus. The package can be extended with user-defined custom depth methods and separators. The implemented functions for depth visualization and the built-in benchmark procedures may also serve to provide insights into the geometry of the data and the quality of pattern recognition.

Being intrinsically nonparametric, a depth function captures the geometrical features of given data in an affine-invariant way. This makes it useful for describing the data's location, scatter, and shape, allowing for multivariate inference, detection of outliers, ordering of multivariate distributions, and in particular classification, which has recently become an important and rapidly developing application of the depth machinery. While the parameter-free nature of data depth ensures attractive theoretical properties of classifiers, its ability to reflect data topology yields promising predictive results on finite samples.

Classification in the depth space

Consider the following setting for supervised classification: given a training sample consisting of q classes X_1, ..., X_q, each containing n_i observations in R^d, i = 1, ..., q, a class should be determined for a new observation x_0, to which it most probably belongs. Depth-based learning started with plug-in type classifiers. Ghosh and Chaudhuri (2005b) construct a depth-based classifier which, in its naïve form, assigns the observation x_0 to the class in which it has maximal depth. They suggest an extension of the classifier that is consistent w.r.t. the Bayes risk for classes stemming from elliptically symmetric distributions. Further, Dutta and Ghosh (2011, 2012) suggest a robust classifier and a classifier for L_p-symmetric distributions; see also Cui et al. (2008), Mosler and Hoberg (2006), and Jörnsten (2004) for unsupervised classification.

A novel way to perform depth-based classification has been suggested by Li et al. (2012): first map a pair of training classes into a two-dimensional depth space, called the DD-plot, and then perform classification by selecting a polynomial that minimizes empirical risk. Finding such an optimal polynomial numerically is a very challenging and, when done appropriately, computationally involved task, with a solution that in practice ca...
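The following sketch is only a conceptual illustration of classification in the depth space: each observation is mapped to its pair of depths w.r.t. the two training classes (the DD-plot coordinates) and a naïve max-depth rule is applied there. The Mahalanobis depth is used purely because it has a closed form; this is neither the DDα-procedure nor the ddalpha API.

```python
# Conceptual DD-plot sketch: map a point to (depth w.r.t. class 1, depth w.r.t.
# class 2) and assign it to the class in which it is deeper (max-depth rule).
import numpy as np

def mahalanobis_depth(x, sample):
    """Mahalanobis depth: 1 / (1 + (x - mean)' Cov^{-1} (x - mean))."""
    mu = sample.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(sample, rowvar=False))
    d = x - mu
    return 1.0 / (1.0 + d @ cov_inv @ d)

def dd_classify(x0, class1, class2):
    """Return the assigned class and the DD-plot coordinates of x0."""
    d1, d2 = mahalanobis_depth(x0, class1), mahalanobis_depth(x0, class2)
    return (1 if d1 >= d2 else 2), (d1, d2)

rng = np.random.default_rng(1)
X1 = rng.standard_normal((100, 2))            # class 1 around the origin
X2 = rng.standard_normal((100, 2)) + [3, 0]   # class 2 shifted along the x-axis
print(dd_classify(np.array([2.5, 0.2]), X1, X2))   # expected: assigned to class 2
```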