We consider the problem of classification of functional data into two groups by linear classifiers based on one-dimensional projections of functions. We reformulate the task of finding the best classifier as an optimization problem and solve it by regularization techniques, namely the conjugate gradient method with early stopping, the principal component method and the ridge method. We study the empirical version with finite training samples consisting of incomplete functions observed on different subsets of the domain and show that the optimal, possibly zero, misclassification probability can be achieved in the limit along a possibly nonconvergent empirical regularization path. Being able to work with fragmentary training data, we propose a domain extension and selection procedure that finds the best domain beyond the common observation domain of all curves. In a simulation study we compare the different regularization methods and investigate the performance of domain selection. Our methodology is illustrated on a medical data set, where we observe a substantial improvement of classification accuracy due to domain extension.
We consider the problem of classification of a functional observation into one of two groups. Classification of functional data is a rich, long-standing topic comprehensively overviewed in Baíllo et al. (2011b). It was recently shown by Delaigle and Hall (2012a) that, depending on the relative geometric position of the difference of the group means, representing the signal, and the covariance operator, summarizing the structure of the noise, certain classifiers can have zero misclassification probability. This remarkable phenomenon, called perfect classification, is a special property of the infinite-dimensional setting and cannot occur in the multivariate context, except in degenerate cases.
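As a rough illustration of the ridge method named in the abstract above, the following sketch builds a linear classifier from a one-dimensional projection of discretized curves. It is a minimal sketch under stated assumptions, not the authors' implementation: the curves are assumed to be fully observed on a common uniform grid, the function names `ridge_direction` and `classify`, the ridge parameter `alpha=0.01`, and the synthetic data (Brownian-motion noise with a constant mean shift) are all hypothetical choices made for the example.

```python
import numpy as np

def ridge_direction(X0, X1, alpha):
    """Return (psi, mu0, mu1) for a ridge-regularized linear classifier.

    X0, X1: arrays of shape (n_curves, n_grid), curves sampled on a common grid.
    alpha:  ridge parameter (> 0); a tuning choice assumed for this sketch.
    """
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    R = np.vstack([X0 - mu0, X1 - mu1])      # pooled, group-centered curves
    C = R.T @ R / R.shape[0]                 # pooled sample covariance matrix
    # Regularized solution of C psi = mu1 - mu0, which is ill-posed without alpha.
    psi = np.linalg.solve(C + alpha * np.eye(C.shape[0]), mu1 - mu0)
    return psi, mu0, mu1

def classify(X, psi, mu0, mu1):
    """Label 1 when the projection of X minus the group midpoint onto psi is positive."""
    return ((X - 0.5 * (mu0 + mu1)) @ psi > 0).astype(int)

# Synthetic demo (hypothetical data): Brownian-motion noise plus a constant mean shift.
rng = np.random.default_rng(0)
n, p = 200, 50

def sample(shift, size):
    steps = rng.normal(scale=np.sqrt(1.0 / p), size=(size, p))
    return np.cumsum(steps, axis=1) + shift

psi, mu0, mu1 = ridge_direction(sample(0.0, n), sample(1.0, n), alpha=0.01)
X_test = np.vstack([sample(0.0, n), sample(1.0, n)])
y_test = np.repeat([0, 1], n)
accuracy = float(np.mean(classify(X_test, psi, mu0, mu1) == y_test))
print(f"test accuracy: {accuracy:.3f}")
```

In this toy setting the noise has very small variance near the start of the domain, so a well-chosen projection separates the groups almost completely. The sketch ignores quadrature weights and incomplete curves, both of which a faithful treatment would need to handle.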
It was demonstrated by Delaigle and Hall (2012a) that a particularly simple class of linear classifiers, based on a carefully chosen one-dimensional projection of the function to
Functional data are usually assumed to be observed on a common domain. However, it is often the case that some portion of the functional data is missing for some statistical units, invalidating most of the existing techniques for functional data analysis. The development of methods able to handle partially observed or incomplete functional data is currently attracting increasing interest. Here we briefly review this literature. We then focus on discrimination based on principal component analysis and illustrate a few possible methods via simulation studies and an application to the AneuRisk65 data set. We show that carrying out the analysis over the full domain, where at least one of the functional observations is available, may not be the optimal choice for classification purposes.
Overall mortality trends may be partially explained by cause-specific data. A recent example is provided by Woolf and Schoomaker (2019), who try to shed light on the decreasing trend of US life expectancy by inspecting mortality by cause, finding that midlife mortality caused by drug overdoses, alcohol abuse, suicides and a diverse list of organ system diseases has particularly increased in the
When functional data are observed on parts of the domain, it is of interest to recover the missing parts of curves. Kraus (2015) proposed a linear reconstruction method based on ridge regularization. Kneip and Liebl (2020) argue that an assumption under which Kraus (2015) established the consistency of the ridge method is too restrictive and propose a principal component reconstruction method that they prove to be asymptotically optimal. In this note we relax the restrictive assumption that the true best linear reconstruction operator is Hilbert-Schmidt and prove that the ridge method achieves asymptotic optimality under essentially no assumptions. The result is illustrated in a simulation study.
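The idea of linear reconstruction with ridge regularization can be sketched on a grid as follows. This is a simplified illustration, not the procedure of Kraus (2015): it assumes the mean and covariance are estimated from complete training curves on a common grid (the cited work estimates them from incomplete curves), and the function name `ridge_reconstruct`, the value `alpha=1e-3`, and the two-component synthetic curves are hypothetical choices for the example. The missing part is predicted as the mean plus the cross-covariance applied to a ridge-regularized inverse of the covariance on the observed part.

```python
import numpy as np

def ridge_reconstruct(train, x_obs, obs_idx, mis_idx, alpha):
    """Predict the missing part of a curve from its observed part.

    train:   (n_curves, n_grid) complete training curves (an assumption of this sketch).
    x_obs:   values of the new curve at the grid points in obs_idx.
    alpha:   ridge parameter (> 0) stabilizing the inversion of the observed-part covariance.
    """
    mu = train.mean(axis=0)
    R = train - mu
    C = R.T @ R / train.shape[0]                  # sample covariance on the grid
    C_oo = C[np.ix_(obs_idx, obs_idx)]            # observed-observed block
    C_mo = C[np.ix_(mis_idx, obs_idx)]            # missing-observed block
    coef = np.linalg.solve(C_oo + alpha * np.eye(len(obs_idx)), x_obs - mu[obs_idx])
    return mu[mis_idx] + C_mo @ coef

# Synthetic demo (hypothetical data): curves spanned by two smooth components.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 50)
basis = np.vstack([np.sin(np.pi * t), np.cos(np.pi * t)])   # shape (2, 50)
train = rng.normal(size=(100, 2)) @ basis
x_new = rng.normal(size=2) @ basis

obs_idx, mis_idx = np.arange(30), np.arange(30, 50)          # observe the first 30 grid points
x_hat = ridge_reconstruct(train, x_new[obs_idx], obs_idx, mis_idx, alpha=1e-3)
max_err = float(np.max(np.abs(x_hat - x_new[mis_idx])))
print(f"max reconstruction error: {max_err:.2e}")
```

Because the toy curves are noise-free and lie in a two-dimensional space, the observed fragment determines the rest of the curve and the reconstruction error is small. With real data the choice of `alpha` matters, and the sketch does not address how to select it.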