When classification methods are applied to high-dimensional data, selecting a subset of the predictors may improve the predictive ability of the estimated model, in addition to reducing model complexity. In Functional Data Analysis (FDA), i.e., when data are functions, selecting a subset of predictors corresponds to selecting a subset of individual time instants in the time interval over which the functional data are measured. In this paper, we address the problem of selecting the most informative time instants in multivariate functional data, a case much less studied than its univariate counterpart. Our proposal makes it simple to exploit higher-order information in the data, e.g., monotonicity or convexity, through the derivatives of the functional data. We address this problem with tools from Global Optimization in continuous variables: the time instants are selected to maximize the correlation between the class label and the Support Vector Machine (SVM) score used for classification. The effectiveness of the proposal is shown on univariate and multivariate datasets.
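As a rough illustration of the selection criterion, the sketch below evaluates sampled curves at candidate instants by linear interpolation (optionally appending finite-difference first derivatives), trains a linear SVM on those values, and scores the instants by the Pearson correlation between the ±1 labels and the SVM scores. Several pieces are assumptions, not the paper's method: the linear SVM is trained with Pegasos-style stochastic subgradient descent rather than a standard solver, the scores are computed in-sample, and the continuous global optimizer is replaced by a crude random multistart. All function names (`eval_at_instants`, `train_linear_svm`, `objective`, `random_search`) are hypothetical.

```python
import numpy as np

def eval_at_instants(curves, grid, instants, use_derivative=False):
    """Evaluate curves (n_curves, n_grid) at continuous instants by linear interpolation.
    If use_derivative is True, finite-difference first derivatives at the same
    instants are appended (e.g. to capture monotonicity)."""
    feats = [np.stack([np.interp(instants, grid, c) for c in curves])]
    if use_derivative:
        deriv = np.gradient(curves, grid, axis=1)  # approximate x'(t) on the grid
        feats.append(np.stack([np.interp(instants, grid, d) for d in deriv]))
    return np.hstack(feats)

def train_linear_svm(X, y, lam=0.01, epochs=30, seed=0):
    """Pegasos: stochastic subgradient descent on the L2-regularized hinge loss.
    A constant column is appended so the bias is learned (and regularized) as a weight."""
    Xa = np.hstack([X, np.ones((len(X), 1))])
    rng = np.random.default_rng(seed)
    w, t = np.zeros(Xa.shape[1]), 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xa)):
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * (Xa[i] @ w) < 1.0  # margin checked before the update
            w *= 1.0 - eta * lam                 # shrinkage from the L2 penalty
            if violated:
                w += eta * y[i] * Xa[i]          # hinge-loss subgradient step
    return w

def objective(instants, curves, grid, y, use_derivative=False):
    """Pearson correlation between labels y in {-1, +1} and in-sample linear SVM scores."""
    X = eval_at_instants(curves, grid, np.atleast_1d(instants), use_derivative)
    w = train_linear_svm(X, y)
    scores = np.hstack([X, np.ones((len(X), 1))]) @ w
    if np.std(scores) == 0.0:
        return 0.0  # constant scores: correlation undefined
    return float(np.corrcoef(scores, y)[0, 1])

def random_search(curves, grid, y, k=2, n_trials=100, seed=1, use_derivative=False):
    """Random multistart over k instants in [grid[0], grid[-1]]; a stand-in for a
    proper continuous global optimizer."""
    rng = np.random.default_rng(seed)
    best, best_val = None, -np.inf
    for _ in range(n_trials):
        cand = np.sort(rng.uniform(grid[0], grid[-1], size=k))
        val = objective(cand, curves, grid, y, use_derivative)
        if val > best_val:
            best, best_val = cand, val
    return best, best_val
```

For instance, with 40 synthetic curves whose positive class carries a narrow bump near t = 0.7, `random_search(curves, grid, y, k=1)` tends to return an instant close to 0.7. In practice the correlation would more sensibly be computed on held-out data or by cross-validation, so that instants which merely overfit are not rewarded.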
Functional Data Analysis (FDA) is devoted to the study of data that are functions. The Support Vector Machine (SVM) is a benchmark tool for classification, including the classification of functional data. The SVM is frequently used with a kernel, e.g., the Gaussian kernel, that involves a scalar bandwidth parameter. In this paper, we propose to use kernels with functional bandwidths. In this way, accuracy may improve, and the time intervals critical for classification can be identified. Tuning the functional parameters of the new kernel is a challenging task; we formulate it as a continuous optimization problem and solve it by means of a heuristic. Our experiments on benchmark data sets show the advantages of using functional parameters and the effectiveness of our approach.
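One natural way to give the Gaussian kernel a functional bandwidth, assumed here for illustration, is to move the bandwidth inside the L2 distance: K(x, z) = exp(-∫ γ(t)(x(t) - z(t))² dt) with γ(t) ≥ 0. A constant γ recovers the usual Gaussian kernel, and regions where γ(t) = 0 are ignored, which is how such a kernel can point to the time intervals that matter. The sketch approximates the integral with the trapezoid rule on the sampling grid; the exact parametrization used in the paper may differ, and the function names `trapezoid_weights` and `functional_gaussian_kernel` are hypothetical.

```python
import numpy as np

def trapezoid_weights(grid):
    """Weights w_j such that sum_j w_j * f(t_j) is the trapezoid-rule integral of f over grid."""
    dt = np.diff(grid)
    w = np.zeros(len(grid))
    w[:-1] += dt / 2.0
    w[1:] += dt / 2.0
    return w

def functional_gaussian_kernel(A, B, grid, gamma_t):
    """Gram matrix K[i, j] = exp(-integral of gamma(t) * (a_i(t) - b_j(t))**2 dt).

    A: (n_A, n_grid) and B: (n_B, n_grid) curves sampled on `grid`;
    gamma_t: (n_grid,) nonnegative functional bandwidth sampled on the same grid.
    """
    gamma_t = np.asarray(gamma_t, dtype=float)
    if np.any(gamma_t < 0):
        raise ValueError("the functional bandwidth must be nonnegative")
    q = trapezoid_weights(grid) * gamma_t           # quadrature weight times bandwidth
    sq_diff = (A[:, None, :] - B[None, :, :]) ** 2  # (n_A, n_B, n_grid) pairwise squared gaps
    return np.exp(-(sq_diff @ q))                   # contract over the grid axis
```

Because this is an ordinary Gaussian kernel applied to the rescaled curves √γ(t)·x(t), it remains positive semidefinite, so the resulting matrix can be passed to any SVM solver that accepts precomputed kernels (for example scikit-learn's `SVC(kernel="precomputed")`). Choosing γ(t), e.g., as a piecewise-constant function, is then the continuous tuning problem the abstract refers to.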
When processes are monitored continuously over time, data are collected over a whole period, of which only certain time instants and time intervals may play a crucial role in the analysis. We develop a method for selecting a small, finite set of short intervals (or instants) that capture the information needed to predict a response variable from multivariate functional data using Support Vector Regression (SVR). In addition to improving interpretability and reducing storage requirements and monitoring cost, feature selection can potentially reduce overfitting by mitigating data autocorrelation. We propose a continuous optimization algorithm to fit the SVR parameters and select intervals and instants. Our approach takes advantage of the functional nature of the data by formulating a new bilevel optimization problem that integrates the selection of intervals and instants, the tuning of key SVR parameters, and the fitting of the SVR. We illustrate the usefulness of our proposal on benchmark data sets.
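To make the interval/instant idea concrete, the sketch below turns multivariate functional observations into a small feature vector: the value of a chosen variable at a selected instant, and the average of a chosen variable over a selected short interval. Summarizing an interval by its mean is an assumption made here for illustration; the bilevel formulation, which also tunes the SVR parameters, is not reproduced, and the function names `interval_mean` and `extract_features` are hypothetical.

```python
import numpy as np

def interval_mean(curve, grid, a, b):
    """Mean of a sampled curve over [a, b] (a < b): trapezoid rule over the grid points
    strictly inside the interval plus linearly interpolated endpoint values."""
    t = np.concatenate(([a], grid[(grid > a) & (grid < b)], [b]))
    v = np.interp(t, grid, curve)
    integral = np.sum((v[1:] + v[:-1]) * np.diff(t)) / 2.0
    return integral / (b - a)

def extract_features(X, grid, instants, intervals):
    """Build an SVR design matrix from multivariate functional data.

    X:         array (n_samples, n_vars, n_grid), each curve sampled on `grid`
    instants:  list of (var, t) pairs; feature = x_var(t)
    intervals: list of (var, a, b) triples; feature = mean of x_var over [a, b]
    """
    rows = []
    for x in X:
        point_vals = [np.interp(t, grid, x[v]) for v, t in instants]
        interval_vals = [interval_mean(x[v], grid, a, b) for v, a, b in intervals]
        rows.append(point_vals + interval_vals)
    return np.array(rows, dtype=float)
```

The resulting matrix can be fed to any SVR implementation, e.g., scikit-learn's `sklearn.svm.SVR`. In the approach the abstract describes, the instants and interval endpoints would themselves be continuous decision variables, optimized together with the SVR parameters rather than fixed in advance as here.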