An artificial neural network (ANN) automatically captures linear and nonlinear correlations, as well as spatial and other structural dependence among features. ANNs perform well in many application areas, such as classification and prediction from magnetic resonance imaging, spatial data analysis, and computer vision. Most commonly used ANNs assume that the training sample is large relative to the dimension of the feature vector. In modern applications such as those mentioned above, however, training sample sizes are often small and may even be smaller than the dimension of the feature vector. In this paper, we consider a single-layer ANN classification model suitable for analyzing high-dimensional low sample-size (HDLSS) data. We investigate the theoretical properties of the sparse group lasso regularized neural network and show that, under mild conditions, its classification risk converges to the risk of the optimal Bayes classifier (universal consistency). Moreover, we propose a variation on the regularization term. A few examples from popular research fields are also provided to illustrate the theory and methods.
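As a rough illustration of the kind of regularizer discussed above, the following sketch evaluates a sparse group lasso penalty on the first-layer weights of a network. The abstract does not give the exact form used in the paper, so several things here are assumptions: the standard sparse group lasso form (a convex combination of an L1 term and a group-wise L2 term), the grouping of weights by input feature (one row of `W` per feature), and the names `sparse_group_lasso_penalty`, `lam`, and `alpha`.

```python
import numpy as np

def sparse_group_lasso_penalty(W, lam=0.1, alpha=0.5):
    """Sparse group lasso penalty on first-layer weights (illustrative only).

    W     : array of shape (n_features, n_hidden); row j holds all weights
            leaving input feature j (assumed grouping, not from the paper).
    lam   : overall regularization strength (hypothetical name).
    alpha : mixing weight; 1.0 gives a pure lasso, 0.0 a pure group lasso.
    """
    W = np.asarray(W, dtype=float)
    n_hidden = W.shape[1]
    # Element-wise sparsity term.
    l1_term = np.abs(W).sum()
    # Group term: L2 norm of each feature's weights, scaled by sqrt(group size).
    group_term = np.sqrt(n_hidden) * np.linalg.norm(W, axis=1).sum()
    return lam * (alpha * l1_term + (1.0 - alpha) * group_term)
```

Under this form, a feature whose entire row of `W` is zero contributes nothing to the penalty, which is what makes the group term useful for feature selection when the feature dimension exceeds the sample size.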
The generalized additive model is a powerful statistical learning and predictive modeling tool that has been applied in a wide range of settings. The need for high-dimensional additive modeling is evident when dealing with high-throughput data, such as in genetic data analysis. In this article, we study a two-step selection and estimation method for ultrahigh-dimensional generalized additive models. The first step applies the group lasso to the expanded bases of the functions. With high probability, this step selects all nonzero functions without too much overselection. The second step uses the adaptive group lasso with any initial estimator, including the group lasso estimator, that satisfies certain regularity conditions. The adaptive group lasso estimator is shown to be selection consistent with improved convergence rates. Tuning parameter selection is also discussed, and the generalized information criterion procedure is shown to select the true model consistently. The theoretical properties are supported by an extensive numerical study.
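The two-step procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the article's implementation: it uses a squared-error loss rather than a generalized likelihood, takes an already basis-expanded design matrix `X` whose column groups correspond to the additive functions, solves each step by plain proximal gradient descent, and uses adaptive weights equal to the inverse group norms of the initial estimate. All function and parameter names are hypothetical.

```python
import numpy as np

def group_soft_threshold(v, t):
    """Block soft-thresholding: shrink the vector v toward zero by t in L2 norm."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v

def group_lasso(X, y, groups, lam, weights=None, n_iter=500):
    """Weighted group lasso for the loss (1/2n)||y - X b||^2 via proximal gradient."""
    n, p = X.shape
    if weights is None:
        weights = np.ones(len(groups))
    # Step size 1/L, where L is the Lipschitz constant of the loss gradient.
    step = n / (np.linalg.norm(X, 2) ** 2)
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        z = beta - step * grad
        for g, w in zip(groups, weights):
            if np.isinf(w):
                beta[g] = 0.0  # group excluded by an infinite adaptive weight
            else:
                beta[g] = group_soft_threshold(z[g], step * lam * w)
    return beta

def two_step_group_lasso(X, y, groups, lam1, lam2):
    """Step 1: group lasso. Step 2: adaptive group lasso weighted by step 1."""
    beta_init = group_lasso(X, y, groups, lam1)
    norms = np.array([np.linalg.norm(beta_init[g]) for g in groups])
    with np.errstate(divide="ignore"):
        # Groups dropped in step 1 receive infinite weight and stay at zero.
        weights = np.where(norms > 0, 1.0 / norms, np.inf)
    return group_lasso(X, y, groups, lam2, weights=weights)
```

Here `groups` is a list of integer index arrays, one per additive component. In practice `lam1` and `lam2` would be chosen by a criterion such as the generalized information criterion mentioned above; that selection step is not shown.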
Initial insurance losses are often reported with a textual description of the claim. The claims manager must determine an adequate case reserve for each known claim. In this paper, we present a framework for predicting the loss amount from a textual description of the claim, using a large number of words found in the descriptions. Prior work has focused on classifying insurance claims based on keywords selected by a human expert, whereas this paper focuses on loss amount prediction with automatic word selection. To transform words into numeric vectors, we use word cosine similarities and word embedding matrices. When we consider all unique words found in the training dataset and impose a generalised additive model on the resulting explanatory variables, the resulting design matrix is high dimensional. For this reason, we use a group lasso penalty to reduce the number of coefficients in the model. The proposed framework is scalable and analytical, and it yields a parsimonious and interpretable model. Finally, we discuss the implications of the analysis, including how the framework may be used by an insurance company and how the interpretation of the covariates can lead to significant policy change. The code can be found in the TAGAM R package (github.com/scottmanski/TAGAM).
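To make the notion of word cosine similarity concrete, the sketch below computes pairwise cosine similarities between the rows of a word embedding matrix. It is written in Python for illustration and is not taken from the TAGAM R package; the abstract does not describe how the similarities enter the design matrix, so that step is not shown. The function name and the `eps` guard are assumptions.

```python
import numpy as np

def cosine_similarity_matrix(E, eps=1e-12):
    """Pairwise cosine similarities between the rows of an embedding matrix.

    E   : array of shape (n_words, embedding_dim), one row per word.
    eps : small floor on the row norms to avoid division by zero (assumption).
    """
    E = np.asarray(E, dtype=float)
    norms = np.linalg.norm(E, axis=1, keepdims=True)
    unit = E / np.maximum(norms, eps)  # rescale each row to unit length
    return unit @ unit.T               # entry (i, j) is cos(word_i, word_j)
```

Each entry lies in [-1, 1], with values near 1 indicating words whose embeddings point in nearly the same direction.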