We present a semi‐supervised method for photometric supernova typing. Our approach is first to use the non‐linear dimension reduction technique diffusion map to detect structure in a database of supernova light curves, and then to employ random forest classification on a spectroscopically confirmed training set to learn a model that can predict the type of each newly observed supernova. We demonstrate that this is an effective method for supernova typing. As supernova numbers increase, our semi‐supervised method efficiently utilizes this information to improve classification, a property not enjoyed by template‐based methods. Applied to supernova data simulated by Kessler et al. to mimic those of the Dark Energy Survey, our method achieves (cross‐validated) 95 per cent Type Ia purity and 87 per cent Type Ia efficiency on the spectroscopic sample, but only 50 per cent Type Ia purity and 50 per cent efficiency on the photometric sample, owing to the spectroscopic follow‐up strategy used in the simulations. To improve performance on the photometric sample, we search for better spectroscopic follow‐up procedures by studying the sensitivity of our machine‐learned supernova classification to the specific strategy used to obtain training sets. With a fixed amount of spectroscopic follow‐up time, we find that, despite collecting data on a smaller number of supernovae, deeper magnitude‐limited spectroscopic surveys are better for producing training sets. For supernova Ia (II‐P) typing, we obtain a 44 per cent (1 per cent) increase in purity, to 72 per cent (87 per cent), and a 30 per cent (162 per cent) increase in efficiency, to 65 per cent (84 per cent) of the sample, using a 25th (24.5th) magnitude‐limited survey instead of the shallower spectroscopic sample used in the original simulations. When redshift information is available, we incorporate it into our analysis using a novel method of altering the diffusion map representation of the supernovae.
Incorporating host redshifts leads to a 5 per cent improvement in Type Ia purity and 13 per cent improvement in Type Ia efficiency.
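The two-stage pipeline described above (a diffusion map embedding followed by random forest classification on a spectroscopically labelled subset) can be sketched as follows. This is a minimal illustration using scikit-learn and synthetic two-class data in place of real light-curve features; the bandwidth rule, diffusion time, and labelled fraction are all assumptions for the sketch, not the authors' implementation.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier

# Toy two-class data standing in for light-curve features (hypothetical stand-in).
X, y = make_moons(n_samples=300, noise=0.08, random_state=0)

# --- Diffusion map: Gaussian affinities -> Markov matrix -> spectral embedding ---
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
eps = np.median(np.sort(np.sqrt(d2), axis=1)[:, 10]) ** 2   # local bandwidth (assumed rule)
K = np.exp(-d2 / eps)
P = K / K.sum(axis=1, keepdims=True)          # row-stochastic diffusion operator
vals, vecs = np.linalg.eig(P)
order = np.argsort(-vals.real)
top = order[1:4]                              # skip the trivial constant eigenvector
coords = vecs.real[:, top] * vals.real[top]   # diffusion coordinates at t = 1

# --- Semi-supervised step: train on a small "spectroscopic" subset, predict the rest ---
labeled = np.arange(60)                       # pretend only these have spectra
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(coords[labeled], y[labeled])
acc = clf.score(coords[60:], y[60:])
print(f"hold-out accuracy: {acc:.2f}")
```

Because the embedding is computed from all light curves, labelled or not, every newly observed supernova refines the representation, which is the sense in which the method benefits from growing sample sizes.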
Surface-enhanced Raman spectroscopy (SERS) is a rapid and highly sensitive spectroscopic technique with the potential to measure chemical changes in the bacterial cell surface in response to environmental changes. The objective of this study was to determine whether SERS had sufficient resolution to differentiate closely related bacteria within a genus grown on solid and liquid media, and a single Arthrobacter strain grown at multiple chromate concentrations. Fourteen closely related Arthrobacter strains, based on their 16S rRNA gene sequences, were used in this study. After performing principal component analysis in conjunction with linear discriminant analysis, we used a novel, adapted cross-validation method that more faithfully models the classification of spectra. All fourteen strains could be classified with up to 97% accuracy. The hierarchical trees comparing SERS spectra from the liquid- and solid-media datasets were different. Additionally, hierarchical trees created from the Raman data were different from those obtained using 16S rRNA gene sequences (a phylogenetic measure). A single bacterial strain grown on solid medium at three different chromate levels also showed significant spectral distinctions at discrete points identified by a new elastic-net regularized regression method, demonstrating the ability of SERS to detect environmentally induced changes in cell surface composition. This study demonstrates that SERS is effective in distinguishing between a large number of very closely related Arthrobacter strains and could be a valuable tool for rapid monitoring and characterization of phenotypic variation in a single population in response to environmental conditions.
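The classification step described above (principal component analysis followed by linear discriminant analysis, evaluated by cross-validation) can be sketched in scikit-learn. Fitting the PCA inside each fold keeps held-out spectra from leaking into the dimension reduction, which is the kind of faithfulness an adapted cross-validation aims for. The iris data here is only a stand-in for SERS spectra; the component count and toolchain are assumptions, not the authors' pipeline.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Stand-in "spectra": iris features play the role of SERS intensities (hypothetical).
X, y = load_iris(return_X_y=True)

# PCA compresses the spectra; LDA then finds directions that separate the classes.
# Wrapping both in one pipeline refits the PCA on each training fold, so test
# spectra never influence the dimension reduction.
pipe = make_pipeline(PCA(n_components=3), LinearDiscriminantAnalysis())
scores = cross_val_score(pipe, X, y, cv=5)
print(f"mean CV accuracy: {scores.mean():.2f}")
```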
We review current methods for building point spread function (PSF)‐matching kernels for the purposes of image subtraction or co‐addition. Such methods use a linear decomposition of the kernel on a series of basis functions. The correct choice of these basis functions is fundamental to the efficiency and effectiveness of the matching – the chosen bases should represent the underlying signal using a reasonably small number of shapes, and/or have a minimum number of user‐adjustable tuning parameters. We examine methods whose bases comprise multiple Gauss–Hermite polynomials, as well as a form‐free basis composed of delta‐functions. Kernels derived from delta‐functions are unsurprisingly shown to be more expressive; they are able to take more general shapes and perform better in situations where sum‐of‐Gaussian methods are known to fail. However, because of its many degrees of freedom (the maximum number allowed by the kernel size), this basis tends to overfit the problem and yields noisy kernels having large variance. We introduce a new technique to regularize these delta‐function kernel solutions, which bridges the gap between the generality of delta‐function kernels and the compactness of sum‐of‐Gaussian kernels. Through this regularization we are able to create general kernel solutions that represent the intrinsic shape of the PSF‐matching kernel with only one degree of freedom, the strength of the regularization λ. The role of λ is effectively to exchange variance in the resulting difference image for variance in the kernel itself. We examine considerations in choosing the value of λ, including statistical risk estimators and the ability of the solution to predict solutions for adjacent areas. Both of these suggest moderate strengths of λ between 0.1 and 1.0, although this optimization is likely data‐set dependent.
This model allows for flexible representations of the convolution kernel that have significant predictive ability and will prove useful in implementing robust image subtraction pipelines that must address hundreds to thousands of images per night.
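A regularized delta-function kernel fit of this kind can be illustrated as a Tikhonov-penalized least-squares problem. The 1-D toy below is a sketch under stated assumptions (a synthetic random template, a Gaussian true kernel, and a second-difference smoothness operator as the penalty), not the specific formulation of the pipeline described above; `lam` plays the role of the regularization strength λ.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1-D toy: a template t convolved with a narrow Gaussian kernel, plus noise,
# plays the role of the science image s (all data synthetic).
n, m = 200, 11                      # signal length, kernel size
half = m // 2
t = rng.normal(size=n)
true_k = np.exp(-0.5 * (np.arange(m) - half) ** 2 / 2.0)
true_k /= true_k.sum()
s = np.convolve(t, true_k, mode="same") + 0.01 * rng.normal(size=n)

# Delta-function basis: one free amplitude per kernel pixel, so the columns of
# the design matrix are shifted (zero-padded) copies of the template.
t_pad = np.concatenate([np.zeros(half), t, np.zeros(half)])
A = np.column_stack([t_pad[m - 1 - j : m - 1 - j + n] for j in range(m)])

# Tikhonov regularization with a second-difference operator suppresses the
# noisy, high-frequency solutions the unregularized fit would produce.
D = np.diff(np.eye(m), n=2, axis=0)
lam = 0.5
k_hat = np.linalg.solve(A.T @ A + lam * D.T @ D, A.T @ s)
print("max kernel error:", np.abs(k_hat - true_k).max())
```

Raising `lam` makes the recovered kernel smoother at the cost of a larger residual in the difference image, which is the variance trade-off the abstract describes.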
The lasso procedure is ubiquitous in the statistical and signal processing literature and, as such, is the target of substantial theoretical and applied research. While much of this research focuses on the desirable properties that the lasso possesses (predictive risk consistency, sign consistency, correct model selection), all of it assumes that the tuning parameter is chosen in an oracle fashion. Yet this is impossible in practice. Instead, data analysts must use the data twice: once to choose the tuning parameter and again to estimate the model. Only heuristics have ever justified such a procedure. To this end, we give the first definitive answer about the risk consistency of the lasso when the smoothing parameter is chosen via cross-validation. We show that, under some restrictions on the design matrix, the lasso estimator is still risk consistent with an empirically chosen tuning parameter.
The lasso and related sparsity-inducing algorithms have been the target of substantial theoretical and applied research. Correspondingly, many results are known about their behavior for a fixed or optimally chosen tuning parameter specified up to unknown constants. In practice, however, this oracle tuning parameter is inaccessible, so one must use the data to select it. Common statistical practice is to use a variant of cross-validation for this task. However, little is known about the theoretical properties of the resulting predictions under such data-dependent methods. We consider the high-dimensional setting with random design, wherein the number of predictors p grows with the number of observations n. Under typical assumptions on the data-generating process, similar to those in the literature, we recover oracle rates up to a log factor when choosing the tuning parameter with cross-validation. Under weaker conditions, when the true model is not necessarily linear, we show that the lasso remains risk consistent relative to its linear oracle. We also generalize these results to the group lasso and square-root lasso, and investigate the predictive and model-selection performance of cross-validation via simulation.
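The data-dependent tuning procedure both abstracts study, the lasso with its penalty level chosen by K-fold cross-validation, can be sketched with scikit-learn's `LassoCV`. The synthetic sparse regression below is an assumed illustration, not data from either paper.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

# High-dimensional design with p > n and a sparse truth (synthetic example).
X, y, coef = make_regression(n_samples=100, n_features=200, n_informative=5,
                             noise=1.0, coef=True, random_state=0)

# LassoCV fits the lasso path on each training fold and picks the penalty
# minimizing cross-validated prediction error: the "data used twice" setting,
# once to choose the tuning parameter and again to estimate the model.
model = LassoCV(cv=5, random_state=0).fit(X, y)
print(f"chosen alpha: {model.alpha_:.3f}")
print(f"nonzero coefficients: {np.sum(model.coef_ != 0)} (true sparsity: 5)")
```

As the theory suggests, the cross-validated choice typically recovers the strong true coefficients while admitting some extra noise variables, trading exact support recovery for good predictive risk.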