Obtaining accurate photometric redshift (photo-z) estimations is an important aspect of cosmology, remaining a prerequisite of many analyses. In creating novel methods to produce photo-z estimations, there has been a shift towards using machine learning techniques. However, there has not been as much of a focus on how well different machine learning methods scale or perform with the ever-increasing amounts of data being produced. Here, we introduce a benchmark designed to analyse the performance and scalability of different supervised machine learning methods for photo-z estimation. Making use of the Sloan Digital Sky Survey (SDSS - DR12) dataset, we analysed a variety of the most used machine learning algorithms. By scaling the number of galaxies used to train and test the algorithms up to one million, we obtained several metrics demonstrating the algorithms’ performance and scalability for this task. Furthermore, by introducing a new optimisation method, time-considered optimisation, we were able to demonstrate how a small concession of error can allow for a great improvement in efficiency. From the algorithms tested we found that the Random Forest performed best with a mean squared error, MSE = 0.0042; however, as other algorithms such as Boosted Decision Trees and k-Nearest Neighbours performed very similarly, we used our benchmarks to demonstrate how different algorithms could be superior in different scenarios. We believe benchmarks like this will become essential with upcoming surveys, such as the Vera C. Rubin Observatory’s Legacy Survey of Space and Time (LSST), which will capture billions of galaxies requiring photometric redshifts.
Knowing the redshift of galaxies is one of the first requirements of many cosmological experiments, and as it’s impossible to perform spectroscopy for every galaxy being observed, photometric redshift (photo-z) estimations are still of particular interest. Here, we investigate different deep learning methods for obtaining photo-z estimates directly from images, comparing these with ‘traditional’ machine learning algorithms which make use of magnitudes retrieved through photometry. As well as testing a convolutional neural network (CNN) and inception-module CNN, we introduce a novel mixed-input model which allows for both images and magnitude data to be used in the same model as a way of further improving the estimated redshifts. We also perform benchmarking as a way of demonstrating the performance and scalability of the different algorithms. The data used in the study comes entirely from the Sloan Digital Sky Survey (SDSS) from which 1 million galaxies were used, each having 5-filter (ugriz) images with complete photometry and a spectroscopic redshift which was taken as the ground truth. The mixed-input inception CNN achieved a mean squared error (MSE) =0.009, which was a significant improvement ($30{{\ \rm per\ cent}}$) over the traditional Random Forest (RF), and the model performed even better at lower redshifts achieving a MSE = 0.0007 (a $50{{\ \rm per\ cent}}$ improvement over the RF) in the range of z < 0.3. This method could be hugely beneficial to upcoming surveys such as Euclid and the Vera C. Rubin Observatory’s Legacy Survey of Space and Time (LSST) which will require vast numbers of photo-z estimates produced as quickly and accurately as possible.
While it is well established that the rate of COVID-19 infections can be suppressed by social distancing, environmental effects may also affect the infection rate. Here we consider the hypothesis that natural Ultra-Violet (UV) light (UVA and UVB) is reducing COVID-19 infections by enhancing human immunity through vitamin-D and/or by suppressing the virus itself. We focus on the United Kingdom (UK), by examining daily COVID-19 infections (F) and UV Index (UVI) data over the period March to October 2020. We find an intriguing empirical anti-correlation between log10(F) and log10(UVI) with a correlation coefficient of -0.933 over the period from 11 May (when the first UK lockdown ended) to 28 October 2020. The anti-correlation may reflect causation with other factors which are correlated with the UVI. Either way, UVI should be included in modelling the pattern of COVID-19 infections and deaths. We started quantifying such correlations in other countries and regions.
In this paper we investigate how implementing machine learning could improve the efficiency of the search for Trans-Neptunian Objects (TNOs) within Dark Energy Survey (DES) data when used alongside orbit fitting. The discovery of multiple TNOs that appear to show a similarity in their orbital parameters has led to the suggestion that one or more undetected planets, an as yet undiscovered “Planet 9”, may be present in the outer solar system. DES is well placed to detect such a planet and has already been used to discover many other TNOs. Here, we perform tests on eight different supervised machine learning algorithms, using a data set consisting of simulated TNOs buried within real DES noise data. We found that the best performing classifier was the Random Forest which, when optimized, performed well at detecting the rare objects. We achieve an area under the receiver operating characteristic (ROC) curve, (AUC) = 0.996 ± 0.001. After optimizing the decision threshold of the Random Forest, we achieve a recall of 0.96 while maintaining a precision of 0.80. Finally, by using the optimized classifier to pre-select objects, we are able to run the orbit-fitting stage of our detection pipeline five times faster.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.