Many scientific investigations of photometric galaxy surveys require redshift estimates, whose uncertainty properties are best encapsulated by photometric redshift (photo-z) posterior probability density functions (PDFs). Photo-z PDF estimation methodologies abound, producing discrepant results with no consensus on a preferred approach. We present the results of a comprehensive experiment comparing twelve photo-z algorithms applied to mock data produced for the Rubin Observatory Legacy Survey of Space and Time (LSST) Dark Energy Science Collaboration (DESC). By supplying perfect prior information, in the form of the complete template library and a representative training set, as inputs to each code, we demonstrate the impact of the assumptions underlying each technique on the output photo-z PDFs. In the absence of a notion of true, unbiased photo-z PDFs, we evaluate and interpret multiple metrics of the ensemble properties of the derived photo-z PDFs as well as traditional reductions to photo-z point estimates. We report systematic biases and overall over/under-breadth of the photo-z PDFs of many popular codes, which may indicate avenues for improvement in the algorithms or implementations. Furthermore, we draw attention to the limitations of established metrics for assessing photo-z PDF accuracy; although we identify the conditional density estimate (CDE) loss as a promising metric of photo-z PDF performance in the case where true redshifts are available but true photo-z PDFs are not, we emphasize the need for science-specific performance metrics.
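The CDE loss highlighted above has the useful property that it can be estimated from true redshifts alone, without access to true photo-z PDFs: up to an additive constant that does not depend on the estimator, it equals the mean integrated squared density minus twice the mean estimated density evaluated at the true redshift. A minimal NumPy sketch, assuming PDFs tabulated on a common uniform redshift grid (the function name and array layout are ours, not taken from any of the compared codes):

```python
import numpy as np

def cde_loss(pdf_grid, z_grid, z_true):
    """Estimate the CDE loss, up to an additive constant that is the same
    for every estimator, so lower values indicate better photo-z PDFs.

    pdf_grid : (n_obj, n_grid) array of estimated densities f(z | x_i)
    z_grid   : (n_grid,) uniform redshift grid
    z_true   : (n_obj,) true (e.g., spectroscopic) redshifts
    """
    dz = z_grid[1] - z_grid[0]
    # First term: E_x[ integral f(z|x)^2 dz ], via a Riemann sum on the grid
    term1 = dz * np.mean(np.sum(pdf_grid ** 2, axis=1))
    # Second term: -2 E[ f(z_true | x) ], interpolating each PDF at z_true
    f_at_true = np.array([np.interp(zt, z_grid, pdf)
                          for zt, pdf in zip(z_true, pdf_grid)])
    return term1 - 2.0 * np.mean(f_at_true)
```

A correctly centered ensemble of PDFs scores a lower loss than a systematically biased one, which is what makes this usable for ranking codes when only point truths are available.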
Approximate Bayesian Computation (ABC) is typically used when the likelihood is either unavailable or intractable but data can be simulated under different parameter settings using a forward model. Despite the recent interest in ABC, high-dimensional data and costly simulations remain a bottleneck in some applications. There is also no consensus on how best to assess the performance of such methods without knowing the true posterior. We show how a nonparametric conditional density estimation (CDE) framework, which we refer to as ABC-CDE, helps address three nontrivial challenges in ABC: (i) how to efficiently estimate the posterior distribution with limited simulations and different types of data; (ii) how to tune and compare the performance of ABC and related methods in estimating the posterior itself, rather than just certain properties of the density; and (iii) how to efficiently choose among a large set of summary statistics based on a CDE surrogate loss. We provide theoretical and empirical evidence justifying ABC-CDE procedures that directly estimate and assess the posterior based on an initial ABC sample, and we describe settings where standard ABC and regression-based approaches are inadequate.
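The overall shape of such a pipeline, an initial ABC acceptance step followed by a density estimate of the retained parameters, can be sketched in a few lines. This is a simplified stand-in, not the paper's ABC-CDE algorithm: here a kernel density estimate plays the role of the CDE refinement step, and all function and argument names are our own:

```python
import numpy as np
from scipy.stats import gaussian_kde

def abc_posterior_sketch(x_obs, simulate, prior_sample, n_sim=5000, frac=0.1):
    """Rejection ABC followed by a density estimate of the accepted draws.

    simulate     : maps an array of parameters to simulated summary statistics
    prior_sample : draws n parameters from the prior
    frac         : fraction of simulations accepted (the ABC tolerance)
    """
    theta = prior_sample(n_sim)                 # draw parameters from the prior
    dist = np.abs(simulate(theta) - x_obs)      # distance in summary space
    keep = dist <= np.quantile(dist, frac)      # ABC acceptance step
    # Density-estimation step (KDE as a simple stand-in for a CDE refinement)
    return gaussian_kde(theta[keep])

# Toy model: x ~ N(theta, 1) with prior theta ~ N(0, 2); observe x_obs = 1.5
rng = np.random.default_rng(1)
kde = abc_posterior_sketch(
    x_obs=1.5,
    simulate=lambda t: t + rng.normal(size=t.shape),
    prior_sample=lambda n: rng.normal(0.0, 2.0, n),
)
```

For this conjugate toy model the exact posterior mean is 1.2, so the accepted-sample mean should land nearby, shrunk slightly toward the prior by the nonzero tolerance.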
It is well known in astronomy that propagating non-Gaussian prediction uncertainty in photometric redshift estimates is key to reducing bias in downstream cosmological analyses. Similarly, likelihood-free inference approaches, which are beginning to emerge as a tool for cosmological analysis, require the full uncertainty landscape of the parameters of interest given observed data. However, most machine learning (ML) based methods with open-source software target point prediction or classification, and hence fall short in quantifying uncertainty in complex regression and parameter inference settings such as the applications mentioned above. As an alternative to methods that focus on predicting the response (or parameters) y from features x, we provide nonparametric conditional density estimation (CDE) tools for approximating and validating the entire probability density p(y | x) given training data for x and y. This density approach offers a more nuanced accounting of uncertainty in situations with, e.g., nonstandard error distributions and multimodal or heteroskedastic response variables that are often present in astronomical data sets. As there is no one-size-fits-all CDE method, and the ultimate choice of model depends on the application and the training sample size, the goal of this work is to provide a comprehensive range of statistical tools and open-source software for nonparametric CDE and method assessment which can accommodate different types of settings (involving, e.g., mixed-type input from multiple sources, functional data, and image covariates) and which in addition can easily be fit to the problem at hand. Specifically, we introduce CDE software packages in Python and R based on four ML prediction methods adapted and optimized for CDE: NNKCDE, RFCDE, FlexCode, and DeepCDE.
Furthermore, we present the cdetools package, which includes functions for computing a CDE loss function for model selection and tuning of parameters, together with diagnostic functions for computing posterior quantiles and coverage probabilities. We provide sample code in Python and R as well as examples of applications to photometric redshift estimation and likelihood-free cosmology via CDE.
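The diagnostic quantities described here, posterior quantiles (the probability integral transform, PIT) and coverage probabilities, can be illustrated with a short self-contained sketch. The function names and conventions below are ours for illustration and do not reproduce the cdetools API:

```python
import numpy as np

def pit_values(pdf_grid, z_grid, z_true):
    """Probability integral transform: the estimated CDF evaluated at the
    true value. For well-calibrated PDFs the PIT values are Uniform(0, 1)."""
    dz = z_grid[1] - z_grid[0]
    cdf = np.cumsum(pdf_grid, axis=1) * dz
    return np.array([np.interp(zt, z_grid, c)
                     for zt, c in zip(z_true, cdf)])

def central_coverage(pit, alphas):
    """Empirical coverage of central credible intervals: the fraction of
    objects whose PIT lands within the central alpha-mass of its PDF."""
    return np.array([np.mean(np.abs(pit - 0.5) <= a / 2) for a in alphas])
```

For calibrated PDFs the empirical coverage tracks the nominal levels (about 0.5 of objects inside 50% intervals, 0.9 inside 90% intervals); systematic departures flag over- or under-broad PDFs.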
We study the significance of non-Gaussianity in the likelihood of weak lensing shear two-point correlation functions, detecting significantly non-zero skewness and kurtosis in one-dimensional marginal distributions of shear two-point correlation functions in simulated weak lensing data. We examine the implications in the context of future surveys, in particular LSST, with derivations of how the non-Gaussianity scales with survey area. We show that there is no significant bias in one-dimensional posteriors of Ωm and σ8 due to the non-Gaussian likelihood distributions of shear correlation functions using the mock data (100 deg2). We also present a systematic approach to constructing approximate multivariate likelihoods with one-dimensional parametric functions by assuming independence or more flexible non-parametric multivariate methods after decorrelating the data points using principal component analysis (PCA). While the use of PCA does not modify the non-Gaussianity of the multivariate likelihood, we find empirically that the one-dimensional marginal sampling distributions of the PCA components exhibit less skewness and kurtosis than the original shear correlation functions. Modeling the likelihood with marginal parametric functions based on the assumption of independence between PCA components thus gives a lower limit for the biases. We further demonstrate that the difference in cosmological parameter constraints between the multivariate Gaussian likelihood model and more complex non-Gaussian likelihood models would be even smaller for an LSST-like survey. In addition, the PCA approach automatically serves as a data compression method, enabling the retention of the majority of the cosmological information while reducing the dimensionality of the data vector by a factor of ∼5.
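The PCA decorrelation-plus-compression step described above can be sketched directly with NumPy: diagonalize the sample covariance of the mock data vectors, project onto the eigenbasis, and optionally truncate to the leading components. This is a generic illustration under our own naming, not the paper's pipeline:

```python
import numpy as np

def pca_decorrelate(data_vectors, n_keep=None):
    """Decorrelate mock realizations of a data vector with PCA.

    data_vectors : (n_real, n_dim) array, one data-vector realization per row
    n_keep       : if set, keep only the leading components (compression)
    Returns the decorrelated components, the eigenvector basis, and the mean.
    """
    mean = data_vectors.mean(axis=0)
    centered = data_vectors - mean
    cov = np.cov(centered, rowvar=False)        # sample covariance
    eigval, eigvec = np.linalg.eigh(cov)
    order = np.argsort(eigval)[::-1]            # sort by decreasing variance
    eigvec = eigvec[:, order]
    if n_keep is not None:
        eigvec = eigvec[:, :n_keep]             # e.g., ~5x compression
    components = centered @ eigvec              # decorrelated components
    return components, eigvec, mean
```

By construction the sample covariance of the returned components is diagonal, so one-dimensional marginal likelihood models can then be fit to each component independently.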