We extend a previously presented machine learning (ML) framework to model galaxy formation and evolution in a hierarchical universe using N-body + hydrodynamical simulations. In this work, we show that ML is a promising technique for studying galaxy formation against the backdrop of a hydrodynamical simulation. We use the Illustris Simulation to train and test various sophisticated machine learning algorithms. Using only essential dark matter halo physical properties and no merger history, our model predicts the gas mass, stellar mass, black hole mass, star formation rate, g − r color, and stellar metallicity fairly robustly. Our results provide a unique and powerful phenomenological framework, built upon a solid hydrodynamical simulation, for exploring the galaxy-halo connection. The encouraging reproduction of these galaxy properties demonstrably places ML as a promising and significantly more computationally efficient tool for studying small-scale structure formation. We find that ML mimics a full-blown hydrodynamical simulation surprisingly well in a computation time of mere minutes. The population of galaxies simulated by ML, while not numerically identical to Illustris, is statistically robust, physically consistent with Illustris galaxies, and follows the same fundamental observational constraints. Machine learning offers an intriguing technique for creating quick mock galaxy catalogs in the future.
We present a new exploratory framework to model galaxy formation and evolution in a hierarchical universe by using machine learning (ML). Our motivations are twofold: (1) presenting a new, promising technique to study galaxy formation, and (2) quantitatively analyzing the extent of the influence of dark matter halo properties on galaxies against the backdrop of semi-analytical models (SAMs). We use the influential Millennium Simulation and the corresponding Munich SAM to train and test various sophisticated machine learning algorithms (k-Nearest Neighbors, decision trees, random forests, and extremely randomized trees). Using only essential dark matter halo physical properties for haloes of M > 10¹² M⊙ and a partial merger tree, our model predicts the hot gas mass, cold gas mass, bulge mass, total stellar mass, black hole mass, and cooling radius at z = 0 for each central galaxy in a dark matter halo in the Millennium run. Our results provide a unique and powerful phenomenological framework, built upon SAMs, for exploring the galaxy-halo connection, and demonstrably place ML as a promising and computationally efficient tool to study small-scale structure formation.
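To illustrate the simplest of the algorithms named above, the following is a minimal k-nearest-neighbor regression sketch that predicts one galaxy property for a halo from its halo features. It is not the authors' pipeline: the feature choice (log halo mass plus a hypothetical concentration parameter), the toy training values, Euclidean distance, and unweighted averaging are all illustrative assumptions.

```python
import math
from typing import Sequence


def knn_predict(train_X: Sequence[Sequence[float]],
                train_y: Sequence[float],
                query: Sequence[float],
                k: int = 3) -> float:
    """Predict a galaxy property for one halo as the mean target of its
    k nearest training haloes in feature space (Euclidean distance).

    Features should be scaled comparably (e.g. log masses) so that no
    single halo property dominates the distance.
    """
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training haloes")
    # Pair each training halo's distance to the query with its target value.
    dists = [(math.dist(x, query), y) for x, y in zip(train_X, train_y)]
    dists.sort(key=lambda pair: pair[0])
    # Unweighted mean over the k closest haloes.
    return sum(y for _, y in dists[:k]) / k


# Hypothetical toy catalog: [log10 halo mass, concentration] -> log10 stellar mass.
train_X = [[12.0, 8.0], [12.5, 7.0], [13.0, 6.0], [13.5, 5.0]]
train_y = [10.2, 10.6, 10.9, 11.1]

prediction = knn_predict(train_X, train_y, [12.4, 7.2], k=2)  # approximately 10.4
```

Tree-based methods such as random forests and extremely randomized trees replace the fixed distance metric with learned splits on each halo property, which is one reason they tend to handle features of very different importance more gracefully than plain k-NN.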
The clustered nature of star formation should produce a high degree of structure in the combined phase and chemical space in the Galactic disk. To date, observed structure of this kind has been mostly limited to bound clusters and moving groups. In this paper we present a new dynamical model of the Galactic disk that takes into account the clustered nature of star formation. This model predicts that the combined phase and chemical space is rich in substructure, and that this structure is sensitive to both the precise nature of clustered star formation and the large-scale properties of the Galaxy. The model self-consistently evolves 4 billion stars over the last 5 Gyr in a realistic potential that includes an axisymmetric component, a bar, spiral arms, and giant molecular clouds (GMCs). All stars are born in clusters with an observationally motivated range of initial conditions. Because direct N-body calculations for billions of stars are computationally infeasible, we have developed a method of initializing star cluster particles to mimic the effects of direct N-body dynamics, while the actual orbits are integrated as test particles within the analytic potential. We demonstrate that combining chemical and phase-space information is much more effective at identifying truly co-natal populations than either chemical or phase-space information alone. Furthermore, we show that co-moving pairs of stars are very likely to be co-natal if their velocity separation is < 2 km s⁻¹ and their metallicity separation is < 0.05 dex. The results presented here bode well for harnessing the synergies between Gaia and spectroscopic surveys to reveal the assembly history of the Galactic disk.
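The pair criterion quoted above can be expressed as a simple filter. This sketch assumes each star carries a 3D velocity in km s⁻¹ and a metallicity [Fe/H] in dex; the `Star` container, the use of the magnitude of the 3D velocity difference as the "velocity separation", and the strict inequalities (following the "<" in the text) are illustrative assumptions rather than the paper's code.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Star:
    """Hypothetical container for the quantities the criterion needs."""
    vel: Tuple[float, float, float]  # (vx, vy, vz) in km/s
    feh: float                       # metallicity [Fe/H] in dex


def likely_conatal(a: Star, b: Star,
                   dv_max: float = 2.0, dfeh_max: float = 0.05) -> bool:
    """Flag a co-moving pair as likely co-natal when both the 3D velocity
    separation (km/s) and the metallicity separation (dex) fall below
    the thresholds."""
    dv = math.dist(a.vel, b.vel)   # magnitude of the velocity difference
    dfeh = abs(a.feh - b.feh)      # absolute metallicity difference
    return dv < dv_max and dfeh < dfeh_max


# Hypothetical example stars.
s1 = Star((10.0, -5.0, 2.0), -0.10)
s2 = Star((11.0, -5.5, 2.5), -0.08)  # close in velocity and metallicity
s3 = Star((13.0, -5.0, 2.0), -0.10)  # 3 km/s away in velocity
s4 = Star((10.5, -5.0, 2.0), 0.10)   # close in velocity, 0.2 dex apart
```

Requiring both conditions at once reflects the abstract's point that chemistry and kinematics together discriminate co-natal stars far better than either alone.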
It is challenging to reliably identify stars that were born together outside of actively star-forming regions and bound stellar systems. However, co-natal stars should be present throughout the Galaxy, and their demographics can shed light on the clustered nature of star formation and the dynamical state of the disk. In previous work we presented a set of simulations of the Galactic disk that followed the clustered formation and dynamical evolution of 4 billion individual stars over the last 5 Gyr. The simulations predict that a high fraction of co-moving stars with physical and 3D velocity separations of Δr < 20 pc and Δv < 1.5 km s⁻¹ are co-natal. In this Letter, we use Gaia DR2 and LAMOST DR4 data to identify and study co-moving pairs. We find that the distributions of relative velocities and separations of pairs in the data are in good agreement with the predictions from the simulation. We identify 111 co-moving pairs in the solar neighborhood with reliable astrometric and spectroscopic measurements. These pairs show a strong preference for similar metallicities when compared to random field pairs, so we conclude that they were very likely born together. The simulations predict that co-natal pairs originate preferentially from high-mass and relatively young (< 1 Gyr) star clusters. Gaia will eventually deliver well-determined metallicities for the brightest stars, enabling the identification of thousands of co-natal pairs from disrupting star clusters in the solar neighborhood.
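Selecting co-moving pairs with the simulation-motivated cuts above (Δr < 20 pc, Δv < 1.5 km s⁻¹) amounts to a pairwise search over a catalog. The sketch below assumes Cartesian positions in pc and velocities in km s⁻¹ are already available for every star, and uses a brute-force O(N²) loop for clarity; the function name and inputs are hypothetical, and a survey-scale search would more plausibly use a spatial index such as a k-d tree.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def find_comoving_pairs(pos: Sequence[Vec3], vel: Sequence[Vec3],
                        dr_max: float = 20.0,
                        dv_max: float = 1.5) -> List[Tuple[int, int]]:
    """Return index pairs (i, j), i < j, whose physical separation (pc)
    and 3D velocity separation (km/s) are both below the thresholds."""
    if len(pos) != len(vel):
        raise ValueError("pos and vel must describe the same stars")
    pairs = []
    # Brute force: test every unordered pair exactly once.
    for i, j in combinations(range(len(pos)), 2):
        if math.dist(pos[i], pos[j]) < dr_max and math.dist(vel[i], vel[j]) < dv_max:
            pairs.append((i, j))
    return pairs


# Hypothetical three-star catalog: stars 0 and 1 are close in both spaces.
pos = [(0.0, 0.0, 0.0), (5.0, 5.0, 0.0), (100.0, 0.0, 0.0)]
vel = [(10.0, 0.0, 0.0), (10.5, 0.5, 0.0), (10.0, 0.0, 0.0)]
pairs = find_comoving_pairs(pos, vel)  # [(0, 1)]
```

Star 2 shares star 0's velocity exactly but lies 100 pc away, so it is rejected by the spatial cut; both conditions must hold for a pair to count as co-moving.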
The Galactic disk is expected to be spatially and kinematically clustered on many scales due to both star formation and the Galactic potential. In this work we calculate the spatial and kinematic two-point correlation functions (TPCFs) using a sample of 1.7 × 10⁶ stars with radial velocities from Gaia DR2. Clustering is detected on spatial scales of 1–300 pc and a velocity scale of 15 km s⁻¹. After removing bound structures, the data have a power-law index of γ ≈ −1 for 1 pc < Δr < 100 pc and γ ≲ −1.5 for Δr > 100 pc. We interpret these results with the aid of a star-by-star simulation of the Galaxy, in which stars are born in clusters orbiting in a realistic potential that includes spiral arms, a bar, and giant molecular clouds. The simulation largely agrees with the observations at most spatial and kinematic scales. In detail, the TPCF in the simulation is shallower than the data at ≲ 20 pc scales and steeper than the data at ≳ 30 pc. We also find a persistent clustering signal in the kinematic TPCF of the data at large Δv (> 5 km s⁻¹) that is not present in the simulations. We speculate that this mismatch between observations and simulations may be due to two processes: hierarchical star formation and transient spiral arms. We also predict that adding ages and metallicities measured with precisions of 50% and 0.05 dex, respectively, will enhance the clustering signal beyond current measurements.
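A spatial TPCF and its power-law index can be sketched as below. The abstract does not state which estimator was used, so this assumes the simple "natural" (Peebles–Hauser) estimator ξ(r) = (DD/RR)·[N_R(N_R − 1)]/[N_D(N_D − 1)] − 1, with DD and RR the pair counts per separation bin in the data and in an unclustered random catalog; the Landy–Szalay estimator is a common, lower-variance alternative. The index γ is taken as the least-squares slope of log ξ against log r, assuming ξ ∝ r^γ and ξ > 0 in every bin used. All function names are hypothetical.

```python
import math
from bisect import bisect_right
from itertools import combinations
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def pair_counts(points: Sequence[Vec3], edges: Sequence[float]) -> List[int]:
    """Count unordered pairs whose separation falls in [edges[k], edges[k+1])."""
    counts = [0] * (len(edges) - 1)
    for a, b in combinations(points, 2):
        k = bisect_right(edges, math.dist(a, b)) - 1
        if 0 <= k < len(counts):  # ignore pairs outside the binned range
            counts[k] += 1
    return counts


def tpcf_natural(data: Sequence[Vec3], randoms: Sequence[Vec3],
                 edges: Sequence[float]) -> List[float]:
    """Natural estimator xi = (DD/RR) * [Nr(Nr-1)] / [Nd(Nd-1)] - 1 per bin."""
    nd, nr = len(data), len(randoms)
    if nd < 2 or nr < 2:
        raise ValueError("need at least two data points and two random points")
    dd, rr = pair_counts(data, edges), pair_counts(randoms, edges)
    norm = (nr * (nr - 1)) / (nd * (nd - 1))  # corrects for different catalog sizes
    return [(d / r) * norm - 1.0 if r > 0 else float("nan") for d, r in zip(dd, rr)]


def power_law_index(r: Sequence[float], xi: Sequence[float]) -> float:
    """Least-squares slope gamma of log10(xi) vs log10(r), assuming xi ~ r**gamma."""
    xs = [math.log10(v) for v in r]
    ys = [math.log10(v) for v in xi]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# An exact xi ~ 1/r power law recovers gamma close to -1.
gamma = power_law_index([1.0, 10.0, 100.0], [1.0, 0.1, 0.01])
```

The kinematic TPCF follows the same pattern with velocity vectors in place of positions and Δv bins in place of Δr bins; the O(N²) pair loop here is only practical for small illustrative samples, not for 10⁶ stars.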