Cosmo-Pop
Our research focuses on developing generative models to understand galaxy formation and evolution, enabling precise cosmology.
Research Highlights
pop-cosmos: Star formation over 12 Gyr from generative modelling of a deep infrared-selected galaxy catalogue
In this work, we use our pop-cosmos generative model for the galaxy population, trained on a deep IR-selected catalogue from COSMOS2020, to investigate the cosmic star formation history of the Universe. We find that the cosmic star formation rate density peaks at z = 1.3, approximately 600 Myr later than previously thought. We also show that colour-based quiescent galaxy selection suffers up to 20% contamination by dusty star-forming galaxies, while star formation rate-based selection reveals negligible quiescent galaxies below 10^9.5 M☉ at z < 1. Massive galaxies (10^10 to 10^11 M☉) quench on ~1 Gyr timescales, with AGN activity peaking at the transition between star-forming and quiescent states, suggesting feedback operates in a critical regime during this phase transition. [arXiv:2509.20430][ads][Zenodo: Mock Catalog (v1.1.0)]
pop-cosmos: Insights from generative modeling of a deep, infrared-selected galaxy population
We present an extended pop-cosmos generative model for galaxy evolution to z~6 trained on 26-band COSMOS2020 photometric distributions. The framework utilizes score-based diffusion models to parameterize a 16-dimensional prior over stellar population synthesis (SPS) parameters and depth-dependent photometric uncertainties. Model validation demonstrates concordance with empirical scaling relations including stellar mass functions, the star-forming main sequence, and gas-phase/stellar metallicity-mass correlations. Analysis reveals that enhanced mid-infrared AGN activity correlates with elevated star formation rates. Bayesian inference on COSMOS2020 sources yields photometric redshift estimates with negligible bias, scatter, and outlier fraction. Performance metrics establish accuracy sufficient for forward-modeling and parameter inference for Stage IV cosmological surveys. We provide public access to the trained model, synthetic galaxy catalogs, and posterior distributions of redshift and SPS parameters for the COSMOS2020 sample. [arXiv:2506.12122][ads][doi][Zenodo: Mock Catalog (v1.0.1)][Zenodo: MCMC chains (v2.1.1)][Software]
pop-cosmos: A Comprehensive Picture of the Galaxy Population from COSMOS Data
We present pop-cosmos, a comprehensive model of galaxy populations calibrated to over 140,000 COSMOS survey galaxies. It uses a forward modeling approach that combines flexible score-based diffusion modeling with stellar population synthesis to jointly fit population and observational data models. The resulting calibrated model provides robust predictions on key astrophysical relationships including mass functions, mass-metallicity relations, star formation sequences, and dust attenuation across broad redshift ranges while accounting for observational complexities like parameter degeneracies and selection effects. [arXiv:2402.00930][ads][doi]
pop-cosmos: Scaleable Inference of Galaxy Properties and Redshifts with a Data-driven Population Model
We present an efficient Bayesian method for estimating photometric redshifts and galaxy properties using our pretrained "pop-cosmos" population model calibrated with photometric data. Our model employs a score-based diffusion approach for 16 stellar population synthesis parameters and includes detailed nebular emission modeling. Using GPU-accelerated sampling and a neural network emulator (Speculator), we analyzed 292,300 COSMOS2020 galaxies (three times more than previously possible), achieving minimal bias (~10⁻⁴), high accuracy (σMAD = 0.007), and low outlier rates (1.6%) when compared to spectroscopic data. Our results outperformed established methods like EAZY and LePhare while demonstrating good generalization to fainter galaxies, processing at 15 GPU-seconds per galaxy and establishing a framework for upcoming Stage IV galaxy surveys. [arXiv:2406.19437][ads][doi][Zenodo: MCMC chains (v1.3.0)]
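For readers unfamiliar with the photometric redshift metrics quoted above, the following is a minimal sketch of how bias, σMAD, and outlier fraction are often computed against spectroscopic redshifts. The definitions used here (median of Δz/(1+z), scaled median absolute deviation, and a 0.15 outlier threshold) are common conventions and are assumptions on our part; the paper's exact definitions may differ. The function name `photoz_metrics` is hypothetical.

```python
import numpy as np

def photoz_metrics(z_phot, z_spec, outlier_threshold=0.15):
    """Summarise photo-z quality against spectroscopic redshifts.

    Assumed conventions (not taken from the paper):
      dz        = (z_phot - z_spec) / (1 + z_spec)
      bias      = median(dz)
      sigma_mad = 1.4826 * median(|dz - median(dz)|)
      outliers  = fraction with |dz| > outlier_threshold
    """
    z_phot = np.asarray(z_phot, dtype=float)
    z_spec = np.asarray(z_spec, dtype=float)
    dz = (z_phot - z_spec) / (1.0 + z_spec)
    bias = np.median(dz)
    # 1.4826 rescales the MAD to match the standard deviation of a Gaussian.
    sigma_mad = 1.4826 * np.median(np.abs(dz - bias))
    outlier_fraction = np.mean(np.abs(dz) > outlier_threshold)
    return {"bias": bias, "sigma_mad": sigma_mad,
            "outlier_fraction": outlier_fraction}
```

Calling `photoz_metrics(z_phot, z_spec)` on matched arrays of photometric and spectroscopic redshifts returns the three summary numbers as a dictionary.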
Forward Modeling of Galaxy Populations for Cosmological Redshift Distribution Inference
This paper introduces a forward-modeling framework for estimating galaxy redshift distributions that combines population models, stellar population synthesis, and data characterization without requiring spectroscopic calibration. Testing on GAMA and VVDS surveys demonstrated accurate redshift predictions (bias of Δz ≲ 0.003 and Δz ≃ 0.01 respectively), sufficient for Stage III cosmological surveys with potential for Stage IV applications. [arXiv:2207.05819][ads][doi]
SPECULATOR: Emulating Stellar Population Synthesis for Fast and Accurate Galaxy Spectra and Photometry
Speculator is a compact framework that accelerates stellar population synthesis (SPS) computations by a factor of 10³ to 10⁴ while maintaining percent-level accuracy. Using PCA to find spectral basis functions and neural networks to predict the basis coefficients from SPS parameters, Speculator enables differentiable SPS with GPU compatibility for efficient galaxy modeling from observational data. [arXiv:1911.11778][ads][doi][Software]
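The two-stage idea described above (a PCA basis for spectra, plus a map from parameters to basis coefficients) can be illustrated with a toy sketch. This is not the Speculator code or API: the "SPS model" here is a hypothetical two-parameter toy (`toy_sps`), and a linear least-squares fit stands in for the neural network purely to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy "SPS" model: spectra depend linearly on two parameters.
wavelengths = np.linspace(0.0, 1.0, 200)
templates = np.stack([np.sin(2 * np.pi * wavelengths), wavelengths ** 2])

def toy_sps(theta):
    """Return toy spectra of shape (n_samples, 200) for parameters theta."""
    return np.atleast_2d(theta) @ templates + 1.0

# Training set of parameters and the corresponding toy spectra.
theta_train = rng.uniform(-1.0, 1.0, size=(500, 2))
spectra_train = toy_sps(theta_train)

# Step 1: PCA basis from the SVD of the mean-subtracted training spectra.
mean_spectrum = spectra_train.mean(axis=0)
_, _, vt = np.linalg.svd(spectra_train - mean_spectrum, full_matrices=False)
n_components = 2
basis = vt[:n_components]                       # shape (2, 200)
coeffs_train = (spectra_train - mean_spectrum) @ basis.T

# Step 2: map parameters to PCA coefficients. A linear least-squares fit
# with an intercept stands in for the neural network (assumption).
design_train = np.hstack([theta_train, np.ones((len(theta_train), 1))])
weights, *_ = np.linalg.lstsq(design_train, coeffs_train, rcond=None)

def emulate(theta):
    """Predict spectra from parameters via the PCA basis."""
    theta = np.atleast_2d(theta)
    design = np.hstack([theta, np.ones((len(theta), 1))])
    return mean_spectrum + (design @ weights) @ basis

# Check the emulator against the toy model on held-out parameters.
theta_test = rng.uniform(-1.0, 1.0, size=(50, 2))
max_error = np.max(np.abs(emulate(theta_test) - toy_sps(theta_test)))
```

Because the toy spectra are exactly linear in the parameters, the linear map recovers them to numerical precision; a realistic SPS model is nonlinear, which is where a neural network for the coefficients becomes necessary.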
Hierarchical Bayesian Inference of Photometric Redshifts with Stellar Population Synthesis Models
We introduce a Bayesian hierarchical framework combining stellar population synthesis models with neural emulators to analyze galaxy survey data. Our approach integrates spectral energy distribution modeling with population and noise models to characterize galaxy properties while separating sources of bias and uncertainty. Testing on COSMOS field galaxies with 26-band photometry demonstrates photometric redshift accuracy competitive with existing catalogs. This computationally efficient method addresses calibration issues for emission-line luminosities and offers a promising approach for meeting accuracy requirements of future cosmological surveys while potentially connecting cosmology and galaxy evolution studies. [arXiv:2207.07673][ads][doi]
Data-space Validation of High-dimensional Models by Comparing Sample Quantiles
We propose a method to assess high-dimensional model performance using sample-based quantile comparisons of observables. For high-dimensional data, we recommend projecting onto principal axes before comparison. After demonstrating our approach with 2D examples, we evaluate a score-based diffusion model for galaxy photometry (pop-cosmos) by comparing its predictions against nine broadband colors. This technique offers a broadly applicable approach for validating nonparametric population models and comparing sample sets across domains. [arXiv:2402.00930][ads][doi][Software]
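The projection-then-compare idea described above can be sketched in a few lines. This is an illustrative sketch under our own assumptions, not the released software: the function name `quantile_comparison` is hypothetical, the principal axes are taken from the data sample (the paper may define them differently), and the quantile grid is an arbitrary choice.

```python
import numpy as np

def quantile_comparison(model_samples, data_samples, n_quantiles=19):
    """Compare two sample sets via quantiles along principal axes.

    Both inputs have shape (n_samples, n_dims). The principal axes are
    computed from data_samples (an assumption made for this sketch).
    Returns the probability grid and the per-axis quantiles of each set,
    with shape (n_quantiles, n_dims).
    """
    model_samples = np.asarray(model_samples, dtype=float)
    data_samples = np.asarray(data_samples, dtype=float)

    # Principal axes from the SVD of the mean-subtracted data sample.
    mean = data_samples.mean(axis=0)
    _, _, vt = np.linalg.svd(data_samples - mean, full_matrices=False)

    # Project both sample sets onto the same axes.
    model_proj = (model_samples - mean) @ vt.T
    data_proj = (data_samples - mean) @ vt.T

    # Evaluate matching quantiles along each axis.
    probs = np.linspace(0.05, 0.95, n_quantiles)
    q_model = np.quantile(model_proj, probs, axis=0)
    q_data = np.quantile(data_proj, probs, axis=0)
    return probs, q_model, q_data
```

Plotting `q_model` against `q_data` for each axis gives a quantile-quantile diagram; points lying along the one-to-one line indicate agreement between the model and data distributions along that axis.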
Our Team
Dr. Justin Alsing
Assistant Professor
Stockholm University
Dr. Sinan Deger
Research Associate
University of Cambridge
Ms. Joy Gong
MPhil Student
University of Cambridge
Dr. Anik Halder
Research Associate
University of Cambridge
Mr. Gurjeet Jagwani
Machine Learning Research Software Engineer
University of Cambridge
Dr. Boris Leistedt
Lecturer
Imperial College London
Dr. Joel Leja
Associate Professor
Pennsylvania State University
Dr. Daniel Mortlock
Professor
Imperial College London
Dr. Hiranya Peiris (PI)
Professor of Astrophysics (1909)
University of Cambridge
Dr. Stephen Thorp
Research Associate
University of Cambridge
Dr. Madalina Tudorache
Research Associate
University of Cambridge
Mr. Benedict Van den Bussche
PhD Student
University of Cambridge