Cosmo-Pop

Our research focuses on developing generative models to understand galaxy formation and evolution, enabling precise cosmology.

Research Highlights

Cosmic star formation rate density

pop-cosmos: Star formation over 12 Gyr from generative modelling of a deep infrared-selected galaxy catalogue

In this work, we use our pop-cosmos generative model for the galaxy population, trained on a deep IR-selected catalogue from COSMOS2020, to investigate the cosmic star formation history of the Universe. We find that the cosmic star formation rate density peaks at z = 1.3, approximately 600 Myr later than previously thought. We also show that colour-based quiescent galaxy selection suffers up to 20% contamination by dusty star-forming galaxies, while star formation rate-based selection finds a negligible population of quiescent galaxies below 10^9.5 M⊙ at z < 1. Massive galaxies (10^10–10^11 M⊙) quench on ~1 Gyr timescales, with AGN activity peaking at the transition between star-forming and quiescent states, suggesting that feedback operates in a critical regime during this phase transition. [arXiv:2509.20430][ads][Zenodo: Mock Catalog (v1.1.0)]

Color-redshift relations

pop-cosmos: Insights from generative modeling of a deep, infrared-selected galaxy population

We present an extended pop-cosmos generative model for galaxy evolution to z ~ 6, trained on 26-band COSMOS2020 photometric distributions. The framework uses score-based diffusion models to parameterize a 16-dimensional prior over stellar population synthesis (SPS) parameters, and models depth-dependent photometric uncertainties. Model validation demonstrates concordance with empirical scaling relations, including stellar mass functions, the star-forming main sequence, and the gas-phase and stellar mass–metallicity relations. The analysis also reveals that enhanced mid-infrared AGN activity correlates with elevated star formation rates. Bayesian inference on COSMOS2020 sources yields photometric redshift estimates with negligible bias, scatter, and outlier fraction. These performance metrics establish accuracy sufficient for forward modeling and parameter inference for Stage IV cosmological surveys. We provide public access to the trained model, synthetic galaxy catalogs, and posterior distributions of redshift and SPS parameters for the COSMOS2020 sample. [arXiv:2506.12122][ads][doi][Zenodo: Mock Catalog (v1.0.1)][Zenodo: MCMC chains (v2.1.1)][Software]

COSMOS2020 filter transmission curves

pop-cosmos: A Comprehensive Picture of the Galaxy Population from COSMOS Data

We present pop-cosmos, a comprehensive model of the galaxy population calibrated to over 140,000 COSMOS survey galaxies. It uses a forward-modeling approach that combines flexible score-based diffusion modeling with stellar population synthesis to jointly fit the population model and the model of the observational data. The resulting calibrated model provides robust predictions of key astrophysical relationships, including mass functions, mass–metallicity relations, star formation sequences, and dust attenuation, across a broad redshift range, while accounting for observational complexities such as parameter degeneracies and selection effects. [arXiv:2402.00930][ads][doi]

Photo-z versus spec-z

pop-cosmos: Scaleable Inference of Galaxy Properties and Redshifts with a Data-driven Population Model

We present an efficient Bayesian method for estimating photometric redshifts and galaxy properties using our pretrained pop-cosmos population model, which is calibrated on photometric data. The model employs a score-based diffusion prior over 16 stellar population synthesis parameters and includes detailed nebular emission modeling. Using GPU-accelerated sampling and a neural network emulator (Speculator), we analyzed 292,300 COSMOS2020 galaxies, three times more than previously possible, achieving minimal bias (~10⁻⁴), high accuracy (σMAD = 0.007), and a low outlier rate (1.6%) relative to spectroscopic redshifts. Our results outperform established methods such as EAZY and LePhare and generalize well to fainter galaxies. Processing takes 15 GPU-seconds per galaxy, establishing a framework for upcoming Stage IV galaxy surveys. [arXiv:2406.19437][ads][doi][Zenodo: MCMC chains (v1.3.0)]
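
The three accuracy metrics quoted above (bias, σMAD, outlier rate) can be computed as in the minimal Python/NumPy sketch below. It assumes the widely used conventions Δz = (z_phot − z_spec)/(1 + z_spec), a normalized median absolute deviation for σMAD, and an outlier cut of |Δz| > 0.15; the paper's exact definitions may differ, and `photoz_metrics` is a hypothetical helper, not part of the released software.

```python
import numpy as np


def photoz_metrics(z_phot, z_spec, outlier_cut=0.15):
    """Return (bias, sigma_mad, outlier_fraction) for photometric redshifts.

    Assumed conventions (may differ from the paper):
      dz        = (z_phot - z_spec) / (1 + z_spec)
      bias      = median(dz)
      sigma_mad = 1.4826 * median(|dz - median(dz)|)
      outliers  = fraction of objects with |dz| > outlier_cut
    """
    z_phot = np.asarray(z_phot, dtype=float)
    z_spec = np.asarray(z_spec, dtype=float)
    dz = (z_phot - z_spec) / (1.0 + z_spec)
    bias = np.median(dz)
    # 1.4826 rescales the median absolute deviation to a Gaussian standard deviation
    sigma_mad = 1.4826 * np.median(np.abs(dz - bias))
    outlier_fraction = np.mean(np.abs(dz) > outlier_cut)
    return bias, sigma_mad, outlier_fraction


# Toy check: Gaussian scatter of 0.007 * (1 + z) around the true redshifts
rng = np.random.default_rng(0)
z_true = rng.uniform(0.1, 2.0, size=10_000)
z_est = z_true + 0.007 * (1.0 + z_true) * rng.standard_normal(z_true.size)
bias, sigma_mad, f_out = photoz_metrics(z_est, z_true)
print(f"bias={bias:.1e}  sigma_MAD={sigma_mad:.4f}  outliers={f_out:.1%}")
```

The median-based statistics are used because they are insensitive to the catastrophic outliers that the third metric counts separately.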

Forward modeled n(z) for GAMA

Forward Modeling of Galaxy Populations for Cosmological Redshift Distribution Inference

This paper introduces a forward-modeling framework for estimating galaxy redshift distributions that combines population models, stellar population synthesis, and data characterization without requiring spectroscopic calibration. Tests on the GAMA and VVDS surveys demonstrated accurate recovery of the redshift distributions (biases of Δz ≲ 0.003 and Δz ≃ 0.01, respectively), sufficient for Stage III cosmological surveys, with potential for Stage IV applications. [arXiv:2207.05819][ads][doi]
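
As a rough illustration of the forward-modeling idea, the Python sketch below draws galaxies from a population model, simulates noisy observed magnitudes, applies a survey selection, and histograms the true redshifts of the selected objects to obtain n(z). Everything in it is a hypothetical toy: the population (a gamma distribution in redshift and a Gaussian luminosity function), the crude luminosity-distance approximation, the 0.1 mag noise, and the function name `forward_model_nz`. The published framework instead uses a full population model with stellar population synthesis and a data model.

```python
import numpy as np

C_OVER_H0_MPC = 4283.0  # c / H0 in Mpc, for an assumed H0 = 70 km/s/Mpc


def forward_model_nz(n_draws, mag_limit, z_edges, rng, mag_noise=0.1):
    """Toy forward model of a flux-limited redshift distribution n(z)."""
    # 1. Draw true redshifts and absolute magnitudes from a toy population model
    z = rng.gamma(shape=2.0, scale=0.25, size=n_draws)
    abs_mag = rng.normal(loc=-20.5, scale=1.0, size=n_draws)
    # 2. Apparent magnitudes with a crude luminosity distance d_L ~ (c/H0) z (1 + z)
    d_lum_mpc = C_OVER_H0_MPC * z * (1.0 + z)
    app_mag = abs_mag + 5.0 * np.log10(d_lum_mpc) + 25.0
    # 3. Data model: add Gaussian photometric noise to the apparent magnitudes
    obs_mag = app_mag + mag_noise * rng.standard_normal(n_draws)
    # 4. Selection: keep objects brighter than the survey magnitude limit
    z_selected = z[obs_mag < mag_limit]
    # 5. n(z) is the normalized histogram of the true redshifts of selected objects
    nz, _ = np.histogram(z_selected, bins=z_edges, density=True)
    return nz, z_selected


rng = np.random.default_rng(42)
z_edges = np.linspace(0.0, 4.0, 81)
nz_bright, z_bright = forward_model_nz(100_000, 19.8, z_edges, rng)
nz_deep, z_deep = forward_model_nz(100_000, 22.0, z_edges, rng)
print(f"mean z: bright={z_bright.mean():.3f}, deep={z_deep.mean():.3f}")
```

Because n(z) comes from the true redshifts of simulated objects that pass the selection, no spectroscopic calibration sample enters the estimate; its accuracy rests entirely on how well the population and data models describe the survey.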

Speculator network

SPECULATOR: Emulating Stellar Population Synthesis for Fast and Accurate Galaxy Spectra and Photometry

A compact framework that accelerates stellar population synthesis (SPS) computations by a factor of 10³–10⁴ while maintaining percent-level accuracy. Using PCA to find spectral basis functions and neural networks to predict the basis coefficients from SPS parameters, Speculator enables differentiable, GPU-compatible SPS for efficient modeling of galaxies from observational data. [arXiv:1911.11778][ads][doi][Software]
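
The emulation strategy described above (a PCA basis for the spectra, plus a regression from SPS parameters to PCA coefficients) can be sketched in a few lines of Python/NumPy. To keep the sketch short and runnable, a quadratic least-squares regression stands in for Speculator's neural network, and the training "spectra" come from a hypothetical, deliberately low-rank toy model rather than an SPS code; all function names are illustrative and are not the Speculator API.

```python
import numpy as np


def quadratic_features(params):
    """Constant, linear and unique quadratic terms of the SPS parameters."""
    n, d = params.shape
    iu = np.triu_indices(d)
    quad = (params[:, :, None] * params[:, None, :])[:, iu[0], iu[1]]
    return np.hstack([np.ones((n, 1)), params, quad])


def fit_emulator(params, log_spectra, n_components):
    """PCA-compress log spectra, then regress PCA coefficients on parameters."""
    mean = log_spectra.mean(axis=0)
    centred = log_spectra - mean
    # Rows of vt are the principal spectral basis functions
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    basis = vt[:n_components]
    coeffs = centred @ basis.T
    # Stand-in for the neural network: least-squares quadratic map to coefficients
    weights, *_ = np.linalg.lstsq(quadratic_features(params), coeffs, rcond=None)
    return mean, basis, weights


def emulate(params, mean, basis, weights):
    """Predict log spectra as mean + (predicted coefficients) x (PCA basis)."""
    return mean + quadratic_features(params) @ weights @ basis


# Hypothetical toy "SPS model": log flux depends on two parameters (a, b)
wave = np.linspace(3000.0, 10000.0, 500)
x = np.log(wave / 5500.0)
line = np.exp(-0.5 * ((wave - 6563.0) / 20.0) ** 2)


def toy_log_spectrum(theta):
    a, b = theta[:, :1], theta[:, 1:]
    return a * x - b * x**2 + a * b * line


rng = np.random.default_rng(1)
theta_train = rng.uniform(0.0, 1.0, size=(200, 2))
theta_test = rng.uniform(0.0, 1.0, size=(50, 2))
model = fit_emulator(theta_train, toy_log_spectrum(theta_train), n_components=3)
max_err = np.max(np.abs(emulate(theta_test, *model) - toy_log_spectrum(theta_test)))
print(f"max |log-flux error| on test set: {max_err:.2e}")
```

Compressing to a handful of PCA coefficients means the regressor only has to predict a few numbers per parameter vector instead of every wavelength bin. Real SPS spectra are not exactly low-rank or quadratic, which is why a flexible neural network is used in practice.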

Probabilistic graphical model for photo-z

Hierarchical Bayesian Inference of Photometric Redshifts with Stellar Population Synthesis Models

We introduce a Bayesian hierarchical framework combining stellar population synthesis models with neural emulators to analyze galaxy survey data. Our approach integrates spectral energy distribution modeling with population and noise models to characterize galaxy properties while separating sources of bias and uncertainty. Testing on COSMOS field galaxies with 26-band photometry demonstrates photometric redshift accuracy competitive with existing catalogs. This computationally efficient method addresses calibration issues for emission-line luminosities and offers a promising approach for meeting accuracy requirements of future cosmological surveys while potentially connecting cosmology and galaxy evolution studies. [arXiv:2207.07673][ads][doi]

Quantile-quantile plotting

Data-space Validation of High-dimensional Models by Comparing Sample Quantiles

We propose a method to assess the performance of high-dimensional models by comparing sample quantiles of observables. For high-dimensional data, we recommend projecting onto principal axes before comparison. After demonstrating the approach with 2D examples, we evaluate a score-based diffusion model for galaxy photometry (pop-cosmos) by comparing its predicted distribution of nine broadband colors with the observed data. This technique offers a broadly applicable approach for validating nonparametric population models and comparing sample sets across domains. [arXiv:2402.00930][ads][doi][Software]
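
A minimal Python/NumPy sketch of the comparison described above: both sample sets are projected onto the principal axes of the reference (data) sample, standardized along each axis, and their sample quantiles are compared axis by axis, so that a quantile-quantile plot of `q_model` against `q_data` should fall on the diagonal when the model matches the data. The function name, the choice of 19 quantile levels, and the Gaussian toy samples standing in for nine-color photometry are assumptions for illustration, not the released software.

```python
import numpy as np


def projected_quantiles(data, model, probs=np.linspace(0.05, 0.95, 19)):
    """Quantiles of two sample sets along the principal axes of `data`.

    Returns two arrays of shape (len(probs), n_dims), in units of the data's
    standard deviation along each principal axis.
    """
    mean = data.mean(axis=0)
    # Principal axes of the reference (data) sample
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    data_proj = (data - mean) @ vt.T
    model_proj = (model - mean) @ vt.T
    scale = data_proj.std(axis=0)
    q_data = np.quantile(data_proj / scale, probs, axis=0)
    q_model = np.quantile(model_proj / scale, probs, axis=0)
    return q_data, q_model


# Toy example: nine correlated "colors" drawn from a Gaussian
rng = np.random.default_rng(7)
n_dims = 9
cov = 0.5 * np.eye(n_dims) + 0.5  # unit variances, pairwise correlation 0.5
chol = np.linalg.cholesky(cov)
data = rng.standard_normal((20_000, n_dims)) @ chol.T
good_model = rng.standard_normal((20_000, n_dims)) @ chol.T
biased_model = good_model + 1.0  # every color offset by one unit

q_data, q_good = projected_quantiles(data, good_model)
_, q_biased = projected_quantiles(data, biased_model)
print("max |dq| (matching model):", np.abs(q_good - q_data).max())
print("max |dq| (offset model):  ", np.abs(q_biased - q_data).max())
```

Projecting onto the data's principal axes turns one hard high-dimensional comparison into a set of one-dimensional ones along the directions of greatest variation, where quantile differences are easy to read off.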

Our Team

Dr. Justin Alsing

Assistant Professor

Stockholm University

Dr. Sinan Deger

Research Associate

University of Cambridge

Ms. Joy Gong

MPhil Student

University of Cambridge

Dr. Anik Halder

Research Associate

University of Cambridge

Mr. Gurjeet Jagwani

Machine Learning Research Software Engineer

University of Cambridge

Dr. Boris Leistedt

Lecturer

Imperial College London

Dr. Joel Leja

Associate Professor

Pennsylvania State University

Dr. Daniel Mortlock

Professor

Imperial College London

Dr. Hiranya Peiris (PI)

Professor of Astrophysics (1909)

University of Cambridge

Dr. Stephen Thorp

Research Associate

University of Cambridge

Dr. Madalina Tudorache

Research Associate

University of Cambridge

Mr. Benedict Van den Bussche

PhD Student

University of Cambridge