This is the data release for Colloms et al. 2025 (arXiv, dcc). In this work we demonstrate the use of normalising flows to emulate population synthesis simulations and to perform continuous inference on the simulation inputs (natal spin and common envelope efficiency) and on the relative rates between five formation channels.
This data release includes hyperposterior samples from our discrete and continuous inference, the normalising flow models used for the analysis, the processed gravitational wave event samples, and additional auxiliary data for the remaining paper results.
The analysis was produced using the updated AMAZE framework, which can be found in this git repository. The results were produced with this version of the code. The population synthesis models used to train the normalising flows were initially produced for Zevin et al. (2021) and are contained in this data release.
Data included:
- `inference_samples.tar.gz` contains the hyperposterior samples from our three inference results: continuous inference with normalising flows, discrete inference with normalising flows, and discrete inference with KDEs. Each inference result contains five hdf5 files, each for a different run instance with a different random seed, which were combined for the published result.
- `cont_GWTC3/` contains the continuous result files, with samples of the natal spin, common envelope efficiency, and underlying branching fractions
- `discrete_GWTC3/flow/` contains the discrete result files using normalising flows
- `discrete_GWTC3/KDEs/` contains the discrete result files using KDEs
Within each hdf5 file, the keys are as follows:
- `model_selection/samples` contains the hyperposterior samples for natal spin, common envelope efficiency, and the five underlying branching fractions inferred (for the discrete results, the natal spin and common envelope efficiency samples are represented by the model index).
- `model_selection/obsdata` contains the combined GW posterior samples of chirp mass, mass ratio, effective inspiral spin, and redshift for each event used in the inference.
- `model_selection/lnprob` contains the log probability of each hyperposterior sample.
- `model_selection/raw_samples` contains the raw MCMC samples, before flooring to a particular model index for the discrete results. These are identical to `model_selection/samples` for the continuous inference.
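As a minimal sketch of how the five run files of one inference result might be read and pooled, the snippet below stacks the dataset stored under a given key across runs. The function names `stack_runs` and `load_combined` are hypothetical and not part of AMAZE; the sketch assumes that the first array axis indexes samples and that plain concatenation is an acceptable way to combine runs, neither of which is stated in this release.

```python
import numpy as np

SAMPLES_KEY = "model_selection/samples"  # key named in this data release


def stack_runs(runs, key=SAMPLES_KEY):
    """Concatenate the dataset stored under `key` across several runs.

    `runs` is any iterable of mapping-like objects (open h5py.File handles,
    or plain dicts). Assumption: the first axis indexes samples.
    """
    return np.concatenate([np.asarray(run[key]) for run in runs], axis=0)


def load_combined(paths, key=SAMPLES_KEY):
    """Open each hdf5 file in `paths` and stack the dataset under `key`."""
    import h5py  # imported lazily so stack_runs works without h5py installed

    arrays = []
    for path in paths:
        with h5py.File(path, "r") as f:
            # [()] reads the full dataset into memory as a NumPy array
            arrays.append({key: f[key][()]})
    return stack_runs(arrays, key)
```

For example, `load_combined(sorted(glob.glob("cont_GWTC3/*.hdf5")))` would pool the continuous-inference runs after extracting the archive; the glob pattern here is a guess at the file naming, so adjust it to the actual file names.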
This also includes:
- `cont_detectable_GWTC3/`, containing the hyperposterior samples for natal spin, common envelope efficiency, and the detectable branching fractions with continuous inference.
- `flow_models.tar.gz` contains the normalising flow models (saved as PyTorch model objects, using PyTorch version 1.12.1), the training and validation losses for each training epoch, the mapping constants used for the initial transformation of the training data, and a config file with the architecture for each normalising flow. We include these data products for each formation channel: common envelope (CE), chemically homogeneous evolution (CHE), globular clusters (GC), nuclear star clusters (NSC), and stable mass transfer (SMT).
- `{channel}.pt` is the trained normalising flow model used in the analysis, as a PyTorch model. These may be loaded as Nflow objects within the AMAZE framework.
- `{channel}_loss_history.csv` contains the training epoch number, training loss, validation loss, and learning rate at each epoch of training for each normalising flow.
- `{channel}_mappings.npy` contains the constants used in the logistic mapping for the chirp mass, mass ratio, and redshift samples for each channel. See Colloms et al. Appendix A for details of how these are used.
- We also include `flows_mapping.json` as a human-readable version of the mappings.
- `flowconfig.json` contains the network architecture (number of transforms, number of neurons per layer, and number of spline bins) used for each normalising flow.
- `gw_events.tar.gz` contains the posterior samples from the GWTC-2.1 and GWTC-3 data releases. Each event contains samples of chirp mass, mass ratio, effective inspiral spin, and redshift, along with a prior value calculated for each sample, `p_theta_jcb`. These samples were created with the notebook `process_GWTC_data.ipynb`.
- `plot_data.tar.gz` contains auxiliary data used for plotting the samples drawn from the normalising flow and KDE models, and the log likelihood ratio between the normalising flows and the KDEs.
- `dataspace_samps.hdf5` contains samples from the normalising flow used to make Figure 5, and samples from parametric results drawn from the default models used in Abbott et al. (2022). The normalising flow samples are stored in `flow_samps/{channel}`, where the number of samples for each channel is representative of the inferred branching fractions from continuous inference.
- `emulation_samps.hdf5` contains normalising flow and KDE samples from the CE channel model representations at a natal spin value of 0 and a CE efficiency value of 2, used to make Figure 1.
- `test_flow_samps.hdf5` contains samples from the CE flow model trained with the test population removed, evaluated at that population's natal spin value of 0.1 and CE efficiency value of 1, used to make Figure 2.
- `KLs.json` contains the Kullback-Leibler divergence values between the normalising flows and the KDEs for each channel, at each of the population synthesis model points.
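As a rough illustration of how the `{channel}_loss_history.csv` files in `flow_models.tar.gz` might be inspected, the sketch below finds the epoch with the lowest validation loss. It assumes a single header row followed by columns in the order listed above (epoch, training loss, validation loss, learning rate); the actual header names and any extra index column are not specified in this release, so check a file before relying on it. The helper name `best_epoch` is hypothetical.

```python
import csv
import io


def best_epoch(csv_file):
    """Return (epoch, validation_loss) for the row with the lowest validation loss.

    Assumption: one header row, then columns ordered as
    epoch, training loss, validation loss, learning rate.
    """
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row (assumed present)
    rows = [(int(float(r[0])), float(r[2])) for r in reader if r]
    return min(rows, key=lambda row: row[1])


# Illustrative numbers only, not values from the data release.
example = io.StringIO(
    "epoch,train_loss,val_loss,lr\n"
    "0,5.2,5.4,0.001\n"
    "1,4.1,4.3,0.001\n"
    "2,3.9,4.5,0.0005\n"
)
print(best_epoch(example))  # (1, 4.3)
```

With a real file, pass an open handle instead, for example `with open("CE_loss_history.csv", newline="") as f: best_epoch(f)`, where the file name follows the `{channel}_loss_history.csv` pattern for the CE channel.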