22 datasets found

Synthetic datasets of the UK Biobank cohort
zenodo.org
data.niaid.nih.gov
bin, csv, pdf, zip
Updated Sep 17, 2025
Cite
Antonio Gasparrini; Jacopo Vanoli (2025). Synthetic datasets of the UK Biobank cohort [Dataset]. http://doi.org/10.5281/zenodo.13983170
Available download formats: bin, csv, zip, pdf
Unique identifier
https://doi.org/10.5281/zenodo.13983170
Dataset updated
Sep 17, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Antonio Gasparrini; Jacopo Vanoli
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository stores synthetic datasets derived from the database of the UK Biobank (UKB) cohort.

The datasets were generated for illustrative purposes, in particular for reproducing specific analyses on the health risks associated with long-term exposure to air pollution using the UKB cohort. The code used to create the synthetic datasets is available and documented in a related GitHub repo, with details provided in the section below. These datasets can be freely used for code testing and for illustrating other examples of analyses on the UKB cohort.

The synthetic data have been used so far in two analyses described in related peer-reviewed publications, which also provide information about the original data sources:

Vanoli J, et al. Long-term associations between time-varying exposure to ambient PM2.5 and mortality: an analysis of the UK Biobank. Epidemiology. 2025;36(1):1-10. DOI: 10.1097/EDE.0000000000001796 [freely available, with code provided in a GitHub repo]

Vanoli J, et al. Confounding issues in air pollution epidemiology: an empirical assessment with the UK Biobank cohort. International Journal of Epidemiology. 2025;54(5):dyaf163. DOI: 10.1093/ije/dyaf163 [freely available, with code provided in a GitHub repo]

Note: while the synthetic versions of the datasets resemble the real ones in several aspects, the users should be aware that these data are fake and must not be used for testing and making inferences on specific research hypotheses. Even more importantly, these data cannot be considered a reliable description of the original UKB data, and they must not be presented as such.

The work was supported by the Medical Research Council-UK (Grant ID: MR/Y003330/1).

Content

The synthetic datasets (each stored in both csv and RDS formats) are the following:

synthbdcohortinfo: basic cohort information regarding the follow-up period and birth/death dates for 502,360 participants.

synthbdbasevar: baseline variables, mostly collected at recruitment.

synthpmdata: annual average exposure to PM2.5 for each participant reconstructed using their residential history.

synthoutdeath: death records that occurred during the follow-up with date and ICD-10 code.

In addition, this repository provides the following files:

codebook: a pdf file with a codebook for the variables of the various datasets, including references to the fields of the original UKB database.

asscentre: a csv file with information on the assessment centres used for recruitment of the UKB participants, including code, names, and location (as northing/easting coordinates of the British National Grid).

Countries_December_2022_GB_BUC: a zip file including the shapefile defining the boundaries of the countries in Great Britain (England, Wales, and Scotland), used for mapping purposes [source].

Generation of the synthetic data

The datasets resemble the real data used in the analysis, and they were generated using the R package synthpop (www.synthpop.org.uk). The generation process involves two steps, namely the synthesis of the main data (cohort info, baseline variables, annual PM2.5 exposure) and then the sampling of death events. The R scripts for performing the data synthesis are provided in the GitHub repo (subfolder Rcode/synthcode).

The first step merges all the data, including the annual PM2.5 levels, into a single wide-format dataset (with a row for each subject), generates a synthetic version, adds fake IDs, and then extracts (and reshapes) the individual datasets. In the second step, a Cox proportional hazards model is fitted on the original data to estimate risks associated with various predictors (including the main exposure represented by PM2.5), and then these relationships are used to simulate death events in each year. Details on the modelling aspects are provided in the article.

This process guarantees that the synthetic data do not hold specific information about the original records, thus preserving confidentiality. At the same time, the multivariate distribution and correlation across variables, as well as the mortality risks, resemble those of the original data, so the results of descriptive and inferential analyses are similar to those in the original assessments. However, as noted above, the data are used only for illustrative purposes, and they must not be used to test other research hypotheses.
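
As a rough illustration of the second generation step (simulating death events year by year from model-based risks), the sketch below draws a year of death from yearly hazards. It is not the authors' R code: the function name, the baseline hazard values, and the assumption of a constant hazard within each year are all hypothetical choices made for this example. The actual procedure is documented in the GitHub repo and the article.

```python
import math
import random


def simulate_death_year(baseline_hazards, linear_predictors, rng):
    """Return the index of the follow-up year of death, or None if alive at the end.

    baseline_hazards: assumed baseline hazard for each follow-up year.
    linear_predictors: log hazard ratio for the same years, which lets
    time-varying exposures such as annual PM2.5 change the risk over time.
    """
    for year, (h0, lp) in enumerate(zip(baseline_hazards, linear_predictors)):
        hazard = h0 * math.exp(lp)
        # Probability of dying within the year under a constant yearly hazard.
        p_death = 1.0 - math.exp(-hazard)
        if rng.random() < p_death:
            return year
    return None


# Hypothetical subject: 10 years of follow-up with slowly increasing risk.
rng = random.Random(2025)
hazards = [0.01] * 10
log_hr = [0.02 * year for year in range(10)]
death_year = simulate_death_year(hazards, log_hr, rng)
```

Seeding the random number generator keeps the simulated events reproducible between runs.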
Linkage-Disequilibrium (LD) matrices for six continental ancestry groups...
zenodo.org
application/gzip
Updated Jan 8, 2025
Cite
Shadi Zabad (2025). Linkage-Disequilibrium (LD) matrices for six continental ancestry groups from the UK Biobank [Dataset]. http://doi.org/10.5281/zenodo.14614207
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.14614207
Dataset updated
Jan 8, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Shadi Zabad
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains Linkage Disequilibrium (LD) matrices for six ancestry groups from the UK Biobank.

LD matrices record the SNP-by-SNP correlations in a given sample of individuals from the general population. In this case, we threshold the matrices so that we only record the correlations between variants in the same LD block (defined by LDetect). The continental ancestry groups are defined by the Pan-UKB initiative as:

EUR = European ancestry (N=362446)

CSA = Central/South Asian ancestry (N=8284)

AFR = African ancestry (N=6255)

EAS = East Asian ancestry (N=2700)

MID = Middle Eastern ancestry (N=1567)

AMR = Admixed American ancestry (N=987)

The sample sizes here are restricted to unrelated individuals in the UK Biobank. The matrices were computed using magenpy and quantized to int8 data type for better compressibility. The standard matrices (EUR.tar.gz, AFR.tar.gz, ...) contain pairwise correlations for 1.4 million HapMap3+ variants. For European samples, we also provide LD matrices that record pairwise correlations for up to 18 million variants (EUR_18m_variants.tar.gz).
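
To give a sense of what quantizing correlations to int8 involves, here is a minimal sketch that maps a correlation in [-1, 1] to an integer in [-127, 127] and back. The scale factor of 127 and the function names are assumptions for illustration. The exact scheme used by magenpy is not described in this entry and may differ.

```python
def quantize_r(r, scale=127):
    """Map a correlation in [-1, 1] to an integer in [-scale, scale]."""
    clipped = max(-1.0, min(1.0, r))
    return int(round(clipped * scale))


def dequantize_r(q, scale=127):
    """Recover an approximate correlation from its quantized value."""
    return q / scale


# Round trip for a few example correlations (values are made up).
correlations = [1.0, 0.3, 0.0, -0.85]
quantized = [quantize_r(r) for r in correlations]
recovered = [dequantize_r(q) for q in quantized]
```

With this scale, each stored value fits in one byte, and the round-trip error is at most half a quantization step (about 0.004).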

For more details on how these matrices were computed, please consult our manuscript:

Towards whole-genome inference of polygenic scores with fast and memory-efficient algorithms
Shadi Zabad, Chirayu Anant Haryan, Simon Gravel, Sanchit Misra, Yue Li

To access these matrices, consult the codebase of magenpy, our custom python package with special data structures for processing these LD matrices.
GWAS summary statistics for Standing Height from the UK Biobank (5-fold...
zenodo.org
application/gzip
Updated Dec 3, 2024
Cite
Shadi Zabad (2024). GWAS summary statistics for Standing Height from the UK Biobank (5-fold cross-validation) [Dataset]. http://doi.org/10.5281/zenodo.14270953
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.14270953
Dataset updated
Dec 3, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Shadi Zabad
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains GWAS summary statistics for Standing Height in the UK Biobank.

The GWAS study used data from "White British" samples (N = 337225), which were randomly divided into 5 folds for the purposes of cross-validation. The upload contains, for each fold, GWAS summary statistics for the training and test set. The test summary statistics can be used to evaluate PRS models via pseudo-validation methods. Association testing was done with plink2.

The structure of the data is as follows:

train
    fold_1
        chr_1.PHENO1.glm.linear
        chr_2.PHENO1.glm.linear
        ...
    fold_2
    fold_3
    ...
test
    fold_1
    fold_2
    fold_3
    ...
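
The directory layout above can be turned into file paths programmatically. The sketch below assumes that the test folders contain files named like those shown for train, and that chromosomes 1 to 22 are present. Neither detail is stated explicitly in the description, and the function names are made up for this example.

```python
from pathlib import Path


def sumstats_path(root, split, fold, chrom):
    """Build the path of one per-chromosome summary statistics file."""
    return Path(root) / split / f"fold_{fold}" / f"chr_{chrom}.PHENO1.glm.linear"


def all_sumstats_paths(root, split, n_folds=5, chroms=range(1, 23)):
    """List the paths for every fold and chromosome of one split."""
    return [
        sumstats_path(root, split, fold, chrom)
        for fold in range(1, n_folds + 1)
        for chrom in chroms
    ]


# "height_gwas" is a hypothetical folder where the archive was extracted.
train_files = all_sumstats_paths("height_gwas", "train")
```

plink2 writes these files as tab-delimited text, so a reader such as `pandas.read_csv(path, sep="\t")` should be able to load them.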

For more details about the GWAS study, Quality Control (QC) criteria, or other information, please consult our publication:

Zabad, S., Gravel, S., & Li, Y. (2023). Fast and accurate Bayesian polygenic risk modeling with variational inference. The American Journal of Human Genetics, 110(5), 741–761. https://doi.org/10.1016/j.ajhg.2023.03.009

If you use this data in your work, please cite the publication above.
NURTuRE Chronic Kidney Disease (NCKD)
healthdatagateway.org
unknown
Updated Jun 14, 2017
Cite
NURTuRE (2017). NURTuRE Chronic Kidney Disease (NCKD) [Dataset]. https://healthdatagateway.org/en/dataset/1396
Available download formats: unknown
Dataset updated
Jun 14, 2017
Dataset authored and provided by
NURTuRE
License
https://saildatabank.com/data/apply-to-work-with-the-data/
Description
The NURTuRE project was devised to create a national kidney biobank as recommended in the UK Renal Research Strategy 2016.

Strategic Aims: To work towards achieving this, NURTuRE will:

Create a national Kidney Bio Bank for collection and storage of biological samples from 3,000 CKD patients and up to 800 NS patients, to provide a strategic resource for fundamental and translational research.

Develop and implement proactive UK protocol driven cohort studies in CKD and NS to investigate determinants of and risk factors for clinically important adverse outcomes.

Engage patient cohorts, with consent to approach for any future research study.

NURTuRE Objectives:

The provision of comprehensive clinical and laboratory data from cohort studies.

The provision of high quality bio-samples with centralised storage/retrieval.

To carry out core biomarker analysis of biopsy specimens in biofluids of all patients recruited and parallel assessment.

Follow-up specimen collection.

First patient recruitment: by 31 June 2017. CKD: baseline and 100% follow-up collections, over 2 years. NS: baseline and 20% follow-up, over 3 years. Healthy Volunteers: baseline.

Biological samples availability - Samples are available via the NURTuRE biobank - https://nurturebiobank.org/
Association of all hypermetropia and hypermetropia (low or moderate/high),...
plos.figshare.com
datasetcatalog.nlm.nih.gov
xls
Updated May 31, 2023
Cite
Phillippa M. Cumberland; Yanchun Bao; Pirro G. Hysi; Paul J. Foster; Christopher J. Hammond; Jugnoo S. Rahi (2023). Association of all hypermetropia and hypermetropia (low or moderate/high), by key socio-demographic factors. [Dataset]. http://doi.org/10.1371/journal.pone.0139780.t004
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0139780.t004
Dataset updated
May 31, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Phillippa M. Cumberland; Yanchun Bao; Pirro G. Hysi; Paul J. Foster; Christopher J. Hammond; Jugnoo S. Rahi
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
No qualifications, State school examinations at 16 years of age (‘O’ levels), at 18 years (‘A’ levels) or University/other professional qualification. +: Number of eyes; ++: model adjusted for eye laterality, gender, age (continuous), educational qualification, accommodation tenure, ethnicity and test centre.

Association of all hypermetropia and hypermetropia (low or moderate/high), by key socio-demographic factors.
DreamData
kaggle.com
zip
Updated Nov 29, 2025
Cite
RairoRapha (2025). DreamData [Dataset]. https://www.kaggle.com/datasets/thetraveller/dreambiome
Available download formats: zip (18982050 bytes)
Dataset updated
Nov 29, 2025
Authors
RairoRapha
Description
📊 Data Sources (Scientific & Real-World)

DreamBiome is built entirely on real human data — no synthetic, no invented corpora.
The system integrates three research-backed data pillars:

1. Dream Reports (DreamBank + Dryad RSOS)

140+ real dream narratives

Includes series such as jasmine1, norms-m, etc.

Used in cognitive science, psychology, and computational linguistics

Contains word counts, metadata, and (Dryad) numerical annotations

Sources:
- DreamBank (Hall & Van de Castle / UC Santa Cruz): https://dreambank.net/
- Dryad RSOS “textual dream analysis” dataset: https://doi.org/10.5061/dryad.4t880

2. Sleep Architecture (Sleep-EDF Hypnogram Database)

70+ professionally scored nights

30-second epochs labeled across W, N1, N2, N3, REM

Includes efficiency, total sleep time, awakenings, and REM%

Used to generate the “sleep arcs” (dominant stage per quarter-night)

Source:
- PhysioNet Sleep-EDF Expanded Dataset: https://physionet.org/content/sleep-edfx/1.0.0/
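
The "sleep arc" idea (dominant stage per quarter-night) can be sketched as follows. The DreamBiome implementation is not shown in this entry, so this is only one plausible reading: the function name, the equal-length split into four parts, and the tie-breaking rule are assumptions.

```python
from collections import Counter


def sleep_arc(epochs, n_parts=4):
    """Return the most frequent stage label in each part of the night.

    epochs: sequence of stage labels, one per 30-second epoch
    (for example "W", "N1", "N2", "N3", "REM").
    Ties go to the stage that appears first within the part.
    """
    n = len(epochs)
    bounds = [i * n // n_parts for i in range(n_parts + 1)]
    arc = []
    for start, end in zip(bounds, bounds[1:]):
        part = epochs[start:end]
        # An empty part (very short recording) has no dominant stage.
        arc.append(Counter(part).most_common(1)[0][0] if part else None)
    return arc


# Made-up hypnogram of 32 epochs (16 minutes), for illustration only.
night = (
    ["W"] * 2 + ["N2"] * 6
    + ["N3"] * 8
    + ["N2"] * 3 + ["REM"] * 5
    + ["REM"] * 6 + ["W"] * 2
)
arc = sleep_arc(night)
```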

3. Epidemiological Context (Insomnia Prevalence)

Lightweight region-level insomnia statistics

Global, UK Biobank, and East Asia meta-estimates

Used to contextualize DreamBiome World generation

Sources:
- Ohayon (2002). Epidemiology of insomnia. Sleep Medicine Reviews.
- Lane et al. (2019). UK Biobank sleep data.
- Jiang et al. (2015). East Asia insomnia prevalence. Journal of Sleep Research.
Effect sizes for 200+ polygenic scores
figshare.com
txt
Updated Jun 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Florian Privé (2023). Effect sizes for 200+ polygenic scores [Dataset]. http://doi.org/10.6084/m9.figshare.14074760.v2
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.14074760.v2
Dataset updated
Jun 1, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Florian Privé
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
- PGS-effects.csv.gz: vectors of effect sizes for 215 polygenic scores (PGS)
- pred-cor: partial correlations of these PGS with the corresponding phenotypes, in eight ancestry groups from the UK Biobank
- phenotype-description.xlsx: description of all phenotypes used in the study (30 were discarded due to very low prediction)

These files report the best prediction from penalized regression and LDpred2.

We also provide these files separately for penalized regression (PLR) and LDpred2-auto (without using the test set).

The effect size file for penalized regression is very small because vectors of effects are very sparse.

Those are based on the UK Biobank data only.
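
For readers unfamiliar with how a vector of effect sizes is applied, the sketch below computes a polygenic score as the sum of allele dosages weighted by per-variant effects. The variant IDs, the numbers, and the dictionary layout are hypothetical. The column layout of PGS-effects.csv.gz is not described here, so loading that file is left out.

```python
def polygenic_score(dosages, effects):
    """Sum of dosage times effect size over variants present in both inputs.

    dosages: mapping from variant ID to allele dosage (0 to 2) for one person.
    effects: mapping from variant ID to effect size. A sparse effect vector,
    as produced by penalized regression, simply omits variants with zero effect.
    """
    return sum(effects[v] * d for v, d in dosages.items() if v in effects)


# Made-up example: three genotyped variants, two with non-zero effects.
effects = {"rs1": 0.5, "rs2": -0.25}
dosages = {"rs1": 2, "rs2": 1, "rs3": 1}
score = polygenic_score(dosages, effects)
```

Storing only the non-zero effects is one way to see why a sparse effect file can be very small.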
24 genome-wide significant loci discovered in the metaUSAT multivariable...
plos.figshare.com
xls
Updated Jun 21, 2023
Cite
Waheed Ul-Rahman Ahmed; Manal I. A. Patel; Michael Ng; James McVeigh; Krina Zondervan; Akira Wiberg; Dominic Furniss (2023). 24 genome-wide significant loci discovered in the metaUSAT multivariable meta-analysis of inguinal, femoral, umbilical, hiatus hernia in 57,418 cases and 287,090 controls in UK Biobank. [Dataset]. http://doi.org/10.1371/journal.pone.0272261.t004
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0272261.t004
Dataset updated
Jun 21, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Waheed Ul-Rahman Ahmed; Manal I. A. Patel; Michael Ng; James McVeigh; Krina Zondervan; Akira Wiberg; Dominic Furniss
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Statistically significant signals from the metaUSAT analysis are shown in the left-hand column. The central column shows the association p-values for those SNPs in the six original GWAS analyses, with the direction of effect indicated by a + or - sign. Candidate genes are those selected from the prioritised genes (using the four mapping strategies described previously for all GWAS-discovered loci) or genes in proximity as identified within the UCSC genome browser.
Data from: Brain Ages Derived from Different MRI Modalities are Associated...
zenodo.org
data.niaid.nih.gov
csv
Updated Apr 24, 2025
Cite
Andrei-Claudiu Roibu; Stanislaw Adaszewski; Torsten Schindler; Stephen M. Smith; Ana I.L. Namburete; Frederik J. Lange (2025). Brain Ages Derived from Different MRI Modalities are Associated with Distinct Biological Phenotypes [Dataset]. http://doi.org/10.5281/zenodo.8110876
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.8110876
Dataset updated
Apr 24, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Andrei-Claudiu Roibu; Stanislaw Adaszewski; Torsten Schindler; Stephen M. Smith; Ana I.L. Namburete; Frederik J. Lange
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract

Brain ageing is a highly variable, spatially and temporally heterogeneous process, marked by numerous structural and functional changes. These can cause discrepancies between individuals’ chronological age and the apparent age of their brain, as inferred from neuroimaging data. Machine learning models, and particularly Convolutional Neural Networks (CNNs), have proven adept in capturing patterns relating to ageing induced changes in the brain. The differences between the predicted and chronological ages, referred to as brain age deltas, have emerged as useful biomarkers for exploring those factors which promote accelerated ageing or resilience, such as pathologies or lifestyle factors. However, previous studies rely only on structural neuroimaging for predictions, overlooking potentially informative functional and microstructural changes. Here we show that multiple contrasts derived from different MRI modalities can predict brain age, each encoding bespoke brain ageing information. By using 3D CNNs and UK Biobank data, we found that 57 contrasts derived from structural, susceptibility-weighted, diffusion, and functional MRI can successfully predict brain age. For each contrast, different patterns of association with non-imaging phenotypes were found, resulting in a total of 191 unique, statistically significant associations. Furthermore, we found that ensembling data from multiple contrasts results in both higher prediction accuracies and stronger correlations to non-imaging measurements. Our results demonstrate that other 3D contrasts and modalities, which have not been considered so far for the task of brain age prediction, encode different information about the ageing brain. We envision our work as being the starting point for future investigations into the causal links underpinning the observed brain age deltas and non-imaging measurement associations. 
For instance, drug effects can be monitored, given that certain medications correlated with accelerated brain ageing. Furthermore, continued development of brain age models could facilitate their deployment in clinical trials for recruitment and monitoring, and hospitals for diagnostic and screening tasks.

Data Description

This dataset contains the full correlation results with all nIDPs in the UK Biobank. These are presented in datasets split by sex into Female and Male subjects. For easier data manipulation, two smaller datasets have also been made available, containing just those correlations which pass the False Discovery Rate (FDR) threshold.

As experiments were also conducted for ensembles using multiple contrasts, similar datasets are provided for those.

Finally, global datasets are also provided. These are the concatenation of the associations contained in the Male and Female datasets.
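
The description mentions keeping only associations that pass an FDR threshold but does not say which procedure was used. As a sketch under the assumption of the common Benjamini-Hochberg procedure with a 5% level (both are assumptions, not details from the source), the filtering could look like this:

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which p-values pass the FDR threshold."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that the k-th smallest p-value
    # is at most (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    passed = [False] * m
    # Every p-value ranked at or below max_rank is declared significant.
    for idx in order[:max_rank]:
        passed[idx] = True
    return passed


# Made-up p-values for four associations.
p_values = [0.001, 0.04, 0.03, 0.5]
significant = benjamini_hochberg(p_values)
```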

Paper & Code

The paper describing this work can be accessed here:

https://ieeexplore.ieee.org/abstract/document/10196736

The code for this project is available in the project GitHub repository:

https://github.com/AndreiRoibu/AgeMapper

If using this work, please cite it based on the above paper, or using the following BibTex:

@inproceedings{roibu2023brain,
  title={Brain Ages Derived from Different MRI Modalities are Associated with Distinct Biological Phenotypes},
  author={Roibu, Andrei-Claudiu and Adaszewski, Stanislaw and Schindler, Torsten and Smith, Stephen M and Namburete, Ana IL and Lange, Frederik J},
  booktitle={2023 10th IEEE Swiss Conference on Data Science (SDS)},
  pages={17--25},
  year={2023},
  organization={IEEE},
  doi={10.1109/SDS57534.2023.00010}
}

Data Access

The data for this project is freely available upon application at the UK Biobank. For more information regarding the individual nIDPs, please access the UK Biobank Showcase website at: https://biobank.ctsu.ox.ac.uk/showcase/search.cgi

Funding

ACR is supported by EPSRC Grant EP/S024093/1, F. Hoffmann-La Roche AG and a 2021 Industrial Fellowship offered by the Royal Commission for the Exhibition of 1851. SMS is supported by a Wellcome Trust Collaborative Award 215573/Z/19/Z. AILN is grateful for support from the Academy of Medical Sciences under the Springboard Awards scheme (SBF005/1136), and the Bill and Melinda Gates Foundation. FJL is supported by a Wellcome Trust Collaborative Award (215573/Z/19/Z). The WIN is supported by core funding from the Wellcome Trust (203139/Z/16/Z). The computational aspects were supported by the Wellcome Trust (203141/Z/16/Z) and the NIHR Oxford BRC. Corresponding authors: ACR (andreiroibu@icloud.com), SA (stanislaw.adaszewski@roche.com) and AILN (ana.namburete@cs.ox.ac.uk).
Four loci significantly associated with overlap hernia in 5,219 cases and...
figshare.com
xls
Updated Jun 21, 2023
Cite
Waheed Ul-Rahman Ahmed; Manal I. A. Patel; Michael Ng; James McVeigh; Krina Zondervan; Akira Wiberg; Dominic Furniss (2023). Four loci significantly associated with overlap hernia in 5,219 cases and 26,095 controls in UK Biobank. [Dataset]. http://doi.org/10.1371/journal.pone.0272261.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0272261.t002
Dataset updated
Jun 21, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Waheed Ul-Rahman Ahmed; Manal I. A. Patel; Michael Ng; James McVeigh; Krina Zondervan; Akira Wiberg; Dominic Furniss
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Four loci significantly associated with overlap hernia in 5,219 cases and 26,095 controls in UK Biobank.
SynergiQC
catalog.paradim.science
Updated Oct 31, 2025
Cite
Philippe Joubert (2025). SynergiQC [Dataset]. https://catalog.paradim.science/index.php/en/synergiqc
Dataset updated
Oct 31, 2025
Dataset provided by
Cartographies RSN
Authors
Philippe Joubert
Description
The dataset contains lung cancer CT images (DICOM format) with segmentations (DICOM SEG) of tumors. Clinical and research data associated with images are available through IUCPQ-UL Biobank.
European (British) LD files for GhostKnockoffGWAS
zenodo.org
nde-dev.biothings.io
zip
Updated Apr 10, 2025
Cite
benjamin chu (2025). European (British) LD files for GhostKnockoffGWAS [Dataset]. http://doi.org/10.5281/zenodo.15191305
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.15191305
Dataset updated
Apr 10, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
benjamin chu
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Apr 10, 2025
Area covered
United Kingdom
Description
This contains pre-processed LD files (Sigma matrix, S matrix, etc.) computed on unrelated British samples of the UK-Biobank (n = 306,604). It is intended to be used as an input to the GhostKnockoffGWAS pipeline.

This is the output of applying the solveblock executable directly on 306,604 unrelated British samples of the UK-Biobank.

Quasi-independent blocks are computed by applying the snp_ldsplit function with parameters thr_r2=0.01, max_r2=0.3, min_size = 500, and max_size = {1000, 1500, 3000, 6000, 10000}.

SNPs with minor allele frequency less than 0.01 or Hardy-Weinberg equilibrium p-value less than 1e-6 are removed.

Only HG19 coordinates are available.

Knockoff optimization was carried out by the Knockoffs.jl Julia package: https://github.com/biona001/Knockoffs.jl

The results (i.e. the files available on this site) are saved as .csv and .h5 files for easier access, and are directly readable by GhostKnockoffGWAS.

Note: We previously released another set of EUR LD files. This set of LD files should be preferred over the previous one. The main difference is that the previous entry used quasi-independent blocks from LDetect computed on the 1000 Genomes Project. Here we compute the independent blocks using snp_ldsplit directly on the UK-Biobank British samples.
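
The SNP quality-control rule stated above (drop variants with minor allele frequency below 0.01 or a Hardy-Weinberg equilibrium p-value below 1e-6) can be written as a simple predicate. This is only a sketch of the stated thresholds, not the pipeline's own code. The function name and the example variants are made up.

```python
def keep_snp(maf, hwe_p, min_maf=0.01, min_hwe_p=1e-6):
    """Return True if a SNP passes both the MAF and HWE filters."""
    return maf >= min_maf and hwe_p >= min_hwe_p


# Hypothetical variants as (ID, minor allele frequency, HWE p-value).
snps = [
    ("rs1", 0.20, 0.5),    # passes both filters
    ("rs2", 0.005, 0.5),   # removed: MAF below 0.01
    ("rs3", 0.20, 1e-8),   # removed: HWE p-value below 1e-6
]
kept = [snp_id for snp_id, maf, hwe_p in snps if keep_snp(maf, hwe_p)]
```

Values exactly at a threshold are kept, since the description only removes variants that fall below it.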
ASPREE Genome-wide SNP Genotyping Dataset
researchdata.edu.au
bridges.monash.edu
Updated Nov 16, 2022
Cite
Paul Lacaze; John McNeill (2022). ASPREE Genome-wide SNP Genotyping Dataset [Dataset]. http://doi.org/10.26180/21097654.V1
Unique identifier
https://doi.org/10.26180/21097654.V1
Dataset updated
Nov 16, 2022
Dataset provided by
Monash University
Authors
Paul Lacaze; John McNeill
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
ASPirin in Reducing Events in the Elderly (ASPREE) is a clinical trial and longitudinal study of healthy ageing, involving 16,703 Australians aged over 70 years and 2,411 Americans aged over 65 years. The primary ASPREE trial has received >$60M in NIH funding since 2012. Each ASPREE participant's health is tracked longitudinally through extensive phenotyping and collection of clinical outcome data. The ASPREE Healthy Ageing Biobank is an associated biorepository of blood, saliva and urine samples from >15,000 ASPREE participants, 10,000 of whom have now provided a matched 3-year follow-up sample. Biospecimens are consented for genetic and biomarker studies, enabling ASPREE to conduct molecular epidemiology and healthy ageing research. ASPREE facilitates over 12 sub-studies funded through NHMRC.

Genome wide association analysis has been undertaken in multiple areas:

All-cause and vascular dementia

Stroke

Genetic modifiers for Alzheimer's disease

Polygenic resilience scores capture genetic effect for Alzheimer's disease

Genome-wide association analysis for Alzheimer's disease

https://bioplatforms.com/projects/aspree-framework-initiative/

To request data, contact Paul Lacaze at Paul.Lacaze@monash.edu to begin the approval process.
Segmentation Networks and Representative Meshes from UK Biobank
zenodo.org
zip
Updated Jun 14, 2025
Cite
Devran Ugurlu; Shuang Qian; Elliot Fairweather; Charlene Mauger; Bram Ruijsink; Laura Dal Toso; Yu Deng; Marina Strocchi; Reza Razavi; Alistair Young; Pablo Lamata; Steven Niederer; Martin Bishop (2025). Segmentation Networks and Representative Meshes from UK Biobank [Dataset]. http://doi.org/10.5281/zenodo.15649643
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.15649643
Dataset updated
Jun 14, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Devran Ugurlu; Shuang Qian; Elliot Fairweather; Charlene Mauger; Bram Ruijsink; Laura Dal Toso; Yu Deng; Marina Strocchi; Reza Razavi; Alistair Young; Pablo Lamata; Steven Niederer; Martin Bishop
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
We present a database of representative left and right ventricular meshes constructed from patient-specific models based on a large cohort of ~55k participants from UK Biobank. It comprises 1423 representative tetrahedral finite element meshes across sex (male, female), body mass index (range: 16 - 42 kg/m²) and age (range: 49 - 80 years).

For each mesh, it also includes:

a realistic biventricular myocardial fibre structure

a morphological coordinate system which describes the positions within ventricles based on (1) the apical-basal (Z), (2) transmural (ρ) (from endocardium to epicardium), (3) rotational (Φ) (anterior, anteroseptal, inferior, inferolateral, anterolateral) and (4) chamber-wise (left ventricle and right ventricle) coordinates.

We also present trained network weights and nnUNet plan and hyperparameter selection files for cine MR segmentation models trained separately for the following views: 2 chamber, 3 chamber, 4 chamber and short axis. These are supplied as a zip of relevant nnUNet files for each view: Dataset101_UKBB_LAX_2Ch.zip, Dataset102_UKBB_LAX_3Ch.zip, Dataset103_UKBB_LAX_4Ch.zip, Dataset100_UKBB_Petersen_SAX.zip.
Regional association plots for ancestry groups in the discovery cohort.
figshare.com
bin
Updated Aug 28, 2023
Cite
Amy D. Stockwell; Michael C. Chang; Anubha Mahajan; William Forrest; Neha Anegondi; Rion K. Pendergrass; Suresh Selvaraj; Jens Reeder; Eric Wei; Victor A. Iglesias; Natalie M. Creps; Laura Macri; Andrea N. Neeranjan; Marcel P. van der Brug; Suzie J. Scales; Mark I. McCarthy; Brian L. Yaspan (2023). Regional association plots for ancestry groups in the discovery cohort. [Dataset]. http://doi.org/10.1371/journal.pgen.1010609.s004
Available download formats: bin
Unique identifier
https://doi.org/10.1371/journal.pgen.1010609.s004
Dataset updated
Aug 28, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Amy D. Stockwell; Michael C. Chang; Anubha Mahajan; William Forrest; Neha Anegondi; Rion K. Pendergrass; Suresh Selvaraj; Jens Reeder; Eric Wei; Victor A. Iglesias; Natalie M. Creps; Laura Macri; Andrea N. Neeranjan; Marcel P. van der Brug; Suzie J. Scales; Mark I. McCarthy; Brian L. Yaspan
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
AFR = African descent; AMR = Admixed American descent; EAS = East Asian descent; EUR = European descent. (XLSX)
eRNA GReX
zenodo.org
zip
Updated Jun 12, 2024
Cite
Michael J. Betti; Eric Gamazon (2024). eRNA GReX [Dataset]. http://doi.org/10.5281/zenodo.11212496
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.11212496
Dataset updated
Jun 12, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Michael J. Betti; Eric Gamazon
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains all model weights and corresponding datasets generated by Betti et al. in the manuscript Genetically regulated enhancer RNA expression predicts enhancer-promoter contact frequency and reveals genetic mechanisms at complex trait-associated loci. The following are the contents of the sub-directories in this dataset:

coloc: Colocalization results for genome-wide significant (p < 5 x 10^-8) GWAS associations in the UK Biobank with eRNA and canonical gene eQTLs (Supplementary Tables 11 and 12).

contact_model_training: Input datasets from whole blood and brain, respectively, that were used to train the neural network-based models of contact frequency.

eqtl_mapping: eQTLs mapped across 49 cell and tissue types for both eRNAs and canonical genes.

scz_mr: Inputs and results for Mendelian randomization analysis of eRNA and canonical gene-based TWAS of schizophrenia.

scz_twas: eRNA and canonical gene-based TWAS results of schizophrenia.

trained_models: Model weights and SNP covariance matrices for genetically regulated eRNA expression (GReX) across 49 cell and tissue types.

uk_biobank_twas: eRNA-based TWAS summary statistics for 4,671 UK Biobank traits across 49 cell and tissue types.

Please cite:

Betti, M.J., Aldrich, M.C., Lin, P., & Gamazon, E.R. (2024). Genetically regulated enhancer RNA expression predicts enhancer-promoter contact frequency and reveals genetic mechanisms at complex trait-associated loci. Preprint.

Betti, M.J., Aldrich, M.C., Lin, P., & Gamazon, E.R. (2024). eRNA GReX (Version 1.0). Zenodo. 10.5281/zenodo.11212496
Caribbean LD files for GhostKnockoffGWAS
zenodo.org
zip
Updated Apr 10, 2025
Cite
Benjamin Chu (2025). Caribbean LD files for GhostKnockoffGWAS [Dataset]. http://doi.org/10.5281/zenodo.15192021
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.15192021
Dataset updated
Apr 10, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Benjamin Chu
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Apr 10, 2025
Area covered
Caribbean
Description
This dataset contains pre-processed LD files (Sigma matrix, S matrix, etc.) computed on Caribbean samples of the UK Biobank (n = 4517). It is intended to be used as input to the GhostKnockoffGWAS pipeline.

The files are the output of applying the solveblock executable directly to the 4517 Caribbean samples of the UK Biobank.

Quasi-independent blocks are computed by applying the snp_ldsplit function with parameters thr_r2=0.01, max_r2=0.3, min_size = 500, and max_size = {1000, 1500, 3000, 6000, 10000}.

SNPs with a minor allele frequency below 0.01 or a Hardy-Weinberg equilibrium p-value below 1e-6 are removed.
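
The SNP filter described above can be illustrated with a minimal sketch. This is not the solveblock implementation: the `Snp` record, its field names, and the example values are hypothetical, and only the two thresholds (0.01 and 1e-6) come from the description.

```python
from dataclasses import dataclass

MAF_MIN = 0.01     # threshold stated in the dataset description
HWE_P_MIN = 1e-6   # threshold stated in the dataset description


@dataclass
class Snp:
    """Hypothetical per-SNP summary record (field names are assumptions)."""
    rsid: str
    maf: float    # minor allele frequency
    hwe_p: float  # Hardy-Weinberg equilibrium test p-value


def passes_qc(snp: Snp) -> bool:
    """A SNP is removed if MAF < 0.01 or HWE p-value < 1e-6; otherwise kept."""
    return snp.maf >= MAF_MIN and snp.hwe_p >= HWE_P_MIN


def filter_snps(snps):
    """Return only the SNPs that survive both filters."""
    return [s for s in snps if passes_qc(s)]


# Made-up example values, for illustration only.
example = [
    Snp("rs_a", maf=0.20, hwe_p=0.5),    # kept
    Snp("rs_b", maf=0.005, hwe_p=0.5),   # removed: rare variant
    Snp("rs_c", maf=0.10, hwe_p=1e-9),   # removed: fails HWE
]
kept = filter_snps(example)
print([s.rsid for s in kept])
```

Values exactly at a threshold are kept here, because the description removes only SNPs strictly below each cutoff.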

Only HG19 coordinates are available.

Knockoff optimization was carried out with the Knockoffs.jl Julia package: https://github.com/biona001/Knockoffs.jl

The results (i.e., the files available on this site) are saved as .csv and .h5 files for easier access and can be read directly by GhostKnockoffGWAS.
Chinese LD files for GhostKnockoffGWAS
zenodo.org
zip
Updated Apr 11, 2025
Cite
Benjamin Chu (2025). Chinese LD files for GhostKnockoffGWAS [Dataset]. http://doi.org/10.5281/zenodo.15198714
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.15198714
Dataset updated
Apr 11, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Benjamin Chu
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains pre-processed LD files (Sigma matrix, S matrix, etc.) computed on Chinese samples of the UK Biobank (n = 1574). It is intended to be used as input to the GhostKnockoffGWAS pipeline.

The files are the output of applying the solveblock executable directly to the 1574 Chinese samples of the UK Biobank.

Quasi-independent blocks are computed by applying the snp_ldsplit function with parameters thr_r2=0.01, max_r2=0.3, min_size = 500, and max_size = {1000, 1500, 3000, 6000, 10000}.

SNPs with a minor allele frequency below 0.01 or a Hardy-Weinberg equilibrium p-value below 1e-6 are removed.

Only HG19 coordinates are available.

Knockoff optimization was carried out with the Knockoffs.jl Julia package: https://github.com/biona001/Knockoffs.jl

The results (i.e., the files available on this site) are saved as .csv and .h5 files for easier access and can be read directly by GhostKnockoffGWAS.
Data from: In search of the genetic variants of human sex ratio at birth:...
search.dataone.org
data.niaid.nih.gov
+1 more
Updated Aug 4, 2025
Cite
Siliang Song; Jianzhi Zhang (2025). In search of the genetic variants of human sex ratio at birth: Was Fisher wrong about sex ratio evolution? [Dataset]. http://doi.org/10.5061/dryad.vdncjsz43
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.vdncjsz43
Dataset updated
Aug 4, 2025
Dataset provided by
Dryad Digital Repository
Authors
Siliang Song; Jianzhi Zhang
Description
The human sex ratio (fraction of males) at birth is close to 0.5 at the population level, an observation commonly explained by Fisher's principle. However, past human studies yielded conflicting results regarding the existence of sex ratio-influencing mutations (a prerequisite to Fisher's principle), raising the question of whether the nearly even population sex ratio is instead dictated by the random X/Y chromosome segregation in male meiosis. Here we show that, because a person's offspring sex ratio (OSR) has an enormous measurement error, a gigantic sample is required to detect OSR-influencing genetic variants. Conducting a UK Biobank-based genome-wide association study that is more powerful than previous studies, we detect an OSR-associated genetic variant, which awaits verification in independent samples. Given the abysmal precision in measuring OSR, it is unsurprising that the estimated heritability of OSR is effectively zero. We further show that OSR's estimated heritability would ...

GWAS: When conducting the GWAS in the UKB, we did not simply use the sibling sex ratio as the trait, because of the difficulty in accounting for different estimation errors of the sibling sex ratio for different families as a result of the variation in family size. For example, individual A has one brother and zero sisters, while individual B has four brothers and one sister. Although A has a higher sibling sex ratio than B, B's siblings obviously provide stronger evidence for a male-biased sibling sex ratio than A's siblings. To properly weigh the data by the family size, we considered the birth of each sibling as an independent event. In the above example, we would associate A's genotype with one male birth and associate B's genotype with four male births and one female birth. In GWAS, a male birth is coded as 1 and a female birth is coded as 0. The UKB participants have a total of 873,715 full siblings, leading to an unprecedented statistical power. In our GWAS in the UKB, we i...

# In search of the genetic variants of human sex ratio at birth: Was Fisher wrong about sex ratio evolution?

https://doi.org/10.5061/dryad.vdncjsz43

Description of the data and file structure

GWAS summary statistics and simulation data of the paper "In search of the genetic variants of human sex ratio at birth: Was Fisher wrong about sex ratio evolution?"
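
The per-birth coding described in the GWAS summary above (each sibling birth treated as an independent event, male = 1, female = 0) can be sketched as follows. This is an illustration, not the authors' script: the function name and the integer genotype values are hypothetical, while the sibling counts for individuals A and B are taken from the description.

```python
def expand_births(genotype, n_brothers, n_sisters):
    """Turn one participant into one (genotype, outcome) record per sibling birth.

    Outcome coding follows the description: male birth = 1, female birth = 0.
    """
    return [(genotype, 1)] * n_brothers + [(genotype, 0)] * n_sisters


# Individual A: one brother, zero sisters. Individual B: four brothers, one sister.
# The genotype values (0 and 1) are placeholders chosen for illustration.
records_a = expand_births(genotype=0, n_brothers=1, n_sisters=0)
records_b = expand_births(genotype=1, n_brothers=4, n_sisters=1)

print(len(records_a), len(records_b))
```

B contributes five records and A only one, so B's siblings carry more weight in the regression even though A's raw sibling sex ratio (1.0) is higher than B's (0.8).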

Files and variables

File: Human_sex_ratio_scrit.zip

Description: Scripts for the project. For a description of each script file, see README.md in the zip file or https://github.com/song88180/Human_sex_ratio

File: GWAS_OSR_cov_logistic.tsv

Description: GWAS summary statistics of offspring sex ratio. Cells containing "NA" indicate that the value is not available.

Variables

CHROM: chromosome number

POS: SNP position (GRCh37)

ID: rsid of the SNP

REF: reference allele

ALT: alternative allele

P: P-value

t...
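
A summary-statistics file with the columns listed above could be scanned along these lines. This is a sketch under stated assumptions: that the file is tab-separated with a header row using exactly these column names, that missing values appear as the literal string "NA", and that hits are selected at the conventional genome-wide threshold of 5e-8, which is not stated for this dataset. The rows in `sample` are invented for illustration.

```python
import csv
import io

GENOME_WIDE_P = 5e-8  # conventional threshold; an assumption for this dataset


def significant_hits(handle, threshold=GENOME_WIDE_P):
    """Return (CHROM, POS, ID) for rows whose P-value is below the threshold.

    Rows whose P cell is "NA" are skipped; any extra columns are ignored.
    """
    hits = []
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["P"] == "NA":
            continue
        if float(row["P"]) < threshold:
            hits.append((row["CHROM"], int(row["POS"]), row["ID"]))
    return hits


# Invented rows standing in for GWAS_OSR_cov_logistic.tsv.
sample = io.StringIO(
    "CHROM\tPOS\tID\tREF\tALT\tP\n"
    "1\t1000\trs_demo1\tA\tG\t0.2\n"
    "2\t2000\trs_demo2\tC\tT\t1e-9\n"
    "3\t3000\trs_demo3\tG\tA\tNA\n"
)
hits = significant_hits(sample)
print(hits)
```

For the real file, the same function could be called on `open("GWAS_OSR_cov_logistic.tsv", newline="")`, provided the header assumption holds.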
Data files for the manuscript entitled, "Single-cell DNA methylome and 3D...
zenodo.org
txt, zip
Updated Jun 11, 2025
Cite
Zeyuan Johnson Chen; Sankha Subhra Das; Asha Kar; Seung Hyuk Tony Lee; Kevin Abuhanna; Marcus Alvarez; Mihir Sukhatme; Zitian Wang; Kyla Gelev; Sandhya Rajkumar; Matthew Heffel; Yi Zhang; Oren Avram; Elior Rahmani; Sriram Sankararaman; Sini Heinonen; Peltoniemi Hilkka; Eran Halperin; Kirsi Pietiläinen; Chongyuan Luo; Paivi Pajukanta (2025). Data files for the manuscript entitled, "Single-cell DNA methylome and 3D genome atlas of human subcutaneous adipose tissue." [Dataset]. http://doi.org/10.5281/zenodo.15318595
Explore at:
Available download formats: zip, txt
Unique identifier
https://doi.org/10.5281/zenodo.15318595
Dataset updated
Jun 11, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Zeyuan Johnson Chen; Sankha Subhra Das; Asha Kar; Seung Hyuk Tony Lee; Kevin Abuhanna; Marcus Alvarez; Mihir Sukhatme; Zitian Wang; Kyla Gelev; Sandhya Rajkumar; Matthew Heffel; Yi Zhang; Oren Avram; Elior Rahmani; Sriram Sankararaman; Sini Heinonen; Peltoniemi Hilkka; Eran Halperin; Kirsi Pietiläinen; Chongyuan Luo; Paivi Pajukanta
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These are the genome-wide association study (GWAS) statistics in the UK Biobank and Source Data files for our paper Chen ZJ, Das SS, Kar A, Lee SHT, Abuhanna KD, Alvarez M, Sukhatme MG, Wang Z, Gelev KZ, Heffel MG, Zhang Y, Avram O, Rahmani E, Sankararaman S, Heinonen S, Peltoniemi H, Halperin E, Pietiläinen KH, Luo C, Pajukanta P. Single-cell DNA methylome and 3D genome atlas of human subcutaneous adipose tissue.
Further details of these analyses can be found in the Methods and Results sections of the paper.

Repository contents

GWAS summary statistics in the UK Biobank for C-reactive protein (CRP), body mass index (BMI), metabolic dysfunction-associated steatotic liver disease (MASLD), and waist-to-hip ratio adjusted for BMI (WHRadjBMI):

GWAS.zip

Figure source data:

Figure2.zip

Figure3.zip

Figure4.zip

Figure5.zip

Figure6.zip

ExtendedDataFigure1.zip

ExtendedDataFigure2.zip

ExtendedDataFigure3.zip

ExtendedDataFigure4.zip

ExtendedDataFigure5.zip

ExtendedDataFigure6.zip

ExtendedDataFigure7.zip

ExtendedDataFigure8.zip

ExtendedDataFigure9.zip

ExtendedDataFigure10.zip

SupplementaryFigure1.zip

SupplementaryFigure2.zip

SupplementaryFigure3.zip

SupplementaryFigure4.zip

SupplementaryFigure5.zip

SupplementaryFigure6.zip

SupplementaryFigure7.zip

Cite

Antonio Gasparrini; Jacopo Vanoli (2025). Synthetic datasets of the UK Biobank cohort [Dataset]. http://doi.org/10.5281/zenodo.13983170

Synthetic datasets of the UK Biobank cohort

Explore at:

Available download formats: bin, csv, zip, pdf

Unique identifier

https://doi.org/10.5281/zenodo.13983170

Dataset updated

Sep 17, 2025

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

Antonio Gasparrini; Jacopo Vanoli

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

This repository stores synthetic datasets derived from the database of the UK Biobank (UKB) cohort.

The datasets were generated for illustrative purposes, in particular for reproducing specific analyses on the health risks associated with long-term exposure to air pollution using the UKB cohort. The code used to create the synthetic datasets is available and documented in a related GitHub repo, with details provided in the section below. These datasets can be freely used for code testing and for illustrating other examples of analyses on the UKB cohort.

The synthetic data have been used so far in two analyses described in related peer-reviewed publications, which also provide information about the original data sources:

Vanoli J, et al. Long-term associations between time-varying exposure to ambient PM2.5 and mortality: an analysis of the UK Biobank. Epidemiology. 2025;36(1):1-10. DOI: 10.1097/EDE.0000000000001796 [freely available here, with code provided in this GitHub repo]
Vanoli J, et al. Confounding issues in air pollution epidemiology: an empirical assessment with the UK Biobank cohort. International Journal of Epidemiology. 2025;54(5):dyaf163. DOI: 10.1093/ije/dyaf163 [freely available here, with code provided in this GitHub repo]

Note: while the synthetic versions of the datasets resemble the real ones in several aspects, the users should be aware that these data are fake and must not be used for testing and making inferences on specific research hypotheses. Even more importantly, these data cannot be considered a reliable description of the original UKB data, and they must not be presented as such.

The work was supported by the Medical Research Council-UK (Grant ID: MR/Y003330/1).

Content

The series of synthetic datasets (stored in two versions with csv and RDS formats) are the following:

synthbdcohortinfo: basic cohort information regarding the follow-up period and birth/death dates for 502,360 participants.
synthbdbasevar: baseline variables, mostly collected at recruitment.
synthpmdata: annual average exposure to PM2.5 for each participant, reconstructed using their residential history.
synthoutdeath: death records that occurred during the follow-up with date and ICD-10 code.

In addition, this repository provides the following files:

codebook: a pdf file with a codebook for the variables of the various datasets, including references to the fields of the original UKB database.
asscentre: a csv file with information on the assessment centres used for recruitment of the UKB participants, including code, names, and location (as northing/easting coordinates of the British National Grid).
Countries_December_2022_GB_BUC: a zip file including the shapefile defining the boundaries of the countries in Great Britain (England, Wales, and Scotland), used for mapping purposes [source].

Generation of the synthetic data

The datasets resemble the real data used in the analysis, and they were generated using the R package synthpop (www.synthpop.org.uk). The generation process involves two steps, namely the synthesis of the main data (cohort info, baseline variables, annual PM2.5 exposure) and then the sampling of death events. The R scripts for performing the data synthesis are provided in the GitHub repo (subfolder Rcode/synthcode).

The first part merges all the data, including the annual PM2.5 levels, into a single wide-format dataset (with a row for each subject), generates a synthetic version, adds fake IDs, and then extracts (and reshapes) the single datasets. In the second part, a Cox proportional hazards model is fitted on the original data to estimate risks associated with various predictors (including the main exposure represented by PM2.5), and then these relationships are used to simulate death events in each year. Details on the modelling aspects are provided in the article.

This process guarantees that the synthetic data do not hold specific information about the original records, thus preserving confidentiality. At the same time, the multivariate distribution and correlation across variables, as well as the mortality risks, resemble those of the original data, so the results of descriptive and inferential analyses are similar to those in the original assessments. However, as noted above, the data are used only for illustrative purposes, and they must not be used to test other research hypotheses.
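
The second step described above (using estimated risk relationships to simulate death events in each year) can be sketched in simplified form. This is not the authors' R code: it assumes a constant baseline hazard within each year, a single log-linear exposure effect, and made-up parameter values, all of which are hypothetical choices for illustration.

```python
import math
import random


def yearly_death_prob(baseline_hazard, log_hr_per_unit, exposure):
    """Probability of dying within one year under a proportional-hazards form.

    The hazard is baseline_hazard * exp(log_hr_per_unit * exposure) and is
    assumed constant within the year, so P(death) = 1 - exp(-hazard).
    """
    hazard = baseline_hazard * math.exp(log_hr_per_unit * exposure)
    return 1.0 - math.exp(-hazard)


def simulate_death_year(exposures_by_year, baseline_hazard, log_hr_per_unit, rng):
    """Walk through follow-up years; return the year of death, or None if alive."""
    for year, exposure in exposures_by_year:
        p = yearly_death_prob(baseline_hazard, log_hr_per_unit, exposure)
        if rng.random() < p:
            return year
    return None


# Hypothetical participant: annual PM2.5 values and parameters are invented.
rng = random.Random(42)
pm25_by_year = [(2010, 10.0), (2011, 9.5), (2012, 9.0)]
death_year = simulate_death_year(
    pm25_by_year, baseline_hazard=0.01, log_hr_per_unit=0.02, rng=rng
)
print(death_year)
```

In a fuller version the linear predictor would include the other covariates of the fitted model, and the baseline hazard would vary with age or calendar time.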
