Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
A detailed description is available in "SynthRAD2025_dataset_description.pdf". A paper describing the dataset has been submitted to Medical Physics and is available as a pre-print at https://arxiv.org/abs/2502.17609. The dataset is divided into two tasks:

- Task 1: MRI-to-CT synthesis
- Task 2: CBCT-to-CT synthesis
After extraction, the dataset is organized as follows:
Within each task, cases are categorized into three anatomical regions: head and neck (HN), thorax (TH), and abdomen (AB).
Each anatomical region contains individual patient folders, named using a unique seven-character alphanumeric code: [Task Number][Anatomy][Center][PatientID]
Example: 1HNA001
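For scripting against the dataset, the case code can be split positionally. A minimal sketch (the helper name is hypothetical; the field widths follow the convention above):

```python
def parse_case_id(case_id: str) -> dict:
    """Split e.g. '1HNA001' into task, anatomy, center, and patient fields."""
    assert len(case_id) == 7, "expected a seven-character code"
    return {
        "task": case_id[0],       # task number: 1 or 2
        "anatomy": case_id[1:3],  # anatomical region: HN, TH, or AB
        "center": case_id[3],     # providing center: A-E
        "patient": case_id[4:7],  # three-digit patient number
    }

print(parse_case_id("1HNA001"))
# -> {'task': '1', 'anatomy': 'HN', 'center': 'A', 'patient': '001'}
```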
Each patient folder in the training dataset contains (for other sets see Table below):
- ct.mha: preprocessed CT image
- mr.mha or cbct.mha (depending on the task): preprocessed MR or CBCT image
- mask.mha: binary mask of the patient outline (dilated)

An overview folder within each anatomical region contains:

- [task]_[anatomy]_parameters.xlsx: imaging protocol details for each patient
- [task][anatomy][center][PatientID]_overview.png: a visualization of axial, coronal, and sagittal slices of CBCT/MR, CT, mask, and difference images

The SynthRAD2025 dataset is part of the second edition of the SynthRAD deep learning challenge (https://synthrad2025.grand-challenge.org/), which benchmarks synthetic CT generation for MRI- and CBCT-based radiotherapy workflows.
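The per-case .mha files can be read with any medical imaging library that supports MetaImage. A minimal loading sketch, assuming SimpleITK and a hypothetical extraction path:

```python
import numpy as np
import SimpleITK as sitk  # assumption: pip install SimpleITK

case_dir = "Task1/HN/1HNA001"  # hypothetical path after extraction

ct = sitk.ReadImage(f"{case_dir}/ct.mha")      # preprocessed CT
mr = sitk.ReadImage(f"{case_dir}/mr.mha")      # mr.mha in Task 1, cbct.mha in Task 2
mask = sitk.ReadImage(f"{case_dir}/mask.mha")  # dilated patient-outline mask

# Convert to (z, y, x) numpy arrays and summarize intensities inside the outline.
ct_arr = sitk.GetArrayFromImage(ct)
mask_arr = sitk.GetArrayFromImage(mask).astype(bool)
print(ct_arr.shape, ct_arr[mask_arr].mean())
```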
Imaging data was collected from five European university medical centers (labeled A–E in the tables below). All centers independently approved the study in accordance with the regulations of their institutional review boards or medical ethics committees.
Inclusion criteria:
The dataset is provided under two different licenses:
| Subset | Files | Release Date | Link |
|---|---|---|---|
| Training | Input, CT, Mask | 01-03-2025 | |
| Training Center D | Input, CT, Mask | 01-03-2025 | Check the download link at: |
| Validation Input | Input, Mask | 01-06-2025 | |
| Validation Input Center D | Input, Mask | 01-06-2025 | Check the download link at: |
| Validation Ground Truth | CT, Deformed CT | 01-03-2030 | |
| Test | Input, CT, Deformed CT, Mask | 01-03-2030 | |
The number of cases collected at each center for training, validation, and test sets.
| Task | Center | HN | TH | AB | Total |
|---|---|---|---|---|---|
| 1 | A | 91 | 91 | 65 | 247 |
| | B | 0 | 91 | 91 | 182 |
| | C | 65 | 0 | 19 | 84 |
| | D | 65 | 0 | 0 | 65 |
| | E | 0 | 0 | 0 | 0 |
| | Total | 221 | 182 | 175 | 578 |
| 2 | A | 65 | 65 | 64 | 195 |
| | B | 65 | 65 | 65 | 195 |
| | C | 65 | 63 | 62 | 190 |
| | D | 65 | 63 | 53 | 181 |
| | E | 65 | 65 | 65 | 195 |
| | Total | 325 | 321 | 309 | 955 |
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MSMD is a synthetic dataset of 497 pieces of (classical) music that contains both audio and score representations of the pieces aligned at a fine-grained level (344,742 pairs of noteheads aligned to their audio/MIDI counterpart). It can be used for training and evaluating multimodal models that enable crossing from one modality to the other, such as retrieving sheet music using recordings or following a performance in the score image.
Please find further information and a corresponding Python package on this GitHub page: https://github.com/CPJKU/msmd
If you use this dataset, please cite:
[1] Matthias Dorfer, Jan Hajič jr., Andreas Arzt, Harald Frostel, Gerhard Widmer.
Learning Audio-Sheet Music Correspondences for Cross-Modal Retrieval and Piece Identification.
Transactions of the International Society for Music Information Retrieval, issue 1, 2018.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The Bank Account Fraud (BAF) suite of datasets was published at NeurIPS 2022 and comprises a total of six different synthetic bank account fraud tabular datasets. BAF is a realistic, complete, and robust test bed for evaluating novel and existing methods in ML and fair ML, and the first of its kind!
This suite of datasets is:
- Realistic, based on a present-day, real-world dataset for fraud detection;
- Biased, with distinct controlled types of bias in each dataset;
- Imbalanced, with an extremely low prevalence of the positive class;
- Dynamic, with temporal data and observed distribution shifts;
- Privacy-preserving, protecting the identity of potential applicants through differential privacy techniques (noise addition), feature encoding, and a trained generative model (CTGAN).
Each dataset is composed of:

- 1 million instances;
- 30 realistic features used in the fraud detection use case;
- a "month" column, providing temporal information about the dataset;
- protected attributes (age group, employment status, and % income).
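A minimal loading sketch is shown below; the file name Base.csv and the column names month and fraud_bool are assumptions based on the datasheet linked below, so check the repository for the exact schema.

```python
import pandas as pd

# Assumptions: the base variant ships as "Base.csv", with a binary label
# column "fraud_bool" and a temporal column "month"; see the datasheet.
df = pd.read_csv("Base.csv")

# The suite is dynamic: split temporally so evaluation mirrors deployment,
# with earlier months for training and later months held out.
train = df[df["month"] < 6]
test = df[df["month"] >= 6]

# The positive class is extremely rare; always check prevalence first.
print(f"fraud prevalence (train): {train['fraud_bool'].mean():.4%}")
print(f"fraud prevalence (test):  {test['fraud_bool'].mean():.4%}")
```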
Detailed information (datasheet) on the suite: https://github.com/feedzai/bank-account-fraud/blob/main/documents/datasheet.pdf
Check out the github repository for more resources and some example notebooks: https://github.com/feedzai/bank-account-fraud
Read the NeurIPS 2022 paper here: https://arxiv.org/abs/2211.13358
Learn more about Feedzai Research here: https://research.feedzai.com/
Please use the following citation for the BAF dataset suite:
@article{jesusTurningTablesBiased2022,
title={Turning the {{Tables}}: {{Biased}}, {{Imbalanced}}, {{Dynamic Tabular Datasets}} for {{ML Evaluation}}},
author={Jesus, S{\'e}rgio and Pombal, Jos{\'e} and Alves, Duarte and Cruz, Andr{\'e} and Saleiro, Pedro and Ribeiro, Rita P. and Gama, Jo{\~a}o and Bizarro, Pedro},
journal={Advances in Neural Information Processing Systems},
year={2022}
}
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains hundreds of thousands (hopefully millions soon) of textures and PBR/SV-BRDF materials extracted from real-world natural images.
The repository is composed of texture images given as RGB images (each image is one uniform texture) and folders of PBR/SVBRDF materials given as sets of property maps (base color, roughness, metallic, etc.).
Visualizations of sampled PBRs and textures can be seen in PBR_examples.jpg and Textures_Examples.jpg.
Texture images are given in the Extracted_textures_*.zip files.
Each image in these zip files is a single texture; the textures were extracted and cropped from the Open Images dataset.
PBR materials are available in the PBR_*.zip files. These PBRs were generated from the texture images in an unsupervised way (with no human intervention). Each subfolder in these files contains the property maps of one PBR material (roughness, metallic, etc., suitable for Blender/Unreal Engine); a loading sketch follows below. A visualization of the rendered material appears in the Material_View.jpg file in each PBR folder.
PBR materials that were generated by mixing other PBR materials are available in files with the names PBR_mix*.zip
Samples for each case can be found in files named: Sample_*.zip
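As a sketch of how one of these materials can be consumed (the subfolder path is hypothetical, and the exact map file names vary, so this loader simply collects whatever image maps a material folder provides):

```python
import os
import numpy as np
from PIL import Image  # assumption: Pillow is installed

def load_pbr(folder: str) -> dict:
    """Load every property map (base color, roughness, metallic, ...) found
    in one PBR subfolder into a numpy array, keyed by file name."""
    maps = {}
    for fname in os.listdir(folder):
        name, ext = os.path.splitext(fname)
        if ext.lower() in (".png", ".jpg", ".jpeg"):
            maps[name.lower()] = np.asarray(Image.open(os.path.join(folder, fname)))
    return maps

pbr = load_pbr("PBR_1/material_0000")  # hypothetical extracted subfolder
for name, arr in pbr.items():
    print(name, arr.shape, arr.dtype)
```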
Documented code used to extract the textures and generate the PBRs is available at:
Texture_And_Material_ExtractionCode_And_Documentation.zip
The materials and textures were extracted from real-world images using an unsupervised extraction method (code supplied). As such, they are far more diverse and wide in scope than existing repositories, but at the same time noisier and with more outliers. This repository is most useful for applications that demand large-scale, highly diverse data yet can tolerate more noise and lower quality than professional repositories with manually made assets, like ambientCG, provide. It can be very useful for creating machine learning datasets or for large-scale procedural generation; it is less suitable for areas that demand precise, clean, and categorized PBRs, like CGI art and graphic design. For a preview, it is recommended to look at PBR_examples.jpg and Textures_Examples.jpg, or to download the Sample files and look at the Material_View.jpg files to assess the quality of the materials.
Currently, there are a few hundred thousand PBR materials and textures, but the goal is to grow this to over a million in the near future.
The Python scripts used to extract these assets are supplied at:
Texture_And_Material_ExtractionCode_And_Documentation.zip
The code can be run on any folder of random images; it extracts regions with uniform textures and turns them into PBR materials.
Alternative download sources:
https://sites.google.com/view/infinitexture/home
https://e.pcloud.link/publink/show?code=kZON5TZtxLfdvKrVCzn12NADBFRNuCKHm70
https://icedrive.net/s/jfY1xSDNkVwtYDYD4FN5wha2A8Pz
This work was done as part of the paper "Learning Zero-Shot Material States Segmentation, by Implanting Natural Image Patterns in Synthetic Data".
@article{eppel2024learning,
title={Learning Zero-Shot Material States Segmentation, by Implanting Natural Image Patterns in Synthetic Data},
author={Eppel, Sagi and Li, Jolina and Drehwald, Manuel and Aspuru-Guzik, Alan},
journal={arXiv preprint arXiv:2403.03309},
year={2024}
}
All the code and repositories are available under CC0 (free-to-use) licenses. Textures were extracted from the Open Images dataset, which is under an Apache license.