4 datasets found
  1. SynthRAD2025 Grand Challenge dataset: generating synthetic CT for radiotherapy

    • zenodo.org
    pdf
    Updated Jul 2, 2025
    Cite
    Adrian Thummerer; Erik van der Bijl; Arthur Jr. Galapon; Florian Kamp; Matteo Maspero (2025). SynthRAD2025 Grand Challenge dataset: generating synthetic CT for radiotherapy [Dataset]. http://doi.org/10.5281/zenodo.14918089
    Available download formats: pdf
    Dataset updated
    Jul 2, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Adrian Thummerer; Erik van der Bijl; Arthur Jr. Galapon; Florian Kamp; Matteo Maspero
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Time period covered
    Mar 1, 2025
    Description

    Dataset Description

    Dataset Structure

    A detailed description is available in "SynthRAD2025_dataset_description.pdf". A paper describing the dataset has been submitted to Medical Physics and is available as a pre-print at https://arxiv.org/abs/2502.17609. The dataset is divided into two tasks:

    • Task 1 (MRI-to-CT conversion) is provided in Task1.zip.
    • Task 2 (CBCT-to-CT conversion) is provided in Task2.zip.

    After extraction, the dataset is organized as described below; a directory sketch follows the file listings.

    Within each task, cases are categorized into three anatomical regions:

    • Head-and-neck (HN)
    • Thorax (TH)
    • Abdomen (AB)

    Each anatomical region contains individual patient folders, named using a unique seven-character alphanumeric code:
    [Task Number][Anatomy][Center][PatientID]
    Example: 1HNA001
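
    For illustration, such a case code can be split into its components by position; a minimal sketch in Python (the helper name is hypothetical, not part of the dataset tooling):

    def parse_case_code(code: str) -> dict:
        # Positions follow the [Task Number][Anatomy][Center][PatientID] pattern above.
        return {
            "task": code[0],         # "1" (MRI-to-CT) or "2" (CBCT-to-CT)
            "anatomy": code[1:3],    # "HN", "TH", or "AB"
            "center": code[3],       # center letter, e.g. "A"
            "patient_id": code[4:],  # e.g. "001"
        }

    print(parse_case_code("1HNA001"))
    # {'task': '1', 'anatomy': 'HN', 'center': 'A', 'patient_id': '001'}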

    Each patient folder in the training dataset contains (for other sets see Table below):

    • ct.mha: preprocessed CT image
    • mr.mha or cbct.mha (depending on the task): preprocessed MR or CBCT image
    • mask.mha: Binary mask of the patient outline (dilated)

    An overview folder within each anatomical region contains:

    • [task]_[anatomy]_parameters.xlsx: Imaging protocol details for each patient.
    • [task][anatomy][center][PatientID]_overview.png: A visualization of axial, coronal, and sagittal slices of CBCT/MR, CT, mask, and difference images.
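
    Putting the pieces above together, the extracted Task 1 archive can be expected to look roughly like the sketch below. The folder names (Task1, overview) are inferred from the naming conventions in this section, and the Python snippet that follows is only an illustrative way to read one case with SimpleITK, not an official loader.

    Task1/
      HN/
        1HNA001/
          ct.mha
          mr.mha          (cbct.mha in Task 2)
          mask.mha
        ...
        overview/
          1_HN_parameters.xlsx
          1HNA001_overview.png
          ...
      TH/
      AB/

    import SimpleITK as sitk

    case_dir = "Task1/HN/1HNA001"  # hypothetical path following the layout sketched above
    ct = sitk.ReadImage(f"{case_dir}/ct.mha")      # preprocessed CT
    mr = sitk.ReadImage(f"{case_dir}/mr.mha")      # use cbct.mha for Task 2
    mask = sitk.ReadImage(f"{case_dir}/mask.mha")  # dilated patient-outline mask

    ct_arr = sitk.GetArrayFromImage(ct)            # NumPy array in z, y, x order
    mask_arr = sitk.GetArrayFromImage(mask).astype(bool)
    print(ct_arr.shape, ct.GetSpacing())           # voxel grid size and spacing in mm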

    Dataset Overview

    The SynthRAD2025 dataset is part of the second edition of the SynthRAD deep learning challenge (https://synthrad2025.grand-challenge.org/), which benchmarks synthetic CT generation for MRI- and CBCT-based radiotherapy workflows.

    • Task 1: MRI-to-CT conversion for MR-only and MR-guided photon/proton radiotherapy, consisting of 890 MRI-CT pairs.
    • Task 2: CBCT-to-CT conversion for daily adaptive radiotherapy workflows, consisting of 1,472 CBCT-CT pairs.

    Imaging data was collected from five European university medical centers:

    • Netherlands: UMC Groningen, UMC Utrecht, Radboud UMC
    • Germany: LMU Klinikum Munich, UK Cologne

    All centers have independently approved the study in accordance with their institutional review boards or medical ethics committee regulations.

    Inclusion criteria:

    • Patients treated with external beam radiotherapy (photon or proton therapy) at one of the data-providing centers.
    • Imaging data available from one of the three anatomical regions.
    • No restrictions on age, sex, tumor characteristics, or staging.

    License

    The dataset is provided under two different licenses:

    • Data from centers A, B, C, and E is provided under a CC BY-NC 4.0 International License (https://creativecommons.org/licenses/by-nc/4.0/).
    • Data from center D is provided under a limited license that permits its use only for the duration of the challenge and remains valid only while the challenge is active (Limited Use License Center D). By downloading Center D's data, participants agree to these terms. Once the challenge ends, access to the data ends: the download link will be deactivated and all downloaded data must be deleted. After requesting participation in the challenge on the SynthRAD2025 website, participants can access the download link for center D at https://synthrad2025.grand-challenge.org/data/.

    Data Release Schedule

    Subset | Files | Release Date | Link
    Training | Input, CT, Mask | 01-03-2025 | https://doi.org/10.5281/zenodo.14918213
    Training Center D | Input, CT, Mask | 01-03-2025 | download link at https://synthrad2025.grand-challenge.org/data/ (Limited Use License)
    Validation Input | Input, Mask | 01-06-2025 | https://doi.org/10.5281/zenodo.14918504
    Validation Input Center D | Input, Mask | 01-06-2025 | download link at https://synthrad2025.grand-challenge.org/data/ (Limited Use License)
    Validation Ground Truth | CT, Deformed CT | 01-03-2030 | https://doi.org/10.5281/zenodo.14918605
    Test | Input, CT, Deformed CT, Mask | 01-03-2030 | https://doi.org/10.5281/zenodo.14918722
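
    For scripted access, the openly released subsets above can also be queried through Zenodo's REST API using the record IDs embedded in the DOIs; a minimal sketch (the exact JSON layout may vary with Zenodo's API version, so treat the field names as assumptions):

    import requests

    record_id = "14918213"  # taken from the training-set DOI above
    resp = requests.get(f"https://zenodo.org/api/records/{record_id}", timeout=30)
    resp.raise_for_status()

    # List the files attached to the record together with their download links.
    for f in resp.json().get("files", []):
        print(f.get("key"), f.get("links", {}).get("self"))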

    Dataset Composition

    The number of cases collected at each center for training, validation, and test sets.

    Training Set

    Task | Center | HN | TH | AB | Total
    1 | A | 91 | 91 | 65 | 247
    1 | B | 0 | 91 | 91 | 182
    1 | C | 65 | 0 | 19 | 84
    1 | D | 65 | 0 | 0 | 65
    1 | E | 0 | 0 | 0 | 0
    1 | Total | 221 | 182 | 175 | 578
    2 | A | 65 | 65 | 64 | 195
    2 | B | 65 | 65 | 65 | 195
    2 | C | 65 | 63 | 62 | 190
    2 | D | 65 | 63 | 53 | 181
    2 | E | 65 | 65 | 65 |
  2. MSMD - Multimodal Sheet Music Dataset

    • zenodo.org
    • data.europa.eu
    zip
    Updated Jan 24, 2020
    Cite
    Matthias Dorfer; Jan Hajič jr.; Andreas Arzt; Harald Frostel; Gerhard Widmer (2020). MSMD - Multimodal Sheet Music Dataset [Dataset]. http://doi.org/10.5281/zenodo.2597505
    Available download formats: zip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Matthias Dorfer; Jan Hajič jr.; Andreas Arzt; Harald Frostel; Gerhard Widmer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    MSMD is a synthetic dataset of 497 pieces of (classical) music that contains both audio and score representations of the pieces aligned at a fine-grained level (344,742 pairs of noteheads aligned to their audio/MIDI counterpart). It can be used for training and evaluating multimodal models that enable crossing from one modality to the other, such as retrieving sheet music using recordings or following a performance in the score image.

    Please find further information and a corresponding Python package on the GitHub page: https://github.com/CPJKU/msmd

    If you use this dataset, please cite:
    [1] Matthias Dorfer, Jan Hajič jr., Andreas Arzt, Harald Frostel, Gerhard Widmer.
    Learning Audio-Sheet Music Correspondences for Cross-Modal Retrieval and Piece Identification.
    Transactions of the International Society for Music Information Retrieval, issue 1, 2018.

  3. Bank Account Fraud Dataset Suite (NeurIPS 2022)

    • kaggle.com
    Updated Nov 29, 2023
    Cite
    Sérgio Jesus (2023). Bank Account Fraud Dataset Suite (NeurIPS 2022) [Dataset]. https://www.kaggle.com/datasets/sgpjesus/bank-account-fraud-dataset-neurips-2022
    Available download formats: Croissant (a machine-learning dataset format; see mlcommons.org/croissant)
    Dataset updated
    Nov 29, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sérgio Jesus
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    The Bank Account Fraud (BAF) suite of datasets was published at NeurIPS 2022 and comprises a total of 6 different synthetic bank account fraud tabular datasets. BAF is a realistic, complete, and robust test bed to evaluate novel and existing methods in ML and fair ML, and the first of its kind!

    This suite of datasets is:

    • Realistic: based on a present-day, real-world dataset for fraud detection;
    • Biased: each dataset has distinct, controlled types of bias;
    • Imbalanced: this setting presents an extremely low prevalence of the positive class;
    • Dynamic: with temporal data and observed distribution shifts;
    • Privacy-preserving: to protect the identity of potential applicants, differential privacy techniques (noise addition), feature encoding, and a trained generative model (CTGAN) were applied.


    Each dataset is composed of:

    • 1 million instances;
    • 30 realistic features used in the fraud detection use case;
    • a “month” column, providing temporal information about the dataset;
    • protected attributes (age group, employment status, and % income).
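
    As a sketch of how one of these tables could be inspected with pandas (the file name Base.csv and the fraud_bool / month column names are assumptions based on the suite's documentation, so check the datasheet before relying on them):

    import pandas as pd

    # One of the six BAF variants downloaded from Kaggle; the file name is an assumption.
    df = pd.read_csv("Base.csv")

    print(df.shape)                                  # roughly 1 million rows
    print(df["fraud_bool"].mean())                   # positive-class prevalence; expected to be very low
    print(df.groupby("month")["fraud_bool"].mean())  # how prevalence shifts over time

    # Temporal split in the spirit of the suite: earlier months for training,
    # later months held out for evaluation (the cut-off month is arbitrary here).
    train = df[df["month"] < 6]
    test = df[df["month"] >= 6]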

    Detailed information (datasheet) on the suite: https://github.com/feedzai/bank-account-fraud/blob/main/documents/datasheet.pdf

    Check out the github repository for more resources and some example notebooks: https://github.com/feedzai/bank-account-fraud

    Read the NeurIPS 2022 paper here: https://arxiv.org/abs/2211.13358

    Learn more about Feedzai Research here: https://research.feedzai.com/

    Please use the following citation for the BAF dataset suite:

    @article{jesusTurningTablesBiased2022,
      title={Turning the {{Tables}}: {{Biased}}, {{Imbalanced}}, {{Dynamic Tabular Datasets}} for {{ML Evaluation}}},
      author={Jesus, S{\'e}rgio and Pombal, Jos{\'e} and Alves, Duarte and Cruz, Andr{\'e} and Saleiro, Pedro and Ribeiro, Rita P. and Gama, Jo{\~a}o and Bizarro, Pedro},
      journal={Advances in Neural Information Processing Systems},
      year={2022}
    }

  4. VasTexture: Vast repository of textures and PBR Materials extracted from...

    • zenodo.org
    jpeg, zip
    Updated Apr 26, 2025
    Cite
    Zenodo (2025). VasTexture: Vast repository of textures and PBR Materials extracted from images using unsupervised approach [Dataset]. http://doi.org/10.5281/zenodo.11391127
    Available download formats: zip, jpeg
    Dataset updated
    Apr 26, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    VasTexture: Vast repository of textures and SVBRDF/PBR Materials extracted from images using an unsupervised approach.

    This is an old version. For the latest version, see: https://zenodo.org/records/12629301

    This dataset contains hundreds of thousands (hopefully millions soon) of textures and PBR/SV-BRDF materials extracted from real-world natural images.

    The repository is composed of texture images given as RGB images (each image is one uniform texture) and folders of PBR/SVBRDF materials given as sets of property maps (base color, roughness, metallic, etc.).

    Visualisation of sampled PBRs and Textures can be seen in: PBR_examples.jpg and Textures_Examples.jpg

    Link to the main project page

    Link to paper

    File structure

    Texture images are given in the Extracted_textures_*.zip files.

    Each image in this zip file is a single texture, the textures were extracted and cropped from the open images dataset.

    PBR materials are available in the PBR_*.zip files. These PBRs were generated from the texture images in an unsupervised way (with no human intervention). Each subfolder in these files contains the property maps of one PBR (roughness, metallic, etc., suitable for Blender/Unreal Engine), and a visualization of the rendered material appears in the Material_View.jpg file in each PBR folder; a loading sketch is given at the end of this file-structure section.

    PBR materials that were generated by mixing other PBR materials are available in files with the names PBR_mix*.zip

    Samples for each case can be found in files named: Sample_*.zip

    Documented code used to extract the textures and generate the PBRs is available at:

    Texture_And_Material_ExtractionCode_And_Documentation.zip
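
    As an illustrative sketch of how one such material folder could be loaded in Python (the map file names below are assumptions; inspect a Sample_*.zip to see the naming actually used in this repository):

    from pathlib import Path

    import numpy as np
    from PIL import Image

    pbr_dir = Path("some_material_folder")  # one extracted PBR subfolder (hypothetical name)

    # Collect whichever property maps are present; the name patterns are assumed for illustration.
    maps = {}
    for name in ("base_color", "roughness", "metallic", "normal"):
        candidates = sorted(pbr_dir.glob(f"*{name}*"))
        if candidates:
            maps[name] = np.asarray(Image.open(candidates[0]))

    for name, arr in maps.items():
        print(name, arr.shape, arr.dtype)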

    Details:

    The materials and textures were extracted from real-world images using an unsupervised extraction method (code supplied). As such, they are far more diverse and broader in scope than existing repositories, but they are also noisier and contain more outliers. This repository is therefore most useful for applications that need large-scale, highly diverse data and can tolerate noisier, lower-quality assets than professional repositories with manually made assets, such as ambientCG; for example, building machine-learning datasets or large-scale procedural generation. It is less suitable for areas that demand precise, clean, and categorized PBRs, such as CGI art and graphic design. For a preview, it is recommended to look at PBR_examples.jpg and Textures_Examples.jpg, or to download the Sample files and inspect the Material_View.jpg files to judge the quality of the materials.

    Scale:

    Currently, there are a few hundred thousand PBR materials and textures, but the goal is to grow this to over a million in the near future.

    Data generation code:

    The Python scripts used to extract these assets are supplied at:

    Texture_And_Material_ExtractionCode_And_Documentation.zip

    The code can be run on any folder of random images to extract regions with uniform textures and turn them into PBR materials.

    Alternative download sources:

    https://sites.google.com/view/infinitexture/home

    https://e.pcloud.link/publink/show?code=kZON5TZtxLfdvKrVCzn12NADBFRNuCKHm70

    https://icedrive.net/s/jfY1xSDNkVwtYDYD4FN5wha2A8Pz

    Paper

    This work was done as part of the paper "Learning Zero-Shot Material States Segmentation, by Implanting Natural Image Patterns in Synthetic Data".

    @article{eppel2024learning,
      title={Learning Zero-Shot Material States Segmentation, by Implanting Natural Image Patterns in Synthetic Data},
      author={Eppel, Sagi and Li, Jolina and Drehwald, Manuel and Aspuru-Guzik, Alan},
      journal={arXiv preprint arXiv:2403.03309},
      year={2024}
    }

    License:

    All the code and repositories are available under a CC0 (free to use) license.

    Textures were extracted from the Open Images dataset, which is distributed under an Apache license.

