34 datasets found
  1. NeurIPS 2021 dataset

    • figshare.com
    hdf
    Updated Jul 28, 2024
    Cite
    Luke Zappia (2024). NeurIPS 2021 dataset [Dataset]. http://doi.org/10.6084/m9.figshare.25958374.v1
    Explore at:
    Available download formats: hdf
    Dataset updated
    Jul 28, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Luke Zappia
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    NeurIPS 2021 dataset used for benchmarking feature selection for integration, in H5AD format. Files contain the full raw dataset, the processed batches used to create the reference, and the processed batches used as a query.

    Note: These files have been saved with compression to reduce file size. Re-saving without compression will reduce reading times if needed.

    If used, please cite:

    Lance C, Luecken MD, Burkhardt DB, Cannoodt R, Rautenstrauch P, Laddach A, et al. Multimodal single cell data integration challenge: Results and lessons learned. In: Kiela D, Ciccone M, Caputo B, editors. Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track. PMLR; 06–14 Dec 2022. p. 162–76. Available from: https://proceedings.mlr.press/v176/lance22a.html

    AND

    Luecken MD, Burkhardt DB, Cannoodt R, Lance C, Agrawal A, Aliee H, et al. A sandbox for prediction and integration of DNA, RNA, and proteins in single cells. Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2). 2022 [cited 2022 Nov 8]. Available from: https://openreview.net/pdf?id=gN35BGa1Rt
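
    The note above suggests re-saving without compression to speed up reads; here is a minimal sketch with the anndata package, assuming a local copy under a hypothetical filename:

    import anndata as ad

    # Read one of the compressed H5AD files (hypothetical local filename).
    adata = ad.read_h5ad("neurips2021_raw.h5ad")

    # Re-save without compression, trading disk space for faster reads.
    adata.write_h5ad("neurips2021_raw_uncompressed.h5ad", compression=None)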

  2. neurips-2025-papers

    • huggingface.co
    Updated Nov 19, 2025
    Cite
    Huy Dang (2025). neurips-2025-papers [Dataset]. https://huggingface.co/datasets/huyxdang/neurips-2025-papers
    Explore at:
    Dataset updated
    Nov 19, 2025
    Authors
    Huy Dang
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    NeurIPS 2025 Papers Dataset

    This dataset contains all accepted papers from NeurIPS 2025, scraped from OpenReview.

      Dataset Statistics

      Overview

    • Total Papers: 5,772
    • Unique Paper IDs: 5,772 (no duplicate IDs)

      Track Distribution

    • Main Track: 5,275 papers (91.4%)
    • Datasets and Benchmarks Track: 497 papers (8.6%)

      Award Distribution

    • Poster: 4,949 papers (85.7%)
    • Oral: 84 papers (1.5%)
    • Spotlight: 739 papers (12.8%)

      Track × Award… See the full description on the dataset page: https://huggingface.co/datasets/huyxdang/neurips-2025-papers.
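
    A minimal sketch for loading this dataset with the Hugging Face datasets library; the split name "train" is an assumption, so inspect the loaded object for the actual schema:

    from datasets import load_dataset

    # Load the papers dataset from the Hugging Face Hub.
    ds = load_dataset("huyxdang/neurips-2025-papers", split="train")

    print(ds)     # row count and column names
    print(ds[0])  # first paper record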
    
  3. ConViS-Bench

    • huggingface.co
    Updated Sep 20, 2025
    Cite
    submission1335 (2025). ConViS-Bench [Dataset]. https://huggingface.co/datasets/submission1335/ConViS-Bench
    Explore at:
    Dataset updated
    Sep 20, 2025
    Authors
    submission1335
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This dataset is associated with submission 1335 at the NeurIPS 2025 Datasets and Benchmarks track. The benchmark is intended to be used with the proposed submission environments (see the source code). See the provided README for information about downloading the dataset and running the evaluations.

  4. Data from: Re-assembling the past: The RePAIR dataset and benchmark for real...

    • zenodo.org
    • data.niaid.nih.gov
    txt, zip
    Updated Nov 4, 2024
    Cite
    Theodore Tsesmelis; Luca Palmieri; Marina Khoroshiltseva; Adeela Islam; Gur Elkin; Ofir Shahar Itzhak; Gianluca Scarpellini; Stefano Fiorini; Yaniv Ohayon; Nadav Alali; Sinem Aslan; Pietro Morerio; Sebastiano Vascon; Elena Gravina; Maria Christina Napolitano; Giuseppe Scarpati; Gabriel Zuchtriegel; Alexandra Spühler; Michel E. Fuchs; Stuart James; Ohad Ben-Shahar; Marcello Pelillo; Alessio Del Bue (2024). Re-assembling the past: The RePAIR dataset and benchmark for real world 2D and 3D puzzle solving [Dataset]. http://doi.org/10.5281/zenodo.13993089
    Explore at:
    Available download formats: zip, txt
    Dataset updated
    Nov 4, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Theodore Tsesmelis; Luca Palmieri; Marina Khoroshiltseva; Adeela Islam; Gur Elkin; Ofir Shahar Itzhak; Gianluca Scarpellini; Stefano Fiorini; Yaniv Ohayon; Nadav Alali; Sinem Aslan; Pietro Morerio; Sebastiano Vascon; Elena Gravina; Maria Christina Napolitano; Giuseppe Scarpati; Gabriel Zuchtriegel; Alexandra Spühler; Michel E. Fuchs; Stuart James; Ohad Ben-Shahar; Marcello Pelillo; Alessio Del Bue
    Description

    Accepted by NeurIPS 2024 Datasets and Benchmarks Track

    We introduce the RePAIR puzzle-solving dataset, a large-scale real-world dataset of fractured frescoes from the archaeological site of Pompeii. Our dataset consists of over 1000 fractured frescoes. RePAIR stands as a realistic computational challenge for 2D and 3D puzzle-solving methods and serves as a benchmark that enables the study of fractured object reassembly and presents new challenges for geometric shape understanding. Please visit our website for more dataset information, access to source code scripts, and an interactive gallery of dataset samples.

    Access the entire dataset

    We provide a compressed version of our dataset in two separate files: one for the 2D version and one for the 3D version.

    Our full dataset contains over one thousand individual fractured fragments, divided into groups, each with its corresponding folder, and compressed into separate archives for the 2D and 3D subsets. In the 2D dataset, each fragment is saved as a .PNG image and each group has the corresponding ground-truth transformation to solve the puzzle as a .TXT file. In the 3D dataset, each fragment is saved as a mesh in the widely used .OBJ format with corresponding material (.MTL) and texture (.PNG) files. The meshes are already in the assembled position and orientation, so no additional information is needed. All additional metadata is given as .JSON files.
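
    As a minimal sketch of working with the 2D subset described above, assuming one group folder holding fragment .PNG images and a ground-truth .TXT file (the file names and the exact .TXT layout here are hypothetical; see the project website for the real structure):

    import numpy as np
    from PIL import Image

    # Load one fractured-fresco fragment (hypothetical path).
    fragment = np.asarray(Image.open("group_0001/fragment_0001.png"))

    # Load the group's ground-truth transformations; each row is assumed
    # to hold the parameters placing one fragment in the solved puzzle.
    gt = np.loadtxt("group_0001/ground_truth.txt")
    print(fragment.shape, gt.shape)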

    Important Note

    Please be advised that downloading and reusing this dataset is permitted only upon acceptance of the following license terms.

    The Istituto Italiano di Tecnologia (IIT) declares, and the user (“User”) acknowledges, that the "RePAIR puzzle-solving dataset" contains 3D scans, texture maps, rendered images and meta-data of fresco fragments acquired at the Archaeological Site of Pompeii. IIT is authorised to publish the RePAIR puzzle-solving dataset herein only for scientific and cultural purposes and in connection with an academic publication referenced as Tsesmelis et al., "Re-assembling the past: The RePAIR dataset and benchmark for real world 2D and 3D puzzle solving", NeurIPS 2024. Use of the RePAIR puzzle-solving dataset by User is limited to downloading, viewing such images; comparing these with data or content in other datasets. User is not authorised to use, in particular explicitly excluding any commercial use nor in conjunction with the promotion of a commercial enterprise and/or its product(s) or service(s), reproduce, copy, distribute the RePAIR puzzle-solving dataset. User will not use the RePAIR puzzle-solving dataset in any way prohibited by applicable laws. RePAIR puzzle-solving dataset therein is being provided to User without warranty of any kind, either expressed or implied. User will be solely responsible for their use of such RePAIR puzzle-solving dataset. In no event shall IIT be liable for any damages arising from such use.

  5. Human bone marrow mononuclear cells

    • figshare.com
    hdf
    Updated Jan 29, 2025
    Cite
    Marius Lange (2025). Human bone marrow mononuclear cells [Dataset]. http://doi.org/10.6084/m9.figshare.28302875.v1
    Explore at:
    Available download formats: hdf
    Dataset updated
    Jan 29, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Marius Lange
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The gene expression portion of the NeurIPS 2021 challenge 10x multiome dataset (Luecken et al., NeurIPS Datasets and Benchmarks track 2021), originally obtained from GEO. It contains single-cell gene expression of 69,249 cells for 13,431 genes. The adata.X field contains normalized data and adata.layers['counts'] contains raw expression values. We computed a latent space using scANVI (Xu et al., MSB 2021), following their tutorial.
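
    A minimal sketch of accessing the normalized and raw matrices described above with the anndata package, assuming the file has been downloaded locally (the filename is hypothetical):

    import anndata as ad

    adata = ad.read_h5ad("bmmc_gex.h5ad")  # hypothetical local filename

    normalized = adata.X               # normalized expression (69,249 cells x 13,431 genes)
    counts = adata.layers["counts"]    # raw expression values
    print(adata)                       # inspect adata.obsm for the scANVI latent space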

  6. mmlongbench-doc-results

    • huggingface.co
    Cite
    IXCLab@Shanghai AI Lab, mmlongbench-doc-results [Dataset]. https://huggingface.co/datasets/OpenIXCLab/mmlongbench-doc-results
    Explore at:
    Dataset authored and provided by
    IXCLab@Shanghai AI Lab
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    📊 MMLongBench-Doc Evaluation Results

    Official evaluation results: GPT-4.1 (2025-04-14) & GPT-4o (2024-11-20) 📄 Paper: MMLongBench-Doc, NeurIPS 2024 Datasets and Benchmarks Track (Spotlight)

  7. TreeFinder

    • kaggle.com
    zip
    Updated Oct 24, 2025
    Cite
    Bibi 9 (2025). TreeFinder [Dataset]. https://www.kaggle.com/datasets/zhihaow/tree-finder
    Explore at:
    Available download formats: zip (1965923052 bytes)
    Dataset updated
    Oct 24, 2025
    Authors
    Bibi 9
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    🌲 TreeFinder: A US-Scale Benchmark Dataset for Individual Tree Mortality Monitoring Using High-Resolution Aerial Imagery

    Accepted to NeurIPS 2025 (Datasets & Benchmarks Track)

    TreeFinder is the first large-scale, high-resolution benchmark dataset for mapping individual dead trees across the contiguous United States (CONUS). Built to advance computer vision methods for ecological monitoring and carbon assessment, TreeFinder provides pixel-level annotations of dead trees from high-resolution aerial imagery, enriched with ecological metadata and paired with performance benchmarks.

    📦 What's in the Dataset?

    • 1,000 Sites across 48 U.S. States
      • Spatially diverse sampling of forested regions across CONUS
    • 23,000 Hectares of 0.6m NAIP Imagery
      • Aerial imagery from the National Agriculture Imagery Program (NAIP), including 4 channels (RGB + NIR)
    • 20,000+ Manually Annotated Dead Trees
      • Pixel-level masks created through expert labeling and validated via multi-temporal image comparison
    • ML-Ready Patches
      • Each raw scene is tiled into 224 × 224 patches for deep learning, with associated segmentation masks (a minimal tiling sketch follows this list)
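
    A minimal sketch of the tiling step mentioned in the list above, assuming a raw scene loaded as a NumPy array of shape (H, W, 4) for the four NAIP channels; the stride/overlap handling is illustrative, not the authors' exact pipeline:

    import numpy as np

    def tile_scene(scene: np.ndarray, patch: int = 224, overlap: int = 0) -> np.ndarray:
        """Tile an (H, W, C) scene into (patch, patch, C) windows."""
        stride = patch - overlap
        h, w = scene.shape[:2]
        tiles = [
            scene[y:y + patch, x:x + patch]
            for y in range(0, h - patch + 1, stride)
            for x in range(0, w - patch + 1, stride)
        ]
        return np.stack(tiles)

    scene = np.zeros((1024, 1024, 4), dtype=np.uint8)  # stand-in for a NAIP scene
    print(tile_scene(scene).shape)                     # (16, 224, 224, 4)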

    🧠 Benchmark Models

    We provide benchmark performance results using five semantic segmentation models, including:

    • U-Net and DeepLabV3+ (CNN-based)
    • ViT, SegFormer, and Mask2Former (Transformer-based)
    • DOFA (a multimodal foundation model trained on satellite data)

    Each model is trained and evaluated across various domain generalization settings (e.g., region, climate, forest type) to test robustness.

    🗺️ Metadata & Scenarios

    Each patch is enriched with:

    • Geographic Coordinates
    • Köppen–Geiger Climate Zone
    • Primary Tree Type (from USDA Forest Service maps)

    These metadata enable benchmarking under challenging scenarios like:

    • Cross-region generalization (e.g., East → West)
    • Climate domain shifts
    • Forest type transfer

    🧪 Why Use TreeFinder?

    TreeFinder enables the development and evaluation of machine learning models for high-impact environmental tasks such as:

    • Forest health monitoring
    • Carbon flux modeling
    • Wildfire risk assessment

    It is designed to foster cross-disciplinary collaboration between the machine learning and Earth science communities by providing a reproducible, challenging, and ecologically grounded benchmark.

    📚 Citation

    If you use TreeFinder in your research, please cite the following paper:

    Zhihao Wang, Cooper Li, Ruichen Wang, Lei Ma, George Hurtt, Xiaowei Jia, Gengchen Mai, Zhili Li, Yiqun Xie.
    TreeFinder: A US-Scale Benchmark Dataset for Individual Tree Mortality Monitoring Using High-Resolution Aerial Imagery.
    In Proceedings of the 39th Conference on Neural Information Processing Systems (NeurIPS 2025), Datasets and Benchmarks Track, 2025.

  8. LagrangeBench Datasets

    • zenodo.org
    zip
    Updated Oct 19, 2023
    Cite
    Artur P. Toshev; Nikolaus A. Adams (2023). LagrangeBench Datasets [Dataset]. http://doi.org/10.5281/zenodo.10021926
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 19, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Artur P. Toshev; Nikolaus A. Adams
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Datasets from the paper "LagrangeBench: A Lagrangian Fluid Mechanics Benchmarking Suite", presented at the NeurIPS 2023 Track on Datasets and Benchmarks.

  9. Breaking Bad Dataset

    • kaggle.com
    zip
    Updated Feb 16, 2023
    Cite
    Dazitu616 (2023). Breaking Bad Dataset [Dataset]. https://www.kaggle.com/datasets/dazitu616/breaking-bad-dataset/code
    Explore at:
    Available download formats: zip (1150713325 bytes)
    Dataset updated
    Feb 16, 2023
    Authors
    Dazitu616
    Description

    Dataset accompanying the NeurIPS 2022 Dataset and Benchmark Track paper: Breaking Bad: A Dataset for Geometric Fracture and Reassembly. Please refer to our project page for more details.

    License: The Breaking Bad dataset collects 3D meshes from ShapeNet and Thingi10K, thus inheriting their terms of use. Please refer to ShapeNet and Thingi10K for more details. We release each model in our dataset with an as-permissive-as-possible license compatible with its underlying base model. Please refer to ShapeNet and Thingi10K for restrictions and depositor requirements of each model.

  10. MedSG-Bench

    • huggingface.co
    Updated Sep 28, 2025
    Cite
    MedSG-Bench (2025). MedSG-Bench [Dataset]. https://huggingface.co/datasets/MedSG-Bench/MedSG-Bench
    Explore at:
    Dataset updated
    Sep 28, 2025
    Authors
    MedSG-Bench
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    🖥 MedSG-Bench: A Benchmark for Medical Image Sequences Grounding

    📖 Paper | 💻 Code | 🤗 Dataset

    🔥 MedSG-Bench is accepted at NeurIPS 2025 Datasets and Benchmarks Track as a Spotlight.

      MedSG-Bench

    MedSG-Bench is the first benchmark for medical image sequences grounding.
    👉 We also provide MedSG-188K, a grounding instruction-tuning dataset.
    👉 MedSeq-Grounder, the model trained on MedSG-188K, is available here.

      Metadata

    This dataset… See the full description on the dataset page: https://huggingface.co/datasets/MedSG-Bench/MedSG-Bench.

  11. Data from: Datasets for a data-centric image classification benchmark for...

    • zenodo.org
    • openagrar.de
    txt, zip
    Updated Jul 5, 2023
    Cite
    Lars Schmarje; Vasco Grossmann; Claudius Zelenka; Sabine Dippel; Rainer Kiko; Mariusz Oszust; Matti Pastell; Jenny Stracke; Anna Valros; Nina Volkmann; Reinhard Koch (2023). Datasets for a data-centric image classification benchmark for noisy and ambiguous label estimation [Dataset]. http://doi.org/10.5281/zenodo.7180818
    Explore at:
    Available download formats: zip, txt
    Dataset updated
    Jul 5, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Lars Schmarje; Vasco Grossmann; Claudius Zelenka; Sabine Dippel; Rainer Kiko; Mariusz Oszust; Matti Pastell; Jenny Stracke; Anna Valros; Nina Volkmann; Reinhard Koch
    Description

    This is the official data repository of the Data-Centric Image Classification (DCIC) Benchmark. The goal of this benchmark is to measure the impact of tuning the dataset instead of the model for a variety of image classification datasets. Full details about the collection process, the structure, and automatic download are available at:

    Paper: https://arxiv.org/abs/2207.06214

    Source Code: https://github.com/Emprime/dcic

    The license information is given below and as part of the download.

    Citation

    Please cite as

    @article{schmarje2022benchmark,
      author = {Schmarje, Lars and Grossmann, Vasco and Zelenka, Claudius and Dippel, Sabine and Kiko, Rainer and Oszust, Mariusz and Pastell, Matti and Stracke, Jenny and Valros, Anna and Volkmann, Nina and Koch, Reinhard},
      journal = {36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks},
      title = {{Is one annotation enough? A data-centric image classification benchmark for noisy and ambiguous label estimation}},
      year = {2022}
    }

    Please see the full details about the datasets used below; they should also be cited as part of the license.

    @article{schoening2020Megafauna,
    author = {Schoening, T and Purser, A and Langenk{\"{a}}mper, D and Suck, I and Taylor, J and Cuvelier, D and Lins, L and Simon-Lled{\'{o}}, E and Marcon, Y and Jones, D O B and Nattkemper, T and K{\"{o}}ser, K and Zurowietz, M and Greinert, J and Gomes-Pereira, J},
    doi = {10.5194/bg-17-3115-2020},
    journal = {Biogeosciences},
    number = {12},
    pages = {3115--3133},
    title = {{Megafauna community assessment of polymetallic-nodule fields with cameras: platform and methodology comparison}},
    volume = {17},
    year = {2020}
    }
    
    @article{Langenkamper2020GearStudy,
    author = {Langenk{\"{a}}mper, Daniel and van Kevelaer, Robin and Purser, Autun and Nattkemper, Tim W},
    doi = {10.3389/fmars.2020.00506},
    issn = {2296-7745},
    journal = {Frontiers in Marine Science},
    title = {{Gear-Induced Concept Drift in Marine Images and Its Effect on Deep Learning Classification}},
    volume = {7},
    year = {2020}
    }
    
    
    @article{peterson2019cifar10h,
    author = {Peterson, Joshua and Battleday, Ruairidh and Griffiths, Thomas and Russakovsky, Olga},
    doi = {10.1109/ICCV.2019.00971},
    issn = {15505499},
    journal = {Proceedings of the IEEE International Conference on Computer Vision},
    pages = {9616--9625},
    title = {{Human uncertainty makes classification more robust}},
    volume = {2019-Octob},
    year = {2019}
    }
    
    @article{schmarje2019,
    author = {Schmarje, Lars and Zelenka, Claudius and Geisen, Ulf and Gl{\"{u}}er, Claus-C. and Koch, Reinhard},
    doi = {10.1007/978-3-030-33676-9_26},
    issn = {23318422},
    journal = {DAGM German Conference on Pattern Recognition},
    number = {November},
    pages = {374--386},
    publisher = {Springer},
    title = {{2D and 3D Segmentation of uncertain local collagen fiber orientations in SHG microscopy}},
    volume = {11824 LNCS},
    year = {2019}
    }
    
    @article{schmarje2021foc,
    author = {Schmarje, Lars and Br{\"{u}}nger, Johannes and Santarossa, Monty and Schr{\"{o}}der, Simon-Martin and Kiko, Rainer and Koch, Reinhard},
    doi = {10.3390/s21196661},
    issn = {1424-8220},
    journal = {Sensors},
    number = {19},
    pages = {6661},
    title = {{Fuzzy Overclustering: Semi-Supervised Classification of Fuzzy Labels with Overclustering and Inverse Cross-Entropy}},
    volume = {21},
    year = {2021}
    }
    
    @article{schmarje2022dc3,
    author = {Schmarje, Lars and Santarossa, Monty and Schr{\"{o}}der, Simon-Martin and Zelenka, Claudius and Kiko, Rainer and Stracke, Jenny and Volkmann, Nina and Koch, Reinhard},
    journal = {Proceedings of the European Conference on Computer Vision (ECCV)},
    title = {{A data-centric approach for improving ambiguous labels with combined semi-supervised classification and clustering}},
    year = {2022}
    }
    
    
    @article{obuchowicz2020qualityMRI,
    author = {Obuchowicz, Rafal and Oszust, Mariusz and Piorkowski, Adam},
    doi = {10.1186/s12880-020-00505-z},
    issn = {1471-2342},
    journal = {BMC Medical Imaging},
    number = {1},
    pages = {109},
    title = {{Interobserver variability in quality assessment of magnetic resonance images}},
    volume = {20},
    year = {2020}
    }
    
    
    @article{stepien2021cnnQuality,
    author = {St{\c{e}}pie{\'{n}}, Igor and Obuchowicz, Rafa{\l} and Pi{\'{o}}rkowski, Adam and Oszust, Mariusz},
    doi = {10.3390/s21041043},
    issn = {1424-8220},
    journal = {Sensors},
    number = {4},
    title = {{Fusion of Deep Convolutional Neural Networks for No-Reference Magnetic Resonance Image Quality Assessment}},
    volume = {21},
    year = {2021}
    }
    
    @article{volkmann2021turkeys,
    author = {Volkmann, Nina and Br{\"{u}}nger, Johannes and Stracke, Jenny and Zelenka, Claudius and Koch, Reinhard and Kemper, Nicole and Spindler, Birgit},
    doi = {10.3390/ani11092655},
    journal = {Animals 2021},
    pages = {1--13},
    title = {{Learn to train: Improving training data for a neural network to detect pecking injuries in turkeys}},
    volume = {11},
    year = {2021}
    }
    
    @article{volkmann2022keypoint,
    author = {Volkmann, Nina and Zelenka, Claudius and Devaraju, Archana Malavalli and Br{\"{u}}nger, Johannes and Stracke, Jenny and Spindler, Birgit and Kemper, Nicole and Koch, Reinhard},
    doi = {10.3390/s22145188},
    issn = {1424-8220},
    journal = {Sensors},
    number = {14},
    pages = {5188},
    title = {{Keypoint Detection for Injury Identification during Turkey Husbandry Using Neural Networks}},
    volume = {22},
    year = {2022}
    }

  12. Data from: WikiDBs - A Large-Scale Corpus Of Relational Databases From...

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    Updated Dec 12, 2024
    Cite
    Vogel, Liane; Bodensohn, Jan-Micha; Binnig, Carsten (2024). WikiDBs - A Large-Scale Corpus Of Relational Databases From Wikidata [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11559813
    Explore at:
    Dataset updated
    Dec 12, 2024
    Authors
    Vogel, Liane; Bodensohn, Jan-Micha; Binnig, Carsten
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    WikiDBs is an open-source corpus of 100,000 relational databases. We aim to support research on tabular representation learning on multi-table data. The corpus is based on Wikidata and aims to follow certain characteristics of real-world databases.

    WikiDBs was published as a spotlight paper in the Datasets & Benchmarks track at NeurIPS 2024.

    WikiDBs contains the database schemas as well as the table contents. The database tables are provided as CSV files, and each database schema as JSON. The 100,000 databases are available in five splits, containing 20k databases each. In total, around 165 GB of disk space is needed for the full corpus. We also provide a script to convert the databases into SQLite.
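
    A minimal sketch of reading one database from a downloaded split with pandas; the directory layout and file names here are assumptions, so adapt them to the actual corpus structure:

    import json
    from pathlib import Path

    import pandas as pd

    db_dir = Path("wikidbs_split_00/database_00042")  # hypothetical path

    # Each database ships its schema as JSON and its tables as CSV files.
    schema = json.loads((db_dir / "schema.json").read_text())  # hypothetical file name
    tables = {p.stem: pd.read_csv(p) for p in db_dir.glob("*.csv")}
    print(len(tables), "tables loaded")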

  13. construct-validity-review

    • huggingface.co
    Updated Jul 29, 2025
    Cite
    Andrew Bean (2025). construct-validity-review [Dataset]. https://huggingface.co/datasets/ambean/construct-validity-review
    Explore at:
    Dataset updated
    Jul 29, 2025
    Authors
    Andrew Bean
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset contains the results of the review paper "Measuring What Matters", presented at the NeurIPS 2025 Datasets and Benchmarks Track.

  14. CREAK-Commonsense Reasoning over Entity Knowledge

    • kaggle.com
    zip
    Updated Jul 17, 2023
    Cite
    Haowen Wang (2023). CREAK-Commonsense Reasoning over Entity Knowledge [Dataset]. https://www.kaggle.com/datasets/hwwang98/creak-commonsense-reasoning-over-entity-knowledge
    Explore at:
    Available download formats: zip (973400 bytes)
    Dataset updated
    Jul 17, 2023
    Authors
    Haowen Wang
    Description

    CREAK: A Dataset for Commonsense Reasoning over Entity Knowledge

    This repository contains the data and code for the baseline described in the following paper:

    CREAK: A Dataset for Commonsense Reasoning over Entity Knowledge
    Yasumasa Onoe, Michael J.Q. Zhang, Eunsol Choi, Greg Durrett
    NeurIPS 2021 Datasets and Benchmarks Track

    @article{onoe2021creak,
      title={CREAK: A Dataset for Commonsense Reasoning over Entity Knowledge},
      author={Onoe, Yasumasa and Zhang, Michael J.Q. and Choi, Eunsol and Durrett, Greg},
      journal={OpenReview},
      year={2021}
    }

    ***** [New] November 8th, 2021: The contrast set has been updated. *****

    We have increased the size of the contrast set to 500 examples. Please check the paper for new numbers.

    Datasets

    Data Files

    CREAK data files are located under data/creak.

    • train.json contains 10,176 training examples.
    • dev.json contains 1,371 development examples.
    • test_without_labels.json contains 1,371 test examples (labels are not included).
    • contrast_set.json contains 500 contrastive examples.

    The data files are formatted as jsonlines. Here is a single training example:

    {
     'ex_id': 'train_1423',
     'sentence': 'Lauryn Hill separates two valleys as it is located between them.',
     'explanation': 'Lauren Hill is actually a person and not a mountain.',
     'label': 'false',
     'entity': 'Lauryn Hill',
     'en_wiki_pageid': '162864',
     'entity_mention_loc': [[0, 11]]
    }

    Field               Description
    ex_id               Example ID
    sentence            Claim
    explanation         Explanation by the annotator why the claim is TRUE/FALSE
    label               Label: 'true' or 'false'
    entity              Seed entity
    en_wiki_pageid      English Wikipedia Page ID for the seed entity
    entity_mention_loc  Location(s) of the seed entity in the claim
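
    A minimal sketch of iterating over the jsonlines files described above, assuming the data/creak layout from the repository:

    import json

    # Each line of train.json is one JSON object with the fields listed above.
    with open("data/creak/train.json") as f:
        examples = [json.loads(line) for line in f]

    print(len(examples))  # expected: 10,176 training examples
    print(examples[0]["sentence"], examples[0]["label"])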

    Baselines

    See this README

    Leaderboards

    https://www.cs.utexas.edu/~yasumasa/creak/leaderboard.html

    We host results only for Closed-Book methods that have been finetuned on only In-Domain data.

    To submit your results, please send your system name and prediction files for the dev, test, and contrast sets to yasumasa@utexas.edu.

    Contact

    Please contact at yasumasa@utexas.edu if you have any questions.

  15. SentinelKilnDB

    • kaggle.com
    zip
    Updated Sep 24, 2025
    Cite
    Rishabhsnip (2025). SentinelKilnDB [Dataset]. https://www.kaggle.com/datasets/rishabhsnip/sentinelkiln-dataset
    Explore at:
    Available download formats: zip (3803190363 bytes)
    Dataset updated
    Sep 24, 2025
    Authors
    Rishabhsnip
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    SentinelKilnDB - A Large-Scale Dataset and Benchmark for OBB Brick Kiln Detection in South Asia Using Satellite Imagery

    NeurIPS 2025 Datasets & Benchmarks Track

    Abstract

    Air pollution was responsible for 2.6 million deaths across South Asia in 2021 alone, with brick manufacturing contributing significantly to this burden. In particular, the Indo-Gangetic Plain, a densely populated and highly polluted region spanning northern India, Pakistan, Bangladesh, and parts of Afghanistan, sees brick kilns contributing 8–14% of ambient air pollution. Traditional monitoring approaches, such as field surveys and manual annotation using tools like Google Earth Pro, are time- and labor-intensive. Prior ML-based efforts for automated detection have relied on costly high-resolution commercial imagery and non-public datasets, limiting reproducibility and scalability. In this work, we introduce SENTINELKILNDB, a publicly available, hand-validated benchmark of 62,671 brick kilns spanning three kiln types, Fixed Chimney Bull’s Trench Kiln (FCBK), Circular FCBK (CFCBK), and Zigzag kilns, annotated with oriented bounding boxes (OBBs) across 2.8 million km² using free and globally accessible Sentinel-2 imagery. We benchmark state-of-the-art oriented object detection models and evaluate generalization across in-region, out-of-region, and super-resolution settings. SENTINELKILNDB enables rigorous evaluation of geospatial generalization and robustness for low-resolution object detection, and provides a new testbed for ML models addressing real-world environmental and remote sensing challenges at a continental scale. Datasets and code are available in SentinelKiln Dataset and SentinelKiln Benchmark, under the Creative Commons Attribution–NonCommercial 4.0 International License.

    [Figure: dataset statistics]

    Useful Links

    Project Page - https://lnkd.in/dn2SKwWv
    Official Paper - https://neurips.cc/virtual/2025/poster/121530
    Github - https://github.com/rishabh-mondal/NeurIPS_2025
    Sustainability Lab - https://sustainability-lab.github.io

    For questions or collaborations, please contact:

    Rishabh Mondal - rishabh.mondal@iitgn.ac.in
    Nipun Batra - nipun.batra@iitgn.ac.in

    Dataset Overview

    This dataset contains Sentinel-2 satellite imagery focused on identifying and classifying brick kilns across the Indo-Gangetic Plain and neighboring South Asian countries, including Afghanistan, Pakistan, and Bangladesh.

    • Imagery Source: Sentinel-2 (Surface Reflectance)
    • Image Size: 128 × 128 pixels
    • Spatial Resolution: 10 m/pixel
    • Timeframe: November 2023 – February 2024
    • Geographic Coverage: Indo-Gangetic Plain, Afghanistan, Pakistan, Bangladesh
    • Overlap: 30-pixel overlap between patches
    • File Naming Convention: lat,lon.png and lat,lon.txt

    Classes

    • CFCBK – Circular Fixed Chimney Bull’s Trench Kiln
    • FCBK – Fixed Chimney Bull’s Trench Kiln
    • Zigzag – Zigzag Kiln

    Annotation Formats

    • YOLO OBB (parsed in the sketch after this list):
      class_name, x1, y1, x2, y2, x3, y3, x4, y4

    • YOLO AA:
      class_name, x_center, y_center, width, height

    • DOTA Format:
      x1, y1, x2, y2, x3, y3, x4, y4, class_name, difficult
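
    A minimal sketch of parsing one YOLO OBB label line as specified above; whether fields are comma- or space-separated in the actual .txt files is an assumption, so adjust the splitting accordingly:

    def parse_yolo_obb(line: str):
        """Parse 'class_name, x1, y1, ..., x4, y4' into a name and four corner points."""
        parts = [p for p in line.replace(",", " ").split() if p]
        name, coords = parts[0], list(map(float, parts[1:9]))
        corners = [(coords[i], coords[i + 1]) for i in range(0, 8, 2)]
        return name, corners

    print(parse_yolo_obb("Zigzag, 10.0, 12.0, 40.0, 12.0, 40.0, 30.0, 10.0, 30.0"))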

    Dataset Splits

    The dataset is split using a class-wise stratified approach for balanced representation.

    Split   Images (.png)   Label Files (.txt)   No. of BBoxes
    Train   71,856          47,214               63,787
    Val     23,952          15,738               21,042
    Test    18,492          10,278               12,819
    Total   114,300         73,239               97,648

    Each split contains separate folders for images and annotations:

    dataset/
    ├── train/
    │  ├── images/
    │  └── labels/
    ├── val/
    │  ├── images/
    │  └── labels/
    └── test/
      ├── images/
      └── labels/
    

    ...

  16. PDEBench Datasets

    • darus.uni-stuttgart.de
    • opendatalab.com
    Updated Feb 13, 2024
    Cite
    Makoto Takamoto; Timothy Praditia; Raphael Leiteritz; Dan MacKinlay; Francesco Alesiani; Dirk Pflüger; Mathias Niepert (2024). PDEBench Datasets [Dataset]. http://doi.org/10.18419/DARUS-2986
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 13, 2024
    Dataset provided by
    DaRUS
    Authors
    Makoto Takamoto; Timothy Praditia; Raphael Leiteritz; Dan MacKinlay; Francesco Alesiani; Dirk Pflüger; Mathias Niepert
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Dataset funded by
    DFG
    Description

    This dataset contains benchmark data generated with numerical simulations of different PDEs, namely 1D advection, 1D Burgers', 1D and 2D diffusion-reaction, 1D diffusion-sorption, 1D, 2D, and 3D compressible Navier-Stokes, 2D Darcy flow, and the 2D shallow water equations. The dataset is intended to advance research in scientific ML. In general, the data are stored in HDF5 format, with the array dimensions packed according to the convention [b, t, x1, ..., xd, v], where b is the batch size (i.e., number of samples), t is the time dimension, x1, ..., xd are the spatial dimensions, and v is the number of channels (i.e., number of variables of interest). More detailed information is provided in our GitHub repository (https://github.com/pdebench/PDEBench) and in our paper submitted to the NeurIPS 2022 Datasets and Benchmarks Track.
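
    A minimal sketch of reading one of these HDF5 files with h5py, following the [b, t, x1, ..., xd, v] convention described above; the file name and dataset key are hypothetical, so inspect f.keys() for the actual layout:

    import h5py

    with h5py.File("1D_Advection_Sols.hdf5", "r") as f:  # hypothetical file name
        print(list(f.keys()))  # inspect the stored arrays
        u = f["tensor"][...]   # hypothetical key; shape [b, t, x, v] for a 1D PDE
        print(u.shape)         # e.g. (samples, timesteps, grid points, channels)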

  17. Data from: LoveDA: A Remote Sensing Land-Cover Dataset for Domain Adaptive...

    • data.niaid.nih.gov
    Updated Jul 17, 2024
    Cite
    Junjue, Wang; Zhuo, Zheng; Ailong, Ma; Xiaoyan, Lu; Yanfei, Zhong (2024). LoveDA: A Remote Sensing Land-Cover Dataset for Domain Adaptive Semantic Segmentation [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_5706577
    Explore at:
    Dataset updated
    Jul 17, 2024
    Dataset provided by
    Wuhan University
    Authors
    Junjue, Wang; Zhuo, Zheng; Ailong, Ma; Xiaoyan, Lu; Yanfei, Zhong
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The benchmark code is available at: https://github.com/Junjue-Wang/LoveDA

    Highlights:

    5987 high spatial resolution (0.3 m) remote sensing images from Nanjing, Changzhou, and Wuhan

    Focus on different geographical environments between Urban and Rural

    Advance both semantic segmentation and domain adaptation tasks

    Three considerable challenges: multi-scale objects, complex background samples, and inconsistent class distributions

    Reference:

    @inproceedings{wang2021loveda,
      title={Love{DA}: A Remote Sensing Land-Cover Dataset for Domain Adaptive Semantic Segmentation},
      author={Junjue Wang and Zhuo Zheng and Ailong Ma and Xiaoyan Lu and Yanfei Zhong},
      booktitle={Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks},
      editor={J. Vanschoren and S. Yeung},
      year={2021},
      volume={1},
      url={https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/file/4e732ced3463d06de0ca9a15b6153677-Paper-round2.pdf}
    }

    License:

    The owners of the data and of the copyright on the data are RSIDEA, Wuhan University. Use of the Google Earth images must respect the "Google Earth" terms of use. All images and their associated annotations in LoveDA can be used for academic purposes only, but any commercial use is prohibited. (CC BY-NC-SA 4.0)

  18. NoRA-1.1

    • huggingface.co
    Updated Oct 20, 2025
    Cite
    Anirban Das (2025). NoRA-1.1 [Dataset]. https://huggingface.co/datasets/axd353/NoRA-1.1
    Explore at:
    Dataset updated
    Oct 20, 2025
    Authors
    Anirban Das
    License

    Attribution-NonCommercial 2.0 (CC BY-NC 2.0), https://creativecommons.org/licenses/by-nc/2.0/
    License information was derived automatically

    Description

    NoRA-1.1 Dataset

    License: Creative Commons Attribution–NonCommercial 2.0 (CC BY-NC 2.0). This dataset is part of the When No Paths Lead to Rome benchmark for systematic neural relational reasoning, accepted to the NeurIPS 2025 Datasets and Benchmarks Track.
    It includes one training split and three test splits (test_d_na, test_bl_na, and test_opec_na).
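
    A minimal sketch of loading the splits named above with the Hugging Face datasets library (the training split name is assumed to be "train"):

    from datasets import load_dataset

    ds = load_dataset("axd353/NoRA-1.1")

    print(ds)                  # expected splits: train, test_d_na, test_bl_na, test_opec_na
    print(ds["test_d_na"][0])  # first example of the test_d_na split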

  19. LITHOS-DATASET

    • kaggle.com
    zip
    Updated May 14, 2025
    Cite
    Paola Ruiz Puentes (2025). LITHOS-DATASET [Dataset]. https://www.kaggle.com/datasets/paolaruizpuentes/lithos-dataset
    Explore at:
    Available download formats: zip (20029690253 bytes)
    Dataset updated
    May 14, 2025
    Authors
    Paola Ruiz Puentes
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Companion dataset to the paper “Towards Automated Petrography,” accepted to the NeurIPS 2025 Datasets and Benchmarks track.

    LITHOS is the largest and most diverse publicly available experimental framework for automated petrography. It includes 211,604 high-resolution RGB patches of polarized light imagery and 105,802 expert-annotated grains across 25 mineral categories. Each annotation includes the mineral class, spatial coordinates, and expert-measured major and minor axes, capturing grain geometry and orientation.

  20. FAD: A Chinese Dataset for Fake Audio Detection

    • dataon.kisti.re.kr
    • data.niaid.nih.gov
    • +1more
    Updated Jun 9, 2022
    Cite
    Haoxin Ma;Jiangyan Yi (2022). FAD: A Chinese Dataset for Fake Audio Detection [Dataset]. https://dataon.kisti.re.kr/search/view.do?mode=view&svcId=de34c2d5f0649d30185d71299b5ef977
    Explore at:
    Dataset updated
    Jun 9, 2022
    Authors
    Haoxin Ma;Jiangyan Yi
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Fake audio detection is a growing concern, and several relevant datasets have been designed for research. However, there is no standard public Chinese dataset under additive-noise conditions. In this paper, we aim to fill this gap and design a Chinese fake audio detection dataset (FAD) for studying more generalized detection methods. Twelve mainstream speech generation techniques are used to generate fake audio. To simulate real-life scenarios, three noise datasets are selected for noise addition at five different signal-to-noise ratios. The FAD dataset can be used not only for fake audio detection but also for detecting the algorithms behind fake utterances for audio forensics. Baseline results are presented with analysis. The results show that generalized fake audio detection remains challenging. The FAD dataset is publicly available. The source code of the baselines is available on GitHub: https://github.com/ADDchallenge/FAD

    The FAD dataset is designed to evaluate methods for fake audio detection, fake algorithm recognition, and other relevant studies. To better study the robustness of the methods under the noisy conditions encountered in real life, we construct a corresponding noisy dataset. The full FAD dataset consists of two versions: a clean version and a noisy version. Both versions are divided into disjoint training, development, and test sets in the same way, with no speaker overlap across the three subsets. Each test set is further divided into seen and unseen test sets. The unseen test sets evaluate the generalization of the methods to unknown types; both the real and fake audio in the unseen test set are unknown to the model. For the noisy speech part, we select three noise databases for simulation. Additive noises are added to each audio in the clean dataset at 5 different SNRs. The additive noises of the unseen test set and of the remaining subsets come from different noise databases. In each version of the FAD dataset, there are 138,400 utterances in the training set, 14,400 in the development set, 42,000 in the seen test set, and 21,000 in the unseen test set. More detailed statistics are given in Table 2.

    Clean Real Audios Collection: To eliminate the interference of irrelevant factors, we collect clean real audio from two sources: 5 open resources from the OpenSLR platform (http://www.openslr.org/12/) and one self-recorded dataset.

    Clean Fake Audios Generation: We select 11 representative speech synthesis methods to generate fake audio, plus one method producing partially fake audio.

    Noisy Audios Simulation: Noisy audio quantifies the robustness of the methods under noisy conditions. To simulate real-life scenarios, we sample noise signals and add them to the clean audio at 5 different SNRs: 0 dB, 5 dB, 10 dB, 15 dB, and 20 dB. Additive noises are selected from three noise databases: PNL 100 Nonspeech Sounds, NOISEX-92, and TAU Urban Acoustic Scenes.

    This dataset is licensed under a CC BY-NC-ND 4.0 license.
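
    A minimal sketch of the additive-noise simulation described above: scale a noise signal so the mix reaches a target SNR, using SNR_dB = 10 * log10(P_signal / P_noise). This is a generic illustration, not the authors' exact pipeline:

    import numpy as np

    def add_noise(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
        """Mix `noise` into `clean` at a target signal-to-noise ratio in dB."""
        noise = noise[: len(clean)]              # assume the noise clip is long enough
        p_signal = np.mean(clean ** 2)
        p_noise = np.mean(noise ** 2)
        scale = np.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
        return clean + scale * noise

    rng = np.random.default_rng(0)
    clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s of a 440 Hz tone
    noisy = add_noise(clean, rng.normal(size=16000), snr_db=5)  # one of the five SNRs
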
    You can cite the data using the following BibTeX entry:
    @inproceedings{ma2022fad,
      title={FAD: A Chinese Dataset for Fake Audio Detection},
      author={Haoxin Ma and Jiangyan Yi and Chenglong Wang and Xunrui Yan and Jianhua Tao and Tao Wang and Shiming Wang and Le Xu and Ruibo Fu},
      booktitle={Submitted to the 36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks},
      year={2022},
    }
