34 datasets found
  1. NeurIPS 2021 dataset

    • figshare.com
    hdf
    Updated Jul 28, 2024
    Cite
    Luke Zappia (2024). NeurIPS 2021 dataset [Dataset]. http://doi.org/10.6084/m9.figshare.25958374.v1
    Explore at:
    Available download formats: hdf
    Dataset updated
    Jul 28, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Luke Zappia
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    NeurIPS 2021 dataset used for benchmarking feature selection for integration, in H5AD format. Files contain the full raw dataset, the processed batches used to create the reference, and the processed batches used as a query.

    Note: These files have been saved with compression to reduce file size. Re-saving without compression will reduce reading times if needed.

    If used, please cite:

    Lance C, Luecken MD, Burkhardt DB, Cannoodt R, Rautenstrauch P, Laddach A, et al. Multimodal single cell data integration challenge: Results and lessons learned. In: Kiela D, Ciccone M, Caputo B, editors. Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track. PMLR; 06--14 Dec 2022. p. 162–76. Available from: https://proceedings.mlr.press/v176/lance22a.html

    AND

    Luecken MD, Burkhardt DB, Cannoodt R, Lance C, Agrawal A, Aliee H, et al. A sandbox for prediction and integration of DNA, RNA, and proteins in single cells. Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2). 2022 [cited 2022 Nov 8]. Available from: https://openreview.net/pdf?id=gN35BGa1Rt
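    The re-saving tip above can be sketched with anndata (an assumed dependency; the file names are placeholders, and `compression=None` is the library's uncompressed default):

```python
def resave_uncompressed(src: str, dst: str) -> None:
    """Read a compressed .h5ad file and write it back without HDF5
    compression, trading disk space for faster subsequent reads."""
    import anndata as ad  # assumed dependency: pip install anndata
    adata = ad.read_h5ad(src)
    adata.write_h5ad(dst, compression=None)  # None = no compression

# e.g. resave_uncompressed("neurips2021_raw.h5ad", "neurips2021_raw_fast.h5ad")
```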

  2. ConViS-Bench

    • huggingface.co
    Updated Sep 20, 2025
    Cite
    submission1335 (2025). ConViS-Bench [Dataset]. https://huggingface.co/datasets/submission1335/ConViS-Bench
    Explore at:
    Dataset updated
    Sep 20, 2025
    Authors
    submission1335
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This dataset is associated with submission 1335 at the NeurIPS 2025 Datasets and Benchmarks Track. The benchmark is intended to be used with the proposed submission environments (see the source code). See the provided README for information about downloading the dataset and running the evaluations.

  3. neurips-2025-papers

    • huggingface.co
    Updated Nov 19, 2025
    Cite
    Huy Dang (2025). neurips-2025-papers [Dataset]. https://huggingface.co/datasets/huyxdang/neurips-2025-papers
    Explore at:
    Dataset updated
    Nov 19, 2025
    Authors
    Huy Dang
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    NeurIPS 2025 Papers Dataset

    This dataset contains all accepted papers from NeurIPS 2025, scraped from OpenReview.

      Dataset Statistics

      Overview

    • Total Papers: 5,772
    • Unique Paper IDs: 5,772 ✅ No duplicate IDs

      Track Distribution
    

    • Main Track: 5,275 papers (91.4%)
    • Datasets and Benchmarks Track: 497 papers (8.6%)

      Award Distribution
    

    • Poster: 4,949 papers (85.7%)
    • Oral: 84 papers (1.5%)
    • Spotlight: 739 papers (12.8%)

      Track × Award… See the full description on the dataset page: https://huggingface.co/datasets/huyxdang/neurips-2025-papers.
    
  4. Data from: Re-assembling the past: The RePAIR dataset and benchmark for real world 2D and 3D puzzle solving

    • zenodo.org
    • data.niaid.nih.gov
    • +1 more
    txt, zip
    Updated Nov 4, 2024
    Cite
    Theodore Tsesmelis; Luca Palmieri; Marina Khoroshiltseva; Adeela Islam; Gur Elkin; Ofir Shahar Itzhak; Gianluca Scarpellini; Stefano Fiorini; Yaniv Ohayon; Nadav Alali; Sinem Aslan; Pietro Morerio; Sebastiano Vascon; Elena Gravina; Maria Christina Napolitano; Giuseppe Scarpati; Gabriel Zuchtriegel; Alexandra Spühler; Michel E. Fuchs; Stuart James; Ohad Ben-Shahar; Marcello Pelillo; Alessio Del Bue (2024). Re-assembling the past: The RePAIR dataset and benchmark for real world 2D and 3D puzzle solving [Dataset]. http://doi.org/10.5281/zenodo.13993089
    Explore at:
    Available download formats: zip, txt
    Dataset updated
    Nov 4, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Theodore Tsesmelis; Luca Palmieri; Marina Khoroshiltseva; Adeela Islam; Gur Elkin; Ofir Shahar Itzhak; Gianluca Scarpellini; Stefano Fiorini; Yaniv Ohayon; Nadav Alali; Sinem Aslan; Pietro Morerio; Sebastiano Vascon; Elena Gravina; Maria Christina Napolitano; Giuseppe Scarpati; Gabriel Zuchtriegel; Alexandra Spühler; Michel E. Fuchs; Stuart James; Ohad Ben-Shahar; Marcello Pelillo; Alessio Del Bue
    Description

    Accepted by NeurIPS 2024 Datasets and Benchmarks Track

    We introduce the RePAIR puzzle-solving dataset, a large-scale real-world dataset of fractured frescoes from the archaeological site of Pompeii. Our dataset consists of over 1000 fractured frescoes. RePAIR stands as a realistic computational challenge for 2D and 3D puzzle-solving methods, and serves as a benchmark that enables the study of fractured object reassembly and presents new challenges for geometric shape understanding. Please visit our website for more dataset information, access to source code scripts, and an interactive gallery of the dataset samples.

    Access the entire dataset

    We provide a compressed version of our dataset in two separate files: one for the 2D version and one for the 3D version.

    Our full dataset contains over one thousand individual fractured fragments, divided into groups, each with its own folder, and packaged separately for the 2D and 3D subsets. In the 2D dataset, each fragment is saved as a .PNG image, and each group has the corresponding ground-truth transformation to solve the puzzle as a .TXT file. In the 3D dataset, each fragment is saved as a mesh in the widely used .OBJ format with the corresponding material (.MTL) and texture (.PNG) files. The meshes are already in the assembled position and orientation, so no additional information is needed. All additional metadata are given as .JSON files.

    Important Note

    Please be advised that downloading and reusing this dataset is permitted only upon acceptance of the following license terms.

    The Istituto Italiano di Tecnologia (IIT) declares, and the user (“User”) acknowledges, that the "RePAIR puzzle-solving dataset" contains 3D scans, texture maps, rendered images and meta-data of fresco fragments acquired at the Archaeological Site of Pompeii. IIT is authorised to publish the RePAIR puzzle-solving dataset herein only for scientific and cultural purposes and in connection with an academic publication referenced as Tsesmelis et al., "Re-assembling the past: The RePAIR dataset and benchmark for real world 2D and 3D puzzle solving", NeurIPS 2024. Use of the RePAIR puzzle-solving dataset by User is limited to downloading and viewing such images, and comparing these with data or content in other datasets. User is not authorised to use, reproduce, copy, or distribute the RePAIR puzzle-solving dataset, explicitly excluding in particular any commercial use or use in conjunction with the promotion of a commercial enterprise and/or its product(s) or service(s). User will not use the RePAIR puzzle-solving dataset in any way prohibited by applicable laws. The RePAIR puzzle-solving dataset is being provided to User without warranty of any kind, either expressed or implied. User will be solely responsible for their use of such RePAIR puzzle-solving dataset. In no event shall IIT be liable for any damages arising from such use.

  5. Human bone marrow mononuclear cells

    • figshare.com
    hdf
    Updated Jan 29, 2025
    Cite
    Marius Lange (2025). Human bone marrow mononuclear cells [Dataset]. http://doi.org/10.6084/m9.figshare.28302875.v1
    Explore at:
    Available download formats: hdf
    Dataset updated
    Jan 29, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Marius Lange
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The gene expression portion of the NeurIPS 2021 challenge 10x multiome dataset (Luecken et al., NeurIPS datasets and benchmarks track 2021), originally obtained from GEO. Contains single-cell gene expression of 69,249 cells for 13,431 genes. The adata.X field contains normalized data and adata.layers['counts'] contains raw expression values. We computed a latent space using scANVI (Xu et al., MSB 2021), following their tutorial.

  6. MedSG-Bench

    • huggingface.co
    Updated Sep 28, 2025
    Cite
    MedSG-Bench (2025). MedSG-Bench [Dataset]. https://huggingface.co/datasets/MedSG-Bench/MedSG-Bench
    Explore at:
    Dataset updated
    Sep 28, 2025
    Authors
    MedSG-Bench
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    🖥 MedSG-Bench: A Benchmark for Medical Image Sequences Grounding

    📖 Paper | 💻 Code | 🤗 Dataset

    🔥 MedSG-Bench is accepted at NeurIPS 2025 Datasets and Benchmarks Track as a Spotlight.

      MedSG-Bench
    

    MedSG-Bench is the first benchmark for medical image sequences grounding. 👉 We also provide MedSG-188K, a grounding instruction-tuning dataset. 👉 MedSeq-Grounder, the model trained on MedSG-188K, is available here.

      Metadata
    

    This dataset… See the full description on the dataset page: https://huggingface.co/datasets/MedSG-Bench/MedSG-Bench.

  7. Breaking Bad Dataset

    • kaggle.com
    zip
    Updated Feb 16, 2023
    Cite
    Dazitu616 (2023). Breaking Bad Dataset [Dataset]. https://www.kaggle.com/datasets/dazitu616/breaking-bad-dataset/code
    Explore at:
    Available download formats: zip (1150713325 bytes)
    Dataset updated
    Feb 16, 2023
    Authors
    Dazitu616
    Description

    Dataset accompanying the NeurIPS 2022 Datasets and Benchmarks Track paper: Breaking Bad: A Dataset for Geometric Fracture and Reassembly. Please refer to our project page for more details.

    License: The Breaking Bad dataset collects 3D meshes from ShapeNet and Thingi10K, thus inheriting their terms of use. Please refer to ShapeNet and Thingi10K for more details. We release each model in our dataset with an as-permissive-as-possible license compatible with its underlying base model. Please refer to ShapeNet and Thingi10K for restrictions and depositor requirements of each model.

  8. TreeFinder

    • kaggle.com
    zip
    Updated Oct 24, 2025
    Cite
    Bibi 9 (2025). TreeFinder [Dataset]. https://www.kaggle.com/datasets/zhihaow/tree-finder
    Explore at:
    Available download formats: zip (1965923052 bytes)
    Dataset updated
    Oct 24, 2025
    Authors
    Bibi 9
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    🌲 TreeFinder: A US-Scale Benchmark Dataset for Individual Tree Mortality Monitoring Using High-Resolution Aerial Imagery

    Accepted to NeurIPS 2025 (Datasets & Benchmarks Track)

    TreeFinder is the first large-scale, high-resolution benchmark dataset for mapping individual dead trees across the contiguous United States (CONUS). Built to advance computer vision methods for ecological monitoring and carbon assessment, TreeFinder provides pixel-level annotations of dead trees from high-resolution aerial imagery, enriched with ecological metadata and paired with performance benchmarks.

    📦 What's in the Dataset?

    • 1,000 Sites across 48 U.S. States
      • Spatially diverse sampling of forested regions across CONUS
    • 23,000 Hectares of 0.6m NAIP Imagery
      • Aerial imagery from the National Agriculture Imagery Program (NAIP), including 4 channels (RGB + NIR)
    • 20,000+ Manually Annotated Dead Trees
      • Pixel-level masks created through expert labeling and validated via multi-temporal image comparison
    • ML-Ready Patches
      • Each raw scene is tiled into 224 × 224 patches for deep learning, with associated segmentation masks
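    A minimal sketch of the tiling step, assuming a channels-last (H, W, C) array; the scene size and helper below are illustrative, not the dataset's actual pipeline:

```python
import numpy as np

def tile_scene(scene: np.ndarray, patch: int = 224) -> np.ndarray:
    """Cut an (H, W, C) scene into non-overlapping (patch, patch, C) tiles,
    dropping partial tiles at the right/bottom edges."""
    h, w, c = scene.shape
    rows, cols = h // patch, w // patch
    trimmed = scene[: rows * patch, : cols * patch, :]
    tiles = trimmed.reshape(rows, patch, cols, patch, c).swapaxes(1, 2)
    return tiles.reshape(rows * cols, patch, patch, c)

# Hypothetical NAIP-like scene: 1000 x 1000 pixels, 4 channels (RGB + NIR)
scene = np.zeros((1000, 1000, 4), dtype=np.uint8)
patches = tile_scene(scene)
print(patches.shape)  # (16, 224, 224, 4)
```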

    🧠 Benchmark Models

    We provide benchmark performance results using five semantic segmentation models, including:

    • U-Net and DeepLabV3+ (CNN-based)
    • ViT, SegFormer, and Mask2Former (Transformer-based)
    • DOFA (a multimodal foundation model trained on satellite data)

    Each model is trained and evaluated across various domain generalization settings (e.g., region, climate, forest type) to test robustness.

    🗺️ Metadata & Scenarios

    Each patch is enriched with:

    • Geographic Coordinates
    • Köppen–Geiger Climate Zone
    • Primary Tree Type (from USDA Forest Service maps)

    These metadata enable benchmarking under challenging scenarios like:

    • Cross-region generalization (e.g., East → West)
    • Climate domain shifts
    • Forest type transfer

    🧪 Why Use TreeFinder?

    TreeFinder enables the development and evaluation of machine learning models for high-impact environmental tasks such as:

    • Forest health monitoring
    • Carbon flux modeling
    • Wildfire risk assessment

    It is designed to foster cross-disciplinary collaboration between the machine learning and Earth science communities by providing a reproducible, challenging, and ecologically grounded benchmark.

    📚 Citation

    If you use TreeFinder in your research, please cite the following paper:

    Zhihao Wang, Cooper Li, Ruichen Wang, Lei Ma, George Hurtt, Xiaowei Jia, Gengchen Mai, Zhili Li, Yiqun Xie.
    TreeFinder: A US-Scale Benchmark Dataset for Individual Tree Mortality Monitoring Using High-Resolution Aerial Imagery.
    In Proceedings of the 39th Conference on Neural Information Processing Systems (NeurIPS 2025), Datasets and Benchmarks Track, 2025.

  9. LagrangeBench Datasets

    • zenodo.org
    zip
    Updated Oct 19, 2023
    Cite
    Artur P. Toshev; Nikolaus A. Adams (2023). LagrangeBench Datasets [Dataset]. http://doi.org/10.5281/zenodo.10021926
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 19, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Artur P. Toshev; Nikolaus A. Adams
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Datasets from the Paper LagrangeBench: A Lagrangian Fluid Mechanics Benchmarking Suite at NeurIPS 2023 Track on Datasets and Benchmarks.

  10. Data from: Datasets for a data-centric image classification benchmark for noisy and ambiguous label estimation

    • zenodo.org
    • openagrar.de
    txt, zip
    Updated Jul 5, 2023
    Cite
    Lars Schmarje; Vasco Grossmann; Claudius Zelenka; Sabine Dippel; Rainer Kiko; Mariusz Oszust; Matti Pastell; Jenny Stracke; Anna Valros; Nina Volkmann; Reinhard Koch (2023). Datasets for a data-centric image classification benchmark for noisy and ambiguous label estimation [Dataset]. http://doi.org/10.5281/zenodo.7180818
    Explore at:
    Available download formats: zip, txt
    Dataset updated
    Jul 5, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Lars Schmarje; Vasco Grossmann; Claudius Zelenka; Sabine Dippel; Rainer Kiko; Mariusz Oszust; Matti Pastell; Jenny Stracke; Anna Valros; Nina Volkmann; Reinhard Koch
    Description

    This is the official data repository of the Data-Centric Image Classification (DCIC) Benchmark. The goal of this benchmark is to measure the impact of tuning the dataset instead of the model for a variety of image classification datasets. Full details about the collection process, the structure, and automatic download are available at:

    Paper: https://arxiv.org/abs/2207.06214

    Source Code: https://github.com/Emprime/dcic

    The license information is given below as download.

    Citation

    Please cite as

    @article{schmarje2022benchmark,
      author = {Schmarje, Lars and Grossmann, Vasco and Zelenka, Claudius and Dippel, Sabine and Kiko, Rainer and Oszust, Mariusz and Pastell, Matti and Stracke, Jenny and Valros, Anna and Volkmann, Nina and Koch, Reinhard},
      journal = {36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks},
      title = {{Is one annotation enough? A data-centric image classification benchmark for noisy and ambiguous label estimation}},
      year = {2022}
    }

    Please see the full details about the used datasets below, which should also be cited as part of the license.

    @article{schoening2020Megafauna,
    author = {Schoening, T and Purser, A and Langenk{\"{a}}mper, D and Suck, I and Taylor, J and Cuvelier, D and Lins, L and Simon-Lled{\'{o}}, E and Marcon, Y and Jones, D O B and Nattkemper, T and K{\"{o}}ser, K and Zurowietz, M and Greinert, J and Gomes-Pereira, J},
    doi = {10.5194/bg-17-3115-2020},
    journal = {Biogeosciences},
    number = {12},
    pages = {3115--3133},
    title = {{Megafauna community assessment of polymetallic-nodule fields with cameras: platform and methodology comparison}},
    volume = {17},
    year = {2020}
    }
    
    @article{Langenkamper2020GearStudy,
    author = {Langenk{\"{a}}mper, Daniel and van Kevelaer, Robin and Purser, Autun and Nattkemper, Tim W},
    doi = {10.3389/fmars.2020.00506},
    issn = {2296-7745},
    journal = {Frontiers in Marine Science},
    title = {{Gear-Induced Concept Drift in Marine Images and Its Effect on Deep Learning Classification}},
    volume = {7},
    year = {2020}
    }
    
    
    @article{peterson2019cifar10h,
    author = {Peterson, Joshua and Battleday, Ruairidh and Griffiths, Thomas and Russakovsky, Olga},
    doi = {10.1109/ICCV.2019.00971},
    issn = {15505499},
    journal = {Proceedings of the IEEE International Conference on Computer Vision},
    pages = {9616--9625},
    title = {{Human uncertainty makes classification more robust}},
    volume = {2019-Octob},
    year = {2019}
    }
    
    @article{schmarje2019,
    author = {Schmarje, Lars and Zelenka, Claudius and Geisen, Ulf and Gl{\"{u}}er, Claus-C. and Koch, Reinhard},
    doi = {10.1007/978-3-030-33676-9_26},
    issn = {23318422},
    journal = {DAGM German Conference on Pattern Recognition},
    number = {November},
    pages = {374--386},
    publisher = {Springer},
    title = {{2D and 3D Segmentation of uncertain local collagen fiber orientations in SHG microscopy}},
    volume = {11824 LNCS},
    year = {2019}
    }
    
    @article{schmarje2021foc,
    author = {Schmarje, Lars and Br{\"{u}}nger, Johannes and Santarossa, Monty and Schr{\"{o}}der, Simon-Martin and Kiko, Rainer and Koch, Reinhard},
    doi = {10.3390/s21196661},
    issn = {1424-8220},
    journal = {Sensors},
    number = {19},
    pages = {6661},
    title = {{Fuzzy Overclustering: Semi-Supervised Classification of Fuzzy Labels with Overclustering and Inverse Cross-Entropy}},
    volume = {21},
    year = {2021}
    }
    
    @article{schmarje2022dc3,
    author = {Schmarje, Lars and Santarossa, Monty and Schr{\"{o}}der, Simon-Martin and Zelenka, Claudius and Kiko, Rainer and Stracke, Jenny and Volkmann, Nina and Koch, Reinhard},
    journal = {Proceedings of the European Conference on Computer Vision (ECCV)},
    title = {{A data-centric approach for improving ambiguous labels with combined semi-supervised classification and clustering}},
    year = {2022}
    }
    
    
    @article{obuchowicz2020qualityMRI,
    author = {Obuchowicz, Rafal and Oszust, Mariusz and Piorkowski, Adam},
    doi = {10.1186/s12880-020-00505-z},
    issn = {1471-2342},
    journal = {BMC Medical Imaging},
    number = {1},
    pages = {109},
    title = {{Interobserver variability in quality assessment of magnetic resonance images}},
    volume = {20},
    year = {2020}
    }
    
    
    @article{stepien2021cnnQuality,
    author = {St{\c{e}}pie{\'{n}}, Igor and Obuchowicz, Rafa{\l} and Pi{\'{o}}rkowski, Adam and Oszust, Mariusz},
    doi = {10.3390/s21041043},
    issn = {1424-8220},
    journal = {Sensors},
    number = {4},
    title = {{Fusion of Deep Convolutional Neural Networks for No-Reference Magnetic Resonance Image Quality Assessment}},
    volume = {21},
    year = {2021}
    }
    
    @article{volkmann2021turkeys,
    author = {Volkmann, Nina and Br{\"{u}}nger, Johannes and Stracke, Jenny and Zelenka, Claudius and Koch, Reinhard and Kemper, Nicole and Spindler, Birgit},
    doi = {10.3390/ani11092655},
    journal = {Animals 2021},
    pages = {1--13},
    title = {{Learn to train: Improving training data for a neural network to detect pecking injuries in turkeys}},
    volume = {11},
    year = {2021}
    }
    
    @article{volkmann2022keypoint,
    author = {Volkmann, Nina and Zelenka, Claudius and Devaraju, Archana Malavalli and Br{\"{u}}nger, Johannes and Stracke, Jenny and Spindler, Birgit and Kemper, Nicole and Koch, Reinhard},
    doi = {10.3390/s22145188},
    issn = {1424-8220},
    journal = {Sensors},
    number = {14},
    pages = {5188},
    title = {{Keypoint Detection for Injury Identification during Turkey Husbandry Using Neural Networks}},
    volume = {22},
    year = {2022}
    }

  11. construct-validity-review

    • huggingface.co
    Updated Jul 29, 2025
    Cite
    Andrew Bean (2025). construct-validity-review [Dataset]. https://huggingface.co/datasets/ambean/construct-validity-review
    Explore at:
    Dataset updated
    Jul 29, 2025
    Authors
    Andrew Bean
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset contains the results of the review paper Measuring What Matters, presented at the NeurIPS 2025 Datasets and Benchmarks Track.

  12. CREAK-Commonsense Reasoning over Entity Knowledge

    • kaggle.com
    zip
    Updated Jul 17, 2023
    Cite
    Haowen Wang (2023). CREAK-Commonsense Reasoning over Entity Knowledge [Dataset]. https://www.kaggle.com/datasets/hwwang98/creak-commonsense-reasoning-over-entity-knowledge
    Explore at:
    Available download formats: zip (973400 bytes)
    Dataset updated
    Jul 17, 2023
    Authors
    Haowen Wang
    Description

    CREAK: A Dataset for Commonsense Reasoning over Entity Knowledge

    This repository contains the data and code for the baseline described in the following paper:

    CREAK: A Dataset for Commonsense Reasoning over Entity Knowledge
    Yasumasa Onoe, Michael J.Q. Zhang, Eunsol Choi, Greg Durrett
    NeurIPS 2021 Datasets and Benchmarks Track

    @article{onoe2021creak,
      title={CREAK: A Dataset for Commonsense Reasoning over Entity Knowledge},
      author={Onoe, Yasumasa and Zhang, Michael J.Q. and Choi, Eunsol and Durrett, Greg},
      journal={OpenReview},
      year={2021}
    }

    ***** [New] November 8th, 2021: The contrast set has been updated. *****

    We have increased the size of the contrast set to 500 examples. Please check the paper for new numbers.

    Datasets

    Data Files

    CREAK data files are located under data/creak.

    • train.json contains 10,176 training examples.
    • dev.json contains 1,371 development examples.
    • test_without_labels.json contains 1,371 test examples (labels are not included).
    • contrast_set.json contains 500 contrastive examples.

    The data files are formatted as jsonlines. Here is a single training example:

    {
      'ex_id': 'train_1423',
      'sentence': 'Lauryn Hill separates two valleys as it is located between them.',
      'explanation': 'Lauren Hill is actually a person and not a mountain.',
      'label': 'false',
      'entity': 'Lauryn Hill',
      'en_wiki_pageid': '162864',
      'entity_mention_loc': [[0, 11]]
    }

    Field               Description
    ex_id               Example ID
    sentence            Claim
    explanation         Explanation by the annotator why the claim is TRUE/FALSE
    label               Label: 'true' or 'false'
    entity              Seed entity
    en_wiki_pageid      English Wikipedia Page ID for the seed entity
    entity_mention_loc  Location(s) of the seed entity in the claim
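    Because the files are jsonlines (one JSON object per line), each example can be read with the standard library; the inline record below mirrors the training example above (the on-disk files use double-quoted JSON, and `read_jsonl` is an illustrative helper, not part of the released code):

```python
import json

def read_jsonl(path):
    """Yield one CREAK example per non-empty line of a jsonlines file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)

# Inline demo of a single record
record = json.loads(
    '{"ex_id": "train_1423",'
    ' "sentence": "Lauryn Hill separates two valleys as it is located between them.",'
    ' "label": "false", "entity": "Lauryn Hill", "entity_mention_loc": [[0, 11]]}'
)
start, end = record["entity_mention_loc"][0]
print(record["sentence"][start:end])  # Lauryn Hill
```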

    Baselines

    See this README

    Leaderboards

    https://www.cs.utexas.edu/~yasumasa/creak/leaderboard.html

    We host results only for Closed-Book methods that have been finetuned on only In-Domain data.

    To submit your results, please send your system name and prediction files for the dev, test, and contrast sets to yasumasa@utexas.edu.

    Contact

    Please contact at yasumasa@utexas.edu if you have any questions.

  13. Data from: WikiDBs - A Large-Scale Corpus Of Relational Databases From Wikidata

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    • +1 more
    Updated Dec 12, 2024
    Cite
    Vogel, Liane; Bodensohn, Jan-Micha; Binnig, Carsten (2024). WikiDBs - A Large-Scale Corpus Of Relational Databases From Wikidata [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11559813
    Explore at:
    Dataset updated
    Dec 12, 2024
    Authors
    Vogel, Liane; Bodensohn, Jan-Micha; Binnig, Carsten
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    WikiDBs is an open-source corpus of 100,000 relational databases. We aim to support research on tabular representation learning on multi-table data. The corpus is based on Wikidata and aims to follow certain characteristics of real-world databases.

    WikiDBs was published as a spotlight paper in the Datasets and Benchmarks Track at NeurIPS 2024.

    WikiDBs contains the database schemas, as well as table contents. The database tables are provided as CSV files, and each database schema as JSON. The 100,000 databases are available in five splits, containing 20k databases each. In total, around 165 GB of disk space are needed for the full corpus. We also provide a script to convert the databases into SQLite.
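    Loading one database's CSV tables into SQLite can be sketched with the standard library alone (the table name and columns below are illustrative, not taken from an actual WikiDBs database; the corpus ships its own conversion script):

```python
import csv
import io
import sqlite3

def load_csv_table(conn, table, csv_file):
    """Create `table` from a CSV file object (header row first) and insert its rows."""
    reader = csv.reader(csv_file)
    header = next(reader)
    cols = ", ".join(f'"{c}"' for c in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    marks = ", ".join("?" for _ in header)
    rows = list(reader)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', rows)
    return len(rows)

# Illustrative stand-in for one table file of one database
demo = io.StringIO("city,country,population\nBerlin,Germany,3600000\nParis,France,2100000\n")
conn = sqlite3.connect(":memory:")
n = load_csv_table(conn, "cities", demo)
print(n)  # 2
```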

  14. mmlongbench-doc-results

    • huggingface.co
    Cite
    IXCLab@Shanghai AI Lab, mmlongbench-doc-results [Dataset]. https://huggingface.co/datasets/OpenIXCLab/mmlongbench-doc-results
    Explore at:
    Dataset authored and provided by
    IXCLab@Shanghai AI Lab
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    📊 MMLongBench-Doc Evaluation Results

    Official evaluation results: GPT-4.1 (2025-04-14) & GPT-4o (2024-11-20) 📄 Paper: MMLongBench-Doc, NeurIPS 2024 Datasets and Benchmarks Track (Spotlight)

  15. LITHOS-DATASET

    • kaggle.com
    zip
    Updated May 14, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Paola Ruiz Puentes (2025). LITHOS-DATASET [Dataset]. https://www.kaggle.com/datasets/paolaruizpuentes/lithos-dataset
    Explore at:
    Available download formats: zip (20029690253 bytes)
    Dataset updated
    May 14, 2025
    Authors
    Paola Ruiz Puentes
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Companion dataset to the paper “Towards Automated Petrography,” accepted to the NeurIPS 2025 Datasets and Benchmarks track.

    LITHOS is the largest and most diverse publicly available experimental framework for automated petrography. It includes 211,604 high-resolution RGB patches of polarized light and 105,802 expert-annotated grains across 25 mineral categories. Each annotation includes the mineral class, spatial coordinates, and expert-measured major and minor axes, capturing grain geometry and orientation.

  16. PDEBench Datasets

    • darus.uni-stuttgart.de
    • opendatalab.com
    Updated Feb 13, 2024
    Cite
    Makoto Takamoto; Timothy Praditia; Raphael Leiteritz; Dan MacKinlay; Francesco Alesiani; Dirk Pflüger; Mathias Niepert (2024). PDEBench Datasets [Dataset]. http://doi.org/10.18419/DARUS-2986
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 13, 2024
    Dataset provided by
    DaRUS
    Authors
    Makoto Takamoto; Timothy Praditia; Raphael Leiteritz; Dan MacKinlay; Francesco Alesiani; Dirk Pflüger; Mathias Niepert
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Dataset funded by
    DFG
    Description

    This dataset contains benchmark data generated with numerical simulations based on different PDEs, namely 1D advection, 1D Burgers', 1D and 2D diffusion-reaction, 1D diffusion-sorption, 1D, 2D, and 3D compressible Navier-Stokes, 2D Darcy flow, and the 2D shallow water equation. This dataset is intended to advance the scientific ML research area. In general, the data are stored in HDF5 format, with the array dimensions packed according to the convention [b, t, x1, ..., xd, v], where b is the batch size (i.e. number of samples), t is the time dimension, x1, ..., xd are the spatial dimensions, and v is the number of channels (i.e. number of variables of interest). More detailed information is provided in our GitHub repository (https://github.com/pdebench/PDEBench) and in our paper submitted to the NeurIPS 2022 Datasets and Benchmarks Track.
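    Under the [b, t, x1, ..., xd, v] convention, for a 2D problem `data[i, j]` is the full spatial field of sample i at time step j. A toy numpy illustration (the shapes are invented for the example, not those of any actual PDEBench file):

```python
import numpy as np

# Toy array following the convention [b, t, x1, x2, v]:
# 8 samples, 11 time steps, a 16 x 16 spatial grid, 3 variables of interest
data = np.zeros((8, 11, 16, 16, 3))

field = data[0, -1]     # sample 0 at the final time step -> shape (16, 16, 3)
channel = data[..., 0]  # first variable everywhere       -> shape (8, 11, 16, 16)
print(field.shape, channel.shape)
```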

  17. OLIVES - VIP CUP 2023

    • kaggle.com
    zip
    Updated Aug 31, 2023
    Cite
    Salman Khondker (2023). OLIVES - VIP CUP 2023 [Dataset]. https://www.kaggle.com/datasets/salmankhondker/olives-vip-cup-2023
    Explore at:
    zip(34320657194 bytes)
    Available download formats
    Dataset updated
    Aug 31, 2023
    Authors
    Salman Khondker
    Description

    Original Dataset

    https://zenodo.org/records/7105232

    Citation

    @inproceedings{prabhushankarolives2022,
      title={OLIVES Dataset: Ophthalmic Labels for Investigating Visual Eye Semantics},
      author={Prabhushankar, Mohit and Kokilepersaud, Kiran and Logan, Yash-yee and Trejo Corona, Stephanie and AlRegib, Ghassan and Wykoff, Charles},
      booktitle={Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 2 (NeurIPS Datasets and Benchmarks 2022)},
      year={2022}
    }
  18. PartNeXt_raw

    • huggingface.co
    Updated Oct 20, 2025
    + more versions
    Cite
    Penghao Wang (2025). PartNeXt_raw [Dataset]. https://huggingface.co/datasets/AuWang/PartNeXt_raw
    Explore at:
    Dataset updated
    Oct 20, 2025
    Authors
    Penghao Wang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    [NeurIPS 2025 DB] PartNeXt: A Next-Generation Dataset for Fine-Grained and Hierarchical 3D Part Understanding

    Official dataset release for PartNeXt: A Next-Generation Dataset for Fine-Grained and Hierarchical 3D Part Understanding.

    Penghao Wang, Yiyang He, Xin Lv, Yukai Zhou, Lan Xu, Jingyi Yu, Jiayuan Gu† (ShanghaiTech University), NeurIPS 2025 Datasets and Benchmarks Track. Project Page | Paper | Dataset | Dataset Toolkit | Benchmark code (Soon) | Annotation code (Soon) |… See the full description on the dataset page: https://huggingface.co/datasets/AuWang/PartNeXt_raw.

  19. BuckTales: A multi-UAV dataset for multi-object tracking and re-identification of wild antelopes

    • edmond.mpg.de
    mp4, zip
    Updated Dec 19, 2024
    Cite
    Hemal Naik; Junran Yang; Dipin Das; Margaret Crofoot; Akanksha Rathore; Vivek Hari Sridhar (2024). BuckTales: A multi-UAV dataset for multi-object tracking and re-identification of wild antelopes [Dataset]. http://doi.org/10.17617/3.JCZ9WK
    Explore at:
    zip(65010277544), mp4(403189785), zip(3287471192), zip(457749126), mp4(130172114), zip(17011998466)
    Available download formats
    Dataset updated
    Dec 19, 2024
    Dataset provided by
    Edmond
    Authors
    Hemal Naik; Junran Yang; Dipin Das; Margaret Crofoot; Akanksha Rathore; Vivek Hari Sridhar
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    The dataset contains UAV footage of wild antelopes (blackbucks) in grassland habitats. It can be used mainly for two tasks: multi-object tracking (MOT) and re-identification (Re-ID). We provide annotations for the position of every animal in each frame, which allows us to offer very long videos (up to 3 min) that are completely annotated while maintaining the identity of each animal throughout. The Re-ID part offers pairs of videos that capture the movement of some animals simultaneously from two different UAVs; the Re-ID task is to find the same individual in two videos taken at the same time from slightly different perspectives. The relevant paper will be published in the NeurIPS 2024 Datasets and Benchmarks Track: https://nips.cc/virtual/2024/poster/97563

    Resolution: 5.4K
    MOT: 12 videos (MOT17 format)
    Re-ID: 6 sets, each with a pair of drones (custom format)
    Detection: 320 images (COCO, YOLO)
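    Because the MOT annotations use the MOT17 format, the leading fields of a row can be parsed as sketched below. This is a minimal sketch: the example values are made up (not taken from BuckTales), and the trailing fields differ between detection and ground-truth files.

    ```python
    from dataclasses import dataclass

    @dataclass
    class MotBox:
        frame: int
        track_id: int
        left: float
        top: float
        width: float
        height: float

    def parse_mot_line(line: str) -> MotBox:
        """Parse the leading fields shared by MOT17-style rows:
        frame, id, bb_left, bb_top, bb_width, bb_height, ..."""
        f, i, l, t, w, h = line.strip().split(",")[:6]
        return MotBox(int(f), int(i), float(l), float(t), float(w), float(h))

    # Illustrative row; the numbers are invented for demonstration.
    box = parse_mot_line("1,7,912.0,484.0,97.0,109.0,1,-1,-1,-1")
    print(box.frame, box.track_id, box.width)  # 1 7 97.0
    ```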

  20. FAD: A Chinese Dataset for Fake Audio Detection

    • zenodo.org
    • dataon.kisti.re.kr
    • +1more
    bin, zip
    Updated Jul 9, 2023
    Cite
    Haoxin Ma; Jiangyan Yi; Haoxin Ma; Jiangyan Yi (2023). FAD: A Chinese Dataset for Fake Audio Detection [Dataset]. http://doi.org/10.5281/zenodo.6641573
    Explore at:
    bin, zip
    Available download formats
    Dataset updated
    Jul 9, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Haoxin Ma; Jiangyan Yi; Haoxin Ma; Jiangyan Yi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Fake audio detection is a growing concern, and several relevant datasets have been designed for research, but there is no standard public Chinese dataset under additive-noise conditions. In this paper, we aim to fill that gap and design a Chinese fake audio detection dataset (FAD) for studying more generalized detection methods. Twelve mainstream speech generation techniques are used to generate fake audio. To simulate real-life scenarios, three noise datasets are selected for noise adding at five different signal-to-noise ratios. The FAD dataset can be used not only for fake audio detection but also for recognizing the algorithms behind fake utterances for audio forensics. Baseline results are presented with analysis; they show that generalizable fake audio detection remains challenging. The FAD dataset is publicly available. The source code of the baselines is available on GitHub: https://github.com/ADDchallenge/FAD


    The FAD dataset is designed to evaluate methods for fake audio detection, fake-algorithm recognition, and other relevant studies. To better study the robustness of these methods under the noisy conditions of real-life use, we construct a corresponding noisy dataset. The full FAD dataset therefore comes in two versions, clean and noisy, both divided in the same way into disjoint training, development, and test sets with no speaker overlap across the three subsets. Each test set is further divided into seen and unseen parts; the unseen test sets evaluate the generalization of methods to unknown types. It is worth mentioning that both the real and the fake audio in the unseen test set are unknown to the model. For the noisy speech part, we select three noise databases for simulation, and additive noise is added to each audio clip in the clean dataset at 5 different SNRs. The additive noise for the unseen test set and for the remaining subsets comes from different noise databases. In each version of the FAD dataset, there are 138,400 utterances in the training set, 14,400 in the development set, 42,000 in the seen test set, and 21,000 in the unseen test set. More detailed statistics are given in Table 2.

    Clean Real Audios Collection
    To eliminate interference from irrelevant factors, we collect clean real audio from two sources: 5 open resources from the OpenSLR platform (http://www.openslr.org/12/) and one self-recorded dataset.

    Clean Fake Audios Generation
    We select 11 representative speech synthesis methods to generate fully fake audio, plus one method that produces partially fake audio.

    Noisy Audios Simulation
    Noisy audio quantifies the robustness of detection methods under noisy conditions. To simulate real-life scenarios, we sample noise signals and add them to the clean audio at 5 different SNRs: 0 dB, 5 dB, 10 dB, 15 dB, and 20 dB. Additive noise is selected from three noise databases: PNL 100 Nonspeech Sounds, NOISEX-92, and TAU Urban Acoustic Scenes.
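    Mixing additive noise at a target SNR can be sketched as follows. This is a generic illustration (the function name and toy signals are assumptions; the FAD pipeline itself may sample and normalize noise differently), using the standard relation SNR_dB = 10 * log10(P_clean / P_noise).

    ```python
    import numpy as np

    def add_noise_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
        """Scale `noise` so the clean/noise power ratio equals the target SNR
        in dB, then add it to `clean`."""
        p_clean = np.mean(clean ** 2)
        p_noise = np.mean(noise ** 2)
        scale = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
        return clean + scale * noise

    # Toy signals standing in for an utterance and a sampled noise segment.
    rng = np.random.default_rng(0)
    clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
    noisy = add_noise_at_snr(clean, rng.standard_normal(16000), snr_db=10.0)
    ```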

    This dataset is licensed under a CC BY-NC-ND 4.0 license.
    You can cite the data using the following BibTeX entry:
    @inproceedings{ma2022fad,
      title={FAD: A Chinese Dataset for Fake Audio Detection},
      author={Haoxin Ma and Jiangyan Yi and Chenglong Wang and Xinrui Yan and Jianhua Tao and Tao Wang and Shiming Wang and Le Xu and Ruibo Fu},
      booktitle={Submitted to the 36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks},
      year={2022},
    }
