CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This file includes our annotations of 238 dataset papers published at the NeurIPS Datasets and Benchmarks Track. A full report of our findings can be found at https://arxiv.org/abs/2411.00266
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Subset of the benchmark dataset published in Luecken et al. (2021).
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This dataset is associated with submission 1335 at the NeurIPS 2025 Datasets and Benchmarks Track. The benchmark is intended to be used with the proposed submission environments (see the source code). See the provided README for information about downloading the dataset and running the evaluations.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
RepLiQA - Repository of Likely Question-Answer for benchmarking
NeurIPS Datasets presentation
Dataset Summary
RepLiQA is an evaluation dataset that contains Context-Question-Answer triplets, where contexts are non-factual but natural-looking documents about made-up entities, such as people or places that do not exist in reality. RepLiQA is human-created and designed to test the ability of Large Language Models (LLMs) to find and use contextual information in provided… See the full description on the dataset page: https://huggingface.co/datasets/ServiceNow/repliqa.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The gene expression portion of the NeurIPS 2021 challenge 10x multiome dataset (Luecken et al., NeurIPS datasets and benchmarks track 2021), originally obtained from GEO. Contains single-cell gene expression of 69,249 cells for 13,431 genes. The adata.X field contains normalized data and adata.layers['counts'] contains raw expression values. We computed a latent space using scANVI (Xu et al., MSB 2021), following their tutorial.
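For orientation, a minimal sketch of accessing the fields described above with the anndata library; the filename here is hypothetical (use the actual download):

import anndata as ad

# Load the AnnData object (filename is hypothetical)
adata = ad.read_h5ad("neurips2021_multiome_gex.h5ad")

print(adata.shape)               # expected: (69249, 13431), cells x genes
normalized = adata.X             # normalized expression values
counts = adata.layers["counts"]  # raw expression counts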
This is the official data repository of the Data-Centric Image Classification (DCIC) Benchmark. The goal of this benchmark is to measure the impact of tuning the dataset instead of the model for a variety of image classification datasets. Full details about the collection process, the structure, and automatic download are available at:
Paper: https://arxiv.org/abs/2207.06214
Source Code: https://github.com/Emprime/dcic
The license information is given below as part of the download.
Citation
Please cite as
@article{schmarje2022benchmark,
author = {Schmarje, Lars and Grossmann, Vasco and Zelenka, Claudius and Dippel, Sabine and Kiko, Rainer and Oszust, Mariusz and Pastell, Matti and Stracke, Jenny and Valros, Anna and Volkmann, Nina and Koch, Reinhard},
journal = {36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks},
title = {{Is one annotation enough? A data-centric image classification benchmark for noisy and ambiguous label estimation}},
year = {2022}
}
Please see the full details about the datasets used below; they should also be cited as part of the license.
@article{schoening2020Megafauna,
author = {Schoening, T and Purser, A and Langenk{\"{a}}mper, D and Suck, I and Taylor, J and Cuvelier, D and Lins, L and Simon-Lled{\'{o}}, E and Marcon, Y and Jones, D O B and Nattkemper, T and K{\"{o}}ser, K and Zurowietz, M and Greinert, J and Gomes-Pereira, J},
doi = {10.5194/bg-17-3115-2020},
journal = {Biogeosciences},
number = {12},
pages = {3115--3133},
title = {{Megafauna community assessment of polymetallic-nodule fields with cameras: platform and methodology comparison}},
volume = {17},
year = {2020}
}
@article{Langenkamper2020GearStudy,
author = {Langenk{\"{a}}mper, Daniel and van Kevelaer, Robin and Purser, Autun and Nattkemper, Tim W},
doi = {10.3389/fmars.2020.00506},
issn = {2296-7745},
journal = {Frontiers in Marine Science},
title = {{Gear-Induced Concept Drift in Marine Images and Its Effect on Deep Learning Classification}},
volume = {7},
year = {2020}
}
@article{peterson2019cifar10h,
author = {Peterson, Joshua and Battleday, Ruairidh and Griffiths, Thomas and Russakovsky, Olga},
doi = {10.1109/ICCV.2019.00971},
issn = {15505499},
journal = {Proceedings of the IEEE International Conference on Computer Vision},
pages = {9616--9625},
title = {{Human uncertainty makes classification more robust}},
volume = {2019-October},
year = {2019}
}
@article{schmarje2019,
author = {Schmarje, Lars and Zelenka, Claudius and Geisen, Ulf and Gl{\"{u}}er, Claus-C. and Koch, Reinhard},
doi = {10.1007/978-3-030-33676-9_26},
issn = {23318422},
journal = {DAGM German Conference on Pattern Recognition},
number = {November},
pages = {374--386},
publisher = {Springer},
title = {{2D and 3D Segmentation of uncertain local collagen fiber orientations in SHG microscopy}},
volume = {11824 LNCS},
year = {2019}
}
@article{schmarje2021foc,
author = {Schmarje, Lars and Br{\"{u}}nger, Johannes and Santarossa, Monty and Schr{\"{o}}der, Simon-Martin and Kiko, Rainer and Koch, Reinhard},
doi = {10.3390/s21196661},
issn = {1424-8220},
journal = {Sensors},
number = {19},
pages = {6661},
title = {{Fuzzy Overclustering: Semi-Supervised Classification of Fuzzy Labels with Overclustering and Inverse Cross-Entropy}},
volume = {21},
year = {2021}
}
@article{schmarje2022dc3,
author = {Schmarje, Lars and Santarossa, Monty and Schr{\"{o}}der, Simon-Martin and Zelenka, Claudius and Kiko, Rainer and Stracke, Jenny and Volkmann, Nina and Koch, Reinhard},
journal = {Proceedings of the European Conference on Computer Vision (ECCV)},
title = {{A data-centric approach for improving ambiguous labels with combined semi-supervised classification and clustering}},
year = {2022}
}
@article{obuchowicz2020qualityMRI,
author = {Obuchowicz, Rafal and Oszust, Mariusz and Piorkowski, Adam},
doi = {10.1186/s12880-020-00505-z},
issn = {1471-2342},
journal = {BMC Medical Imaging},
number = {1},
pages = {109},
title = {{Interobserver variability in quality assessment of magnetic resonance images}},
volume = {20},
year = {2020}
}
@article{stepien2021cnnQuality,
author = {St{\c{e}}pie{\'{n}}, Igor and Obuchowicz, Rafa{\l} and Pi{\'{o}}rkowski, Adam and Oszust, Mariusz},
doi = {10.3390/s21041043},
issn = {1424-8220},
journal = {Sensors},
number = {4},
title = {{Fusion of Deep Convolutional Neural Networks for No-Reference Magnetic Resonance Image Quality Assessment}},
volume = {21},
year = {2021}
}
@article{volkmann2021turkeys,
author = {Volkmann, Nina and Br{\"{u}}nger, Johannes and Stracke, Jenny and Zelenka, Claudius and Koch, Reinhard and Kemper, Nicole and Spindler, Birgit},
doi = {10.3390/ani11092655},
journal = {Animals},
pages = {1--13},
title = {{Learn to train: Improving training data for a neural network to detect pecking injuries in turkeys}},
volume = {11},
year = {2021}
}
@article{volkmann2022keypoint,
author = {Volkmann, Nina and Zelenka, Claudius and Devaraju, Archana Malavalli and Br{\"{u}}nger, Johannes and Stracke, Jenny and Spindler, Birgit and Kemper, Nicole and Koch, Reinhard},
doi = {10.3390/s22145188},
issn = {1424-8220},
journal = {Sensors},
number = {14},
pages = {5188},
title = {{Keypoint Detection for Injury Identification during Turkey Husbandry Using Neural Networks}},
volume = {22},
year = {2022}
}
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This is the benchmark associated with submission 1331 at the NeurIPS 2025 Datasets and Benchmarks Track. The benchmark is intended to be used with the proposed submission environments (see the source code). The .jsonl files do not contain proper image paths but rather image path templates, as each .jsonl entry is a sample, and each sample corresponds to a different environment with its own images. See the submitted code README for information about dataset downloading and preprocessing, and to… See the full description on the dataset page: https://huggingface.co/datasets/submission1331/ProactiveBench.
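As a rough illustration of resolving such path templates (the field names below are hypothetical; the real schema is documented in the submission README):

import json

# Field names are hypothetical, for illustration only
with open("samples.jsonl") as f:
    for line in f:
        sample = json.loads(line)
        # Substitute the per-sample environment into the image path template
        image_path = sample["image_path_template"].format(env=sample["env_id"])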
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Dataset for "A Benchmark for Antimicrobial Peptide Recognition Based on Structure and Sequence Representation" at the NeurIPS 2025 Datasets and Benchmarks Track.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
NLID: A Large-Scale Neuromorphic Liquid Identification Dataset
NeurIPS 2025 Datasets and Benchmarks Track paper: Bubbles Talk: A Neuromorphic Dataset for Liquid Identification from Pouring Process.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The benchmark code is available at: https://github.com/Junjue-Wang/LoveDA
Reference:
@inproceedings{wang2021loveda,
title={Love{DA}: A Remote Sensing Land-Cover Dataset for Domain Adaptive Semantic Segmentation},
author={Junjue Wang and Zhuo Zheng and Ailong Ma and Xiaoyan Lu and Yanfei Zhong},
booktitle={Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks},
editor = {J. Vanschoren and S. Yeung},
year={2021},
volume = {1},
url={https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/file/4e732ced3463d06de0ca9a15b6153677-Paper-round2.pdf}
}
License:
The owners of the data and of the copyright on the data are RSIDEA, Wuhan University. Use of the Google Earth images must respect the "Google Earth" terms of use. All images and their associated annotations in LoveDA can be used for academic purposes only, but any commercial use is prohibited. (CC BY-NC-SA 4.0)
Despite the considerable progress in automatic abdominal multi-organ segmentation from CT/MRI scans in recent years, a comprehensive evaluation of the models' capabilities is hampered by the lack of a large-scale benchmark from diverse clinical scenarios. Constrained by the high cost of collecting and labeling 3D medical data, most deep learning models to date are driven by datasets with a limited number of organs of interest or samples, which still limits the power of modern deep models and makes it difficult to provide a fully comprehensive and fair estimate of various methods. To mitigate these limitations, we present AMOS, a large-scale, diverse, clinical dataset for abdominal organ segmentation. AMOS provides 500 CT and 100 MRI scans collected from multi-center, multi-vendor, multi-modality, multi-phase, multi-disease patients, each with voxel-level annotations of 15 abdominal organs, providing challenging examples and a test-bed for studying robust segmentation algorithms under diverse targets and scenarios. We further benchmark several state-of-the-art medical segmentation models to evaluate the status of existing methods on this new, challenging dataset. We have made our datasets, benchmark servers, and baselines publicly available, and hope to inspire future research. The paper can be found at https://arxiv.org/pdf/2206.08023.pdf
In addition to the 600 labeled CT and MRI scans, we expect to provide 2000 CT and 1200 MRI scans without labels to support more learning tasks (semi-supervised, unsupervised, domain adaptation, ...). The links can be found at:
labeled data (500 CT + 100 MRI)
unlabeled data Part I (900 CT)
unlabeled data Part II (1100 CT) (currently 1000 CT; we will replenish to 1100 CT)
unlabeled data Part III (1200 MRI)
If you found this dataset useful for your research, please cite:
@inproceedings{NEURIPS2022_ee604e1b,
author = {Ji, Yuanfeng and Bai, Haotian and Ge, Chongjian and Yang, Jie and Zhu, Ye and Zhang, Ruimao and Li, Zhen and Zhang, Lingyan and Ma, Wanling and Wan, Xiang and Luo, Ping},
booktitle = {Advances in Neural Information Processing Systems},
editor = {S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh},
pages = {36722--36732},
publisher = {Curran Associates, Inc.},
title = {AMOS: A Large-Scale Abdominal Multi-Organ Benchmark for Versatile Medical Image Segmentation},
url = {https://proceedings.neurips.cc/paper_files/paper/2022/file/ee604e1bedbd069d9fc9328b7b9584be-Paper-Datasets_and_Benchmarks.pdf},
volume = {35},
year = {2022}
}
TMGBench: A Systematic Game Benchmark for Evaluating Strategic Reasoning Abilities of LLMs
This repository contains the code, data, and metadata for our NeurIPS 2025 Datasets and Benchmarks submission: TMGBench: A Systematic Game Benchmark for Evaluating Strategic Reasoning Abilities of LLMs. The benchmark evaluates the strategic reasoning of large language models using 2x2 matrix games with narrative contexts and theory-of-mind variations.
Directory Structure… See the full description on the dataset page: https://huggingface.co/datasets/pinkex/TMGBench.
The OCW dataset is for evaluating creative problem-solving tasks; it curates the problems and human performance results from the popular British quiz show Only Connect.
The OCW dataset contains 618 connecting wall puzzles and solutions in total from 15 seasons of the show. Each show episode has two walls.
The dataset has two tasks: Task 1 (Groupings) and Task 2 (Connections), which are identical to the quiz show's human participant tasks.
Task 1 (Groupings) is evaluated via six metrics: number of solved walls, number of correct groups (max. four per wall), Adjusted Mutual Information (AMI), Adjusted Rand Index (ARI), Fowlkes-Mallows Score (FMS), and Wasserstein Distance (WD), normalized to the (0, 1) range, between predicted and ground-truth labels.
Task 2 (Connections) is evaluated with three metrics: exact string matching, ROUGE-1 F1, and BERTScore F1.
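As a rough sketch of how the Task 1 metrics can be computed with standard libraries (this is not the official evaluation code, and the WD normalization used in the paper is not reproduced here):

from scipy.stats import wasserstein_distance
from sklearn.metrics import (adjusted_mutual_info_score,
                             adjusted_rand_score,
                             fowlkes_mallows_score)

def grouping_metrics(true_labels, pred_labels):
    # Clustering-style agreement between ground-truth and predicted groups
    return {
        "AMI": adjusted_mutual_info_score(true_labels, pred_labels),
        "ARI": adjusted_rand_score(true_labels, pred_labels),
        "FMS": fowlkes_mallows_score(true_labels, pred_labels),
        "WD": wasserstein_distance(true_labels, pred_labels),
    }

# Example: a wall of 16 clues in 4 groups of 4, with two clues swapped
truth = [0] * 4 + [1] * 4 + [2] * 4 + [3] * 4
pred = [0] * 4 + [1] * 3 + [2] + [2] * 3 + [1] + [3] * 4
print(grouping_metrics(truth, pred))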
Baseline results with pre-trained language models and with few-shot In-context Learning (ICL) with LLMs such as GPT-4 are available here:
"Large Language Models are Fixated by Red Herrings: Exploring Creative Problem Solving and Einstellung Effect using the Only Connect Wall Dataset" Saeid Alavi Naeini, Raeid Saqur, Mozhgan Saeidi, John Giorgi, Babak Taati. 2023 https://neurips.cc/virtual/2023/poster/73547
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Fake audio detection is a growing concern, and some relevant datasets have been designed for research. But there is no standard public Chinese dataset under additive-noise conditions. In this paper, we aim to fill in the gap and design a Chinese fake audio detection dataset (FAD) for studying more generalized detection methods. Twelve mainstream speech generation techniques are used to generate fake audio. To simulate real-life scenarios, three noise datasets are selected for noise adding at five different signal-to-noise ratios. The FAD dataset can be used not only for fake audio detection, but also for detecting the algorithms behind fake utterances for audio forensics. Baseline results are presented with analysis. The results show that fake audio detection methods with generalization remain challenging. The FAD dataset is publicly available. The source code of the baselines is available on GitHub: https://github.com/ADDchallenge/FAD
The FAD dataset is designed to evaluate methods for fake audio detection, fake algorithm recognition, and other relevant studies. To better study the robustness of the methods under noisy conditions when applied in real life, we construct a corresponding noisy dataset. The full FAD dataset consists of two versions: a clean version and a noisy version. Both versions are divided into disjoint training, development, and test sets in the same way. There is no speaker overlap across these three subsets. Each test set is further divided into seen and unseen test sets. The unseen test sets can evaluate the generalization of the methods to unknown types. It is worth mentioning that both the real audio and the fake audio in the unseen test set are unknown to the model.
For the noisy speech part, we select three noise databases for simulation. Additive noises are added to each audio in the clean dataset at 5 different SNRs. The additive noises of the unseen test set and the remaining subsets come from different noise databases. In each version of the FAD dataset, there are 138,400 utterances in the training set, 14,400 utterances in the development set, 42,000 utterances in the seen test set, and 21,000 utterances in the unseen test set. More detailed statistics are given in Table 2 of the paper.
Clean Real Audios Collection
To eliminate the interference of irrelevant factors, we collect clean real audio from two sources: 5 open resources from the OpenSLR platform (http://www.openslr.org/12/) and one self-recorded dataset.
Clean Fake Audios Generation
We select 11 representative speech synthesis methods to generate fully fake audio, plus one method that produces partially fake audio.
Noisy Audios Simulation
Noisy audio aims to quantify the robustness of the methods under noisy conditions. To simulate real-life scenarios, we artificially sample noise signals and add them to the clean audio at 5 different SNRs: 0 dB, 5 dB, 10 dB, 15 dB, and 20 dB. Additive noises are selected from three noise databases: PNL 100 Nonspeech Sounds, NOISEX-92, and TAU Urban Acoustic Scenes.
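The exact mixing procedure is described in the paper; a common way to add noise at a target SNR is to scale the noise to the desired power ratio, sketched below under that assumption (function name and interface are ours):

import numpy as np

def add_noise_at_snr(clean, noise, snr_db):
    # Tile or trim the noise to match the clean signal length
    if len(noise) < len(clean):
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    noise = noise[:len(clean)]
    # Scale the noise so the mixture reaches the target SNR (in dB)
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise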
This dataset is licensed under a CC BY-NC-ND 4.0 license.
You can cite the data using the following BibTeX entry:
@inproceedings{ma2022fad,
title={FAD: A Chinese Dataset for Fake Audio Detection},
author={Haoxin Ma and Jiangyan Yi and Chenglong Wang and Xinrui Yan and Jianhua Tao and Tao Wang and Shiming Wang and Le Xu and Ruibo Fu},
booktitle={Submitted to the 36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks},
year={2022},
}
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The dataset contains UAV footage of wild antelopes (blackbucks) in grassland habitats. It can mainly be used for two tasks: multi-object tracking (MOT) and re-identification (Re-ID). We provide annotations for the position of animals in each frame, allowing us to offer very long videos (up to 3 min), completely annotated, while maintaining the identity of each animal in the video. The Re-ID dataset offers two videos that capture the movement of some animals simultaneously from two different UAVs. The Re-ID task is to find the same individual in two videos taken simultaneously from slightly different perspectives. The relevant paper will be published in the NeurIPS 2024 Datasets and Benchmarks Track. https://nips.cc/virtual/2024/poster/97563 Resolution: 5.4K. MOT: 12 videos (MOT17 format). Re-ID: 6 sets, each with a pair of drones (custom format). Detection: 320 images (COCO, YOLO).
Repository for the 3D-ADAM Dataset. Submitted to NeurIPS 2025 - Datasets and Benchmarks Track.
license: cc-by-nc-sa-4.0
This dataset contains the Burgers' equation and Darcy flow benchmarks featured in several operator learning tasks involving parametric partial differential equations: for \(a\in \mathcal{A}\)
\[L_a (u) = f \quad \text{ in } \Omega \subset\mathbb{R}^m,\]
with appropriate boundary or initial conditions. The tasks include forward problems (predict the solution \(u\) from the input \(a\)) and inverse problems (recover \(a\) from the solution \(u\)).
The datasets are given in the form of MATLAB files. They can be loaded into torch.Tensor format using the following snippets. The first index is the sample index; the remaining dimensions contain the spatial discretizations of the data and the targets. data_path is specified by the user.
PDE for \(u \in H^1((0,2\pi)) \cap C^0(\mathbb{S}^1)\): \[ \frac{\partial u}{\partial t} + u \frac{\partial u}{\partial x} = \nu \frac{\partial^2 u}{\partial x^2} \quad \text{ for } (x,t)\in (0,2\pi) \times (0,1],\] where \(\nu\) is the viscosity, with initial condition \(u(x, 0) = u_0(x)\) for \(x\in (0,2\pi)\).
from scipy.io import loadmat

data = loadmat(data_path)  # data_path is specified by the user
x_data = data['a']  # input samples
y_data = data['u']  # target solutions
In a forward problem, a is the input and u is the solution (target). In an inverse problem, u is the input and a is the target.
PDE: for the coefficient \(a\in L^{\infty}(\Omega)\) and solution \(u \in H^1_0(\Omega)\), \[-\nabla \cdot (a(x) \nabla u) = 1 \quad \text{in }\; (0,1)^2,\] with a zero Dirichlet boundary condition.
from scipy.io import loadmat
data = loadmat(data_path)
a = data['coeff']  # diffusion coefficient (input for the forward problem)
u = data['sol']    # solution (target for the forward problem)
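Since the description says the data can be loaded into torch.Tensor format, a minimal sketch of the conversion (the float32 dtype choice is ours):

import torch
from scipy.io import loadmat

data = loadmat(data_path)  # data_path is specified by the user
# Dimension 0 indexes samples; the remaining dims are the spatial grid
a = torch.from_numpy(data['coeff']).float()
u = torch.from_numpy(data['sol']).float()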
The dataset is owned by Zongyi Li and the license is MIT: https://github.com/zongyi-li/fourier_neural_operator
@misc{li2020fourier,
title={Fourier Neural Operator for Parametric Partial Differential Equations},
author={Zongyi Li and Nikola Kovachki and Kamyar Azizzadenesheli and Burigede Liu and Kaushik Bhattacharya and Andrew Stuart and Anima Anandkumar},
year={2020},
eprint={2010.08895},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
@misc{li2020neural,
title={Neural Operator: Graph Kernel Network for Partial Differential Equations},
author={Zongyi Li and Nikola Kovachki and Kamyar Azizzadenesheli and Burigede Liu and Kaushik Bhattacharya and Andrew Stuart and Anima Anandkumar},
year={2020},
eprint={2003.03485},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
@misc{cao2021,
title={Choose a Transformer: Fourier or Galerkin},
author={Shuhao Cao},
year={2021},
eprint={2105.14995},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
SIMSHIFT: A Benchmark for Adapting Neural Surrogates to Distribution Shifts
This is the official data repository to the NeurIPS 2025 Datasets & Benchmarks Track Submission.
Usage
We provide dataset loading utilities and full training and evaluation pipelines in the accompanying code repository that will be released upon publication.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
🖼️ COinCO: Common Inpainted Objects In-N-Out of Context
Authors: Tianze Yang*, Tyson Jordan*, Ninghao Liu, Jin Sun (*Equal contribution)
Affiliation: University of Georgia
Status: Submitted to NeurIPS 2025 Datasets and Benchmarks Track (under review)
📦 1. Dataset Overview
The COinCO dataset is a large-scale benchmark constructed from the COCO dataset to study object-scene contextual relationships via inpainting. Each image in COinCO contains one inpainted object, and… See the full description on the dataset page: https://huggingface.co/datasets/ytz009/COinCO.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the first dataset of the NeurIPS 2025 submission: The SafePowerGraph Benchmark: Toward Reliable and Realistic Graph Learning in Power Grids.