37 datasets found
  1. Annotations for "A Systematic Review of NeurIPS Dataset Management Practices"

    • dataverse.tdl.org
    tsv
    Updated Nov 5, 2024
    Cite
    Yiwei Wu; Hanlin Li; Hanlin Li; Yiwei Wu (2024). Annotations for "A Systematic Review of NeurIPS Dataset Management Practices" [Dataset]. http://doi.org/10.18738/T8/HLTRQP
    Available download formats: tsv(91869)
    Dataset updated
    Nov 5, 2024
    Dataset provided by
    Texas Data Repository
    Authors
    Yiwei Wu; Hanlin Li; Hanlin Li; Yiwei Wu
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This file includes our annotations of 238 dataset papers published at the NeurIPS Datasets and Benchmarks Track. A full report of our findings can be found at https://arxiv.org/abs/2411.00266
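    As a quick illustration (not part of the dataset documentation), the TSV file can be inspected with pandas; the local filename is a placeholder and no column names are assumed:

    import pandas as pd

    # Hypothetical local filename for the downloaded TSV
    df = pd.read_csv("annotations.tsv", sep="\t")
    print(df.shape)    # expect one row per annotated paper (238 papers)
    print(df.columns)  # inspect the annotation fields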

  2. NeurIPS 2021 Benchmark dataset

    • figshare.com
    hdf
    Updated May 30, 2023
    Cite
    scverse; Malte Luecken (2023). NeurIPS 2021 Benchmark dataset [Dataset]. http://doi.org/10.6084/m9.figshare.22716739.v1
    Available download formats: hdf
    Dataset updated
    May 30, 2023
    Dataset provided by
    figshare
    Authors
    scverse; Malte Luecken
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Subset of the benchmark dataset published in Luecken et al. (2021).

  3. ConViS-Bench

    • huggingface.co
    Cite
    submission1335, ConViS-Bench [Dataset]. https://huggingface.co/datasets/submission1335/ConViS-Bench
    Authors
    submission1335
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This dataset is associated with submission 1335 at the NeurIPS 2025 Datasets and Benchmarks Track. The benchmark is intended to be used with the proposed submission environments (see the source code). See the provided README for information about downloading the dataset and running the evaluations.

  4. repliqa

    • huggingface.co
    Cite
    ServiceNow, repliqa [Dataset]. https://huggingface.co/datasets/ServiceNow/repliqa
    Dataset authored and provided by
    ServiceNow (http://servicenow.com/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    RepLiQA - Repository of Likely Question-Answer for benchmarking

    NeurIPS Datasets presentation

      Dataset Summary
    

    RepLiQA is an evaluation dataset that contains Context-Question-Answer triplets, where contexts are non-factual but natural-looking documents about made-up entities, such as people or places that do not exist in reality. RepLiQA is human-created and designed to test the ability of Large Language Models (LLMs) to find and use contextual information in provided… See the full description on the dataset page: https://huggingface.co/datasets/ServiceNow/repliqa.
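    As a hedged sketch, the dataset can be pulled from the Hub with the datasets library; the repository name is taken from the dataset page above, but split and column names are not assumed:

    from datasets import load_dataset

    # Downloads all available splits; inspect the printout for the actual schema
    ds = load_dataset("ServiceNow/repliqa")
    print(ds)  # expect Context-Question-Answer fields per the description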

  5. Human bone marrow mononuclear cells

    • figshare.com
    hdf
    Updated Jan 29, 2025
    Cite
    Marius Lange (2025). Human bone marrow mononuclear cells [Dataset]. http://doi.org/10.6084/m9.figshare.28302875.v1
    Available download formats: hdf
    Dataset updated
    Jan 29, 2025
    Dataset provided by
    figshare
    Authors
    Marius Lange
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The gene expression portion of the NeurIPS 2021 challenge 10x multiome dataset (Luecken et al., NeurIPS datasets and benchmarks track 2021), originally obtained from GEO. Contains single-cell gene expression of 69,249 cells for 13,431 genes. The adata.X field contains normalized data and adata.layers['counts'] contains raw expression values. We computed a latent space using scANVI (Xu et al., MSB 2021), following their tutorial.
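    A minimal sketch of inspecting the file with anndata, assuming the hdf file is an AnnData .h5ad (as the adata fields in the description suggest); the filename is a placeholder:

    import anndata as ad

    # Hypothetical local filename for the downloaded file
    adata = ad.read_h5ad("bmmc_gex.h5ad")
    print(adata)                     # expect 69,249 cells x 13,431 genes
    normalized = adata.X             # normalized expression values
    counts = adata.layers["counts"]  # raw expression values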

  6. Data from: Datasets for a data-centric image classification benchmark for noisy and ambiguous label estimation

    • zenodo.org
    • openagrar.de
    txt, zip
    Updated Jul 5, 2023
    Cite
    Lars Schmarje; Lars Schmarje; Vasco Grossmann; Vasco Grossmann; Claudius Zelenka; Claudius Zelenka; Sabine Dippel; Sabine Dippel; Rainer Kiko; Rainer Kiko; Mariusz Oszust; Mariusz Oszust; Matti Pastell; Matti Pastell; Jenny Stracke; Jenny Stracke; Anna Valros; Anna Valros; Nina Volkmann; Nina Volkmann; Reinhard Koch; Reinhard Koch (2023). Datasets for a data-centric image classification benchmark for noisy and ambiguous label estimation [Dataset]. http://doi.org/10.5281/zenodo.7180818
    Available download formats: zip, txt
    Dataset updated
    Jul 5, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Lars Schmarje; Lars Schmarje; Vasco Grossmann; Vasco Grossmann; Claudius Zelenka; Claudius Zelenka; Sabine Dippel; Sabine Dippel; Rainer Kiko; Rainer Kiko; Mariusz Oszust; Mariusz Oszust; Matti Pastell; Matti Pastell; Jenny Stracke; Jenny Stracke; Anna Valros; Anna Valros; Nina Volkmann; Nina Volkmann; Reinhard Koch; Reinhard Koch
    Description

    This is the official data repository of the Data-Centric Image Classification (DCIC) Benchmark. The goal of this benchmark is to measure the impact of tuning the dataset instead of the model for a variety of image classification datasets. Full details about the collection process, the structure, and the automatic download can be found at:

    Paper: https://arxiv.org/abs/2207.06214

    Source Code: https://github.com/Emprime/dcic

    The license information is given below and as a download.

    Citation

    Please cite as

    @article{schmarje2022benchmark,
      author = {Schmarje, Lars and Grossmann, Vasco and Zelenka, Claudius and Dippel, Sabine and Kiko, Rainer and Oszust, Mariusz and Pastell, Matti and Stracke, Jenny and Valros, Anna and Volkmann, Nina and Koch, Reinhard},
      journal = {36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks},
      title = {{Is one annotation enough? A data-centric image classification benchmark for noisy and ambiguous label estimation}},
      year = {2022}
    }

    Full details about the datasets used are given below; please also cite them as part of the license.

    @article{schoening2020Megafauna,
    author = {Schoening, T and Purser, A and Langenk{\"{a}}mper, D and Suck, I and Taylor, J and Cuvelier, D and Lins, L and Simon-Lled{\'{o}}, E and Marcon, Y and Jones, D O B and Nattkemper, T and K{\"{o}}ser, K and Zurowietz, M and Greinert, J and Gomes-Pereira, J},
    doi = {10.5194/bg-17-3115-2020},
    journal = {Biogeosciences},
    number = {12},
    pages = {3115--3133},
    title = {{Megafauna community assessment of polymetallic-nodule fields with cameras: platform and methodology comparison}},
    volume = {17},
    year = {2020}
    }
    
    @article{Langenkamper2020GearStudy,
    author = {Langenk{\"{a}}mper, Daniel and van Kevelaer, Robin and Purser, Autun and Nattkemper, Tim W},
    doi = {10.3389/fmars.2020.00506},
    issn = {2296-7745},
    journal = {Frontiers in Marine Science},
    title = {{Gear-Induced Concept Drift in Marine Images and Its Effect on Deep Learning Classification}},
    volume = {7},
    year = {2020}
    }
    
    
    @article{peterson2019cifar10h,
    author = {Peterson, Joshua and Battleday, Ruairidh and Griffiths, Thomas and Russakovsky, Olga},
    doi = {10.1109/ICCV.2019.00971},
    issn = {15505499},
    journal = {Proceedings of the IEEE International Conference on Computer Vision},
    pages = {9616--9625},
    title = {{Human uncertainty makes classification more robust}},
    volume = {2019-Octob},
    year = {2019}
    }
    
    @article{schmarje2019,
    author = {Schmarje, Lars and Zelenka, Claudius and Geisen, Ulf and Gl{\"{u}}er, Claus-C. and Koch, Reinhard},
    doi = {10.1007/978-3-030-33676-9_26},
    issn = {23318422},
    journal = {DAGM German Conference on Pattern Recognition},
    number = {November},
    pages = {374--386},
    publisher = {Springer},
    title = {{2D and 3D Segmentation of uncertain local collagen fiber orientations in SHG microscopy}},
    volume = {11824 LNCS},
    year = {2019}
    }
    
    @article{schmarje2021foc,
    author = {Schmarje, Lars and Br{\"{u}}nger, Johannes and Santarossa, Monty and Schr{\"{o}}der, Simon-Martin and Kiko, Rainer and Koch, Reinhard},
    doi = {10.3390/s21196661},
    issn = {1424-8220},
    journal = {Sensors},
    number = {19},
    pages = {6661},
    title = {{Fuzzy Overclustering: Semi-Supervised Classification of Fuzzy Labels with Overclustering and Inverse Cross-Entropy}},
    volume = {21},
    year = {2021}
    }
    
    @article{schmarje2022dc3,
    author = {Schmarje, Lars and Santarossa, Monty and Schr{\"{o}}der, Simon-Martin and Zelenka, Claudius and Kiko, Rainer and Stracke, Jenny and Volkmann, Nina and Koch, Reinhard},
    journal = {Proceedings of the European Conference on Computer Vision (ECCV)},
    title = {{A data-centric approach for improving ambiguous labels with combined semi-supervised classification and clustering}},
    year = {2022}
    }
    
    
    @article{obuchowicz2020qualityMRI,
    author = {Obuchowicz, Rafal and Oszust, Mariusz and Piorkowski, Adam},
    doi = {10.1186/s12880-020-00505-z},
    issn = {1471-2342},
    journal = {BMC Medical Imaging},
    number = {1},
    pages = {109},
    title = {{Interobserver variability in quality assessment of magnetic resonance images}},
    volume = {20},
    year = {2020}
    }
    
    
    @article{stepien2021cnnQuality,
    author = {St{\c{e}}pie{\'{n}}, Igor and Obuchowicz, Rafa{\l} and Pi{\'{o}}rkowski, Adam and Oszust, Mariusz},
    doi = {10.3390/s21041043},
    issn = {1424-8220},
    journal = {Sensors},
    number = {4},
    title = {{Fusion of Deep Convolutional Neural Networks for No-Reference Magnetic Resonance Image Quality Assessment}},
    volume = {21},
    year = {2021}
    }
    
    @article{volkmann2021turkeys,
    author = {Volkmann, Nina and Br{\"{u}}nger, Johannes and Stracke, Jenny and Zelenka, Claudius and Koch, Reinhard and Kemper, Nicole and Spindler, Birgit},
    doi = {10.3390/ani11092655},
    journal = {Animals 2021},
    pages = {1--13},
    title = {{Learn to train: Improving training data for a neural network to detect pecking injuries in turkeys}},
    volume = {11},
    year = {2021}
    }
    
    @article{volkmann2022keypoint,
    author = {Volkmann, Nina and Zelenka, Claudius and Devaraju, Archana Malavalli and Br{\"{u}}nger, Johannes and Stracke, Jenny and Spindler, Birgit and Kemper, Nicole and Koch, Reinhard},
    doi = {10.3390/s22145188},
    issn = {1424-8220},
    journal = {Sensors},
    number = {14},
    pages = {5188},
    title = {{Keypoint Detection for Injury Identification during Turkey Husbandry Using Neural Networks}},
    volume = {22},
    year = {2022}
    }

  7. ProactiveBench

    • huggingface.co
    Updated May 11, 2025
    Cite
    submission1331 (2025). ProactiveBench [Dataset]. https://huggingface.co/datasets/submission1331/ProactiveBench
    Dataset updated
    May 11, 2025
    Authors
    submission1331
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This is the benchmark associated with submission 1331 at the NeurIPS 2025 Datasets and Benchmarks Track. The benchmark is intended to be used with the proposed submission environments (see the source code). The .jsonl files do not contain proper image paths but rather image path templates, as each .jsonl entry is a sample, and each sample corresponds to a different environment with its own images. See the submitted code README for information about dataset downloading and preprocessing, and to… See the full description on the dataset page: https://huggingface.co/datasets/submission1331/ProactiveBench.

  8. NIPS2025D&B_AMPBenchmark

    • dataverse.harvard.edu
    Updated May 16, 2025
    Cite
    Boyao Wan (2025). NIPS2025D&B_AMPBenchmark [Dataset]. http://doi.org/10.7910/DVN/E9A88D
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 16, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    Boyao Wan
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Dataset for "A Benchmark for Antimicrobial Peptide Recognition Based on Structure and Sequence Representation" at the NeurIPS 2025 Datasets and Benchmarks Track.

  9. NLID

    • huggingface.co
    Updated May 28, 2025
    Cite
    GWDx (2025). NLID [Dataset]. https://huggingface.co/datasets/GWDx/NLID
    Dataset updated
    May 28, 2025
    Authors
    GWDx
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    NLID: A Large-Scale Neuromorphic Liquid Identification Dataset

    NeurIPS 2025 Datasets and Benchmarks Track paper: Bubbles Talk: A Neuromorphic Dataset for Liquid Identification from Pouring process.

  10. Data from: LoveDA: A Remote Sensing Land-Cover Dataset for Domain Adaptive Semantic Segmentation

    • zenodo.org
    • data.niaid.nih.gov
    pdf, zip
    Updated Jul 17, 2024
    Cite
    Wang Junjue; Wang Junjue; Zheng Zhuo; Ma Ailong; Lu Xiaoyan; Zhong Yanfei; Zheng Zhuo; Ma Ailong; Lu Xiaoyan; Zhong Yanfei (2024). LoveDA: A Remote Sensing Land-Cover Dataset for Domain Adaptive Semantic Segmentation [Dataset]. http://doi.org/10.5281/zenodo.5706578
    Available download formats: zip, pdf
    Dataset updated
    Jul 17, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Wang Junjue; Wang Junjue; Zheng Zhuo; Ma Ailong; Lu Xiaoyan; Zhong Yanfei; Zheng Zhuo; Ma Ailong; Lu Xiaoyan; Zhong Yanfei
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The benchmark code is available at: https://github.com/Junjue-Wang/LoveDA

    Highlights:

    1. 5,987 high-spatial-resolution (0.3 m) remote sensing images from Nanjing, Changzhou, and Wuhan
    2. Focuses on the differing geographical environments of urban and rural areas
    3. Advances both semantic segmentation and domain adaptation tasks
    4. Three considerable challenges: multi-scale objects, complex background samples, and inconsistent class distributions

    Reference:

    @inproceedings{wang2021loveda,
     title={Love{DA}: A Remote Sensing Land-Cover Dataset for Domain Adaptive Semantic Segmentation},
     author={Junjue Wang and Zhuo Zheng and Ailong Ma and Xiaoyan Lu and Yanfei Zhong},
     booktitle={Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks},
     editor = {J. Vanschoren and S. Yeung},
     year={2021},
     volume = {1},
     pages = {},
     url={https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/file/4e732ced3463d06de0ca9a15b6153677-Paper-round2.pdf}
    }

    License:

    The owners of the data and of the copyright on the data are RSIDEA, Wuhan University. Use of the Google Earth images must respect the "Google Earth" terms of use. All images and their associated annotations in LoveDA can be used for academic purposes only; any commercial use is prohibited. (CC BY-NC-SA 4.0)

  11. Data from: Amos: A large-scale abdominal multi-organ benchmark for versatile medical image segmentation

    • explore.openaire.eu
    • zenodo.org
    Updated Nov 28, 2022
    Cite
    JI YUANFENG (2022). Amos: A large-scale abdominal multi-organ benchmark for versatile medical image segmentation [Dataset]. http://doi.org/10.5281/zenodo.7155724
    Dataset updated
    Nov 28, 2022
    Authors
    JI YUANFENG
    Description

    Despite the considerable progress in automatic abdominal multi-organ segmentation from CT/MRI scans in recent years, a comprehensive evaluation of the models' capabilities is hampered by the lack of a large-scale benchmark from diverse clinical scenarios. Constrained by the high cost of collecting and labeling 3D medical data, most deep learning models to date are driven by datasets with a limited number of organs of interest or samples, which still limits the power of modern deep models and makes it difficult to provide a fully comprehensive and fair estimate of various methods. To mitigate these limitations, we present AMOS, a large-scale, diverse, clinical dataset for abdominal organ segmentation. AMOS provides 500 CT and 100 MRI scans collected from multi-center, multi-vendor, multi-modality, multi-phase, multi-disease patients, each with voxel-level annotations of 15 abdominal organs, providing challenging examples and a test bed for studying robust segmentation algorithms under diverse targets and scenarios. We further benchmark several state-of-the-art medical segmentation models to evaluate the status of existing methods on this new, challenging dataset. We have made our datasets, benchmark servers, and baselines publicly available, and hope to inspire future research. The paper can be found at https://arxiv.org/pdf/2206.08023.pdf

    In addition to the labeled 600 CT and MRI scans, we expect to provide 2000 CT and 1200 MRI scans without labels to support more learning tasks (semi-supervised, unsupervised, domain adaptation, ...). The links can be found in:

    labeled data (500 CT + 100 MRI)
    unlabeled data Part I (900 CT)
    unlabeled data Part II (1100 CT) (there are currently 1000 CT; we will replenish to 1100 CT)
    unlabeled data Part III (1200 MRI)

    If you found this dataset useful for your research, please cite:

    @inproceedings{NEURIPS2022_ee604e1b,
     author = {Ji, Yuanfeng and Bai, Haotian and GE, Chongjian and Yang, Jie and Zhu, Ye and Zhang, Ruimao and Li, Zhen and Zhanng, Lingyan and Ma, Wanling and Wan, Xiang and Luo, Ping},
     booktitle = {Advances in Neural Information Processing Systems},
     editor = {S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh},
     pages = {36722--36732},
     publisher = {Curran Associates, Inc.},
     title = {AMOS: A Large-Scale Abdominal Multi-Organ Benchmark for Versatile Medical Image Segmentation},
     url = {https://proceedings.neurips.cc/paper_files/paper/2022/file/ee604e1bedbd069d9fc9328b7b9584be-Paper-Datasets_and_Benchmarks.pdf},
     volume = {35},
     year = {2022}
    }

  12. TMGBench

    • huggingface.co
    Updated May 28, 2025
    Cite
    Haochuan Wang (2025). TMGBench [Dataset]. https://huggingface.co/datasets/pinkex/TMGBench
    Dataset updated
    May 28, 2025
    Authors
    Haochuan Wang
    Description

    TMGBench: A Systematic Game Benchmark for Evaluating Strategic Reasoning Abilities of LLMs

    This repository contains the code, data, and metadata for our NeurIPS 2025 Datasets and Benchmarks submission: TMGBench: A Systematic Game Benchmark for Evaluating Strategic Reasoning Abilities of LLMs. The benchmark evaluates the strategic reasoning of large language models using 2x2 matrix games with narrative contexts and theory-of-mind variations.

      Directory Structure… See the full description on the dataset page: https://huggingface.co/datasets/pinkex/TMGBench.
    
  13. OCW Dataset

    • paperswithcode.com
    • library.toponeai.link
    Cite
    Saeid Naeini; Raeid Saqur; Mozhgan Saeidi; John Giorgi; Babak Taati, OCW Dataset [Dataset]. https://paperswithcode.com/dataset/only-connect-wall-ocw-dataset
    Authors
    Saeid Naeini; Raeid Saqur; Mozhgan Saeidi; John Giorgi; Babak Taati
    Description

    The OCW dataset evaluates creative problem solving by curating problems and human performance results from the popular British quiz show Only Connect.

    The OCW dataset contains 618 connecting wall puzzles and solutions in total from 15 seasons of the show. Each episode has two walls.

    The dataset has two tasks: Task 1 (Grouping) and Task 2 (Connections), which are identical to the quiz show's human participant tasks.

    Task 1 (Grouping) is evaluated via six metrics: number of solved walls, number of correct groups (max. four per wall), Adjusted Mutual Information (AMI), Adjusted Rand Index (ARI), Fowlkes-Mallows Score (FMS), and Wasserstein Distance (WD), normalized to the (0, 1) range, between predicted and ground-truth labels.

    Task 2 (Connections) is evaluated with three metrics: exact string matching, ROUGE-1 F1, and BERTScore F1.
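    For illustration only (not the official evaluation code), three of the Task 1 metrics (AMI, ARI, FMS) map directly onto scikit-learn; the labels below are invented:

    from sklearn.metrics import (
        adjusted_mutual_info_score,
        adjusted_rand_score,
        fowlkes_mallows_score,
    )

    # 16 clues per wall: four ground-truth groups of four, plus a made-up prediction
    y_true = [0]*4 + [1]*4 + [2]*4 + [3]*4
    y_pred = [0, 0, 0, 1, 1, 1, 1, 0, 2, 2, 3, 2, 3, 3, 2, 3]

    print("AMI:", adjusted_mutual_info_score(y_true, y_pred))
    print("ARI:", adjusted_rand_score(y_true, y_pred))
    print("FMS:", fowlkes_mallows_score(y_true, y_pred))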

    Baseline results with pre-trained language models and with few-shot In-context Learning (ICL) with LLMs such as GPT-4 are available here:

    "Large Language Models are Fixated by Red Herrings: Exploring Creative Problem Solving and Einstellung Effect using the Only Connect Wall Dataset" Saeid Alavi Naeini, Raeid Saqur, Mozhgan Saeidi, John Giorgi, Babak Taati. 2023 https://neurips.cc/virtual/2023/poster/73547

  14. FAD: A Chinese Dataset for Fake Audio Detection

    • zenodo.org
    bin, zip
    Updated Jul 9, 2023
    Cite
    Haoxin Ma; Jiangyan Yi; Haoxin Ma; Jiangyan Yi (2023). FAD: A Chinese Dataset for Fake Audio Detection [Dataset]. http://doi.org/10.5281/zenodo.6641573
    Available download formats: bin, zip
    Dataset updated
    Jul 9, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Haoxin Ma; Jiangyan Yi; Haoxin Ma; Jiangyan Yi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Fake audio detection is a growing concern, and some relevant datasets have been designed for research, but there is no standard public Chinese dataset under additive noise conditions. In this paper, we aim to fill this gap and design a Chinese fake audio detection dataset (FAD) for studying more generalized detection methods. Twelve mainstream speech generation techniques are used to generate fake audio. To simulate real-life scenarios, three noise datasets are selected for noise addition at five different signal-to-noise ratios. The FAD dataset can be used not only for fake audio detection but also for detecting the algorithms behind fake utterances for audio forensics. Baseline results are presented with analysis; they show that generalized fake audio detection remains challenging. The FAD dataset is publicly available, and the source code of the baselines is available on GitHub: https://github.com/ADDchallenge/FAD


    The FAD dataset is designed to evaluate methods for fake audio detection, fake algorithm recognition, and other relevant studies. To better study the robustness of the methods under the noisy conditions of real-life use, we construct a corresponding noisy dataset. The full FAD dataset therefore consists of two versions: a clean version and a noisy version. Both versions are divided into disjoint training, development, and test sets in the same way, with no speaker overlap across the three subsets. Each test set is further divided into seen and unseen test sets; the unseen test sets evaluate the generalization of the methods to unknown types. It is worth mentioning that both the real and the fake audio in the unseen test set are unknown to the model. For the noisy speech part, we select three noise databases for simulation. Additive noise is added to each audio clip in the clean dataset at 5 different SNRs; the additive noise of the unseen test set and of the remaining subsets comes from different noise databases. In each version of the FAD dataset, there are 138,400 utterances in the training set, 14,400 in the development set, 42,000 in the seen test set, and 21,000 in the unseen test set. More detailed statistics are given in Table 2.

    Clean Real Audio Collection
    To eliminate the interference of irrelevant factors, we collect clean real audio from two sources: 5 open resources from the OpenSLR platform (http://www.openslr.org/12/) and one self-recorded dataset.

    Clean Fake Audio Generation
    We select 11 representative speech synthesis methods to generate fully fake audio, plus one method that produces partially fake audio.

    Noisy Audio Simulation
    The noisy audio quantifies the robustness of the methods under noisy conditions. To simulate real-life scenarios, we artificially sample noise signals and add them to the clean audio at 5 different SNRs: 0 dB, 5 dB, 10 dB, 15 dB, and 20 dB. Additive noise is selected from three noise databases: PNL 100 Nonspeech Sounds, NOISEX-92, and TAU Urban Acoustic Scenes.
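    A minimal numpy sketch of additive noise mixing at a target SNR, as described above; this is illustrative, not the authors' simulation code:

    import numpy as np

    def add_noise(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
        """Mix noise into clean speech so the result has the requested SNR in dB."""
        noise = np.resize(noise, clean.shape)  # loop or trim noise to length
        p_clean = np.mean(clean ** 2)
        p_noise = np.mean(noise ** 2) + 1e-12  # avoid division by zero
        scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
        return clean + scale * noise

    # e.g. generate the five FAD conditions:
    # noisy = {snr: add_noise(clean, noise, snr) for snr in (0, 5, 10, 15, 20)}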

    This dataset is licensed under a CC BY-NC-ND 4.0 license.
    You can cite the data using the following BibTeX entry:

    @inproceedings{ma2022fad,
      title={FAD: A Chinese Dataset for Fake Audio Detection},
      author={Haoxin Ma and Jiangyan Yi and Chenglong Wang and Xinrui Yan and Jianhua Tao and Tao Wang and Shiming Wang and Le Xu and Ruibo Fu},
      booktitle={Submitted to the 36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks},
      year={2022},
    }

  15. BuckTales: A multi-UAV dataset for multi-object tracking and re-identification of wild antelopes

    • edmond.mpg.de
    mp4, zip
    Updated Dec 19, 2024
    Cite
    Hemal naik; Junran Yang; Dipin Das; Margaret Crofoot; Akanksha Rathore; Vivek Hari Sridhar; Hemal naik; Junran Yang; Dipin Das; Margaret Crofoot; Akanksha Rathore; Vivek Hari Sridhar (2024). BuckTales : A multi-UAV dataset for multi-object tracking and re-identification of wild antelopes [Dataset]. http://doi.org/10.17617/3.JCZ9WK
    Available download formats: zip(65010277544), mp4(403189785), zip(3287471192), zip(457749126), mp4(130172114), zip(17011998466)
    Dataset updated
    Dec 19, 2024
    Dataset provided by
    Edmond
    Authors
    Hemal naik; Junran Yang; Dipin Das; Margaret Crofoot; Akanksha Rathore; Vivek Hari Sridhar; Hemal naik; Junran Yang; Dipin Das; Margaret Crofoot; Akanksha Rathore; Vivek Hari Sridhar
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    The dataset contains UAV footage of wild antelopes (blackbucks) in grassland habitats. It is intended mainly for two tasks: multi-object tracking (MOT) and re-identification (Re-ID). We provide annotations for the position of every animal in each frame, allowing us to offer very long videos (up to 3 min) that are completely annotated while maintaining the identity of each animal. The Re-ID portion offers pairs of videos that capture the movement of some animals simultaneously from two different UAVs; the Re-ID task is to find the same individual in two videos taken simultaneously from slightly different perspectives. The relevant paper will be published in the NeurIPS 2024 Datasets and Benchmarks Track: https://nips.cc/virtual/2024/poster/97563

    Resolution: 5.4K
    MOT: 12 videos (MOT17 format; see the parsing sketch below)
    Re-ID: 6 sets, each with a pair of drones (custom format)
    Detection: 320 images (COCO, YOLO)
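    As a hedged sketch, MOT17-style annotation files are plain comma-separated lines whose first six columns follow the public MOT17 convention (frame, id, x, y, w, h); the filename is a placeholder and the exact columns in this release are not verified here:

    import csv

    def read_mot17(path):
        """Yield (frame, track_id, x, y, w, h) rows from a MOT17-style txt file."""
        with open(path, newline="") as f:
            for row in csv.reader(f):
                frame, tid = int(row[0]), int(row[1])
                x, y, w, h = map(float, row[2:6])
                yield frame, tid, x, y, w, h

    # for frame, tid, x, y, w, h in read_mot17("gt.txt"): ...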

  16. 3D-ADAM

    • huggingface.co
    Updated May 28, 2025
    Cite
    Paul McHard (2025). 3D-ADAM [Dataset]. http://doi.org/10.57967/hf/5526
    Dataset updated
    May 28, 2025
    Authors
    Paul McHard
    Description

    Repository for the 3D-ADAM Dataset. Submitted to NeurIPS 2025 - Datasets and Benchmarks Track.

      license: cc-by-nc-sa-4.0
    
  17. PDE dataset

    • kaggle.com
    Updated Jun 27, 2021
    Cite
    Shuhao Cao (2021). PDE dataset [Dataset]. https://www.kaggle.com/scaomath/pde-dataset/discussion
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 27, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Shuhao Cao
    Description

    Context

    Tasks

    This dataset contains the Burgers' equation and Darcy flow benchmarks featured in several operator learning tasks involving parametric partial differential equations: for \(a\in \mathcal{A}\)

    \[L_a (u) = f \quad \text{ in } \Omega \subset\mathbb{R}^m,\]

    with appropriate boundary or initial conditions. The tasks are

    • Solving the equation above for a class of initial conditions \(f = u(\cdot, 0)\).
    • Finding the mapping from a varying parameter \(a\) to the solution \(u\).
    • The inverse coefficient identification mapping: \(u+\text{noise} \mapsto a\).

    Introductory posts

    Data

    The datasets are given as MATLAB .mat files and can be loaded into torch.Tensor format using the following snippets. The first index is the sample index; the remaining dimensions contain the spatial discretizations of the data and the targets. data_path is specified by the user.

    Burgers

    PDE for \(u \in H^1((0,2\pi)) \cap C^0(\mathbb{S}^1)\):

    \[ \frac{\partial u}{\partial t} + u \frac{\partial u}{\partial x} = \nu \frac{\partial^2 u}{\partial x^2} \quad \text{ for } (x,t)\in (0,2\pi) \times (0,1],\]

    with initial condition \(u(x, 0) = u_0(x)\) for \(x\in (0,2\pi)\).

    import torch
    from scipy.io import loadmat

    data = loadmat(data_path)             # data_path is specified by the user
    x_data = torch.from_numpy(data['a'])  # input: sampled initial conditions
    y_data = torch.from_numpy(data['u'])  # target: corresponding solutions
    

    Darcy flow

    In a forward problem, a is the input and u is the solution (target). In an inverse problem, u is the input and a is the target.

    PDE: for coefficient \(a\in L^{\infty}(\Omega)\) and solution \(u \in H^1_0(\Omega)\),

    \[- \nabla \cdot ( a(x) \nabla u ) = 1 \quad \text{in }\; (0,1)^2,\]

    with a zero Dirichlet boundary condition.

    import torch
    from scipy.io import loadmat

    data = loadmat(data_path)
    a = torch.from_numpy(data['coeff'])  # coefficient field a(x)
    u = torch.from_numpy(data['sol'])    # solution u(x)
    

    License

    The dataset is owned by Zongyi Li and the license is MIT: https://github.com/zongyi-li/fourier_neural_operator

    Bibliography

    @misc{li2020fourier,
       title={Fourier Neural Operator for Parametric Partial Differential Equations}, 
       author={Zongyi Li and Nikola Kovachki and Kamyar Azizzadenesheli and Burigede Liu and Kaushik Bhattacharya and Andrew Stuart and Anima Anandkumar},
       year={2020},
       eprint={2010.08895},
       archivePrefix={arXiv},
       primaryClass={cs.LG}
    }
    
    @misc{li2020neural,
       title={Neural Operator: Graph Kernel Network for Partial Differential Equations}, 
       author={Zongyi Li and Nikola Kovachki and Kamyar Azizzadenesheli and Burigede Liu and Kaushik Bhattacharya and Andrew Stuart and Anima Anandkumar},
       year={2020},
       eprint={2003.03485},
       archivePrefix={arXiv},
       primaryClass={cs.LG}
    }
    
    @misc{cao2021,
       title={Choose a Transformer: Fourier or Galerkin}, 
       author={Shuhao Cao},
       year={2021},
       eprint={2105.14995},
       archivePrefix={arXiv},
       primaryClass={cs.LG}
    }
    
  18. SIMSHIFT_data

    • huggingface.co
    Updated Aug 9, 2024
    Cite
    SIMSHIFT (2024). SIMSHIFT_data [Dataset]. https://huggingface.co/datasets/simshift/SIMSHIFT_data
    Dataset updated
    Aug 9, 2024
    Authors
    SIMSHIFT
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    SIMSHIFT: A Benchmark for Adapting Neural Surrogates to Distribution Shifts

    This is the official data repository for the NeurIPS 2025 Datasets & Benchmarks Track submission.

      Usage
    

    We provide dataset loading utilities and full training and evaluation pipelines in the accompanying code repository that will be released upon publication.

  19. Data from: COinCO

    • huggingface.co
    Updated Jun 3, 2025
    Cite
    Tianze Yang (2025). COinCO [Dataset]. https://huggingface.co/datasets/ytz009/COinCO
    Dataset updated
    Jun 3, 2025
    Authors
    Tianze Yang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    🖼️ COinCO: Common Inpainted Objects In-N-Out of Context

    Authors: Tianze Yang*, Tyson Jordan*, Ninghao Liu, Jin Sun (*equal contribution)
    Affiliation: University of Georgia
    Status: Submitted to NeurIPS 2025 Datasets and Benchmarks Track (under review)

      📦 1. Dataset Overview
    

    The COinCO dataset is a large-scale benchmark constructed from the COCO dataset to study object-scene contextual relationships via inpainting. Each image in COinCO contains one inpainted object, and… See the full description on the dataset page: https://huggingface.co/datasets/ytz009/COinCO.

  20. CASE118 with energy cost variation

    • figshare.com
    bin
    Updated May 23, 2025
    Cite
    Anonymous USER (2025). CASE118 with energy cost variation [Dataset]. http://doi.org/10.6084/m9.figshare.29066399.v1
    Available download formats: bin
    Dataset updated
    May 23, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Anonymous USER
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the first dataset of the NeurIPS 2025 submission: The SafePowerGraph Benchmark: Toward Reliable and Realistic Graph Learning in Power Grids.
