Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
NeurIPS 2021 dataset used for benchmarking feature selection for integration, in H5AD format. Files contain the full raw dataset, the processed batches used to create the reference, and the processed batches used as a query.

Note: these files have been saved with compression to reduce file size. If needed, re-saving without compression will reduce reading times.

If used, please cite:

Lance C, Luecken MD, Burkhardt DB, Cannoodt R, Rautenstrauch P, Laddach A, et al. Multimodal single cell data integration challenge: Results and lessons learned. In: Kiela D, Ciccone M, Caputo B, editors. Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track. PMLR; 06–14 Dec 2022. p. 162–76. Available from: https://proceedings.mlr.press/v176/lance22a.html

and

Luecken MD, Burkhardt DB, Cannoodt R, Lance C, Agrawal A, Aliee H, et al. A sandbox for prediction and integration of DNA, RNA, and proteins in single cells. Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2). 2022 [cited 2022 Nov 8]. Available from: https://openreview.net/pdf?id=gN35BGa1Rt
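As a hedged illustration of the compression note above, the following Python sketch re-saves one of the H5AD files without compression using anndata; the file names are placeholders, not the official file names.

import anndata as ad

# placeholder path; substitute the downloaded H5AD file
adata = ad.read_h5ad("neurips2021_full_raw.h5ad")
# write_h5ad writes uncompressed by default; compression=None makes that explicit
adata.write_h5ad("neurips2021_full_raw_uncompressed.h5ad", compression=None)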
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This dataset is associated with submission 1335 at the NeurIPS 2025 Datasets and Benchmarks track. The benchmark is intended to be used with the proposed submission environments (see the source code). See the provided README for information about downloading the dataset and running the evaluations.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
NeurIPS 2025 Papers Dataset
This dataset contains all accepted papers from NeurIPS 2025, scraped from OpenReview.
Dataset Statistics
Overview
- Total Papers: 5,772
- Unique Paper IDs: 5,772 ✅ (no duplicate IDs)
Track Distribution
- Main Track: 5,275 papers (91.4%)
- Datasets and Benchmarks Track: 497 papers (8.6%)
Award Distribution
- Poster: 4,949 papers (85.7%)
- Oral: 84 papers (1.5%)
- Spotlight: 739 papers (12.8%)
Track × Award… See the full description on the dataset page: https://huggingface.co/datasets/huyxdang/neurips-2025-papers.
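A minimal sketch of loading this dataset from the Hugging Face Hub and recomputing the counts above; the split name and the column names ("track", "award") are assumptions, so check ds.column_names after loading.

from collections import Counter
from datasets import load_dataset

# repo id taken from the dataset page above; split name is an assumption
ds = load_dataset("huyxdang/neurips-2025-papers", split="train")
print(len(ds))               # expected: 5772 papers
print(Counter(ds["track"]))  # assumed column name
print(Counter(ds["award"]))  # assumed column name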
Accepted by the NeurIPS 2024 Datasets and Benchmarks Track
We introduce the RePAIR puzzle-solving dataset, a large-scale, real-world dataset of fractured frescoes from the archaeological site of Pompeii. Our dataset consists of over 1,000 fractured frescoes. RePAIR stands as a realistic computational challenge for 2D and 3D puzzle-solving methods, and serves as a benchmark that enables the study of fractured object reassembly and presents new challenges for geometric shape understanding. Please visit our website for more dataset information, access to source code scripts, and an interactive gallery of the dataset samples.
We provide a compressed version of our dataset in two separate files: one for the 2D version and one for the 3D version.
Our full dataset contains over one thousand individual fractured fragments, divided into groups each with its own folder, and compressed into separate archives for the 2D and 3D subsets. In the 2D dataset, each fragment is saved as a .PNG image, and each group has the corresponding ground-truth transformation to solve the puzzle as a .TXT file. In the 3D dataset, each fragment is saved as a mesh using the widely used .OBJ format with the corresponding material (.MTL) and texture (.PNG) files. The meshes are already in the assembled position and orientation, so no additional information is needed. All additional metadata is given as .JSON files.
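As a hedged sketch of working with the 2D subset described above, the following Python loads one group's fragment images and its ground-truth transformation file; the directory layout, file names, and TXT column meanings are illustrative assumptions, not the official loader.

from pathlib import Path

import numpy as np
from PIL import Image

# hypothetical group folder from the extracted 2D archive
group_dir = Path("RePAIR_2D/group_0001")
fragments = [Image.open(p) for p in sorted(group_dir.glob("*.png"))]
# assumed layout: one row per fragment with its ground-truth transformation
gt = np.loadtxt(group_dir / "ground_truth.txt")
print(f"{len(fragments)} fragments, transformation array of shape {gt.shape}")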
Please be advised that downloading and reusing this dataset is permitted only upon acceptance of the following license terms.
The Istituto Italiano di Tecnologia (IIT) declares, and the user (“User”) acknowledges, that the "RePAIR puzzle-solving dataset" contains 3D scans, texture maps, rendered images and meta-data of fresco fragments acquired at the Archaeological Site of Pompeii. IIT is authorised to publish the RePAIR puzzle-solving dataset herein only for scientific and cultural purposes and in connection with an academic publication referenced as Tsesmelis et al., "Re-assembling the past: The RePAIR dataset and benchmark for real world 2D and 3D puzzle solving", NeurIPS 2024. Use of the RePAIR puzzle-solving dataset by User is limited to downloading and viewing such images, and comparing these with data or content in other datasets. User is not authorised to use (in particular, explicitly excluding any commercial use or use in conjunction with the promotion of a commercial enterprise and/or its product(s) or service(s)), reproduce, copy, or distribute the RePAIR puzzle-solving dataset. User will not use the RePAIR puzzle-solving dataset in any way prohibited by applicable laws. The RePAIR puzzle-solving dataset is being provided to User without warranty of any kind, either expressed or implied. User will be solely responsible for their use of such RePAIR puzzle-solving dataset. In no event shall IIT be liable for any damages arising from such use.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The gene expression portion of the NeurIPS 2021 challenge 10x multiome dataset (Luecken et al., NeurIPS datasets and benchmarks track 2021), originally obtained from GEO. Contains single-cell gene expression of 69,249 cells for 13,431 genes. The adata.X field contains normalized data and adata.layers['counts'] contains raw expression values. We computed a latent space using scANVI (Xu et al., MSB 2021), following their tutorial.
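A minimal sketch of accessing the fields described above with anndata; the file name and the obsm key for the scANVI latent space are assumptions.

import anndata as ad

adata = ad.read_h5ad("neurips2021_multiome_gex.h5ad")  # placeholder path
print(adata.shape)                    # expected: (69249, 13431)
normalized = adata.X                  # normalized expression
counts = adata.layers["counts"]       # raw expression values
latent = adata.obsm.get("X_scANVI")   # assumed key; inspect adata.obsm.keys()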
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
🖥 MedSG-Bench: A Benchmark for Medical Image Sequences Grounding
📖 Paper | 💻 Code | 🤗 Dataset
🔥 MedSG-Bench is accepted at NeurIPS 2025 Datasets and Benchmarks Track as a Spotlight.
MedSG-Bench
MedSG-Bench is the first benchmark for medical image sequences grounding.
👉 We also provide MedSG-188K, a grounding instruction-tuning dataset.
👉 MedSeq-Grounder, the model trained on MedSG-188K, is available here.
Metadata
This dataset… See the full description on the dataset page: https://huggingface.co/datasets/MedSG-Bench/MedSG-Bench.
Dataset accompanying the NeurIPS 2022 Datasets and Benchmarks Track paper: Breaking Bad: A Dataset for Geometric Fracture and Reassembly. Please refer to our project page for more details.
License: The Breaking Bad dataset collects 3D meshes from ShapeNet and Thingi10K, thus inheriting their terms of use. Please refer to ShapeNet and Thingi10K for more details. We release each model in our dataset with an as-permissive-as-possible license compatible with its underlying base model. Please refer to ShapeNet and Thingi10K for restrictions and depositor requirements of each model.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Accepted to NeurIPS 2025 (Datasets & Benchmarks Track)
TreeFinder is the first large-scale, high-resolution benchmark dataset for mapping individual dead trees across the contiguous United States (CONUS). Built to advance computer vision methods for ecological monitoring and carbon assessment, TreeFinder provides pixel-level annotations of dead trees from high-resolution aerial imagery, enriched with ecological metadata and paired with performance benchmarks.
We provide benchmark performance results using five semantic segmentation models, including:
- U-Net and DeepLabV3+ (CNN-based)
- ViT, SegFormer, and Mask2Former (Transformer-based)
- DOFA (a multimodal foundation model trained on satellite data)

Each model is trained and evaluated across various domain generalization settings (e.g., region, climate, forest type) to test robustness.
Each patch is enriched with:
- Geographic coordinates
- Köppen–Geiger climate zone
- Primary tree type (from USDA Forest Service maps)

These metadata enable benchmarking under challenging scenarios like the following (a split-construction sketch appears after the list):
- Cross-region generalization (e.g., East → West)
- Climate domain shifts
- Forest type transfer
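A hypothetical sketch of constructing a cross-region split from such metadata; the metadata file name and column names are illustrative assumptions, not part of the official release.

import pandas as pd

# assumed per-patch metadata table with region / climate / tree-type columns
meta = pd.read_csv("treefinder_patch_metadata.csv")
train_patches = meta[meta["region"] == "East"]   # assumed column and values
test_patches = meta[meta["region"] == "West"]
print(len(train_patches), "train patches;", len(test_patches), "test patches")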
TreeFinder enables the development and evaluation of machine learning models for high-impact environmental tasks such as:
- Forest health monitoring
- Carbon flux modeling
- Wildfire risk assessment

It is designed to foster cross-disciplinary collaboration between the machine learning and Earth science communities by providing a reproducible, challenging, and ecologically grounded benchmark.
If you use TreeFinder in your research, please cite the following paper:
Zhihao Wang, Cooper Li, Ruichen Wang, Lei Ma, George Hurtt, Xiaowei Jia, Gengchen Mai, Zhili Li, Yiqun Xie.
TreeFinder: A US-Scale Benchmark Dataset for Individual Tree Mortality Monitoring Using High-Resolution Aerial Imagery.
In Proceedings of the 39th Conference on Neural Information Processing Systems (NeurIPS 2025), Datasets and Benchmarks Track, 2025.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datasets from the paper LagrangeBench: A Lagrangian Fluid Mechanics Benchmarking Suite, presented at the NeurIPS 2023 Track on Datasets and Benchmarks.
This is the official data repository of the Data-Centric Image Classification (DCIC) Benchmark. The goal of this benchmark is to measure the impact of tuning the dataset instead of the model for a variety of image classification datasets. Full details about the collection process, the structure, and automatic download are given at:
Paper: https://arxiv.org/abs/2207.06214
Source Code: https://github.com/Emprime/dcic
The license information is provided as part of the download.
Citation
Please cite as
@article{schmarje2022benchmark,
author = {Schmarje, Lars and Grossmann, Vasco and Zelenka, Claudius and Dippel, Sabine and Kiko, Rainer and Oszust, Mariusz and Pastell, Matti and Stracke, Jenny and Valros, Anna and Volkmann, Nina and Koch, Reinhard},
journal = {36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks},
title = {{Is one annotation enough? A data-centric image classification benchmark for noisy and ambiguous label estimation}},
year = {2022}
}
Please see the full details about the used datasets below, which should also be cited as part of the license.
@article{schoening2020Megafauna,
author = {Schoening, T and Purser, A and Langenk{\"{a}}mper, D and Suck, I and Taylor, J and Cuvelier, D and Lins, L and Simon-Lled{\'{o}}, E and Marcon, Y and Jones, D O B and Nattkemper, T and K{\"{o}}ser, K and Zurowietz, M and Greinert, J and Gomes-Pereira, J},
doi = {10.5194/bg-17-3115-2020},
journal = {Biogeosciences},
number = {12},
pages = {3115--3133},
title = {{Megafauna community assessment of polymetallic-nodule fields with cameras: platform and methodology comparison}},
volume = {17},
year = {2020}
}
@article{Langenkamper2020GearStudy,
author = {Langenk{\"{a}}mper, Daniel and van Kevelaer, Robin and Purser, Autun and Nattkemper, Tim W},
doi = {10.3389/fmars.2020.00506},
issn = {2296-7745},
journal = {Frontiers in Marine Science},
title = {{Gear-Induced Concept Drift in Marine Images and Its Effect on Deep Learning Classification}},
volume = {7},
year = {2020}
}
@article{peterson2019cifar10h,
author = {Peterson, Joshua and Battleday, Ruairidh and Griffiths, Thomas and Russakovsky, Olga},
doi = {10.1109/ICCV.2019.00971},
issn = {15505499},
journal = {Proceedings of the IEEE International Conference on Computer Vision},
pages = {9616--9625},
title = {{Human uncertainty makes classification more robust}},
volume = {2019-October},
year = {2019}
}
@article{schmarje2019,
author = {Schmarje, Lars and Zelenka, Claudius and Geisen, Ulf and Gl{\"{u}}er, Claus-C. and Koch, Reinhard},
doi = {10.1007/978-3-030-33676-9_26},
issn = {23318422},
journal = {DAGM German Conference on Pattern Recognition},
number = {November},
pages = {374--386},
publisher = {Springer},
title = {{2D and 3D Segmentation of uncertain local collagen fiber orientations in SHG microscopy}},
volume = {11824 LNCS},
year = {2019}
}
@article{schmarje2021foc,
author = {Schmarje, Lars and Br{\"{u}}nger, Johannes and Santarossa, Monty and Schr{\"{o}}der, Simon-Martin and Kiko, Rainer and Koch, Reinhard},
doi = {10.3390/s21196661},
issn = {1424-8220},
journal = {Sensors},
number = {19},
pages = {6661},
title = {{Fuzzy Overclustering: Semi-Supervised Classification of Fuzzy Labels with Overclustering and Inverse Cross-Entropy}},
volume = {21},
year = {2021}
}
@article{schmarje2022dc3,
author = {Schmarje, Lars and Santarossa, Monty and Schr{\"{o}}der, Simon-Martin and Zelenka, Claudius and Kiko, Rainer and Stracke, Jenny and Volkmann, Nina and Koch, Reinhard},
journal = {Proceedings of the European Conference on Computer Vision (ECCV)},
title = {{A data-centric approach for improving ambiguous labels with combined semi-supervised classification and clustering}},
year = {2022}
}
@article{obuchowicz2020qualityMRI,
author = {Obuchowicz, Rafal and Oszust, Mariusz and Piorkowski, Adam},
doi = {10.1186/s12880-020-00505-z},
issn = {1471-2342},
journal = {BMC Medical Imaging},
number = {1},
pages = {109},
title = {{Interobserver variability in quality assessment of magnetic resonance images}},
volume = {20},
year = {2020}
}
@article{stepien2021cnnQuality,
author = {St{\c{e}}pie{\'{n}}, Igor and Obuchowicz, Rafa{\l} and Pi{\'{o}}rkowski, Adam and Oszust, Mariusz},
doi = {10.3390/s21041043},
issn = {1424-8220},
journal = {Sensors},
number = {4},
title = {{Fusion of Deep Convolutional Neural Networks for No-Reference Magnetic Resonance Image Quality Assessment}},
volume = {21},
year = {2021}
}
@article{volkmann2021turkeys,
author = {Volkmann, Nina and Br{\"{u}}nger, Johannes and Stracke, Jenny and Zelenka, Claudius and Koch, Reinhard and Kemper, Nicole and Spindler, Birgit},
doi = {10.3390/ani11092655},
journal = {Animals},
pages = {1--13},
title = {{Learn to train: Improving training data for a neural network to detect pecking injuries in turkeys}},
volume = {11},
year = {2021}
}
@article{volkmann2022keypoint,
author = {Volkmann, Nina and Zelenka, Claudius and Devaraju, Archana Malavalli and Br{\"{u}}nger, Johannes and Stracke, Jenny and Spindler, Birgit and Kemper, Nicole and Koch, Reinhard},
doi = {10.3390/s22145188},
issn = {1424-8220},
journal = {Sensors},
number = {14},
pages = {5188},
title = {{Keypoint Detection for Injury Identification during Turkey Husbandry Using Neural Networks}},
volume = {22},
year = {2022}
}
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset contains the results of the review paper Measuring What Matters, presented at NeurIPS 2025 Datasets and Benchmarks Track.
This repository contains the data and code for the baseline described in the following paper:
CREAK: A Dataset for Commonsense Reasoning over Entity Knowledge
Yasumasa Onoe, Michael J.Q. Zhang, Eunsol Choi, Greg Durrett
NeurIPS 2021 Datasets and Benchmarks Track

@article{onoe2021creak,
  title={CREAK: A Dataset for Commonsense Reasoning over Entity Knowledge},
  author={Onoe, Yasumasa and Zhang, Michael J.Q. and Choi, Eunsol and Durrett, Greg},
  journal={OpenReview},
  year={2021}
}
***** [New] November 8th, 2021: The contrast set has been updated. *****
We have increased the size of the contrast set to 500 examples. Please check the paper for new numbers.
CREAK data files are located under data/creak.
train.json contains 10,176 training examples.
dev.json contains 1,371 development examples.
test_without_labels.json contains 1,371 test examples (labels are not included).
contrast_set.json contains 500 contrastive examples.

The data files are formatted as jsonlines. Here is a single training example:
{
  "ex_id": "train_1423",
  "sentence": "Lauryn Hill separates two valleys as it is located between them.",
  "explanation": "Lauren Hill is actually a person and not a mountain.",
  "label": "false",
  "entity": "Lauryn Hill",
  "en_wiki_pageid": "162864",
  "entity_mention_loc": [[0, 11]]
}
| Field | Description |
|---|---|
| ex_id | Example ID |
| sentence | Claim |
| explanation | Explanation by the annotator of why the claim is TRUE/FALSE |
| label | Label: 'true' or 'false' |
| entity | Seed entity |
| en_wiki_pageid | English Wikipedia page ID for the seed entity |
| entity_mention_loc | Location(s) of the seed entity in the claim |
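A minimal sketch of reading one of the jsonlines files above; the path follows the data/creak layout described earlier.

import json

# train.json is jsonlines: one JSON object per line
with open("data/creak/train.json") as f:
    examples = [json.loads(line) for line in f]

print(len(examples))  # expected: 10,176
print(sum(ex["label"] == "true" for ex in examples), "claims labeled true")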
See this README
Leaderboard: https://www.cs.utexas.edu/~yasumasa/creak/leaderboard.html
We host results only for Closed-Book methods that have been finetuned only on In-Domain data.
To submit your results, please send your system name and prediction files for the dev, test, and contrast sets to yasumasa@utexas.edu.
Please contact yasumasa@utexas.edu if you have any questions.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
WikiDBs is an open-source corpus of 100,000 relational databases. We aim to support research on tabular representation learning on multi-table data. The corpus is based on Wikidata and aims to reflect characteristics of real-world databases.

WikiDBs was published as a spotlight paper in the Datasets and Benchmarks track at NeurIPS 2024.
WikiDBs contains the database schemas, as well as table contents. The database tables are provided as CSV files, and each database schema as JSON. The 100,000 databases are available in five splits, containing 20k databases each. In total, around 165 GB of disk space are needed for the full corpus. We also provide a script to convert the databases into SQLite.
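As a hedged illustration (not the official conversion script mentioned above), the following sketch loads one database's CSV tables into SQLite; the directory layout and file naming are assumptions.

import sqlite3
from pathlib import Path

import pandas as pd

db_dir = Path("wikidbs/split_1/database_00001")  # hypothetical layout
conn = sqlite3.connect("database_00001.sqlite")
for csv_path in sorted(db_dir.glob("*.csv")):
    # one CSV per table; table name taken from the file stem (assumption)
    pd.read_csv(csv_path).to_sql(csv_path.stem, conn, if_exists="replace", index=False)
conn.close()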
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
📊 MMLongBench-Doc Evaluation Results
Official evaluation results: GPT-4.1 (2025-04-14) & GPT-4o (2024-11-20) 📄 Paper: MMLongBench-Doc, NeurIPS 2024 Datasets and Benchmarks Track (Spotlight)
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Companion dataset to the paper “Towards Automated Petrography,” accepted to the NeurIPS 2025 Datasets and Benchmarks track.
The largest and most diverse publicly available experimental framework for automated petrography. LITHOS includes 211,604 high-resolution RGB patches of polarized light and 105,802 expert-annotated grains across 25 mineral categories. Each annotation includes the mineral class, spatial coordinates, and expert-measured major and minor axes, capturing grain geometry and orientation.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains benchmark data generated with numerical simulations based on different PDEs, namely 1D advection, 1D Burgers', 1D and 2D diffusion-reaction, 1D diffusion-sorption, 1D, 2D, and 3D compressible Navier-Stokes, 2D Darcy flow, and the 2D shallow water equation. This dataset is intended to advance research in scientific ML. In general, the data are stored in HDF5 format, with the array dimensions packed according to the convention [b,t,x1,...,xd,v], where b is the batch size (i.e. the number of samples), t is the time dimension, x1,...,xd are the spatial dimensions, and v is the number of channels (i.e. the number of variables of interest). More detailed information is provided in our GitHub repository (https://github.com/pdebench/PDEBench) and in our paper submitted to the NeurIPS 2022 Benchmark track.
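A minimal sketch of inspecting one benchmark file under the [b,t,x1,...,xd,v] convention described above; the file name and dataset key are placeholders, since the exact keys differ per PDE (see the GitHub repository).

import h5py

# placeholder file name; stored keys vary across the PDEBench files
with h5py.File("1D_Advection_Sols.hdf5", "r") as f:
    print(list(f.keys()))            # discover the stored arrays
    data = f[list(f.keys())[0]][:]   # load the first array into memory

# 1D case: [batch, time, x, variables]
b, t, x, v = data.shape
print(b, t, x, v)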
https://zenodo.org/records/7105232
@inproceedings{prabhushankarolives2022,
title={OLIVES Dataset: Ophthalmic Labels for Investigating Visual Eye Semantics},
author={Prabhushankar, Mohit and Kokilepersaud, Kiran and Logan, Yash-yee and Trejo Corona, Stephanie and AlRegib, Ghassan and Wykoff, Charles},
booktitle={Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 2 (NeurIPS Datasets and Benchmarks 2022) },
year={2022}
}
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
[NeurIPS 2025 DB] PartNeXt: A Next-Generation Dataset for Fine-Grained and Hierarchical 3D Part Understanding
Official dataset release for PartNeXt: A Next-Generation Dataset for Fine-Grained and Hierarchical 3D Part Understanding.
Penghao Wang, Yiyang He, Xin Lv, Yukai Zhou, Lan Xu, Jingyi Yu, Jiayuan Gu† ShanghaiTech University NeurIPS 2025 Datasets and Benchmarks Track | Project Page | Paper | Dataset | Dataset Toolkit | Benchmark code (Soon) | Annotation code (Soon) |… See the full description on the dataset page: https://huggingface.co/datasets/AuWang/PartNeXt_raw.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The dataset contains UAV footage of wild antelopes (blackbucks) in grassland habitats. It can be used mainly for two tasks: multi-object tracking (MOT) and re-identification (Re-ID). We provide annotations for the position of animals in each frame, allowing us to offer very long videos (up to 3 min) that are completely annotated while maintaining the identity of each animal in the video. The Re-ID dataset offers two videos that capture the movement of some animals simultaneously from two different UAVs; the Re-ID task is to find the same individual in two videos taken simultaneously from slightly different perspectives. The relevant paper will be published in the NeurIPS 2024 Datasets and Benchmarks Track. https://nips.cc/virtual/2024/poster/97563

Resolution: 5.4K
MOT: 12 videos (MOT17 format)
Re-ID: 6 sets, each with a pair of drones (custom format)
Detection: 320 images (COCO, YOLO)
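A hedged sketch of parsing a MOT17-format annotation file for the MOT videos; the file path is a placeholder, and MOT17 rows are comma-separated (frame, track id, bb_left, bb_top, bb_width, bb_height, ...).

import numpy as np

# placeholder path to one video's MOT17-style ground truth
rows = np.loadtxt("video_01/gt/gt.txt", delimiter=",", ndmin=2)
frames = rows[:, 0].astype(int)
track_ids = rows[:, 1].astype(int)
boxes = rows[:, 2:6]  # bb_left, bb_top, bb_width, bb_height
print(f"{len(np.unique(track_ids))} animals over {frames.max()} frames")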
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Fake audio detection is a growing concern, and some relevant datasets have been designed for research, but there is no standard public Chinese dataset under additive noise conditions. In this paper, we aim to fill that gap and design a Chinese fake audio detection dataset (FAD) for studying more generalized detection methods. Twelve mainstream speech generation techniques are used to generate fake audios. To simulate real-life scenarios, three noise datasets are selected for noise adding at five different signal-to-noise ratios. The FAD dataset can be used not only for fake audio detection, but also for detecting the algorithms behind fake utterances for audio forensics. Baseline results are presented with analysis. The results show that fake audio detection methods with generalization remain challenging.

The FAD dataset is publicly available. The source code of the baselines is available on GitHub: https://github.com/ADDchallenge/FAD
The FAD dataset is designed to evaluate methods for fake audio detection, fake algorithm recognition, and other relevant studies. To better study the robustness of the methods under noisy conditions when applied in real life, we construct a corresponding noisy dataset. The total FAD dataset consists of two versions: a clean version and a noisy version. Both versions are divided into disjoint training, development, and test sets in the same way. There is no speaker overlap across these three subsets. Each test set is further divided into seen and unseen test sets. Unseen test sets evaluate the generalization of the methods to unknown types. It is worth mentioning that both real audios and fake audios in the unseen test set are unknown to the model.

For the noisy speech part, we select three noise databases for simulation. Additive noises are added to each audio in the clean dataset at 5 different SNRs. The additive noises of the unseen test set and the remaining subsets come from different noise databases. In each version of the FAD dataset, there are 138,400 utterances in the training set, 14,400 utterances in the development set, 42,000 utterances in the seen test set, and 21,000 utterances in the unseen test set. More detailed statistics are demonstrated in Table 2.
Clean Real Audios Collection
To eliminate the interference of irrelevant factors, we collect clean real audios from two sources: five open resources from the OpenSLR platform (http://www.openslr.org/12/) and one self-recorded dataset.
Clean Fake Audios Generation
We select 11 representative speech synthesis methods to generate the fake audios, plus one method that generates partially fake audios.
Noisy Audios Simulation
Noisy audios aim to quantify the robustness of the methods under noisy conditions. To simulate real-life scenarios, we artificially sample the noise signals and add them to clean audios at 5 different SNRs: 0 dB, 5 dB, 10 dB, 15 dB, and 20 dB. Additive noises are selected from three noise databases: PNL 100 Nonspeech Sounds, NOISEX-92, and TAU Urban Acoustic Scenes.
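A minimal sketch of the mixing step, assuming the standard power-based definition of SNR; this mirrors the described procedure but is not the authors' exact code.

import numpy as np

def add_noise_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix noise into clean speech at a target SNR in dB."""
    noise = noise[: len(clean)]  # assumes the noise clip is at least as long
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # scale noise so 10*log10(p_clean / p_scaled_noise) equals snr_db
    scale = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise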
This data set is licensed with a CC BY-NC-ND 4.0 license.
You can cite the data using the following BibTeX entry:
@inproceedings{ma2022fad,
title={FAD: A Chinese Dataset for Fake Audio Detection},
author={Haoxin Ma and Jiangyan Yi and Chenglong Wang and Xinrui Yan and Jianhua Tao and Tao Wang and Shiming Wang and Le Xu and Ruibo Fu},
booktitle={Submitted to the 36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks},
year={2022},
}