10 datasets found

NeurIPS 2021 dataset
figshare.com
hdf
Updated Jul 28, 2024
Cite
Luke Zappia (2024). NeurIPS 2021 dataset [Dataset]. http://doi.org/10.6084/m9.figshare.25958374.v1
Available download formats: hdf
Unique identifier
https://doi.org/10.6084/m9.figshare.25958374.v1
Dataset updated
Jul 28, 2024
Dataset provided by
figshare
Authors
Luke Zappia
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
NeurIPS 2021 dataset used for benchmarking feature selection for integration in H5AD format. Files contain the full raw dataset, the processed batches used to create the reference and the processed batches used as a query.

Note: These files have been saved with compression to reduce file size. Re-saving without compression will reduce reading times if needed.

If used, please cite:

Lance C, Luecken MD, Burkhardt DB, Cannoodt R, Rautenstrauch P, Laddach A, et al. Multimodal single cell data integration challenge: Results and lessons learned. In: Kiela D, Ciccone M, Caputo B, editors. Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track. PMLR; 06--14 Dec 2022. p. 162–76. Available from: https://proceedings.mlr.press/v176/lance22a.html

AND

Luecken MD, Burkhardt DB, Cannoodt R, Lance C, Agrawal A, Aliee H, et al. A sandbox for prediction and integration of DNA, RNA, and proteins in single cells. Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2). 2022 [cited 2022 Nov 8]. Available from: https://openreview.net/pdf?id=gN35BGa1Rt
NeurIPS 2021 Benchmark dataset
figshare.com
hdf
Updated May 30, 2023
Cite
scverse; Malte Luecken (2023). NeurIPS 2021 Benchmark dataset [Dataset]. http://doi.org/10.6084/m9.figshare.22716739.v1
Available download formats: hdf
Unique identifier
https://doi.org/10.6084/m9.figshare.22716739.v1
Dataset updated
May 30, 2023
Dataset provided by
figshare
Authors
scverse; Malte Luecken
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Subset of the benchmark dataset published in Luecken et al. (2021).
CLUES
huggingface.co
Updated Dec 12, 2023
Cite
Microsoft (2023). CLUES [Dataset]. https://huggingface.co/datasets/microsoft/CLUES
Available format: Croissant, a format for machine-learning datasets (mlcommons.org/croissant)
Dataset updated
Dec 12, 2023
Dataset authored and provided by
Microsoft http://microsoft.com/
License
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Description
CLUES: Few-Shot Learning Evaluation in Natural Language Understanding

This repo contains the data for the NeurIPS 2021 benchmark Constrained Language Understanding Evaluation Standard (CLUES).

Leaderboard

We maintain a Leaderboard allowing researchers to submit their results as entries.

Submission Instructions

Each submission must be submitted as a pull request modifying the markdown file underlying the leaderboard. The submission must attach an accompanying… See the full description on the dataset page: https://huggingface.co/datasets/microsoft/CLUES.
Human bone marrow mononuclear cells
figshare.com
hdf
Updated Jan 29, 2025
Cite
Marius Lange (2025). Human bone marrow mononuclear cells [Dataset]. http://doi.org/10.6084/m9.figshare.28302875.v1
Available download formats: hdf
Unique identifier
https://doi.org/10.6084/m9.figshare.28302875.v1
Dataset updated
Jan 29, 2025
Dataset provided by
figshare
Authors
Marius Lange
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The gene expression portion of the NeurIPS 2021 challenge 10x multiome dataset (Luecken et al., NeurIPS datasets and benchmarks track 2021), originally obtained from GEO. Contains single-cell gene expression of 69,249 cells for 13,431 genes. The adata.X field contains normalized data and adata.layers['counts'] contains raw expression values. We computed a latent space using scANVI (Xu et al., MSB 2021), following their tutorial.
Data from: LoveDA: A Remote Sensing Land-Cover Dataset for Domain Adaptive...
zenodo.org
data.niaid.nih.gov
pdf, zip
Updated Jul 17, 2024
+ more versions
Cite
Wang Junjue; Zheng Zhuo; Ma Ailong; Lu Xiaoyan; Zhong Yanfei (2024). LoveDA: A Remote Sensing Land-Cover Dataset for Domain Adaptive Semantic Segmentation [Dataset]. http://doi.org/10.5281/zenodo.5706578
Available download formats: zip, pdf
Unique identifier
https://doi.org/10.5281/zenodo.5706578
Dataset updated
Jul 17, 2024
Dataset provided by
Zenodo http://zenodo.org/
Authors
Wang Junjue; Zheng Zhuo; Ma Ailong; Lu Xiaoyan; Zhong Yanfei
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The benchmark code is available at: https://github.com/Junjue-Wang/LoveDA

Highlights:

5987 high spatial resolution (0.3 m) remote sensing images from Nanjing, Changzhou, and Wuhan

Focus on different geographical environments between Urban and Rural

Advance both semantic segmentation and domain adaptation tasks

Three considerable challenges: multi-scale objects, complex background samples, and inconsistent class distributions

Reference:

@inproceedings{wang2021loveda,
  title = {Love{DA}: A Remote Sensing Land-Cover Dataset for Domain Adaptive Semantic Segmentation},
  author = {Junjue Wang and Zhuo Zheng and Ailong Ma and Xiaoyan Lu and Yanfei Zhong},
  booktitle = {Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks},
  editor = {J. Vanschoren and S. Yeung},
  year = {2021},
  volume = {1},
  pages = {},
  url = {https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/file/4e732ced3463d06de0ca9a15b6153677-Paper-round2.pdf}
}

License:

The owners of the data and of the copyright on the data are RSIDEA, Wuhan University. Use of the Google Earth images must respect the "Google Earth" terms of use. All images and their associated annotations in LoveDA can be used for academic purposes only; any commercial use is prohibited. (CC BY-NC-SA 4.0)

Data from: Datasets for a data-centric image classification benchmark for...

zenodo.org
openagrar.de

txt, zip

Updated Jul 5, 2023

Cite

Lars Schmarje; Vasco Grossmann; Claudius Zelenka; Sabine Dippel; Rainer Kiko; Mariusz Oszust; Matti Pastell; Jenny Stracke; Anna Valros; Nina Volkmann; Reinhard Koch (2023). Datasets for a data-centric image classification benchmark for noisy and ambiguous label estimation [Dataset]. http://doi.org/10.5281/zenodo.7180818

Available download formats: zip, txt

Unique identifier

https://doi.org/10.5281/zenodo.7180818

Dataset updated

Jul 5, 2023

Dataset provided by

Zenodo http://zenodo.org/

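Authors

Lars Schmarje; Vasco Grossmann; Claudius Zelenka; Sabine Dippel; Rainer Kiko; Mariusz Oszust; Matti Pastell; Jenny Stracke; Anna Valros; Nina Volkmann; Reinhard Koch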

Description

This is the official data repository of the Data-Centric Image Classification (DCIC) Benchmark. The goal of this benchmark is to measure the impact of tuning the dataset instead of the model for a variety of image classification datasets. Full details about the collection process, the structure, and automatic download are available at:

Paper: https://arxiv.org/abs/2207.06214

Source Code: https://github.com/Emprime/dcic

The license information is provided below as a download.

Citation

Please cite as

@article{schmarje2022benchmark,
  author = {Schmarje, Lars and Grossmann, Vasco and Zelenka, Claudius and Dippel, Sabine and Kiko, Rainer and Oszust, Mariusz and Pastell, Matti and Stracke, Jenny and Valros, Anna and Volkmann, Nina and Koch, Reinhard},
  journal = {36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks},
  title = {{Is one annotation enough? A data-centric image classification benchmark for noisy and ambiguous label estimation}},
  year = {2022}
}

Please see the full details about the used datasets below, which should also be cited as part of the license.

@article{schoening2020Megafauna,
author = {Schoening, T and Purser, A and Langenk{\"{a}}mper, D and Suck, I and Taylor, J and Cuvelier, D and Lins, L and Simon-Lled{\'{o}}, E and Marcon, Y and Jones, D O B and Nattkemper, T and K{\"{o}}ser, K and Zurowietz, M and Greinert, J and Gomes-Pereira, J},
doi = {10.5194/bg-17-3115-2020},
journal = {Biogeosciences},
number = {12},
pages = {3115--3133},
title = {{Megafauna community assessment of polymetallic-nodule fields with cameras: platform and methodology comparison}},
volume = {17},
year = {2020}
}

@article{Langenkamper2020GearStudy,
author = {Langenk{\"{a}}mper, Daniel and van Kevelaer, Robin and Purser, Autun and Nattkemper, Tim W},
doi = {10.3389/fmars.2020.00506},
issn = {2296-7745},
journal = {Frontiers in Marine Science},
title = {{Gear-Induced Concept Drift in Marine Images and Its Effect on Deep Learning Classification}},
volume = {7},
year = {2020}
}


@article{peterson2019cifar10h,
author = {Peterson, Joshua and Battleday, Ruairidh and Griffiths, Thomas and Russakovsky, Olga},
doi = {10.1109/ICCV.2019.00971},
issn = {15505499},
journal = {Proceedings of the IEEE International Conference on Computer Vision},
pages = {9616--9625},
title = {{Human uncertainty makes classification more robust}},
volume = {2019-Octob},
year = {2019}
}

@article{schmarje2019,
author = {Schmarje, Lars and Zelenka, Claudius and Geisen, Ulf and Gl{\"{u}}er, Claus-C. and Koch, Reinhard},
doi = {10.1007/978-3-030-33676-9_26},
issn = {23318422},
journal = {DAGM German Conference of Pattern Recognition},
number = {November},
pages = {374--386},
publisher = {Springer},
title = {{2D and 3D Segmentation of uncertain local collagen fiber orientations in SHG microscopy}},
volume = {11824 LNCS},
year = {2019}
}

@article{schmarje2021foc,
author = {Schmarje, Lars and Br{\"{u}}nger, Johannes and Santarossa, Monty and Schr{\"{o}}der, Simon-Martin and Kiko, Rainer and Koch, Reinhard},
doi = {10.3390/s21196661},
issn = {1424-8220},
journal = {Sensors},
number = {19},
pages = {6661},
title = {{Fuzzy Overclustering: Semi-Supervised Classification of Fuzzy Labels with Overclustering and Inverse Cross-Entropy}},
volume = {21},
year = {2021}
}

@article{schmarje2022dc3,
author = {Schmarje, Lars and Santarossa, Monty and Schr{\"{o}}der, Simon-Martin and Zelenka, Claudius and Kiko, Rainer and Stracke, Jenny and Volkmann, Nina and Koch, Reinhard},
journal = {Proceedings of the European Conference on Computer Vision (ECCV)},
title = {{A data-centric approach for improving ambiguous labels with combined semi-supervised classification and clustering}},
year = {2022}
}


@article{obuchowicz2020qualityMRI,
author = {Obuchowicz, Rafal and Oszust, Mariusz and Piorkowski, Adam},
doi = {10.1186/s12880-020-00505-z},
issn = {1471-2342},
journal = {BMC Medical Imaging},
number = {1},
pages = {109},
title = {{Interobserver variability in quality assessment of magnetic resonance images}},
volume = {20},
year = {2020}
}


@article{stepien2021cnnQuality,
author = {St{\c{e}}pie{\'{n}}, Igor and Obuchowicz, Rafa{\l} and Pi{\'{o}}rkowski, Adam and Oszust, Mariusz},
doi = {10.3390/s21041043},
issn = {1424-8220},
journal = {Sensors},
number = {4},
title = {{Fusion of Deep Convolutional Neural Networks for No-Reference Magnetic Resonance Image Quality Assessment}},
volume = {21},
year = {2021}
}

@article{volkmann2021turkeys,
author = {Volkmann, Nina and Br{\"{u}}nger, Johannes and Stracke, Jenny and Zelenka, Claudius and Koch, Reinhard and Kemper, Nicole and Spindler, Birgit},
doi = {10.3390/ani11092655},
journal = {Animals 2021},
pages = {1--13},
title = {{Learn to train: Improving training data for a neural network to detect pecking injuries in turkeys}},
volume = {11},
year = {2021}
}

@article{volkmann2022keypoint,
author = {Volkmann, Nina and Zelenka, Claudius and Devaraju, Archana Malavalli and Br{\"{u}}nger, Johannes and Stracke, Jenny and Spindler, Birgit and Kemper, Nicole and Koch, Reinhard},
doi = {10.3390/s22145188},
issn = {1424-8220},
journal = {Sensors},
number = {14},
pages = {5188},
title = {{Keypoint Detection for Injury Identification during Turkey Husbandry Using Neural Networks}},
volume = {22},
year = {2022}
}

WRENCH
huggingface.co
Updated Apr 14, 2023
Cite
Jieyu Zhang (2023). WRENCH [Dataset]. https://huggingface.co/datasets/jieyuz2/WRENCH
Available format: Croissant, a format for machine-learning datasets (mlcommons.org/croissant)
Dataset updated
Apr 14, 2023
Authors
Jieyu Zhang
License
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset Card for WRENCH

Wrench is a benchmark platform containing diverse weak supervision tasks. It also provides a common and easy framework for development and evaluation of your own weak supervision models within the benchmark. For more information, check out the GitHub repo and our publications:

WRENCH: A Comprehensive Benchmark for Weak Supervision (NeurIPS 2021)
A Survey on Programmatic Weak Supervision

If you find this repository helpful, feel free to cite our publication:… See the full description on the dataset page: https://huggingface.co/datasets/jieyuz2/WRENCH.
plantnet300K
huggingface.co
Updated Feb 29, 2024
Cite
Mike Hemberger (2024). plantnet300K [Dataset]. https://huggingface.co/datasets/mikehemberger/plantnet300K
Available format: Croissant, a format for machine-learning datasets (mlcommons.org/croissant)
Dataset updated
Feb 29, 2024
Authors
Mike Hemberger
Description
Dataset Card for "plantnet300K"

More information needed.

Original work is here: https://github.com/plantnet/PlantNet-300K

@inproceedings{plantnet-300k,
  author = {C. Garcin and A. Joly and P. Bonnet and A. Affouard and \JC Lombardo and M. Chouet and M. Servajean and T. Lorieul and J. Salmon},
  booktitle = {NeurIPS Datasets and Benchmarks 2021},
  title = {{Pl@ntNet-300K}: a plant image dataset with high label ambiguity and a long-tailed distribution},
  year = {2021},
}
RELLISUR: A Real Low-Light Image Super-Resolution Dataset
zenodo.org
zip
Updated Jun 17, 2025
Cite
Andreas Aakerberg; Kamal Nasrollahi; Thomas B. Moeslund (2025). RELLISUR: A Real Low-Light Image Super-Resolution Dataset [Dataset]. http://doi.org/10.5281/zenodo.5234969
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.5234969
Dataset updated
Jun 17, 2025
Dataset provided by
Zenodo
Authors
Andreas Aakerberg; Kamal Nasrollahi; Thomas B. Moeslund
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The RELLISUR dataset contains real low-light low-resolution images paired with normal-light high-resolution reference image counterparts. This dataset aims to fill the gap between low-light image enhancement and low-resolution image enhancement (Super-Resolution (SR)) which is currently only being addressed separately in the literature, even though the visibility of real-world images is often limited by both low-light and low-resolution. The dataset contains 12750 paired images of different resolutions and degrees of low-light illumination, to facilitate learning of deep-learning based models that can perform a direct mapping from degraded images with low visibility to high-quality detail rich images of high resolution. The associated paper can be found here: https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/file/7ef605fc8dba5425d6965fbd4c8fbe1f-Paper-round2.pdf
WildfireDB: An Open-Source Dataset Connecting Wildfire Spread with Relevant...
zenodo.org
zip
Updated Apr 1, 2022
Cite
Samriddhi Singla; Ayan Mukhopadhyay; Michael Wilbur; Tina Diao; Vinayak Gajjewar; Ahmed Eldawy; Mykel Kochenderfer; Ross Shachter; Abhishek Dubey (2022). WildfireDB: An Open-Source Dataset Connecting Wildfire Spread with Relevant Determinants [Dataset]. http://doi.org/10.5281/zenodo.5636429
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.5636429
Dataset updated
Apr 1, 2022
Dataset provided by
Zenodo http://zenodo.org/
Authors
Samriddhi Singla; Ayan Mukhopadhyay; Michael Wilbur; Tina Diao; Vinayak Gajjewar; Ahmed Eldawy; Mykel Kochenderfer; Ross Shachter; Abhishek Dubey
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Modeling fire spread is critical in fire risk management. Creating data-driven models to forecast spread remains challenging due to the lack of comprehensive data sources that relate fires with relevant covariates. We present the first comprehensive and open-source dataset that relates historical fire data with relevant covariates such as weather, vegetation, and topography. Our dataset, named WildfireDB, contains over 17 million data points that capture how fires spread in the continental USA in the last decade. The paper accompanying this dataset is part of the 2021 Neural Information Processing Systems (NeurIPS) Dataset and Benchmark Track. The paper describes the algorithmic approach used to create and integrate the data, describes the dataset, and presents benchmark results regarding data-driven models that can be learned to forecast the spread of wildfires.

Please see https://colab.research.google.com/drive/1cm2Z4E0HzXMAcuUrE26wHXL2FS_pIj3t?usp=sharing for an introduction about how to load the database using python (pandas).