62 datasets found
  1. NeurIPS 2021 dataset

    • figshare.com
    hdf
    Updated Jul 28, 2024
    Cite
    Luke Zappia (2024). NeurIPS 2021 dataset [Dataset]. http://doi.org/10.6084/m9.figshare.25958374.v1
    Explore at:
    hdf (available download formats)
    Dataset updated
    Jul 28, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Luke Zappia
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    NeurIPS 2021 dataset used for benchmarking feature selection for integration, in H5AD format. Files contain the full raw dataset, the processed batches used to create the reference, and the processed batches used as a query.

    Note: these files have been saved with compression to reduce file size. Re-saving without compression will reduce reading times if needed.

    If used, please cite:

    Lance C, Luecken MD, Burkhardt DB, Cannoodt R, Rautenstrauch P, Laddach A, et al. Multimodal single cell data integration challenge: Results and lessons learned. In: Kiela D, Ciccone M, Caputo B, editors. Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track. PMLR; 06–14 Dec 2022. p. 162–76. Available from: https://proceedings.mlr.press/v176/lance22a.html

    AND

    Luecken MD, Burkhardt DB, Cannoodt R, Lance C, Agrawal A, Aliee H, et al. A sandbox for prediction and integration of DNA, RNA, and proteins in single cells. Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2). 2022 [cited 2022 Nov 8]. Available from: https://openreview.net/pdf?id=gN35BGa1Rt
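
The compression note above matters in practice: with `anndata`, re-saving would presumably look like `adata = anndata.read_h5ad(path)` followed by `adata.write_h5ad(path, compression=None)` (an assumed workflow; check the anndata documentation). The underlying trade-off, smaller files versus decompression work on every read, can be sketched with the standard library alone:

```python
import gzip
import os
import tempfile

# Toy stand-in for a large on-disk matrix; compressed H5AD files
# behave analogously (smaller on disk, slower to read back).
payload = b"0.0,1.0,2.0,3.0\n" * 100_000

tmp = tempfile.mkdtemp()
raw_path = os.path.join(tmp, "matrix.csv")
gz_path = os.path.join(tmp, "matrix.csv.gz")

with open(raw_path, "wb") as f:        # uncompressed copy
    f.write(payload)
with gzip.open(gz_path, "wb") as f:    # compressed copy
    f.write(payload)

raw_size = os.path.getsize(raw_path)
gz_size = os.path.getsize(gz_path)
print(f"raw: {raw_size} B, gzip: {gz_size} B")

# Reading the compressed copy pays a decompression cost on every access.
with gzip.open(gz_path, "rb") as f:
    restored = f.read()
assert restored == payload
```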

  2. neurips-2025-papers

    • huggingface.co
    Updated Nov 19, 2025
    Cite
    Huy Dang (2025). neurips-2025-papers [Dataset]. https://huggingface.co/datasets/huyxdang/neurips-2025-papers
    Explore at:
    Dataset updated
    Nov 19, 2025
    Authors
    Huy Dang
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    NeurIPS 2025 Papers Dataset

    This dataset contains all accepted papers from NeurIPS 2025, scraped from OpenReview.

      Dataset Statistics

      Overview
    

    Total Papers: 5,772; Unique Paper IDs: 5,772 (no duplicate IDs)

      Track Distribution
    

    Main Track: 5,275 papers (91.4%); Datasets and Benchmarks Track: 497 papers (8.6%)

      Award Distribution
    

    Poster: 4,949 papers (85.7%); Oral: 84 papers (1.5%); Spotlight: 739 papers (12.8%)
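
The percentages above follow directly from the raw counts; a quick arithmetic check (all numbers copied from the card above):

```python
total = 5772

# Counts as reported on the dataset card.
tracks = {"Main": 5275, "Datasets and Benchmarks": 497}
awards = {"Poster": 4949, "Spotlight": 739, "Oral": 84}

# Tracks and awards should each partition the full set of papers.
assert sum(tracks.values()) == total
assert sum(awards.values()) == total

# Shares rounded to one decimal place, as on the card.
share = {k: round(100 * v / total, 1) for k, v in {**tracks, **awards}.items()}
print(share)
```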

      Track × Award… See the full description on the dataset page: https://huggingface.co/datasets/huyxdang/neurips-2025-papers.
    
  3. NeurIPS 2021 Benchmark dataset

    • figshare.com
    hdf
    Updated May 30, 2023
    Cite
    scverse; Malte Luecken (2023). NeurIPS 2021 Benchmark dataset [Dataset]. http://doi.org/10.6084/m9.figshare.22716739.v1
    Explore at:
    hdf (available download formats)
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    scverse; Malte Luecken
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Subset of the benchmark dataset published in Luecken et al. (2021).

  4. Data from: Re-assembling the past: The RePAIR dataset and benchmark for real...

    • zenodo.org
    • data.niaid.nih.gov
    • +1more
    txt, zip
    Updated Nov 4, 2024
    Cite
    Theodore Tsesmelis; Luca Palmieri; Marina Khoroshiltseva; Adeela Islam; Gur Elkin; Ofir Shahar Itzhak; Gianluca Scarpellini; Stefano Fiorini; Yaniv Ohayon; Nadav Alali; Sinem Aslan; Pietro Morerio; Sebastiano Vascon; Elena Gravina; Maria Christina Napolitano; Giuseppe Scarpati; Gabriel Zuchtriegel; Alexandra Spühler; Michel E. Fuchs; Stuart James; Ohad Ben-Shahar; Marcello Pelillo; Alessio Del Bue (2024). Re-assembling the past: The RePAIR dataset and benchmark for real world 2D and 3D puzzle solving [Dataset]. http://doi.org/10.5281/zenodo.13993089
    Explore at:
    zip, txt (available download formats)
    Dataset updated
    Nov 4, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Theodore Tsesmelis; Luca Palmieri; Marina Khoroshiltseva; Adeela Islam; Gur Elkin; Ofir Shahar Itzhak; Gianluca Scarpellini; Stefano Fiorini; Yaniv Ohayon; Nadav Alali; Sinem Aslan; Pietro Morerio; Sebastiano Vascon; Elena Gravina; Maria Christina Napolitano; Giuseppe Scarpati; Gabriel Zuchtriegel; Alexandra Spühler; Michel E. Fuchs; Stuart James; Ohad Ben-Shahar; Marcello Pelillo; Alessio Del Bue
    Description

    Accepted by NeurIPS 2024 Datasets and Benchmarks Track

    We introduce the RePAIR puzzle-solving dataset, a large-scale real-world dataset of fractured frescoes from the archaeological site of Pompeii. Our dataset consists of over 1,000 fractured frescoes. RePAIR stands as a realistic computational challenge for 2D and 3D puzzle-solving methods, and serves as a benchmark that enables the study of fractured object reassembly and presents new challenges for geometric shape understanding. Please visit our website for more dataset information, access to source code scripts, and an interactive gallery of dataset samples.

    Access the entire dataset

    We provide a compressed version of our dataset in two separate files: one for the 2D version and one for the 3D version.

    Our full dataset contains over one thousand individual fractured fragments, divided into groups, each with its corresponding folder, and compressed into separate archives for the 2D and 3D subsets. In the 2D dataset, each fragment is saved as a .PNG image, and each group has the corresponding ground-truth transformation to solve the puzzle as a .TXT file. In the 3D dataset, each fragment is saved as a mesh in the widely used .OBJ format with the corresponding material (.MTL) and texture (.PNG) files. The meshes are already in the assembled position and orientation, so no additional information is needed. All additional metadata is given as .JSON files.
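
For the 2D subset, applying a ground-truth transformation to fragment coordinates is a rigid 2D motion. A minimal sketch, assuming each .TXT line holds a fragment id plus a translation and a rotation angle (the actual field names, order, and units in the RePAIR files may differ):

```python
import math

def apply_rigid_2d(point, tx, ty, theta_deg):
    """Rotate a 2D point about the origin, then translate it."""
    x, y = point
    t = math.radians(theta_deg)
    xr = x * math.cos(t) - y * math.sin(t)
    yr = x * math.sin(t) + y * math.cos(t)
    return (xr + tx, yr + ty)

# Hypothetical ground-truth line: "<fragment_id> <tx> <ty> <theta_degrees>"
line = "frag_0001 10.0 -5.0 90.0"
parts = line.split()
frag_id = parts[0]
tx, ty, theta = map(float, parts[1:])

# A pixel at (1, 0) in the fragment image, rotated 90° and translated.
moved = apply_rigid_2d((1.0, 0.0), tx, ty, theta)
print(frag_id, moved)
```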

    Important Note

    Please be advised that downloading and reusing this dataset is permitted only upon acceptance of the following license terms.

    The Istituto Italiano di Tecnologia (IIT) declares, and the user ("User") acknowledges, that the "RePAIR puzzle-solving dataset" contains 3D scans, texture maps, rendered images and metadata of fresco fragments acquired at the Archaeological Site of Pompeii. IIT is authorised to publish the RePAIR puzzle-solving dataset herein only for scientific and cultural purposes and in connection with an academic publication referenced as Tsesmelis et al., "Re-assembling the past: The RePAIR dataset and benchmark for real world 2D and 3D puzzle solving", NeurIPS 2024. Use of the RePAIR puzzle-solving dataset by User is limited to downloading and viewing such images, and comparing these with data or content in other datasets. User is not authorised to reproduce, copy, or distribute the RePAIR puzzle-solving dataset; any commercial use, or use in conjunction with the promotion of a commercial enterprise and/or its product(s) or service(s), is explicitly excluded. User will not use the RePAIR puzzle-solving dataset in any way prohibited by applicable laws. The RePAIR puzzle-solving dataset is provided to User without warranty of any kind, either expressed or implied. User will be solely responsible for their use of the RePAIR puzzle-solving dataset. In no event shall IIT be liable for any damages arising from such use.

  5. ConViS-Bench

    • huggingface.co
    Updated Sep 20, 2025
    Cite
    submission1335 (2025). ConViS-Bench [Dataset]. https://huggingface.co/datasets/submission1335/ConViS-Bench
    Explore at:
    Dataset updated
    Sep 20, 2025
    Authors
    submission1335
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This dataset is associated with submission 1335 at the NeurIPS 2025 Datasets and Benchmarks track. The benchmark is intended to be used with the proposed submission environments (see the source code). See the provided README for information about downloading the dataset and running the evaluations.

  6. Data from: Datasets for a data-centric image classification benchmark for...

    • zenodo.org
    • openagrar.de
    txt, zip
    Updated Jul 5, 2023
    Cite
    Lars Schmarje; Vasco Grossmann; Claudius Zelenka; Sabine Dippel; Rainer Kiko; Mariusz Oszust; Matti Pastell; Jenny Stracke; Anna Valros; Nina Volkmann; Reinhard Koch (2023). Datasets for a data-centric image classification benchmark for noisy and ambiguous label estimation [Dataset]. http://doi.org/10.5281/zenodo.7180818
    Explore at:
    zip, txt (available download formats)
    Dataset updated
    Jul 5, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Lars Schmarje; Vasco Grossmann; Claudius Zelenka; Sabine Dippel; Rainer Kiko; Mariusz Oszust; Matti Pastell; Jenny Stracke; Anna Valros; Nina Volkmann; Reinhard Koch
    Description

    This is the official data repository of the Data-Centric Image Classification (DCIC) Benchmark. The goal of this benchmark is to measure the impact of tuning the dataset instead of the model for a variety of image classification datasets. Full details about the collection process, the structure, and automatic download are available at:

    Paper: https://arxiv.org/abs/2207.06214

    Source Code: https://github.com/Emprime/dcic

    The license information is given below as download.

    Citation

    Please cite as

    @article{schmarje2022benchmark,
      author = {Schmarje, Lars and Grossmann, Vasco and Zelenka, Claudius and Dippel, Sabine and Kiko, Rainer and Oszust, Mariusz and Pastell, Matti and Stracke, Jenny and Valros, Anna and Volkmann, Nina and Koch, Reinhard},
      journal = {36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks},
      title = {{Is one annotation enough? A data-centric image classification benchmark for noisy and ambiguous label estimation}},
      year = {2022}
    }

    Please see the full details about the used datasets below, which should also be cited as part of the license.

    @article{schoening2020Megafauna,
    author = {Schoening, T and Purser, A and Langenk{\"{a}}mper, D and Suck, I and Taylor, J and Cuvelier, D and Lins, L and Simon-Lled{\'{o}}, E and Marcon, Y and Jones, D O B and Nattkemper, T and K{\"{o}}ser, K and Zurowietz, M and Greinert, J and Gomes-Pereira, J},
    doi = {10.5194/bg-17-3115-2020},
    journal = {Biogeosciences},
    number = {12},
    pages = {3115--3133},
    title = {{Megafauna community assessment of polymetallic-nodule fields with cameras: platform and methodology comparison}},
    volume = {17},
    year = {2020}
    }
    
    @article{Langenkamper2020GearStudy,
    author = {Langenk{\"{a}}mper, Daniel and van Kevelaer, Robin and Purser, Autun and Nattkemper, Tim W},
    doi = {10.3389/fmars.2020.00506},
    issn = {2296-7745},
    journal = {Frontiers in Marine Science},
    title = {{Gear-Induced Concept Drift in Marine Images and Its Effect on Deep Learning Classification}},
    volume = {7},
    year = {2020}
    }
    
    
    @article{peterson2019cifar10h,
    author = {Peterson, Joshua and Battleday, Ruairidh and Griffiths, Thomas and Russakovsky, Olga},
    doi = {10.1109/ICCV.2019.00971},
    issn = {15505499},
    journal = {Proceedings of the IEEE International Conference on Computer Vision},
    pages = {9616--9625},
    title = {{Human uncertainty makes classification more robust}},
    volume = {2019-Octob},
    year = {2019}
    }
    
    @article{schmarje2019,
    author = {Schmarje, Lars and Zelenka, Claudius and Geisen, Ulf and Gl{\"{u}}er, Claus-C. and Koch, Reinhard},
    doi = {10.1007/978-3-030-33676-9_26},
    issn = {23318422},
    journal = {DAGM German Conference on Pattern Recognition},
    number = {November},
    pages = {374--386},
    publisher = {Springer},
    title = {{2D and 3D Segmentation of uncertain local collagen fiber orientations in SHG microscopy}},
    volume = {11824 LNCS},
    year = {2019}
    }
    
    @article{schmarje2021foc,
    author = {Schmarje, Lars and Br{\"{u}}nger, Johannes and Santarossa, Monty and Schr{\"{o}}der, Simon-Martin and Kiko, Rainer and Koch, Reinhard},
    doi = {10.3390/s21196661},
    issn = {1424-8220},
    journal = {Sensors},
    number = {19},
    pages = {6661},
    title = {{Fuzzy Overclustering: Semi-Supervised Classification of Fuzzy Labels with Overclustering and Inverse Cross-Entropy}},
    volume = {21},
    year = {2021}
    }
    
    @article{schmarje2022dc3,
    author = {Schmarje, Lars and Santarossa, Monty and Schr{\"{o}}der, Simon-Martin and Zelenka, Claudius and Kiko, Rainer and Stracke, Jenny and Volkmann, Nina and Koch, Reinhard},
    journal = {Proceedings of the European Conference on Computer Vision (ECCV)},
    title = {{A data-centric approach for improving ambiguous labels with combined semi-supervised classification and clustering}},
    year = {2022}
    }
    
    
    @article{obuchowicz2020qualityMRI,
    author = {Obuchowicz, Rafal and Oszust, Mariusz and Piorkowski, Adam},
    doi = {10.1186/s12880-020-00505-z},
    issn = {1471-2342},
    journal = {BMC Medical Imaging},
    number = {1},
    pages = {109},
    title = {{Interobserver variability in quality assessment of magnetic resonance images}},
    volume = {20},
    year = {2020}
    }
    
    
    @article{stepien2021cnnQuality,
    author = {St{\c{e}}pie{\'{n}}, Igor and Obuchowicz, Rafa{\l} and Pi{\'{o}}rkowski, Adam and Oszust, Mariusz},
    doi = {10.3390/s21041043},
    issn = {1424-8220},
    journal = {Sensors},
    number = {4},
    title = {{Fusion of Deep Convolutional Neural Networks for No-Reference Magnetic Resonance Image Quality Assessment}},
    volume = {21},
    year = {2021}
    }
    
    @article{volkmann2021turkeys,
    author = {Volkmann, Nina and Br{\"{u}}nger, Johannes and Stracke, Jenny and Zelenka, Claudius and Koch, Reinhard and Kemper, Nicole and Spindler, Birgit},
    doi = {10.3390/ani11092655},
    journal = {Animals 2021},
    pages = {1--13},
    title = {{Learn to train: Improving training data for a neural network to detect pecking injuries in turkeys}},
    volume = {11},
    year = {2021}
    }
    
    @article{volkmann2022keypoint,
    author = {Volkmann, Nina and Zelenka, Claudius and Devaraju, Archana Malavalli and Br{\"{u}}nger, Johannes and Stracke, Jenny and Spindler, Birgit and Kemper, Nicole and Koch, Reinhard},
    doi = {10.3390/s22145188},
    issn = {1424-8220},
    journal = {Sensors},
    number = {14},
    pages = {5188},
    title = {{Keypoint Detection for Injury Identification during Turkey Husbandry Using Neural Networks}},
    volume = {22},
    year = {2022}
    }

  7. scRNA-seq + scATAC-seq Challenge at NeurIPS 2021

    • kaggle.com
    zip
    Updated Sep 16, 2022
    Cite
    Alexander Chervov (2022). scRNA-seq + scATAC-seq Challenge at NeurIPS 2021 [Dataset]. https://www.kaggle.com/datasets/alexandervc/scrnaseq-scatacseq-challenge-at-neurips-2021
    Explore at:
    zip (2917180928 bytes), available download formats
    Dataset updated
    Sep 16, 2022
    Authors
    Alexander Chervov
    Description

    Context

    Dataset from the NeurIPS 2021 challenge, similar to the Kaggle 2022 competition https://www.kaggle.com/competitions/open-problems-multimodal ("Open Problems - Multimodal Single-Cell Integration: predict how DNA, RNA & protein measurements co-vary in single cells").

    It contains single-cell ATAC-seq data (https://en.wikipedia.org/wiki/ATAC-seq#Single-cell_ATAC-seq) and single-cell RNA-seq data (https://en.wikipedia.org/wiki/Single-cell_transcriptomics#Single-cell_RNA-seq).

    In single-cell RNA sequencing data, rows correspond to cells and columns to genes (or vice versa); each value of the matrix shows how strong the "expression" of the corresponding gene is in the corresponding cell. See https://en.wikipedia.org/wiki/Single-cell_transcriptomics

    See the tutorials at https://scanpy.readthedocs.io/en/stable/tutorials.html ("Scanpy" is the main Python package for working with scRNA-seq data) or https://satijalab.org/seurat/ ("Seurat", an R package).

    (For the companion dataset on CITE-seq = scRNA-seq + proteomics, see: https://www.kaggle.com/datasets/alexandervc/citeseqscrnaseqproteins-challenge-neurips2021)
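
The cells × genes matrix described above is the basic object of scRNA-seq analysis. A pure-Python stand-in for per-cell count normalization (Scanpy's `sc.pp.normalize_total` performs the real, sparse-aware version on such matrices):

```python
# Toy count matrix: rows are cells, columns are genes.
counts = [
    [10, 0, 30],  # cell 0
    [5, 5, 10],   # cell 1
    [0, 2, 2],    # cell 2
]

def normalize_total(matrix, target_sum=100.0):
    """Scale each cell (row) so its counts sum to target_sum.

    Rows that are all zero are left unchanged.
    """
    out = []
    for row in matrix:
        total = sum(row)
        out.append([v * target_sum / total for v in row] if total else list(row))
    return out

norm = normalize_total(counts)
print(norm[0])  # [25.0, 0.0, 75.0]
```

This removes per-cell differences in sequencing depth so that expression values are comparable across cells.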

    Particular data

    https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE194122

    Expression profiling by high-throughput sequencing; genome binding/occupancy profiling by high-throughput sequencing.

    Summary: Single-cell multiomics data collected from bone marrow mononuclear cells of 12 healthy human donors. Half the samples were measured using the 10X Multiome Gene Expression and Chromatin Accessibility kit, and half were measured using the 10X 3' Single-Cell Gene Expression kit with Feature Barcoding in combination with the BioLegend TotalSeq B Universal Human Panel v1.0. The dataset was generated to support the Multimodal Single-Cell Data Integration Challenge at NeurIPS 2021. Samples were prepared using a standard protocol at four sites. The resulting data was then annotated to identify cell types and remove doublets. The dataset was designed with a nested batch layout such that some donor samples were measured at multiple sites, with some donors measured at a single site. In the competition, participants were tasked with challenges including modality prediction, matching profiles from different modalities, and learning a joint embedding from multiple modalities.

    Overall design: single-cell multiomics data collected from bone marrow mononuclear cells of 12 healthy human donors.

    Contributor(s): Burkhardt DB, Lücken MD, Lance C, Cannoodt R, Pisco AO, Krishnaswamy S, Theis FJ, Bloom JM. Citation: https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/hash/158f3069a435b314a80bdcb024f8e422-Abstract-round2.html

    Related datasets:

    Other single-cell RNA-seq datasets can be found on Kaggle: see https://www.kaggle.com/alexandervc/datasets, or search Kaggle for "scRNA-seq".

    Inspiration

    Single-cell RNA sequencing is an important technology in modern biology; see e.g. "Eleven grand challenges in single-cell data science", https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-1926-6

    Also see the review by P. Kharchenko in Nature Methods, "The triumphs and limitations of computational methods for scRNA-seq", https://www.nature.com/articles/s41592-021-01171-x

    Searching Google Scholar for "challenges in single cell rna sequencing" (https://scholar.google.fr/scholar?q=challenges+in+single+cell+rna+sequencing&hl=en&as_sdt=0&as_vis=1&oi=scholart) turns up many interesting and highly cited articles, for example:

    (Cited 968) Computational and analytical challenges in single-cell transcriptomics. Oliver Stegle, Sarah A. Teichmann, John C. Marioni. Nat. Rev. Genet., 16(3), pp. 133–145, 2015. https://www.nature.com/articles/nrg3833

  8. CITE-seq=scRNA-seq+Proteins: Challenge NeurIPS2021

    • kaggle.com
    zip
    Updated Jan 22, 2023
    Cite
    Alexander Chervov (2023). CITE-seq=scRNA-seq+Proteins: Challenge NeurIPS2021 [Dataset]. https://www.kaggle.com/datasets/alexandervc/citeseqscrnaseqproteins-challenge-neurips2021
    Explore at:
    zip (646191284 bytes), available download formats
    Dataset updated
    Jan 22, 2023
    Authors
    Alexander Chervov
    Description

    Context

    Dataset from the NeurIPS 2021 challenge, similar to the Kaggle 2022 competition https://www.kaggle.com/competitions/open-problems-multimodal ("Open Problems - Multimodal Single-Cell Integration: predict how DNA, RNA & protein measurements co-vary in single cells").

    CITE-seq is joint single-cell RNA sequencing plus single-cell measurements of CD** proteins (https://en.wikipedia.org/wiki/CITE-Seq). (For the companion dataset on scRNA-seq + scATAC-seq, see: https://www.kaggle.com/datasets/alexandervc/scrnaseq-scatacseq-challenge-at-neurips-2021)

    In single-cell RNA sequencing data, rows correspond to cells and columns to genes (or vice versa); each value of the matrix shows how strong the "expression" of the corresponding gene is in the corresponding cell. See https://en.wikipedia.org/wiki/Single-cell_transcriptomics

    See the tutorials at https://scanpy.readthedocs.io/en/stable/tutorials.html ("Scanpy" is the main Python package for working with scRNA-seq data) or https://satijalab.org/seurat/ ("Seurat", an R package).

    Particular data

    https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE194122

    Expression profiling by high-throughput sequencing; genome binding/occupancy profiling by high-throughput sequencing.

    Summary: Single-cell multiomics data collected from bone marrow mononuclear cells of 12 healthy human donors. Half the samples were measured using the 10X Multiome Gene Expression and Chromatin Accessibility kit, and half were measured using the 10X 3' Single-Cell Gene Expression kit with Feature Barcoding in combination with the BioLegend TotalSeq B Universal Human Panel v1.0. The dataset was generated to support the Multimodal Single-Cell Data Integration Challenge at NeurIPS 2021. Samples were prepared using a standard protocol at four sites. The resulting data was then annotated to identify cell types and remove doublets. The dataset was designed with a nested batch layout such that some donor samples were measured at multiple sites, with some donors measured at a single site. In the competition, participants were tasked with challenges including modality prediction, matching profiles from different modalities, and learning a joint embedding from multiple modalities.

    Overall design: single-cell multiomics data collected from bone marrow mononuclear cells of 12 healthy human donors.

    Contributor(s): Burkhardt DB, Lücken MD, Lance C, Cannoodt R, Pisco AO, Krishnaswamy S, Theis FJ, Bloom JM. Citation: https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/hash/158f3069a435b314a80bdcb024f8e422-Abstract-round2.html

    Related datasets:

    Other single-cell RNA-seq datasets can be found on Kaggle: see https://www.kaggle.com/alexandervc/datasets, or search Kaggle for "scRNA-seq".

    Inspiration

    Single-cell RNA sequencing is an important technology in modern biology; see e.g. "Eleven grand challenges in single-cell data science", https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-1926-6

    Also see the review by P. Kharchenko in Nature Methods, "The triumphs and limitations of computational methods for scRNA-seq", https://www.nature.com/articles/s41592-021-01171-x

    Searching Google Scholar for "challenges in single cell rna sequencing" (https://scholar.google.fr/scholar?q=challenges+in+single+cell+rna+sequencing&hl=en&as_sdt=0&as_vis=1&oi=scholart) turns up many interesting and highly cited articles, for example:

    (Cited 968) Computational and analytical challenges in single-cell transcriptomics. Oliver Stegle, Sarah A. Teichmann, John C. Marioni. Nat. Rev. Genet., 16(3), pp. 133–145, 2015. https://www.nature.com/articles/nrg3833

  9. LagrangeBench Datasets

    • zenodo.org
    zip
    Updated Oct 19, 2023
    Cite
    Artur P. Toshev; Nikolaus A. Adams; Artur P. Toshev; Nikolaus A. Adams (2023). LagrangeBench Datasets [Dataset]. http://doi.org/10.5281/zenodo.10021926
    Explore at:
    zip (available download formats)
    Dataset updated
    Oct 19, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Artur P. Toshev; Nikolaus A. Adams; Artur P. Toshev; Nikolaus A. Adams
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Datasets from the paper LagrangeBench: A Lagrangian Fluid Mechanics Benchmarking Suite, presented at the NeurIPS 2023 Track on Datasets and Benchmarks.

  10. construct-validity-review

    • huggingface.co
    Updated Jul 29, 2025
    Cite
    Andrew Bean (2025). construct-validity-review [Dataset]. https://huggingface.co/datasets/ambean/construct-validity-review
    Explore at:
    Dataset updated
    Jul 29, 2025
    Authors
    Andrew Bean
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset contains the results of the review paper Measuring What Matters, presented at NeurIPS 2025 Datasets and Benchmarks Track.

  11. Human bone marrow mononuclear cells

    • figshare.com
    hdf
    Updated Jan 29, 2025
    Cite
    Marius Lange (2025). Human bone marrow mononuclear cells [Dataset]. http://doi.org/10.6084/m9.figshare.28302875.v1
    Explore at:
    hdf (available download formats)
    Dataset updated
    Jan 29, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Marius Lange
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The gene expression portion of the NeurIPS 2021 challenge 10x multiome dataset (Luecken et al., NeurIPS datasets and benchmarks track 2021), originally obtained from GEO. Contains single-cell gene expression of 69,249 cells for 13,431 genes. The adata.X field contains normalized data and adata.layers['counts'] contains raw expression values. We computed a latent space using scANVI (Xu et al., MSB 2021), following their tutorial.

  12. TreeFinder

    • kaggle.com
    zip
    Updated Oct 24, 2025
    Cite
    Bibi 9 (2025). TreeFinder [Dataset]. https://www.kaggle.com/datasets/zhihaow/tree-finder
    Explore at:
    zip (1965923052 bytes), available download formats
    Dataset updated
    Oct 24, 2025
    Authors
    Bibi 9
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    🌲 TreeFinder: A US-Scale Benchmark Dataset for Individual Tree Mortality Monitoring Using High-Resolution Aerial Imagery

    Accepted to NeurIPS 2025 (Datasets & Benchmarks Track)

    TreeFinder is the first large-scale, high-resolution benchmark dataset for mapping individual dead trees across the contiguous United States (CONUS). Built to advance computer vision methods for ecological monitoring and carbon assessment, TreeFinder provides pixel-level annotations of dead trees from high-resolution aerial imagery, enriched with ecological metadata and paired with performance benchmarks.

    📦 What's in the Dataset?

    • 1,000 Sites across 48 U.S. States
      • Spatially diverse sampling of forested regions across CONUS
    • 23,000 Hectares of 0.6m NAIP Imagery
      • Aerial imagery from the National Agriculture Imagery Program (NAIP), including 4 channels (RGB + NIR)
    • 20,000+ Manually Annotated Dead Trees
      • Pixel-level masks created through expert labeling and validated via multi-temporal image comparison
    • ML-Ready Patches
      • Each raw scene is tiled into 224 × 224 patches for deep learning, with associated segmentation masks
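
Tiling a raster scene into fixed-size patches is a simple grid computation. A sketch (how TreeFinder handles partial tiles at scene borders, padding versus dropping, is an assumption here; this version drops them):

```python
def tile_grid(height, width, patch=224):
    """Top-left (row, col) corners of non-overlapping patch x patch tiles.

    Partial tiles at the right/bottom border are dropped; the actual
    TreeFinder pipeline may pad or overlap instead.
    """
    return [
        (r, c)
        for r in range(0, height - patch + 1, patch)
        for c in range(0, width - patch + 1, patch)
    ]

# A hypothetical 1120 x 896 pixel scene yields a 5 x 4 grid of tiles.
corners = tile_grid(1120, 896)
print(len(corners))  # 20
```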

    🧠 Benchmark Models

    We provide benchmark performance results using five semantic segmentation models, including:

    • U-Net and DeepLabV3+ (CNN-based)
    • ViT, SegFormer, and Mask2Former (Transformer-based)
    • DOFA (a multimodal foundation model trained on satellite data)

    Each model is trained and evaluated across various domain generalization settings (e.g., region, climate, forest type) to test robustness.

    🗺️ Metadata & Scenarios

    Each patch is enriched with:

    • Geographic Coordinates
    • Köppen–Geiger Climate Zone
    • Primary Tree Type (from USDA Forest Service maps)

    These metadata enable benchmarking under challenging scenarios like:

    • Cross-region generalization (e.g., East → West)
    • Climate domain shifts
    • Forest type transfer

    🧪 Why Use TreeFinder?

    TreeFinder enables the development and evaluation of machine learning models for high-impact environmental tasks such as:

    • Forest health monitoring
    • Carbon flux modeling
    • Wildfire risk assessment

    It is designed to foster cross-disciplinary collaboration between the machine learning and Earth science communities by providing a reproducible, challenging, and ecologically grounded benchmark.

    📚 Citation

    If you use TreeFinder in your research, please cite the following paper:

    Zhihao Wang, Cooper Li, Ruichen Wang, Lei Ma, George Hurtt, Xiaowei Jia, Gengchen Mai, Zhili Li, Yiqun Xie.
    TreeFinder: A US-Scale Benchmark Dataset for Individual Tree Mortality Monitoring Using High-Resolution Aerial Imagery.
    In Proceedings of the 39th Conference on Neural Information Processing Systems (NeurIPS 2025), Datasets and Benchmarks Track, 2025.

  13. MedSG-Bench

    • huggingface.co
    Updated Sep 28, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    MedSG-Bench (2025). MedSG-Bench [Dataset]. https://huggingface.co/datasets/MedSG-Bench/MedSG-Bench
    Explore at:
    Dataset updated
    Sep 28, 2025
    Authors
    MedSG-Bench
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    🖥 MedSG-Bench: A Benchmark for Medical Image Sequences Grounding

    📖 Paper | 💻 Code | 🤗 Dataset

    🔥 MedSG-Bench is accepted at NeurIPS 2025 Datasets and Benchmarks Track as a Spotlight.

      MedSG-Bench
    

    MedSG-Bench is the first benchmark for medical image sequences grounding. 👉 We also provide MedSG-188K, a grounding instruction-tuning dataset. 👉 MedSeq-Grounder, the model trained on MedSG-188K, is available here.

      Metadata
    

    This dataset… See the full description on the dataset page: https://huggingface.co/datasets/MedSG-Bench/MedSG-Bench.

  14. Causal Machine Learning Benchmark Datasets

    • figshare.com
    zip
    Updated Sep 30, 2025
    Damian Machlanski (2025). Causal Machine Learning Benchmark Datasets [Dataset]. http://doi.org/10.6084/m9.figshare.30244936.v1
    Explore at:
    zipAvailable download formats
    Dataset updated
    Sep 30, 2025
    Dataset provided by
    figshare
    Figsharehttp://figshare.com/
    Authors
    Damian Machlanski
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This collection includes four well-known datasets used to benchmark causal machine learning algorithms. The datasets are: IHDP [1], News [2], Twins [3], Jobs [4].

    References

    [1] J. L. Hill, ‘Bayesian Nonparametric Modeling for Causal Inference’, Journal of Computational and Graphical Statistics, vol. 20, no. 1, pp. 217–240, Jan. 2011, doi: 10.1198/jcgs.2010.08162.

    [2] F. D. Johansson, U. Shalit, and D. Sontag, ‘Learning representations for counterfactual inference’, in Proceedings of the 33rd International Conference on Machine Learning (ICML’16), New York, NY, USA: JMLR.org, Jun. 2016, pp. 3020–3029.

    [3] C. Louizos, U. Shalit, J. M. Mooij, D. Sontag, R. Zemel, and M. Welling, ‘Causal Effect Inference with Deep Latent-Variable Models’, Advances in Neural Information Processing Systems, vol. 30, 2017. Available: https://proceedings.neurips.cc/paper/2017/hash/94b5bde6de888ddf9cde6748ad2523d1-Abstract.html

    [4] J. A. Smith and P. E. Todd, ‘Does matching overcome LaLonde’s critique of nonexperimental estimators?’, Journal of Econometrics, vol. 125, no. 1–2, pp. 305–353, 2005.
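    Benchmarks like these are typically scored with treatment-effect metrics such as PEHE (precision in estimation of heterogeneous effects) and absolute ATE error. A minimal, dependency-free sketch of both metrics follows; this is an illustration of the standard definitions, not the collection's official evaluation code:

    ```python
    import math

    def pehe(tau_true, tau_pred):
        """PEHE: root-mean-squared error between true and predicted
        individual treatment effects."""
        n = len(tau_true)
        return math.sqrt(sum((t - p) ** 2 for t, p in zip(tau_true, tau_pred)) / n)

    def ate_error(tau_true, tau_pred):
        """Absolute error of the average treatment effect."""
        n = len(tau_true)
        return abs(sum(tau_true) / n - sum(tau_pred) / n)
    ```

    On datasets such as IHDP, where the true effects are simulated and therefore known, both quantities can be computed directly; on Jobs, which lacks ground-truth effects, policy-risk-style metrics are used instead.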

  15. NIPS2025D&B_AMPBenchmark

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Oct 29, 2025
    Wan, Boyao (2025). NIPS2025D&B_AMPBenchmark [Dataset]. http://doi.org/10.7910/DVN/E9A88D
    Explore at:
    Dataset updated
    Oct 29, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    Wan, Boyao
    Description

    Dataset for "A Benchmark for Antimicrobial Peptide Recognition Based on Structure and Sequence Representation", presented at the NeurIPS 2025 Datasets and Benchmarks Track.

  16. InspiredFromHetionet

    • huggingface.co
    Updated Oct 20, 2025
    + more versions
    Anirban Das (2025). InspiredFromHetionet [Dataset]. https://huggingface.co/datasets/axd353/InspiredFromHetionet
    Explore at:
    Dataset updated
    Oct 20, 2025
    Authors
    Anirban Das
    License

    Attribution-NonCommercial 2.0 (CC BY-NC 2.0)https://creativecommons.org/licenses/by-nc/2.0/
    License information was derived automatically

    Description

    InspiredFromHetionet

    License: Creative Commons Attribution–NonCommercial 2.0 (CC BY-NC 2.0). This dataset is part of the When No Paths Lead to Rome benchmark for systematic neural relational reasoning, accepted at the NeurIPS 2025 Datasets and Benchmarks Track.

  17. Breaking Bad Dataset

    • kaggle.com
    zip
    Updated Feb 16, 2023
    Dazitu616 (2023). Breaking Bad Dataset [Dataset]. https://www.kaggle.com/datasets/dazitu616/breaking-bad-dataset/code
    Explore at:
    zip(1150713325 bytes)Available download formats
    Dataset updated
    Feb 16, 2023
    Authors
    Dazitu616
    Description

    Dataset accompanying the NeurIPS 2022 Dataset and Benchmark Track paper: Breaking Bad: A Dataset for Geometric Fracture and Reassembly. Please refer to our project page for more details.

    License: The Breaking Bad dataset collects 3D meshes from ShapeNet and Thingi10K thus inheriting their terms of use. Please refer to ShapeNet and Thingi10K for more details. We release each model in our dataset with an as-permissive-as-possible license compatible with its underlying base model. Please refer to ShapeNet and Thingi10K for restrictions and depositor requirements of each model.

  18. Data from: LoveDA: A Remote Sensing Land-Cover Dataset for Domain Adaptive Semantic Segmentation

    • data.niaid.nih.gov
    Updated Jul 17, 2024
    + more versions
    Junjue, Wang; Zhuo, Zheng; Ailong, Ma; Xiaoyan, Lu; Yanfei, Zhong (2024). LoveDA: A Remote Sensing Land-Cover Dataset for Domain Adaptive Semantic Segmentation [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_5706577
    Explore at:
    Dataset updated
    Jul 17, 2024
    Dataset provided by
    Wuhan University
    Authors
    Junjue, Wang; Zhuo, Zheng; Ailong, Ma; Xiaoyan, Lu; Yanfei, Zhong
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The benchmark code is available at: https://github.com/Junjue-Wang/LoveDA

    Highlights:

    5987 high spatial resolution (0.3 m) remote sensing images from Nanjing, Changzhou, and Wuhan

    Focus on different geographical environments between Urban and Rural

    Advance both semantic segmentation and domain adaptation tasks

    Three considerable challenges: multi-scale objects, complex background samples, and inconsistent class distributions

    Reference:

    @inproceedings{wang2021loveda,
    title={Love{DA}: A Remote Sensing Land-Cover Dataset for Domain Adaptive Semantic Segmentation},
    author={Junjue Wang and Zhuo Zheng and Ailong Ma and Xiaoyan Lu and Yanfei Zhong},
    booktitle={Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks},
    editor={J. Vanschoren and S. Yeung},
    year={2021},
    volume={1},
    url={https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/file/4e732ced3463d06de0ca9a15b6153677-Paper-round2.pdf}
    }

    License:

    The owners of the data and of the copyright on the data are RSIDEA, Wuhan University. Use of the Google Earth images must respect the "Google Earth" terms of use. All images and their associated annotations in LoveDA can be used for academic purposes only, but any commercial use is prohibited. (CC BY-NC-SA 4.0)

  19. OLIVES - VIP CUP 2023

    • kaggle.com
    zip
    Updated Aug 31, 2023
    Salman Khondker (2023). OLIVES - VIP CUP 2023 [Dataset]. https://www.kaggle.com/datasets/salmankhondker/olives-vip-cup-2023
    Explore at:
    zip(34320657194 bytes)Available download formats
    Dataset updated
    Aug 31, 2023
    Authors
    Salman Khondker
    Description

    Original Dataset

    https://zenodo.org/records/7105232

    Citation

    @inproceedings{prabhushankarolives2022,
    title={OLIVES Dataset: Ophthalmic Labels for Investigating Visual Eye Semantics},
    author={Prabhushankar, Mohit and Kokilepersaud, Kiran and Logan, Yash-yee and Trejo Corona, Stephanie and AlRegib, Ghassan and Wykoff, Charles},
    booktitle={Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 2 (NeurIPS Datasets and Benchmarks 2022) },
    year={2022}
    }
    
  20. MatSeg: Material State Segmentation Dataset and Benchmark

    • zenodo.org
    zip
    Updated May 22, 2025
    Zenodo (2025). MatSeg: Material State Segmentation Dataset and Benchmark [Dataset]. http://doi.org/10.5281/zenodo.11331618
    Explore at:
    zipAvailable download formats
    Dataset updated
    May 22, 2025
    Dataset provided by
    Zenodohttp://zenodo.org/
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    MatSeg Dataset and benchmark for zero-shot material state segmentation.

    The MatSeg Benchmark, containing 1220 real-world images and their annotations, is available in MatSeg_Benchmark.zip; the file includes documentation and Python readers.

    The MatSeg dataset, containing synthetic images infused with natural image patterns, is available as MatSeg3D_part_*.zip and MatSeg2D_part_*.zip (* stands for a number).

    MatSeg3D_part_*.zip: contains synthetic 3D scenes

    MatSeg2D_part_*.zip: contains synthetic 2D scenes

    Readers and documentation for the synthetic data are available at: Dataset_Documentation_And_Readers.zip

    Readers and documentation for the real-images benchmark are available at: MatSeg_Benchmark.zip

    The Code used to generate the MatSeg Dataset is available at: https://zenodo.org/records/11401072

    Additional permanent sources for downloading the dataset and metadata: 1, 2

    Evaluation scripts for the Benchmark are now available at:

    https://zenodo.org/records/13402003 and https://e.pcloud.link/publink/show?code=XZsP8PZbT7AJzG98tV1gnVoEsxKRbBl8awX

    Description

    Materials and their states form a vast array of patterns and textures that define the physical and visual world. Minerals in rocks, sediment in soil, dust on surfaces, infections on leaves, stains on fruit, and foam in liquids are a few of the almost infinite number of such states and patterns.

    Image segmentation of materials and their states is fundamental to the understanding of the world and is essential for a wide range of tasks, from cooking and cleaning to construction, agriculture, and chemistry laboratory work.

    The MatSeg dataset focuses on zero-shot segmentation of materials and their states, meaning identifying the region of an image belonging to a specific material type or state, without prior knowledge of or training on the material type, state, or environment.

    The dataset contains a large set of synthetic images (100k) and a benchmark of 1220 real-world images for testing.

    Benchmark

    The benchmark contains 1220 real-world images with a wide range of material states and settings: for example, food states (cooked/burned), plants (infected/dry), rocks/soil (minerals/sediment), construction/metals (rusted/worn), and liquids (foam/sediment), among many other states, without being limited to a fixed set of classes or environments. The goal is to evaluate the segmentation of materials without knowledge of or pretraining on the material or setting. The focus is on materials with complex scattered boundaries and gradual transitions (like the level of wetness of a surface).
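    Zero-shot segmentation quality on benchmarks like this is commonly reported with intersection-over-union (IoU). A minimal sketch of a per-image binary IoU follows as an illustration of the metric only; the benchmark's own scoring code ships in MatSeg_Benchmark.zip and the linked evaluation scripts:

    ```python
    def binary_iou(pred, gt):
        """Intersection-over-union between two binary masks, given as
        flat 0/1 sequences of equal length. An empty union (both masks
        all-zero) is scored as a perfect 1.0 by convention here."""
        inter = sum(p & g for p, g in zip(pred, gt))
        union = sum(p | g for p, g in zip(pred, gt))
        return inter / union if union else 1.0
    ```

    For the gradual transitions this benchmark emphasizes (e.g., surface wetness levels), soft variants of IoU over continuous masks are often preferred over a hard 0/1 threshold.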

    Evaluation scripts for the Benchmark are now available at: 1 and 2.

    Synthetic Dataset

    The synthetic dataset is composed of synthetic scenes rendered in 2D and 3D using Blender. The synthetic data is infused with patterns, materials, and textures automatically extracted from real images, allowing it to capture the complexity and diversity of the real world while maintaining the precision and scale of synthetic data. 100k images and their annotations are available to download.

    License

    This dataset, including all its components, is released under the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. To the extent possible under law, the authors have dedicated all copyright and related and neighboring rights to this dataset to the public domain worldwide. This dedication applies to the dataset and all derivative works.

    The MatSeg 2D and 3D synthetic scenes were generated using the Open Images dataset, which is licensed under the Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0); for these components, you must comply with the terms of the Apache License. In addition, the MatSeg3D dataset uses ShapeNet 3D assets under a GNU license.

    Example Usage:

    Example training and evaluation code for a network trained on the dataset and evaluated on the benchmark is given at these URLs: 1, 2. This includes:

    An evaluation script on the MatSeg benchmark.

    A training script using the MatSeg dataset.

    Weights of a trained model.

    Paper:

    More details on the work can be found in the paper "Infusing Synthetic Data with Real-World Patterns for Zero-Shot Material State Segmentation".

    Croissant metadata and additional sources for downloading the dataset are available at 1, 2.
