Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This dataset is associated with submission 1335 at the NeurIPS 2025 Datasets and Benchmarks Track. The benchmark is intended to be used with the proposed submission environments (see the source code). See the provided README for information about downloading the dataset and running the evaluations.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This is the benchmark associated with submission 1331 at the NeurIPS 2025 Datasets and Benchmarks Track. The benchmark is intended to be used with the proposed submission environments (see the source code). The .jsonl files contain image path templates rather than concrete image paths: each .jsonl entry is one sample, and each sample corresponds to a different environment with its own images. See the submitted code README for information about dataset downloading and preprocessing, and to… See the full description on the dataset page: https://huggingface.co/datasets/submission1331/ProactiveBench.
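Because each ProactiveBench .jsonl entry stores an image path template instead of a concrete path, a loader has to fill the template with the location of that sample's own environment. The sketch below is hypothetical: the field names `env` and `image_path_template`, the `{env_root}` placeholder, and the function name `resolve_samples` are illustrative assumptions, not the format documented in the submission README.

```python
import json
from pathlib import Path
from typing import Iterator


def resolve_samples(jsonl_path: Path, env_roots: dict[str, str]) -> Iterator[dict]:
    """Yield each sample with its image path template filled in.

    Hypothetical schema (not the documented one): every line is a JSON object
    with an "env" id and an "image_path_template" containing "{env_root}".
    """
    with jsonl_path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            sample = json.loads(line)
            # Each sample belongs to its own environment, so look up that root.
            root = env_roots[sample["env"]]
            sample["image_path"] = sample["image_path_template"].format(env_root=root)
            yield sample
```

Note that `str.format` would also try to interpret any other braces in a template, so a real loader should follow whatever placeholder syntax the README actually specifies.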
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Dataset for "A Benchmark for Antimicrobial Peptide Recognition Based on Structure and Sequence Representation" at the NeurIPS 2025 Datasets and Benchmarks Track.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
SIMSHIFT: A Benchmark for Adapting Neural Surrogates to Distribution Shifts
This is the official data repository for the NeurIPS 2025 Datasets & Benchmarks Track submission.
Usage
We provide dataset loading utilities and full training and evaluation pipelines in the accompanying code repository, which will be released upon publication.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Introduction
CodeR-Pile: the dataset of our submission to the NeurIPS 2025 Datasets and Benchmarks Track (under review).
Load Dataset
An example to load the dataset:
import datasets

task_to_main_task_type = {
    # text2code
    ## seed tasks
    "web_code_retrieval": "text2code",
    "code_contest_retrieval": "text2code",
    "text2sql_retrieval": "text2code",
    ## new tasks
    "error_message_retrieval": "text2code"…
See the full description on the dataset page: https://huggingface.co/datasets/nebula2025/CodeR-Pile.
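The `task_to_main_task_type` dictionary in the CodeR-Pile example maps each retrieval task to its main task type. A small helper can invert it to list the tasks under each type. The sketch below copies only the entries visible in the truncated excerpt, so the real mapping contains more tasks (and likely more types); the helper name `group_tasks_by_type` is hypothetical.

```python
from collections import defaultdict

# Only the entries visible in the truncated excerpt; the full mapping is longer.
task_to_main_task_type = {
    # text2code
    ## seed tasks
    "web_code_retrieval": "text2code",
    "code_contest_retrieval": "text2code",
    "text2sql_retrieval": "text2code",
    ## new tasks
    "error_message_retrieval": "text2code",
}


def group_tasks_by_type(mapping: dict[str, str]) -> dict[str, list[str]]:
    """Invert a task -> main-task-type mapping into type -> sorted task names."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for task, main_type in mapping.items():
        grouped[main_type].append(task)
    return {main_type: sorted(tasks) for main_type, tasks in grouped.items()}


print(group_tasks_by_type(task_to_main_task_type))
```

With the four visible entries, every task falls under `text2code`; the grouping becomes more informative once the full mapping from the dataset page is used.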
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
NLID: A Large-Scale Neuromorphic Liquid Identification Dataset
NeurIPS 2025 Datasets and Benchmarks Track paper: Bubbles Talk: A Neuromorphic Dataset for Liquid Identification from Pouring process.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the first dataset of the NeurIPS 2025 submission: The SafePowerGraph Benchmark: Toward Reliable and Realistic Graph Learning in Power Grids.
TMGBench: A Systematic Game Benchmark for Evaluating Strategic Reasoning Abilities of LLMs
This repository contains the code, data, and metadata for our NeurIPS 2025 Datasets and Benchmarks submission: TMGBench: A Systematic Game Benchmark for Evaluating Strategic Reasoning Abilities of LLMs. The benchmark evaluates the strategic reasoning of large language models using 2x2 matrix games with narrative contexts and theory-of-mind variations.
Directory Structure… See the full description on the dataset page: https://huggingface.co/datasets/pinkex/TMGBench.
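TMGBench evaluates strategic reasoning with 2x2 matrix games. As a generic illustration of the reasoning such games call for (not the benchmark's own data format or scoring), the sketch below finds the pure-strategy Nash equilibria of a 2x2 game by checking, for each cell, whether either player could gain by deviating alone. The payoff dictionary layout and the Prisoner's Dilemma payoffs are assumptions chosen for illustration.

```python
from itertools import product

# payoffs[(r, c)] = (row player's payoff, column player's payoff)
Payoffs = dict[tuple[int, int], tuple[float, float]]


def pure_nash_equilibria(payoffs: Payoffs) -> list[tuple[int, int]]:
    """Return every cell (r, c) where neither player gains by deviating alone."""
    equilibria = []
    for r, c in product((0, 1), repeat=2):
        row_pay, col_pay = payoffs[(r, c)]
        # Row player's only deviation: switch row while the column stays fixed.
        row_ok = row_pay >= payoffs[(1 - r, c)][0]
        # Column player's only deviation: switch column while the row stays fixed.
        col_ok = col_pay >= payoffs[(r, 1 - c)][1]
        if row_ok and col_ok:
            equilibria.append((r, c))
    return equilibria


# Prisoner's Dilemma: strategy 0 = cooperate, 1 = defect.
prisoners_dilemma = {
    (0, 0): (3, 3), (0, 1): (0, 5),
    (1, 0): (5, 0), (1, 1): (1, 1),
}
print(pure_nash_equilibria(prisoners_dilemma))  # [(1, 1)]
```

Mutual defection is the only equilibrium here even though mutual cooperation pays both players more, which is the kind of tension between individual and collective incentives that 2x2 games expose.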
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
NeurIPS 2025 Dataset Track Submission #23
This repository provides the dataset for the VideoMathQA benchmark. We also provide a task implementation compatible with lmms_eval to improve the reproducibility of our experiments.
🏆 VideoMathQA Leaderboard (MCQ & MBin with Subtitles)
| Rank | Model | Size | MCQ (Subtitles) 🔍 | MBin (Subtitles) 📊 | Model Type |
|---|---|---|---|---|---|
| 1️⃣ | | | 61.4 | 44.2 | 🔒 Proprietary |
| 2️⃣ | Qwen2.5-VL (72B) | 72B | 36.9 | 28.6 | 🪶 Open Weights |
| 3️⃣ | InternVL3 (78B) | 78B | 37.1… | | |

See the full description on the dataset page: https://huggingface.co/datasets/annonymousa378/neurips_submission_23.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
🖼️ COinCO: Common Inpainted Objects In-N-Out of Context
Authors: Tianze Yang*, Tyson Jordan*, Ninghao Liu, Jin Sun
*Equal contribution
Affiliation: University of Georgia
Status: Submitted to NeurIPS 2025 Datasets and Benchmarks Track (under review)
📦 1. Dataset Overview
The COinCO dataset is a large-scale benchmark constructed from the COCO dataset to study object-scene contextual relationships via inpainting. Each image in COinCO contains one inpainted object, and… See the full description on the dataset page: https://huggingface.co/datasets/ytz009/COinCO.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
📊 MMLongBench-Doc Evaluation Results
Official evaluation results: GPT-4.1 (2025-04-14) & GPT-4o (2024-11-20)
📄 Paper: MMLongBench-Doc, NeurIPS 2024 Datasets and Benchmarks Track (Spotlight)
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
EgoGazeVQA-91 • NeurIPS 2025 Datasets & Benchmarks submission
In the Eye of MLLM: Benchmarking Egocentric Video Intent Understanding with Gaze-Guided Prompting
1 Folder layout
EgoGazeVQA-91-nips25DB/
├── qa_pairs/ # VQA supervision
│ ├── causal_ego4d.{csv,json}
│ ├── spatial_ego4d.{csv,json}
│ ├── temporal_ego4d.{csv,json}
│ └── ...
├── keyframe_tar/
│ ├── ego4d.tar.gz
│ ├── egoexo.tar.gz
│ └── egtea.tar.gz
└──… See the full description on the dataset page: https://huggingface.co/datasets/anonupload/EgoGazeVQA-91-nips25DB.
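Given the EgoGazeVQA folder layout above, a loader typically needs to find the QA files for each split and unpack the keyframe archives. The sketch below relies only on the folder and file names visible in the truncated tree; the function names are hypothetical, and because the excerpt does not describe the contents of the CSV/JSON files or archives, it does not parse them.

```python
import tarfile
from pathlib import Path


def index_qa_files(root: Path) -> dict[str, list[str]]:
    """Map each QA split (e.g. 'causal_ego4d') to the formats available for it."""
    index: dict[str, list[str]] = {}
    for path in sorted((root / "qa_pairs").iterdir()):
        if path.suffix in {".csv", ".json"}:
            index.setdefault(path.stem, []).append(path.suffix.lstrip("."))
    return index


def extract_keyframes(root: Path, dest: Path) -> list[Path]:
    """Unpack every keyframe_tar/*.tar.gz into its own subfolder of dest."""
    out_dirs = []
    for archive in sorted((root / "keyframe_tar").glob("*.tar.gz")):
        target = dest / archive.name.removesuffix(".tar.gz")  # e.g. dest/ego4d
        target.mkdir(parents=True, exist_ok=True)
        with tarfile.open(archive, "r:gz") as tar:
            # Use the safer "data" extraction filter where this Python supports it.
            if hasattr(tarfile, "data_filter"):
                tar.extractall(target, filter="data")
            else:
                tar.extractall(target)
        out_dirs.append(target)
    return out_dirs
```

Extracting each archive into its own subfolder keeps the Ego4D, EgoExo, and EGTEA keyframes separate, which matches how the QA files are split by source dataset.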
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
M3DRS: Multi-Modal Multi-Resolution Remote Sensing Dataset
This repository hosts the M3DRS dataset, a comprehensive collection of 5-channel remote sensing images (RGB, NIR, nDSM) from Switzerland, France, and Italy. The dataset is unlabelled and specifically designed to support self-supervised learning tasks. It is part of our submission to the NeurIPS 2025 Datasets and Benchmarks Track. The dataset is organized into three folders, each containing ZIP archives of images grouped by… See the full description on the dataset page: https://huggingface.co/datasets/heig-vd-geo/M3DRS.
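The five M3DRS channels come from three modalities: three RGB bands, one near-infrared (NIR) band, and one normalized digital surface model (nDSM) band. The minimal NumPy sketch below shows how such a 5-channel stack can be assembled. The channel order, the input shapes, the float32 cast, and the function name `stack_channels` are assumptions for illustration, not the dataset's documented storage format.

```python
import numpy as np


def stack_channels(rgb: np.ndarray, nir: np.ndarray, ndsm: np.ndarray) -> np.ndarray:
    """Stack RGB (H, W, 3), NIR (H, W) and nDSM (H, W) into a (5, H, W) array.

    Assumed channel order: R, G, B, NIR, nDSM.
    """
    if rgb.ndim != 3 or rgb.shape[-1] != 3:
        raise ValueError("rgb must have shape (H, W, 3)")
    h, w = rgb.shape[:2]
    if nir.shape != (h, w) or ndsm.shape != (h, w):
        raise ValueError("nir and ndsm must match the RGB height and width")
    bands = [rgb[..., i] for i in range(3)] + [nir, ndsm]
    # Cast to float32 so spectral values and heights share one dtype.
    return np.stack(bands, axis=0).astype(np.float32)
```

A channels-first layout like this is what many self-supervised vision backbones expect as input; if the archives already store 5-channel rasters, only the channel order needs checking.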
MTVBench: Benchmarking Video Generation Models with Multiple Transition Text Prompts
Submission to the NeurIPS 2025 Datasets and Benchmarks Track. Xiaodong Cun, Xiuli Bi, Ruihuan Yang, Jianfei Yuan, Bin Xiao, Bo Liu. GVC Lab, Great Bay University, and Chongqing University of Posts and Telecommunications. All generated videos are listed here, organized into folders. The prompt benchmark we collected is in the prompt folder.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Repository for the 3D-ADAM (3D Anomaly Detection in Advanced Manufacturing) Dataset. Submitted to the NeurIPS 2025 Datasets and Benchmarks Track. This project has been supported by the 1851 Royal Commission and HAL Robotics.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Osiris: A Scalable Dataset Generation Pipeline for Machine Learning in Analog Circuit Design
Osiris is an end-to-end analog circuit design pipeline capable of producing, validating, and evaluating layouts for generic analog circuits. The Osiris GitHub repository hosts the code that implements the randomized pipeline, as well as the reinforcement learning-driven baseline methodology discussed in the paper submitted to the NeurIPS 2025 Datasets & Benchmarks Track. The Osiris 🤗… See the full description on the dataset page: https://huggingface.co/datasets/hardware-fab/osiris.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
EmbedMol: An Open Billion-scale Molecular Embedding Dataset for Molecular Discovery
(For NeurIPS 2025 Datasets and Benchmarks Track Review ONLY)
Description of the Data and File Structure
This repository contains the complete EmbedMol-1B dataset, partitioned into 40 parts. Each part is provided as a tarball (e.g., embedmol-1b-part-39.tar.gz). Inside each tarball, the embeddings are further split into four batches, with each batch stored as a compressed NumPy array of… See the full description on the dataset page: https://huggingface.co/datasets/zhiyaot/EmbedMol.
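Because EmbedMol-1B is split into 40 tarballs, each holding four compressed NumPy batches, one way to read it is to open the batches part by part in memory instead of unpacking every archive to disk. The sketch assumes that parts are numbered 0 to 39 following the `embedmol-1b-part-39.tar.gz` pattern without zero padding, and that each batch is an `.npz` member. The excerpt above does not give member names or array keys, so the code simply yields every array it finds; `part_paths` and `iter_batches` are hypothetical helpers.

```python
import io
import tarfile
from pathlib import Path
from typing import Iterator

import numpy as np

NUM_PARTS = 40  # stated in the dataset description


def part_paths(root: Path) -> list[Path]:
    # Assumed naming: embedmol-1b-part-{i}.tar.gz for i = 0..39 (no zero padding).
    return [root / f"embedmol-1b-part-{i}.tar.gz" for i in range(NUM_PARTS)]


def iter_batches(archive: Path) -> Iterator[tuple[str, np.ndarray]]:
    """Yield ("member/key", array) pairs for every .npz batch inside one tarball."""
    with tarfile.open(archive, "r:gz") as tar:
        for member in tar:
            if not (member.isfile() and member.name.endswith(".npz")):
                continue
            data = tar.extractfile(member).read()  # load one batch into memory
            with np.load(io.BytesIO(data)) as npz:
                for key in npz.files:
                    yield f"{member.name}/{key}", npz[key]
```

Reading one batch at a time keeps peak memory near the size of a single batch, which matters at billion-molecule scale.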
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
EgoExOR: An Egocentric–Exocentric Operating Room Dataset for Comprehensive Understanding of Surgical Activities
Official code of the paper "EgoExOR: An Egocentric–Exocentric Operating Room Dataset for Comprehensive Understanding of Surgical Activities" submitted to the NeurIPS 2025 Datasets & Benchmarks Track. Authors: Ege Özsoy, Arda Mamur, Felix Tristram, Chantal Pellegrini… See the full description on the dataset page: https://huggingface.co/datasets/ardamamur/EgoExOR.
🅿️ MVVLP: A Multi-View Benchmark Dataset for Vision-and-Language Navigation in Real-World Parking Scenarios
Paper: MVVLP: A Multi-View Benchmark Dataset for Vision-and-Language Navigation in Real-World Parking Scenarios
Conference: NeurIPS 2025 (submitted)
Authors: Pengyu Fu*, Jincheng Hu*, Jihao Li*, Ming Liu, Jingjing Jiang, Yuanjian Zhang
All code is available on GitHub.
📌 Dataset Summary
MVVLP includes a multi-view image dataset collected in real parking lots, a… See the full description on the dataset page: https://huggingface.co/datasets/TJIET/MVVLP.
🥭 FruitBench: A Multimodal Benchmark for Fruit Growth Understanding
Paper: FruitBench: A Multimodal Benchmark for Comprehensive Fruit Growth Understanding in Real-World Agriculture
Conference: NeurIPS 2025 (submitted)
Authors: Jihao Li*, Jincheng Hu*, Pengyu Fu*, Ming Liu, et al.
📌 Dataset Summary
FruitBench is the first large-scale multimodal benchmark designed to evaluate vision-language models on real-world agricultural understanding. It focuses on fruit growth… See the full description on the dataset page: https://huggingface.co/datasets/TJIET/FruitBench.