A total of 12 software defect data sets from NASA were used in this study: five data sets (part I), namely CM1, JM1, KC1, KC2, and PC1, were obtained from the PROMISE software engineering repository (http://promise.site.uottawa.ca/SERepository/), and the other seven data sets (part II) were obtained from the tera-PROMISE repository (http://openscience.us/repo/defect/mccabehalsted/).
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
The different algorithms of the imbalanced-learn toolbox are evaluated on a set of common datasets, which are more or less imbalanced. This benchmark was proposed in [1]. The following section presents the main characteristics of this benchmark; a loading sketch follows the table.
ID | Name | Repository & Target | Ratio | # samples | # features |
---|---|---|---|---|---|
1 | Ecoli | UCI, target: imU | 8.6:1 | 336 | 7 |
2 | Optical Digits | UCI, target: 8 | 9.1:1 | 5,620 | 64 |
3 | SatImage | UCI, target: 4 | 9.3:1 | 6,435 | 36 |
4 | Pen Digits | UCI, target: 5 | 9.4:1 | 10,992 | 16 |
5 | Abalone | UCI, target: 7 | 9.7:1 | 4,177 | 8 |
6 | Sick Euthyroid | UCI, target: sick euthyroid | 9.8:1 | 3,163 | 25 |
7 | Spectrometer | UCI, target: >=44 | 11:1 | 531 | 93 |
8 | Car_Eval_34 | UCI, target: good, v good | 12:1 | 1,728 | 6 |
9 | ISOLET | UCI, target: A, B | 12:1 | 7,797 | 617 |
10 | US Crime | UCI, target: >0.65 | 12:1 | 1,994 | 122 |
11 | Yeast_ML8 | LIBSVM, target: 8 | 13:1 | 2,417 | 103 |
12 | Scene | LIBSVM, target: >one label | 13:1 | 2,407 | 294 |
13 | Libras Move | UCI, target: 1 | 14:1 | 360 | 90 |
14 | Thyroid Sick | UCI, target: sick | 15:1 | 3,772 | 28 |
15 | Coil_2000 | KDD, CoIL, target: minority | 16:1 | 9,822 | 85 |
16 | Arrhythmia | UCI, target: 06 | 17:1 | 452 | 279 |
17 | Solar Flare M0 | UCI, target: M->0 | 19:1 | 1,389 | 10 |
18 | OIL | UCI, target: minority | 22:1 | 937 | 49 |
19 | Car_Eval_4 | UCI, target: vgood | 26:1 | 1,728 | 6 |
20 | Wine Quality | UCI, wine, target: <=4 | 26:1 | 4,898 | 11 |
21 | Letter Img | UCI, target: Z | 26:1 | 20,000 | 16 |
22 | Yeast_ME2 | UCI, target: ME2 | 28:1 | 1,484 | 8 |
23 | Webpage | LIBSVM, w7a, target: minority | 33:1 | 49,749 | 300 |
24 | Ozone Level | UCI, ozone, data | 34:1 | 2,536 | 72 |
25 | Mammography | UCI, target: minority | 42:1 | 11,183 | 6 |
26 | Protein homo. | KDD CUP 2004, minority | 111:1 | 145,751 | 74 |
27 | Abalone_19 | UCI, target: 19 | 130:1 | 4,177 | 8 |
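These datasets can also be fetched programmatically. Below is a minimal sketch (assuming the imbalanced-learn package is installed; dataset keys are the lowercase, underscore-separated names from the table, e.g. "ecoli", "abalone_19"):

import numpy as np
from imblearn.datasets import fetch_datasets

# Fetch two of the 27 benchmark datasets by name.
datasets = fetch_datasets(filter_data=("ecoli", "abalone_19"))
for name, bunch in datasets.items():
    X, y = bunch.data, bunch.target
    classes, counts = np.unique(y, return_counts=True)
    print(name, X.shape, dict(zip(classes.tolist(), counts.tolist())))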
[1] Ding, Zejin, "Diversified Ensemble Classifiers for Highly Imbalanced Data Learning and their Application in Bioinformatics." Dissertation, Georgia State University, (2011).
[2] Blake, Catherine, and Christopher J. Merz. "UCI Repository of machine learning databases." (1998).
[3] Chang, Chih-Chung, and Chih-Jen Lin. "LIBSVM: a library for support vector machines." ACM Transactions on Intelligent Systems and Technology (TIST) 2.3 (2011): 27.
[4] Caruana, Rich, Thorsten Joachims, and Lars Backstrom. "KDD-Cup 2004: results and analysis." ACM SIGKDD Explorations Newsletter 6.2 (2004): 95-108.
Tabular Benchmark
Dataset Description
This dataset is a curation of various datasets from OpenML, assembled to benchmark the performance of various machine learning algorithms.
Repository: https://github.com/LeoGrin/tabular-benchmark/community
Paper: https://hal.archives-ouvertes.fr/hal-03723551v2/document
Dataset Summary
A benchmark made of a curation of various tabular data learning tasks, including:
Regression from Numerical and Categorical Features… See the full description on the dataset page: https://huggingface.co/datasets/inria-soda/tabular-benchmark.
This repository contains datasets to quickly test graph classification algorithms, such as graph kernels and graph neural networks. The datasets are constructed so that the features on the nodes and the adjacency matrix are completely uninformative if considered alone; therefore, an algorithm that relies only on the node features or only on the graph structure will fail to achieve good classification results. A more detailed description of the dataset construction can be found on the GitHub page (https://github.com/FilippoMB/Benchmark_dataset_for_graph_classification), in the original publication (Bianchi, Filippo Maria, Claudio Gallicchio, and Alessio Micheli. "Pyramidal Reservoir Graph Neural Network." Neurocomputing 470 (2022): 389-404), and in the README.txt file.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This repository contains the datasets and experiment results presented in our arXiv paper:
B. Hoffman, M. Cusimano, V. Baglione, D. Canestrari, D. Chevallier, D. DeSantis, L. Jeantet, M. Ladds, T. Maekawa, V. Mata-Silva, V. Moreno-González, A. Pagano, E. Trapote, O. Vainio, A. Vehkaoja, K. Yoda, K. Zacarian, A. Friedlaender, "A benchmark for computational analysis of animal behavior, using animal-borne tags," 2023.
Standardized code to implement, train, and evaluate models can be found at https://github.com/earthspecies/BEBE/.
Please note the licenses in each dataset folder.
Zip folders beginning with "formatted": These are the datasets we used to run the experiments reported in the benchmark paper.
Zip folders beginning with "raw": These are the unprocessed datasets used in BEBE. Code to process these raw datasets into the formatted ones used by BEBE can be found at https://github.com/earthspecies/BEBE-datasets/.
Zip folders beginning with "experiments": Results of the cross-validation experiments reported in the paper, as well as hyperparameter optimization. Confusion matrices for all experiments can also be found here. Note that dt, rf, and svm refer to the feature set from Nathan et al., 2012.
Results used in Fig. 4 of the arXiv paper (deep neural networks vs. classical models)
{dataset}_harnet_nogyr
{dataset}_CRNN
{dataset}_CNN
{dataset}_dt
{dataset}_rf
{dataset}_svm
{dataset}_wavelet_dt
{dataset}_wavelet_rf
{dataset}_wavelet_svm
Results used in Fig. 5D of the arXiv paper (full data setting)
If dataset contains gyroscope (HAR, jeantet_turtles, vehkaoja_dogs):
{dataset}_harnet_nogyr
{dataset}_harnet_random_nogyr
{dataset}_harnet_unfrozen_nogyr
{dataset}_RNN_nogyr
{dataset}_CRNN_nogyr
{dataset}_rf_nogyr
Otherwise:
{dataset}_harnet_nogyr
{dataset}_harnet_unfrozen_nogyr
{dataset}_harnet_random_nogyr
{dataset}_RNN_nogyr
{dataset}_CRNN
{dataset}_rf
Results used in Fig. 5E of the arXiv paper (reduced data setting)
If dataset contains gyroscope (HAR, jeantet_turtles, vehkaoja_dogs):
{dataset}_harnet_low_data_nogyr
{dataset}_harnet_random_low_data_nogyr
{dataset}_harnet_unfrozen_low_data_nogyr
{dataset}_RNN_low_data_nogyr
{dataset}_wavelet_RNN_low_data_nogyr
{dataset}_CRNN_low_data_nogyr
{dataset}_rf_low_data_nogyr
Otherwise:
{dataset}_harnet_low_data_nogyr
{dataset}_harnet_random_low_data_nogyr
{dataset}_harnet_unfrozen_low_data_nogyr
{dataset}_RNN_low_data_nogyr
{dataset}_wavelet_RNN_low_data_nogyr
{dataset}_CRNN_low_data
{dataset}_rf_low_data
CSV files: we also include summaries of the experimental results in experiments_summary.csv, experiments_by_fold_individual.csv, and experiments_by_fold_behavior.csv; a loading sketch follows the column descriptions below.
experiments_summary.csv - results averaged over individuals and behavior classes
dataset (str): name of dataset
experiment (str): name of model with experiment setting
fig4 (bool): True if dataset+experiment was used in Fig. 4 of the arXiv paper
fig5d (bool): True if dataset+experiment was used in Fig. 5D of the arXiv paper
fig5e (bool): True if dataset+experiment was used in Fig. 5E of the arXiv paper
f1_mean (float): mean of macro-averaged F1 score, averaged over individuals in test folds
f1_std (float): standard deviation of macro-averaged F1 score, computed over individuals in test folds
prec_mean, prec_std (float): analogous for precision
rec_mean, rec_std (float): analogous for recall
experiments_by_fold_individual.csv - results per individual in the test folds
dataset (str): name of dataset
experiment (str): name of model with experiment setting
fig4 (bool): True if dataset+experiment was used in Fig. 4 of the arXiv paper
fig5d (bool): True if dataset+experiment was used in Fig. 5D of the arXiv paper
fig5e (bool): True if dataset+experiment was used in Fig. 5E of the arXiv paper
fold (int): test fold index
individual (int): individuals are numbered zero-indexed, starting from fold 1
f1 (float): macro-averaged f1 score for this individual
precision (float): macro-averaged precision for this individual
recall (float): macro-averaged recall for this individual
experiments_by_fold_behavior.csv - results per behavior class, for each test fold
dataset (str): name of dataset
experiment (str): name of model with experiment setting
fig4 (bool): True if dataset+experiment was used in Fig. 4 of the arXiv paper
fig5d (bool): True if dataset+experiment was used in Fig. 5D of the arXiv paper
fig5e (bool): True if dataset+experiment was used in Fig. 5E of the arXiv paper
fold (int): test fold index
behavior_class (str): name of behavior class
f1 (float): f1 score for this behavior, averaged over individuals in the test fold
precision (float): precision for this behavior, averaged over individuals in the test fold
recall (float): recall for this behavior, averaged over individuals in the test fold
train_ground_truth_label_counts (int): number of timepoints labeled with this behavior class, in the training set
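A minimal loading sketch (assuming pandas, and that experiments_summary.csv sits in the working directory; column names follow the descriptions above):

import pandas as pd

summary = pd.read_csv("experiments_summary.csv")
# Keep only the dataset+experiment pairs behind Fig. 4 and rank them
# by mean macro-averaged F1 score.
fig4 = summary[summary["fig4"]]
ranked = fig4.sort_values("f1_mean", ascending=False)
print(ranked[["dataset", "experiment", "f1_mean", "f1_std"]].head())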
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This dataset is a collection of benchmark data sets used in the experiments of the paper
"Compression for 2-Parameter Persistent Homology"
by Ulderico Fugacci, Michael Kerber, and Alexander Rolle.
A detailed description of the datasets can be found in that paper. The file format is partially firep (as described in the Rivet library documentation) and partially scc2020.
The repository also contains the scripts used to generate the instances and to run the benchmarks from the cited paper. Executing them requires several additional libraries: generating the geometric examples requires CGAL, and running the benchmarks requires mpfree, multi-chunk, phat, and Rivet.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
NeuroTask is a benchmark dataset designed to facilitate the development of accurate and efficient methods for analyzing multi-session, multi-task, and multi-subject neural data. NeuroTask integrates 6 datasets from motor cortical regions, covering 7 tasks across 19 subjects.
This dataset includes indices that uniquely identify each session using datasetID, animal, and session.
The file naming convention is as follows:
{datasetID}_{bin size}_{dataset name}_{task}.parquet
Check out the github repository for more resources and some example notebooks: https://github.com/catniplab/NeuroTask/
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
HumanEval-V: Benchmarking High-Level Visual Reasoning with Complex Diagrams in Coding Tasks
📄 Paper • 🏠 Home Page • 💻 GitHub Repository • 🏆 Leaderboard • 🤗 Dataset Viewer
HumanEval-V is a novel benchmark designed to evaluate the diagram understanding and reasoning capabilities of Large Multimodal Models (LMMs) in programming contexts. Unlike existing benchmarks, HumanEval-V focuses on coding tasks that require sophisticated visual reasoning over… See the full description on the dataset page: https://huggingface.co/datasets/HumanEval-V/HumanEval-V-Benchmark.
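A minimal loading sketch (assuming the Hugging Face datasets library; no split name is passed, since splits are not documented here):

from datasets import load_dataset

# Download the benchmark from the Hub and inspect its structure.
ds = load_dataset("HumanEval-V/HumanEval-V-Benchmark")
print(ds)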
RTL-Repo Benchmark
This repository contains the data for the RTL-Repo benchmark introduced in the paper RTL-Repo: A Benchmark for Evaluating LLMs on Large-Scale RTL Design Projects.
👋 Overview
RTL-Repo is a benchmark for evaluating LLMs' effectiveness in generating Verilog code autocompletions within large, complex codebases. It assesses the model's ability to understand and remember the entire Verilog repository context and generate new code that is correct, relevant… See the full description on the dataset page: https://huggingface.co/datasets/ahmedallam/RTL-Repo.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The HornMT repository contains data and the associated metadata for the project Machine Translation Benchmark Dataset for Languages in the Horn of Africa. It is a multi-way parallel corpus that will serve as a benchmark to accelerate progress in machine translation research and production systems for languages in the Horn of Africa.
Supported Languages
Language | ISO 639-3 code |
---|---|
Afar | aaf |
Amharic | amh |
English | eng |
Oromo | orm |
Somali | som |
Tigrinya | tir |
data/ contains one text file per language and each file contains news snippets in the same order for each language.
data
├── aar.txt
├── amh.txt
├── eng.txt
├── orm.txt
├── som.txt
└── tir.txt
metadata.tsv contains tab-separated data describing each news snippet. The metadata contains the following fields:
Scope - describes whether the news is global or local. It takes two values: Global news and Local news.
Category - News category covering the following 12 topics:
Art and Culture
Business and Economy
Conflicts and Attacks
Disaster and Accidents
Entertainment
Environment
Health
International Relations
Law and Crime
Politics
Science and Technology
Sport
Source - List of one or more URLs from which the news content was extracted or on which it is based.
Domain - TLD corresponding to the URL(s) in Source.
Date - The publication date of the source article. The format is yyyy-mm-dd.
Other formats
All the data and associated metadata together in one file is also available in other file formats.
HornMT.xlsx - data and associated metadata in xlsx format.
HornMT.json - data and associated metadata in json format.
Below is an example row.
{ "data":{ "eng":"The World Meteorological Organisation reports that the ozone layer is damaged to its worst extent ever in the Arctic.", "aaf":"Baad Metrolojih Eglali Areketekeh Addal Ozonih qelu faxe waktik lafetle calat biyakisem xayose.", "amh":"የአለም የአየር ንብረት ድርጅት በአርክቲክ አካባቢ ያለው የኦዞን ምንጣፍ ከፍተኛ ጉዳት እንደደረሰበት አስታወቀ፡፡", "orm":"Dhaabbanni Meetiroolojii Addunyaa baqqaanni oozonii Arkiitik keessatti gara sadarkaa isa hamaa haga ammaatti akka miidhame gabaase.", "som":"Ururka Saadaasha Hawada Adduunka ayaa ku warramaya in lakabka ozoneka ee Ka koreeya dhulka baraflayda uu waxyeelladii abid ugu darnaa soo gaadhay.", "tir":"ውድብ ሜትሮሎጂ ዓለም ኣብ ኣርክቲክ ዝርከብ ናሕሲ ኦዞን ኣዝዩ ብዝኸፍአ ደረጃ ከምዝተጎድአ ሓቢሩ፡፡" }, "metadata":{ "scope":"Global", "category":"Science and Technology", "source":"https://www.independent.co.uk/environment/climate-change/ozone-layer-damaged-by-unusually-harsh-winter-2263653.html", "domain":"www.independent.co.uk", "date":"2011-04-05" } }
Team
Afar
Mohammed Deresa
Yasin Nur
Amharic
Tigist Taye
Selamawit Hailemariam
Wako Tilahun
Oromo
Gemechis Melkamu
Galata Girmaye
Somali
Abdiselam Mohamed
Beshir Abdi
Tigrinya
Berhanu Abadi Weldegiorgis
Michael Minassie
Nureddin Mohammedshiek
Project Leaders
Asmelash Teka Hadgu asme@lesan.ai
Gebrekirstos G. Gebremeskel gebrekirstos.gebremeskel@ru.nl
Abel Aregawi abel@lesan.ai
License
This work is licensed under a Creative Commons Attribution 4.0 International License.
Python wrapper for the Penn Machine Learning Benchmarks (PMLB) data repository, a large, curated repository of benchmark datasets for evaluating supervised machine learning algorithms. Distributed via PyPI (https://pypi.org/).
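A minimal usage sketch (assuming the pmlb package is installed from PyPI; "mushroom" is just an illustrative dataset name):

from pmlb import fetch_data, dataset_names

print(len(dataset_names))  # number of datasets available in PMLB
# Fetch a single dataset as a feature matrix and target vector.
X, y = fetch_data("mushroom", return_X_y=True)
print(X.shape, y.shape)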
🤗 MigrationBench is a large-scale code migration benchmark dataset at the repository level, across multiple programming languages.
The current and initial release includes Java 8 repositories with the Maven build system, as of May 2025.
It has 3 datasets:
🤗 migration-bench-java-full has 5,102 repos, and each of them has a test directory or at least one test case.
🤗 migration-bench-java-selected is a subset of migration-bench-java-full, with 300 repos.
🤗 migration-bench-java-utg contains 4,184 repos, complementary to migration-bench-java-full.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
[UPDATE] You can now access the MultiSen (GE and NA) collection through this portal: https://doi.theia.data-terra.org/ai4lcc/?lang=en
MultiSenGE is a new large-scale multimodal and multitemporal benchmark dataset covering one of the biggest administrative regions in the eastern part of France. It contains 8,157 patches of 256 × 256 pixels for Sentinel-2 L2A, Sentinel-1 GRD, and a regional LULC topographic database.
Every file follows a specific naming convention:
Sentinel-1 patches: {tile}_{date}_S1_{x-pixel-coordinate}_{y-pixel-coordinate}.tif
Sentinel-2 patches: {tile}_{date}_S2_{x-pixel-coordinate}_{y-pixel-coordinate}.tif
Ground reference patches: {tile}_GR_{x-pixel-coordinate}_{y-pixel-coordinate}.tif
JSON Labels: {tile}_{x-pixel-coordinate}_{y-pixel-coordinate}.json
where tile is the Sentinel-2 tile number, date is the acquisition date of the patch, and x-pixel-coordinate and y-pixel-coordinate are the coordinates of the patch within the tile.
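As an illustration, a small hypothetical helper (not part of the official MultiSenGE-Tools) that unpacks the Sentinel-1/Sentinel-2 naming convention; the example filename is invented:

import os

def parse_patch_name(path):
    # {tile}_{date}_{S1|S2}_{x-pixel-coordinate}_{y-pixel-coordinate}.tif
    stem = os.path.splitext(os.path.basename(path))[0]
    tile, date, sensor, x, y = stem.split("_")
    return {"tile": tile, "date": date, "sensor": sensor,
            "x": int(x), "y": int(y)}

print(parse_patch_name("T32ULU_20200801_S2_1024_2048.tif"))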
In addition, you can find a set of useful python tools for extracting information about the dataset on Github : https://github.com/r-wenger/MultiSenGE-Tools
The first experiments based on this dataset are in press in ISPRS Annals: Wenger, R., Puissant, A., Weber, J., Idoumghar, L., and Forestier, G.: MultiSenGE: A Multimodal and Multitemporal Benchmark Dataset for Land Use/Land Cover Remote Sensing Applications, ISPRS Ann. Photogramm. Remote Sens. Spatial Inf. Sci., V-3-2022, 635–640, https://doi.org/10.5194/isprs-annals-V-3-2022-635-2022, 2022.
Due to the large size of the dataset, this Zenodo repository only hosts the associated JSON files. To download the Sentinel-1 and Sentinel-2 patches and the reference data, use the following links:
Sentinel-1 temporal series patches: https://s3.unistra.fr/a2s_datasets/MultiSenGE/s1.tgz
Sentinel-2 temporal series patches: https://s3.unistra.fr/a2s_datasets/MultiSenGE/s2.tgz
Ground reference patches: https://s3.unistra.fr/a2s_datasets/MultiSenGE/ground_reference.tgz
JSON files for each patch: https://s3.unistra.fr/a2s_datasets/MultiSenGE/labels.tgz
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Our dataset "repository_survey" summarizes a comprehensive survey of over 150 data repositories, characterizing their metadata documentation and standardization, data curation and validation, and tracking of dataset use in the literature. In addition, "survey_model_evaluation" includes our findings on model evaluation for five benchmark repositories. Column descriptions and further details can be found in "README.pdf." The data are associated with our paper "Benchmark Data Repositories: Lessons and Recommendations."
Apache License 2.0: http://www.apache.org/licenses/LICENSE-2.0
This resource can be used for benchmarking different RDF modelling solutions for statement-level metadata, namely:
RDF Reification,
Singleton Property,
RDF* (RDF-star).
More details about this resource can be found in the following publication:
Fabrizio Orlandi, Damien Graux, Declan O'Sullivan, "Benchmarking RDF Metadata Representations: Reification, Singleton Property and RDF*", 15th IEEE International Conference on Semantic Computing (ICSC), 2021.
Pre-print available at: http://fabriziorlandi.net/pdf/2021/ICSC2021_REF-Benchmark.pdf
The dataset contains 3 different versions of the Biomedical Knowledge Repository (BKR) knowledge graph, as described in:
Vinh Nguyen, Olivier Bodenreider, Amit Sheth. "Don't Like RDF Reification? Making Statements About Statements Using Singleton Property" WWW 2014, doi: 10.1145/2566486.2567973.
and,
Satya S. Sahoo, Olivier Bodenreider, Pascal Hitzler, Amit Sheth and Krishnaprasad Thirunarayan. "Provenance Context Entity (PaCE): Scalable Provenance Tracking for Scientific RDF Data" in Sci Stat Database Manag. 2010; 6187: 461–470. doi: 10.1007/978-3-642-13818-8_32
The 3 knowledge graph dumps are packaged as gzipped RDF files in Turtle (and Turtle*) syntax:
BKR-R-fullKGdump.ttl.gz for the Reification method,
BKR-S-fullKGdump.ttl.gz for the Singleton method,
BKR-star-fullKGdump.ttls.gz for the RDF* (RDF-star) method.
The RDF REiFication Benchmark (REF) also includes a set of SPARQL (and SPARQL*) queries that can be used to compare the performance of different triplestores.
Details about the SPARQL queries, and the queries themselves, are included in the "REF-Benchmark.tar.gz" archive. The queries are named after the dataset they are designed for (BKR-R, BKR-S, or BKR-star), followed by a letter identifying the query set and a query number.
E.g., the query in the file "BKR-R_F-Q3.rq" is for the BKR-R (standard reification) dataset, belongs to query set "F", and is query number 3 of that set. The same query, translated for the RDF* dataset into SPARQL* syntax, is contained in "BKR-star_F-Q3.rq".
Sets "A" and "B" are derived from the queries introduced by V. Nguyen et al. in: "Don't Like RDF Reification? Making Statements About Statements Using Singleton Property" WWW 2014, doi: 10.1145/2566486.2567973. Set "F" has been designed more with RDF* in mind as part of this benchmark (see [Orlandi et al., ICSC 2021])
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Music recommender systems can offer users personalized and contextualized recommendations and are therefore important for music information retrieval. An increasing number of datasets have been compiled to facilitate research on different topics, such as content-based, context-based, or next-song recommendation. However, these topics are usually addressed separately using different datasets, due to the lack of a unified dataset that contains a large variety of feature types such as item features, user contexts, and timestamps. To address this issue, we propose a large-scale benchmark dataset called #nowplaying-RS, which contains 11.6 million music listening events (LEs) of 139K users and 346K tracks collected from Twitter. The dataset comes with a rich set of item content features and user context features, and the timestamps of the LEs. Moreover, some of the user context features imply the cultural origin of the users, and some others, like hashtags, give clues to the emotional state of a user underlying an LE. In this paper, we provide some statistics to give insight into the dataset, and some directions in which the dataset can be used for music recommendation. We also provide standardized training and test sets for experimentation, and some baseline results obtained by using factorization machines.
The dataset contains three files:
user_track_hashtag_timestamp.csv contains basic information about each listening event. For each listening event, we provide an id, the user_id, track_id, hashtag, and created_at.
context_content_features.csv: contains all context and content features. For each listening event, we provide the id of the event, user_id, track_id, artist_id, content features regarding the track mentioned in the event (instrumentalness, liveness, speechiness, danceability, valence, loudness, tempo, acousticness, energy, mode, key) and context features regarding the listening event (coordinates (as geoJSON), place (as geoJSON), geo (as geoJSON), tweet_language, created_at, user_lang, time_zone, entities contained in the tweet).
sentiment_values.csv contains sentiment information for hashtags. It contains the hashtag itself and the sentiment values gathered via four different sentiment dictionaries: AFINN, Opinion Lexicon, SentiStrength Lexicon, and VADER. For each of these dictionaries we list the minimum, maximum, sum, and average of all sentiments of the tokens of the hashtag (if available; else we list empty values). However, as most hashtags only consist of a single token, these values are equal in most cases. Please note that the lexica are rather diverse and are therefore able to resolve very different terms against a score; hence, the resulting CSV is rather sparse. In the column names, we abbreviate all scores gathered over the Opinion Lexicon with the prefix 'ol'; similarly, 'ss' stands for SentiStrength.
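A minimal sketch (assuming pandas; column names follow the file descriptions above) that attaches hashtag sentiment to each listening event:

import pandas as pd

events = pd.read_csv("user_track_hashtag_timestamp.csv")
sentiment = pd.read_csv("sentiment_values.csv")

# Left join keeps every listening event; since sentiment_values.csv is
# sparse, events whose hashtag has no dictionary entry get NaN scores.
merged = events.merge(sentiment, on="hashtag", how="left")
print(merged.head())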
Please also find the training and test-splits for the dataset in this repo. Also, prototypical implementations of a context-aware recommender system based on the dataset can be found at https://github.com/asmitapoddar/nowplaying-RS-Music-Reco-FM.
If you make use of this dataset, please cite the following paper where we describe and experiment with the dataset:
@inproceedings{smc18,
  title = {#nowplaying-RS: A New Benchmark Dataset for Building Context-Aware Music Recommender Systems},
  author = {Asmita Poddar and Eva Zangerle and Yi-Hsuan Yang},
  url = {http://mac.citi.sinica.edu.tw/~yang/pub/poddar18smc.pdf},
  year = {2018},
  date = {2018-07-04},
  booktitle = {Proceedings of the 15th Sound & Music Computing Conference},
  address = {Limassol, Cyprus},
  note = {code at https://github.com/asmitapoddar/nowplaying-RS-Music-Reco-FM},
  tppubtype = {inproceedings}
}
MIT License: https://opensource.org/licenses/MIT
AerialExtreMatch — Benchmark Dataset
Code | Project Page | Paper (WIP)
This repo contains the benchmark set for our paper AerialExtreMatch: A Benchmark for Extreme-View Image Matching and Localization. 32 difficulty levels are included. We also provide train and localization datasets.
Usage
Simply clone this repository and unzip the dataset files:
git clone git@hf.co:datasets/Xecades/AerialExtreMatch-Benchmark
cd AerialExtreMatch-Benchmark
unzip "*.zip"
rm -rf *.zip
… See the full description on the dataset page: https://huggingface.co/datasets/Xecades/AerialExtreMatch-Benchmark.
This repository contains the dataset used in the paper "Enhancing Kitchen Activity Recognition: A Benchmark Study of the Rostock KTA Dataset" by Dr. Samaneh Zolfaghari, Teodor Stoev, and Prof. Dr. Kristina Yordanova. If you use the dataset, please cite the paper using the BibTeX below:
@ARTICLE{10409517,
  author={Zolfaghari, Samaneh and Stoev, Teodor and Yordanova, Kristina},
  journal={IEEE Access},
  title={Enhancing Kitchen Activity Recognition: A Benchmark Study of the Rostock KTA Dataset},
  year={2024},
  volume={},
  number={},
  pages={1-1},
  doi={10.1109/ACCESS.2024.3356352}}
as well as the original KTA dataset paper "Kitchen task assessment dataset for measuring errors due to cognitive impairments" by Yordanova, Kristina and Hein, Albert and Kirste, Thomas:
@inproceedings{yordanova2020kitchen,
  title={Kitchen task assessment dataset for measuring errors due to cognitive impairments},
  author={Yordanova, Kristina and Hein, Albert and Kirste, Thomas},
  booktitle={2020 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops)},
  pages={1--6},
  year={2020},
  organization={IEEE}}
Description of the files
All the archive files containing our data are in the folder data, which contains two other folders: all_actions (containing the experimental data and the labels we used for the evaluation of the classifier when trained with all action classes in the KTA dataset) and most_common_actions (containing the experimental data and labels we used to evaluate the classifier on the 6 most common actions).
Dataset Card for MMIU
Repository: https://github.com/OpenGVLab/MMIU
Paper: https://arxiv.org/abs/2408.02718
Project Page: https://mmiu-bench.github.io/
Point of Contact: Fanqing Meng
Introduction
MMIU encompasses 7 types of multi-image relationships, 52 tasks, 77K images, and 11K meticulously curated multiple-choice questions, making it the most extensive benchmark of its kind. Our evaluation of 24 popular MLLMs, including both open-source and proprietary models… See the full description on the dataset page: https://huggingface.co/datasets/FanqingM/MMIU-Benchmark.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Set of results for the benchmark dataset for long text similarity research. The results were tested with the following language models: all-MiniLM_6_v2, all-MiniLM-L12-v2, all-mpnet-base-v2, glove.6B.300d, Longformer, BigBird, GPT2, BART. The repository containing the code and dataset is available at: https://github.com/omarzatarain/long-texts-similarity