https://choosealicense.com/licenses/other/
jbourcier/fmow-splits dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Spam Detection Dataset
This is the dataset for spam classification task. It contains:
- 'train' subset with 8,175 samples
- 'validation' subset with 1,362 samples
- 'test' subset with 1,636 samples
Source and modifications
This dataset is cloned from Deysi/spam-detection-dataset with the following added processing:
The 'string' labels are converted to integer 'id' labels so the dataset can be used and trained directly with the transformers Trainer, and the original 'test' split (2,725 samples) is divided into 2… See the full description on the dataset page: https://huggingface.co/datasets/tanquangduong/spam-detection-dataset-splits.
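The string-to-id conversion can be sketched in plain Python (the label names "not_spam"/"spam" and their ordering are illustrative assumptions, not taken from the dataset files):

```python
# Map string labels to integer ids so the examples can be consumed
# directly by a classification head (e.g. a transformers Trainer).
# The label names and their order are illustrative assumptions.
label_names = ["not_spam", "spam"]
label2id = {name: i for i, name in enumerate(label_names)}
id2label = {i: name for name, i in label2id.items()}

def encode_labels(examples):
    """Replace string labels with integer ids (batched-map style)."""
    return {"label": [label2id[label] for label in examples["label"]]}

batch = {"label": ["spam", "not_spam", "spam"]}
print(encode_labels(batch))  # {'label': [1, 0, 1]}
```

With the Hugging Face `datasets` library, a mapping like this is typically applied with `dataset.map(encode_labels, batched=True)`.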
Node classification on Film with 60%/20%/20% random splits for training/validation/test.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset, splits, models, and scripts from the manuscript "When Do Quantum Mechanical Descriptors Help Graph Neural Networks Predict Chemical Properties?" are provided. The curated dataset includes 37 QM descriptors for 64,921 unique molecules across six levels of theory: wB97XD, B3LYP, M06-2X, PBE0, TPSS, and BP86. This dataset is stored in the data.tar.gz file, which also contains a file for multitask constraints applied to various atomic and bond properties. The data splits (training, validation, and test splits) for both random and scaffold-based divisions are saved as separate index files in splits.tar.gz. The trained D-MPNN models for predicting QM descriptors are saved in the models.tar.gz file. The scripts.tar.gz file contains ready-to-use scripts for training machine learning models to predict QM descriptors, as well as scripts for predicting QM descriptors using our trained models on unseen molecules and for applying radial basis function (RBF) expansion to QM atom and bond features.
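As a rough sketch of the RBF expansion applied to QM atom and bond features, a scalar descriptor can be spread onto a grid of Gaussian basis functions; the centers, width, and input value below are illustrative assumptions, not the settings used in the released scripts:

```python
import math

def rbf_expand(value, centers, gamma=10.0):
    """Expand a scalar feature into Gaussian radial basis function values."""
    return [math.exp(-gamma * (value - c) ** 2) for c in centers]

# Expand a hypothetical scalar QM descriptor onto 5 evenly spaced centers.
centers = [-1.0, -0.5, 0.0, 0.5, 1.0]
features = rbf_expand(0.3, centers)
print(len(features))  # 5 basis values, largest at the nearest center (0.5)
```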
Below are descriptions of the available scripts:
- atom_bond_descriptors.sh: Trains atom/bond targets.
- atom_bond_descriptors_predict.sh: Predicts atom/bond targets from a pre-trained model.
- dipole_quadrupole_moments.sh: Trains dipole and quadrupole moments.
- dipole_quadrupole_moments_predict.sh: Predicts dipole and quadrupole moments from a pre-trained model.
- energy_gaps_IP_EA.sh: Trains energy gaps, ionization potential (IP), and electron affinity (EA).
- energy_gaps_IP_EA_predict.sh: Predicts energy gaps, IP, and EA from a pre-trained model.
- get_constraints.py: Generates a constraints file for the testing dataset. This generated file must be provided before using our trained models to predict the atom/bond QM descriptors of your testing data.
- csv2pkl.py: Converts QM atom and bond features to .pkl files using RBF expansion, for use with the Chemprop software.

Below is the procedure for running the ml-QM-GNN on your own dataset:
1. Run get_constraints.py to generate the constraint file required for predicting atom/bond QM descriptors with the trained ML models.
2. Run atom_bond_descriptors_predict.sh to predict atom and bond properties.
3. Run dipole_quadrupole_moments_predict.sh and energy_gaps_IP_EA_predict.sh to calculate molecular QM descriptors.
4. Run csv2pkl.py to convert the predicted atom/bond descriptor .csv file into separate atom and bond feature files (saved as .pkl files).

Precinct splits are created when a district boundary cuts through a precinct. Examples include Congressional Districts, State House Representative Districts, and Community Development Districts. Disclaimer: Data provided are derived from multiple sources, with varying levels of accuracy. The St. Johns County Supervisor of Elections disclaims all responsibility for the accuracy or completeness of the data shown herein.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
giordanorogers/countdown-splits dataset hosted on Hugging Face and contributed by the HF Datasets community
This dataset was created by dani
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In bold: the most frequent feature function (without considering the window-size parameters) of each iteration. Mean: average over the ten cross-validated scores and the ten verification scores.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Thermal Detection Split 3 is a dataset for object detection tasks - it contains People annotations for 340 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
RSDD Split 2 is a dataset for object detection tasks - it contains Train Track Damage annotations for 1,353 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Test splits for the categorical emotion datasets CREMA-D, emoDB, IEMOCAP, MELD, RAVDESS used inside audEERING.
For each dataset, a CSV file is provided listing the file names included in the test split.
The test splits were designed to balance gender and emotional categories as well as possible.
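A minimal sketch of such a balanced selection (the field names and the 20% test fraction are assumptions; the actual splits are distributed as per-dataset CSV files):

```python
import random
from collections import defaultdict

def stratified_test_split(records, test_fraction=0.2, seed=0):
    """Select a test set that covers each (gender, emotion) group proportionally."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["gender"], rec["emotion"])].append(rec["file"])
    test = []
    for files in groups.values():
        rng.shuffle(files)
        n = max(1, round(len(files) * test_fraction))  # at least one per group
        test.extend(files[:n])
    return sorted(test)

records = [
    {"file": "a.wav", "gender": "f", "emotion": "angry"},
    {"file": "b.wav", "gender": "f", "emotion": "angry"},
    {"file": "c.wav", "gender": "m", "emotion": "happy"},
    {"file": "d.wav", "gender": "m", "emotion": "happy"},
]
print(stratified_test_split(records))  # one file from each (gender, emotion) group
```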
Dataset Card for "test-dataset-all-splits"
More Information needed
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by MADHUKALAKERI
Released under CC0: Public Domain
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The most recent stock split for Westamerica Bancorporation (WABC) was a 3:1 split on February 26, 1998. The combined total of all historical stock splits for Westamerica Bancorporation results in 6 current shares for every original share available at the IPO in 1980.
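The cumulative share count follows from multiplying the individual split ratios. A sketch (only the 3:1 ratio comes from the text above; the earlier 2:1 ratio is a hypothetical value chosen so the product matches the stated 6 shares):

```python
from functools import reduce

def cumulative_shares(split_ratios):
    """Shares held today per original share, given per-split multipliers."""
    return reduce(lambda acc, ratio: acc * ratio, split_ratios, 1)

# A 3:1 split combined with a hypothetical earlier 2:1 split
# yields 6 current shares per original share.
print(cumulative_shares([3, 2]))  # 6
```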
https://www.coherentmarketinsights.com/privacy-policy
The Ductless Mini Splits Market valuation is estimated to reach USD 17.92 Bn in 2025 and is anticipated to grow to USD 31.31 Bn by 2032, at a steady CAGR of 8.3%.
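The projection is consistent with compound growth over the seven years from 2025 to 2032; a quick check:

```python
def project(value, cagr, years):
    """Compound a starting value at a constant annual growth rate."""
    return value * (1 + cagr) ** years

# USD 17.92 Bn in 2025 at an 8.3% CAGR over 7 years -> ~USD 31.31 Bn in 2032.
print(round(project(17.92, 0.083, 7), 2))
```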
This is the evaluation split for Task 6, Automated Audio Captioning, in DCASE 2020 Challenge.
This evaluation split is the Clotho testing split, which is thoroughly described in the corresponding paper:
K. Drossos, S. Lipping and T. Virtanen, "Clotho: an Audio Captioning Dataset," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 2020, pp. 736-740, doi: 10.1109/ICASSP40776.2020.9052990.
available online at: https://arxiv.org/abs/1910.09387 and at: https://ieeexplore.ieee.org/document/9052990
This evaluation split is meant to be used for the purposes of Task 6 of the scientific challenge DCASE 2020. This split is not meant to be used for developing audio captioning methods. For developing audio captioning methods, you should use the development and evaluation splits of Clotho.
If you want the development and evaluation splits of Clotho dataset, you can find them also in Zenodo, at: https://zenodo.org/record/3490684
== License ==
The audio files in the archives:
clotho_audio_test.7z
and the associated meta-data in the CSV file:
clotho_metadata_test.csv
are under the corresponding licenses (mostly Creative Commons with attribution) of the Freesound [1] platform, stated explicitly in the CSV file for each of the audio files. That is, each audio file in the 7z archive is listed in the CSV file together with its meta-data. The meta-data for each file are:
File name
Start and end samples of the excerpt used in the Clotho dataset
Uploader/user in the Freesound platform (manufacturer)
Link to the licence of the file
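A minimal sketch of collecting the per-file license links from such a metadata CSV (the column names and rows below are illustrative assumptions, not the exact headers or contents of clotho_metadata_test.csv):

```python
import csv
import io

# Hypothetical metadata rows mirroring the fields listed above.
csv_text = """file_name,start_end_samples,manufacturer,license
birds.wav,0-441000,some_user,https://creativecommons.org/licenses/by/4.0/
rain.wav,44100-882000,another_user,https://creativecommons.org/licenses/by-nc/3.0/
"""

# Map each audio file to the license link stated in its metadata row.
licenses = {row["file_name"]: row["license"]
            for row in csv.DictReader(io.StringIO(csv_text))}
print(licenses["birds.wav"])
```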
== References == [1] Frederic Font, Gerard Roma, and Xavier Serra. 2013. Freesound technical demo. In Proceedings of the 21st ACM international conference on Multimedia (MM '13). ACM, New York, NY, USA, 411-412. DOI: https://doi.org/10.1145/2502081.2502245
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The most recent stock split for ABM Industries (ABM) was a 2:1 split on May 7, 2002. The combined total of all historical stock splits for ABM Industries results in 8 current shares for every original share available at the IPO in 1980.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by MUBBASSARKHAN AYUBKHAN JAHAGIRDAR MCA KLEIT
Released under CC0: Public Domain
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Yacine Bouaouni
Released under Apache 2.0
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Motivation: Entity matching is the task of determining which records from different data sources describe the same real-world entity. It is an important task for data integration and has been the focus of many research works. A large number of entity matching/record linkage tasks have been made available for evaluating entity matching methods. However, the lack of fixed development and test splits, as well as of correspondence sets including both matching and non-matching record pairs, hinders the reproducibility and comparability of benchmark experiments. To enhance reproducibility and comparability, we complement existing entity matching benchmark tasks with fixed sets of non-matching pairs as well as fixed development and test splits.

Dataset Description: An augmented version of the amazon-google products dataset for benchmarking entity matching/record linkage methods, found at: https://dbs.uni-leipzig.de/research/projects/object_matching/benchmark_datasets_for_entity_resolutio... The augmented version adds a fixed set of non-matching pairs to the original dataset. In addition, fixed splits for training, validation and testing, as well as their corresponding feature vectors, are provided. The feature vectors are built using data-type-specific similarity metrics. The dataset contains 1,363 records describing products from amazon, which are matched against 3,226 product records from google. The gold standard has manual annotations for 1,298 matching and 6,306 non-matching pairs. The total number of attributes used to describe the product records is 4, while the attribute density is 0.75. The augmented dataset enhances the reproducibility of matching methods and the comparability of matching results. The dataset is part of the CompERBench repository, which provides 21 complete benchmark tasks for entity matching for public download: http://data.dws.informatik.uni-mannheim.de/benchmarkmatchingtasks/index.html
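As an illustration of data-type-specific similarity metrics of the kind used to build such feature vectors (the attribute names and metric choices here are assumptions, not the benchmark's actual feature definitions):

```python
from difflib import SequenceMatcher

def token_jaccard(a, b):
    """Jaccard overlap of whitespace tokens, suited to product titles."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def char_ratio(a, b):
    """Character-level similarity, suited to short strings such as brand names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def price_sim(a, b):
    """Relative price agreement in [0, 1]."""
    return 1.0 - abs(a - b) / max(a, b) if max(a, b) > 0 else 1.0

def feature_vector(rec_a, rec_b):
    """One similarity feature per attribute, each with a type-appropriate metric."""
    return [
        token_jaccard(rec_a["title"], rec_b["title"]),
        char_ratio(rec_a["manufacturer"], rec_b["manufacturer"]),
        price_sim(rec_a["price"], rec_b["price"]),
    ]

a = {"title": "adobe photoshop cs2", "manufacturer": "adobe", "price": 599.0}
b = {"title": "photoshop cs2 upgrade", "manufacturer": "Adobe", "price": 149.0}
print(feature_vector(a, b))  # [title, brand, price] similarities: [0.5, 1.0, ~0.25]
```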