100+ datasets found

alpaca-train-validation-test-split
huggingface.co
Updated Aug 12, 2023
Cite
Doula Isham Rashik Hasan (2023). alpaca-train-validation-test-split [Dataset]. https://huggingface.co/datasets/disham993/alpaca-train-validation-test-split
Available formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Aug 12, 2023
Authors
Doula Isham Rashik Hasan
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Dataset Card for Alpaca

I have only performed a train, validation, and test split on the original dataset. A repository to reproduce this will be shared here soon. The original dataset card follows.

Dataset Summary

Alpaca is a dataset of 52,000 instructions and demonstrations generated by OpenAI's text-davinci-003 engine. This instruction data can be used to conduct instruction-tuning for language models and make the language model follow instructions better.… See the full description on the dataset page: https://huggingface.co/datasets/disham993/alpaca-train-validation-test-split.
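The listing does not say which ratios or tooling were used for the train, validation, and test split. As a rough sketch only, a seeded three-way split over a list of records might look like the following; the function name `three_way_split`, the 80/10/10 fractions, and the toy records are all assumptions made for illustration.

```python
import random


def three_way_split(records, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle a copy of `records` and cut it into train/validation/test lists."""
    if not 0 <= val_frac + test_frac < 1:
        raise ValueError("val_frac + test_frac must be in [0, 1)")
    shuffled = list(records)
    # A dedicated Random instance keeps the split reproducible for a given seed.
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_frac)
    n_test = int(len(shuffled) * test_frac)
    n_train = len(shuffled) - n_val - n_test
    return {
        "train": shuffled[:n_train],
        "validation": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],
    }


# Toy stand-in for instruction records (hypothetical field names and contents).
records = [{"instruction": f"task {i}", "output": f"answer {i}"} for i in range(1000)]
splits = three_way_split(records)
print({name: len(rows) for name, rows in splits.items()})
```

Fixing the seed is what makes a published split reproducible; changing it yields a different partition of the same records.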
Train Test Split For Freiburg In Yolov7 Format Dataset
universe.roboflow.com
zip
Updated Aug 4, 2023
Cite
Isaac H (2023). Train Test Split For Freiburg In Yolov7 Format Dataset [Dataset]. https://universe.roboflow.com/isaac-h/train-test-split-for-freiburg-dataset-in-yolov7-format
Available download formats: zip
Dataset updated
Aug 4, 2023
Dataset authored and provided by
Isaac H
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Freiburg im Breisgau
Variables measured
Groceries Bounding Boxes
Description
Train Test Split For Freiburg Dataset In YOLOv7 Format

## Overview
Train Test Split For Freiburg Dataset In YOLOv7 Format is a dataset for object detection tasks. It contains Groceries annotations for 8,879 images.

## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
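For readers unfamiliar with the label layout usually meant by "YOLO format", each bounding box is typically stored as one text line: a class index followed by the box center and size, normalized by the image dimensions. The sketch below converts a pixel-space box into such a line. The helper name `to_yolo_line`, the class index, and the example box are hypothetical and are not taken from this dataset.

```python
def to_yolo_line(class_id, box, image_size):
    """Convert a pixel-space (x_min, y_min, x_max, y_max) box to a YOLO label line."""
    x_min, y_min, x_max, y_max = box
    width, height = image_size
    # YOLO labels store the box center and size as fractions of the image size.
    x_center = (x_min + x_max) / 2 / width
    y_center = (y_min + y_max) / 2 / height
    box_w = (x_max - x_min) / width
    box_h = (y_max - y_min) / height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {box_w:.6f} {box_h:.6f}"


# Hypothetical example: class 3, a box centered in a 256x192 image.
line = to_yolo_line(3, (64, 48, 192, 144), (256, 192))
print(line)
```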
arc-agi-prompts-train-test-split
huggingface.co
Updated Jun 1, 2025
Cite
Bryce Sandlund (2025). arc-agi-prompts-train-test-split [Dataset]. https://huggingface.co/datasets/bcsandlund/arc-agi-prompts-train-test-split
Dataset updated
Jun 1, 2025
Authors
Bryce Sandlund
Description
bcsandlund/arc-agi-prompts-train-test-split dataset hosted on Hugging Face and contributed by the HF Datasets community
arc-agi-prompts-train-test-split
huggingface.co
Updated Jul 8, 2025
Cite
Pritish Saha (2025). arc-agi-prompts-train-test-split [Dataset]. https://huggingface.co/datasets/Pritish92/arc-agi-prompts-train-test-split
Dataset updated
Jul 8, 2025
Authors
Pritish Saha
Description
Pritish92/arc-agi-prompts-train-test-split dataset hosted on Hugging Face and contributed by the HF Datasets community
PubMed (60%/20%/20% random splits) Dataset
paperswithcode.com
Cite
PubMed (60%/20%/20% random splits) Dataset [Dataset]. https://paperswithcode.com/dataset/pubmed-60-20-20-random-splits
Description
Node classification on PubMed with 60%/20%/20% random splits for training/validation/test.
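A 60%/20%/20% random split for node classification is commonly represented as three boolean masks over the node indices. The following is a minimal sketch of that idea, not the procedure used to produce the published splits; `random_split_masks`, the seed, and the graph size of 100 nodes are assumptions chosen for illustration.

```python
import random


def random_split_masks(num_nodes, fractions=(0.6, 0.2, 0.2), seed=0):
    """Return boolean train/val/test masks that partition the node indices."""
    order = list(range(num_nodes))
    random.Random(seed).shuffle(order)
    n_train = round(fractions[0] * num_nodes)
    n_val = round(fractions[1] * num_nodes)
    parts = {
        "train": order[:n_train],
        "val": order[n_train:n_train + n_val],
        "test": order[n_train + n_val:],  # the remainder goes to the test set
    }
    masks = {}
    for name, indices in parts.items():
        mask = [False] * num_nodes
        for i in indices:
            mask[i] = True
        masks[name] = mask
    return masks


# Hypothetical graph with 100 nodes.
masks = random_split_masks(num_nodes=100)
print({name: sum(mask) for name, mask in masks.items()})
```

Repeating the call with different seeds gives several independent random splits over the same graph.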
Dataset, splits, models, and scripts for the QM descriptors prediction
zenodo.org
explore.openaire.eu
application/gzip
Updated Apr 4, 2024
Cite
Shih-Cheng Li; Haoyang Wu; Angiras Menon; Kevin A. Spiekermann; Yi-Pei Li; William H. Green (2024). Dataset, splits, models, and scripts for the QM descriptors prediction [Dataset]. http://doi.org/10.5281/zenodo.10668491
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.10668491
Dataset updated
Apr 4, 2024
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Shih-Cheng Li; Haoyang Wu; Angiras Menon; Kevin A. Spiekermann; Yi-Pei Li; William H. Green
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset, splits, models, and scripts from the manuscript "When Do Quantum Mechanical Descriptors Help Graph Neural Networks Predict Chemical Properties?" are provided. The curated dataset includes 37 QM descriptors for 64,921 unique molecules across six levels of theory: wB97XD, B3LYP, M06-2X, PBE0, TPSS, and BP86. This dataset is stored in the data.tar.gz file, which also contains a file for multitask constraints applied to various atomic and bond properties. The data splits (training, validation, and test splits) for both random and scaffold-based divisions are saved as separate index files in splits.tar.gz. The trained D-MPNN models for predicting QM descriptors are saved in the models.tar.gz file. The scripts.tar.gz file contains ready-to-use scripts for training machine learning models to predict QM descriptors, as well as scripts for predicting QM descriptors using our trained models on unseen molecules and for applying radial basis function (RBF) expansion to QM atom and bond features.
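The difference between a random and a scaffold-based division is that the latter keeps all molecules sharing a scaffold in the same partition. The sketch below illustrates that grouping idea with plain string keys standing in for scaffolds. It is not the authors' splitting code: computing real scaffolds needs a cheminformatics toolkit, and the function name `group_split`, the 80/10/10 fractions, and the largest-group-first heuristic are assumptions.

```python
from collections import defaultdict


def group_split(keys, fractions=(0.8, 0.1, 0.1)):
    """Assign indices to train/val/test so that equal keys never straddle splits."""
    groups = defaultdict(list)
    for index, key in enumerate(keys):
        groups[key].append(index)
    # Place the largest groups first (an assumed, commonly used heuristic).
    ordered = sorted(groups.values(), key=len, reverse=True)
    train_cap = fractions[0] * len(keys)
    val_cap = fractions[1] * len(keys)
    split = {"train": [], "val": [], "test": []}
    for members in ordered:
        if len(split["train"]) + len(members) <= train_cap:
            split["train"].extend(members)
        elif len(split["val"]) + len(members) <= val_cap:
            split["val"].extend(members)
        else:
            split["test"].extend(members)
    return split


# Hypothetical scaffold keys for ten molecules.
scaffold_keys = ["A"] * 5 + ["B"] * 3 + ["C", "D"]
split = group_split(scaffold_keys)
print(split)
```

Storing only the resulting index lists, as the archive does with separate index files, lets the same split be reapplied to any representation of the molecules.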

Below are descriptions of the available scripts:

atom_bond_descriptors.sh: Trains atom/bond targets.

atom_bond_descriptors_predict.sh: Predicts atom/bond targets from pre-trained model.

dipole_quadrupole_moments.sh: Trains dipole and quadrupole moments.

dipole_quadrupole_moments_predict.sh: Predicts dipole and quadrupole moments from pre-trained model.

energy_gaps_IP_EA.sh: Trains energy gaps, ionization potential (IP), and electron affinity (EA).

energy_gaps_IP_EA_predict.sh: Predicts energy gaps, IP, and EA from pre-trained model.

get_constraints.py: Generates constraints file for testing dataset. This generated file needs to be provided before using our trained models to predict the atom/bond QM descriptors of your testing data.

csv2pkl.py: Converts QM atom and bond features to .pkl files using RBF expansion for use with Chemprop software.
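To give a sense of what a radial basis function (RBF) expansion does to a scalar feature, the sketch below maps one value onto a vector of Gaussian responses centered on an evenly spaced grid. This is a generic Gaussian form and not the contents of csv2pkl.py; the name `rbf_expand` and the range, number of centers, and `gamma` width are assumptions.

```python
import math


def rbf_expand(value, low, high, num_centers, gamma):
    """Expand a scalar into Gaussian responses exp(-gamma * (value - c)^2)."""
    step = (high - low) / (num_centers - 1)
    centers = [low + k * step for k in range(num_centers)]
    return [math.exp(-gamma * (value - c) ** 2) for c in centers]


# Hypothetical descriptor value 0.5 expanded over five centers in [0, 1].
features = rbf_expand(0.5, low=0.0, high=1.0, num_centers=5, gamma=10.0)
print(features)
```

The response is largest for the center closest to the input and decays smoothly for centers farther away, which turns a single number into a smooth, localized encoding.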

Below is the procedure for running the ml-QM-GNN on your own dataset:

Use get_constraints.py to generate a constraint file required for predicting atom/bond QM descriptors with the trained ML models.

Execute atom_bond_descriptors_predict.sh to predict atom and bond properties. Run dipole_quadrupole_moments_predict.sh and energy_gaps_IP_EA_predict.sh to calculate molecular QM descriptors.

Utilize csv2pkl.py to convert the data from predicted atom/bond descriptors .csv file into separate atom and bond feature files (which are saved as .pkl files here).

Run Chemprop to train your models using the additional predicted features supported here.
Complete Final Rainy With Traintest Split & Augm Dataset
universe.roboflow.com
zip
Updated Aug 8, 2023
Cite
NIT Jalandhar (2023). Complete Final Rainy With Traintest Split & Augm Dataset [Dataset]. https://universe.roboflow.com/nit-jalandhar-euvaa/complete-final-rainy-dataset-with-traintest-split-augm
Available download formats: zip
Dataset updated
Aug 8, 2023
Dataset authored and provided by
NIT Jalandhar
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Variables measured
Car Auto Motorbike Bus Truck Bounding Boxes
Description
Complete Final Rainy Dataset With Traintest Split & Augm

## Overview
Complete Final Rainy Dataset With Traintest Split & Augm is a dataset for object detection tasks. It contains Car Auto Motorbike Bus Truck annotations for 2,106 images.

## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License
This dataset is available under the [MIT license](https://opensource.org/licenses/MIT).
Train Test and Validation Split
kaggle.com
Updated Apr 18, 2025
Cite
IMT2022053 (2025). Train Test and Validation Split [Dataset]. https://www.kaggle.com/datasets/pranavakulkarni/train-test-and-validation-split/discussion
Available formats: Croissant
Dataset updated
Apr 18, 2025
Dataset provided by
Kaggle: http://kaggle.com/
Authors
IMT2022053
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset

This dataset was created by IMT2022053

Released under Apache 2.0

WikiLingua Train/Test Split
kaggle.com
Updated Sep 19, 2023
Cite
Stiff_Subset (2023). WikiLingua Train/Test Split [Dataset]. https://www.kaggle.com/datasets/stiffsubset/wikilingua-traintest-split/discussion?sort=undefined
Available formats: Croissant
Dataset updated
Sep 19, 2023
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Stiff_Subset
Description
Dataset

This dataset was created by Stiff_Subset

Training/Validation/Test set split
figshare.com
zip
Updated Mar 30, 2024
Cite
Tianfan Jin (2024). Training/Validation/Test set split [Dataset]. http://doi.org/10.6084/m9.figshare.25511056.v1
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.25511056.v1
Dataset updated
Mar 30, 2024
Dataset provided by
Figshare: http://figshare.com/
Authors
Tianfan Jin
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Including the split of real and null reactions for training, validation and test
Film (60%/20%/20% random splits) Dataset
paperswithcode.com
library.toponeai.link
Cite
Film (60%/20%/20% random splits) Dataset [Dataset]. https://paperswithcode.com/dataset/film-60-20-20-random-splits
Description
Node classification on Film with 60%/20%/20% random splits for training/validation/test.
dataset-muenzen-training-test-split-01
kaggle.com
Updated Dec 13, 2024
Cite
pascalammeter (2024). dataset-muenzen-training-test-split-01 [Dataset]. https://www.kaggle.com/datasets/pascalammeter/dataset-muenzen-training-test-split-01/discussion
Available formats: Croissant
Dataset updated
Dec 13, 2024
Dataset provided by
Kaggle: http://kaggle.com/
Authors
pascalammeter
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset

This dataset was created by pascalammeter

Released under MIT

deepstock-sp500-companies-info-stonkv2-test-train-split
huggingface.co
Cite
Lukas Abrie Nel, deepstock-sp500-companies-info-stonkv2-test-train-split [Dataset]. https://huggingface.co/datasets/2084Collective/deepstock-sp500-companies-info-stonkv2-test-train-split
Available formats: Croissant
Authors
Lukas Abrie Nel
Description
2084Collective/deepstock-sp500-companies-info-stonkv2-test-train-split dataset hosted on Hugging Face and contributed by the HF Datasets community
Final Raw Rainy Dataset With Augum Without Traintest Split Dataset
universe.roboflow.com
zip
Updated Aug 9, 2023
Cite
NIT Jalandhar (2023). Final Raw Rainy Dataset With Augum Without Traintest Split Dataset [Dataset]. https://universe.roboflow.com/nit-jalandhar-euvaa/final-raw-rainy-dataset-with-augum-without-traintest-split/dataset/2
Available download formats: zip
Dataset updated
Aug 9, 2023
Dataset authored and provided by
NIT Jalandhar
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Car Auto Motorbike Bus Truck Bounding Boxes
Description
Final Raw Rainy Dataset With Augum Without Traintest Split

## Overview
Final Raw Rainy Dataset With Augum Without Traintest Split is a dataset for object detection tasks. It contains Car Auto Motorbike Bus Truck annotations for 731 images.

## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
DUDE competition train - validation - test splits ground truth
zenodo.org
json
Updated Mar 23, 2023
Cite
Jordy Van Landeghem (2023). DUDE competition train - validation - test splits ground truth [Dataset]. http://doi.org/10.5281/zenodo.7763635
Available download formats: json
Unique identifier
https://doi.org/10.5281/zenodo.7763635
Dataset updated
Mar 23, 2023
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Jordy Van Landeghem
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This JSON file contains the ground truth annotations for the train and validation set of the DUDE competition (https://rrc.cvc.uab.es/?ch=23&com=tasks) of ICDAR 2023 (https://icdar2023.org/).

V1.0.7 release: 41454 annotations for 4974 documents (train-validation-test)
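The release figure can be cross-checked against the per-split row counts given in the DatasetDict listing for this entry. The short calculation below sums those counts and compares them with the stated total. Reading the difference as the nine duplicate IDs listed as removed from the test set is an inference, not something the description states, and the ID list itself appears to be cut off.

```python
# Row counts copied from the DatasetDict listing in this entry.
split_rows = {"train": 23728, "val": 6315, "test": 11402}
total_rows = sum(split_rows.values())

# Total stated for the V1.0.7 release.
release_annotations = 41454

# Number of duplicate IDs listed as removed from the test set (counted by hand).
removed_duplicates_listed = 9

gap = release_annotations - total_rows
print(f"rows across splits: {total_rows}")
print(f"gap to stated total: {gap}")
print(f"gap equals listed duplicates: {gap == removed_duplicates_listed}")
```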

DatasetDict({
    train: Dataset({
        features: ['docId', 'questionId', 'question', 'answers', 'answers_page_bounding_boxes', 'answers_variants', 'answer_type', 'data_split', 'document', 'OCR'],
        num_rows: 23728
    })
    val: Dataset({
        features: ['docId', 'questionId', 'question', 'answers', 'answers_page_bounding_boxes', 'answers_variants', 'answer_type', 'data_split', 'document', 'OCR'],
        num_rows: 6315
    })
    test: Dataset({
        features: ['docId', 'questionId', 'question', 'answers', 'answers_page_bounding_boxes', 'answers_variants', 'answer_type', 'data_split', 'document', 'OCR'],
        num_rows: 11402
    })
})

++ update on answer_type
+++ formatting change to answers_variants
++++ stricter check on answer_variants & rename annotations file
+ blind test set (no ground truth answers provided)
++ removed duplicates from test set:

"92bd5c758bda9bdceb5f67c17009207b_ac6964cbdf483e765b6668e27b3d0bc4",

"6ee71a16d4e4d1dbd7c1f569a92d4e08_549f2a163f8ff3e9f0293cf59fdd98bc",

"e6f3855472231a7ca6aada2f8e85fe5a_827c03a72f2552c722f2c872fd7f74c3",

"e3eecd7cca5de11f1d17cd94ae6a8d77_6300df64e4cf6ba0600ac81278f68de2",

"107b4037df8127a92ee4b6ae9b5df8fb_d7a60e7a9fc0b27487ea39cd7f56f98e",

"300cc3900080064d308983f958141232_6a7cf1aad908d58a75ab8e02ddc856f4",

"fdd3308efacddb88d4aa6e2073f481d4_138cb868ecc804a63cc7a4502c0009b2",

"1f7de256ff1743d329a8402ba0d132e7_95b6e8758533a9817b9f20a958e7b776",

"4f399b8c526ffb6a2fd585a18d4ed5ec_51097231bc327c26c59a4fd8d3ff3069",
Split Garbage Dataset
kaggle.com
Updated May 18, 2019
Cite
Andrea Santoro (2019). Split Garbage Dataset [Dataset]. https://www.kaggle.com/andreasantoro/split-garbage-dataset/kernels
Available formats: Croissant
Dataset updated
May 18, 2019
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Andrea Santoro
Description
Split version of the garbage classification dataset (link below). The train, test, and valid folders were generated as specified by the one-indexed files of the original dataset.
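As an illustration of how index files can drive such a folder layout, the sketch below turns lines of the form `filename label` into destination paths under a split folder. The line format, the class order in `CLASS_NAMES`, and the helper name `plan_copies` are assumptions made for this example; check them against the original dataset's files before relying on them.

```python
from pathlib import PurePosixPath

# Assumed class order for the one-indexed labels (hypothetical; verify before use).
CLASS_NAMES = ["glass", "paper", "cardboard", "plastic", "metal", "trash"]


def plan_copies(index_lines, split_name, class_names):
    """Map 'filename label' lines to (filename, destination path) pairs."""
    plan = []
    for line in index_lines:
        if not line.strip():
            continue  # skip blank lines
        filename, label = line.split()
        class_name = class_names[int(label) - 1]  # labels start at 1, lists at 0
        destination = PurePosixPath(split_name) / class_name / filename
        plan.append((filename, str(destination)))
    return plan


# Hypothetical index-file contents for the train split.
train_plan = plan_copies(["cardboard202.jpg 3", "glass15.jpg 1"], "train", CLASS_NAMES)
print(train_plan)
```

Building the plan as data first, before copying any files, makes it easy to inspect the layout and to confirm that no file is assigned to more than one split.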

Acknowledgements

Original dataset here: https://www.kaggle.com/asdasdasasdas/garbage-classification
flowers-299_Train&Test
kaggle.com
Updated Jul 7, 2023
Cite
Xocion (2023). flowers-299_Train&Test [Dataset]. https://www.kaggle.com/datasets/xocion/flower299-train-and-test
Available formats: Croissant
Dataset updated
Jul 7, 2023
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Xocion
Description
Original dataset: https://www.kaggle.com/datasets/bogdancretu/flower299

I chose an Acacia flower as the display picture of this dataset to highlight a problem in the flowers-299 dataset. If you go to the second folder, Acacia, you will see pictures of flowers that look very different from one another. Despite their different shapes, structures, and colors, they are all technically Acacia flowers. We can't train reliably on this data because there are not enough samples of Acacia flowers: despite all efforts, even the best model has a low probability of predicting Acacia flowers accurately.

This dataset needs data augmentation to be used efficiently with ResNet50.
Titanic Dataset
kaggle.com
Updated Dec 24, 2022
Cite
Bhavesh Padharia (2022). Titanic Dataset [Dataset]. https://www.kaggle.com/datasets/bhavesh1335/titanic-dataset/code
Available formats: Croissant
Dataset updated
Dec 24, 2022
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Bhavesh Padharia
Description
Dataset

This dataset was created by Bhavesh Padharia

hak-chat-dataset-train-test-split
huggingface.co
Cite
Ho Kang, hak-chat-dataset-train-test-split [Dataset]. https://huggingface.co/datasets/kanghokh/hak-chat-dataset-train-test-split
Authors
Ho Kang
Description
kanghokh/hak-chat-dataset-train-test-split dataset hosted on Hugging Face and contributed by the HF Datasets community
Juliet-train-split-test-on-BinRealVul
huggingface.co
Cite
Compote, Juliet-train-split-test-on-BinRealVul [Dataset]. https://huggingface.co/datasets/CCompote/Juliet-train-split-test-on-BinRealVul
Authors
Compote
License
https://choosealicense.com/licenses/gpl-3.0/
Description
Juliet-train-split-test-on-BinRealVul

Dataset Summary

Juliet-train-split-test-on-BinRealVul is a curated subset of the Juliet Test Suite (as organized in the GitHub repository), compiled and lifted to LLVM Intermediate Representation (IR) after a pre-processing phase. This dataset is designed specifically for training binary vulnerability detection models in a setting that ensures a fair comparison with models trained on CompRealVul_LLVM. The split was constructed to match… See the full description on the dataset page: https://huggingface.co/datasets/CCompote/Juliet-train-split-test-on-BinRealVul.
