100+ datasets found

Data Split Dataset
universe.roboflow.com
zip
Updated Sep 2, 2022
Cite
yolov5 (2022). Data Split Dataset [Dataset]. https://universe.roboflow.com/yolov5-vgpfy/data-split-atsuf/dataset/1
Explore at:
zip (available download formats)
Dataset updated
Sep 2, 2022
Dataset authored and provided by
yolov5
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
1
Description
Data Split

## Overview

Data Split is a dataset for classification tasks. It contains annotations of the class "1" for 639 images.

## Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License

This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
RLCD-generated-preference-data-split
huggingface.co
Updated Sep 13, 2023
Cite
Taylor (2023). RLCD-generated-preference-data-split [Dataset]. https://huggingface.co/datasets/TaylorAI/RLCD-generated-preference-data-split
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Sep 13, 2023
Dataset authored and provided by
Taylor
Description
Dataset Card for "RLCD-generated-preference-data-split"

More Information needed
split data set
kaggle.com
Updated Jan 17, 2025
Cite
Ali Gold Medalist (2025). split data set [Dataset]. https://www.kaggle.com/datasets/salman2024/split-data-set
Explore at:
Croissant
Dataset updated
Jan 17, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Ali Gold Medalist
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset

This dataset was created by Ali Gold Medalist

Released under Apache 2.0

Contents
Data Split
kaggle.com
zip
Updated Dec 20, 2023
Cite
DanielJamesdj08 (2023). Data Split [Dataset]. https://www.kaggle.com/datasets/danieljamesdj08/data-split
Explore at:
zip (7553 bytes; available download formats)
Dataset updated
Dec 20, 2023
Authors
DanielJamesdj08
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset

This dataset was created by DanielJamesdj08

Released under MIT

Contents
Split Data Patch Dataset
universe.roboflow.com
zip
Updated Oct 25, 2023
Cite
Universitas Islam Indonesia (2023). Split Data Patch Dataset [Dataset]. https://universe.roboflow.com/universitas-islam-indonesia-fgk9e/split-data-patch/dataset/1
Explore at:
zip (available download formats)
Dataset updated
Oct 25, 2023
Dataset authored and provided by
Universitas Islam Indonesia
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Patch Bounding Boxes
Description
Split Data Patch

## Overview

Split Data Patch is a dataset for object detection tasks. It contains Patch annotations for 636 images.

## Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License

This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
tae-data-split-paragraphs
huggingface.co
Updated Jun 1, 2025
Cite
Nicky (2025). tae-data-split-paragraphs [Dataset]. https://huggingface.co/datasets/nickypro/tae-data-split-paragraphs
Explore at:
Dataset updated
Jun 1, 2025
Authors
Nicky
Description
Split Paragraphs Dataset

Split paragraphs data with configs 000-099.
Materials Project Time Split Data
figshare.com
json
Updated May 30, 2023
Cite
Sterling G. Baird; Taylor Sparks (2023). Materials Project Time Split Data [Dataset]. http://doi.org/10.6084/m9.figshare.19991516.v4
Explore at:
json (available download formats)
Unique identifier
https://doi.org/10.6084/m9.figshare.19991516.v4
Dataset updated
May 30, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Sterling G. Baird; Taylor Sparks
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Full and dummy snapshots (2022-06-04) of data for mp-time-split, encoded via matminer convenience functions and retrieved via the new Materials Project API. The dataset is restricted to experimentally verified compounds with no more than 52 sites. No other filtering criteria were applied. The snapshots were developed for sparks-baird/mp-time-split as a benchmark dataset for materials generative modeling. Compressed versions of the files (.gz) are also available.

dtypes

```python
from pprint import pprint

from matminer.utils.io import load_dataframe_from_json

filepath = "insert/path/to/file/here.json"
expt_df = load_dataframe_from_json(filepath)
# Print the Python type stored in each column of the first row
pprint(expt_df.iloc[0].apply(type).to_dict())
```

The printed dictionary has the keys 'discovery', 'energy_above_hull', 'formation_energy_per_atom', 'material_id', 'references', 'structure', 'theoretical' and 'year'.

index/mpids (just the number for the index). Note that material_ids beginning with "mvc-" have the "mvc" dropped, and the hyphen (minus sign) is kept to distinguish "mp-" from "mvc-" types while still allowing for sorting, e.g. mvc-001 -> -1.

{146: MPID(mp-146), 925: MPID(mp-925), 1282: MPID(mp-1282), 1335: MPID(mp-1335), 12778: MPID(mp-12778), 2540: MPID(mp-2540), 316: MPID(mp-316), 1395: MPID(mp-1395), 2678: MPID(mp-2678), 1281: MPID(mp-1281), 1251: MPID(mp-1251)}
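The index convention described above can be sketched in plain Python. This assumes every ID has the form `mp-<digits>` or `mvc-<digits>`; `mpid_to_index` is a hypothetical helper written for illustration, not a function from mp-time-split or matminer.

```python
def mpid_to_index(material_id: str) -> int:
    """Map a Materials Project ID to an integer index (hypothetical helper).

    "mp-146"  -> 146   (the "mp-" prefix is dropped)
    "mvc-001" -> -1    ("mvc" is dropped; the hyphen survives as a minus sign)
    """
    if material_id.startswith("mp-"):
        return int(material_id[len("mp-"):])
    if material_id.startswith("mvc-"):
        # Keep the hyphen so the number is negative: "-001" -> -1
        return int(material_id[len("mvc"):])
    raise ValueError(f"unrecognized material_id prefix: {material_id!r}")


# Sorting by the index places all mvc- entries before the mp- entries.
print(sorted(["mp-925", "mvc-001", "mp-146"], key=mpid_to_index))
```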
cleaned-data-split-0
huggingface.co
Updated Mar 18, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Indonesia AI (2019). cleaned-data-split-0 [Dataset]. https://huggingface.co/datasets/IndonesiaAI/cleaned-data-split-0
Explore at:
Croissant
Dataset updated
Mar 18, 2019
Dataset authored and provided by
Indonesia AI
Description
Dataset Card for "cleaned-data-split-0"

More Information needed
X-ALMA-Parallel-Data-Split
huggingface.co
Updated Jun 1, 2025
Cite
Yong-Joong Kim (2025). X-ALMA-Parallel-Data-Split [Dataset]. https://huggingface.co/datasets/yongjoongkim/X-ALMA-Parallel-Data-Split
Explore at:
Dataset updated
Jun 1, 2025
Authors
Yong-Joong Kim
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
yongjoongkim/X-ALMA-Parallel-Data-Split dataset hosted on Hugging Face and contributed by the HF Datasets community
DR1 DR2 DR3 image split dataset
kaggle.com
zip
Updated Apr 11, 2024
Cite
DR S K Prabhakar (2024). DR1 DR2 DR3 image split dataset [Dataset]. https://www.kaggle.com/datasets/drskprabhakar/dr1-dr2-dr3-image-split-dataset/data
Explore at:
zip (59511870 bytes; available download formats)
Dataset updated
Apr 11, 2024
Authors
DR S K Prabhakar
Description
Dataset

This dataset was created by DR S K Prabhakar

Released under Other (specified in description)

Contents
Data from: Time-Split Cross-Validation as a Method for Estimating the...
acs.figshare.com
txt
Updated Jun 2, 2023
Cite
Robert P. Sheridan (2023). Time-Split Cross-Validation as a Method for Estimating the Goodness of Prospective Prediction. [Dataset]. http://doi.org/10.1021/ci400084k.s001
Explore at:
txt (available download formats)
Unique identifier
https://doi.org/10.1021/ci400084k.s001
Dataset updated
Jun 2, 2023
Dataset provided by
ACS Publications
Authors
Robert P. Sheridan
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Cross-validation is a common method to validate a QSAR model. In cross-validation, some compounds are held out as a test set, while the remaining compounds form a training set. A model is built from the training set, and the test set compounds are predicted on that model. The agreement of the predicted and observed activity values of the test set (measured by, say, R²) is an estimate of the self-consistency of the model and is sometimes taken as an indication of the predictivity of the model. This estimate of predictivity can be optimistic or pessimistic compared to true prospective prediction, depending on how compounds in the test set are selected. Here, we show that time-split selection gives an R² that is more like that of true prospective prediction than the R² from random selection (too optimistic) or from our analog of leave-class-out selection (too pessimistic). Time-split selection should be used in addition to random selection as a standard for cross-validation in QSAR model building.
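The difference between time-split and random selection of the test set can be sketched in plain Python. This assumes each compound carries a registration date; the tuple layout, the 25% test fraction and the function names are illustrative assumptions, not the protocol from the paper.

```python
import random
from datetime import date

# Each record is assumed to be (compound_id, registration_date, activity).


def time_split(records, test_fraction=0.25):
    """Hold out the most recently registered compounds as the test set."""
    ordered = sorted(records, key=lambda r: r[1])
    n_test = max(1, round(len(ordered) * test_fraction))
    return ordered[:-n_test], ordered[-n_test:]


def random_split(records, test_fraction=0.25, seed=0):
    """Hold out a random subset, ignoring dates (the usual baseline)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[:-n_test], shuffled[-n_test:]


# Usage: with distinct dates, every time-split test compound is newer than every training compound.
records = [(f"cpd{i}", date(2010 + i, 1, 1), float(i)) for i in range(8)]
train, test = time_split(records)
print([r[0] for r in test])
```

A random split mixes old and new compounds in both sets, which is why it tends to look more optimistic than prospective use.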
Data from: Split 3 Dataset
universe.roboflow.com
zip
Updated Jun 16, 2024
Cite
SPLIT 3 (2024). Split 3 Dataset [Dataset]. https://universe.roboflow.com/split-3/split-3/dataset/1
Explore at:
zip (available download formats)
Dataset updated
Jun 16, 2024
Dataset authored and provided by
SPLIT 3
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
SPLIT3 Bounding Boxes
Description
SPLIT 3

## Overview

SPLIT 3 is a dataset for object detection tasks. It contains SPLIT3 annotations for 7,306 images.

## Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License

This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Data split for each class of each dataset for training and test.
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Nov 6, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Niranjan, Mahesan; Fan, Keqiang; Cai, Xiaohao; Liu, Jiahui (2024). Data split for each class of each dataset for training and test. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001424294
Explore at:
Dataset updated
Nov 6, 2024
Authors
Niranjan, Mahesan; Fan, Keqiang; Cai, Xiaohao; Liu, Jiahui
Description
Data split for each class of each dataset for training and test.
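The entry does not describe the splitting procedure. A per-class (stratified) split of the kind its title suggests might look like the sketch below; the 80/20 ratio, the fixed seed and `per_class_split` are assumptions for illustration, not the authors' method.

```python
import random
from collections import defaultdict


def per_class_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices into train/test separately within each class.

    Returns (train_indices, test_indices); each class keeps roughly the
    same train:test ratio. Labels are assumed to be sortable (e.g. strings).
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_class):  # fixed class order keeps the split reproducible
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


labels = ["a"] * 10 + ["b"] * 5
train, test = per_class_split(labels)
print(len(train), len(test))  # 12 3
```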
Machine learning algorithm validation with a limited sample size
plos.figshare.com
text/x-python
Updated May 30, 2023
Cite
Andrius Vabalas; Emma Gowen; Ellen Poliakoff; Alexander J. Casson (2023). Machine learning algorithm validation with a limited sample size [Dataset]. http://doi.org/10.1371/journal.pone.0224365
Explore at:
text/x-python (available download formats)
Unique identifier
https://doi.org/10.1371/journal.pone.0224365
Dataset updated
May 30, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Andrius Vabalas; Emma Gowen; Ellen Poliakoff; Alexander J. Casson
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Advances in neuroimaging, genomic, motion tracking, eye-tracking and many other technology-based data collection methods have led to a torrent of high-dimensional datasets, which commonly have a small number of samples because of the intrinsically high cost of data collection involving human participants. High-dimensional data with a small number of samples is of critical importance for identifying biomarkers and conducting feasibility and pilot work; however, it can lead to biased machine learning (ML) performance estimates. Our review of studies that applied ML to distinguish autistic from non-autistic individuals showed that small sample size is associated with higher reported classification accuracy. We therefore investigated whether this bias could be caused by the use of validation methods that do not sufficiently control overfitting. Our simulations show that K-fold Cross-Validation (CV) produces strongly biased performance estimates with small sample sizes, and the bias is still evident with a sample size of 1000. Nested CV and train/test split approaches produce robust and unbiased performance estimates regardless of sample size. We also show that feature selection, if performed on pooled training and testing data, contributes considerably more to bias than parameter tuning does. In addition, the contribution to bias of data dimensionality, hyper-parameter space and number of CV folds was explored, and validation methods were compared using discriminable data. The results suggest how to design robust testing methodologies when working with small datasets and how to interpret the results of other studies based on the validation method they used.
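The leakage described above, feature selection on pooled training and testing data, can be illustrated with a small pure-Python K-fold loop. This is a sketch under stated assumptions: binary labels 0/1, a class-mean-gap filter for feature selection and a nearest-centroid classifier, all chosen for brevity; they are not the models or selection methods used in the study, and every function name is hypothetical.

```python
import random
from statistics import mean


def kfold_indices(n, k, seed=0):
    """Shuffle sample indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def select_features(X, y, rows, n_keep):
    """Keep the n_keep features with the largest class-mean gap, using only `rows`."""
    scores = []
    for j in range(len(X[0])):
        gap = abs(mean([X[r][j] for r in rows if y[r] == 1])
                  - mean([X[r][j] for r in rows if y[r] == 0]))
        scores.append((gap, j))
    return [j for _, j in sorted(scores, reverse=True)[:n_keep]]


def predict(X, y, train_rows, row, features):
    """Nearest-centroid prediction on the selected features (assumes both classes in train_rows)."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        members = [r for r in train_rows if y[r] == label]
        centroid = [mean([X[r][j] for r in members]) for j in features]
        dist = sum((X[row][j] - c) ** 2 for j, c in zip(features, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def cv_accuracy(X, y, k=5, n_keep=5, select_inside_folds=True, seed=0):
    """K-fold accuracy; feature selection runs per fold or once on pooled data."""
    all_rows = list(range(len(X)))
    # Leaky variant: features are chosen once using every sample, test folds included.
    pooled = None if select_inside_folds else select_features(X, y, all_rows, n_keep)
    correct = 0
    for test_rows in kfold_indices(len(X), k, seed):
        held_out = set(test_rows)
        train_rows = [r for r in all_rows if r not in held_out]
        # Correct variant: selection sees only this fold's training rows.
        features = select_features(X, y, train_rows, n_keep) if select_inside_folds else pooled
        correct += sum(predict(X, y, train_rows, r, features) == y[r] for r in test_rows)
    return correct / len(X)


if __name__ == "__main__":
    # Pure-noise data: no feature carries real information about the label,
    # so any accuracy well above 0.5 reflects leakage rather than signal.
    rng = random.Random(1)
    X = [[rng.gauss(0, 1) for _ in range(500)] for _ in range(40)]
    y = [i % 2 for i in range(40)]
    print("pooled selection:", cv_accuracy(X, y, select_inside_folds=False))
    print("in-fold selection:", cv_accuracy(X, y, select_inside_folds=True))
```

The only difference between the two variants is which rows the selection step is allowed to see; on noise data like the example, the pooled variant tends to report the more optimistic figure.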
Dataskripsi_split
kaggle.com
Updated Sep 8, 2023
Cite
Dewizzz (2023). Dataskripsi_split [Dataset]. https://www.kaggle.com/datasets/dewizzz/dataskripsi-split
Explore at:
Croissant
Dataset updated
Sep 8, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Dewizzz
Description
Dataset

This dataset was created by Dewizzz

Contents
Data from: Split Phase Inverter Data
catalog.data.gov
data.openei.org
Updated Jan 20, 2025
Cite
National Renewable Energy Laboratory (2025). Split Phase Inverter Data [Dataset]. https://catalog.data.gov/dataset/split-phase-inverter-data-b286c
Explore at:
Dataset updated
Jan 20, 2025
Dataset provided by
National Renewable Energy Laboratory
Description
The increase in power-electronics-based generation sources requires accurate modeling of inverters, and accurate modeling requires experimental data over a wider operating range. We used an 8.35 kW off-the-shelf grid-following split-phase PV inverter in the experiments, with a controllable AC supply and a controllable DC supply to emulate the AC- and DC-side characteristics. The experiments were performed at NREL's Energy Systems Integration Facility. The inverter was tested under 100%, 75%, 50% and 25% load conditions. In the first dataset, for each operating condition, the controllable AC source voltage was varied from 0.9 to 1.1 per unit (p.u.) in steps of 0.025 p.u. while the frequency was held at 60 Hz. In the second dataset, under similar load conditions (100%, 75%, 50%, 25%), the frequency of the controllable AC source voltage was varied from 59 Hz to 61 Hz in steps of 0.2 Hz. The voltage and frequency ranges were chosen based on the inverter's protection settings. Voltages and currents on the DC and AC sides are included in the dataset.
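The size of the resulting test matrix follows from the stated ranges. A short sketch, assuming both sweep endpoints are included and that each load/setpoint pair counts as one operating point (the description does not say how many samples were recorded per point); `sweep` and `LOADS_PERCENT` are illustrative names.

```python
from fractions import Fraction

LOADS_PERCENT = [100, 75, 50, 25]


def sweep(start, stop, step):
    """Setpoints from start to stop inclusive; Fraction avoids float drift."""
    start, stop, step = Fraction(start), Fraction(stop), Fraction(step)
    points, value = [], start
    while value <= stop:
        points.append(value)
        value += step
    return points


voltage_pu = sweep("0.9", "1.1", "0.025")  # dataset 1, frequency held at 60 Hz
frequency_hz = sweep("59", "61", "0.2")    # dataset 2

dataset1 = [(load, float(v)) for load in LOADS_PERCENT for v in voltage_pu]
dataset2 = [(load, float(f)) for load in LOADS_PERCENT for f in frequency_hz]
print(len(voltage_pu), len(frequency_hz), len(dataset1), len(dataset2))  # 9 11 36 44
```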
nyc-taxi-data-split
huggingface.co
Cite
Rossil Wu, nyc-taxi-data-split [Dataset]. https://huggingface.co/datasets/Rossil/nyc-taxi-data-split
Explore at:
Croissant
Authors
Rossil Wu
Description
Rossil/nyc-taxi-data-split dataset hosted on Hugging Face and contributed by the HF Datasets community
Thermal Detection Split 3 Dataset
universe.roboflow.com
zip
Updated Feb 10, 2025
Cite
Eli MDT Data Splits (2025). Thermal Detection Split 3 Dataset [Dataset]. https://universe.roboflow.com/eli-mdt-data-splits/thermal-detection-split-3
Explore at:
zip (available download formats)
Dataset updated
Feb 10, 2025
Dataset authored and provided by
Eli MDT Data Splits
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
People Bounding Boxes
Description
Thermal Detection Split 3

## Overview

Thermal Detection Split 3 is a dataset for object detection tasks. It contains People annotations for 340 images.

## Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License

This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Data Cleaning, Translation & Split of the Dataset for the Automatic...
data.niaid.nih.gov
zenodo.org
Updated Aug 8, 2022
Cite
Köhler, Juliane (2022). Data Cleaning, Translation & Split of the Dataset for the Automatic Classification of Documents for the Classification System for the Berliner Handreichungen zur Bibliotheks- und Informationswissenschaft [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6957841
Explore at:
Dataset updated
Aug 8, 2022
Authors
Köhler, Juliane
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Cleaned_Dataset.csv – The combined CSV files of all scraped documents from DABI, e-LiS, o-bib and Springer.

Data_Cleaning.ipynb – The Jupyter Notebook with Python code for the analysis and cleaning of the original dataset.

ger_train.csv – The German training set as CSV file.

ger_validation.csv – The German validation set as CSV file.

en_test.csv – The English test set as CSV file.

en_train.csv – The English training set as CSV file.

en_validation.csv – The English validation set as CSV file.

splitting.py – The Python code for splitting a dataset into train, test and validation sets.

DataSetTrans_de.csv – The final German dataset as a CSV file.

DataSetTrans_en.csv – The final English dataset as a CSV file.

translation.py – The Python code for translating the cleaned dataset.
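The listing does not include the contents of `splitting.py`. Purely as a hypothetical illustration, a shuffled train/validation/test split might look like the following; the 80/10/10 ratios, the seed and the function name are assumptions, not taken from that script.

```python
import random


def train_validation_test_split(rows, train=0.8, validation=0.1, seed=42):
    """Shuffle rows and cut them into train, validation and test lists.

    Whatever remains after the train and validation shares goes to test.
    """
    if not 0 < train + validation < 1:
        raise ValueError("train + validation must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * validation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


tr, va, te = train_validation_test_split(range(100))
print(len(tr), len(va), len(te))  # 80 10 10
```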
llm-sgd-dst8-split-training-data
huggingface.co
Updated Jul 24, 2023
Cite
Ammer Ayach (2023). llm-sgd-dst8-split-training-data [Dataset]. https://huggingface.co/datasets/amay01/llm-sgd-dst8-split-training-data
Explore at:
Croissant
Dataset updated
Jul 24, 2023
Authors
Ammer Ayach
Description
Dataset Card for "llm-sgd-dst8-split-training-data"

More Information needed
