Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Data Split is a dataset for classification tasks - it contains annotations of class "1" for 639 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Dataset Card for "RLCD-generated-preference-data-split"
More Information needed
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Ali Gold Medalist
Released under Apache 2.0
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by DanielJamesdj08
Released under MIT
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Split Data Patch is a dataset for object detection tasks - it contains Patch annotations for 636 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Split Paragraphs Dataset
Split paragraphs data with configs 000-099.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Full and dummy snapshots (2022-06-04) of data for mp-time-split, grabbed via the new Materials Project API and encoded with matminer convenience functions. The dataset is restricted to experimentally verified compounds with no more than 52 sites; no other filtering criteria were applied. The snapshots were developed for sparks-baird/mp-time-split as a benchmark dataset for materials generative modeling. Compressed versions of the files (.gz) are also available.
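## dtypes
```python
from pprint import pprint

from matminer.utils.io import load_dataframe_from_json

# Replace with the path to one of the snapshot .json files
filepath = "insert/path/to/file/here.json"
expt_df = load_dataframe_from_json(filepath)
# Show the Python type of each column's value in the first row
pprint(expt_df.iloc[0].apply(type).to_dict())
```
This prints a dictionary that maps each column (`discovery`, `energy_above_hull`, `formation_energy_per_atom`, `material_id`, `references`, `structure`, `theoretical`, `year`) to the type of its value in the first row.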
## index/mpids
The index holds just the number from each material_id. Note that material_id-s that begin with "mvc-" have the "mvc" dropped and the hyphen (minus sign) kept. This distinguishes "mp-" from "mvc-" entries while still allowing for sorting, e.g. mvc-001 -> -1.
{146: MPID(mp-146), 925: MPID(mp-925), 1282: MPID(mp-1282), 1335: MPID(mp-1335), 12778: MPID(mp-12778), 2540: MPID(mp-2540), 316: MPID(mp-316), 1395: MPID(mp-1395), 2678: MPID(mp-2678), 1281: MPID(mp-1281), 1251: MPID(mp-1251)}
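The index encoding described above can be sketched as a small function. This is a minimal sketch that assumes the IDs are available as plain strings such as "mp-146". The dataset itself stores `MPID` objects, and `mpid_to_index` is a hypothetical helper, not part of mp-time-split.

```python
def mpid_to_index(material_id: str) -> int:
    """Encode a material_id string as the integer index described above.

    Hypothetical helper: "mp-146" -> 146 and "mvc-001" -> -1. The "mp"
    prefix is dropped entirely. The "mvc" prefix is dropped, but its hyphen
    is kept as a minus sign.
    """
    prefix, sep, number = material_id.partition("-")
    if not sep or not number.isdigit():
        raise ValueError(f"unexpected material_id: {material_id!r}")
    if prefix == "mp":
        return int(number)
    if prefix == "mvc":
        return -int(number)
    raise ValueError(f"unrecognized prefix in material_id: {material_id!r}")


# Usage: the integer indices sort numerically, with mvc- entries before mp- entries
ids = ["mp-925", "mvc-001", "mp-146", "mvc-012"]
print(sorted(ids, key=mpid_to_index))  # ['mvc-012', 'mvc-001', 'mp-146', 'mp-925']
```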
Dataset Card for "cleaned-data-split-0"
More Information needed
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
The yongjoongkim/X-ALMA-Parallel-Data-Split dataset is hosted on Hugging Face and was contributed by the HF Datasets community.
This dataset was created by DR S K Prabhakar
Released under Other (specified in description)
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Cross-validation is a common method to validate a QSAR model. In cross-validation, some compounds are held out as a test set, while the remaining compounds form a training set. A model is built from the training set, and the test set compounds are predicted with that model. The agreement of the predicted and observed activity values of the test set (measured by, say, R²) is an estimate of the self-consistency of the model and is sometimes taken as an indication of the predictivity of the model. This estimate of predictivity can be optimistic or pessimistic compared to true prospective prediction, depending on how compounds in the test set are selected. Here, we show that time-split selection gives an R² that is closer to that of true prospective prediction than the R² from random selection (too optimistic) or from our analog of leave-class-out selection (too pessimistic). Time-split selection should be used in addition to random selection as a standard for cross-validation in QSAR model building.
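Time-split and random selection differ only in how the held-out compounds are chosen. The sketch below shows both and is written under stated assumptions. The `Compound` record (ID, assay date, activity), the 25% test fraction, and the helper names are all hypothetical, because the source does not say how the split point was chosen. R² is computed with one common definition, 1 - SS_res/SS_tot.

```python
import random
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Compound:
    """Hypothetical record: an ID, the date it was assayed, and its measured activity."""
    cid: str
    assay_date: date
    activity: float


def time_split(compounds, test_fraction=0.25):
    """Time-split selection: the most recently assayed compounds form the test set."""
    ordered = sorted(compounds, key=lambda c: c.assay_date)
    n_test = max(1, round(len(ordered) * test_fraction))
    return ordered[:-n_test], ordered[-n_test:]


def random_split(compounds, test_fraction=0.25, seed=0):
    """Random selection, which the text describes as too optimistic."""
    shuffled = list(compounds)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

With a time split, every training compound predates every test compound. This mimics building a model today and using it on compounds made later.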
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
SPLIT 3 is a dataset for object detection tasks - it contains SPLIT3 annotations for 7,306 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Data split for each class of each dataset for training and test.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Advances in neuroimaging, genomic, motion-tracking, eye-tracking and many other technology-based data collection methods have led to a torrent of high-dimensional datasets. These commonly have a small number of samples because of the intrinsically high cost of data collection involving human participants. High-dimensional data with a small number of samples is of critical importance for identifying biomarkers and conducting feasibility and pilot work; however, it can lead to biased machine learning (ML) performance estimates. Our review of studies that have applied ML to distinguish autistic from non-autistic individuals showed that small sample size is associated with higher reported classification accuracy. We therefore investigated whether this bias could be caused by the use of validation methods that do not sufficiently control overfitting. Our simulations show that K-fold cross-validation (CV) produces strongly biased performance estimates with small sample sizes, and the bias is still evident with a sample size of 1000. Nested CV and train/test split approaches produce robust and unbiased performance estimates regardless of sample size. We also show that feature selection, if performed on pooled training and testing data, contributes to bias considerably more than parameter tuning does. In addition, we explored how data dimensionality, hyperparameter space and the number of CV folds contribute to bias, and we compared validation methods on discriminable data. The results suggest how to design robust testing methodologies when working with small datasets and how to interpret the results of other studies based on the validation method used.
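Nested CV avoids the bias described above by tuning hyperparameters in an inner loop that only ever sees the outer training fold. As a result, the outer test fold never influences model selection. The following is a minimal, dependency-free sketch of that structure. The classifier (k-nearest neighbours, with k as the tuned hyperparameter), the candidate values and the fold counts are illustrative assumptions, not the settings used in the study.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def split(X, y, held_out):
    """Return (X_train, y_train, X_test, y_test) for one fold."""
    held = set(held_out)
    train = [i for i in range(len(X)) if i not in held]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in held_out], [y[i] for i in held_out])


def knn_predict(X_train, y_train, x, k):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    ranked = sorted(zip(X_train, y_train),
                    key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)))
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


def accuracy(X_train, y_train, X_test, y_test, k):
    hits = sum(knn_predict(X_train, y_train, x, k) == t for x, t in zip(X_test, y_test))
    return hits / len(y_test)


def select_k(X, y, candidate_ks, inner_folds):
    """Inner loop: choose k by CV on the data it is given (the outer training fold only)."""
    folds = k_fold_indices(len(X), inner_folds, seed=1)

    def mean_inner_accuracy(k):
        return sum(accuracy(*split(X, y, f), k) for f in folds) / len(folds)

    return max(candidate_ks, key=mean_inner_accuracy)


def nested_cv(X, y, candidate_ks=(1, 3, 5), outer_folds=5, inner_folds=3):
    """Outer loop: each outer test fold is scored with a k tuned without it."""
    scores = []
    for held_out in k_fold_indices(len(X), outer_folds, seed=0):
        X_tr, y_tr, X_te, y_te = split(X, y, held_out)
        # Any feature selection would also have to be fitted here, on X_tr only.
        best_k = select_k(X_tr, y_tr, candidate_ks, inner_folds)
        scores.append(accuracy(X_tr, y_tr, X_te, y_te, best_k))
    return sum(scores) / len(scores)
```

The pooled-data mistake the text warns about would correspond to running feature selection or `select_k` on all of `X` before the outer loop.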
This dataset was created by Dewizzz
The increase in power-electronics-based generation sources requires accurate modeling of inverters, and accurate modeling requires experimental data over a wider operating range. We used an 8.35 kW off-the-shelf grid-following split-phase PV inverter in the experiments. A controllable AC supply and a controllable DC supply emulated the AC- and DC-side characteristics. The experiments were performed at NREL's Energy Systems Integration Facility. The inverter was tested under 100%, 75%, 50% and 25% load conditions. In the first dataset, for each operating condition, the controllable AC source voltage was varied from 0.9 to 1.1 per unit (p.u.) in steps of 0.025 p.u. while the frequency was held at 60 Hz. In the second dataset, under the same load conditions (100%, 75%, 50%, 25%), the frequency of the controllable AC source voltage was varied from 59 Hz to 61 Hz in steps of 0.2 Hz. The voltage and frequency ranges were chosen based on the inverter's protection limits. Voltages and currents on the DC and AC sides are included in the dataset.
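The sweeps described above define a small test matrix: 9 voltage setpoints and 11 frequency setpoints per load level. This sketch enumerates them. It assumes each setpoint was applied once per load level (the dataset may hold many samples per setpoint), and the function names are hypothetical. The setpoints are built from integer steps so that floating-point accumulation cannot add or drop an endpoint.

```python
LOAD_LEVELS_PCT = (100, 75, 50, 25)


def voltage_sweep_pu():
    """0.900 to 1.100 p.u. in 0.025 p.u. steps, built from integer thousandths."""
    return [v / 1000 for v in range(900, 1101, 25)]


def frequency_sweep_hz():
    """59.0 to 61.0 Hz in 0.2 Hz steps, built from integer tenths of a hertz."""
    return [f / 10 for f in range(590, 611, 2)]


def operating_points():
    """Enumerate (load %, setpoint) pairs for the two datasets."""
    # First dataset: voltage sweep with the frequency held at 60 Hz
    voltage_tests = [(load, v) for load in LOAD_LEVELS_PCT for v in voltage_sweep_pu()]
    # Second dataset: frequency sweep under the same load levels
    frequency_tests = [(load, f) for load in LOAD_LEVELS_PCT for f in frequency_sweep_hz()]
    return voltage_tests, frequency_tests


voltage_tests, frequency_tests = operating_points()
print(len(voltage_sweep_pu()), len(frequency_sweep_hz()))  # 9 11
print(len(voltage_tests), len(frequency_tests))  # 36 44
```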
The Rossil/nyc-taxi-data-split dataset is hosted on Hugging Face and was contributed by the HF Datasets community.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Thermal Detection Split 3 is a dataset for object detection tasks - it contains People annotations for 340 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
- Cleaned_Dataset.csv – The combined CSV files of all scraped documents from DABI, e-LiS, o-bib and Springer.
- Data_Cleaning.ipynb – The Jupyter Notebook with Python code for the analysis and cleaning of the original dataset.
- ger_train.csv – The German training set as a CSV file.
- ger_validation.csv – The German validation set as a CSV file.
- en_test.csv – The English test set as a CSV file.
- en_train.csv – The English training set as a CSV file.
- en_validation.csv – The English validation set as a CSV file.
- splitting.py – The Python code for splitting a dataset into training, test and validation sets.
- DataSetTrans_de.csv – The final German dataset as a CSV file.
- DataSetTrans_en.csv – The final English dataset as a CSV file.
- translation.py – The Python code for translating the cleaned dataset.
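A three-way split like the one the file list describes can be sketched as follows. This is not the contents of splitting.py, which is not reproduced here. The 80/10/10 proportions, the seed and the function name `train_val_test_split` are assumptions. `rows` could be, for example, the records of DataSetTrans_de.csv read with Python's `csv` module.

```python
import random


def train_val_test_split(rows, val_fraction=0.1, test_fraction=0.1, seed=42):
    """Shuffle rows once with a fixed seed, then cut them into train/validation/test."""
    if val_fraction < 0 or test_fraction < 0 or val_fraction + test_fraction >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # a fixed seed keeps the split reproducible
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test
```

Shuffling once and slicing guarantees that the three sets are disjoint and together cover every row.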
Dataset Card for "llm-sgd-dst8-split-training-data"
More Information needed