https://choosealicense.com/licenses/other/
jbourcier/fmow-splits dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Spam Detection Dataset
This is the dataset for spam classification task. It contains:
- 'train' subset with 8,175 samples
- 'validation' subset with 1,362 samples
- 'test' subset with 1,636 samples
Source and modifications
This dataset is cloned from Deysi/spam-detection-dataset with the following added processing:
The 'string' labels are converted to integer 'id' labels so the dataset can be used and trained directly with the transformers Trainer, and the original 'test' split (2,725 samples) is divided into 2… See the full description on the dataset page: https://huggingface.co/datasets/tanquangduong/spam-detection-dataset-splits.
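The string-to-id conversion can be sketched in plain Python (the label names "not_spam"/"spam" and their ordering are illustrative assumptions, not taken from the dataset files):

```python
# Map string labels to integer ids so the examples can be consumed
# directly by a classification head (e.g. a transformers Trainer).
# The label names and their order are illustrative assumptions.
label_names = ["not_spam", "spam"]
label2id = {name: i for i, name in enumerate(label_names)}
id2label = {i: name for name, i in label2id.items()}

def encode_labels(examples):
    """Replace string labels with integer ids (batched-map style)."""
    return {"label": [label2id[label] for label in examples["label"]]}

batch = {"label": ["spam", "not_spam", "spam"]}
print(encode_labels(batch))  # {'label': [1, 0, 1]}
```

With the Hugging Face `datasets` library, a mapping like this is typically applied with `dataset.map(encode_labels, batched=True)`.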
Node classification on Film with 60%/20%/20% random splits for training/validation/test.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset, splits, models, and scripts from the manuscript "When Do Quantum Mechanical Descriptors Help Graph Neural Networks Predict Chemical Properties?" are provided. The curated dataset includes 37 QM descriptors for 64,921 unique molecules across six levels of theory: wB97XD, B3LYP, M06-2X, PBE0, TPSS, and BP86. This dataset is stored in the data.tar.gz file, which also contains a file for multitask constraints applied to various atomic and bond properties. The data splits (training, validation, and test splits) for both random and scaffold-based divisions are saved as separate index files in splits.tar.gz. The trained D-MPNN models for predicting QM descriptors are saved in the models.tar.gz file. The scripts.tar.gz file contains ready-to-use scripts for training machine learning models to predict QM descriptors, as well as scripts for predicting QM descriptors using our trained models on unseen molecules and for applying radial basis function (RBF) expansion to QM atom and bond features.
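As a rough sketch of the RBF expansion applied to QM atom and bond features, a scalar descriptor can be spread onto a grid of Gaussian basis functions; the centers, width, and input value below are illustrative assumptions, not the settings used in the released scripts:

```python
import math

def rbf_expand(value, centers, gamma=10.0):
    """Expand a scalar feature into Gaussian radial basis function values."""
    return [math.exp(-gamma * (value - c) ** 2) for c in centers]

# Expand a hypothetical scalar QM descriptor onto 5 evenly spaced centers.
centers = [-1.0, -0.5, 0.0, 0.5, 1.0]
features = rbf_expand(0.3, centers)
print(len(features))  # 5 basis values, largest at the nearest center (0.5)
```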
Below are descriptions of the available scripts:
- atom_bond_descriptors.sh: Trains atom/bond targets.
- atom_bond_descriptors_predict.sh: Predicts atom/bond targets from a pre-trained model.
- dipole_quadrupole_moments.sh: Trains dipole and quadrupole moments.
- dipole_quadrupole_moments_predict.sh: Predicts dipole and quadrupole moments from a pre-trained model.
- energy_gaps_IP_EA.sh: Trains energy gaps, ionization potential (IP), and electron affinity (EA).
- energy_gaps_IP_EA_predict.sh: Predicts energy gaps, IP, and EA from a pre-trained model.
- get_constraints.py: Generates a constraints file for the testing dataset. This generated file must be provided before using our trained models to predict the atom/bond QM descriptors of your testing data.
- csv2pkl.py: Converts QM atom and bond features to .pkl files using RBF expansion, for use with the Chemprop software.

Below is the procedure for running the ml-QM-GNN on your own dataset:
1. Run get_constraints.py to generate the constraint file required for predicting atom/bond QM descriptors with the trained ML models.
2. Run atom_bond_descriptors_predict.sh to predict atom and bond properties.
3. Run dipole_quadrupole_moments_predict.sh and energy_gaps_IP_EA_predict.sh to calculate molecular QM descriptors.
4. Run csv2pkl.py to convert the predicted atom/bond descriptor .csv file into separate atom and bond feature files (saved as .pkl files).

Precinct splits are created when a district boundary cuts through a precinct. Examples include Congressional Districts, State House Representative Districts, and Community Development Districts. Disclaimer: Data provided are derived from multiple sources, with varying levels of accuracy. The St. Johns County Supervisor of Elections disclaims all responsibility for the accuracy or completeness of the data shown herein.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
giordanorogers/countdown-splits dataset hosted on Hugging Face and contributed by the HF Datasets community
This dataset was created by dani
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In bold: the most frequent feature function (without considering the window-size parameters) of each iteration. Mean: average over the ten cross-validated scores and the ten verification scores.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Thermal Detection Split 3 is a dataset for object detection tasks - it contains People annotations for 340 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
RSDD Split 2 is a dataset for object detection tasks - it contains Train Track Damage annotations for 1,353 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Test splits for the categorical emotion datasets CREMA-D, emoDB, IEMOCAP, MELD, RAVDESS used inside audEERING.
For each dataset, a CSV file is provided listing the file names included in the test split.
The test splits were designed to balance gender and emotional categories as well as possible.
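A minimal sketch of such a balanced selection (the field names and the 20% test fraction are assumptions; the actual splits are distributed as per-dataset CSV files):

```python
import random
from collections import defaultdict

def stratified_test_split(records, test_fraction=0.2, seed=0):
    """Select a test set that covers each (gender, emotion) group proportionally."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["gender"], rec["emotion"])].append(rec["file"])
    test = []
    for files in groups.values():
        rng.shuffle(files)
        n = max(1, round(len(files) * test_fraction))  # at least one per group
        test.extend(files[:n])
    return sorted(test)

records = [
    {"file": "a.wav", "gender": "f", "emotion": "angry"},
    {"file": "b.wav", "gender": "f", "emotion": "angry"},
    {"file": "c.wav", "gender": "m", "emotion": "happy"},
    {"file": "d.wav", "gender": "m", "emotion": "happy"},
]
print(stratified_test_split(records))  # one file from each (gender, emotion) group
```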
Dataset Card for "test-dataset-all-splits"
More Information needed
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by MADHUKALAKERI
Released under CC0: Public Domain
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The most recent stock split for Westamerica Bancorporation (WABC) was a 3:1 split on February 26, 1998. The combined total of all historical stock splits for Westamerica Bancorporation results in 6 current shares for every original share available at the IPO in 1980.
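The cumulative share count follows from multiplying the individual split ratios. A sketch (only the 3:1 ratio comes from the text above; the earlier 2:1 ratio is a hypothetical value chosen so the product matches the stated 6 shares):

```python
from functools import reduce

def cumulative_shares(split_ratios):
    """Shares held today per original share, given per-split multipliers."""
    return reduce(lambda acc, ratio: acc * ratio, split_ratios, 1)

# A 3:1 split combined with a hypothetical earlier 2:1 split
# yields 6 current shares per original share.
print(cumulative_shares([3, 2]))  # 6
```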
https://www.coherentmarketinsights.com/privacy-policy
The Ductless Mini Splits Market valuation is estimated to reach USD 17.92 Bn in 2025 and is anticipated to grow to USD 31.31 Bn by 2032, at a steady CAGR of 8.3%.
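The projection is consistent with compound growth over the seven years from 2025 to 2032; a quick check:

```python
def project(value, cagr, years):
    """Compound a starting value at a constant annual growth rate."""
    return value * (1 + cagr) ** years

# USD 17.92 Bn in 2025 at an 8.3% CAGR over 7 years -> ~USD 31.31 Bn in 2032.
print(round(project(17.92, 0.083, 7), 2))
```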
This is the evaluation split for Task 6, Automated Audio Captioning, in DCASE 2020 Challenge.
This evaluation split is the Clotho testing split, which is thoroughly described in the corresponding paper:
K. Drossos, S. Lipping and T. Virtanen, "Clotho: an Audio Captioning Dataset," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 2020, pp. 736-740, doi: 10.1109/ICASSP40776.2020.9052990.
available online at: https://arxiv.org/abs/1910.09387 and at: https://ieeexplore.ieee.org/document/9052990
This evaluation split is meant to be used for the purposes of Task 6 of the scientific challenge DCASE 2020. This split is not meant to be used for developing audio captioning methods. For developing audio captioning methods, you should use the development and evaluation splits of Clotho.
If you want the development and evaluation splits of Clotho dataset, you can find them also in Zenodo, at: https://zenodo.org/record/3490684
== License ==
The audio files in the archives:
clotho_audio_test.7z
and the associated meta-data in the CSV file:
clotho_metadata_test.csv
are under the corresponding licenses (mostly Creative Commons with attribution) of the Freesound [1] platform, stated explicitly in the CSV file for each of the audio files. That is, each audio file in the 7z archive is listed in the CSV file together with its meta-data. The meta-data for each file are:
File name
Start and end samples of the excerpt used in the Clotho dataset
Uploader/user in the Freesound platform (manufacturer)
Link to the licence of the file
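A minimal sketch of collecting the per-file license links from such a metadata CSV (the column names and rows below are illustrative assumptions, not the exact headers or contents of clotho_metadata_test.csv):

```python
import csv
import io

# Hypothetical metadata rows mirroring the fields listed above.
csv_text = """file_name,start_end_samples,manufacturer,license
birds.wav,0-441000,some_user,https://creativecommons.org/licenses/by/4.0/
rain.wav,44100-882000,another_user,https://creativecommons.org/licenses/by-nc/3.0/
"""

# Map each audio file to the license link stated in its metadata row.
licenses = {row["file_name"]: row["license"]
            for row in csv.DictReader(io.StringIO(csv_text))}
print(licenses["birds.wav"])
```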
== References == [1] Frederic Font, Gerard Roma, and Xavier Serra. 2013. Freesound technical demo. In Proceedings of the 21st ACM international conference on Multimedia (MM '13). ACM, New York, NY, USA, 411-412. DOI: https://doi.org/10.1145/2502081.2502245
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The most recent stock split for ABM Industries (ABM) was a 2:1 split on May 7, 2002. The combined total of all historical stock splits for ABM Industries results in 8 current shares for every original share available at the IPO in 1980.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by MUBBASSARKHAN AYUBKHAN JAHAGIRDAR MCA KLEIT
Released under CC0: Public Domain
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Yacine Bouaouni
Released under Apache 2.0
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Motivation: Entity matching is the task of determining which records from different data sources describe the same real-world entity. It is an important task for data integration and has been the focus of many research works. A large number of entity matching/record linkage tasks have been made available for evaluating entity matching methods. However, the lack of fixed development and test splits, as well as of correspondence sets including both matching and non-matching record pairs, hinders the reproducibility and comparability of benchmark experiments. To enhance reproducibility and comparability, we complement existing entity matching benchmark tasks with fixed sets of non-matching pairs as well as fixed development and test splits.

Dataset Description: An augmented version of the amazon-google products dataset for benchmarking entity matching/record linkage methods, found at: https://dbs.uni-leipzig.de/research/projects/object_matching/benchmark_datasets_for_entity_resolutio... The augmented version adds a fixed set of non-matching pairs to the original dataset. In addition, fixed splits for training, validation and testing, as well as their corresponding feature vectors, are provided. The feature vectors are built using data-type-specific similarity metrics. The dataset contains 1,363 records describing products from amazon, which are matched against 3,226 product records from google. The gold standard has manual annotations for 1,298 matching and 6,306 non-matching pairs. The total number of attributes used to describe the product records is 4, while the attribute density is 0.75. The augmented dataset enhances the reproducibility of matching methods and the comparability of matching results. The dataset is part of the CompERBench repository, which provides 21 complete benchmark tasks for entity matching for public download: http://data.dws.informatik.uni-mannheim.de/benchmarkmatchingtasks/index.html
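As an illustration of data-type-specific similarity metrics of the kind used to build such feature vectors (the attribute names and metric choices here are assumptions, not the benchmark's actual feature definitions):

```python
from difflib import SequenceMatcher

def token_jaccard(a, b):
    """Jaccard overlap of whitespace tokens, suited to product titles."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def char_ratio(a, b):
    """Character-level similarity, suited to short strings such as brand names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def price_sim(a, b):
    """Relative price agreement in [0, 1]."""
    return 1.0 - abs(a - b) / max(a, b) if max(a, b) > 0 else 1.0

def feature_vector(rec_a, rec_b):
    """One similarity feature per attribute, each with a type-appropriate metric."""
    return [
        token_jaccard(rec_a["title"], rec_b["title"]),
        char_ratio(rec_a["manufacturer"], rec_b["manufacturer"]),
        price_sim(rec_a["price"], rec_b["price"]),
    ]

a = {"title": "adobe photoshop cs2", "manufacturer": "adobe", "price": 599.0}
b = {"title": "photoshop cs2 upgrade", "manufacturer": "Adobe", "price": 149.0}
print(feature_vector(a, b))  # [title, brand, price] similarities: [0.5, 1.0, ~0.25]
```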