100+ datasets found
  1. fmow-splits

    • huggingface.co
    Updated Feb 5, 2025
    Cite
    fmow-splits [Dataset]. https://huggingface.co/datasets/jbourcier/fmow-splits
    Dataset updated
    Feb 5, 2025
    Authors
    Jules Bourcier
    License

    https://choosealicense.com/licenses/other/

    Description

    jbourcier/fmow-splits dataset hosted on Hugging Face and contributed by the HF Datasets community

  2. spam-detection-dataset-splits

    • huggingface.co
    Updated Nov 30, 2023
    Cite
    Tan Quang DUONG (2023). spam-detection-dataset-splits [Dataset]. https://huggingface.co/datasets/tanquangduong/spam-detection-dataset-splits
    Dataset updated
    Nov 30, 2023
    Authors
    Tan Quang DUONG
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Spam Detection Dataset

    This dataset is for the spam classification task. It contains:

    • 'train' subset with 8175 samples
    • 'validation' subset with 1362 samples
    • 'test' subset with 1636 samples

    Source and modifications

    This dataset is cloned from Deysi/spam-detection-dataset with the following added processing:

    • Convert the labels from 'string' to 'id' so the dataset can be used and trained directly with the transformers Trainer
    • Split the original 'test' dataset (2725 samples) into 2…

    See the full description on the dataset page: https://huggingface.co/datasets/tanquangduong/spam-detection-dataset-splits.
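The two processing steps described for this entry can be illustrated with a minimal sketch. The label names, the even 50/50 cut, and the seed are assumptions made for illustration only; the dataset description is truncated and does not state how the original 'test' subset was divided.

```python
import random

# Hypothetical label names: the description only says that string labels
# were converted to ids, not which strings or ids were used.
LABEL2ID = {"not_spam": 0, "spam": 1}


def encode_labels(rows, label2id=LABEL2ID):
    """Return copies of the rows with the string label replaced by its integer id."""
    return [{**row, "label": label2id[row["label"]]} for row in rows]


def split_in_two(rows, seed=0):
    """Shuffle deterministically and cut the rows into two near-equal parts (an assumption)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    middle = len(shuffled) // 2
    return shuffled[:middle], shuffled[middle:]


# Toy stand-in for the original 2725-sample 'test' subset.
original_test = [
    {"text": f"message {i}", "label": "spam" if i % 3 == 0 else "not_spam"}
    for i in range(2725)
]

validation, test = split_in_two(encode_labels(original_test))
print(len(validation), len(test))
```

Integer labels are what most sequence-classification training loops expect, which is presumably why the conversion is applied before publishing the splits.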

  3. Film (60%/20%/20% random splits) Dataset

    • library.toponeai.link
    Updated Feb 8, 2025
    Cite
    (2025). Film (60%/20%/20% random splits) Dataset [Dataset]. https://library.toponeai.link/dataset/film-60-20-20-random-splits
    Dataset updated
    Feb 8, 2025
    Description

    Node classification on Film with 60%/20%/20% random splits for training/validation/test.
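A 60%/20%/20% random split over node indices could be produced along the following lines. This is a sketch under assumptions: the node count of 1000 and the seed are placeholders, and the benchmark's actual split files may have been generated differently.

```python
import random


def random_split(num_nodes, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle node indices and cut them into train/validation/test parts."""
    indices = list(range(num_nodes))
    random.Random(seed).shuffle(indices)
    n_train = round(fractions[0] * num_nodes)
    n_val = round(fractions[1] * num_nodes)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder, so every node is assigned once
    return train, val, test


# 1000 is a placeholder node count, not the size of the Film graph.
train_idx, val_idx, test_idx = random_split(1000)
print(len(train_idx), len(val_idx), len(test_idx))
```

Giving the test part the remainder rather than a third rounded count guarantees the three index sets cover the graph without overlap.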

  4. Dataset, splits, models, and scripts for the QM descriptors prediction

    • zenodo.org
    • explore.openaire.eu
    application/gzip
    Updated Apr 4, 2024
    Cite
    Shih-Cheng Li; Haoyang Wu; Angiras Menon; Kevin A. Spiekermann; Yi-Pei Li; William H. Green (2024). Dataset, splits, models, and scripts for the QM descriptors prediction [Dataset]. http://doi.org/10.5281/zenodo.10668491
    Available download formats: application/gzip
    Dataset updated
    Apr 4, 2024
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Shih-Cheng Li; Haoyang Wu; Angiras Menon; Kevin A. Spiekermann; Yi-Pei Li; William H. Green
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset, splits, models, and scripts from the manuscript "When Do Quantum Mechanical Descriptors Help Graph Neural Networks Predict Chemical Properties?" are provided. The curated dataset includes 37 QM descriptors for 64,921 unique molecules across six levels of theory: wB97XD, B3LYP, M06-2X, PBE0, TPSS, and BP86. This dataset is stored in the data.tar.gz file, which also contains a file for multitask constraints applied to various atomic and bond properties. The data splits (training, validation, and test splits) for both random and scaffold-based divisions are saved as separate index files in splits.tar.gz. The trained D-MPNN models for predicting QM descriptors are saved in the models.tar.gz file. The scripts.tar.gz file contains ready-to-use scripts for training machine learning models to predict QM descriptors, as well as scripts for predicting QM descriptors using our trained models on unseen molecules and for applying radial basis function (RBF) expansion to QM atom and bond features.

    Below are descriptions of the available scripts:

    1. atom_bond_descriptors.sh: Trains atom/bond targets.
    2. atom_bond_descriptors_predict.sh: Predicts atom/bond targets from pre-trained model.
    3. dipole_quadrupole_moments.sh: Trains dipole and quadrupole moments.
    4. dipole_quadrupole_moments_predict.sh: Predicts dipole and quadrupole moments from pre-trained model.
    5. energy_gaps_IP_EA.sh: Trains energy gaps, ionization potential (IP), and electron affinity (EA).
    6. energy_gaps_IP_EA_predict.sh: Predicts energy gaps, IP, and EA from pre-trained model.
    7. get_constraints.py: Generates constraints file for testing dataset. This generated file needs to be provided before using our trained models to predict the atom/bond QM descriptors of your testing data.
    8. csv2pkl.py: Converts QM atom and bond features to .pkl files using RBF expansion for use with Chemprop software.

    Below is the procedure for running the ml-QM-GNN on your own dataset:

    1. Use get_constraints.py to generate a constraint file required for predicting atom/bond QM descriptors with the trained ML models.
    2. Execute atom_bond_descriptors_predict.sh to predict atom and bond properties. Run dipole_quadrupole_moments_predict.sh and energy_gaps_IP_EA_predict.sh to calculate molecular QM descriptors.
    3. Utilize csv2pkl.py to convert the data from predicted atom/bond descriptors .csv file into separate atom and bond feature files (which are saved as .pkl files here).
    4. Run Chemprop to train your models using the additional predicted features supported here.
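The radial basis function (RBF) expansion mentioned in the description can be sketched as follows. This is a generic Gaussian RBF expansion, not the authors' csv2pkl.py; the function name, the evenly spaced centers, and the `gamma` width are all assumptions for illustration.

```python
import math


def rbf_expand(value, low, high, num_centers, gamma):
    """Expand one scalar feature into a vector of Gaussian RBF activations.

    Centers are spaced evenly on [low, high]; gamma controls the width.
    All parameters here are illustrative, not values from the dataset.
    """
    step = (high - low) / (num_centers - 1)
    centers = [low + i * step for i in range(num_centers)]
    return [math.exp(-gamma * (value - c) ** 2) for c in centers]


# A hypothetical atomic descriptor value of 0.25 on a [0, 1] range.
features = rbf_expand(0.25, low=0.0, high=1.0, num_centers=5, gamma=10.0)
print([round(f, 3) for f in features])
```

The expansion turns a single continuous descriptor into a smooth, localized vector, which is often easier for a neural network to use than the raw scalar.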
  5. Voting Precinct Splits

    • sjcmaps.votesjc.gov
    Updated May 10, 2023
    Cite
    sjcsoegis (2023). Voting Precinct Splits [Dataset]. https://sjcmaps.votesjc.gov/datasets/30752172cc534b50bc5e96d0d7c61d70
    Dataset updated
    May 10, 2023
    Dataset authored and provided by
    sjcsoegis
    Description

    Precinct splits are created when a district boundary cuts through a precinct. Examples of these are Congressional Districts, State House Representative Districts, and Community Development Districts. Disclaimer: Data provided are derived from multiple sources, with varying levels of accuracy. The St. Johns County Supervisor of Elections disclaims all responsibility for the accuracy or completeness of the data shown herein.

  6. countdown-splits

    • huggingface.co
    Updated Apr 4, 2025
    Cite
    Giordano Rogers (2025). countdown-splits [Dataset]. https://huggingface.co/datasets/giordanorogers/countdown-splits
    Dataset updated
    Apr 4, 2025
    Authors
    Giordano Rogers
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    giordanorogers/countdown-splits dataset hosted on Hugging Face and contributed by the HF Datasets community

  7. cross-validation-splits

    • kaggle.com
    Updated May 14, 2021
    Cite
    dani (2021). cross-validation-splits [Dataset]. https://www.kaggle.com/danyalaftab/crossvalidationsplits/code
    Dataset updated
    May 14, 2021
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    dani
    Description

    Dataset

    This dataset was created by dani


  8. Forward feature functions selection with 10 train/test splits of the SPX...

    • plos.figshare.com
    xls
    Updated Jun 3, 2023
    Cite
    Julien Becker; Francis Maes; Louis Wehenkel (2023). Forward feature functions selection with 10 train/test splits of the SPX dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0056621.t004
    Available download formats: xls
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS: http://plos.org/
    Authors
    Julien Becker; Francis Maes; Louis Wehenkel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In bold, the most frequent feature function (without consideration of the window size parameters) of each iteration. Mean: averages over the ten cross-validated scores and the ten verification scores.

  9. Thermal Detection Split 3 Dataset

    • universe.roboflow.com
    zip
    Updated Feb 10, 2025
    Cite
    Eli MDT Data Splits (2025). Thermal Detection Split 3 Dataset [Dataset]. https://universe.roboflow.com/eli-mdt-data-splits/thermal-detection-split-3
    Available download formats: zip
    Dataset updated
    Feb 10, 2025
    Dataset authored and provided by
    Eli MDT Data Splits
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    People Bounding Boxes
    Description

    Thermal Detection Split 3

    ## Overview
    
    Thermal Detection Split 3 is a dataset for object detection tasks - it contains People annotations for 340 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  10. Rsdd Split 2 Dataset

    • universe.roboflow.com
    zip
    Updated Feb 6, 2025
    Cite
    CSCI 199 (2025). Rsdd Split 2 Dataset [Dataset]. https://universe.roboflow.com/csci-199/rsdd-split-2
    Available download formats: zip
    Dataset updated
    Feb 6, 2025
    Dataset authored and provided by
    CSCI 199
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Train Track Damage Bounding Boxes
    Description

    RSDD Split 2

    ## Overview
    
    RSDD Split 2 is a dataset for object detection tasks - it contains Train Track Damage annotations for 1,353 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  11. Test splits for CREMA-D, emoDB, IEMOCAP, MELD, RAVDESS

    • zenodo.org
    bin, csv +2
    Updated Nov 30, 2023
    Cite
    Hagen Wierstorf; Anna Derington (2023). Test splits for CREMA-D, emoDB, IEMOCAP, MELD, RAVDESS [Dataset]. http://doi.org/10.5281/zenodo.10229583
    Available download formats: csv, txt, bin, text/x-python
    Dataset updated
    Nov 30, 2023
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Hagen Wierstorf; Anna Derington
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Nov 30, 2023
    Description

    Test splits for the categorical emotion datasets CREMA-D, emoDB, IEMOCAP, MELD, RAVDESS used inside audEERING.

    For each dataset, a CSV file is provided listing the file names included in the test split.

    The test splits were designed to balance gender and emotional categories as well as possible.
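One simple way to approximate a balance over gender and emotion is to draw the same number of files from every (gender, emotion) group. The sketch below illustrates that idea only; it is not the procedure used to build these splits, and the record fields, file names, and group size are hypothetical.

```python
import random
from collections import defaultdict


def balanced_test_split(records, per_group, seed=0):
    """Pick up to `per_group` files from every (gender, emotion) group."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["gender"], rec["emotion"])].append(rec["file"])
    rng = random.Random(seed)
    test_files = []
    for key in sorted(groups):      # fixed order keeps the result reproducible
        files = sorted(groups[key])
        rng.shuffle(files)
        test_files.extend(files[:per_group])
    return sorted(test_files)


# Hypothetical metadata: 2 genders x 3 emotions x 10 files each.
records = [
    {"file": f"{g}_{e}_{i}.wav", "gender": g, "emotion": e}
    for g in ("female", "male")
    for e in ("anger", "happiness", "sadness")
    for i in range(10)
]

test_files = balanced_test_split(records, per_group=2)
print(len(test_files))
```

Writing the resulting file names to a CSV would match the layout described above, where each dataset has one CSV listing the files in its test split.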

  12. test-dataset-all-splits

    • huggingface.co
    Updated Apr 30, 2023
    Cite
    Hugging Face H4 (2023). test-dataset-all-splits [Dataset]. https://huggingface.co/datasets/HuggingFaceH4/test-dataset-all-splits
    Dataset updated
    Apr 30, 2023
    Dataset provided by
    Hugging Face: https://huggingface.co/
    Authors
    Hugging Face H4
    Description

    Dataset Card for "test-dataset-all-splits"

    More Information needed

  13. Cauliflower-Split-Dataser

    • kaggle.com
    Updated May 26, 2025
    Cite
    MADHUKALAKERI (2025). Cauliflower-Split-Dataser [Dataset]. https://www.kaggle.com/datasets/madhukk23/my-splits-dataset
    Dataset updated
    May 26, 2025
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    MADHUKALAKERI
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by MADHUKALAKERI

    Released under CC0: Public Domain


  14. Westamerica Bancorporation - 45 Year Stock Split History | WABC

    • macrotrends.net
    csv
    Updated Jun 30, 2025
    Cite
    MACROTRENDS (2025). Westamerica Bancorporation - 45 Year Stock Split History | WABC [Dataset]. https://www.macrotrends.net/stocks/charts/WABC/westamerica-bancorporation/stock-splits
    Available download formats: csv
    Dataset updated
    Jun 30, 2025
    Dataset authored and provided by
    MACROTRENDS
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2010 - 2025
    Area covered
    United States
    Description

    The most recent stock split for Westamerica Bancorporation (WABC) was a 3:1 split on February 26, 1998. The combined total of all historical stock splits for Westamerica Bancorporation results in 6 current shares for every original share available at the IPO in 1980.
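The "current shares per original share" figure is the product of the individual split ratios. The sketch below shows that arithmetic; only the 3:1 split comes from the description, while the earlier 2:1 split is a hypothetical entry chosen so that the product equals the stated total of 6.

```python
from fractions import Fraction
from functools import reduce


def cumulative_split_factor(ratios):
    """Multiply split ratios given as (new_shares, old_shares) pairs."""
    return reduce(lambda acc, r: acc * Fraction(r[0], r[1]), ratios, Fraction(1))


splits = [
    (2, 1),  # hypothetical earlier split, not taken from the description
    (3, 1),  # the 3:1 split of February 26, 1998
]

factor = cumulative_split_factor(splits)
print(factor)
```

Using `Fraction` keeps ratios such as 3:2 exact, which matters when several fractional splits are chained.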

  15. Ductless Mini Splits Market Share & Opportunities, 2025-2032

    • coherentmarketinsights.com
    Updated Jul 9, 2025
    Cite
    Coherent Market Insights (2025). Ductless Mini Splits Market Share & Opportunities, 2025-2032 [Dataset]. https://www.coherentmarketinsights.com/industry-reports/ductless-mini-splits-market
    Dataset updated
    Jul 9, 2025
    Dataset authored and provided by
    Coherent Market Insights
    License

    https://www.coherentmarketinsights.com/privacy-policy

    Time period covered
    2025 - 2031
    Area covered
    Global
    Description

    The Ductless Mini Splits market valuation is estimated to reach USD 17.92 Bn in 2025 and is anticipated to grow to USD 31.31 Bn by 2032, at a steady CAGR of 8.3%.
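The quoted figures can be cross-checked with the standard compound annual growth rate (CAGR) formula. This is a minimal sketch assuming seven compounding years (2025 to 2032); the function names are illustrative.

```python
def project(start_value, rate, years):
    """Compound a starting value forward at a fixed annual rate."""
    return start_value * (1 + rate) ** years


def cagr(start_value, end_value, years):
    """Annual growth rate implied by a start value, an end value, and a horizon."""
    return (end_value / start_value) ** (1 / years) - 1


years = 2032 - 2025  # assumed compounding horizon
projected_2032 = project(17.92, 0.083, years)
implied_rate = cagr(17.92, 31.31, years)

print(round(projected_2032, 2), round(implied_rate * 100, 2))
```

Under that assumption, compounding USD 17.92 Bn at 8.3% gives a value within rounding of the stated USD 31.31 Bn.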

  16. Audio captioning DCASE 2020 evaluation (testing) split

    • data.niaid.nih.gov
    • explore.openaire.eu
    Updated Dec 8, 2020
    Cite
    Konstantinos Drossos (2020). Audio captioning DCASE 2020 evaluation (testing) split [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3865657
    Dataset updated
    Dec 8, 2020
    Dataset provided by
    Konstantinos Drossos
    Tuomas Virtanen
    Samuel Lipping
    Description

    This is the evaluation split for Task 6, Automated Audio Captioning, in DCASE 2020 Challenge.

    This evaluation split is the Clotho testing split, which is thoroughly described in the corresponding paper:

    K. Drossos, S. Lipping and T. Virtanen, "Clotho: an Audio Captioning Dataset," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 2020, pp. 736-740, doi: 10.1109/ICASSP40776.2020.9052990.

    available online at: https://arxiv.org/abs/1910.09387 and at: https://ieeexplore.ieee.org/document/9052990

    This evaluation split is meant to be used for Task 6 of the DCASE 2020 scientific challenge. This split is not meant to be used for developing audio captioning methods. For developing audio captioning methods, you should use the development and evaluation splits of Clotho.

    If you want the development and evaluation splits of Clotho dataset, you can find them also in Zenodo, at: https://zenodo.org/record/3490684

    == License ==

    The audio files in the archives:

    clotho_audio_test.7z

    and the associated meta-data in the CSV file:

    clotho_metadata_test.csv

    are under the corresponding licences (mostly CreativeCommons with attribution) of Freesound [1] platform, mentioned explicitly in the CSV file for each of the audio files. That is, each audio file in the 7z archive is listed in the CSV file with the meta-data. The meta-data for each file are:

    File name

    Start and ending samples for the excerpt that is used in the Clotho dataset

    Uploader/user in the Freesound platform (manufacturer)

    Link to the licence of the file

    == References ==

    [1] Frederic Font, Gerard Roma, and Xavier Serra. 2013. Freesound technical demo. In Proceedings of the 21st ACM international conference on Multimedia (MM '13). ACM, New York, NY, USA, 411-412. DOI: https://doi.org/10.1145/2502081.2502245

  17. ABM Industries - 45 Year Stock Split History | ABM

    • macrotrends.net
    csv
    Updated Jun 30, 2025
    Cite
    MACROTRENDS (2025). ABM Industries - 45 Year Stock Split History | ABM [Dataset]. https://www.macrotrends.net/stocks/charts/ABM/abm-industries/stock-splits
    Available download formats: csv
    Dataset updated
    Jun 30, 2025
    Dataset authored and provided by
    MACROTRENDS
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2010 - 2025
    Area covered
    United States
    Description

    The most recent stock split for ABM Industries (ABM) was a 2:1 split on May 7, 2002. The combined total of all historical stock splits for ABM Industries results in 8 current shares for every original share available at the IPO in 1980.

  18. Butterfly Splits Zipped Dataset

    • kaggle.com
    Updated Apr 8, 2025
    Cite
    MUBBASSARKHAN AYUBKHAN JAHAGIRDAR MCA KLEIT (2025). Butterfly Splits Zipped Dataset [Dataset]. https://www.kaggle.com/datasets/mubbassir/butterfly-splits-zipped/versions/1
    Dataset updated
    Apr 8, 2025
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    MUBBASSARKHAN AYUBKHAN JAHAGIRDAR MCA KLEIT
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by MUBBASSARKHAN AYUBKHAN JAHAGIRDAR MCA KLEIT

    Released under CC0: Public Domain


  19. PII-Data-K-Splits

    • kaggle.com
    Updated Mar 14, 2024
    Cite
    Yacine Bouaouni (2024). PII-Data-K-Splits [Dataset]. https://www.kaggle.com/datasets/jarvisai7/pii-data-k-splits/data
    Dataset updated
    Mar 14, 2024
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Yacine Bouaouni
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Yacine Bouaouni

    Released under Apache 2.0


  20. Amazon-Google, Augmented Version, Fixed Splits

    • linkagelibrary.icpsr.umich.edu
    Updated Nov 23, 2020
    Cite
    Anna Primpeli; Christian Bizer (2020). Amazon-Google, Augmented Version, Fixed Splits [Dataset]. http://doi.org/10.3886/E127241V1
    Dataset updated
    Nov 23, 2020
    Dataset provided by
    University of Mannheim (Germany)
    Authors
    Anna Primpeli; Christian Bizer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Motivation: Entity matching is the task of determining which records from different data sources describe the same real-world entity. It is an important task for data integration and has been the focus of many research works. A large number of entity matching/record linkage tasks have been made available for evaluating entity matching methods. However, the lack of fixed development and test splits, as well as of correspondence sets including both matching and non-matching record pairs, hinders the reproducibility and comparability of benchmark experiments. In an effort to enhance the reproducibility and comparability of the experiments, we complement existing entity matching benchmark tasks with fixed sets of non-matching pairs as well as fixed development and test splits.

    Dataset Description: An augmented version of the amazon-google products dataset for benchmarking entity matching/record linkage methods found at: https://dbs.uni-leipzig.de/research/projects/object_matching/benchmark_datasets_for_entity_resolutio... The augmented version adds a fixed set of non-matching pairs to the original dataset. In addition, fixed splits for training, validation and testing as well as their corresponding feature vectors are provided. The feature vectors are built using data type specific similarity metrics. The dataset contains 1,363 records describing products deriving from amazon which are matched against 3,226 product records from google. The gold standards have manual annotations for 1,298 matching and 6,306 non-matching pairs. The total number of attributes used to describe the product records is 4 while the attribute density is 0.75. The augmented dataset enhances the reproducibility of matching methods and the comparability of matching results. The dataset is part of the CompERBench repository which provides 21 complete benchmark tasks for entity matching for public download: http://data.dws.informatik.uni-mannheim.de/benchmarkmatchingtasks/index.html
