20 datasets found
  1. Miriad-Tooluse-Prompts-StratifiedKFold

    • huggingface.co
    Updated Jun 26, 2025
    Cite
    II Vietnam (2025). Miriad-Tooluse-Prompts-StratifiedKFold [Dataset]. https://huggingface.co/datasets/II-Vietnam/Miriad-Tooluse-Prompts-StratifiedKFold
    Dataset updated
    Jun 26, 2025
    Dataset authored and provided by
    II Vietnam
    Description

    II-Vietnam/Miriad-Tooluse-Prompts-StratifiedKFold dataset hosted on Hugging Face and contributed by the HF Datasets community

  2. Miriad-Tooluse-Prompts-StratifiedKFold-View-Validation-1

    • huggingface.co
    Updated Jun 30, 2025
    Cite
    II Vietnam (2025). Miriad-Tooluse-Prompts-StratifiedKFold-View-Validation-1 [Dataset]. https://huggingface.co/datasets/II-Vietnam/Miriad-Tooluse-Prompts-StratifiedKFold-View-Validation-1
    Dataset updated
    Jun 30, 2025
    Dataset authored and provided by
    II Vietnam
    Description

    II-Vietnam/Miriad-Tooluse-Prompts-StratifiedKFold-View-Validation-1 dataset hosted on Hugging Face and contributed by the HF Datasets community

  3. Miriad-Tooluse-Prompts-StratifiedKFold-View-Patch-1

    • huggingface.co
    Cite
    II Vietnam, Miriad-Tooluse-Prompts-StratifiedKFold-View-Patch-1 [Dataset]. https://huggingface.co/datasets/II-Vietnam/Miriad-Tooluse-Prompts-StratifiedKFold-View-Patch-1
    Dataset authored and provided by
    II Vietnam
    Description

    II-Vietnam/Miriad-Tooluse-Prompts-StratifiedKFold-View-Patch-1 dataset hosted on Hugging Face and contributed by the HF Datasets community

  4. RANZR Clip-600x600 Stratified k fold TFrecords

    • kaggle.com
    Updated Feb 28, 2021
    Cite
    Deepak Bhat (2021). RANZR Clip-600x600 Stratified k fold TFrecords [Dataset]. https://www.kaggle.com/deepakbhatp/ranzr-clip600x600-stratified-k-fold-tfrecords/code
    Dataset updated
    Feb 28, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Deepak Bhat
    Description

    Dataset

    This dataset was created by Deepak Bhat

    Contents

  5. Iterative-stratification

    • kaggle.com
    Updated Nov 15, 2020
    Cite
    Raj Gandhi (2020). Iterative-stratification [Dataset]. https://www.kaggle.com/datasets/rajgandhi/iterativestratification
    Dataset updated
    Nov 15, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Raj Gandhi
    Description

    Dataset

    This dataset was created by Raj Gandhi

    Contents

  6. DataSheet2_SKCV: Stratified K-fold cross-validation on ML classifiers for...

    • frontiersin.figshare.com
    docx
    Updated Jun 1, 2023
    Cite
    Sashikanta Prusty; Srikanta Patnaik; Sujit Kumar Dash (2023). DataSheet2_SKCV: Stratified K-fold cross-validation on ML classifiers for predicting cervical cancer.docx [Dataset]. http://doi.org/10.3389/fnano.2022.972421.s002
    Available download formats: docx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers
    Authors
    Sashikanta Prusty; Srikanta Patnaik; Sujit Kumar Dash
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Cancer is the unregulated growth of abnormal cells in the human body. Cervical cancer, also known as cervix cancer, develops on the surface of the cervix, where an overabundance of cells builds up and eventually forms a lump or tumour. Early detection is therefore essential for choosing an effective treatment. Machine Learning (ML) techniques can help by predicting cervical cancer before it becomes too serious. In this work, four common diagnostic tests, namely Hinselmann, Schiller, Cytology, and Biopsy, were compared and predicted with four common ML models: Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbors (K-NN), and Extreme Gradient Boosting (XGB). Stratified k-fold cross-validation (SKCV) was applied to improve the performance of the ML models. The experimental findings indicate that an RF classifier for analyzing cervical cancer risk could be a good alternative for assisting clinical specialists in classifying this disease in advance.
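
    The description above relies on stratified k-fold cross-validation without showing what the split looks like. The following is a minimal pure-Python sketch of the idea, not the authors' implementation and not scikit-learn's exact algorithm: the function name `stratified_k_fold` and the round-robin assignment are assumptions made for illustration. Each class is shuffled and dealt across the folds so that every test fold keeps roughly the same class proportions as the full dataset.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) pairs with class proportions roughly preserved.

    Hypothetical helper for illustration only; labels must be sortable
    (for example all ints or all strings).
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle each class, then deal its indices round-robin across the folds.
    folds = [[] for _ in range(n_splits)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % n_splits].append(idx)

    # Each fold serves once as the test set; the remaining folds form the training set.
    for k in range(n_splits):
        test_idx = sorted(folds[k])
        train_idx = sorted(
            i for j, fold in enumerate(folds) if j != k for i in fold
        )
        yield train_idx, test_idx


if __name__ == "__main__":
    # Imbalanced toy labels: ten samples of class 0, five of class 1.
    toy_labels = [0] * 10 + [1] * 5
    for train_idx, test_idx in stratified_k_fold(toy_labels, n_splits=5):
        print(test_idx, [toy_labels[i] for i in test_idx])
```

    In practice a library routine such as scikit-learn's `StratifiedKFold` would normally be used; the sketch only makes the per-class dealing explicit.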

  7. PP2021 - Augmented KFold TFRecords (2/4)

    • kaggle.com
    zip
    Updated Apr 13, 2021
    + more versions
    Cite
    Nick Kuzmenkov (2021). PP2021 - Augmented KFold TFRecords (2/4) [Dataset]. https://www.kaggle.com/nickuzmenkov/pp2021-kfold-tfrecords-1
    Available download formats: zip (13051777813 bytes)
    Dataset updated
    Apr 13, 2021
    Authors
    Nick Kuzmenkov
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset of TFRecords files made from Plant Pathology 2021 original competition data. Changes:

    • labels column of the initial train.csv DataFrame was binarized to multi-label format columns: complex, frog_eye_leaf_spot, healthy, powdery_mildew, rust, and scab
    • images were scaled to 512x512
    • 77 duplicate images having different labels were removed (see the context in this notebook)
    • samples were stratified and split into 5 folds (see corresponding folders fold_0:fold_4)
    • images were heavily augmented with albumentations library (for raw images see this dataset)
    • each folder contains 5 copies of randomly augmented initial images (so that the model never meets the same images)
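
    The label binarization step listed above can be sketched as follows. This is an illustrative assumption rather than the dataset author's code: it assumes the original labels column holds space-separated class names (the listing does not state the format), and the helper name `binarize_labels` is hypothetical. Only the six column names come from the description.

```python
# Column names taken from the dataset description.
CLASSES = [
    "complex",
    "frog_eye_leaf_spot",
    "healthy",
    "powdery_mildew",
    "rust",
    "scab",
]


def binarize_labels(label_string, classes=CLASSES):
    """Turn a label string into one 0/1 value per class.

    Assumption: labels are separated by whitespace, e.g. "scab rust".
    """
    present = set(label_string.split())

    # Fail loudly on names that are not among the known classes.
    unknown = present - set(classes)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")

    return {name: int(name in present) for name in classes}


if __name__ == "__main__":
    print(binarize_labels("scab frog_eye_leaf_spot"))
```

    Each returned dictionary corresponds to one row of the multi-label columns, which is also the shape of the int64 fields in the feature map given under Contents.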

    I suggest adding all 5 datasets to your notebook: 4 augmented datasets = 20 epochs of unique images (1, 2, 3, 4) + 1 raw dataset for validation here.

    For a complete example see my TPU Training Notebook

    Contents:

    • preprocessed DataFrame train.csv
    • fold indexes DataFrame folds.csv
    • fold_0:fold_4 folders containing 64 .tfrec files, respectively, with feature map shown below:

        feature_map = {
            'image': tf.io.FixedLenFeature([], tf.string),
            'name': tf.io.FixedLenFeature([], tf.string),
            'complex': tf.io.FixedLenFeature([], tf.int64),
            'frog_eye_leaf_spot': tf.io.FixedLenFeature([], tf.int64),
            'healthy': tf.io.FixedLenFeature([], tf.int64),
            'powdery_mildew': tf.io.FixedLenFeature([], tf.int64),
            'rust': tf.io.FixedLenFeature([], tf.int64),
            'scab': tf.io.FixedLenFeature([], tf.int64)
        }

    Acknowledgements

    • photo from Unsplash here

  8. Random Forest classification results for the whole dataset with stratified...

    • plos.figshare.com
    txt
    Updated Aug 30, 2023
    Cite
    Salvador Chulián; Bernadette J. Stolz; Álvaro Martínez-Rubio; Cristina Blázquez Goñi; Juan F. Rodríguez Gutiérrez; Teresa Caballero Velázquez; Águeda Molinos Quintana; Manuel Ramírez Orellana; Ana Castillo Robleda; José Luis Fuster Soler; Alfredo Minguela Puras; María V. Martínez Sánchez; María Rosa; Víctor M. Pérez-García; Helen M. Byrne (2023). Random Forest classification results for the whole dataset with stratified k-fold and oversampling. [Dataset]. http://doi.org/10.1371/journal.pcbi.1011329.s003
    Available download formats: txt
    Dataset updated
    Aug 30, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Salvador Chulián; Bernadette J. Stolz; Álvaro Martínez-Rubio; Cristina Blázquez Goñi; Juan F. Rodríguez Gutiérrez; Teresa Caballero Velázquez; Águeda Molinos Quintana; Manuel Ramírez Orellana; Ana Castillo Robleda; José Luis Fuster Soler; Alfredo Minguela Puras; María V. Martínez Sánchez; María Rosa; Víctor M. Pérez-García; Helen M. Byrne
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Random Forest classification results for the whole dataset with stratified k-fold and oversampling.

  9. Dataset for Classification of Suspicious Financial Transactions

    • zenodo.org
    bin, csv
    Updated Jun 9, 2025
    Cite
    Edho Dwi Jayanto; Edho Dwi Jayanto (2025). Dataset for Classification of Suspicious Financial Transactions [Dataset]. http://doi.org/10.5281/zenodo.15493392
    Available download formats: bin, csv
    Dataset updated
    Jun 9, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Edho Dwi Jayanto; Edho Dwi Jayanto
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract: This study investigates the application of machine learning models for detecting suspicious financial transactions. Utilizing a dataset of 12,571 transactions from PT Bank ABC, the research encompasses various stages such as data preprocessing, feature selection, and addressing class imbalance. The models evaluated include Random Forest, XGBoost, and SVM, which were assessed through cross-validation with StratifiedKFold and optimized using RandomizedSearchCV.

  10. iON_SWITCHING_KNN-KFOLD

    • kaggle.com
    Updated May 2, 2020
    Cite
    Rahul u Bhagat (2020). iON_SWITCHING_KNN-KFOLD [Dataset]. https://www.kaggle.com/rahulubhagat/ion-switching-knnkfold/code
    Dataset updated
    May 2, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Rahul u Bhagat
    Description

    Dataset

    This dataset was created by Rahul u Bhagat

    Contents

  11. Tabular_March_2021_Folds

    • kaggle.com
    Updated Mar 1, 2021
    Cite
    Catadanna (2021). Tabular_March_2021_Folds [Dataset]. https://www.kaggle.com/catadanna/tabular-march-2021-folds
    Dataset updated
    Mar 1, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Catadanna
    Description

    StratifiedKFold for training set for competition Tabular Playground March 2021

  12. PP2021 - Augmented KFold TFRecords 768 (2/4)

    • kaggle.com
    Updated May 11, 2021
    + more versions
    Cite
    Araik Tamazian (2021). PP2021 - Augmented KFold TFRecords 768 (2/4) [Dataset]. https://www.kaggle.com/datasets/atamazian/pp2021-kfold-tfrecords-768-1z
    Dataset updated
    May 11, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Araik Tamazian
    Description

    Dataset

    This dataset was created by Araik Tamazian

    Contents

  13. HUBMap KFold Training

    • kaggle.com
    Updated Jun 28, 2023
    Cite
    Mark Wijkhuizen (2023). HUBMap KFold Training [Dataset]. https://www.kaggle.com/datasets/markwijkhuizen/hubmap-kfold-training
    Dataset updated
    Jun 28, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Mark Wijkhuizen
    Description

    Dataset

    This dataset was created by Mark Wijkhuizen

    Contents

  14. LISH-MOA-Checkpoints-TF-KFold-by-MOA

    • kaggle.com
    Updated Oct 26, 2020
    Cite
    Francois Patry (2020). LISH-MOA-Checkpoints-TF-KFold-by-MOA [Dataset]. https://www.kaggle.com/datasets/francoispatry/lishmoacheckpointstfkfoldbymoa
    Dataset updated
    Oct 26, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Francois Patry
    Description

    Dataset

    This dataset was created by Francois Patry

    Contents

  15. 30daysml-kfold

    • kaggle.com
    zip
    Updated Aug 17, 2021
    + more versions
    Cite
    Chandan Yadav (2021). 30daysml-kfold [Dataset]. https://www.kaggle.com/chandan0709/30daysmlkfold
    Available download formats: zip (42099149 bytes)
    Dataset updated
    Aug 17, 2021
    Authors
    Chandan Yadav
    Description

    Dataset

    This dataset was created by Chandan Yadav

    Contents

    It contains the following files:

  16. 30-days-ml-kfold-normalized

    • kaggle.com
    Updated Aug 27, 2021
    Cite
    Tanjin Alam (2021). 30-days-ml-kfold-normalized [Dataset]. https://www.kaggle.com/piashtanjin/30daysmlkfoldnormalized/code
    Dataset updated
    Aug 27, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Tanjin Alam
    Description

    Dataset

    This dataset was created by Tanjin Alam

    Contents

  17. b4p-oc-nd-short-description-anchor-kfold

    • kaggle.com
    Updated May 26, 2022
    Cite
    DENPA92 (2022). b4p-oc-nd-short-description-anchor-kfold [Dataset]. https://www.kaggle.com/denpa92/b4p-oc-nd-short-description-anchor-kfold/discussion
    Dataset updated
    May 26, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    DENPA92
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by DENPA92

    Released under CC0: Public Domain

    Contents

  18. RANZCR CLiP Raw KFold TFRecords

    • kaggle.com
    Updated Mar 7, 2021
    Cite
    Nick Kuzmenkov (2021). RANZCR CLiP Raw KFold TFRecords [Dataset]. https://www.kaggle.com/nickuzmenkov/ranzcr-clip-raw-kfold-tfrecords/code
    Dataset updated
    Mar 7, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Nick Kuzmenkov
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Nick Kuzmenkov

    Released under CC0: Public Domain

    Contents

  19. Deberta-V3L-KFold-BS-V37

    • kaggle.com
    Updated Apr 2, 2022
    Cite
    York G (2022). Deberta-V3L-KFold-BS-V37 [Dataset]. https://www.kaggle.com/datasets/yue300c/deberta-v3l-kfold-bs-v37/versions/1
    Dataset updated
    Apr 2, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    York G
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by York G

    Released under CC0: Public Domain

    Contents

  20. RANZCR CLiP - Augmented KFold TFRecords 3

    • kaggle.com
    Updated Mar 14, 2021
    Cite
    Nick Kuzmenkov (2021). RANZCR CLiP - Augmented KFold TFRecords 3 [Dataset]. https://www.kaggle.com/datasets/nickuzmenkov/ranzcr-clip-augmented-kfold-tfrecords-3/suggestions?status=pending&yourSuggestions=true
    Dataset updated
    Mar 14, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Nick Kuzmenkov
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Nick Kuzmenkov

    Released under CC0: Public Domain

    Contents
