20 datasets found

Miriad-Tooluse-Prompts-StratifiedKFold
huggingface.co
Updated Jun 26, 2025
Cite
II Vietnam (2025). Miriad-Tooluse-Prompts-StratifiedKFold [Dataset]. https://huggingface.co/datasets/II-Vietnam/Miriad-Tooluse-Prompts-StratifiedKFold
Dataset updated
Jun 26, 2025
Dataset authored and provided by
II Vietnam
Description
II-Vietnam/Miriad-Tooluse-Prompts-StratifiedKFold dataset hosted on Hugging Face and contributed by the HF Datasets community
Miriad-Tooluse-Prompts-StratifiedKFold-View-Validation-1
huggingface.co
Updated Jun 30, 2025
Cite
II Vietnam (2025). Miriad-Tooluse-Prompts-StratifiedKFold-View-Validation-1 [Dataset]. https://huggingface.co/datasets/II-Vietnam/Miriad-Tooluse-Prompts-StratifiedKFold-View-Validation-1
Dataset updated
Jun 30, 2025
Dataset authored and provided by
II Vietnam
Description
II-Vietnam/Miriad-Tooluse-Prompts-StratifiedKFold-View-Validation-1 dataset hosted on Hugging Face and contributed by the HF Datasets community
Miriad-Tooluse-Prompts-StratifiedKFold-View-Patch-1
huggingface.co
Cite
II Vietnam, Miriad-Tooluse-Prompts-StratifiedKFold-View-Patch-1 [Dataset]. https://huggingface.co/datasets/II-Vietnam/Miriad-Tooluse-Prompts-StratifiedKFold-View-Patch-1
Dataset authored and provided by
II Vietnam
Description
II-Vietnam/Miriad-Tooluse-Prompts-StratifiedKFold-View-Patch-1 dataset hosted on Hugging Face and contributed by the HF Datasets community
RANZR Clip-600x600 Stratified k fold TFrecords
kaggle.com
Updated Feb 28, 2021
Cite
Deepak Bhat (2021). RANZR Clip-600x600 Stratified k fold TFrecords [Dataset]. https://www.kaggle.com/deepakbhatp/ranzr-clip600x600-stratified-k-fold-tfrecords/code
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Feb 28, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Deepak Bhat
Description
Dataset

This dataset was created by Deepak Bhat

Contents
Iterative-stratification
kaggle.com
Updated Nov 15, 2020
Cite
Raj Gandhi (2020). Iterative-stratification [Dataset]. https://www.kaggle.com/datasets/rajgandhi/iterativestratification
Explore at:
Croissant
Dataset updated
Nov 15, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Raj Gandhi
Description
Dataset

This dataset was created by Raj Gandhi

Contents
DataSheet2_SKCV: Stratified K-fold cross-validation on ML classifiers for...
frontiersin.figshare.com
docx
Updated Jun 1, 2023
Cite
Sashikanta Prusty; Srikanta Patnaik; Sujit Kumar Dash (2023). DataSheet2_SKCV: Stratified K-fold cross-validation on ML classifiers for predicting cervical cancer.docx [Dataset]. http://doi.org/10.3389/fnano.2022.972421.s002
Explore at:
Available download formats: docx
Unique identifier
https://doi.org/10.3389/fnano.2022.972421.s002
Dataset updated
Jun 1, 2023
Dataset provided by
Frontiers
Authors
Sashikanta Prusty; Srikanta Patnaik; Sujit Kumar Dash
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Cancer is the unregulated growth of abnormal cells in the human body. Cervical cancer, also known as cervix cancer, develops on the surface of the cervix: an overabundance of cells builds up, eventually forming a lump or tumour. Early detection is therefore essential for choosing an effective treatment. Machine Learning (ML) techniques can help predict cervical cancer before it becomes too serious. In this study, four common diagnostic tests, namely Hinselmann, Schiller, Cytology, and Biopsy, were compared and predicted with four common ML models, namely Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbors (K-NNs), and Extreme Gradient Boosting (XGB). To improve the performance of the ML models, the Stratified k-fold cross-validation (SKCV) method was applied. The experimental findings demonstrate that using an RF classifier to analyze cervical cancer risk could be a good alternative for assisting clinical specialists in classifying this disease in advance.
PP2021 - Augmented KFold TFRecords (2/4)
kaggle.com
zip
Updated Apr 13, 2021
+ more versions
Cite
Nick Kuzmenkov (2021). PP2021 - Augmented KFold TFRecords (2/4) [Dataset]. https://www.kaggle.com/nickuzmenkov/pp2021-kfold-tfrecords-1
Explore at:
Available download formats: zip (13051777813 bytes)
Dataset updated
Apr 13, 2021
Authors
Nick Kuzmenkov
License
https://creativecommons.org/publicdomain/zero/1.0/
Description

Dataset of TFRecords files made from Plant Pathology 2021 original competition data. Changes:

* labels column of the initial train.csv DataFrame was binarized to multi-label format columns: complex, frog_eye_leaf_spot, healthy, powdery_mildew, rust, and scab
* images were scaled to 512x512
* 77 duplicate images having different labels were removed (see the context in this notebook)
* samples were stratified and split into 5 folds (see corresponding folders fold_0:fold_4)
* images were heavily augmented with albumentations library (for raw images see this dataset)
* each folder contains 5 copies of randomly augmented initial images (so that the model never meets the same images)
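The label binarization described above can be sketched roughly as follows. This is an illustration only, not the dataset author's code: it assumes the original labels column holds space-separated class names, and the helper name binarize_labels is hypothetical.

```python
# Class names taken from the multi-label columns listed in the description.
CLASSES = ["complex", "frog_eye_leaf_spot", "healthy",
           "powdery_mildew", "rust", "scab"]


def binarize_labels(labels: str) -> dict:
    """Map a space-separated label string (assumed format) to 0/1 columns."""
    present = set(labels.split())
    unknown = present - set(CLASSES)
    if unknown:
        # Fail loudly rather than silently dropping an unexpected class name.
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return {name: int(name in present) for name in CLASSES}


row = binarize_labels("scab frog_eye_leaf_spot")
print(row)
```

Each returned dictionary corresponds to one row of the six binary columns, so a sample carrying two diseases gets a 1 in both columns.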

I suggest adding all 5 datasets to your notebook: 4 augmented datasets = 20 epochs of unique images (1, 2, 3, 4) + 1 raw dataset for validation here.

For a complete example see my TPU Training Notebook

Contents:

preprocessed DataFrame train.csv

fold indexes DataFrame folds.csv

fold_0:fold_4 folders containing 64 .tfrec files, respectively, with feature map shown below:

    feature_map = {
        'image': tf.io.FixedLenFeature([], tf.string),
        'name': tf.io.FixedLenFeature([], tf.string),
        'complex': tf.io.FixedLenFeature([], tf.int64),
        'frog_eye_leaf_spot': tf.io.FixedLenFeature([], tf.int64),
        'healthy': tf.io.FixedLenFeature([], tf.int64),
        'powdery_mildew': tf.io.FixedLenFeature([], tf.int64),
        'rust': tf.io.FixedLenFeature([], tf.int64),
        'scab': tf.io.FixedLenFeature([], tf.int64)}

Acknowledgements

photo from Unsplash here
Random Forest classification results for the whole dataset with stratified...
plos.figshare.com
txt
Updated Aug 30, 2023
Cite
Salvador Chulián; Bernadette J. Stolz; Álvaro Martínez-Rubio; Cristina Blázquez Goñi; Juan F. Rodríguez Gutiérrez; Teresa Caballero Velázquez; Águeda Molinos Quintana; Manuel Ramírez Orellana; Ana Castillo Robleda; José Luis Fuster Soler; Alfredo Minguela Puras; María V. Martínez Sánchez; María Rosa; Víctor M. Pérez-García; Helen M. Byrne (2023). Random Forest classification results for the whole dataset with stratified k-fold and oversampling. [Dataset]. http://doi.org/10.1371/journal.pcbi.1011329.s003
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.1371/journal.pcbi.1011329.s003
Dataset updated
Aug 30, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Salvador Chulián; Bernadette J. Stolz; Álvaro Martínez-Rubio; Cristina Blázquez Goñi; Juan F. Rodríguez Gutiérrez; Teresa Caballero Velázquez; Águeda Molinos Quintana; Manuel Ramírez Orellana; Ana Castillo Robleda; José Luis Fuster Soler; Alfredo Minguela Puras; María V. Martínez Sánchez; María Rosa; Víctor M. Pérez-García; Helen M. Byrne
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Random Forest classification results for the whole dataset with stratified k-fold and oversampling.
Dataset for Classification of Suspicious Financial Transactions
zenodo.org
bin, csv
Updated Jun 9, 2025
Cite
Edho Dwi Jayanto; Edho Dwi Jayanto (2025). Dataset for Classification of Suspicious Financial Transactions [Dataset]. http://doi.org/10.5281/zenodo.15493392
Explore at:
Available download formats: bin, csv
Unique identifier
https://doi.org/10.5281/zenodo.15493392
Dataset updated
Jun 9, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Edho Dwi Jayanto; Edho Dwi Jayanto
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract— This study investigates the application of machine learning models for detecting suspicious financial transactions. Utilizing a dataset of 12,571 transactions from PT Bank ABC, the research encompasses various stages such as data preprocessing, feature selection, and addressing class imbalance. The models evaluated include Random Forest, XGBoost, and SVM, which were assessed through cross-validation with StratifiedKFold and optimized using RandomizedSearchCV.
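To illustrate what a stratified k-fold split does with an imbalanced target like the one this abstract describes, here is a minimal pure-Python sketch. It is a simplified stand-in, not scikit-learn's StratifiedKFold implementation and not the study's code; the function name stratified_folds and the 90/10 toy labels are assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_splits=5, seed=0):
    """Assign a fold index to every sample, dealing each class out round-robin
    so that every fold keeps roughly the overall class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    fold_of = [None] * len(labels)
    for label in sorted(by_class):        # labels must be mutually comparable
        indices = by_class[label]
        rng.shuffle(indices)              # randomize within the class only
        for position, idx in enumerate(indices):
            fold_of[idx] = position % n_splits
    return fold_of


# Toy imbalanced target: 90 "normal" (0) and 10 "suspicious" (1) samples.
labels = [0] * 90 + [1] * 10
folds = stratified_folds(labels, n_splits=5, seed=0)

for k in range(5):
    positives = sum(1 for f, y in zip(folds, labels) if f == k and y == 1)
    size = sum(1 for f in folds if f == k)
    print(f"fold {k}: {size} samples, {positives} positive")
```

With this toy input every fold receives 18 negatives and 2 positives, so the minority class appears in each validation fold. Because every class is dealt starting from fold 0, fold sizes can differ by a few samples when class counts are not multiples of n_splits.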
iON_SWITCHING_KNN-KFOLD
kaggle.com
Updated May 2, 2020
Cite
Rahul u Bhagat (2020). iON_SWITCHING_KNN-KFOLD [Dataset]. https://www.kaggle.com/rahulubhagat/ion-switching-knnkfold/code
Explore at:
Croissant
Dataset updated
May 2, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Rahul u Bhagat
Description
Dataset

This dataset was created by Rahul u Bhagat

Contents
Tabular_March_2021_Folds
kaggle.com
Updated Mar 1, 2021
Cite
Catadanna (2021). Tabular_March_2021_Folds [Dataset]. https://www.kaggle.com/catadanna/tabular-march-2021-folds
Explore at:
Croissant
Dataset updated
Mar 1, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Catadanna
Description
StratifiedKFold for training set for competition Tabular Playground March 2021
PP2021 - Augmented KFold TFRecords 768 (2/4)
kaggle.com
Updated May 11, 2021
+ more versions
Cite
Araik Tamazian (2021). PP2021 - Augmented KFold TFRecords 768 (2/4) [Dataset]. https://www.kaggle.com/datasets/atamazian/pp2021-kfold-tfrecords-768-1z
Explore at:
Croissant
Dataset updated
May 11, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Araik Tamazian
Description
Dataset

This dataset was created by Araik Tamazian

Contents
HUBMap KFold Training
kaggle.com
Updated Jun 28, 2023
Cite
Mark Wijkhuizen (2023). HUBMap KFold Training [Dataset]. https://www.kaggle.com/datasets/markwijkhuizen/hubmap-kfold-training
Explore at:
Croissant
Dataset updated
Jun 28, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Mark Wijkhuizen
Description
Dataset

This dataset was created by Mark Wijkhuizen

Contents
LISH-MOA-Checkpoints-TF-KFold-by-MOA
kaggle.com
Updated Oct 26, 2020
Cite
Francois Patry (2020). LISH-MOA-Checkpoints-TF-KFold-by-MOA [Dataset]. https://www.kaggle.com/datasets/francoispatry/lishmoacheckpointstfkfoldbymoa
Explore at:
Croissant
Dataset updated
Oct 26, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Francois Patry
Description
Dataset

This dataset was created by Francois Patry

Contents
30daysml-kfold
kaggle.com
zip
Updated Aug 17, 2021
+ more versions
Cite
Chandan Yadav (2021). 30daysml-kfold [Dataset]. https://www.kaggle.com/chandan0709/30daysmlkfold
Explore at:
Available download formats: zip (42099149 bytes)
Dataset updated
Aug 17, 2021
Authors
Chandan Yadav
Description
Dataset

This dataset was created by Chandan Yadav

Contents

It contains the following files:
30-days-ml-kfold-normalized
kaggle.com
Updated Aug 27, 2021
Cite
Tanjin Alam (2021). 30-days-ml-kfold-normalized [Dataset]. https://www.kaggle.com/piashtanjin/30daysmlkfoldnormalized/code
Explore at:
Croissant
Dataset updated
Aug 27, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Tanjin Alam
Description
Dataset

This dataset was created by Tanjin Alam

Contents
b4p-oc-nd-short-description-anchor-kfold
kaggle.com
Updated May 26, 2022
Cite
DENPA92 (2022). b4p-oc-nd-short-description-anchor-kfold [Dataset]. https://www.kaggle.com/denpa92/b4p-oc-nd-short-description-anchor-kfold/discussion
Explore at:
Croissant
Dataset updated
May 26, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
DENPA92
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset

This dataset was created by DENPA92

Released under CC0: Public Domain

Contents
RANZCR CLiP Raw KFold TFRecords
kaggle.com
Updated Mar 7, 2021
Cite
Nick Kuzmenkov (2021). RANZCR CLiP Raw KFold TFRecords [Dataset]. https://www.kaggle.com/nickuzmenkov/ranzcr-clip-raw-kfold-tfrecords/code
Explore at:
Croissant
Dataset updated
Mar 7, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Nick Kuzmenkov
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset

This dataset was created by Nick Kuzmenkov

Released under CC0: Public Domain

Contents
Deberta-V3L-KFold-BS-V37
kaggle.com
Updated Apr 2, 2022
Cite
York G (2022). Deberta-V3L-KFold-BS-V37 [Dataset]. https://www.kaggle.com/datasets/yue300c/deberta-v3l-kfold-bs-v37/versions/1
Explore at:
Croissant
Dataset updated
Apr 2, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
York G
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset

This dataset was created by York G

Released under CC0: Public Domain

Contents
RANZCR CLiP - Augmented KFold TFRecords 3
kaggle.com
Updated Mar 14, 2021
Cite
Nick Kuzmenkov (2021). RANZCR CLiP - Augmented KFold TFRecords 3 [Dataset]. https://www.kaggle.com/datasets/nickuzmenkov/ranzcr-clip-augmented-kfold-tfrecords-3/suggestions?status=pending&yourSuggestions=true
Explore at:
Croissant
Dataset updated
Mar 14, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Nick Kuzmenkov
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset

This dataset was created by Nick Kuzmenkov

Released under CC0: Public Domain

Contents