II-Vietnam/Miriad-Tooluse-Prompts-StratifiedKFold dataset hosted on Hugging Face and contributed by the HF Datasets community
II-Vietnam/Miriad-Tooluse-Prompts-StratifiedKFold-View-Validation-1 dataset hosted on Hugging Face and contributed by the HF Datasets community
II-Vietnam/Miriad-Tooluse-Prompts-StratifiedKFold-View-Patch-1 dataset hosted on Hugging Face and contributed by the HF Datasets community
This dataset was created by Deepak Bhat
This dataset was created by Raj Gandhi
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cancer is the unregulated growth of abnormal cells in the human body. Cervical cancer, also known as cervix cancer, develops on the surface of the cervix, where an overabundance of cells builds up and eventually forms a lump or tumour. Early detection is therefore essential for choosing an effective treatment, and Machine Learning (ML) techniques can help by predicting cervical cancer before it becomes too serious. Four common diagnostic tests, namely Hinselmann, Schiller, Cytology, and Biopsy, have been compared and predicted with four common ML models: Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbors (K-NN), and Extreme Gradient Boosting (XGB). To improve the performance of the ML models, the Stratified k-fold cross-validation (SKCV) method has been applied. The experimental findings suggest that an RF classifier for analyzing cervical cancer risk could be a good alternative for assisting clinical specialists in classifying this disease in advance.
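The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes scikit-learn, uses a synthetic imbalanced dataset in place of the cervical cancer data, and omits XGB because it needs a separate package. All hyperparameters are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in for one diagnostic target (e.g. Biopsy);
# the real study data is not reproduced here.
X, y = make_classification(
    n_samples=300, n_features=10, weights=[0.9, 0.1], random_state=0
)

# Placeholder hyperparameters, chosen only to keep the example fast.
models = {
    "RF": RandomForestClassifier(n_estimators=50, random_state=0),
    "SVM": SVC(),
    "K-NN": KNeighborsClassifier(n_neighbors=5),
}

# Stratification keeps roughly the same class ratio in every fold.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

mean_scores = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=skf, scoring="accuracy")
    mean_scores[name] = scores.mean()
    print(f"{name}: mean accuracy = {mean_scores[name]:.3f}")
```

Reusing the same `skf` object for every model means all classifiers are scored on identical splits, which makes the comparison fairer.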
https://creativecommons.org/publicdomain/zero/1.0/
Dataset of TFRecords files made from Plant Pathology 2021 original competition data. Changes:
* the labels column of the initial train.csv DataFrame was binarized to multi-label format columns: complex, frog_eye_leaf_spot, healthy, powdery_mildew, rust, and scab
* images were scaled to 512x512
* 77 duplicate images having different labels were removed (see the context in this notebook)
* samples were stratified and split into 5 folds (see corresponding folders fold_0:fold_4)
* images were heavily augmented with the albumentations library (for raw images see this dataset)
* each folder contains 5 copies of randomly augmented initial images (so that the model never meets the same images)
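The label binarization and stratified fold split from the list above might look roughly like the sketch below. It rests on two assumptions not stated in the description: that each row's labels value is a space-separated string of class names, and that stratification is done on the exact label combination. The helper names `binarize` and `assign_folds` are hypothetical.

```python
import random
from collections import defaultdict

CLASSES = ["complex", "frog_eye_leaf_spot", "healthy",
           "powdery_mildew", "rust", "scab"]

def binarize(labels):
    """Turn a space-separated label string into one 0/1 column per class."""
    present = set(labels.split())
    return {c: int(c in present) for c in CLASSES}

def assign_folds(label_strings, n_folds=5, seed=0):
    """Give each sample a fold index, spreading every label combination
    as evenly as possible across the folds."""
    groups = defaultdict(list)
    for idx, labels in enumerate(label_strings):
        key = " ".join(sorted(labels.split()))  # order-independent group key
        groups[key].append(idx)

    rng = random.Random(seed)
    folds = [None] * len(label_strings)
    offset = 0  # running offset keeps overall fold sizes balanced
    for key in sorted(groups):
        members = groups[key]
        rng.shuffle(members)
        for pos, idx in enumerate(members):
            folds[idx] = (offset + pos) % n_folds
        offset += len(members)
    return folds

# Tiny made-up sample, for demonstration only.
rows = ["healthy"] * 10 + ["scab frog_eye_leaf_spot"] * 5 + ["rust"] * 5
folds = assign_folds(rows)
print(binarize(rows[10]))
print(folds)
```

Grouping by the full label combination is the simplest way to stratify multi-label data; rare combinations with fewer samples than folds will not appear in every fold.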
I suggest adding all 5 datasets to your notebook: 4 augmented datasets = 20 epochs of unique images (1, 2, 3, 4) + 1 raw dataset for validation here.
For a complete example see my TPU Training Notebook
* train.csv
* folds.csv
* fold_0:fold_4 folders containing 64 .tfrec files each, with the feature map shown below:
feature_map = {
    'image': tf.io.FixedLenFeature([], tf.string),
    'name': tf.io.FixedLenFeature([], tf.string),
    'complex': tf.io.FixedLenFeature([], tf.int64),
    'frog_eye_leaf_spot': tf.io.FixedLenFeature([], tf.int64),
    'healthy': tf.io.FixedLenFeature([], tf.int64),
    'powdery_mildew': tf.io.FixedLenFeature([], tf.int64),
    'rust': tf.io.FixedLenFeature([], tf.int64),
    'scab': tf.io.FixedLenFeature([], tf.int64),
}
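A record with this feature map could be decoded roughly as sketched below. The sketch assumes TensorFlow 2.x in eager mode and that the image bytes are JPEG-encoded, which the description does not state. `parse_example` and `make_record` are hypothetical helpers, and the record is synthetic so the example runs without the actual .tfrec files.

```python
import tensorflow as tf

LABELS = ["complex", "frog_eye_leaf_spot", "healthy",
          "powdery_mildew", "rust", "scab"]

# Same schema as the feature map in the dataset description.
feature_map = {
    "image": tf.io.FixedLenFeature([], tf.string),
    "name": tf.io.FixedLenFeature([], tf.string),
    **{label: tf.io.FixedLenFeature([], tf.int64) for label in LABELS},
}

def parse_example(serialized):
    """Decode one serialized record into (image, multi-hot label vector)."""
    example = tf.io.parse_single_example(serialized, feature_map)
    # Assumption: images are stored as encoded JPEG bytes.
    image = tf.io.decode_jpeg(example["image"], channels=3)
    image = tf.image.convert_image_dtype(image, tf.float32)  # scale to [0, 1]
    labels = tf.stack([tf.cast(example[label], tf.float32) for label in LABELS])
    return image, labels

def make_record(name, present):
    """Build a synthetic record with the same schema (demonstration only)."""
    jpeg = tf.io.encode_jpeg(tf.zeros([512, 512, 3], tf.uint8)).numpy()
    feats = {
        "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[jpeg])),
        "name": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[name.encode()])),
    }
    for label in LABELS:
        feats[label] = tf.train.Feature(
            int64_list=tf.train.Int64List(value=[int(label in present)]))
    example = tf.train.Example(features=tf.train.Features(feature=feats))
    return example.SerializeToString()

image, labels = parse_example(make_record("demo.jpg", {"rust", "scab"}))
print(image.shape, labels.numpy())
```

With the real files, the same `parse_example` function could be mapped over a `tf.data.TFRecordDataset` built from the .tfrec paths of the chosen folds.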
### Acknowledgements

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Random Forest classification results for the whole dataset with stratified k-fold and oversampling.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract— This study investigates the application of machine learning models for detecting suspicious financial transactions. Utilizing a dataset of 12,571 transactions from PT Bank ABC, the research encompasses various stages such as data preprocessing, feature selection, and addressing class imbalance. The models evaluated include Random Forest, XGBoost, and SVM, which were assessed through cross-validation with StratifiedKFold and optimized using RandomizedSearchCV.
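As a rough illustration of the evaluation setup the abstract names, the sketch below tunes a Random Forest with RandomizedSearchCV over StratifiedKFold splits. It assumes scikit-learn and uses synthetic imbalanced data rather than the bank transactions. The parameter grid, the F1 scoring choice, and the use of `class_weight="balanced"` to address class imbalance are illustrative assumptions, not details taken from the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

# Synthetic stand-in: about 10% of samples play the role of "suspicious".
X, y = make_classification(
    n_samples=400, n_features=12, weights=[0.9, 0.1], random_state=42
)

# Illustrative search space (assumption, not from the study).
param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}

search = RandomizedSearchCV(
    # class_weight="balanced" is one possible way to handle imbalance.
    RandomForestClassifier(class_weight="balanced", random_state=42),
    param_distributions=param_distributions,
    n_iter=5,                 # number of sampled parameter settings
    scoring="f1",             # accuracy is misleading on imbalanced data
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
    random_state=42,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

The same `search` pattern applies to the other models by swapping the estimator and its parameter distributions.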
This dataset was created by Rahul u Bhagat
StratifiedKFold folds for the training set of the Tabular Playground March 2021 competition
This dataset was created by Araik Tamazian
This dataset was created by Mark Wijkhuizen
This dataset was created by Francois Patry
This dataset was created by Chandan Yadav
It contains the following files:
This dataset was created by Tanjin Alam
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by DENPA92
Released under CC0: Public Domain
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Nick Kuzmenkov
Released under CC0: Public Domain
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by York G
Released under CC0: Public Domain
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Nick Kuzmenkov
Released under CC0: Public Domain