This dataset was created by Deepak Bhat
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cancer is the unregulated growth of abnormal cells in the human body. Cervical cancer, also known as cancer of the cervix, develops on the surface of the cervix, causing an overabundance of cells to build up and eventually form a lump or tumour. Early detection is therefore essential for determining an effective treatment. Machine Learning (ML) techniques can help here by predicting cervical cancer before it becomes serious. Four common diagnostic tests, namely Hinselmann, Schiller, Cytology, and Biopsy, have been compared and predicted with four common ML models: Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbors (K-NN), and Extreme Gradient Boosting (XGB). Additionally, to improve the performance of the ML models, the Stratified k-fold cross-validation (SKCV) method has been applied. The experimental findings demonstrate that using an RF classifier for analyzing cervical cancer risk could be a good alternative for assisting clinical specialists in classifying this disease in advance.
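A minimal sketch of the evaluation scheme described above, with synthetic stand-ins for the risk-factor features and one diagnosis-test target (assuming scikit-learn; the real records are not loaded here):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the risk-factor matrix and one test target
# (e.g., the Biopsy column); the positive class is kept rare on purpose.
X, y = make_classification(n_samples=1000, n_features=30,
                           weights=[0.9, 0.1], random_state=42)

# Stratified k-fold preserves the class ratio in every fold, which
# matters when the positive class is this imbalanced.
skcv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
clf = RandomForestClassifier(n_estimators=100, random_state=42)
scores = cross_val_score(clf, X, y, cv=skcv, scoring="accuracy")
print(f"mean accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```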
This dataset was created by Raj Gandhi
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Gagan Bajwa
Released under Apache 2.0
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Nick Kuzmenkov
Released under CC0: Public Domain
It contains the following files:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Random Forest classification results for the whole dataset with stratified k-fold and oversampling.
https://creativecommons.org/publicdomain/zero/1.0/
Dataset of TFRecords files made from Plant Pathology 2021 original competition data. Changes:
* the `labels` column of the initial `train.csv` DataFrame was binarized to multi-label format columns: `complex`, `frog_eye_leaf_spot`, `healthy`, `powdery_mildew`, `rust`, and `scab`
* images were scaled to 600x600
* 77 duplicate images having different labels were removed (see the context in this notebook)
* samples were stratified and split into 5 folds (see corresponding folders `fold_0`:`fold_4`)
* images were heavily augmented with the `albumentations` library (for raw images see this dataset)
* each folder contains 5 copies of randomly augmented initial images (so that the model never meets the same images)
I suggest adding all 5 datasets to your notebook: 4 augmented datasets = 20 epochs of unique images (1, 2, 3, 4) + 1 raw dataset for validation here.
For a complete example, see my TPU Training Notebook.
train.csv
folds.csv
fold_0:fold_4 folders, each containing 64 .tfrec files, with the feature map shown below:
feature_map = {
    'image': tf.io.FixedLenFeature([], tf.string),
    'name': tf.io.FixedLenFeature([], tf.string),
    'complex': tf.io.FixedLenFeature([], tf.int64),
    'frog_eye_leaf_spot': tf.io.FixedLenFeature([], tf.int64),
    'healthy': tf.io.FixedLenFeature([], tf.int64),
    'powdery_mildew': tf.io.FixedLenFeature([], tf.int64),
    'rust': tf.io.FixedLenFeature([], tf.int64),
    'scab': tf.io.FixedLenFeature([], tf.int64)}
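A minimal usage sketch for decoding one of these shards with the feature map above. Assumptions: the `image` field holds JPEG-encoded bytes, and the shard path below is a placeholder name:

```python
import tensorflow as tf

def parse_example(serialized):
    # Parse one serialized record with the feature_map defined above.
    example = tf.io.parse_single_example(serialized, feature_map)
    # Assumption: 'image' holds JPEG-encoded bytes of a 600x600 RGB image.
    image = tf.io.decode_jpeg(example['image'], channels=3)
    labels = tf.stack([tf.cast(example[k], tf.float32)
                       for k in ('complex', 'frog_eye_leaf_spot', 'healthy',
                                 'powdery_mildew', 'rust', 'scab')])
    return image, labels

# 'fold_0/sample.tfrec' is a hypothetical shard name.
ds = tf.data.TFRecordDataset('fold_0/sample.tfrec').map(parse_example)
```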
### Acknowledgements
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Many digital libraries recommend literature to their users based on the similarity between a query document and their repository. However, they often fail to distinguish what relationship makes two documents alike. In this paper, we model the problem of finding the relationship between two documents as a pairwise document classification task. To find the semantic relation between documents, we apply a series of techniques, such as GloVe, Paragraph Vectors, BERT, and XLNet, under different configurations (e.g., sequence length, vector concatenation scheme), including a Siamese architecture for the Transformer-based systems. We perform our experiments on a newly proposed dataset of 32,168 Wikipedia article pairs and Wikidata properties that define the semantic document relations. Our results show vanilla BERT as the best-performing system with an F1-score of 0.93, which we manually examine to better understand its applicability to other domains. Our findings suggest that classifying semantic relations between documents is a solvable task, and they motivate the development of recommender systems based on the evaluated techniques. The discussions in this paper serve as first steps in the exploration of documents through SPARQL-like queries, such that one could find documents that are similar in one aspect but dissimilar in another.
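As a rough illustration of the pairwise setup (a sketch only: the checkpoint, relation count, and texts below are placeholders, not the authors' released model), a BERT-style encoder can classify an article pair by encoding both documents in a single sequence:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder checkpoint and number of relation labels; the paper's
# fine-tuned weights are not reproduced here.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=5)

doc_a = "Plain text of the first Wikipedia article ..."
doc_b = "Plain text of the second Wikipedia article ..."

# BERT pair encoding: [CLS] doc_a [SEP] doc_b [SEP], truncated to 512 tokens.
inputs = tokenizer(doc_a, doc_b, truncation=True, max_length=512,
                   return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # index of the predicted relation
```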
Additional information can be found on GitHub.
The following data is supplemental to the experiments described in our research paper. The data consists of:
This package consists of the Dataset part.
Dataset
The Wikipedia article corpus is available in enwiki-20191101-pages-articles.weighted.10k.jsonl.bz2. The original data were downloaded as an XML dump, and the corresponding articles were extracted as plain text with gensim.scripts.segment_wiki. The archive contains only articles that are available in the training or test data.
The actual dataset, as used in the stratified k-fold cross-validation with k=4, is provided in train_testdata_4folds.tar.gz.
├── 1
│   ├── test.csv
│   └── train.csv
├── 2
│   ├── test.csv
│   └── train.csv
├── 3
│   ├── test.csv
│   └── train.csv
└── 4
    ├── test.csv
    └── train.csv
4 directories, 8 files
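A minimal sketch for iterating over the four extracted fold directories (assuming pandas; column handling is left generic since the CSV schema is not shown here):

```python
import pandas as pd

# Directory layout as extracted from train_testdata_4folds.tar.gz.
for fold in range(1, 5):
    train_df = pd.read_csv(f"{fold}/train.csv")
    test_df = pd.read_csv(f"{fold}/test.csv")
    # Train on train_df and evaluate on test_df here.
    print(fold, len(train_df), len(test_df))
```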
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Supplementary Table 5. Unsupervised learning of cross-modal mappings in multi-omics data for survival stratification of gastric cancer. Significant pathways for up-regulated and down-regulated genes.
Abstract. Purpose: This study presents a survival-stratification model based on multi-omics integration using bidirectional deep neural networks (BiDNNs) in gastric cancer (GC). Methods: Based on the survival-related representation features yielded by BiDNNs through integrating transcriptomics and epigenomics data, K-means clustering analysis was performed to cluster tumor samples into different survival subgroups. The BiDNNs-based model was validated using 10-fold cross-validation and in two independent confirmation cohorts. Results: Using the BiDNNs-based survival stratification model, patients were grouped into two survival subgroups with a log-rank P value of 9.05E-05. The subgroup classification was robustly validated in 10-fold cross-validation (C-index = 0.65 ± 0.02) and in two confirmation cohorts (E-GEOD-26253, C-index = 0.609; E-GEOD-62254, C-index = 0.706). Conclusion: We propose and validate a robust and stable BiDNNs-based survival stratification model in GC.
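A minimal sketch of the clustering step described above, using synthetic stand-ins for the BiDNN representation features and survival outcomes (assuming scikit-learn and lifelines; not the study's data or code):

```python
import numpy as np
from lifelines.statistics import logrank_test
from sklearn.cluster import KMeans

# Synthetic stand-ins: Z mimics BiDNN-derived representation features,
# times/events mimic follow-up durations and death indicators.
rng = np.random.default_rng(0)
Z = rng.normal(size=(300, 50))
times = rng.exponential(24.0, size=300)
events = rng.integers(0, 2, size=300)

# Cluster samples into two candidate survival subgroups.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z)

# Compare the subgroups' survival with a log-rank test.
result = logrank_test(times[labels == 0], times[labels == 1],
                      event_observed_A=events[labels == 0],
                      event_observed_B=events[labels == 1])
print(result.p_value)
```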
This dataset was created by AAKRITI ADHIKARI
A daily gridded North American snowfall data with focus on the quality of the interpolated product is archived in this dataset. Daily snowfall amounts from National Weather Service Cooperative Observer Program stations and Meteorological Service of Canada surface stations are interpolated to 1 degree by 1 degree grids and examined for data record length and quality. The interpolation is validated spatially and temporally through the use of stratified sampling and k-fold cross-validation analyses. Interpolation errors average around 0.5 cm and range from less than 0.01 to greater than 2.5 cm. For most locations, this is within the measurement sensitivity. Grid cells with large variations in elevation experience higher errors and should be used with caution.
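A minimal sketch of the validation idea described above: hold out stations with k-fold splits, interpolate daily snowfall from the remaining stations, and measure the error (synthetic station data; assuming SciPy and scikit-learn):

```python
import numpy as np
from scipy.interpolate import griddata
from sklearn.model_selection import KFold

# Synthetic stand-ins for station coordinates and one day of snowfall (cm).
rng = np.random.default_rng(0)
coords = rng.uniform([-125.0, 30.0], [-65.0, 55.0], size=(500, 2))
snow = rng.gamma(2.0, 1.0, size=500)

errors = []
for train_idx, test_idx in KFold(n_splits=10, shuffle=True,
                                 random_state=0).split(coords):
    pred = griddata(coords[train_idx], snow[train_idx],
                    coords[test_idx], method="linear")
    mask = ~np.isnan(pred)  # points outside the convex hull come back NaN
    errors.append(np.mean(np.abs(pred[mask] - snow[test_idx][mask])))
print(np.mean(errors))  # mean absolute interpolation error, cm
```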
This dataset was created by Yuzuki
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 2. Contains results for all models evaluated during the model training and stratified 10-fold cross validation stage.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
AUC metric for the corresponding groups and classifiers based on the original 7 features, the selected optimal minimal subset of 10 features, and all 13 selected significant features.
This dataset was created by Олексій Чорний
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Namely, sensitivity is the proportion of patients sensitive to first-line treatment who are correctly predicted as such, and specificity is the proportion of patients non-sensitive to first-line treatment who are correctly predicted as such.
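A minimal illustration of the two rates, computed from a confusion matrix with 1 marking a patient sensitive to first-line treatment (toy labels, not study data; assuming scikit-learn):

```python
from sklearn.metrics import confusion_matrix

# Toy labels: 1 = sensitive to first-line treatment, 0 = non-sensitive.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)  # correct among truly sensitive patients
specificity = tn / (tn + fp)  # correct among truly non-sensitive patients
print(sensitivity, specificity)
```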