90 datasets found

Tutorial]1.Read various data format
kaggle.com
Updated Mar 14, 2022
Cite
Seol (2022). Tutorial]1.Read various data format [Dataset]. https://www.kaggle.com/datasets/lys620/tutorial1read-various-data-format/code
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Mar 14, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Seol
Description
Dataset

This dataset was created by Seol

Contents
House Prices + Credit Card Datasets (Full)
kaggle.com
Updated Feb 27, 2018
Cite
Lisette (2018). House Prices + Credit Card Datasets (Full) [Dataset]. https://www.kaggle.com/lespin/house-prices-dataset-full/code
Dataset updated
Feb 27, 2018
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Lisette
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset

This dataset was created by Lisette

Released under CC0: Public Domain

Contents
Kaggle
bioregistry.io
Updated Mar 18, 2022
Cite
(2022). Kaggle [Dataset]. http://identifiers.org/re3data:r3d100012705
Explore at:
Unique identifier
https://identifiers.org/re3data:r3d100012705
Dataset updated
Mar 18, 2022
Description
Kaggle is a platform for sharing data, performing reproducible analyses, interactive data analysis tutorials, and machine learning competitions.
Data from: SQL TUTORIAL
kaggle.com
Updated Jul 6, 2023
Cite
Kenneth Chidiebele (2023). SQL TUTORIAL [Dataset]. https://www.kaggle.com/datasets/kennethchidiebele/sql-tutorial/data
Dataset updated
Jul 6, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Kenneth Chidiebele
Description
Dataset

This dataset was created by Kenneth Chidiebele

Contents
tutorial
kaggle.com
Updated Nov 2, 2020
Cite
skyhwchoi (2020). tutorial [Dataset]. https://www.kaggle.com/skyhwchoi/tutorial/discussion
Dataset updated
Nov 2, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
skyhwchoi
Description
Dataset

This dataset was created by skyhwchoi

Contents
practice dataset for tutorials
kaggle.com
Updated Feb 25, 2021
Cite
Christoforos Christoforou (2021). practice dataset for tutorials [Dataset]. https://www.kaggle.com/datasets/cchristoforou/practice-dataset-for-tutorials/data
Dataset updated
Feb 25, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Christoforos Christoforou
Description
Dataset

This dataset was created by Christoforos Christoforou

Contents
Cat Dog Spider Pumpkin Hooman Dataset
universe.roboflow.com
zip
Updated Jan 13, 2023
Cite
Peter Guhl (2023). Cat Dog Spider Pumpkin Hooman Dataset [Dataset]. https://universe.roboflow.com/peter-guhl-de1vy/cat-dog-spider-pumpkin-hooman
Available download formats: zip
Dataset updated
Jan 13, 2023
Dataset authored and provided by
Peter Guhl
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Pumpkins Bounding Boxes
Description
Started out as a pumpkin detector to test training YOLOv5. Now suffering from extensive feature creep and probably ending up as a cat/dog/spider/pumpkin/randomobjects-detector. Or as a disaster.

The dataset does not fit the recommendations at https://docs.ultralytics.com/tutorials/training-tips-best-results/ well. There are no background images and the labeling is often only partial. Especially in the human and pumpkin categories, where there are often lots of objects in one photo, people apparently (and understandably) got bored and did not label everything. And of course the images from the cat category don't have the humans in them labeled, since they come from a cat-identification model which ignored humans. It will take a lot of time to fix that.
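One way to get a rough picture of the labeling gaps described above is to count boxes per image. The sketch below is only an illustration and not part of the dataset: it assumes the common YOLO text-label layout (one `.txt` file per image, one line per box in the form `class x_center y_center width height`), and the directory names and the function `audit_yolo_labels` are hypothetical. An image with zero boxes may be a genuine background image or simply an unlabeled one; the script cannot tell these apart, and it cannot detect partial labeling either.

```python
from pathlib import Path
import tempfile

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def audit_yolo_labels(image_dir, label_dir):
    """Return (boxes_per_image, images_without_boxes) for a YOLO-style dataset."""
    image_dir, label_dir = Path(image_dir), Path(label_dir)
    boxes_per_image = {}
    for image_path in sorted(image_dir.iterdir()):
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        label_path = label_dir / (image_path.stem + ".txt")
        # A missing label file and an empty one both mean "no boxes" here.
        text = label_path.read_text() if label_path.exists() else ""
        boxes_per_image[image_path.name] = sum(
            1 for line in text.splitlines() if line.strip()
        )
    without_boxes = [name for name, count in boxes_per_image.items() if count == 0]
    return boxes_per_image, without_boxes


if __name__ == "__main__":
    # Tiny made-up dataset, built in a temporary directory for illustration.
    with tempfile.TemporaryDirectory() as root:
        images, labels = Path(root, "images"), Path(root, "labels")
        images.mkdir()
        labels.mkdir()
        (images / "pumpkins.jpg").touch()
        (images / "empty_field.jpg").touch()
        (labels / "pumpkins.txt").write_text("0 0.5 0.5 0.2 0.2\n0 0.3 0.7 0.1 0.1\n")
        counts, without_boxes = audit_yolo_labels(images, labels)
        print(counts)
        print(without_boxes)
```

Images that show many objects but carry only one or two boxes are good candidates for a manual relabeling pass.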

Datasets used:
- Cat and Dog Data: Cat / Dog Tutorial NVIDIA Jetson https://github.com/dusty-nv/jetson-inference/blob/master/docs/pytorch-cat-dog.md (© 2016-2019 NVIDIA according to bottom of linked page)
- Spider Data: Kaggle Animal 10 image set https://www.kaggle.com/datasets/alessiocorrado99/animals10 (animal pictures of 10 different categories taken from Google Images; Kaggle project licensed GPL 2)
- Pumpkin Data: Kaggle "Vegetable Images" https://www.researchgate.net/publication/352846889_DCNN-Based_Vegetable_Image_Classification_Using_Transfer_Learning_A_Comparative_Study https://www.kaggle.com/datasets/misrakahmed/vegetable-image-dataset (Kaggle project licensed CC BY-SA 4.0)
- Some pumpkin images manually copied from Google image search
- https://universe.roboflow.com/chess-project/chess-sample-rzbmc (provided by a Roboflow user; license: CC BY 4.0)
- https://universe.roboflow.com/steve-pamer-cvmbg/pumpkins-gfjw5 (provided by a Roboflow user; license: CC BY 4.0)
- https://universe.roboflow.com/nbduy/pumpkin-ryavl (provided by a Roboflow user; license: CC BY 4.0)
- https://universe.roboflow.com/homeworktest-wbx8v/cat_test-1x0bl/dataset/2
- https://universe.roboflow.com/220616nishikura/catdetector
- https://universe.roboflow.com/atoany/cats-s4d4i/dataset/2
- https://universe.roboflow.com/personal-vruc2/agricultured-ioth22
- https://universe.roboflow.com/sreyoshiworkspace-radu9/pet_detection
- https://universe.roboflow.com/artyom-hystt/my-dogs-lcpqe
- https://universe.roboflow.com/dolazy7-gmail-com-3vj05/sweetpumpkin/dataset/2 (license: Public Domain)
- https://universe.roboflow.com/tristram-dacayan/social-distancing-g4pbu
- https://universe.roboflow.com/fyp-3edkl/social-distancing-2ygx5 (license: MIT)
- Spiders: https://universe.roboflow.com/lucas-lins-souza/animals-train-yruka

Currently I can't guarantee it's all correctly licenced. Checks are in progress. Inform me if you see one of your pictures and want it to be removed!
DATA PREPROCESSING TUTORIAL DATASET
kaggle.com
Updated Jan 29, 2024
Cite
BCanOzen (2024). DATA PREPROCESSING TUTORIAL DATASET [Dataset]. https://www.kaggle.com/datasets/bcanozen/data-preprocessing-tutorial-dataset/discussion
Dataset updated
Jan 29, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
BCanOzen
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset

This dataset was created by BCanOzen

Released under MIT

Contents
‘US Adult Income’ analyzed by Analyst-2
analyst-2.ai
Updated Sep 30, 2021
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘US Adult Income’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-us-adult-income-59e8/30e89061/?iid=048-639&v=presentation
Dataset updated
Sep 30, 2021
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
United States
Description
Analysis of ‘US Adult Income’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/johnolafenwa/us-census-data on 30 September 2021.

--- Dataset description provided by original source is as follows ---

US Adult Census data relating income to social factors such as Age, Education, race etc.

The US Adult Income dataset was extracted by Barry Becker from the 1994 US Census Database. The dataset consists of anonymous information such as occupation, age, native country, race, capital gain, capital loss, education, work class and more. Each row is labelled as having a salary of either ">50K" or "<=50K".

This dataset is split into two CSV files, named adult-training.txt and adult-test.txt.

The goal here is to train a binary classifier on the training dataset to predict the column income_bracket, which has two possible values, ">50K" and "<=50K", and to evaluate the accuracy of the classifier on the test dataset.

Note that the dataset is made up of categorical and continuous features. It also contains missing values.

The categorical columns are: workclass, education, marital_status, occupation, relationship, race, gender, native_country

The continuous columns are: age, education_num, capital_gain, capital_loss, hours_per_week
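The column split above maps naturally onto a feature-encoding step. The following is a minimal sketch under stated assumptions, not the original author's code: it one-hot encodes the categorical columns, passes the continuous columns through as floats, and turns income_bracket into a 0/1 label. The `"?"` missing-value marker, the helper names (`make_row`, `build_vocab`, `encode`) and the sample values are all assumptions made for illustration; a missing categorical value is encoded as an all-zero one-hot group.

```python
CATEGORICAL = ["workclass", "education", "marital_status", "occupation",
               "relationship", "race", "gender", "native_country"]
CONTINUOUS = ["age", "education_num", "capital_gain", "capital_loss",
              "hours_per_week"]
MISSING = "?"  # assumption: the marker used for missing values


def make_row(**overrides):
    """Build one made-up record (illustrative values, not real data)."""
    row = {
        "age": "39", "education_num": "13", "capital_gain": "0",
        "capital_loss": "0", "hours_per_week": "40",
        "workclass": "Private", "education": "Bachelors",
        "marital_status": "Never-married", "occupation": "Sales",
        "relationship": "Not-in-family", "race": "White", "gender": "Male",
        "native_country": "United-States", "income_bracket": ">50K",
    }
    row.update(overrides)
    return row


def build_vocab(rows):
    """Collect the known values of each categorical column, skipping missing ones."""
    return {col: sorted({r[col] for r in rows if r[col] != MISSING})
            for col in CATEGORICAL}


def encode(row, vocab):
    """Return (features, label): continuous values first, then one-hot groups."""
    features = [float(row[col]) for col in CONTINUOUS]
    for col in CATEGORICAL:
        # A missing or unseen value matches nothing, so its group stays all zeros.
        features.extend(1.0 if row[col] == value else 0.0 for value in vocab[col])
    label = 1 if row["income_bracket"].strip() == ">50K" else 0
    return features, label


if __name__ == "__main__":
    rows = [make_row(), make_row(workclass=MISSING, income_bracket="<=50K")]
    vocab = build_vocab(rows)
    for row in rows:
        print(encode(row, vocab))
```

The resulting vectors can be fed to any binary classifier; in practice the vocabulary would be built from adult-training.txt only and reused unchanged for adult-test.txt.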

This dataset was obtained from the UCI repository; it can be found at

https://archive.ics.uci.edu/ml/datasets/census+income, http://mlr.cs.umass.edu/ml/machine-learning-databases/adult/

USAGE

This dataset is well suited to developing and testing wide linear classifiers, deep neural network classifiers and a combination of both. For more info on combined Deep and Wide Model classifiers, refer to the research paper by Google: https://arxiv.org/abs/1606.07792

Refer to this kernel for sample usage : https://www.kaggle.com/johnolafenwa/wage-prediction

Complete Tutorial is available from http://johnolafenwa.blogspot.com.ng/2017/07/machine-learning-tutorial-1-wage.html?m=1

--- Original source retains full ownership of the source dataset ---
Lectures & Tutorials
kaggle.com
zip
Updated Aug 31, 2019
Cite
Adnan Zaidi (2019). Lectures & Tutorials [Dataset]. https://www.kaggle.com/datasets/adnanzaidi/lectures-tutorials
Available download formats: zip (951 bytes)
Dataset updated
Aug 31, 2019
Authors
Adnan Zaidi
Description
It contains multiple datasets and selected tutorials for learning purposes.
Replication Data for: "A Topic-based Segmentation Model for Identifying...
search.dataone.org
dataverse.harvard.edu
Updated Sep 25, 2024
Cite
Kim, Sunghoon; Lee, Sanghak; McCulloch, Robert (2024). Replication Data for: "A Topic-based Segmentation Model for Identifying Segment-Level Drivers of Star Ratings from Unstructured Text Reviews" [Dataset]. http://doi.org/10.7910/DVN/EE3DE2
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/EE3DE2
Dataset updated
Sep 25, 2024
Dataset provided by
Harvard Dataverse
Authors
Kim, Sunghoon; Lee, Sanghak; McCulloch, Robert
Description
We provide instructions, code, and datasets for replicating the article by Kim, Lee and McCulloch (2024), "A Topic-based Segmentation Model for Identifying Segment-Level Drivers of Star Ratings from Unstructured Text Reviews." This repository provides a user-friendly R package for researchers and practitioners to apply A Topic-based Segmentation Model with Unstructured Texts (latent class regression with group variable selection) to their datasets.

First, we provide R code to replicate the illustrative simulation study: see file 1. Second, we provide the user-friendly R package with a very simple example code to help apply the model to real-world datasets: see file 2, Package_MixtureRegression_GroupVariableSelection.R and Dendrogram.R. Third, we provide a set of code and instructions to replicate the empirical studies of customer-level segmentation and restaurant-level segmentation with Yelp reviews data: see files 3-a, 3-b, 4-a, 4-b. Note: due to Yelp's dataset terms of use and the restriction on data size, we provide a link to download the same Yelp datasets (https://www.kaggle.com/datasets/yelp-dataset/yelp-dataset/versions/6). Fourth, we provide a set of code and datasets to replicate the empirical study with professor ratings reviews data: see file 5. Please see more details in the description text and comments of each file.

[A guide on how to use the code to reproduce each study in the paper]

1. Full codes for replicating Illustrative simulation study.txt -- [see Table 2 and Figure 2 in main text]: This is R source code to replicate the illustrative simulation study. Please run from the beginning to the end in R. In addition to estimated coefficients (posterior means of coefficients), indicators of variable selections, and segment memberships, you will get dendrograms of selected groups of variables in Figure 2. Computing time is approximately 20 to 30 minutes.

3-a. Preprocessing raw Yelp Reviews for Customer-level Segmentation.txt: Code for preprocessing the downloaded unstructured Yelp review data and preparing DV and IVs matrix for customer-level segmentation study.

3-b. Instruction for replicating Customer-level Segmentation analysis.txt -- [see Table 10 in main text; Tables F-1, F-2, and F-3 and Figure F-1 in Web Appendix]: Code for replicating customer-level segmentation study with Yelp data. You will get estimated coefficients (posterior means of coefficients), indicators of variable selections, and segment memberships. Computing time is approximately 3 to 4 hours.

4-a. Preprocessing raw Yelp reviews_Restaruant Segmentation (1).txt: R code for preprocessing the downloaded unstructured Yelp data and preparing DV and IVs matrix for restaurant-level segmentation study.

4-b. Instructions for replicating restaurant-level segmentation analysis.txt -- [see Tables 5, 6 and 7 in main text; Tables E-4 and E-5 and Figure H-1 in Web Appendix]: Code for replicating restaurant-level segmentation study with Yelp. You will get estimated coefficients (posterior means of coefficients), indicators of variable selections, and segment memberships. Computing time is approximately 10 to 12 hours.

[Guidelines for running Benchmark models in Table 6]

- Unsupervised Topic model: 'topicmodels' package in R -- after determining the number of topics (e.g., with 'ldatuning' R package), run 'LDA' function in the 'topicmodels' package. Then, compute topic probabilities per restaurant (with 'posterior' function in the package), which can be used as predictors. Then, conduct prediction with regression.
- Hierarchical topic model (HDP): 'gensimr' R package -- 'model_hdp' function for identifying topics in the package (see https://radimrehurek.com/gensim/models/hdpmodel.html or https://gensimr.news-r.org/).
- Supervised topic model: 'lda' R package -- 'slda.em' function for training and 'slda.predict' for prediction.
- Aggregate regression: 'lm' default function in R.
- Latent class regression without variable selection: 'flexmix' function in 'flexmix' R package. Run flexmix with a certain number of segments (e.g., 3 segments in this study). Then, with estimated coefficients and memberships, conduct prediction of the dependent variable for each segment.
- Latent class regression with variable selection: 'Unconstraind_Bayes_Mixture' function in Kim, Fong and DeSarbo (2012)'s package. Run the Kim et al. (2012) model with a certain number of segments (e.g., 3 segments in this study). Then, with estimated coefficients and memberships, we can predict the dependent variable for each segment. The same R package ('KimFongDeSarbo2012.zip') can be downloaded at: https://sites.google.com/scarletmail.rutgers.edu/r-code-packages/home

5. Instructions for replicating Professor ratings review study.txt -- [see Tables G-1, G-2, G-4 and G-5, and Figures G-1 and H-2 in Web Appendix]: Code to replicate the Professor ratings reviews study. Computing time is approximately 10 hours.

[A list of the versions of R, packages, and computer...
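The benchmark guidelines for latent class regression describe predicting the dependent variable per segment from estimated coefficients and memberships. The sketch below illustrates that step only, under loud assumptions: it is written in Python rather than the authors' R, it assumes a plain linear predictor with an intercept, and the function name `predict_by_segment` and all numbers are hypothetical. It does not reproduce the 'flexmix' or Kim, Fong and DeSarbo (2012) estimation itself.

```python
def predict_by_segment(X, memberships, coefficients):
    """Predict each row with the coefficients of the segment it belongs to.

    X            -- list of predictor rows (lists of floats)
    memberships  -- segment id for each row, same length as X
    coefficients -- dict: segment id -> [intercept, beta_1, ..., beta_p]
    """
    predictions = []
    for x, segment in zip(X, memberships):
        intercept, *betas = coefficients[segment]
        if len(betas) != len(x):
            raise ValueError("coefficient count does not match predictor count")
        predictions.append(intercept + sum(b * v for b, v in zip(betas, x)))
    return predictions


if __name__ == "__main__":
    # Made-up predictors (e.g. topic probabilities), memberships and coefficients.
    X = [[1.0, 2.0], [0.5, 0.0]]
    memberships = [0, 1]
    coefficients = {0: [1.0, 2.0, 0.0], 1: [3.0, -1.0, 4.0]}
    print(predict_by_segment(X, memberships, coefficients))
```

Keeping the coefficients keyed by segment makes it easy to compare per-segment predictions against a single aggregate regression.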
EE226-tutorial
kaggle.com
Updated Mar 13, 2021
Cite
CS3319-02 (2021). EE226-tutorial [Dataset]. https://www.kaggle.com/massivedatamining/ee226tutorial/discussion
Dataset updated
Mar 13, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
CS3319-02
Description
Dataset

This dataset was created by CS3319-02

Contents
tutorial
kaggle.com
Updated Aug 15, 2021
Cite
Ridho dwi Fachri (2021). tutorial [Dataset]. https://www.kaggle.com/ridhodwifachri/tutorial/code
Dataset updated
Aug 15, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Ridho dwi Fachri
Description
Dataset

This dataset was created by Ridho dwi Fachri

Contents
A CycleGAN deep learning technique for artifact reduction in fundus...
data.mendeley.com
Updated Jan 21, 2020
Cite
Tae Keun Yoo (2020). A CycleGAN deep learning technique for artifact reduction in fundus photography [Dataset]. http://doi.org/10.17632/dh2x8v6nf8.1
Explore at:
Unique identifier
https://doi.org/10.17632/dh2x8v6nf8.1
Dataset updated
Jan 21, 2020
Authors
Tae Keun Yoo
License
Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Description
Herein, we present a deep learning technique to automatically remove artifacts in fundus photographs. Using a CycleGAN model, we synthesized retinal images with reduced artifacts from low-quality images and validated this technique on an independent test dataset.

This study included a total of 2,206 anonymized retinal images. We collected the fundus photographs without qualification, so they include both normal and pathologic retinal images. Images, including photographs both with and without artifacts, were crawled from Google image and dataset search using English keywords related to the retina. The search strategy was based on the key terms “fundus photography”, “retinal image”, “artifact”, “quality assessment”, “retinal image grade”, “diabetic retinopathy”, “age-related macular degeneration”, “glaucoma”, “cataract”, and “fundus dataset”. Images with artifacts were manually classified by the authors. Finally, 1,146 images with artifacts and 1,060 images without artifacts were collected. The experiment process complied with the Declaration of Helsinki. This study did not require ethics committee approval; instead, researchers used open web-based and deidentified data.

We used the Colaboratory CycleGAN tutorial page to develop and validate the CycleGAN model, and all code is available on that webpage (https://www.tensorflow.org/tutorials/generative/cyclegan).

This dataset may include MESSIDOR, HRF, FIRE, DRIVE, Kaggle DMR, and freely available images from Google image search. Images with and without artifacts were categorized to investigate artifact reduction.
python-automation-tutorial
kaggle.com
Updated Nov 6, 2024
Cite
Lucas Henrique Mateo (2024). python-automation-tutorial [Dataset]. https://www.kaggle.com/datasets/lucashmateo/python-automation-tutorial/discussion
Dataset updated
Nov 6, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Lucas Henrique Mateo
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset

This dataset was created by Lucas Henrique Mateo

Released under Apache 2.0

Contents
MNIST From Tensorflow Tutorial
kaggle.com
Updated Nov 23, 2017
Cite
Arpan Dhatt (2017). MNIST From Tensorflow Tutorial [Dataset]. https://www.kaggle.com/arpandhatt/mnist-from-tensorflow-tutorial/code
Dataset updated
Nov 23, 2017
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Arpan Dhatt
Description
Dataset

This dataset was created by Arpan Dhatt

Contents
numpy-tutorial-seattle
kaggle.com
Updated Jul 8, 2019
Cite
Chao CHEN (2019). numpy-tutorial-seattle [Dataset]. https://www.kaggle.com/datasets/monkeyboard568/numpytutorialseattle/suggestions
Dataset updated
Jul 8, 2019
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Chao CHEN
Area covered
Seattle
Description
Dataset

This dataset was created by Chao CHEN

Contents
complete pandas tutorial
kaggle.com
Updated Aug 24, 2020
Cite
Pritam Purohit (2020). complete pandas tutorial [Dataset]. https://www.kaggle.com/pritampurohit/complete-pandas-tutorial/code
Dataset updated
Aug 24, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Pritam Purohit
Description
Dataset

This dataset was created by Pritam Purohit

Contents
Recommender Systems Tutorial
kaggle.com
zip
Updated Sep 16, 2018
Cite
Daniil Barysevich (2018). Recommender Systems Tutorial [Dataset]. https://www.kaggle.com/devvindan/recommender-systems-tutorial
Available download formats: zip (31864 bytes)
Dataset updated
Sep 16, 2018
Authors
Daniil Barysevich
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset

This dataset was created by Daniil Barysevich

Released under CC0: Public Domain

Contents
vit-tutorial-illustrations
kaggle.com
zip
Updated Nov 29, 2020
Cite
Abhinand (2020). vit-tutorial-illustrations [Dataset]. https://www.kaggle.com/abhinand05/vittutorialillustrations
Available download formats: zip (1062032 bytes)
Dataset updated
Nov 29, 2020
Authors
Abhinand
Description
Dataset

This dataset was created by Abhinand

Contents
