7 datasets found

Cats and Dogs Sentdex Tutorial
kaggle.com
Updated Oct 8, 2018
Cite
Sherpa (2018). Cats and Dogs Sentdex Tutorial [Dataset]. https://www.kaggle.com/thesherpafromalabama/cats-and-dogs-sentdex-tutorial/code
Dataset updated
Oct 8, 2018
Dataset provided by
Kaggle
Authors
Sherpa
Description
This is the dataset that accompanies part 2 of Sentdex's "Deep Learning basics with Python, TensorFlow and Keras" tutorial: https://www.youtube.com/watch?v=j-3vuBynnOE&list=PLQVvvaa0QuDfhTox0AjmQ6tvTgMBZBEXN&index=2

Original data source: https://www.youtube.com/redirect?redir_token=wz7434m56UWH1Vs4X1Je76XEoE58MTUzOTA3NDQwMUAxNTM4OTg4MDAx&q=https%3A%2F%2Fwww.microsoft.com%2Fen-us%2Fdownload%2Fconfirmation.aspx%3Fid%3D54765&v=j-3vuBynnOE&event=video_description
Kaggle
bioregistry.io
registry.identifiers.org
Updated Mar 18, 2022
Cite
(2022). Kaggle [Dataset]. http://identifiers.org/re3data:r3d100012705
Unique identifier
https://identifiers.org/re3data:r3d100012705
Dataset updated
Mar 18, 2022
Description
Kaggle is a platform for sharing data, running reproducible analyses, publishing interactive data analysis tutorials, and hosting machine learning competitions.
‘US Adult Income’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘US Adult Income’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-us-adult-income-0e01/5ad34330/?iid=023-649&v=presentation
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
United States
Description
Analysis of ‘US Adult Income’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/johnolafenwa/us-census-data on 28 January 2022.

--- Dataset description provided by original source is as follows ---

US Adult Census data relating income to social factors such as age, education and race.

The US Adult Income dataset was extracted by Barry Becker from the 1994 US Census database. It consists of anonymous information such as occupation, age, native country, race, capital gain, capital loss, education, work class and more. Each row is labelled with an income bracket of either ">50K" or "<=50K".

The dataset is split into two CSV files, adult-training.txt and adult-test.txt.

The goal is to train a binary classifier on the training set to predict the column income_bracket, which has two possible values, ">50K" and "<=50K", and then to evaluate the classifier's accuracy on the test set.

Note that the dataset is made up of categorical and continuous features, and it also contains missing values. The categorical columns are: workclass, education, marital_status, occupation, relationship, race, gender, native_country

The continuous columns are: age, education_num, capital_gain, capital_loss, hours_per_week
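The split into categorical and continuous columns, plus the missing values, suggests a small preprocessing step before training. The sketch below is a minimal illustration, not the code from the referenced kernel: it assumes each row has already been read into a dictionary keyed by the column names above plus income_bracket, and that missing values appear as "?" (the UCI convention, assumed rather than confirmed for this copy). The name preprocess_row and the "unknown" placeholder level are hypothetical choices.

```python
from typing import Dict, Optional, Tuple, Union

# Column groups exactly as listed in the dataset description.
CATEGORICAL = ["workclass", "education", "marital_status", "occupation",
               "relationship", "race", "gender", "native_country"]
CONTINUOUS = ["age", "education_num", "capital_gain", "capital_loss", "hours_per_week"]

# Assumption: missing values are written as "?" (UCI convention) or left empty.
MISSING = {"", "?"}

Feature = Union[str, float, None]


def preprocess_row(row: Dict[str, str]) -> Tuple[Dict[str, Feature], Optional[int]]:
    """Split one raw row (column name -> string) into features and a 0/1 label."""
    features: Dict[str, Feature] = {}
    for col in CATEGORICAL:
        value = row.get(col, "").strip()
        # Keep a missing category as an explicit "unknown" level instead of dropping the row.
        features[col] = "unknown" if value in MISSING else value
    for col in CONTINUOUS:
        value = row.get(col, "").strip()
        # Missing numbers become None so a later step can impute them (e.g. with the mean).
        features[col] = None if value in MISSING else float(value)
    # Assumption: some copies of the test file end labels with a period (">50K."),
    # so a trailing dot is stripped before mapping to 1 (">50K") or 0 ("<=50K").
    raw_label = row.get("income_bracket", "").strip().rstrip(".")
    label = {">50K": 1, "<=50K": 0}.get(raw_label)  # None if unlabeled or unexpected
    return features, label
```

Keeping "unknown" as its own category lets a classifier learn whether missingness itself is informative, rather than silently discarding those rows.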

The dataset was obtained from the UCI repository and can be found at:

https://archive.ics.uci.edu/ml/datasets/census+income, http://mlr.cs.umass.edu/ml/machine-learning-databases/adult/

USAGE: This dataset is well suited to developing and testing wide linear classifiers, deep neural network classifiers, and combinations of both. For more information on combined wide and deep models, refer to the research paper by Google: https://arxiv.org/abs/1606.07792
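As a rough illustration of the "wide" side of such a model, the sketch below turns categorical columns into sparse binary features, adds cross-product features (indicators that are 1 only when several columns take particular values together, the transformation described in the Wide & Deep paper), and scores them with a linear model. The function names, the choice of which columns to cross, and the weights in the test are hypothetical; in practice the weights would be learned, for example with logistic regression.

```python
import math
from typing import Dict, Iterable, Sequence, Set, Tuple


def wide_features(row: Dict[str, str], categorical: Sequence[str],
                  crosses: Iterable[Tuple[str, ...]]) -> Set[str]:
    """Names of the binary features that are 1 for this row (all others are 0)."""
    # One-hot part: one indicator per column=value pair.
    active = {f"{col}={row[col]}" for col in categorical}
    # Cross-product part: a single indicator that is 1 only when every column in
    # the cross takes these particular values together (an AND of one-hot features).
    for cols in crosses:
        active.add(" AND ".join(f"{col}={row[col]}" for col in cols))
    return active


def wide_logit(active: Set[str], weights: Dict[str, float], bias: float = 0.0) -> float:
    """Linear score: bias plus the weights of the active features (unseen features weigh 0)."""
    return bias + sum(weights.get(name, 0.0) for name in active)


def wide_probability(active: Set[str], weights: Dict[str, float], bias: float = 0.0) -> float:
    """Logistic link: estimated probability of the positive class (">50K")."""
    return 1.0 / (1.0 + math.exp(-wide_logit(active, weights, bias)))
```

For example, crossing education with occupation gives the model a separate weight for a combination such as "education=Bachelors AND occupation=Exec-managerial", which a purely additive one-hot model cannot express.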

Refer to this kernel for sample usage: https://www.kaggle.com/johnolafenwa/wage-prediction

A complete tutorial is available at http://johnolafenwa.blogspot.com.ng/2017/07/machine-learning-tutorial-1-wage.html?m=1

--- Original source retains full ownership of the source dataset ---
‘homeprices-multiple-variables’ analyzed by Analyst-2
analyst-2.ai
Updated Feb 14, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘homeprices-multiple-variables’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-homeprices-multiple-variables-e06e/acea5a36/?iid=001-683&v=presentation
Dataset updated
Feb 14, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘homeprices-multiple-variables’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/pankeshpatel/homepricesmultiplevariables on 14 February 2022.

--- Dataset description provided by original source is as follows ---

Context

Sample housing-price data. We used this small dataset to create a tutorial, "Machine learning for absolute beginners", on multivariate regression.

Content

It has the following four attributes describing a house:

- area: area of the house in square feet
- bedrooms: number of bedrooms in the house
- age: age of the house
- price: price of the house

Area, bedrooms, and age are the feature attributes, and price is the target variable.
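Predicting price from the three features is ordinary multivariate linear regression, price ≈ b0 + b1*area + b2*bedrooms + b3*age. The following is a minimal standard-library sketch that solves the least-squares normal equations by Gauss-Jordan elimination; it is not the tutorial's code. The function names are hypothetical, and the rows at the bottom are made-up numbers generated from known coefficients, not the dataset's contents.

```python
from typing import List, Sequence


def fit_linear_regression(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least squares via the normal equations (X^T X) b = X^T y.

    Returns [intercept, coef_1, ..., coef_k]."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # leading 1 -> intercept term
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting turns A into [I | b].
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for k in range(n):
            if k != col:
                f = A[k][col]
                A[k] = [a - f * c for a, c in zip(A[k], A[col])]
    return [A[i][n] for i in range(n)]


def predict(coefs: Sequence[float], x: Sequence[float]) -> float:
    """Apply a fitted model to one (area, bedrooms, age) tuple."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], x))


# Made-up rows (area, bedrooms, age) with
# price = 50000 + 100*area + 10000*bedrooms - 1000*age.
X = [(2600, 3, 20), (3000, 4, 15), (3200, 3, 18), (3600, 3, 30), (4000, 5, 8), (4100, 6, 8)]
y = [50000 + 100 * a + 10000 * b - 1000 * g for a, b, g in X]
coefs = fit_linear_regression(X, y)
print(coefs)
```

Because the made-up prices are an exact linear function of the features, the fitted coefficients come out very close to 50000, 100, 10000 and -1000. The normal equations can be numerically ill-conditioned when features have very different scales; for real work, numpy.linalg.lstsq or scikit-learn's LinearRegression is the safer choice.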

Acknowledgements

Source: codebasics (https://twitter.com/codebasicshub)

--- Original source retains full ownership of the source dataset ---
US Adult Income
kaggle.com
Updated Jul 14, 2017
Cite
John Olafenwa (2017). US Adult Income [Dataset]. https://www.kaggle.com/johnolafenwa/us-census-data/metadata
Dataset updated
Jul 14, 2017
Dataset provided by
Kaggle
Authors
John Olafenwa
License
Public Domain Dedication (CC0 1.0) https://creativecommons.org/publicdomain/zero/1.0/
Area covered
United States
Description
US Adult Census data relating income to social factors such as age, education and race.

The US Adult Income dataset was extracted by Barry Becker from the 1994 US Census database. It consists of anonymous information such as occupation, age, native country, race, capital gain, capital loss, education, work class and more. Each row is labelled with an income bracket of either ">50K" or "<=50K".

The dataset is split into two CSV files, adult-training.txt and adult-test.txt.

The goal is to train a binary classifier on the training set to predict the column income_bracket, which has two possible values, ">50K" and "<=50K", and then to evaluate the classifier's accuracy on the test set.
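Evaluating on the test file comes down to measuring accuracy, and it helps to report a trivial baseline alongside any model, since accuracy alone can look good when one income bracket dominates. Below is a minimal sketch; accuracy and majority_baseline are hypothetical helper names, and a classifier is assumed to be any callable that maps a row to ">50K" or "<=50K".

```python
from collections import Counter
from typing import Callable, Sequence, TypeVar

Row = TypeVar("Row")


def accuracy(classify: Callable[[Row], str], rows: Sequence[Row], labels: Sequence[str]) -> float:
    """Fraction of test rows whose predicted income bracket matches the true label."""
    if len(rows) != len(labels) or not rows:
        raise ValueError("rows and labels must be non-empty and of equal length")
    correct = sum(1 for row, label in zip(rows, labels) if classify(row) == label)
    return correct / len(rows)


def majority_baseline(train_labels: Sequence[str]) -> Callable[[object], str]:
    """A trivial classifier that always predicts the most frequent training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda _row: most_common
```

A trained model is only worth keeping if its test accuracy clearly beats the majority baseline computed from the training labels.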

Note that the dataset is made up of categorical and continuous features, and it also contains missing values. The categorical columns are: workclass, education, marital_status, occupation, relationship, race, gender, native_country

The continuous columns are: age, education_num, capital_gain, capital_loss, hours_per_week

The dataset was obtained from the UCI repository and can be found at:

https://archive.ics.uci.edu/ml/datasets/census+income, http://mlr.cs.umass.edu/ml/machine-learning-databases/adult/

USAGE: This dataset is well suited to developing and testing wide linear classifiers, deep neural network classifiers, and combinations of both. For more information on combined wide and deep models, refer to the research paper by Google: https://arxiv.org/abs/1606.07792

Refer to this kernel for sample usage: https://www.kaggle.com/johnolafenwa/wage-prediction

A complete tutorial is available at http://johnolafenwa.blogspot.com.ng/2017/07/machine-learning-tutorial-1-wage.html?m=1
Thanos or Grimace
kaggle.com
Updated Dec 20, 2019
Cite
Arturo Moncada Torres (2019). Thanos or Grimace [Dataset]. https://www.kaggle.com/arturomoncadatorres/thanos-or-grimace/activity
Dataset updated
Dec 20, 2019
Dataset provided by
Kaggle
Authors
Arturo Moncada Torres
License
Public Domain Dedication (CC0 1.0) https://creativecommons.org/publicdomain/zero/1.0/
Description
Thanos or Grimace?

Avengers: Infinity War is one of my favorite movies, not only in the superhero genre but of all time. Besides an insane character crossover, it features quite a few scenes with pop culture references. In one of them, Peter Quill (aka Star-Lord) refers to Thanos as "Grimace".

For those who don't know, Grimace is one of McDonaldland's characters. These were quite popular in the 80s and 90s (which fits with Quill's time on Earth as a kid). This reference inspired cosplayers Kittie Cosplay and Banana Steve to make an even crazier crossover: Grimos. This made me wonder: can we create an algorithm capable of distinguishing between these two purple characters?

Data Fetching

Unsurprisingly, I couldn't find an existing dataset for this project. Thus, I created it myself leveraging the search function of Google Images. Originally, I wanted to use a semi-automatic approach, similar to what is suggested in this tutorial. However, I could never get it running. My guess is that it is a bit old and no longer compatible with the most recent versions of web browsers. Therefore, I looked for other alternatives and found Fatkun Batch Download Image extension for Chrome. This is a very handy tool which allows you to manually select and download all the images of a single tab in your browser to your computer. Although at the beginning I was a bit bummed that I had to click through a lot of images, I quickly realized this was a necessary step if I wanted to have a decently curated dataset.

I selected and downloaded images that:

Had the character in any representation (e.g., photo, comic, drawing, cartoon, etc.)

Showed the character from different angles

All images are JPEGs.
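Because the images were collected by hand from Google Images, a quick signature check can confirm that every file really is a JPEG (downloaded files sometimes carry a .jpg name while containing PNG or WebP data). This is a minimal sketch, not part of the original project: JPEG files begin with the start-of-image marker 0xFFD8 followed by the 0xFF of the next marker. The helper names and the one-folder-per-character layout are assumptions, and the check inspects only the header, not whether the image decodes.

```python
from pathlib import Path
from typing import List

# A JPEG starts with the start-of-image marker 0xFFD8, immediately followed by
# the 0xFF that begins the next marker segment.
JPEG_SIGNATURE = b"\xff\xd8\xff"


def looks_like_jpeg(data: bytes) -> bool:
    """Header-only check; it does not verify that the image actually decodes."""
    return data.startswith(JPEG_SIGNATURE)


def non_jpeg_files(folder: str) -> List[str]:
    """Names of files in `folder` whose first bytes are not a JPEG signature."""
    bad = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            with path.open("rb") as f:
                if not looks_like_jpeg(f.read(len(JPEG_SIGNATURE))):
                    bad.append(path.name)
    return bad
```

With a hypothetical layout such as data/thanos and data/grimace, calling non_jpeg_files on each folder lists the files worth re-downloading or converting.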

Showcase

You can find the repository of the project where these data are put into action here.

Acknowledgements

Images were scraped using Google Images and the Fatkun Batch Download Image extension for Chrome.

Disclaimer

All images were obtained from publicly available websites. If you own the rights to any of the images shown and wish to have them removed from this dataset, please let me know.
Swedish NER corpus
kaggle.com
Updated Dec 12, 2017
Cite
Andreas Klintberg (2017). Swedish NER corpus [Dataset]. https://www.kaggle.com/andreasklintberg/swedish-ner-corpus/metadata
Dataset updated
Dec 12, 2017
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Andreas Klintberg
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Context

Swedish web news from 2012, bootstrapped and then manually annotated for NER. NER stands for named entity recognition, the task of identifying entities in a text such as organisations, locations and people.

It is a very common step in NLP pipelines, and several algorithms can be used to train a model. Traditionally, many NER systems were trained with some kind of CRF (conditional random fields) approach, but nowadays many people successfully use LSTMs or other sequence-based deep learning techniques.

A tutorial on how to use this dataset to train an NER model for Stanford CoreNLP is available here: https://medium.com/@klintcho/training-a-swedish-ner-model-for-stanford-corenlp-part-2-20a0cfd801dd

Content

The dataset is very simple and can easily be adapted to other formats; it is laid out specifically for CoreNLP NER. The first column is a word. The second column (tab-separated) is either the NER category (ORG, PER, LOC, MISC) or 0 if the word does not belong to any category (not an entity). Each word is on its own line, and sentences are separated by an empty line.

Sample structure (two sentences: a three-word sentence and a four-word sentence):

Apple	ORG
is	0
nice	0
.	0

Per	PER
is	0
not	0
sad	0
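The two-column, blank-line-delimited layout is simple to read without CoreNLP. The following sketch parses it into sentences of (word, tag) pairs and groups consecutive tokens with the same category into entity spans. The function names are hypothetical, and accepting "O" as well as "0" for non-entities is an assumption. Because the tags carry no B-/I- prefixes, two adjacent entities of the same category are merged into one span.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (word, tag)

# The description marks non-entities with "0"; "O" (the usual CoreNLP label) is
# also accepted here as an assumption, in case a copy of the file uses it.
NON_ENTITY = {"0", "O"}


def parse_corpus(text: str) -> List[List[Token]]:
    """Parse 'word<TAB>tag' lines into sentences; an empty line ends a sentence."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in text.splitlines():
        if not line.strip():
            if current:  # ignore repeated blank lines
                sentences.append(current)
                current = []
            continue
        word, tag = line.rsplit("\t", 1)
        current.append((word, tag.strip()))
    if current:  # the file may not end with a blank line
        sentences.append(current)
    return sentences


def entities(sentence: List[Token]) -> List[Tuple[str, str]]:
    """Group consecutive tokens sharing a category into (text, category) spans."""
    spans: List[Tuple[str, str]] = []
    words: List[str] = []
    label: Optional[str] = None
    for word, tag in sentence:
        category = None if tag in NON_ENTITY else tag
        if category != label and words:  # category changed: close the open span
            spans.append((" ".join(words), label))
            words = []
        label = category
        if category is not None:
            words.append(word)
    if words:
        spans.append((" ".join(words), label))
    return spans
```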

Acknowledgements

The annotated text comes from http://spraakbanken.gu.se/eng/resource/webbnyheter2012. Thanks to Norah Klintberg Sakal for helping with the annotation and for reviewing all annotations.

Inspiration

Feel free to use this for whatever you like. Like most datasets, it would benefit from being larger; feel free to create a pull request at https://github.com/klintan/swedish-ner-corpus/ or update it here on Kaggle.