7 datasets found
  1. Cats and Dogs Sentdex Tutorial

    • kaggle.com
    Updated Oct 8, 2018
    Cite
    Sherpa (2018). Cats and Dogs Sentdex Tutorial [Dataset]. https://www.kaggle.com/thesherpafromalabama/cats-and-dogs-sentdex-tutorial/code
    Dataset updated
    Oct 8, 2018
    Dataset provided by
    Kaggle
    Authors
    Sherpa
    Description
  2. Kaggle

    • bioregistry.io
    • registry.identifiers.org
    Updated Mar 18, 2022
    Cite
    (2022). Kaggle [Dataset]. http://identifiers.org/re3data:r3d100012705
    Dataset updated
    Mar 18, 2022
    Description

    Kaggle is a platform for sharing data, performing reproducible analyses, interactive data analysis tutorials, and machine learning competitions.

  3. ‘US Adult Income’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘US Adult Income’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-us-adult-income-0e01/5ad34330/?iid=023-649&v=presentation
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United States
    Description

    Analysis of ‘US Adult Income’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/johnolafenwa/us-census-data on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    US Adult Census data relating income to social factors such as age, education, race, etc.

    The US Adult Income dataset was extracted by Barry Becker from the 1994 US Census database. The dataset consists of anonymous information such as occupation, age, native country, race, capital gain, capital loss, education, work class and more. Each row is labelled as having a salary of either ">50K" or "<=50K".

    The dataset is split into two CSV files, named adult-training.txt and adult-test.txt.

    The goal here is to train a binary classifier on the training dataset to predict the column income_bracket, which has two possible values, ">50K" and "<=50K", and to evaluate the accuracy of the classifier on the test dataset.

    Note that the dataset is made up of categorical and continuous features. It also contains missing values. The categorical columns are: workclass, education, marital_status, occupation, relationship, race, gender, native_country

    The continuous columns are: age, education_num, capital_gain, capital_loss, hours_per_week

    This dataset was obtained from the UCI repository; it can be found at

    https://archive.ics.uci.edu/ml/datasets/census+income, http://mlr.cs.umass.edu/ml/machine-learning-databases/adult/

    USAGE: This dataset is well suited to developing and testing wide linear classifiers, deep neural network classifiers and a combination of both. For more information on combined deep and wide model classifiers, refer to the research paper by Google: https://arxiv.org/abs/1606.07792

    Refer to this kernel for sample usage: https://www.kaggle.com/johnolafenwa/wage-prediction

    A complete tutorial is available at http://johnolafenwa.blogspot.com.ng/2017/07/machine-learning-tutorial-1-wage.html?m=1

    --- Original source retains full ownership of the source dataset ---

  4. ‘homeprices-multiple-variables’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Feb 14, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘homeprices-multiple-variables’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-homeprices-multiple-variables-e06e/acea5a36/?iid=001-683&v=presentation
    Dataset updated
    Feb 14, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘homeprices-multiple-variables’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/pankeshpatel/homepricesmultiplevariables on 14 February 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    Sample data on housing prices. We used this small dataset to create a tutorial, "Machine learning for absolute beginners". The topic is multivariate regression.

    Content

    It has the following four attributes describing a house:

    • area: area of the house in square feet

    • bedrooms: number of bedrooms in the house

    • age: age of the house

    • price: price of the house

    Area, bedrooms, and age are the feature attributes, and price is the target variable.
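
    As a rough illustration of multivariate regression on these attributes, the sketch below fits price as a linear function of area, bedrooms, and age by solving the normal equations in plain Python. The rows and the coefficients in ASSUMED_WEIGHTS are synthetic values invented for this example, not records from the dataset, and the function names are hypothetical.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], so row operations update both sides together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_linear_regression(features: Sequence[Sequence[float]],
                          targets: Sequence[float]) -> List[float]:
    """Return [intercept, w_area, w_bedrooms, w_age] via the normal equations."""
    rows = [[1.0, *(float(v) for v in f)] for f in features]  # prepend bias term
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(weights: Sequence[float], feature_row: Sequence[float]) -> float:
    """Apply the fitted linear model to one (area, bedrooms, age) row."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], feature_row))


# Synthetic (area, bedrooms, age) rows: assumptions for illustration only.
FEATURES = [(2600, 3, 20), (3000, 4, 15), (3200, 3, 18),
            (3600, 3, 30), (4000, 5, 8), (4100, 6, 8)]
# Assumed "true" model used to generate prices: intercept, area, bedrooms, age.
ASSUMED_WEIGHTS = [50000.0, 100.0, 10000.0, -500.0]
PRICES = [predict(ASSUMED_WEIGHTS, row) for row in FEATURES]

weights = fit_linear_regression(FEATURES, PRICES)
print(weights)
print(predict(weights, (3000, 3, 40)))
```

    Because the synthetic prices are generated without noise, the fit should recover the assumed weights up to floating-point error; on real data the recovered weights would only be a least-squares estimate.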

    Acknowledgements

    Source: codebasics : https://twitter.com/codebasicshub

    --- Original source retains full ownership of the source dataset ---

  5. US Adult Income

    • kaggle.com
    Updated Jul 14, 2017
    Cite
    John Olafenwa (2017). US Adult Income [Dataset]. https://www.kaggle.com/johnolafenwa/us-census-data/metadata
    Dataset updated
    Jul 14, 2017
    Dataset provided by
    Kaggle
    Authors
    John Olafenwa
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    United States
    Description

    US Adult Census data relating income to social factors such as age, education, race, etc.

    The US Adult Income dataset was extracted by Barry Becker from the 1994 US Census database. The dataset consists of anonymous information such as occupation, age, native country, race, capital gain, capital loss, education, work class and more. Each row is labelled as having a salary of either ">50K" or "<=50K".

    The dataset is split into two CSV files, named adult-training.txt and adult-test.txt.

    The goal here is to train a binary classifier on the training dataset to predict the column income_bracket, which has two possible values, ">50K" and "<=50K", and to evaluate the accuracy of the classifier on the test dataset.

    Note that the dataset is made up of categorical and continuous features. It also contains missing values. The categorical columns are: workclass, education, marital_status, occupation, relationship, race, gender, native_country

    The continuous columns are: age, education_num, capital_gain, capital_loss, hours_per_week
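
    The split into categorical and continuous columns can be sketched as a small preprocessing step. The example below is a minimal sketch, not the dataset author's code: it assumes the CSV has a header row with the column names listed above, that missing values are marked with "?", and that the two sample rows (which are made up for illustration) resemble the real files. Categorical values become sparse one-hot keys, continuous values become floats, and income_bracket becomes a 0/1 label.

```python
import csv
import io
from typing import Dict, List, Tuple

CATEGORICAL = ["workclass", "education", "marital_status", "occupation",
               "relationship", "race", "gender", "native_country"]
CONTINUOUS = ["age", "education_num", "capital_gain", "capital_loss",
              "hours_per_week"]
LABEL = "income_bracket"
MISSING = "?"  # assumption: marker used for missing values


def encode_row(row: Dict[str, str]) -> Tuple[Dict[str, float], int]:
    """Turn one CSV row into (sparse feature dict, binary label)."""
    features: Dict[str, float] = {}
    for name in CONTINUOUS:
        value = row[name].strip()
        if value != MISSING:  # missing continuous values are simply omitted
            features[name] = float(value)
    for name in CATEGORICAL:
        value = row[name].strip()
        if value != MISSING:  # missing categories get no one-hot key
            features[f"{name}={value}"] = 1.0
    label = 1 if row[LABEL].strip().startswith(">50K") else 0
    return features, label


def load_examples(text: str) -> List[Tuple[Dict[str, float], int]]:
    """Parse CSV text (with a header row) into encoded examples."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return [encode_row(row) for row in reader]


# Made-up rows for illustration; the header layout is an assumption.
SAMPLE = """age,workclass,education,education_num,marital_status,occupation,relationship,race,gender,capital_gain,capital_loss,hours_per_week,native_country,income_bracket
41,Private,Bachelors,13,Never-married,Sales,Not-in-family,White,Female,0,0,40,United-States,<=50K
52,?,HS-grad,9,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,45,United-States,>50K
"""

examples = load_examples(SAMPLE)
for features, label in examples:
    print(label, sorted(features))
```

    The resulting feature dictionaries could then be fed to any binary classifier that accepts sparse inputs; dropping missing values, as done here, is only one of several reasonable ways to handle them.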

    This dataset was obtained from the UCI repository; it can be found at

    https://archive.ics.uci.edu/ml/datasets/census+income, http://mlr.cs.umass.edu/ml/machine-learning-databases/adult/

    USAGE: This dataset is well suited to developing and testing wide linear classifiers, deep neural network classifiers and a combination of both. For more information on combined deep and wide model classifiers, refer to the research paper by Google: https://arxiv.org/abs/1606.07792

    Refer to this kernel for sample usage: https://www.kaggle.com/johnolafenwa/wage-prediction

    A complete tutorial is available at http://johnolafenwa.blogspot.com.ng/2017/07/machine-learning-tutorial-1-wage.html?m=1

  6. Thanos or Grimace

    • kaggle.com
    Updated Dec 20, 2019
    Cite
    Arturo Moncada Torres (2019). Thanos or Grimace [Dataset]. https://www.kaggle.com/arturomoncadatorres/thanos-or-grimace/activity
    Dataset updated
    Dec 20, 2019
    Dataset provided by
    Kaggle
    Authors
    Arturo Moncada Torres
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Thanos or Grimace?

    Avengers: Infinity War is one of my favorite movies, not only in the superhero genre but of all time. Besides an insane character crossover, it features quite a few scenes with pop culture references. In one of them, Peter Quill (aka Star-Lord) refers to Thanos as "Grimace".

    For those who don't know, Grimace is one of McDonaldland's characters. These were quite popular in the 80s and 90s (which fits with Quill's time on Earth as a kid). This reference inspired cosplayers Kittie Cosplay and Banana Steve to make an even crazier crossover: Grimos. This made me wonder: can we create an algorithm capable of distinguishing between these two purple characters?

    Data Fetching

    Unsurprisingly, I couldn't find an existing dataset for this project, so I created one myself using the search function of Google Images. Originally, I wanted to use a semi-automatic approach, similar to what is suggested in this tutorial. However, I could never get it running; my guess is that it is a bit old and no longer compatible with the most recent versions of web browsers. I therefore looked for alternatives and found the Fatkun Batch Download Image extension for Chrome, a very handy tool that lets you manually select and download all the images in a single browser tab to your computer. Although at first I was a bit bummed that I had to click through a lot of images, I quickly realized this was a necessary step if I wanted a decently curated dataset.

    I selected and downloaded images that:

    • Had the character in any representation (e.g., photo, comic, drawing, cartoon, etc.)

    • Showed the character from different angles

    All images are JPEGs.

    Showcase

    You can find the repository of the project where these data are put in action here.

    Acknowledgements

    Images were scraped using Google Images and the Fatkun Batch Download Image extension for Chrome.

    Disclaimer

    All images were obtained from publicly available websites. If you own the rights to any of the images shown and wish to have them removed from this dataset, please let me know.

  7. Swedish NER corpus

    • kaggle.com
    Updated Dec 12, 2017
    Cite
    Andreas Klintberg (2017). Swedish NER corpus [Dataset]. https://www.kaggle.com/andreasklintberg/swedish-ner-corpus/metadata
    Dataset updated
    Dec 12, 2017
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Andreas Klintberg
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Context

    Bootstrapped and manually annotated NER data from Swedish web news from 2012. NER stands for named entity recognition, and it is used to identify entities in a text, such as organisations, locations and people.

    It is a very common operation in general NLP pipelines, and several algorithms can be used to train a model. Traditionally, many NER systems were trained using some kind of CRF (conditional random field) approach, but nowadays many people successfully use LSTMs or other sequence-based deep learning techniques.

    A tutorial on how to use this dataset to train an NER for Stanford CoreNLP is available here https://medium.com/@klintcho/training-a-swedish-ner-model-for-stanford-corenlp-part-2-20a0cfd801dd

    Content

    The dataset is very simple and can easily be adapted to other formats; it is specifically adapted to CoreNLP NER. The first column is a word. The second column (tab-separated) is either the NER category (ORG, PER, LOC, MISC) or a 0 if the word does not belong to any category (not an entity). Each word is on its own line, and sentences are separated by an empty line.

    Sample structure (of two sentences, one three-word sentence and one four-word sentence):

    Apple ORG
    is 0
    nice 0
    . 0

    Per PER
    is 0
    not 0
    sad 0
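
    The column format described above (one token per line, a tab before the tag, an empty line between sentences) can be read with a few lines of Python. This is a minimal sketch under those assumptions; the function names are hypothetical and the sample string simply re-encodes the two example sentences with explicit tab characters.

```python
from typing import List, Tuple

Sentence = List[Tuple[str, str]]


def parse_ner_corpus(text: str) -> List[Sentence]:
    """Split tab-separated token/tag lines into sentences at empty lines."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for line in text.splitlines():
        if not line.strip():
            # An empty line closes the current sentence, if there is one.
            if current:
                sentences.append(current)
                current = []
            continue
        token, tag = line.rstrip().rsplit("\t", 1)
        current.append((token, tag))
    if current:  # the file may not end with an empty line
        sentences.append(current)
    return sentences


def extract_entities(sentence: Sentence) -> Sentence:
    """Keep only tokens whose tag is a category, i.e. anything other than "0"."""
    return [(token, tag) for token, tag in sentence if tag != "0"]


SAMPLE = "Apple\tORG\nis\t0\nnice\t0\n.\t0\n\nPer\tPER\nis\t0\nnot\t0\nsad\t0\n"

sentences = parse_ner_corpus(SAMPLE)
for sentence in sentences:
    print(extract_entities(sentence))
```

    Keeping each sentence as a list of (token, tag) pairs makes it straightforward to convert the data to other formats, as the description suggests.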

    Acknowledgements

    Text is annotated from http://spraakbanken.gu.se/eng/resource/webbnyheter2012. Thanks Norah Klintberg Sakal for helping out with the annotation and reviewing all annotations as well.

    Inspiration

    Feel free to use this for whatever you like. Like most datasets, it would definitely benefit from becoming larger; feel free to create a pull request at https://github.com/klintan/swedish-ner-corpus/ or update it here on Kaggle.

Cats and Dogs Sentdex Tutorial

Deep Learning Basics
