100+ datasets found
  1. All Seaborn Built-in Datasets 📊✨

    • kaggle.com
    zip
    Updated Aug 27, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Abdelrahman Mohamed (2024). All Seaborn Built-in Datasets 📊✨ [Dataset]. https://www.kaggle.com/datasets/abdoomoh/all-seaborn-built-in-datasets
    Explore at:
    zip(1383218 bytes)Available download formats
    Dataset updated
    Aug 27, 2024
    Authors
    Abdelrahman Mohamed
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Description: - This dataset includes all 22 built-in datasets from the Seaborn library, a widely used Python data visualization tool. Seaborn's built-in datasets are essential resources for anyone interested in practicing data analysis, visualization, and machine learning. They span a wide range of topics, from classic datasets like the Iris flower classification to real-world data such as Titanic survival records and diamond characteristics.

    • Included Datasets:
      • Anagrams: Analysis of word anagram patterns.
      • Anscombe: Anscombe's quartet demonstrating the importance of data visualization.
      • Attention: Data on attention span variations in different scenarios.
      • Brain Networks: Connectivity data within brain networks.
      • Car Crashes: US car crash statistics.
      • Diamonds: Data on diamond properties including price, cut, and clarity.
      • Dots: Randomly generated data for scatter plot visualization.
      • Dow Jones: Historical records of the Dow Jones Industrial Average.
      • Exercise: The relationship between exercise and health metrics.
      • Flights: Monthly passenger numbers on flights.
      • FMRI: Functional MRI data capturing brain activity.
      • Geyser: Eruption times of the Old Faithful geyser.
      • Glue: Strength of glue under different conditions.
      • Health Expenditure: Health expenditure statistics across countries.
      • Iris: Famous dataset for classifying Iris species.
      • MPG: Miles per gallon for various vehicles.
      • Penguins: Data on penguin species and their features.
      • Planets: Characteristics of discovered exoplanets.
      • Sea Ice: Measurements of sea ice extent.
      • Taxis: Taxi trips data in a city.
      • Tips: Tipping data collected from a restaurant.
      • Titanic: Survival data from the Titanic disaster.

    This complete collection serves as an excellent starting point for anyone looking to improve their data science skills, offering a wide array of datasets suitable for both beginners and advanced users.

  2. h

    python-code-dataset-500k

    • huggingface.co
    Updated Jan 22, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    James (2024). python-code-dataset-500k [Dataset]. https://huggingface.co/datasets/jtatman/python-code-dataset-500k
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 22, 2024
    Authors
    James
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Attention: This dataset is a summary and reformat pulled from github code.

    You should make your own assumptions based on this. In fact, there is another dataset I formed through parsing that addresses several points:

    out of 500k python related items, most of them are python-ish, not pythonic the majority of the items here contain excessive licensing inclusion of original code the items here are sometimes not even python but have references There's a whole lot of gpl summaries… See the full description on the dataset page: https://huggingface.co/datasets/jtatman/python-code-dataset-500k.

  3. Dataset_Python_Question_Answer

    • kaggle.com
    zip
    Updated Mar 29, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Chinmaya (2024). Dataset_Python_Question_Answer [Dataset]. https://www.kaggle.com/datasets/chinmayadatt/dataset-python-question-answer
    Explore at:
    zip(189137 bytes)Available download formats
    Dataset updated
    Mar 29, 2024
    Authors
    Chinmaya
    License

    Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0)https://creativecommons.org/licenses/by-nc-sa/3.0/
    License information was derived automatically

    Description

    This dataset is about Python programming. Question and answers are generated using Gemma. There are more than four hundred questions and their corresponding answers about Python programming.

    Questions are ranging from concepts like data-types, variables and keywords to regular-expression and threading.

    I have used this dataset here

    The code used for dataset generated is available here

  4. datasets

    • figshare.com
    txt
    Updated Sep 27, 2017
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Carlos Rodriguez-Contreras (2017). datasets [Dataset]. http://doi.org/10.6084/m9.figshare.5447167.v1
    Explore at:
    txtAvailable download formats
    Dataset updated
    Sep 27, 2017
    Dataset provided by
    figshare
    Figsharehttp://figshare.com/
    Authors
    Carlos Rodriguez-Contreras
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This folder contains datasets to be downloaded from students for their practices with R and Python

  5. h

    code-search-net-python

    • huggingface.co
    Updated Dec 27, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Fernando Tarin Morales (2023). code-search-net-python [Dataset]. https://huggingface.co/datasets/Nan-Do/code-search-net-python
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 27, 2023
    Authors
    Fernando Tarin Morales
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Card for "code-search-net-python"

      Dataset Description
    

    Homepage: None Repository: https://huggingface.co/datasets/Nan-Do/code-search-net-python Paper: None Leaderboard: None Point of Contact: @Nan-Do

      Dataset Summary
    

    This dataset is the Python portion of the CodeSarchNet annotated with a summary column.The code-search-net dataset includes open source functions that include comments found at GitHub.The summary is a short description of what the… See the full description on the dataset page: https://huggingface.co/datasets/Nan-Do/code-search-net-python.

  6. Datasets for manuscript "A data engineering framework for chemical flow...

    • catalog.data.gov
    • gimi9.com
    Updated Nov 7, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    U.S. EPA Office of Research and Development (ORD) (2021). Datasets for manuscript "A data engineering framework for chemical flow analysis of industrial pollution abatement operations" [Dataset]. https://catalog.data.gov/dataset/datasets-for-manuscript-a-data-engineering-framework-for-chemical-flow-analysis-of-industr
    Explore at:
    Dataset updated
    Nov 7, 2021
    Dataset provided by
    United States Environmental Protection Agencyhttp://www.epa.gov/
    Description

    The EPA GitHub repository PAU4ChemAs as described in the README.md file, contains Python scripts written to build the PAU dataset modules (technologies, capital and operating costs, and chemical prices) for tracking chemical flows transfers, releases estimation, and identification of potential occupation exposure scenarios in pollution abatement units (PAUs). These PAUs are employed for on-site chemical end-of-life management. The folder datasets contains the outputs for each framework step. The Chemicals_in_categories.csv contains the chemicals for the TRI chemical categories. The EPA GitHub repository PAU_case_study as described in its readme.md entry, contains the Python scripts to run the manuscript case study for designing the PAUs, the data-driven models, and the decision-making module for chemicals of concern and tracking flow transfers at the end-of-life stage. The data was obtained by means of data engineering using different publicly-available databases. The properties of chemicals were obtained using the GitHub repository Properties_Scraper, while the PAU dataset using the repository PAU4Chem. Finally, the EPA GitHub repository Properties_Scraper contains a Python script to massively gather information about exposure limits and physical properties from different publicly-available sources: EPA, NOAA, OSHA, and the institute for Occupational Safety and Health of the German Social Accident Insurance (IFA). Also, all GitHub repositories describe the Python libraries required for running their code, how to use them, the obtained outputs files after running the Python script modules, and the corresponding EPA Disclaimer. This dataset is associated with the following publication: Hernandez-Betancur, J.D., M. Martin, and G.J. Ruiz-Mercado. A data engineering framework for on-site end-of-life industrial operations. JOURNAL OF CLEANER PRODUCTION. Elsevier Science Ltd, New York, NY, USA, 327: 129514, (2021).

  7. Z

    #PraCegoVer dataset

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    Updated Jan 19, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Gabriel Oliveira dos Santos; Esther Luna Colombini; Sandra Avila (2023). #PraCegoVer dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5710561
    Explore at:
    Dataset updated
    Jan 19, 2023
    Dataset provided by
    Institute of Computing, University of Campinas
    Authors
    Gabriel Oliveira dos Santos; Esther Luna Colombini; Sandra Avila
    Description

    Automatically describing images using natural sentences is an essential task to visually impaired people's inclusion on the Internet. Although there are many datasets in the literature, most of them contain only English captions, whereas datasets with captions described in other languages are scarce.

    PraCegoVer arose on the Internet, stimulating users from social media to publish images, tag #PraCegoVer and add a short description of their content. Inspired by this movement, we have proposed the #PraCegoVer, a multi-modal dataset with Portuguese captions based on posts from Instagram. It is the first large dataset for image captioning in Portuguese with freely annotated images.

    PraCegoVer has 533,523 pairs with images and captions described in Portuguese collected from more than 14 thousand different profiles. Also, the average caption length in #PraCegoVer is 39.3 words and the standard deviation is 29.7.

    Dataset Structure

    PraCegoVer dataset is composed of the main file dataset.json and a collection of compressed files named images.tar.gz.partX

    containing the images. The file dataset.json comprehends a list of json objects with the attributes:

    user: anonymized user that made the post;

    filename: image file name;

    raw_caption: raw caption;

    caption: clean caption;

    date: post date.

    Each instance in dataset.json is associated with exactly one image in the images directory whose filename is pointed by the attribute filename. Also, we provide a sample with five instances, so the users can download the sample to get an overview of the dataset before downloading it completely.

    Download Instructions

    If you just want to have an overview of the dataset structure, you can download sample.tar.gz. But, if you want to use the dataset, or any of its subsets (63k and 173k), you must download all the files and run the following commands to uncompress and join the files:

    cat images.tar.gz.part* > images.tar.gz tar -xzvf images.tar.gz

    Alternatively, you can download the entire dataset from the terminal using the python script download_dataset.py available in PraCegoVer repository. In this case, first, you have to download the script and create an access token here. Then, you can run the following command to download and uncompress the image files:

    python download_dataset.py --access_token=

  8. Pandas Practice Dataset

    • kaggle.com
    zip
    Updated Jan 27, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mrityunjay Pathak (2023). Pandas Practice Dataset [Dataset]. https://www.kaggle.com/datasets/themrityunjaypathak/pandas-practice-dataset/discussion
    Explore at:
    zip(493 bytes)Available download formats
    Dataset updated
    Jan 27, 2023
    Authors
    Mrityunjay Pathak
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    What is Pandas?

    Pandas is a Python library used for working with data sets.

    It has functions for analyzing, cleaning, exploring, and manipulating data.

    The name "Pandas" has a reference to both "Panel Data", and "Python Data Analysis" and was created by Wes McKinney in 2008.

    Why Use Pandas?

    Pandas allows us to analyze big data and make conclusions based on statistical theories.

    Pandas can clean messy data sets, and make them readable and relevant.

    Relevant data is very important in data science.

    What Can Pandas Do?

    Pandas gives you answers about the data. Like:

    Is there a correlation between two or more columns?

    What is average value?

    Max value?

    Min value?

  9. h

    codeparrot

    • huggingface.co
    Updated Sep 1, 2021
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Natural Language Processing with Transformers (2021). codeparrot [Dataset]. https://huggingface.co/datasets/transformersbook/codeparrot
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 1, 2021
    Dataset authored and provided by
    Natural Language Processing with Transformers
    Description

    CodeParrot 🦜 Dataset

      What is it?
    

    This is the full CodeParrot dataset. It contains Python files used to train the code generation model in Chapter 10: Training Transformers from Scratch in the NLP with Transformers book. You can find the full code in the accompanying Github repository.

      Creation
    

    It was created with the GitHub dataset available via Google's BigQuery. It contains approximately 22 million Python files and is 180 GB (50 GB compressed) big. The… See the full description on the dataset page: https://huggingface.co/datasets/transformersbook/codeparrot.

  10. f

    datasets

    • figshare.com
    txt
    Updated Oct 5, 2017
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Carlos Rodriguez-Contreras (2017). datasets [Dataset]. http://doi.org/10.6084/m9.figshare.5472970.v1
    Explore at:
    txtAvailable download formats
    Dataset updated
    Oct 5, 2017
    Dataset provided by
    figshare
    Authors
    Carlos Rodriguez-Contreras
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Datasets for practising in class

  11. Ecommerce Dataset for Data Analysis

    • kaggle.com
    zip
    Updated Sep 19, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Shrishti Manja (2024). Ecommerce Dataset for Data Analysis [Dataset]. https://www.kaggle.com/datasets/shrishtimanja/ecommerce-dataset-for-data-analysis/code
    Explore at:
    zip(2028853 bytes)Available download formats
    Dataset updated
    Sep 19, 2024
    Authors
    Shrishti Manja
    Description

    This dataset contains 55,000 entries of synthetic customer transactions, generated using Python's Faker library. The goal behind creating this dataset was to provide a resource for learners like myself to explore, analyze, and apply various data analysis techniques in a context that closely mimics real-world data.

    About the Dataset: - CID (Customer ID): A unique identifier for each customer. - TID (Transaction ID): A unique identifier for each transaction. - Gender: The gender of the customer, categorized as Male or Female. - Age Group: Age group of the customer, divided into several ranges. - Purchase Date: The timestamp of when the transaction took place. - Product Category: The category of the product purchased, such as Electronics, Apparel, etc. - Discount Availed: Indicates whether the customer availed any discount (Yes/No). - Discount Name: Name of the discount applied (e.g., FESTIVE50). - Discount Amount (INR): The amount of discount availed by the customer. - Gross Amount: The total amount before applying any discount. - Net Amount: The final amount after applying the discount. - Purchase Method: The payment method used (e.g., Credit Card, Debit Card, etc.). - Location: The city where the purchase took place.

    Use Cases: 1. Exploratory Data Analysis (EDA): This dataset is ideal for conducting EDA, allowing users to practice techniques such as summary statistics, visualizations, and identifying patterns within the data. 2. Data Preprocessing and Cleaning: Learners can work on handling missing data, encoding categorical variables, and normalizing numerical values to prepare the dataset for analysis. 3. Data Visualization: Use tools like Python’s Matplotlib, Seaborn, or Power BI to visualize purchasing trends, customer demographics, or the impact of discounts on purchase amounts. 4. Machine Learning Applications: After applying feature engineering, this dataset is suitable for supervised learning models, such as predicting whether a customer will avail a discount or forecasting purchase amounts based on the input features.

    This dataset provides an excellent sandbox for honing skills in data analysis, machine learning, and visualization in a structured but flexible manner.

    This is not a real dataset. This dataset was generated using Python's Faker library for the sole purpose of learning

  12. Sample data files for Python Course

    • figshare.com
    txt
    Updated Nov 4, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Peter Verhaar (2022). Sample data files for Python Course [Dataset]. http://doi.org/10.6084/m9.figshare.21501549.v1
    Explore at:
    txtAvailable download formats
    Dataset updated
    Nov 4, 2022
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    Peter Verhaar
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Sample data set used in an introductory course on Programming in Python

  13. Data set python

    • kaggle.com
    zip
    Updated Jul 13, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Kolluri Nithin (2023). Data set python [Dataset]. https://www.kaggle.com/datasets/kollurinithin/data-set-python/code
    Explore at:
    zip(309360 bytes)Available download formats
    Dataset updated
    Jul 13, 2023
    Authors
    Kolluri Nithin
    Description

    Dataset

    This dataset was created by Kolluri Nithin

    Contents

  14. Images in CSV datasets

    • kaggle.com
    zip
    Updated Oct 14, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Pascal (2024). Images in CSV datasets [Dataset]. https://www.kaggle.com/datasets/pyim59/images-in-csv-datasets
    Explore at:
    zip(347504240 bytes)Available download formats
    Dataset updated
    Oct 14, 2024
    Authors
    Pascal
    Description

    Images sous forme de fichiers CSV pour une application de méthodes de machine learning "classiques" Ces datasets sont utilisés pour le cours de Centrale Lille sur le Machine Learning de Pascal Yim

    "mnist_big.csv"

    Reconnaissance d'images de chiffres manuscrits

    Version "mnist_small.csv" avec moins de données pouvant servir aussi d'ensemble de test

    Source : https://www.kaggle.com/datasets/oddrationale/mnist-in-csv

    "sign_mnist_big.csv"

    Reconnaissance d'images de gestes de la langue des signes

    Version "sign_mnist_small.csv" avec moins de données pouvant servir aussi d'ensemble de test

    Source : https://www.kaggle.com/datasets/datamunge/sign-language-mnist

    "zalando_small.csv"

    Reconnaissance de vêtements et chaussures (Zalando)

    Source : https://www.kaggle.com/datasets/kmader/skin-cancer-mnist-ham10000

    "hmnist_8_8_RGB.csv"

    Reconnaissance de tumeurs de la peau (images en couleurs, trois valeurs R,G,B par pixel)

    Autres versions avec des images plus petites et/ou en niveaux de gris

    Source : https://www.kaggle.com/datasets/kmader/skin-cancer-mnist-ham10000

    "cifar10_small.csv"

    Reconnaissance de petites images en couleurs dans 10 catégories Version en CSV du dataset CIFAR10

    Source : https://www.kaggle.com/datasets/fedesoriano/cifar10-python-in-csv?select=train.csv

  15. Vector datasets for workshop "Introduction to Geospatial Raster and Vector...

    • figshare.com
    Updated Oct 5, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ryan Avery (2022). Vector datasets for workshop "Introduction to Geospatial Raster and Vector Data with Python" [Dataset]. http://doi.org/10.6084/m9.figshare.21273837.v1
    Explore at:
    application/x-sqlite3Available download formats
    Dataset updated
    Oct 5, 2022
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    Ryan Avery
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Cadaster data from PDOK used to illustrate the use of geopandas and shapely, geospatial python packages for manipulating vector data. The brpgewaspercelen_definitief_2020.gpkg file has been subsetted in order to make the download manageable for workshops. Other datasets are copies of those available from PDOK.

  16. h

    python-qa-instructions-dataset

    • huggingface.co
    Updated Sep 13, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ketan (2023). python-qa-instructions-dataset [Dataset]. https://huggingface.co/datasets/iamketan25/python-qa-instructions-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 13, 2023
    Authors
    Ketan
    Description

    iamketan25/python-qa-instructions-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  17. COVID-19 Data Visualization Using Python

    • kaggle.com
    zip
    Updated Apr 21, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Adithya Wijesinghe (2023). COVID-19 Data Visualization Using Python [Dataset]. https://www.kaggle.com/datasets/adithyawijesinghe/covid-19-data
    Explore at:
    zip(1291081 bytes)Available download formats
    Dataset updated
    Apr 21, 2023
    Authors
    Adithya Wijesinghe
    License

    https://www.usa.gov/government-works/https://www.usa.gov/government-works/

    Description

    Data visualization using Python (Pandas, Plotly).

    Data was used to visualization of the infection rate and the death rate from 01/20 to 04/22.

    The data was made available on Github: https://raw.githubusercontent.com/datasets/covid-19/master/data/countries-aggregated.csv

  18. h

    xlcost-text-to-code

    • huggingface.co
    Updated Nov 3, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    CodeParrot (2022). xlcost-text-to-code [Dataset]. https://huggingface.co/datasets/codeparrot/xlcost-text-to-code
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 3, 2022
    Dataset authored and provided by
    CodeParrot
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    XLCoST is a machine learning benchmark dataset that contains fine-grained parallel data in 7 commonly used programming languages (C++, Java, Python, C#, Javascript, PHP, C), and natural language (English).

  19. DataCamp Experimental Design in Python DataSets

    • kaggle.com
    zip
    Updated Aug 5, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    JABERI Mohamed Habib (2024). DataCamp Experimental Design in Python DataSets [Dataset]. https://www.kaggle.com/datasets/jaberimohamedhabib/experimental-design-in-python-datasets
    Explore at:
    zip(112825 bytes)Available download formats
    Dataset updated
    Aug 5, 2024
    Authors
    JABERI Mohamed Habib
    Description

    Dataset

    This dataset was created by JABERI Mohamed Habib

    Contents

  20. VegeNet - Image datasets and Codes

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Oct 27, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jo Yen Tan; Jo Yen Tan (2022). VegeNet - Image datasets and Codes [Dataset]. http://doi.org/10.5281/zenodo.7254508
    Explore at:
    zipAvailable download formats
    Dataset updated
    Oct 27, 2022
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Jo Yen Tan; Jo Yen Tan
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Compilation of python codes for data preprocessing and VegeNet building, as well as image datasets (zip files).

    Image datasets:

    1. vege_original : Images of vegetables captured manually in data acquisition stage
    2. vege_cropped_renamed : Images in (1) cropped to remove background areas and image labels renamed
    3. non-vege images : Images of non-vegetable foods for CNN network to recognize other-than-vegetable foods
    4. food_image_dataset : Complete set of vege (2) and non-vege (3) images for architecture building.
    5. food_image_dataset_split : Image dataset (4) split into train and test sets
    6. process : Images created when cropping (pre-processing step) to create dataset (2).
Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Abdelrahman Mohamed (2024). All Seaborn Built-in Datasets 📊✨ [Dataset]. https://www.kaggle.com/datasets/abdoomoh/all-seaborn-built-in-datasets
Organization logo

All Seaborn Built-in Datasets 📊✨

A Complete Set of Seaborn Datasets for Analysis and Visualization

Explore at:
zip(1383218 bytes)Available download formats
Dataset updated
Aug 27, 2024
Authors
Abdelrahman Mohamed
License

Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically

Description

Description: - This dataset includes all 22 built-in datasets from the Seaborn library, a widely used Python data visualization tool. Seaborn's built-in datasets are essential resources for anyone interested in practicing data analysis, visualization, and machine learning. They span a wide range of topics, from classic datasets like the Iris flower classification to real-world data such as Titanic survival records and diamond characteristics.

  • Included Datasets:
    • Anagrams: Analysis of word anagram patterns.
    • Anscombe: Anscombe's quartet demonstrating the importance of data visualization.
    • Attention: Data on attention span variations in different scenarios.
    • Brain Networks: Connectivity data within brain networks.
    • Car Crashes: US car crash statistics.
    • Diamonds: Data on diamond properties including price, cut, and clarity.
    • Dots: Randomly generated data for scatter plot visualization.
    • Dow Jones: Historical records of the Dow Jones Industrial Average.
    • Exercise: The relationship between exercise and health metrics.
    • Flights: Monthly passenger numbers on flights.
    • FMRI: Functional MRI data capturing brain activity.
    • Geyser: Eruption times of the Old Faithful geyser.
    • Glue: Strength of glue under different conditions.
    • Health Expenditure: Health expenditure statistics across countries.
    • Iris: Famous dataset for classifying Iris species.
    • MPG: Miles per gallon for various vehicles.
    • Penguins: Data on penguin species and their features.
    • Planets: Characteristics of discovered exoplanets.
    • Sea Ice: Measurements of sea ice extent.
    • Taxis: Taxi trips data in a city.
    • Tips: Tipping data collected from a restaurant.
    • Titanic: Survival data from the Titanic disaster.

This complete collection serves as an excellent starting point for anyone looking to improve their data science skills, offering a wide array of datasets suitable for both beginners and advanced users.

Search
Clear search
Close search
Google apps
Main menu