100+ datasets found

All Seaborn Built-in Datasets 📊✨
kaggle.com
zip
Updated Aug 27, 2024
Cite
Abdelrahman Mohamed (2024). All Seaborn Built-in Datasets 📊✨ [Dataset]. https://www.kaggle.com/datasets/abdoomoh/all-seaborn-built-in-datasets
Explore at:
Available download formats: zip (1383218 bytes)
Dataset updated
Aug 27, 2024
Authors
Abdelrahman Mohamed
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
This dataset includes all 22 built-in datasets from the Seaborn library, a widely used Python data visualization tool. Seaborn's built-in datasets are essential resources for anyone interested in practicing data analysis, visualization, and machine learning. They span a wide range of topics, from classic datasets like the Iris flower classification to real-world data such as Titanic survival records and diamond characteristics.

Included Datasets:

Anagrams: Analysis of word anagram patterns.

Anscombe: Anscombe's quartet demonstrating the importance of data visualization.

Attention: Data on attention span variations in different scenarios.

Brain Networks: Connectivity data within brain networks.

Car Crashes: US car crash statistics.

Diamonds: Data on diamond properties including price, cut, and clarity.

Dots: Randomly generated data for scatter plot visualization.

Dow Jones: Historical records of the Dow Jones Industrial Average.

Exercise: The relationship between exercise and health metrics.

Flights: Monthly passenger numbers on flights.

FMRI: Functional MRI data capturing brain activity.

Geyser: Eruption times of the Old Faithful geyser.

Glue: Strength of glue under different conditions.

Health Expenditure: Health expenditure statistics across countries.

Iris: Famous dataset for classifying Iris species.

MPG: Miles per gallon for various vehicles.

Penguins: Data on penguin species and their features.

Planets: Characteristics of discovered exoplanets.

Sea Ice: Measurements of sea ice extent.

Taxis: Taxi trips data in a city.

Tips: Tipping data collected from a restaurant.

Titanic: Survival data from the Titanic disaster.

This complete collection serves as an excellent starting point for anyone looking to improve their data science skills, offering a wide array of datasets suitable for both beginners and advanced users.
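Since the collection is distributed as a zip archive, a minimal sketch of reading every CSV file inside such an archive may help. The listing does not state the file names or formats inside the archive, so the member name `tips.csv`, the column names, and the rows below are assumptions for illustration only, and the helper `read_csvs_from_zip` is hypothetical.

```python
import csv
import io
import zipfile


def read_csvs_from_zip(zip_bytes):
    """Return {member name: list of row dicts} for every CSV file in a zip archive."""
    tables = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip anything that is not a CSV file
            with archive.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                tables[name] = list(csv.DictReader(text))
    return tables


# Build a tiny stand-in archive in memory (hypothetical file name, made-up rows).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("tips.csv", "total_bill,tip\n10.00,1.50\n20.00,3.00\n")

tables = read_csvs_from_zip(buffer.getvalue())
rows = tables["tips.csv"]
print(len(rows), rows[0]["tip"])
```

With a real download, the bytes would come from reading the downloaded file instead of the in-memory buffer.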
python-code-dataset-500k
huggingface.co
Updated Jan 22, 2024
Cite
James (2024). python-code-dataset-500k [Dataset]. https://huggingface.co/datasets/jtatman/python-code-dataset-500k
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Jan 22, 2024
Authors
James
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Attention: This dataset is a summary and reformatting of code pulled from GitHub.

You should make your own assumptions based on this. In fact, there is another dataset I formed through parsing that addresses several points:

- out of 500k python related items, most of them are python-ish, not pythonic
- the majority of the items here contain excessive licensing inclusion of original code
- the items here are sometimes not even python but have references
- There's a whole lot of gpl summaries…

See the full description on the dataset page: https://huggingface.co/datasets/jtatman/python-code-dataset-500k.
Dataset_Python_Question_Answer
kaggle.com
zip
Updated Mar 29, 2024
Cite
Chinmaya (2024). Dataset_Python_Question_Answer [Dataset]. https://www.kaggle.com/datasets/chinmayadatt/dataset-python-question-answer
Explore at:
Available download formats: zip (189137 bytes)
Dataset updated
Mar 29, 2024
Authors
Chinmaya
License
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0) (https://creativecommons.org/licenses/by-nc-sa/3.0/)
License information was derived automatically
Description
This dataset is about Python programming. The questions and answers were generated using Gemma. It contains more than four hundred questions about Python programming, each with a corresponding answer.

The questions range from concepts such as data types, variables, and keywords to regular expressions and threading.

I have used this dataset here

The code used to generate the dataset is available here
datasets
figshare.com
txt
Updated Sep 27, 2017
Cite
Carlos Rodriguez-Contreras (2017). datasets [Dataset]. http://doi.org/10.6084/m9.figshare.5447167.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.5447167.v1
Dataset updated
Sep 27, 2017
Dataset provided by
Figshare (http://figshare.com/)
Authors
Carlos Rodriguez-Contreras
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This folder contains datasets for students to download for their practice with R and Python.
code-search-net-python
huggingface.co
Updated Dec 27, 2023
Cite
Fernando Tarin Morales (2023). code-search-net-python [Dataset]. https://huggingface.co/datasets/Nan-Do/code-search-net-python
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Dec 27, 2023
Authors
Fernando Tarin Morales
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Dataset Card for "code-search-net-python"

Dataset Description

Homepage: None
Repository: https://huggingface.co/datasets/Nan-Do/code-search-net-python
Paper: None
Leaderboard: None
Point of Contact: @Nan-Do

Dataset Summary

This dataset is the Python portion of CodeSearchNet, annotated with a summary column. The code-search-net dataset includes open source functions that include comments found at GitHub. The summary is a short description of what the… See the full description on the dataset page: https://huggingface.co/datasets/Nan-Do/code-search-net-python.
Datasets for manuscript "A data engineering framework for chemical flow...
catalog.data.gov
gimi9.com
Updated Nov 7, 2021
Cite
U.S. EPA Office of Research and Development (ORD) (2021). Datasets for manuscript "A data engineering framework for chemical flow analysis of industrial pollution abatement operations" [Dataset]. https://catalog.data.gov/dataset/datasets-for-manuscript-a-data-engineering-framework-for-chemical-flow-analysis-of-industr
Explore at:
Dataset updated
Nov 7, 2021
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Description
The EPA GitHub repository PAU4Chem, as described in its README.md file, contains Python scripts written to build the PAU dataset modules (technologies, capital and operating costs, and chemical prices) for tracking chemical flow transfers, estimating releases, and identifying potential occupational exposure scenarios in pollution abatement units (PAUs). These PAUs are employed for on-site chemical end-of-life management. The folder datasets contains the outputs for each framework step. The file Chemicals_in_categories.csv contains the chemicals for the TRI chemical categories.

The EPA GitHub repository PAU_case_study, as described in its readme.md entry, contains the Python scripts to run the manuscript case study for designing the PAUs, the data-driven models, and the decision-making module for chemicals of concern and for tracking flow transfers at the end-of-life stage. The data were obtained by means of data engineering using different publicly available databases. The properties of chemicals were obtained using the GitHub repository Properties_Scraper, while the PAU dataset was obtained using the repository PAU4Chem.

Finally, the EPA GitHub repository Properties_Scraper contains a Python script to gather, in bulk, information about exposure limits and physical properties from different publicly available sources: EPA, NOAA, OSHA, and the Institute for Occupational Safety and Health of the German Social Accident Insurance (IFA). All GitHub repositories also describe the Python libraries required for running their code, how to use them, the output files obtained after running the Python script modules, and the corresponding EPA Disclaimer.

This dataset is associated with the following publication: Hernandez-Betancur, J.D., M. Martin, and G.J. Ruiz-Mercado. A data engineering framework for on-site end-of-life industrial operations. JOURNAL OF CLEANER PRODUCTION. Elsevier Science Ltd, New York, NY, USA, 327: 129514, (2021).
#PraCegoVer dataset
data.niaid.nih.gov
data-staging.niaid.nih.gov
Updated Jan 19, 2023
Cite
Gabriel Oliveira dos Santos; Esther Luna Colombini; Sandra Avila (2023). #PraCegoVer dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5710561
Explore at:
Dataset updated
Jan 19, 2023
Dataset provided by
Institute of Computing, University of Campinas
Authors
Gabriel Oliveira dos Santos; Esther Luna Colombini; Sandra Avila
Description
Automatically describing images in natural-language sentences is essential for the inclusion of visually impaired people on the Internet. Although there are many datasets in the literature, most of them contain only English captions, whereas datasets with captions in other languages are scarce.

The PraCegoVer movement arose on the Internet, encouraging social media users to publish images, tag them #PraCegoVer, and add a short description of their content. Inspired by this movement, we propose #PraCegoVer, a multi-modal dataset with Portuguese captions based on posts from Instagram. It is the first large dataset for image captioning in Portuguese with freely annotated images.

PraCegoVer has 533,523 pairs with images and captions described in Portuguese collected from more than 14 thousand different profiles. Also, the average caption length in #PraCegoVer is 39.3 words and the standard deviation is 29.7.

Dataset Structure

The PraCegoVer dataset is composed of the main file dataset.json and a collection of compressed files named images.tar.gz.partX containing the images. The file dataset.json comprises a list of JSON objects with the attributes:

user: anonymized user that made the post;

filename: image file name;

raw_caption: raw caption;

caption: clean caption;

date: post date.

Each instance in dataset.json is associated with exactly one image in the images directory, whose file name is given by the attribute filename. We also provide a sample with five instances, so users can download the sample to get an overview of the dataset before downloading it completely.
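As an illustration of the structure just described, the following sketch pairs each record's image path with its clean caption. The two records, their values, and the date format are made up for the example; only the attribute names come from the description. The helper `caption_pairs` and the directory name `images` are assumptions.

```python
import json
from pathlib import PurePosixPath

# Two made-up records that use the attribute names listed in the description.
SAMPLE_JSON = """
[
  {"user": "user_1", "filename": "img_0001.jpg",
   "raw_caption": "#PraCegoVer Foto de um gato.",
   "caption": "Foto de um gato.", "date": "2020-01-01"},
  {"user": "user_2", "filename": "img_0002.jpg",
   "raw_caption": "#PraCegoVer Foto de uma praia.",
   "caption": "Foto de uma praia.", "date": "2020-01-02"}
]
"""


def caption_pairs(json_text, images_dir="images"):
    """Return (image path, clean caption) for each record in the JSON list."""
    records = json.loads(json_text)
    return [
        (PurePosixPath(images_dir) / record["filename"], record["caption"])
        for record in records
    ]


pairs = caption_pairs(SAMPLE_JSON)
for path, caption in pairs:
    print(path, "->", caption)
```

For the real dataset, the JSON text would be read from dataset.json rather than from an inline string.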

Download Instructions

If you just want to have an overview of the dataset structure, you can download sample.tar.gz. But, if you want to use the dataset, or any of its subsets (63k and 173k), you must download all the files and run the following commands to uncompress and join the files:

cat images.tar.gz.part* > images.tar.gz
tar -xzvf images.tar.gz
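For readers without a Unix shell, the same join-and-extract steps can be sketched in Python with the standard library. This is not the authors' script: the helpers `join_parts` and `extract` are hypothetical, and the part suffixes `aa` and `ab` and the tiny archive in the demo are made up so the example can run on its own.

```python
import io
import shutil
import tarfile
import tempfile
from pathlib import Path


def join_parts(directory, prefix="images.tar.gz.part", target="images.tar.gz"):
    """Concatenate all part files in name order, like `cat prefix* > target`."""
    directory = Path(directory)
    joined_path = directory / target
    with open(joined_path, "wb") as joined:
        for part in sorted(directory.glob(prefix + "*")):
            with open(part, "rb") as chunk:
                shutil.copyfileobj(chunk, joined)
    return joined_path


def extract(archive_path, destination):
    """Unpack a gzip-compressed tar archive, like `tar -xzf`."""
    # Use the safer "data" extraction filter where this Python version offers it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(destination, **kwargs)


# Self-contained demo: a tiny made-up archive split into two parts.
payload = b"hello"
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
    info = tarfile.TarInfo("images/a.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
data = buffer.getvalue()

with tempfile.TemporaryDirectory() as workdir:
    half = len(data) // 2
    Path(workdir, "images.tar.gz.partaa").write_bytes(data[:half])
    Path(workdir, "images.tar.gz.partab").write_bytes(data[half:])
    archive = join_parts(workdir)
    extract(archive, workdir)
    restored = Path(workdir, "images", "a.txt").read_bytes()

print(restored)
```

Sorting the part names mirrors the order in which the shell expands the `part*` wildcard, which matters because the parts must be concatenated in sequence.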

Alternatively, you can download the entire dataset from the terminal using the python script download_dataset.py available in PraCegoVer repository. In this case, first, you have to download the script and create an access token here. Then, you can run the following command to download and uncompress the image files:

python download_dataset.py --access_token=
Pandas Practice Dataset
kaggle.com
zip
Updated Jan 27, 2023
Cite
Mrityunjay Pathak (2023). Pandas Practice Dataset [Dataset]. https://www.kaggle.com/datasets/themrityunjaypathak/pandas-practice-dataset/discussion
Explore at:
Available download formats: zip (493 bytes)
Dataset updated
Jan 27, 2023
Authors
Mrityunjay Pathak
License
CC0 1.0 Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
Description
What is Pandas?

Pandas is a Python library used for working with data sets.

It has functions for analyzing, cleaning, exploring, and manipulating data.

The name "Pandas" refers to both "Panel Data" and "Python Data Analysis". The library was created by Wes McKinney in 2008.

Why Use Pandas?

Pandas allows us to analyze big data and make conclusions based on statistical theories.

Pandas can clean messy data sets, and make them readable and relevant.

Relevant data is very important in data science.

What Can Pandas Do?

Pandas gives you answers about the data. Like:

Is there a correlation between two or more columns?

What is the average value?

Max value?

Min value?
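The questions above map onto a handful of pandas calls. The sketch below uses a small made-up table; the column names `hours_studied` and `exam_score` and all values are assumptions for illustration and do not come from the practice dataset itself.

```python
import pandas as pd

# Made-up data: five observations of two numeric columns.
df = pd.DataFrame(
    {
        "hours_studied": [1, 2, 3, 4, 5],
        "exam_score": [52, 58, 65, 71, 79],
    }
)

average = df["exam_score"].mean()  # average value of one column
highest = df["exam_score"].max()   # max value
lowest = df["exam_score"].min()    # min value

# Pearson correlation between the two columns (1.0 would be a perfect line).
correlation = df["hours_studied"].corr(df["exam_score"])

print(average, highest, lowest, round(correlation, 3))
```

`DataFrame.describe()` reports several of these statistics at once, and `DataFrame.corr()` gives the full correlation matrix when more than two columns are involved.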
codeparrot
huggingface.co
Updated Sep 1, 2021
Cite
Natural Language Processing with Transformers (2021). codeparrot [Dataset]. https://huggingface.co/datasets/transformersbook/codeparrot
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Sep 1, 2021
Dataset authored and provided by
Natural Language Processing with Transformers
Description
CodeParrot 🦜 Dataset

What is it?

This is the full CodeParrot dataset. It contains Python files used to train the code generation model in Chapter 10: Training Transformers from Scratch in the NLP with Transformers book. You can find the full code in the accompanying Github repository.

Creation

It was created with the GitHub dataset available via Google's BigQuery. It contains approximately 22 million Python files and is 180 GB in size (50 GB compressed). The… See the full description on the dataset page: https://huggingface.co/datasets/transformersbook/codeparrot.
datasets
figshare.com
txt
Updated Oct 5, 2017
Cite
Carlos Rodriguez-Contreras (2017). datasets [Dataset]. http://doi.org/10.6084/m9.figshare.5472970.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.5472970.v1
Dataset updated
Oct 5, 2017
Dataset provided by
figshare
Authors
Carlos Rodriguez-Contreras
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Datasets for practising in class
Ecommerce Dataset for Data Analysis
kaggle.com
zip
Updated Sep 19, 2024
Cite
Shrishti Manja (2024). Ecommerce Dataset for Data Analysis [Dataset]. https://www.kaggle.com/datasets/shrishtimanja/ecommerce-dataset-for-data-analysis/code
Explore at:
Available download formats: zip (2028853 bytes)
Dataset updated
Sep 19, 2024
Authors
Shrishti Manja
Description
This dataset contains 55,000 entries of synthetic customer transactions, generated using Python's Faker library. The goal behind creating this dataset was to provide a resource for learners like myself to explore, analyze, and apply various data analysis techniques in a context that closely mimics real-world data.

About the Dataset:
- CID (Customer ID): A unique identifier for each customer.
- TID (Transaction ID): A unique identifier for each transaction.
- Gender: The gender of the customer, categorized as Male or Female.
- Age Group: Age group of the customer, divided into several ranges.
- Purchase Date: The timestamp of when the transaction took place.
- Product Category: The category of the product purchased, such as Electronics, Apparel, etc.
- Discount Availed: Indicates whether the customer availed any discount (Yes/No).
- Discount Name: Name of the discount applied (e.g., FESTIVE50).
- Discount Amount (INR): The amount of discount availed by the customer.
- Gross Amount: The total amount before applying any discount.
- Net Amount: The final amount after applying the discount.
- Purchase Method: The payment method used (e.g., Credit Card, Debit Card, etc.).
- Location: The city where the purchase took place.
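To make the relationship between the amount columns concrete, here is a rough sketch that generates synthetic transactions with a subset of these columns. It uses Python's standard `random` module rather than Faker, so it is not the author's generator. The helper `make_transaction`, the 10% discount rate, the amount range, and the ID scheme are all assumptions. The one rule taken from the description is that the net amount is the gross amount minus the discount.

```python
import random

CATEGORIES = ["Electronics", "Apparel"]   # examples named in the description
METHODS = ["Credit Card", "Debit Card"]   # examples named in the description


def make_transaction(rng, tid):
    """Build one synthetic transaction; rates and ranges are illustrative only."""
    gross = round(rng.uniform(100, 5000), 2)
    availed = rng.choice(["Yes", "No"])
    # Assumed flat 10% discount when one is availed, otherwise none.
    discount = round(gross * 0.10, 2) if availed == "Yes" else 0.0
    return {
        "TID": tid,
        "Product Category": rng.choice(CATEGORIES),
        "Discount Availed": availed,
        "Discount Name": "FESTIVE50" if availed == "Yes" else "",
        "Discount Amount (INR)": discount,
        "Gross Amount": gross,
        "Net Amount": round(gross - discount, 2),  # gross minus discount
        "Purchase Method": rng.choice(METHODS),
    }


rng = random.Random(42)  # fixed seed so the run is repeatable
transactions = [make_transaction(rng, tid) for tid in range(1, 6)]
for row in transactions:
    print(row["TID"], row["Gross Amount"], row["Discount Amount (INR)"], row["Net Amount"])
```

A consistency check of this kind (net equals gross minus discount) is also a reasonable first step when cleaning the real file.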

Use Cases:
1. Exploratory Data Analysis (EDA): This dataset is ideal for conducting EDA, allowing users to practice techniques such as summary statistics, visualizations, and identifying patterns within the data.
2. Data Preprocessing and Cleaning: Learners can work on handling missing data, encoding categorical variables, and normalizing numerical values to prepare the dataset for analysis.
3. Data Visualization: Use tools like Python’s Matplotlib, Seaborn, or Power BI to visualize purchasing trends, customer demographics, or the impact of discounts on purchase amounts.
4. Machine Learning Applications: After applying feature engineering, this dataset is suitable for supervised learning models, such as predicting whether a customer will avail a discount or forecasting purchase amounts based on the input features.

This dataset provides an excellent sandbox for honing skills in data analysis, machine learning, and visualization in a structured but flexible manner.

This is not a real dataset. It was generated using Python's Faker library for the sole purpose of learning.
Sample data files for Python Course
figshare.com
txt
Updated Nov 4, 2022
Cite
Peter Verhaar (2022). Sample data files for Python Course [Dataset]. http://doi.org/10.6084/m9.figshare.21501549.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.21501549.v1
Dataset updated
Nov 4, 2022
Dataset provided by
Figshare (http://figshare.com/)
Authors
Peter Verhaar
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Sample data set used in an introductory course on Programming in Python
Data set python
kaggle.com
zip
Updated Jul 13, 2023
Cite
Kolluri Nithin (2023). Data set python [Dataset]. https://www.kaggle.com/datasets/kollurinithin/data-set-python/code
Explore at:
Available download formats: zip (309360 bytes)
Dataset updated
Jul 13, 2023
Authors
Kolluri Nithin
Description
Dataset

This dataset was created by Kolluri Nithin

Contents
Images in CSV datasets
kaggle.com
zip
Updated Oct 14, 2024
Cite
Pascal (2024). Images in CSV datasets [Dataset]. https://www.kaggle.com/datasets/pyim59/images-in-csv-datasets
Explore at:
Available download formats: zip (347504240 bytes)
Dataset updated
Oct 14, 2024
Authors
Pascal
Description
Images stored as CSV files, for applying "classical" machine learning methods. These datasets are used in Pascal Yim's Machine Learning course at Centrale Lille.

"mnist_big.csv"

Recognition of images of handwritten digits

A "mnist_small.csv" version with less data, which can also serve as a test set

Source: https://www.kaggle.com/datasets/oddrationale/mnist-in-csv

"sign_mnist_big.csv"

Recognition of images of sign language gestures

A "sign_mnist_small.csv" version with less data, which can also serve as a test set

Source: https://www.kaggle.com/datasets/datamunge/sign-language-mnist

"zalando_small.csv"

Recognition of clothing and shoes (Zalando)

Source: https://www.kaggle.com/datasets/kmader/skin-cancer-mnist-ham10000

"hmnist_8_8_RGB.csv"

Recognition of skin tumors (color images, three values R, G, B per pixel)

Other versions with smaller and/or grayscale images

Source: https://www.kaggle.com/datasets/kmader/skin-cancer-mnist-ham10000

"cifar10_small.csv"

Recognition of small color images in 10 categories. CSV version of the CIFAR10 dataset

Source: https://www.kaggle.com/datasets/fedesoriano/cifar10-python-in-csv?select=train.csv
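Storing an image as a CSV row usually means flattening its pixels into columns. The sketch below assumes one common layout, a label column followed by one column per grayscale pixel in row-major order. The description does not state the column layout of these files, so this is an assumption. The helper `parse_row`, the header names, and the 2x2 toy image are made up to keep the example short.

```python
import csv
import io


def parse_row(row, side):
    """Split one CSV row into (label, image), where image is a side x side grid."""
    label = int(row[0])
    pixels = [int(value) for value in row[1:]]
    if len(pixels) != side * side:
        raise ValueError(f"expected {side * side} pixel values, got {len(pixels)}")
    # Cut the flat pixel list into `side` rows of `side` values each.
    image = [pixels[r * side:(r + 1) * side] for r in range(side)]
    return label, image


# Toy file: a header line, then one 2x2 "image" labelled 7 (values are made up).
SAMPLE = "label,p0,p1,p2,p3\n7,0,255,128,64\n"

reader = csv.reader(io.StringIO(SAMPLE))
next(reader)  # skip the header line
label, image = parse_row(next(reader), side=2)
print(label, image)
```

Under the same assumed layout, an MNIST-sized image would use `side=28`, and a color image with three values per pixel would need the pixel list grouped in threes first.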
Vector datasets for workshop "Introduction to Geospatial Raster and Vector...
figshare.com
Updated Oct 5, 2022
Cite
Ryan Avery (2022). Vector datasets for workshop "Introduction to Geospatial Raster and Vector Data with Python" [Dataset]. http://doi.org/10.6084/m9.figshare.21273837.v1
Explore at:
Available download formats: application/x-sqlite3
Unique identifier
https://doi.org/10.6084/m9.figshare.21273837.v1
Dataset updated
Oct 5, 2022
Dataset provided by
Figshare (http://figshare.com/)
Authors
Ryan Avery
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Cadaster data from PDOK used to illustrate the use of geopandas and shapely, geospatial python packages for manipulating vector data. The brpgewaspercelen_definitief_2020.gpkg file has been subsetted in order to make the download manageable for workshops. Other datasets are copies of those available from PDOK.
python-qa-instructions-dataset
huggingface.co
Updated Sep 13, 2023
Cite
Ketan (2023). python-qa-instructions-dataset [Dataset]. https://huggingface.co/datasets/iamketan25/python-qa-instructions-dataset
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Sep 13, 2023
Authors
Ketan
Description
iamketan25/python-qa-instructions-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
COVID-19 Data Visualization Using Python
kaggle.com
zip
Updated Apr 21, 2023
Cite
Adithya Wijesinghe (2023). COVID-19 Data Visualization Using Python [Dataset]. https://www.kaggle.com/datasets/adithyawijesinghe/covid-19-data
Explore at:
Available download formats: zip (1291081 bytes)
Dataset updated
Apr 21, 2023
Authors
Adithya Wijesinghe
License
U.S. Government Works (https://www.usa.gov/government-works/)
Description
Data visualization using Python (Pandas, Plotly).

The data was used to visualize the infection rate and the death rate from 01/20 to 04/22.

The data was made available on Github: https://raw.githubusercontent.com/datasets/covid-19/master/data/countries-aggregated.csv
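As a rough sketch of the kind of calculation behind such a visualization, the snippet below computes a death rate per row as deaths divided by confirmed cases. The description does not define either rate, so this definition is an assumption, as are the column names (`Date`, `Country`, `Confirmed`, `Recovered`, `Deaths`), the helper `death_rates`, and the made-up rows for a fictional country. It reads an inline string rather than fetching the file.

```python
import csv
import io

# Made-up rows; the column names are assumed, not taken from the description.
SAMPLE = """Date,Country,Confirmed,Recovered,Deaths
2020-03-01,Exampleland,100,10,2
2020-03-02,Exampleland,150,20,3
2020-03-03,Otherland,0,0,0
"""


def death_rates(csv_text):
    """Return (date, country, deaths / confirmed) per row; None if no cases."""
    rates = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        confirmed = int(row["Confirmed"])
        deaths = int(row["Deaths"])
        rate = deaths / confirmed if confirmed else None  # avoid dividing by zero
        rates.append((row["Date"], row["Country"], rate))
    return rates


rates = death_rates(SAMPLE)
for date, country, rate in rates:
    print(date, country, rate)
```

The resulting series could then be handed to a plotting library such as Plotly, which the description mentions.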
xlcost-text-to-code
huggingface.co
Updated Nov 3, 2022
Cite
CodeParrot (2022). xlcost-text-to-code [Dataset]. https://huggingface.co/datasets/codeparrot/xlcost-text-to-code
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Nov 3, 2022
Dataset authored and provided by
CodeParrot
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
Description
XLCoST is a machine learning benchmark dataset that contains fine-grained parallel data in 7 commonly used programming languages (C++, Java, Python, C#, Javascript, PHP, C), and natural language (English).
DataCamp Experimental Design in Python DataSets
kaggle.com
zip
Updated Aug 5, 2024
Cite
JABERI Mohamed Habib (2024). DataCamp Experimental Design in Python DataSets [Dataset]. https://www.kaggle.com/datasets/jaberimohamedhabib/experimental-design-in-python-datasets
Explore at:
Available download formats: zip (112825 bytes)
Dataset updated
Aug 5, 2024
Authors
JABERI Mohamed Habib
Description
Dataset

This dataset was created by JABERI Mohamed Habib

Contents
VegeNet - Image datasets and Codes
zenodo.org
data.niaid.nih.gov
zip
Updated Oct 27, 2022
Cite
Jo Yen Tan (2022). VegeNet - Image datasets and Codes [Dataset]. http://doi.org/10.5281/zenodo.7254508
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.7254508
Dataset updated
Oct 27, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Jo Yen Tan
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Compilation of python codes for data preprocessing and VegeNet building, as well as image datasets (zip files).

Image datasets:

1. vege_original : Images of vegetables captured manually in data acquisition stage

2. vege_cropped_renamed : Images in (1) cropped to remove background areas and image labels renamed

3. non-vege images : Images of non-vegetable foods for CNN network to recognize other-than-vegetable foods

4. food_image_dataset : Complete set of vege (2) and non-vege (3) images for architecture building.

5. food_image_dataset_split : Image dataset (4) split into train and test sets

6. process : Images created when cropping (pre-processing step) to create dataset (2).
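The train/test split mentioned for food_image_dataset_split can be sketched as a seeded shuffle followed by a cut. This is a generic illustration, not the author's procedure. The helper `split_train_test`, the 80/20 ratio, the seed, and the file names are all assumptions, since the description does not give them.

```python
import random


def split_train_test(items, test_fraction=0.2, seed=0):
    """Shuffle items reproducibly and return (train, test) lists."""
    shuffled = sorted(items)               # fixed starting order
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical file names standing in for the image dataset.
files = [f"img_{i:03d}.jpg" for i in range(10)]
train, test = split_train_test(files)
print(len(train), len(test))
```

For a classification dataset, a per-class (stratified) split is often preferable so that every class appears in both sets.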
