100+ datasets found

My First Data Science Project
kaggle.com
zip
Updated Aug 24, 2022
Cite
NIYIBIGIRA Geredi (2022). My First Data Science Project [Dataset]. https://www.kaggle.com/datasets/niyibigirageredi/my-first-data-science-project
Available download formats: zip (156349 bytes)
Dataset updated
Aug 24, 2022
Authors
NIYIBIGIRA Geredi
Description
Dataset

This dataset was created by NIYIBIGIRA Geredi

Contents
Data from: Data Science Projects
kaggle.com
zip
Updated Jul 6, 2021
Cite
Deepika Ravinutala (2021). Data Science Projects [Dataset]. https://www.kaggle.com/deepikaravinutala/data-science-projects
Available download formats: zip (69455114 bytes)
Dataset updated
Jul 6, 2021
Authors
Deepika Ravinutala
Description
Dataset

This dataset was created by Deepika Ravinutala

Contents
data science project
kaggle.com
zip
Updated Aug 18, 2024
+ more versions
Cite
Abdelrahman Attiea (2024). data science project [Dataset]. https://www.kaggle.com/datasets/abdelrahmanattiea/data-science-project
Available download formats: zip (23153 bytes)
Dataset updated
Aug 18, 2024
Authors
Abdelrahman Attiea
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Dataset

This dataset was created by Abdelrahman Attiea

Released under Apache 2.0

Contents
data analysis project using python
kaggle.com
zip
Updated Dec 10, 2023
Cite
Abdirizak MX (2023). data analysis project using python [Dataset]. https://www.kaggle.com/datasets/abdirizakmx/data-analysis-project-using-python
Available download formats: zip (6754 bytes)
Dataset updated
Dec 10, 2023
Authors
Abdirizak MX
License
CC0: Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)
Description
Dataset

This dataset was created by Abdirizak MX

Released under CC0: Public Domain

Contents
Insurance Dataset Based on Real-World Statistics
kaggle.com
zip
Updated Jan 19, 2025
Cite
SamiAlyasin (2025). Insurance Dataset Based on Real-World Statistics [Dataset]. https://www.kaggle.com/datasets/samialyasin/insurance-data-personal-auto-line-of-business
Available download formats: zip (157388 bytes)
Dataset updated
Jan 19, 2025
Authors
SamiAlyasin
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Area covered
World
Description
This dataset is a synthetic yet realistic representation of personal auto insurance data, crafted using real-world statistics. While actual insurance data is sensitive and unavailable for public use, this dataset bridges the gap by offering a safe and practical alternative for building robust data science projects.

Why This Dataset?
- Realistic Foundation: Synthetic data generated from real-world statistical patterns ensures practical relevance.
- Safe for Use: No personal or sensitive information; completely anonymized and compliant with data privacy standards.
- Flexible Applications: Ideal for testing models, developing prototypes, and showcasing portfolio projects.

How You Can Use It:
- Build machine learning models for predicting customer conversion and retention.
- Design risk assessment tools or premium optimization algorithms.
- Create dashboards to visualize trends in customer segmentation and policy data.
- Explore innovative solutions for the insurance industry using a realistic data foundation.

This dataset empowers you to work on real-world insurance scenarios without compromising on data sensitivity.
Data from: Towards Data Science
kaggle.com
zip
Updated Dec 13, 2021
Cite
Johannes Hötter (2021). Towards Data Science [Dataset]. https://www.kaggle.com/datasets/johoetter/towards-data-science
Available download formats: zip (2751545 bytes)
Dataset updated
Dec 13, 2021
Authors
Johannes Hötter
License
CC0: Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)
Description
Context

As a data scientist, you have most likely heard of TDS at some point. It is an amazing publication covering many AI-related topics, providing hands-on project expertise, interesting framework and technology discussions, and the theory behind hundreds of algorithms.

Content

I scraped the archive of TDS from 2018 until 2021 to collect the titles, taglines, URLs, and dates of (almost) every article published in those years. You can apply various techniques to this data, such as topic modeling.

If needed, I can also continue labeling this dataset. Just drop me a note about what you'd be interested in, and I'll add labels to this dataset.

Acknowledgements

Of course, special thanks to Towards Data Science and its editors for providing such great content in their publication. Reading such articles is always a great start to the day for me 😁

Inspiration

Think about ways to make sense of this data. What kind of articles have been published the most? What are the topics of the respective years or months?

Tip: You might also want to think about how you can enrich this data. There are many ways to do so!
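
As a starting point for the questions above, here is a minimal sketch of counting articles per month and per year. The rows, the `title` and `date` field names, and the ISO date format are all assumptions made for illustration; the real file may use different column names and formats.

```python
from collections import Counter
from datetime import datetime

# Hypothetical rows; the real column names and date format may differ.
articles = [
    {"title": "Intro to topic modeling", "date": "2018-03-14"},
    {"title": "A gentle guide to pandas", "date": "2018-03-20"},
    {"title": "Transformers explained", "date": "2020-11-02"},
]


def month_key(row):
    """Map a row to its (year, month) bucket."""
    parsed = datetime.strptime(row["date"], "%Y-%m-%d")
    return (parsed.year, parsed.month)


# Count articles per (year, month), then roll the counts up per year.
per_month = Counter(month_key(row) for row in articles)
per_year = Counter()
for (year, _month), count in per_month.items():
    per_year[year] += count

for (year, month), count in sorted(per_month.items()):
    print(f"{year}-{month:02d}: {count} article(s)")
```

The same buckets could feed a per-year topic model, since each `(year, month)` key identifies the titles that belong to that period.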
Healthcare Dataset
kaggle.com
zip
Updated May 8, 2024
+ more versions
Cite
Prasad Patil (2024). Healthcare Dataset [Dataset]. https://www.kaggle.com/datasets/prasad22/healthcare-dataset
Available download formats: zip (3054550 bytes)
Dataset updated
May 8, 2024
Authors
Prasad Patil
License
CC0: Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)
Description
Context:

This synthetic healthcare dataset has been created to serve as a valuable resource for data science, machine learning, and data analysis enthusiasts. It is designed to mimic real-world healthcare data, enabling users to practice, develop, and showcase their data manipulation and analysis skills in the context of the healthcare industry.

Inspiration:

The inspiration behind this dataset is rooted in the need for practical and diverse healthcare data for educational and research purposes. Healthcare data is often sensitive and subject to privacy regulations, making it challenging to access for learning and experimentation. To address this gap, I have leveraged Python's Faker library to generate a dataset that mirrors the structure and attributes commonly found in healthcare records. By providing this synthetic data, I hope to foster innovation, learning, and knowledge sharing in the healthcare analytics domain.

Dataset Information:

Each column provides specific information about the patient, their admission, and the healthcare services provided, making this dataset suitable for various data analysis and modeling tasks in the healthcare domain. Here's a brief explanation of each column in the dataset:
- Name: This column represents the name of the patient associated with the healthcare record.
- Age: The age of the patient at the time of admission, expressed in years.
- Gender: Indicates the gender of the patient, either "Male" or "Female."
- Blood Type: The patient's blood type, which can be one of the common blood types (e.g., "A+", "O-", etc.).
- Medical Condition: This column specifies the primary medical condition or diagnosis associated with the patient, such as "Diabetes," "Hypertension," "Asthma," and more.
- Date of Admission: The date on which the patient was admitted to the healthcare facility.
- Doctor: The name of the doctor responsible for the patient's care during their admission.
- Hospital: Identifies the healthcare facility or hospital where the patient was admitted.
- Insurance Provider: This column indicates the patient's insurance provider, which can be one of several options, including "Aetna," "Blue Cross," "Cigna," "UnitedHealthcare," and "Medicare."
- Billing Amount: The amount of money billed for the patient's healthcare services during their admission. This is expressed as a floating-point number.
- Room Number: The room number where the patient was accommodated during their admission.
- Admission Type: Specifies the type of admission, which can be "Emergency," "Elective," or "Urgent," reflecting the circumstances of the admission.
- Discharge Date: The date on which the patient was discharged from the healthcare facility, based on the admission date and a random number of days within a realistic range.
- Medication: Identifies a medication prescribed or administered to the patient during their admission. Examples include "Aspirin," "Ibuprofen," "Penicillin," "Paracetamol," and "Lipitor."
- Test Results: Describes the results of a medical test conducted during the patient's admission. Possible values include "Normal," "Abnormal," or "Inconclusive," indicating the outcome of the test.
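
To illustrate how a record like this might be generated, here is a minimal sketch that derives the discharge date from the admission date plus a random stay length. It is not the author's script: it uses the standard library's `random` module as a stand-in for Faker, and the date window, the 1 to 30 day stay, and the age and billing ranges are all assumptions.

```python
import random
from datetime import date, timedelta

ADMISSION_TYPES = ["Emergency", "Elective", "Urgent"]
TEST_RESULTS = ["Normal", "Abnormal", "Inconclusive"]


def make_record(rng: random.Random) -> dict:
    """Build one synthetic record with a subset of the columns listed above."""
    # Hypothetical admission window; the dataset's real date range is not stated.
    admission = date(2020, 1, 1) + timedelta(days=rng.randint(0, 365))
    # Assumed "realistic range" of 1 to 30 days; the original bounds are unknown.
    stay_days = rng.randint(1, 30)
    return {
        "Age": rng.randint(18, 90),  # assumed age range
        "Admission Type": rng.choice(ADMISSION_TYPES),
        "Date of Admission": admission,
        "Discharge Date": admission + timedelta(days=stay_days),
        "Billing Amount": round(rng.uniform(1000.0, 50000.0), 2),  # assumed range
        "Test Results": rng.choice(TEST_RESULTS),
    }


rng = random.Random(0)  # fixed seed so the sketch is reproducible
records = [make_record(rng) for _ in range(5)]
for record in records:
    print(record["Date of Admission"], "->", record["Discharge Date"], record["Test Results"])
```

Because every value is drawn independently, a dataset built this way carries no real relationship between the features and `Test Results`, which is worth keeping in mind when evaluating models on it.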

Usage Scenarios:

This dataset can be utilized for a wide range of purposes, including:
- Developing and testing healthcare predictive models.
- Practicing data cleaning, transformation, and analysis techniques.
- Creating data visualizations to gain insights into healthcare trends.
- Learning and teaching data science and machine learning concepts in a healthcare context.
- You can treat it as a multi-class classification problem and solve it for Test Results, which contains 3 categories (Normal, Abnormal, and Inconclusive).
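
Before training a classifier on Test Results, it helps to know what a trivial model would score. The sketch below computes a majority-class baseline; the six labels are made up for illustration and do not reflect the real class balance.

```python
from collections import Counter

# Hypothetical labels; the real class distribution is not stated in the description.
labels = ["Normal", "Abnormal", "Inconclusive", "Normal", "Abnormal", "Normal"]

# A majority-class baseline always predicts the most frequent label.
counts = Counter(labels)
majority_label, majority_count = counts.most_common(1)[0]
baseline_accuracy = majority_count / len(labels)

print(f"Always predicting {majority_label!r} scores {baseline_accuracy:.2f} accuracy")
```

Any trained model should beat this number on held-out data to be considered useful.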

Acknowledgments:

I acknowledge the importance of healthcare data privacy and security and emphasize that this dataset is entirely synthetic. It does not contain any real patient information or violate any privacy regulations.

I hope that this dataset contributes to the advancement of data science and healthcare analytics and inspires new ideas. Feel free to explore, analyze, and share your findings with the Kaggle community.

Image Credit:

Image by BC Y from Pixabay
Materials Platform for Data Science (MPDS) Dataset
kaggle.com
zip
Updated Dec 26, 2025
+ more versions
Cite
Unidata (2025). Materials Platform for Data Science (MPDS) Dataset [Dataset]. https://www.kaggle.com/datasets/unidpro/materials-platform-for-data-science
Available download formats: zip (164193 bytes)
Dataset updated
Dec 26, 2025
Authors
Unidata
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) (https://creativecommons.org/licenses/by-nc-nd/4.0/)
License information was derived automatically
Description
Materials Platform for data science

Dataset includes 405,100 publications, 139,005 phase diagrams, 409,771 crystalline nanostructures, 1,075,676 physical property sets, and 189,682 material phases. It integrates decades of scientific research on inorganic materials, enabling computational materials design, machine learning applications, and materials informatics studies across industry and academia.

Built on data extracted from about half a million peer-reviewed scientific publications, it offers standardized data, detailed chemical structures, crystal structures, and extensive metadata on various materials.

The dataset helps researchers and engineers advance scientific discovery, predict materials behavior, and accelerate materials innovation through data-driven research.

💵 Buy the Dataset: This is a limited preview of the data. To access the full dataset, please contact us at https://unidata.pro to discuss your requirements and pricing options.

It allows researchers and engineers to explore computational chemistry and develop machine learning models for predicting materials behavior. By combining raw data, experimental records, and computational analyses, MPDS helps scientists and materials experts design new compounds, identify similar materials, and optimize materials properties for engineering applications.

🌐 UniData provides high-quality datasets, content moderation, data collection and annotation for your AI/ML projects
Kalla Data Science mini project
kaggle.com
zip
Updated Jan 13, 2024
Cite
Asraf28 (2024). Kalla Data Science mini project [Dataset]. https://www.kaggle.com/datasets/asraf28/kalla-data-science-mini-project
Available download formats: zip (95316 bytes)
Dataset updated
Jan 13, 2024
Authors
Asraf28
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Dataset

This dataset was created by Asraf28

Released under Apache 2.0

Contents
Ken Jee YouTube Data
kaggle.com
zip
Updated Jan 22, 2022
Cite
Ken Jee (2022). Ken Jee YouTube Data [Dataset]. https://www.kaggle.com/datasets/kenjee/ken-jee-youtube-data
Available download formats: zip (6556461 bytes)
Dataset updated
Jan 22, 2022
Authors
Ken Jee
Area covered
YouTube
Description
Context

I've been creating videos on YouTube since November of 2017 (https://www.youtube.com/c/KenJee1) with the mission of making data science accessible to more people. One of the best ways to do this is to tell stories and work on projects. This is my attempt at my first community project. I am making my YouTube data available for everyone to help better understand the growth of my YouTube community and think about ways that it could be improved! I would love for everyone in the community to feel like they had some hand in contributing to the channel.

Announcement Video: https://youtu.be/YPph59-rTxA

I will be sharing my favorite projects in a few of my videos (with permission of course), and would also like to give away a few small prizes to the top featured notebooks. I hope you have fun with the analysis; I'm interested in seeing what you find in the data!

For those looking for a place to start, some things I'm thinking about are:
- What are the themes of the comment data?
- What types of video titles and thumbnails drive the most traffic?
- Who is my core audience and what are they interested in?
- What types of videos have led to the most growth?
- What type of content are people engaging with the most or watching the longest?

Some advanced projects could be:
- Creating a chat bot to respond to common comments with videos where I have addressed a topic
- Pulling sentiment from thumbnails and titles and comparing that with performance

Data I would like to add over time:
- Video descriptions
- Video subtitles
- Actual video data

Content

There are four files in this repo. The relevant data included in most of them is from Nov 2017 - Jan 2022. I gathered some of this data via the YouTube API and the rest from my specific analytics.

1) Aggregated Metrics By Video - This has all the topline metrics from my channel from its start (around 2015 to Jan 22 2022). I didn't post my first video until around
2) Aggregated Metrics By Video with Country and Subscriber Status - This has the same data as aggregated metrics by video, but it includes dimensions for which country people are viewing from and if the viewers are subscribed to the channel or not.
3) Video Performance Over Time - This has the daily data from each of my videos.
4) All Comments - This is all of my comment data gathered from the YouTube API. I have anonymized the users so don't worry about your name showing up!

Acknowledgements

This obviously wouldn't be possible without all of the wonderful people who watch and interact with my videos! I'm incredibly grateful for you all and I'm so happy I can share this project with you!

License

I collected this data from the YouTube API and through my own Google analytics. Thus, any use of it must uphold the YouTube API's terms of service: https://developers.google.com/youtube/terms/api-services-terms-of-service
Python IPL Data Project
kaggle.com
zip
Updated Jan 27, 2023
Cite
Pawan Kumar (2023). Python IPL Data Project [Dataset]. https://www.kaggle.com/datasets/pawankumar19/python-ipl-data-project
Available download formats: zip (161013 bytes)
Dataset updated
Jan 27, 2023
Authors
Pawan Kumar
Description
Dataset

This dataset was created by Pawan Kumar

Contents
Glassdoor.com - Data Scientist Salary Dataset
kaggle.com
zip
Updated May 22, 2024
Cite
Fredeys (2024). Glassdoor.com - Data Scientist Salary Dataset [Dataset]. https://www.kaggle.com/datasets/fredeys/glassdoor-data-scientist-salary-dataset
Available download formats: zip (1563890 bytes)
Dataset updated
May 22, 2024
Authors
Fredeys
Description
Glassdoor Job Listings Dataset

This dataset has been scraped from Glassdoor.com and contains comprehensive information about job offers. It is designed for those looking to analyze job market trends, salary estimates, company ratings, and other relevant job-related data.

Dataset Information

This dataset includes a collection of 1000 job listings, providing a wide range of details for each job offer. It is freely accessible and can be used for various analytical purposes, including salary analysis, job trend research, and company evaluation.

Columns in this dataset:

Job Title: The title of the job position.

Salary Estimate: The estimated salary range for the job.

Job Description: A detailed description of the job duties and requirements.

Rating: The overall rating of the company based on employee reviews.

Company Name: The name of the company offering the job.

Location: The location of the job position.

Size: The size of the company (number of employees).

Founded: The year the company was founded.

Type of Ownership: The ownership type of the company (e.g., Public, Private).

Industry: The industry to which the company belongs.

Sector: The broader sector encompassing the industry.

Revenue: The annual revenue of the company.
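
Since Salary Estimate holds a range rather than a number, it usually needs parsing before any salary analysis. The sketch below assumes values shaped like `$95K-$140K (Glassdoor est.)`; that format is an assumption and not confirmed by the description above, so the pattern may need adjusting against the real file.

```python
import re

# Assumed pattern: two dollar amounts in thousands separated by a hyphen.
_SALARY_RE = re.compile(r"\$(\d+)K\s*-\s*\$(\d+)K")


def parse_salary_range(text):
    """Return (low, high) in dollars, or None if the text does not match."""
    match = _SALARY_RE.search(text)
    if match is None:
        return None
    low, high = (int(group) * 1000 for group in match.groups())
    return low, high


def midpoint(salary_range):
    """Collapse a (low, high) range to a single value for modeling."""
    low, high = salary_range
    return (low + high) / 2


example = parse_salary_range("$95K-$140K (Glassdoor est.)")
print(example, midpoint(example))
```

Returning `None` for unmatched text keeps rows with hourly or missing estimates visible so they can be handled explicitly rather than silently dropped.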

Usage

Feel free to use this dataset for your analysis and projects. Whether you are studying salary trends, job market patterns, or company ratings, this dataset provides a rich source of information to support your work.
Data from: python projects
kaggle.com
zip
Updated Mar 13, 2021
Cite
Keval joshi (2021). python projects [Dataset]. https://www.kaggle.com/kevaljoshi95/python-projects
Available download formats: zip (2741790 bytes)
Dataset updated
Mar 13, 2021
Authors
Keval joshi
Description
Dataset

This dataset was created by Keval joshi

Contents
Meta Kaggle
kaggle.com
zip
Updated Feb 1, 2026
Cite
Kaggle (2026). Meta Kaggle [Dataset]. https://www.kaggle.com/datasets/kaggle/meta-kaggle
Available download formats: zip (10313419305 bytes)
Dataset updated
Feb 1, 2026
Dataset authored and provided by
Kaggle (http://kaggle.com/)
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Meta Kaggle

Explore our public data on competitions, datasets, kernels (code / notebooks) and more

Meta Kaggle may not be the Rosetta Stone of data science, but we do think there's a lot to learn (and plenty of fun to be had) from this collection of rich data about Kaggle’s community and activity.

Strategizing to become a Competitions Grandmaster? Wondering who, where, and what goes into a winning team? Choosing evaluation metrics for your next data science project? The kernels published using this data can help. We also hope they'll spark some lively Kaggler conversations and be a useful resource for the larger data science community.

Image: Kaggle Leaderboard Performance (https://imgur.com/2Egeb8R.png)

This dataset is made available as CSV files through Kaggle Kernels. It contains tables on public activity from Competitions, Datasets, Kernels, Discussions, and more. The tables are updated daily.

Please note: This data is not a complete dump of our database. Rows, columns, and tables have been filtered out and transformed.

August 2023 update

In August 2023, we released Meta Kaggle for Code, a companion to Meta Kaggle containing public, Apache 2.0 licensed notebook data. View the dataset and instructions for how to join it with Meta Kaggle here: https://www.kaggle.com/datasets/kaggle/meta-kaggle-code

We also updated the license on Meta Kaggle from CC-BY-NC-SA to Apache 2.0.
Social Media and Mental Health
kaggle.com
zip
Updated Jul 18, 2023
Cite
SouvikAhmed071 (2023). Social Media and Mental Health [Dataset]. https://www.kaggle.com/datasets/souvikahmed071/social-media-and-mental-health
Available download formats: zip (10944 bytes)
Dataset updated
Jul 18, 2023
Authors
SouvikAhmed071
License
Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
License information was derived automatically
Description
This dataset was originally collected for a data science and machine learning project that aimed at investigating the potential correlation between the amount of time an individual spends on social media and the impact it has on their mental health.

The project involves conducting a survey to collect data, organizing the data, and using machine learning techniques to create a predictive model that can determine whether a person should seek professional help based on their answers to the survey questions.

This project was completed as part of a Statistics course at a university, and the team is currently in the process of writing a report and completing a paper that summarizes and discusses the findings in relation to other research on the topic.

The following is the Google Colab link to the project, done on Jupyter Notebook -

https://colab.research.google.com/drive/1p7P6lL1QUw1TtyUD1odNR4M6TVJK7IYN

The following is the GitHub Repository of the project -

https://github.com/daerkns/social-media-and-mental-health

Libraries used for the Project -

pandas, NumPy, Matplotlib, Seaborn, scikit-learn
Materials and their Mechanical Properties
kaggle.com
zip
Updated Apr 15, 2023
Cite
Purushottam Nawale (2023). Materials and their Mechanical Properties [Dataset]. https://www.kaggle.com/datasets/purushottamnawale/materials
Available download formats: zip (145487 bytes)
Dataset updated
Apr 15, 2023
Authors
Purushottam Nawale
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
We utilized a dataset of Machine Design materials, which includes information on their mechanical properties. The dataset was obtained from the Autodesk Material Library and comprises 15 columns, also referred to as features/attributes. This dataset is a real-world dataset, and it does not contain any random values. However, due to missing values, we only utilized seven of these columns for our ML model. You can access the related GitHub Repository here: https://github.com/purushottamnawale/material-selection-using-machine-learning

To develop an ML model, we employed several Python libraries, including NumPy, pandas, scikit-learn, and graphviz, in addition to other technologies such as Weka, MS Excel, VS Code, Kaggle, Jupyter Notebook, and GitHub. We employed Weka software to swiftly visualize the data and comprehend the relationships between the features, without requiring any programming expertise.

My problem statement is material selection for an EV chassis. If you have any specific ideas, be sure to implement them and add the code on Kaggle.

A detailed research paper is available at https://iopscience.iop.org/article/10.1088/1742-6596/2601/1/012014
Data from: python projects
kaggle.com
zip
Updated Jun 12, 2023
Cite
Shekhar Parcha (2023). python projects [Dataset]. https://www.kaggle.com/datasets/shekharparcha/python-projects
Available download formats: zip (121494 bytes)
Dataset updated
Jun 12, 2023
Authors
Shekhar Parcha
Description
Dataset

This dataset was created by Shekhar Parcha

Contents
Project Python- Data Cleaning - EDA- Visualization
kaggle.com
zip
Updated Dec 10, 2023
Cite
Hussein Al Chami (2023). Project Python- Data Cleaning - EDA- Visualization [Dataset]. https://www.kaggle.com/datasets/husseinalchami/project-python-data-cleaning-eda-visualization
Available download formats: zip (322085 bytes)
Dataset updated
Dec 10, 2023
Authors
Hussein Al Chami
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Dataset

This dataset was created by Hussein Al Chami

Released under MIT

Contents
Material science
kaggle.com
zip
Updated Mar 11, 2025
Cite
Allanatrix (2025). Material science [Dataset]. https://www.kaggle.com/datasets/allanwandia/material-science
Available download formats: zip (5034116 bytes)
Dataset updated
Mar 11, 2025
Authors
Allanatrix
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
This dataset provides a comprehensive collection of computed properties for a wide range of materials, sourced from the Materials Project database. Each entry represents a unique material, identified by a material_id, and includes detailed information about its chemical composition and physical properties. These properties are calculated using density functional theory (DFT), a widely used computational method in materials science for predicting material behavior. The dataset is ideal for researchers, data scientists, and machine learning practitioners interested in materials discovery, property prediction, and exploratory analysis.
World Bank Indicators (1960‑Present)
kaggle.com
zip
Updated May 29, 2025
Cite
George DiNicola (2025). World Bank Indicators (1960‑Present) [Dataset]. https://www.kaggle.com/datasets/georgejdinicola/world-bank-indicators
Available download formats: zip (52559856 bytes)
Dataset updated
May 29, 2025
Authors
George DiNicola
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
Description
Overview

This dataset provides a comprehensive collection of time series data sourced from the World Bank Open Data Platform, covering a wide range of global indicators from 1960 to the most recently published year. It includes economic, social, environmental, and demographic metrics, making it an ideal resource for researchers, data scientists, and policymakers interested in global development trends, economic forecasting, or socio-economic analysis.

A tutorial on how to combine the dataset topics into one large dataset can be found here.

Why this Dataset?

My motivation for this project was to curate a high-quality collection of datasets for World Bank indicators organized by topics and structured in time-series, making them more accessible for data science projects. Since the World Bank’s Kaggle datasets (https://www.kaggle.com/organizations/theworldbank) have not been updated since 2019, I saw an opportunity to provide more current data for the data analysis community.

Dataset Collection Contents

This collection brings together more than 800 World Bank indicators organized into 18 topic‑specific CSV files. Each file is structured as a country‑year panel: every row represents a unique combination of year (1960‑present) and ISO‑3 country code, while the columns hold the topic’s indicators.

The collection includes datasets with a variety of indicators, such as:
- Economic Metrics: GDP growth (%), GDP per capita, consumer price inflation, merchandise trade, gross capital formation, and more.
- Social Metrics: School enrollment (primary, secondary, tertiary), infant mortality rate, maternal mortality rate, poverty headcount, and more.
- Environmental Metrics: Forest area, renewable energy consumption, food production indices, and more.
- Demographic Metrics: Urban population, life expectancy, net migration, and more.

Usage

This dataset is ideal for a variety of applications, including:
- Economic forecasting and trend analysis (e.g., GDP growth, inflation).
- Socio-economic studies (e.g., education, health, poverty).
- Environmental impact analysis (e.g., renewable energy adoption).
- Demographic research (e.g., population trends, migration).

Topic datasets can be merged with each other using year and country code. This tutorial with notebook code can help you get started quickly.
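
The merge on year and country code can be sketched as follows. This is not the linked tutorial's code: the column names `year` and `country_code`, the indicator names, and every value are placeholders chosen for illustration. The sketch uses only the standard library; with pandas the same join would typically be written as `DataFrame.merge` with `on=["year", "country_code"]` and `how="outer"`.

```python
def merge_topics(left, right, keys=("year", "country_code")):
    """Outer-join two lists of row dicts on the given key columns."""
    merged = {}
    for row in left + right:
        key = tuple(row[k] for k in keys)
        # Rows sharing a key are combined; indicators absent from one topic stay absent.
        merged.setdefault(key, {}).update(row)
    return [merged[key] for key in sorted(merged)]


# Placeholder rows with made-up country codes and values.
economy = [
    {"year": 2019, "country_code": "AAA", "gdp_growth": 1.0},
    {"year": 2020, "country_code": "AAA", "gdp_growth": 2.0},
]
health = [
    {"year": 2020, "country_code": "AAA", "life_expectancy": 70.0},
    {"year": 2020, "country_code": "BBB", "life_expectancy": 71.0},
]

panel = merge_topics(economy, health)
for row in panel:
    print(row)
```

An outer join is used so that country-years present in only one topic file are kept, which matches the description's note that missing values are preserved rather than dropped.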

Collection Methodology

The data is collected via a custom software application that discovers and groups high-quality indicators with rules-based logic & artificial intelligence, generates metadata, and performs ETL for the data from the World Bank API. The result is a clean, up‑to‑date collection of World Bank indicators in time-series format that is ready for analysis—no manual downloads or data wrangling required.

Modifications

The original World Bank data has been aggregated and transformed for ease of use. Missing values have been preserved as provided by the World Bank, and no significant transformations have been applied beyond formatting and aggregation into a single file.

Source & Attribution

The World Bank: World Development Indicators

This dataset is publicly available and sourced from the World Bank Open Data Platform and is made available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. When using this data, please attribute the World Bank as follows: "Data sourced from the World Bank, licensed under CC BY 4.0." For more details on the World Bank’s terms of use, visit: https://www.worldbank.org/en/about/legal/terms-of-use-for-datasets.

License

This dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Feel free to use this data in Kaggle notebooks, academic research, or policy analysis. If you create a derived dataset or analysis, I encourage you to share it with the Kaggle community.

My First Data Science Project

3 scholarly articles cite this dataset (View in Google Scholar)
