100+ datasets found
  1. My First Data Science Project

    • kaggle.com
    zip
    Updated Aug 24, 2022
    Cite
    NIYIBIGIRA Geredi (2022). My First Data Science Project [Dataset]. https://www.kaggle.com/datasets/niyibigirageredi/my-first-data-science-project
    Explore at:
    Available download formats: zip (156349 bytes)
    Dataset updated
    Aug 24, 2022
    Authors
    NIYIBIGIRA Geredi
    Description

    Dataset

    This dataset was created by NIYIBIGIRA Geredi

    Contents

  2. Data from: Data Science Projects

    • kaggle.com
    zip
    Updated Jul 6, 2021
    Cite
    Deepika Ravinutala (2021). Data Science Projects [Dataset]. https://www.kaggle.com/deepikaravinutala/data-science-projects
    Explore at:
    Available download formats: zip (69455114 bytes)
    Dataset updated
    Jul 6, 2021
    Authors
    Deepika Ravinutala
    Description

    Dataset

    This dataset was created by Deepika Ravinutala

    Contents

  3. data science project

    • kaggle.com
    zip
    Updated Aug 18, 2024
    + more versions
    Cite
    Abdelrahman Attiea (2024). data science project [Dataset]. https://www.kaggle.com/datasets/abdelrahmanattiea/data-science-project
    Explore at:
    Available download formats: zip (23153 bytes)
    Dataset updated
    Aug 18, 2024
    Authors
    Abdelrahman Attiea
    License

    Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Abdelrahman Attiea

    Released under Apache 2.0

    Contents

  4. data analysis project using python

    • kaggle.com
    zip
    Updated Dec 10, 2023
    Cite
    Abdirizak MX (2023). data analysis project using python [Dataset]. https://www.kaggle.com/datasets/abdirizakmx/data-analysis-project-using-python
    Explore at:
    Available download formats: zip (6754 bytes)
    Dataset updated
    Dec 10, 2023
    Authors
    Abdirizak MX
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Abdirizak MX

    Released under CC0: Public Domain

    Contents

  5. Insurance Dataset Based on Real-World Statistics

    • kaggle.com
    zip
    Updated Jan 19, 2025
    Cite
    SamiAlyasin (2025). Insurance Dataset Based on Real-World Statistics [Dataset]. https://www.kaggle.com/datasets/samialyasin/insurance-data-personal-auto-line-of-business
    Explore at:
    Available download formats: zip (157388 bytes)
    Dataset updated
    Jan 19, 2025
    Authors
    SamiAlyasin
    License

    MIT License https://opensource.org/licenses/MIT
    License information was derived automatically

    Area covered
    World
    Description

    This dataset is a synthetic yet realistic representation of personal auto insurance data, crafted using real-world statistics. While actual insurance data is sensitive and unavailable for public use, this dataset bridges the gap by offering a safe and practical alternative for building robust data science projects.

    Why This Dataset?
    - Realistic Foundation: Synthetic data generated from real-world statistical patterns ensures practical relevance.
    - Safe for Use: No personal or sensitive information; completely anonymized and compliant with data privacy standards.
    - Flexible Applications: Ideal for testing models, developing prototypes, and showcasing portfolio projects.

    How You Can Use It:
    - Build machine learning models for predicting customer conversion and retention.
    - Design risk assessment tools or premium optimization algorithms.
    - Create dashboards to visualize trends in customer segmentation and policy data.
    - Explore innovative solutions for the insurance industry using a realistic data foundation.

    This dataset empowers you to work on real-world insurance scenarios without compromising on data sensitivity.

  6. Data from: Towards Data Science

    • kaggle.com
    zip
    Updated Dec 13, 2021
    Cite
    Johannes Hötter (2021). Towards Data Science [Dataset]. https://www.kaggle.com/datasets/johoetter/towards-data-science
    Explore at:
    Available download formats: zip (2751545 bytes)
    Dataset updated
    Dec 13, 2021
    Authors
    Johannes Hötter
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    As a Data Scientist, you have most likely heard of Towards Data Science (TDS) at some point. It is an amazing publication about lots of AI-related topics, providing hands-on project expertise, interesting framework and technology discussions, and the theory behind hundreds of algorithms.

    Content

    I scraped the archive of TDS from 2018 until 2021 to collect the titles, taglines, URLs, and dates of (almost) every article in those years. You can apply various techniques to this data, such as topic modeling.

    If needed, I can also continue labeling this dataset. Just drop me a note about what you'd be interested in, and I'll add labels to this dataset.

    Acknowledgements

    Of course, special thanks to Towards Data Science and its editors for providing such great content on their publication. Reading such articles is always a great start into the day for me 😁

    Inspiration

    Think about ways to make sense of this data. What kind of articles have been published the most? What are the topics of the respective years or months?

    Tip: You might also want to think about how you can enrich this data. There are many ways to do so!
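
    As a first step toward the question of which topics dominate each year, one could count the most frequent title words per year. This is plain word counting, not topic modeling, and the sample rows, the `(date, title)` layout, and the stop-word list below are made up for illustration; the real file's columns may differ.

```python
from collections import Counter, defaultdict
import re

# Hypothetical sample rows as (ISO date, title); not taken from the dataset.
ARTICLES = [
    ("2018-03-01", "An Introduction to Neural Networks"),
    ("2018-07-12", "Neural Networks for Time Series"),
    ("2019-01-20", "A Guide to Topic Modeling in Python"),
    ("2019-05-05", "Topic Modeling with Python"),
]
# Assumed, deliberately tiny stop-word list.
STOP_WORDS = {"a", "an", "and", "for", "in", "the", "to", "with"}


def top_words_by_year(articles, n=3):
    """Return {year: [(word, count), ...]} for the n most common title words."""
    counts = defaultdict(Counter)
    for iso_date, title in articles:
        year = int(iso_date[:4])
        words = re.findall(r"[a-z]+", title.lower())
        counts[year].update(w for w in words if w not in STOP_WORDS)
    return {year: counter.most_common(n) for year, counter in counts.items()}


result = top_words_by_year(ARTICLES)
for year in sorted(result):
    print(year, result[year])
```

    Grouping by month instead of year would only require slicing the first seven characters of the date rather than the first four.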

  7. Healthcare Dataset

    • kaggle.com
    zip
    Updated May 8, 2024
    + more versions
    Cite
    Prasad Patil (2024). Healthcare Dataset [Dataset]. https://www.kaggle.com/datasets/prasad22/healthcare-dataset
    Explore at:
    Available download formats: zip (3054550 bytes)
    Dataset updated
    May 8, 2024
    Authors
    Prasad Patil
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context:

    This synthetic healthcare dataset has been created to serve as a valuable resource for data science, machine learning, and data analysis enthusiasts. It is designed to mimic real-world healthcare data, enabling users to practice, develop, and showcase their data manipulation and analysis skills in the context of the healthcare industry.

    Inspiration:

    The inspiration behind this dataset is rooted in the need for practical and diverse healthcare data for educational and research purposes. Healthcare data is often sensitive and subject to privacy regulations, making it challenging to access for learning and experimentation. To address this gap, I have leveraged Python's Faker library to generate a dataset that mirrors the structure and attributes commonly found in healthcare records. By providing this synthetic data, I hope to foster innovation, learning, and knowledge sharing in the healthcare analytics domain.

    Dataset Information:

    Each column provides specific information about the patient, their admission, and the healthcare services provided, making this dataset suitable for various data analysis and modeling tasks in the healthcare domain. Here's a brief explanation of each column in the dataset:
    - Name: This column represents the name of the patient associated with the healthcare record.
    - Age: The age of the patient at the time of admission, expressed in years.
    - Gender: Indicates the gender of the patient, either "Male" or "Female."
    - Blood Type: The patient's blood type, which can be one of the common blood types (e.g., "A+", "O-", etc.).
    - Medical Condition: This column specifies the primary medical condition or diagnosis associated with the patient, such as "Diabetes," "Hypertension," "Asthma," and more.
    - Date of Admission: The date on which the patient was admitted to the healthcare facility.
    - Doctor: The name of the doctor responsible for the patient's care during their admission.
    - Hospital: Identifies the healthcare facility or hospital where the patient was admitted.
    - Insurance Provider: This column indicates the patient's insurance provider, which can be one of several options, including "Aetna," "Blue Cross," "Cigna," "UnitedHealthcare," and "Medicare."
    - Billing Amount: The amount of money billed for the patient's healthcare services during their admission. This is expressed as a floating-point number.
    - Room Number: The room number where the patient was accommodated during their admission.
    - Admission Type: Specifies the type of admission, which can be "Emergency," "Elective," or "Urgent," reflecting the circumstances of the admission.
    - Discharge Date: The date on which the patient was discharged from the healthcare facility, based on the admission date and a random number of days within a realistic range.
    - Medication: Identifies a medication prescribed or administered to the patient during their admission. Examples include "Aspirin," "Ibuprofen," "Penicillin," "Paracetamol," and "Lipitor."
    - Test Results: Describes the results of a medical test conducted during the patient's admission. Possible values include "Normal," "Abnormal," or "Inconclusive," indicating the outcome of the test.
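
    The Discharge Date rule described above (admission date plus a random number of days) can be sketched as follows. The description says the author used Python's Faker library; this sketch swaps in the standard-library `random` module instead, and the 1 to 30 day range, the function name `make_stay`, and the seed are assumptions, not values taken from the dataset.

```python
import random
from datetime import date, timedelta


def make_stay(admission, rng, min_days=1, max_days=30):
    """Return (admission, discharge) with discharge a random number of days later.

    min_days and max_days are assumed bounds, not the dataset's actual range.
    """
    length_of_stay = rng.randint(min_days, max_days)  # inclusive on both ends
    return admission, admission + timedelta(days=length_of_stay)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
admitted, discharged = make_stay(date(2024, 1, 15), rng)
print(admitted, discharged, (discharged - admitted).days)
```

    Deriving the discharge date from the admission date, rather than drawing the two independently, guarantees that no record is discharged before it is admitted.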

    Usage Scenarios:

    This dataset can be utilized for a wide range of purposes, including:
    - Developing and testing healthcare predictive models.
    - Practicing data cleaning, transformation, and analysis techniques.
    - Creating data visualizations to gain insights into healthcare trends.
    - Learning and teaching data science and machine learning concepts in a healthcare context.
    - You can treat it as a multi-class classification problem and solve it for Test Results, which contains 3 categories (Normal, Abnormal, and Inconclusive).

    Acknowledgments:

    • I acknowledge the importance of healthcare data privacy and security and emphasize that this dataset is entirely synthetic. It does not contain any real patient information or violate any privacy regulations.
    • I hope that this dataset contributes to the advancement of data science and healthcare analytics and inspires new ideas. Feel free to explore, analyze, and share your findings with the Kaggle community.

    Image Credit:

    Image by BC Y from Pixabay

  8. Materials Platform for Data Science (MPDS) Dataset

    • kaggle.com
    zip
    Updated Dec 26, 2025
    + more versions
    Cite
    Unidata (2025). Materials Platform for Data Science (MPDS) Dataset [Dataset]. https://www.kaggle.com/datasets/unidpro/materials-platform-for-data-science
    Explore at:
    Available download formats: zip (164193 bytes)
    Dataset updated
    Dec 26, 2025
    Authors
    Unidata
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Materials Platform for data science

    Dataset includes 405,100 publications, 139,005 phase diagrams, 409,771 crystalline nanostructures, 1,075,676 physical property sets, and 189,682 material phases. It integrates decades of scientific research on inorganic materials, enabling computational materials design, machine learning applications, and materials informatics studies across industry and academia.

    Built on data extracted from about half a million peer-reviewed scientific publications, it offers standardized data, detailed chemical structures, crystal structures, and extensive metadata on various materials.

    The dataset helps researchers and engineers advance scientific discovery, predict materials behavior, and accelerate materials innovation through data-driven research.

    💵 Buy the Dataset: This is a limited preview of the data. To access the full dataset, please contact us at https://unidata.pro to discuss your requirements and pricing options.

    It allows researchers and engineers to explore computational chemistry and develop machine learning models for predicting materials behaviors. By combining raw data, experimental records, and computational analyses, MPDS helps scientists and materials experts design new compounds, identify similar materials, and optimize materials properties for engineering applications.

    🌐 UniData provides high-quality datasets, content moderation, data collection and annotation for your AI/ML projects

  9. Kalla Data Science mini project

    • kaggle.com
    zip
    Updated Jan 13, 2024
    Cite
    Asraf28 (2024). Kalla Data Science mini project [Dataset]. https://www.kaggle.com/datasets/asraf28/kalla-data-science-mini-project
    Explore at:
    Available download formats: zip (95316 bytes)
    Dataset updated
    Jan 13, 2024
    Authors
    Asraf28
    License

    Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Asraf28

    Released under Apache 2.0

    Contents

  10. Ken Jee YouTube Data

    • kaggle.com
    zip
    Updated Jan 22, 2022
    Cite
    Ken Jee (2022). Ken Jee YouTube Data [Dataset]. https://www.kaggle.com/datasets/kenjee/ken-jee-youtube-data
    Explore at:
    Available download formats: zip (6556461 bytes)
    Dataset updated
    Jan 22, 2022
    Authors
    Ken Jee
    Area covered
    YouTube
    Description

    Context

    I've been creating videos on YouTube since November of 2017 (https://www.youtube.com/c/KenJee1) with the mission of making data science accessible to more people. One of the best ways to do this is to tell stories and work on projects. This is my attempt at my first community project. I am making my YouTube data available for everyone to help better understand the growth of my YouTube community and think about ways that it could be improved! I would love for everyone in the community to feel like they had some hand in contributing to the channel.

    Announcement Video: https://youtu.be/YPph59-rTxA

    I will be sharing my favorite projects in a few of my videos (with permission of course), and would also like to give away a few small prizes to the top featured notebooks. I hope you have fun with the analysis, I'm interested in seeing what you find in the data!

    For those looking for a place to start, some things I'm thinking about are:
    - What are the themes of the comment data?
    - What types of video titles and thumbnails drive the most traffic?
    - Who is my core audience and what are they interested in?
    - What types of videos have led to the most growth?
    - What type of content are people engaging with the most or watching the longest?

    Some advanced projects could be:
    - Creating a chat bot to respond to common comments with videos where I have addressed a topic
    - Pulling sentiment from thumbnails and titles and comparing that with performance

    Data I would like to add over time:
    - Video descriptions
    - Video subtitles
    - Actual video data

    Content

    There are four files in this repo. The relevant data included in most of them is from Nov 2017 - Jan 2022. I gathered some of this data via the YouTube API and the rest from my specific analytics.

    1) Aggregated Metrics By Video - This has all the topline metrics from my channel from its start (around 2015 to Jan 22 2022). I didn't post my first video until around
    2) Aggregated Metrics By Video with Country and Subscriber Status - This has the same data as aggregated metrics by video, but it includes dimensions for which country people are viewing from and if the viewers are subscribed to the channel or not.
    3) Video Performance Over Time - This has the daily data from each of my videos.
    4) All Comments - This is all of my comment data gathered from the YouTube API. I have anonymized the users so don't worry about your name showing up!

    Acknowledgements

    This obviously wouldn't be possible without all of the wonderful people who watch and interact with my videos! I'm incredibly grateful for you all and I'm so happy I can share this project with you!

    License

    I collected this data from the YouTube API and through my own Google analytics. Thus, use of it must uphold the YouTube API's terms of service: https://developers.google.com/youtube/terms/api-services-terms-of-service

  11. Python IPL Data Project

    • kaggle.com
    zip
    Updated Jan 27, 2023
    Cite
    Pawan Kumar (2023). Python IPL Data Project [Dataset]. https://www.kaggle.com/datasets/pawankumar19/python-ipl-data-project
    Explore at:
    Available download formats: zip (161013 bytes)
    Dataset updated
    Jan 27, 2023
    Authors
    Pawan Kumar
    Description

    Dataset

    This dataset was created by Pawan Kumar

    Contents

  12. Glassdoor.com - Data Scientist Salary Dataset

    • kaggle.com
    zip
    Updated May 22, 2024
    Cite
    Fredeys (2024). Glassdoor.com - Data Scientist Salary Dataset [Dataset]. https://www.kaggle.com/datasets/fredeys/glassdoor-data-scientist-salary-dataset
    Explore at:
    Available download formats: zip (1563890 bytes)
    Dataset updated
    May 22, 2024
    Authors
    Fredeys
    Description

    Glassdoor Job Listings Dataset

    This dataset has been scraped from Glassdoor.com and contains comprehensive information about job offers. It is designed for those looking to analyze job market trends, salary estimates, company ratings, and other relevant job-related data.

    Dataset Information

    This dataset includes a collection of 1000 job listings, providing a wide range of details for each job offer. It is freely accessible and can be used for various analytical purposes, including salary analysis, job trend research, and company evaluation.

    Columns in this dataset:

    1. Job Title: The title of the job position.
    2. Salary Estimate: The estimated salary range for the job.
    3. Job Description: A detailed description of the job duties and requirements.
    4. Rating: The overall rating of the company based on employee reviews.
    5. Company Name: The name of the company offering the job.
    6. Location: The location of the job position.
    7. Size: The size of the company (number of employees).
    8. Founded: The year the company was founded.
    9. Type of Ownership: The ownership type of the company (e.g., Public, Private).
    10. Industry: The industry to which the company belongs.
    11. Sector: The broader sector encompassing the industry.
    12. Revenue: The annual revenue of the company.

    Usage

    Feel free to use this dataset for your analysis and projects. Whether you are studying salary trends, job market patterns, or company ratings, this dataset provides a rich source of information to support your work.

  13. Data from: python projects

    • kaggle.com
    zip
    Updated Mar 13, 2021
    Cite
    Keval joshi (2021). python projects [Dataset]. https://www.kaggle.com/kevaljoshi95/python-projects
    Explore at:
    Available download formats: zip (2741790 bytes)
    Dataset updated
    Mar 13, 2021
    Authors
    Keval joshi
    Description

    Dataset

    This dataset was created by Keval joshi

    Contents

  14. Meta Kaggle

    • kaggle.com
    zip
    Updated Feb 1, 2026
    Cite
    Kaggle (2026). Meta Kaggle [Dataset]. https://www.kaggle.com/datasets/kaggle/meta-kaggle
    Explore at:
    Available download formats: zip (10313419305 bytes)
    Dataset updated
    Feb 1, 2026
    Dataset authored and provided by
    Kaggle http://kaggle.com/
    License

    Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Meta Kaggle

    Explore our public data on competitions, datasets, kernels (code / notebooks) and more

    Meta Kaggle may not be the Rosetta Stone of data science, but we do think there's a lot to learn (and plenty of fun to be had) from this collection of rich data about Kaggle’s community and activity.

    Strategizing to become a Competitions Grandmaster? Wondering who, where, and what goes into a winning team? Choosing evaluation metrics for your next data science project? The kernels published using this data can help. We also hope they'll spark some lively Kaggler conversations and be a useful resource for the larger data science community.

    [Image: Kaggle Leaderboard Performance, https://imgur.com/2Egeb8R.png]

    This dataset is made available as CSV files through Kaggle Kernels. It contains tables on public activity from Competitions, Datasets, Kernels, Discussions, and more. The tables are updated daily.

    Please note: This data is not a complete dump of our database. Rows, columns, and tables have been filtered out and transformed.

    August 2023 update

    In August 2023, we released Meta Kaggle for Code, a companion to Meta Kaggle containing public, Apache 2.0 licensed notebook data. View the dataset and instructions for how to join it with Meta Kaggle here: https://www.kaggle.com/datasets/kaggle/meta-kaggle-code

    We also updated the license on Meta Kaggle from CC-BY-NC-SA to Apache 2.0.

  15. Social Media and Mental Health

    • kaggle.com
    zip
    Updated Jul 18, 2023
    Cite
    SouvikAhmed071 (2023). Social Media and Mental Health [Dataset]. https://www.kaggle.com/datasets/souvikahmed071/social-media-and-mental-health
    Explore at:
    Available download formats: zip (10944 bytes)
    Dataset updated
    Jul 18, 2023
    Authors
    SouvikAhmed071
    License

    Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    This dataset was originally collected for a data science and machine learning project that aimed at investigating the potential correlation between the amount of time an individual spends on social media and the impact it has on their mental health.

    The project involves conducting a survey to collect data, organizing the data, and using machine learning techniques to create a predictive model that can determine whether a person should seek professional help based on their answers to the survey questions.

    This project was completed as part of a Statistics course at a university, and the team is currently in the process of writing a report and completing a paper that summarizes and discusses the findings in relation to other research on the topic.

    The following is the Google Colab link to the project, done on Jupyter Notebook -

    https://colab.research.google.com/drive/1p7P6lL1QUw1TtyUD1odNR4M6TVJK7IYN

    The following is the GitHub Repository of the project -

    https://github.com/daerkns/social-media-and-mental-health

    Libraries used for the Project -

    pandas
    NumPy
    Matplotlib
    seaborn
    scikit-learn
    
  16. Materials and their Mechanical Properties

    • kaggle.com
    zip
    Updated Apr 15, 2023
    Cite
    Purushottam Nawale (2023). Materials and their Mechanical Properties [Dataset]. https://www.kaggle.com/datasets/purushottamnawale/materials
    Explore at:
    Available download formats: zip (145487 bytes)
    Dataset updated
    Apr 15, 2023
    Authors
    Purushottam Nawale
    License

    MIT License https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    We utilized a dataset of Machine Design materials, which includes information on their mechanical properties. The dataset was obtained from the Autodesk Material Library and comprises 15 columns, also referred to as features/attributes. This dataset is a real-world dataset, and it does not contain any random values. However, due to missing values, we only utilized seven of these columns for our ML model. You can access the related GitHub Repository here: https://github.com/purushottamnawale/material-selection-using-machine-learning

    To develop an ML model, we employed several Python libraries, including NumPy, pandas, scikit-learn, and graphviz, in addition to other technologies such as Weka, MS Excel, VS Code, Kaggle, Jupyter Notebook, and GitHub. We employed Weka software to swiftly visualize the data and comprehend the relationships between the features, without requiring any programming expertise.

    My problem statement is Material Selection for EV Chassis. So, if you have any specific ideas, be sure to implement them and add the code on Kaggle.

    A detailed research paper is available at https://iopscience.iop.org/article/10.1088/1742-6596/2601/1/012014

  17. Data from: python projects

    • kaggle.com
    zip
    Updated Jun 12, 2023
    Cite
    Shekhar Parcha (2023). python projects [Dataset]. https://www.kaggle.com/datasets/shekharparcha/python-projects
    Explore at:
    Available download formats: zip (121494 bytes)
    Dataset updated
    Jun 12, 2023
    Authors
    Shekhar Parcha
    Description

    Dataset

    This dataset was created by Shekhar Parcha

    Contents

  18. Project Python- Data Cleaning - EDA- Visualization

    • kaggle.com
    zip
    Updated Dec 10, 2023
    Cite
    Hussein Al Chami (2023). Project Python- Data Cleaning - EDA- Visualization [Dataset]. https://www.kaggle.com/datasets/husseinalchami/project-python-data-cleaning-eda-visualization
    Explore at:
    Available download formats: zip (322085 bytes)
    Dataset updated
    Dec 10, 2023
    Authors
    Hussein Al Chami
    License

    MIT License https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Hussein Al Chami

    Released under MIT

    Contents

  19. Material science

    • kaggle.com
    zip
    Updated Mar 11, 2025
    Cite
    Allanatrix (2025). Material science [Dataset]. https://www.kaggle.com/datasets/allanwandia/material-science
    Explore at:
    Available download formats: zip (5034116 bytes)
    Dataset updated
    Mar 11, 2025
    Authors
    Allanatrix
    License

    Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset provides a comprehensive collection of computed properties for a wide range of materials, sourced from the Materials Project database. Each entry represents a unique material, identified by a material_id, and includes detailed information about its chemical composition and physical properties. These properties are calculated using density functional theory (DFT), a widely used computational method in materials science for predicting material behavior. The dataset is ideal for researchers, data scientists, and machine learning practitioners interested in materials discovery, property prediction, and exploratory analysis.

  20. World Bank Indicators (1960‑Present)

    • kaggle.com
    zip
    Updated May 29, 2025
    Cite
    George DiNicola (2025). World Bank Indicators (1960‑Present) [Dataset]. https://www.kaggle.com/datasets/georgejdinicola/world-bank-indicators
    Explore at:
    Available download formats: zip (52559856 bytes)
    Dataset updated
    May 29, 2025
    Authors
    George DiNicola
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Overview

    This dataset provides a comprehensive collection of time series data sourced from the World Bank Open Data Platform, covering a wide range of global indicators from 1960 to the most recently published year. It includes economic, social, environmental, and demographic metrics, making it an ideal resource for researchers, data scientists, and policymakers interested in global development trends, economic forecasting, or socio-economic analysis.

    A tutorial on how to combine the dataset topics into one large dataset can be found here

    Why this Dataset?

    My motivation for this project was to curate a high-quality collection of datasets for World Bank indicators organized by topics and structured in time-series, making them more accessible for data science projects. Since the World Bank’s Kaggle datasets (https://www.kaggle.com/organizations/theworldbank) have not been updated since 2019, I saw an opportunity to provide more current data for the data analysis community.

    Dataset Collection Contents

    This collection brings together more than 800 World Bank indicators organized into 18 topic‑specific CSV files. Each file is structured as a country‑year panel: every row represents a unique combination of year (1960‑present) and ISO‑3 country code, while the columns hold the topic’s indicators.

    The collection includes datasets with a variety of indicators, such as:
    - Economic Metrics: GDP growth (%), GDP per capita, consumer price inflation, merchandise trade, gross capital formation, and more.
    - Social Metrics: School enrollment (primary, secondary, tertiary), infant mortality rate, maternal mortality rate, poverty headcount, and more.
    - Environmental Metrics: Forest area, renewable energy consumption, food production indices, and more.
    - Demographic Metrics: Urban population, life expectancy, net migration, and more.

    Usage

    This dataset is ideal for a variety of applications, including:
    - Economic forecasting and trend analysis (e.g., GDP growth, inflation).
    - Socio-economic studies (e.g., education, health, poverty).
    - Environmental impact analysis (e.g., renewable energy adoption).
    - Demographic research (e.g., population trends, migration).

    Topic datasets can be merged with each other using year and country code. This tutorial with notebook code can help you get started quickly.
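
    The merge on year and country code described above can be sketched with Python's standard library alone. The two in-memory CSV snippets, the column names `year` and `country_code`, the indicator names, and the values are all placeholders for illustration, not the actual file layout; check the real headers before adapting this.

```python
import csv
import io

# Hypothetical stand-ins for two topic files; real column names and values may differ.
ECONOMY_CSV = """year,country_code,gdp_growth
2000,AAA,1.5
2001,AAA,2.0
2000,BBB,3.1
"""
HEALTH_CSV = """year,country_code,life_expectancy
2000,AAA,70.2
2000,BBB,65.4
2001,BBB,65.9
"""


def read_panel(text):
    """Index the rows of a country-year panel by (country_code, year)."""
    rows = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = (row.pop("country_code"), int(row.pop("year")))
        rows[key] = row  # remaining columns are the topic's indicators
    return rows


def outer_merge(left, right):
    """Outer join: keep every country-year pair found in either panel."""
    merged = {}
    for key in sorted(set(left) | set(right)):
        merged[key] = {**left.get(key, {}), **right.get(key, {})}
    return merged


merged = outer_merge(read_panel(ECONOMY_CSV), read_panel(HEALTH_CSV))
for (country, year), indicators in merged.items():
    print(country, year, indicators)
```

    An outer join is used here so that a country-year pair missing from one file is kept rather than dropped. With pandas, the equivalent would be `DataFrame.merge` with `on=["country_code", "year"]` and `how="outer"`, again assuming those column names.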

    Collection Methodology

    The data is collected via a custom software application that discovers and groups high-quality indicators with rules-based logic & artificial intelligence, generates metadata, and performs ETL for the data from the World Bank API. The result is a clean, up‑to‑date collection of World Bank indicators in time-series format that is ready for analysis—no manual downloads or data wrangling required.

    Modifications

    The original World Bank data has been aggregated and transformed for ease of use. Missing values have been preserved as provided by the World Bank, and no significant transformations have been applied beyond formatting and aggregation into a single file.

    Source & Attribution

    The World Bank: World Development Indicators

    This dataset is publicly available and sourced from the World Bank Open Data Platform and is made available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. When using this data, please attribute the World Bank as follows: "Data sourced from the World Bank, licensed under CC BY 4.0." For more details on the World Bank’s terms of use, visit: https://www.worldbank.org/en/about/legal/terms-of-use-for-datasets.

    License

    This dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

    Feel free to use this data in Kaggle notebooks, academic research, or policy analysis. If you create a derived dataset or analysis, I encourage you to share it with the Kaggle community.

My First Data Science Project

Explore at:
3 scholarly articles cite this dataset (View in Google Scholar)