11 datasets found
  1. World Bank: Education Data

    • kaggle.com
    zip
    Updated Mar 20, 2019
    World Bank (2019). World Bank: Education Data [Dataset]. https://www.kaggle.com/datasets/theworldbank/world-bank-intl-education
    Available download formats: zip (0 bytes)
    Dataset updated
    Mar 20, 2019
    Dataset authored and provided by
    World Bank (http://worldbank.org/)
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The World Bank is an international financial institution that provides loans to countries of the world for capital projects. The World Bank's stated goal is the reduction of poverty. Source: https://en.wikipedia.org/wiki/World_Bank

    Content

    This dataset combines key education statistics from a variety of sources to provide a look at global literacy, spending, and access.

    For more information, see the World Bank website.

    Fork this kernel to get started with this dataset.

    Acknowledgements

    https://bigquery.cloud.google.com/dataset/bigquery-public-data:world_bank_health_population

    http://data.worldbank.org/data-catalog/ed-stats

    https://cloud.google.com/bigquery/public-data/world-bank-education

    Citation: The World Bank: Education Statistics

    Dataset Source: World Bank. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    Banner Photo by @till_indeman from Unsplash.

    Inspiration

    Of total government spending, what percentage is spent on education?

  2. titanic_dataset

    • kaggle.com
    Updated Jun 7, 2024
    SURENDHAN (2024). titanic_dataset [Dataset]. https://www.kaggle.com/datasets/surendhan/titanic-dataset
    Dataset updated
    Jun 7, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    SURENDHAN
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    The Titanic dataset on Kaggle is a well-known dataset used for machine learning and data science projects, especially for binary classification tasks. It includes data on the passengers of the Titanic, which sank on its maiden voyage in 1912. This dataset is often used to predict the likelihood of a passenger's survival based on various features. Here is a detailed description of the dataset:

    Overview

    The Titanic dataset includes information about the passengers on the Titanic, such as their demographic information, class, fare, and whether they survived the disaster. The goal is to predict the survival of the passengers.

    Files

    The dataset typically includes three files:

    • train.csv: The training set, which includes the features and the target variable (Survived).
    • test.csv: The test set, which includes the features but not the target variable. You use this file to make predictions that can be submitted to Kaggle.
    • gender_submission.csv: An example of a submission file in the correct format.

    Features

    The dataset contains the following columns:

    • PassengerId: Unique ID for each passenger.
    • Survived: Target variable (0 = No, 1 = Yes) indicating if the passenger survived.
    • Pclass: Ticket class (1 = 1st, 2 = 2nd, 3 = 3rd).
    • Name: Name of the passenger.
    • Sex: Gender of the passenger (male or female).
    • Age: Age of the passenger in years; infants under one year old have fractional values.
    • SibSp: Number of siblings or spouses aboard the Titanic.
    • Parch: Number of parents or children aboard the Titanic.
    • Ticket: Ticket number.
    • Fare: Passenger fare.
    • Cabin: Cabin number.
    • Embarked: Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton).
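
    As a rough illustration of how the columns above might be used, the sketch below computes a survival rate per group with Python's standard csv module. The inline rows are made-up placeholders rather than real passenger records, and the helper name survival_rate_by is hypothetical; in practice the rows would presumably come from train.csv.

```python
import csv
import io
from collections import defaultdict

# Made-up placeholder rows that reuse a few of the column names listed above.
# They are NOT real passenger records.
SAMPLE = """PassengerId,Survived,Pclass,Sex,Age
1,0,3,male,22
2,1,1,female,38
3,1,3,female,26
4,0,1,male,
"""


def survival_rate_by(rows, key):
    """Return {group: fraction of rows with Survived == 1} for column `key`."""
    counts = defaultdict(lambda: [0, 0])  # group -> [survivors, total]
    for row in rows:
        counts[row[key]][0] += int(row["Survived"])
        counts[row[key]][1] += 1
    return {group: survived / total for group, (survived, total) in counts.items()}


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
rates_by_sex = survival_rate_by(rows, "Sex")
rates_by_class = survival_rate_by(rows, "Pclass")
print(rates_by_sex)
print(rates_by_class)
```

    Grouping by a single column like this is only a first exploratory step; a classifier for the Survived target would typically combine several of the features.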

  3. Automate simple moving average trading with Python

    • kaggle.com
    Updated Sep 23, 2020
    CSvikram100 (2020). Automate simple moving average trading with Python [Dataset]. https://www.kaggle.com/csvikram100/automate-simple-moving-average-trading-with-python/code
    Dataset updated
    Sep 23, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    CSvikram100
    Description

    Dataset

    This dataset was created by CSvikram100

    Contents

  4. Bellabeat case study using R

    • kaggle.com
    Updated Oct 29, 2022
    R. Naga Amrutha (2022). Bellabeat case study using R [Dataset]. https://www.kaggle.com/rnagaamrutha/bellabeatcasestudywithr/discussion
    Dataset updated
    Oct 29, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    R. Naga Amrutha
    Description

    Dataset

    This dataset was created by R. Naga Amrutha

    Contents

  5. Bacteria Dataset

    • kaggle.com
    Updated Mar 27, 2024
    Kanchana1990 (2024). Bacteria Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/7955145
    Dataset updated
    Mar 27, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Kanchana1990
    License

    Open Data Commons Attribution License (ODC-By) v1.0 (https://www.opendatacommons.org/licenses/by/1.0/)
    License information was derived automatically

    Description

    Dataset Overview

    This dataset provides a comprehensive overview of 200 unique bacterial species, highlighting their scientific classification, natural habitats, and potential impacts on human health. Designed for data scientists and researchers, this collection serves as a foundational resource for studies in microbiology, public health, and environmental science. Each entry has been meticulously compiled to offer insights into the diverse roles bacteria play in ecosystems and their interactions with humans.

    Data Science Applications

    With 200 carefully curated entries, this dataset is ideal for a variety of data science applications, including but not limited to:

    • Predictive modeling to understand factors influencing bacterial habitats and human health implications.
    • Clustering analyses to uncover patterns and relationships among bacterial families and their characteristics.
    • Data visualization projects to illustrate the diversity of bacterial life and its relevance to ecosystems and health.

    Column Descriptors

    1. Name: The scientific name of the bacterial species.
    2. Family: The taxonomic family to which the bacterium belongs.
    3. Where Found: Natural habitats or common environments where the bacterium is typically found, including multiple locations if applicable.
    4. Harmful to Humans: Indicates whether the bacterium is known to have harmful effects on human health ("Yes" or "No").
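
    To make the four columns concrete, here is a minimal sketch that counts harmful species per family using only the Python standard library. The rows are invented placeholders ("Species A", "Family X"), not entries from the dataset, and the function name harmful_counts_by_family is hypothetical.

```python
import csv
import io
from collections import Counter

# Invented placeholder rows that follow the four column descriptors above.
# They are NOT taken from the actual dataset.
SAMPLE = """Name,Family,Where Found,Harmful to Humans
Species A,Family X,Soil,No
Species B,Family X,Water,Yes
Species C,Family Y,Human gut,Yes
"""


def harmful_counts_by_family(rows):
    """Count rows flagged "Yes" in 'Harmful to Humans', grouped by Family."""
    counts = Counter()
    for row in rows:
        # Normalize the flag so "Yes", "yes" and " YES " are treated alike.
        if row["Harmful to Humans"].strip().lower() == "yes":
            counts[row["Family"]] += 1
    return counts


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
harmful = harmful_counts_by_family(rows)
print(harmful)
```

    The same grouping could feed a bar chart or serve as a quick sanity check before any predictive modeling.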

    Ethically Mined Data

    The compilation of this dataset adheres to ethical data mining practices, ensuring respect for intellectual property rights and scientific integrity. No proprietary or confidential information has been included without appropriate permissions and acknowledgments.

    Sources

    The data within this dataset has been gathered and synthesized from a range of authoritative sources, ensuring reliability and accuracy:

    Websites:

    • CDC (Centers for Disease Control and Prevention): Offers extensive information on pathogenic bacteria and their impact on human health.
    • WHO (World Health Organization): Provides global health-related data, including details on bacteria responsible for infectious diseases.

    Scientific Journals:

    • "Journal of Bacteriology": A peer-reviewed scientific journal that publishes research articles on the biology of bacteria.
    • "Microbiology": Offers articles on microbiology, virology, and molecular biology, with a focus on novel bacterial species and their functions.

    Textbooks:

    • "Brock Biology of Microorganisms" by Michael T. Madigan et al.: A comprehensive textbook covering the principles of microbiology, including detailed information on bacteria.
    • "Prescott's Microbiology" by Joanne Willey, Linda Sherwood, and Christopher J. Woolverton: Provides a thorough introduction to the field of microbiology, with an emphasis on bacterial species and their roles.

    This dataset represents a synthesis of credible scientific knowledge aimed at fostering research and education in microbiology and related fields.

  6. Cyclistic Bike Share (Jul-2011 to Jun-2022)

    • kaggle.com
    Updated Jul 27, 2022
    oXiaoFango (2022). Cyclistic Bike Share (Jul-2011 to Jun-2022) [Dataset]. https://www.kaggle.com/datasets/oxiaofango/cyclistic-bike-share-jul2011-to-jun2022/code
    Dataset updated
    Jul 27, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    oXiaoFango
    Description

    Dataset

    This dataset was created by oXiaoFango

    Released under Data files © Original Authors

    Contents

  7. UBER NYC

    • kaggle.com
    Updated Apr 14, 2025
    Ahmed Ramadan (2025). UBER NYC [Dataset]. https://www.kaggle.com/datasets/ahmedramadan74/uber-nyc/data
    Dataset updated
    Apr 14, 2025
    Dataset provided by
    Kaggle
    Authors
    Ahmed Ramadan
    Description

    These data are systematically sampled under statistical conditions. See my notebook.

    This project has two objectives:

    1) Data Analysis:

    • Perform EDA to uncover underlying trip patterns in 2021.
    • Explore Uber's user portrait in NYC (which orders are urgent, and which kinds of users should be given higher priority?).

    I have completed the data analysis; see my notebooks.

    2) Data Science:

    • Build a predictive model to forecast peak footfall.

  8. 📊 Futuristic Smart City Citizen Activity Dataset

    • kaggle.com
    Updated Mar 6, 2025
    Atharva Soundankar (2025). 📊 Futuristic Smart City Citizen Activity Dataset [Dataset]. https://www.kaggle.com/datasets/atharvasoundankar/futuristic-smart-city-citizen-activity-dataset
    Dataset updated
    Mar 6, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Atharva Soundankar
    Description

    This dataset provides a detailed insight into the daily activities of citizens in a futuristic smart city. It covers various aspects such as:

    • Demographics (Age, Gender)
    • Mobility (Mode of Transport, Walking Steps)
    • Lifestyle & Social Engagement (Work, Shopping, Entertainment, Social Media)
    • Health & Well-being (Calories Burned, Sleep Hours)
    • Energy & Sustainability (Home Energy Consumption, Carbon Footprint, Charging Station Usage)

    With 1000 rows and 15 columns, this dataset is ideal for data analysis, machine learning, and visualization projects related to urban mobility, sustainability, health trends, and behavioral analytics.

    This dataset can be used to:

    ✅ Analyze citizen behavior trends
    ✅ Understand sustainable urban mobility
    ✅ Predict energy consumption patterns
    ✅ Identify health and social media habits

  9. TMDB Movies List For movie recommender system

    • kaggle.com
    Updated Aug 14, 2022
    ABDUL BASIT JAWAID (2022). TMDB Movies List For movie recommender system [Dataset]. https://www.kaggle.com/datasets/abjr002/movies-list-for-movie-recommender-system
    Dataset updated
    Aug 14, 2022
    Dataset provided by
    Kaggle
    Authors
    ABDUL BASIT JAWAID
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    These files contain metadata for over 20,000 movies listed in the Full TMDB Dataset. The dataset consists of movies released on or before August 2022 as well as some of the upcoming movies till Dec 2028. Data points include title, release dates, languages, genre, popularity, TMDB vote counts, and vote averages.

    Acknowledgements

    The Movie Details have been collected from the TMDB Open API. This product uses the TMDb API but is not endorsed or certified by TMDb. Their API also provides access to data on many additional movies, actors and actresses, crew members, and TV shows. You can try it for yourself here.

    Inspiration

    This dataset is assembled as part of my Project for Recommender Systems. I wanted to perform an extensive EDA on Movie Data to build various types of Recommender Systems.

  10. TV, halftime shows, and the Big Game Dataset

    • kaggle.com
    Updated Aug 3, 2020
    Amit Hasan Shuvo (2020). TV, halftime shows, and the Big Game Dataset [Dataset]. https://www.kaggle.com/amithasanshuvo/tv-halftime-shows-and-the-big-game-dataset/code
    Dataset updated
    Aug 3, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Amit Hasan Shuvo
    Description

    Content

    Whether or not you like football, the Super Bowl is a spectacle. There's a little something for everyone at your Super Bowl party. Drama in the form of blowouts, comebacks, and controversy for the sports fan. There are the ridiculously expensive ads, some hilarious, others gut-wrenching, thought-provoking, and weird. The half-time shows with the biggest musicians in the world, sometimes riding giant mechanical tigers or leaping from the roof of the stadium. In this notebook, we're going to find out how some of the elements of this show interact with each other. After exploring and cleaning our data a little, we're going to answer questions like:

    1. What are the most extreme game outcomes?
    2. How does the game affect television viewership?
    3. How have viewership, TV ratings, and ad cost evolved over time?
    4. Who are the most prolific musicians in terms of halftime show performances?

    Acknowledgements

    The dataset we'll use was scraped and polished from Wikipedia. It is made up of three CSV files, one with game data, one with TV data, and one with halftime musician data for all 52 Super Bowls through 2018.

    Reference

    This dataset is one of the projects of Data Scientist with Python Career Track at DataCamp. Link: https://www.datacamp.com/projects/684

  11. Data Cleaning Sample

    • borealisdata.ca
    • dataone.org
    Updated Jul 13, 2023
    Rong Luo (2023). Data Cleaning Sample [Dataset]. http://doi.org/10.5683/SP3/ZCN177
    Dataset updated
    Jul 13, 2023
    Dataset provided by
    Borealis
    Authors
    Rong Luo
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Sample data for exercises in Further Adventures in Data Cleaning.

45 scholarly articles cite the World Bank: Education Data dataset (per Google Scholar).