11 datasets found

World Bank: Education Data
kaggle.com
zip
Updated Mar 20, 2019
Cite
World Bank (2019). World Bank: Education Data [Dataset]. https://www.kaggle.com/datasets/theworldbank/world-bank-intl-education
Dataset authored and provided by
World Bank (http://worldbank.org/)
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

The World Bank is an international financial institution that provides loans to countries of the world for capital projects. The World Bank's stated goal is the reduction of poverty. Source: https://en.wikipedia.org/wiki/World_Bank

Content

This dataset combines key education statistics from a variety of sources to provide a look at global literacy, spending, and access.

For more information, see the World Bank website.

Fork this kernel to get started with this dataset.

Acknowledgements

https://bigquery.cloud.google.com/dataset/bigquery-public-data:world_bank_health_population

http://data.worldbank.org/data-catalog/ed-stats

https://cloud.google.com/bigquery/public-data/world-bank-education

Citation: The World Bank: Education Statistics

Dataset Source: World Bank. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Banner Photo by @till_indeman from Unsplash.

Inspiration

Of total government spending, what percentage is spent on education?
titanic_dataset
kaggle.com
Updated Jun 7, 2024
Cite
SURENDHAN (2024). titanic_dataset [Dataset]. https://www.kaggle.com/datasets/surendhan/titanic-dataset
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
SURENDHAN
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
The Titanic dataset on Kaggle is a well-known dataset used for machine learning and data science projects, especially for binary classification tasks. It includes data on the passengers of the Titanic, which sank on its maiden voyage in 1912. This dataset is often used to predict the likelihood of a passenger's survival based on various features. Here is a detailed description of the dataset:

Overview

The Titanic dataset includes information about the passengers on the Titanic, such as their demographic information, class, fare, and whether they survived the disaster. The goal is to predict the survival of the passengers.

Files

The dataset typically includes three files:

train.csv: The training set, which includes the features and the target variable (Survived).

test.csv: The test set, which includes the features but not the target variable. You use this file to make predictions that can be submitted to Kaggle.

gender_submission.csv: An example of a submission file in the correct format.

Features

The dataset contains the following columns:

PassengerId: Unique ID for each passenger.

Survived: Target variable (0 = No, 1 = Yes) indicating whether the passenger survived.

Pclass: Ticket class (1 = 1st, 2 = 2nd, 3 = 3rd).

Name: Name of the passenger.

Sex: Gender of the passenger (male or female).

Age: Age of the passenger in years; fractional for infants under one year old.

SibSp: Number of siblings or spouses aboard the Titanic.

Parch: Number of parents or children aboard the Titanic.

Ticket: Ticket number.

Fare: Passenger fare.

Cabin: Cabin number.

Embarked: Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton).
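As a rough illustration of how the columns described above might be used, the sketch below computes survival rates grouped by a chosen column. It relies only on the Python standard library. The sample rows, passenger names, and the helper name survival_rate_by are hypothetical placeholders that mimic the train.csv layout; they are not records from the real file.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows in the train.csv column layout described above.
# The values are illustrative placeholders, not real passenger records.
SAMPLE_TRAIN_CSV = """PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Doe, Mr. John",male,22,1,0,T-001,7.25,,S
2,1,1,"Roe, Mrs. Jane",female,38,1,0,T-002,71.28,C85,C
3,1,3,"Poe, Miss. Ann",female,26,0,0,T-003,7.92,,S
4,0,3,"Moe, Mr. Sam",male,,0,0,T-004,8.05,,Q
"""


def survival_rate_by(rows, column):
    """Return {value of `column`: share of passengers with Survived == 1}."""
    totals = defaultdict(int)
    survived = defaultdict(int)
    for row in rows:
        key = row[column]
        totals[key] += 1
        survived[key] += int(row["Survived"])
    return {key: survived[key] / totals[key] for key in totals}


# csv.DictReader handles the quoted Name field, which contains commas.
rows = list(csv.DictReader(io.StringIO(SAMPLE_TRAIN_CSV)))
rates_by_sex = survival_rate_by(rows, "Sex")
rates_by_class = survival_rate_by(rows, "Pclass")
print(rates_by_sex)
print(rates_by_class)
```

To run this against the real training file, the in-memory sample could be swapped for an opened train.csv, assuming the file uses the header names listed above.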
Automate simple moving average trading with Python
kaggle.com
Updated Sep 23, 2020
Cite
CSvikram100 (2020). Automate simple moving average trading with Python [Dataset]. https://www.kaggle.com/csvikram100/automate-simple-moving-average-trading-with-python/code
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
CSvikram100
Description
Dataset

This dataset was created by CSvikram100

Bellabeat case study using R
kaggle.com
Updated Oct 29, 2022
Cite
R. Naga Amrutha (2022). Bellabeat case study using R [Dataset]. https://www.kaggle.com/rnagaamrutha/bellabeatcasestudywithr/discussion
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
R. Naga Amrutha
Description
Dataset

This dataset was created by R. Naga Amrutha

Bacteria Dataset
kaggle.com
Updated Mar 27, 2024
Cite
Kanchana1990 (2024). Bacteria Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/7955145
Unique identifier
https://doi.org/10.34740/kaggle/dsv/7955145
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Kanchana1990
License
Open Data Commons Attribution License (ODC-By) v1.0 (https://www.opendatacommons.org/licenses/by/1.0/)
License information was derived automatically
Description
Dataset Overview

This dataset provides a comprehensive overview of 200 unique bacterial species, highlighting their scientific classification, natural habitats, and potential impacts on human health. Designed for data scientists and researchers, this collection serves as a foundational resource for studies in microbiology, public health, and environmental science. Each entry has been meticulously compiled to offer insights into the diverse roles bacteria play in ecosystems and their interactions with humans.

Data Science Applications

With 200 carefully curated entries, this dataset is ideal for a variety of data science applications, including but not limited to:

- Predictive modeling to understand factors influencing bacterial habitats and human health implications.
- Clustering analyses to uncover patterns and relationships among bacterial families and their characteristics.
- Data visualization projects to illustrate the diversity of bacterial life and its relevance to ecosystems and health.

Column Descriptors

Name: The scientific name of the bacterial species.

Family: The taxonomic family to which the bacterium belongs.

Where Found: Natural habitats or common environments where the bacterium is typically found, including multiple locations if applicable.

Harmful to Humans: Indicates whether the bacterium is known to have harmful effects on human health ("Yes" or "No").
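A minimal sketch of how the four columns above could feed a simple analysis: it computes, per family, the share of species flagged as harmful. The rows, species names, family names, and the helper harmful_share_by_family are hypothetical placeholders chosen for illustration; none of them come from the dataset itself.

```python
from collections import defaultdict

# Placeholder rows that follow the four described columns. The species,
# families, habitats, and labels are invented for illustration only.
SAMPLE_ROWS = [
    {"Name": "Species A", "Family": "Family X", "Where Found": "Soil", "Harmful to Humans": "No"},
    {"Name": "Species B", "Family": "Family X", "Where Found": "Water, soil", "Harmful to Humans": "Yes"},
    {"Name": "Species C", "Family": "Family Y", "Where Found": "Human gut", "Harmful to Humans": "Yes"},
    {"Name": "Species D", "Family": "Family X", "Where Found": "Soil", "Harmful to Humans": "No"},
]


def harmful_share_by_family(rows):
    """Return {family: fraction of its species marked "Yes" as harmful}."""
    totals = defaultdict(int)
    harmful = defaultdict(int)
    for row in rows:
        family = row["Family"]
        totals[family] += 1
        # Normalize the label so " yes" or "YES" would also count.
        if row["Harmful to Humans"].strip().lower() == "yes":
            harmful[family] += 1
    return {family: harmful[family] / totals[family] for family in totals}


shares = harmful_share_by_family(SAMPLE_ROWS)
print(shares)
```

The same grouping idea could serve as a starting point for the clustering or visualization uses mentioned earlier, assuming the real file uses these exact column headers.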

Ethically Mined Data

The compilation of this dataset adheres to ethical data mining practices, ensuring respect for intellectual property rights and scientific integrity. No proprietary or confidential information has been included without appropriate permissions and acknowledgments.

Sources

The data within this dataset has been gathered and synthesized from a range of authoritative sources, ensuring reliability and accuracy:

Websites:

- CDC (Centers for Disease Control and Prevention): Offers extensive information on pathogenic bacteria and their impact on human health.
- WHO (World Health Organization): Provides global health-related data, including details on bacteria responsible for infectious diseases.

Scientific Journals:

- "Journal of Bacteriology": A peer-reviewed scientific journal that publishes research articles on the biology of bacteria.
- "Microbiology": Offers articles on microbiology, virology, and molecular biology, with a focus on novel bacterial species and their functions.

Textbooks:

- "Brock Biology of Microorganisms" by Michael T. Madigan et al.: A comprehensive textbook covering the principles of microbiology, including detailed information on bacteria.
- "Prescott's Microbiology" by Joanne Willey, Linda Sherwood, and Christopher J. Woolverton: Provides a thorough introduction to the field of microbiology, with an emphasis on bacterial species and their roles.

This dataset represents a synthesis of credible scientific knowledge aimed at fostering research and education in microbiology and related fields.
Cyclistic Bike Share (Jul-2011 to Jun-2022)
kaggle.com
Updated Jul 27, 2022
Cite
oXiaoFango (2022). Cyclistic Bike Share (Jul-2011 to Jun-2022) [Dataset]. https://www.kaggle.com/datasets/oxiaofango/cyclistic-bike-share-jul2011-to-jun2022/code
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
oXiaoFango
Description
Dataset

This dataset was created by oXiaoFango

Released under Data files © Original Authors

UBER NYC
kaggle.com
Updated Apr 14, 2025
Cite
Ahmed Ramadan (2025). UBER NYC [Dataset]. https://www.kaggle.com/datasets/ahmedramadan74/uber-nyc/data
Dataset provided by
Kaggle
Authors
Ahmed Ramadan
Description
These data are systematically sampled under statistical conditions (see my notebook).

This project has two objectives:

1) Data Analysis:

Perform EDA to uncover underlying trip patterns in 2021.

Explore Uber's user profile in NYC (which orders are urgent, and which kinds of users should be given higher priority?).

I have completed the data analysis; see my notebooks.

2) Data Science:

Build a predictive model to forecast peak footfall.
📊 Futuristic Smart City Citizen Activity Dataset
kaggle.com
Updated Mar 6, 2025
Cite
Atharva Soundankar (2025). 📊 Futuristic Smart City Citizen Activity Dataset [Dataset]. https://www.kaggle.com/datasets/atharvasoundankar/futuristic-smart-city-citizen-activity-dataset
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Atharva Soundankar
Description
This dataset provides a detailed insight into the daily activities of citizens in a futuristic smart city. It covers various aspects such as:

Demographics (Age, Gender)

Mobility (Mode of Transport, Walking Steps)

Lifestyle & Social Engagement (Work, Shopping, Entertainment, Social Media)

Health & Well-being (Calories Burned, Sleep Hours)

Energy & Sustainability (Home Energy Consumption, Carbon Footprint, Charging Station Usage)

With 1000 rows and 15 columns, this dataset is ideal for data analysis, machine learning, and visualization projects related to urban mobility, sustainability, health trends, and behavioral analytics.

This dataset can be used to:

✅ Analyze citizen behavior trends

✅ Understand sustainable urban mobility

✅ Predict energy consumption patterns

✅ Identify health and social media habits
TMDB Movies List For movie recommender system
kaggle.com
Updated Aug 14, 2022
Cite
ABDUL BASIT JAWAID (2022). TMDB Movies List For movie recommender system [Dataset]. https://www.kaggle.com/datasets/abjr002/movies-list-for-movie-recommender-system
Dataset provided by
Kaggle
Authors
ABDUL BASIT JAWAID
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

These files contain metadata for over 20,000 movies listed in the Full TMDB Dataset. The dataset consists of movies released on or before August 2022 as well as some of the upcoming movies till Dec 2028. Data points include title, release dates, languages, genre, popularity, TMDB vote counts, and vote averages.

Acknowledgements

The Movie Details have been collected from the TMDB Open API. This product uses the TMDb API but is not endorsed or certified by TMDb. Their API also provides access to data on many additional movies, actors and actresses, crew members, and TV shows. You can try it for yourself here.

Inspiration

This dataset is assembled as part of my Project for Recommender Systems. I wanted to perform an extensive EDA on Movie Data to build various types of Recommender Systems.
TV, halftime shows, and the Big Game Dataset
kaggle.com
Updated Aug 3, 2020
Cite
Amit Hasan Shuvo (2020). TV, halftime shows, and the Big Game Dataset [Dataset]. https://www.kaggle.com/amithasanshuvo/tv-halftime-shows-and-the-big-game-dataset/code
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Amit Hasan Shuvo
Description
Content

Whether or not you like football, the Super Bowl is a spectacle. There's a little something for everyone at your Super Bowl party. Drama in the form of blowouts, comebacks, and controversy for the sports fan. There are the ridiculously expensive ads, some hilarious, others gut-wrenching, thought-provoking, and weird. The half-time shows with the biggest musicians in the world, sometimes riding giant mechanical tigers or leaping from the roof of the stadium. In this notebook, we're going to find out how some of the elements of this show interact with each other. After exploring and cleaning our data a little, we're going to answer questions like:

What are the most extreme game outcomes?

How does the game affect television viewership?

How have viewership, TV ratings, and ad cost evolved over time?

Who are the most prolific musicians in terms of halftime show performances?

Acknowledgements

The dataset we'll use was scraped and polished from Wikipedia. It is made up of three CSV files, one with game data, one with TV data, and one with halftime musician data for all 52 Super Bowls through 2018.
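One way the "most extreme game outcomes" question could be approached is sketched below: derive combined points and point difference per game, then pick the extremes. The column names (game, winning_pts, losing_pts), the scores, and the helper add_point_totals are assumptions made for illustration; the actual CSV headers may differ.

```python
import csv
import io

# Hypothetical game data. Column names and scores are placeholders,
# not values taken from the real game CSV file.
SAMPLE_GAMES_CSV = """game,winning_pts,losing_pts
Game A,34,31
Game B,45,10
Game C,13,3
"""


def add_point_totals(rows):
    """Return one dict per game with combined points and point difference."""
    games = []
    for row in rows:
        win = int(row["winning_pts"])
        lose = int(row["losing_pts"])
        games.append({
            "game": row["game"],
            "combined_pts": win + lose,
            "difference_pts": win - lose,
        })
    return games


games = add_point_totals(csv.DictReader(io.StringIO(SAMPLE_GAMES_CSV)))
highest_scoring = max(games, key=lambda g: g["combined_pts"])
biggest_blowout = max(games, key=lambda g: g["difference_pts"])
closest_game = min(games, key=lambda g: g["difference_pts"])
print(highest_scoring["game"], biggest_blowout["game"], closest_game["game"])
```

The viewership and halftime questions would need the TV and musician files joined on a shared game identifier, which this sketch does not attempt.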

Reference

This dataset is one of the projects of Data Scientist with Python Career Track at DataCamp. Link: https://www.datacamp.com/projects/684
Data Cleaning Sample
borealisdata.ca
dataone.org
Updated Jul 13, 2023
Cite
Rong Luo (2023). Data Cleaning Sample [Dataset]. http://doi.org/10.5683/SP3/ZCN177
Unique identifier
https://doi.org/10.5683/SP3/ZCN177
Dataset provided by
Borealis
Authors
Rong Luo
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Sample data for exercises in Further Adventures in Data Cleaning.

World Bank: Education Data: 45 scholarly articles cite this dataset (View in Google Scholar).
