3 datasets found

Cyclistic Bike Share: A Case Study
kaggle.com
zip
Updated Jul 25, 2023
Cite
Casey Kellerhals (2023). Cyclistic Bike Share: A Case Study [Dataset]. https://www.kaggle.com/datasets/caskelle/cyclistic-bike-share-a-case-study/code
Available download formats: zip (269575250 bytes)
Dataset updated
Jul 25, 2023
Authors
Casey Kellerhals
Description
The Mission Statement

Cyclistic, a bike-sharing company, wants to analyze its user data to find the main differences in behavior between its two types of users. Casual Riders pay for each ride, while Annual Members pay a yearly subscription to the service.

PHASE 1 : ASK

Key objectives: 1. Identify The Business Task - Cyclistic wants to analyze the data to find the key differences between Casual Riders and Annual Members. The goal of this project is to reach out to casual riders and incentivize them to pay for the annual subscription.

Consider Key Stakeholders:

The key stakeholders in this project are the executive team and the director of marketing, Lily Moreno.

PHASE 2 : Prepare

Key objectives: 1. Download Data And Store It Appropriately - I downloaded the data as .csv files and saved them in their own folder to keep everything organized. I then uploaded those files into BigQuery for cleaning and analysis. For this project I downloaded all of 2022 and 2023 up to May, as this is the most recent data that I have access to.

Identify How It's Organized

The data is organized into months, from 01-2022 to 05-2023.

Sort and Filter The Data and Determine The Credibility of The Data

For this data I used BigQuery and SQL to sort, filter and assess the credibility of the data. The data is collected first hand by Cyclistic, and there is a lot of information to work with. I filtered the data down to the columns I wanted to work with: the types of bikes, the types of members and the dates the bikes were used.

PHASE 3 : Process

Key objectives: 1. Clean The Data and Prepare The Data For Analysis - I used some simple SQL code to confirm that no member types were missing, that no information was repeated and that there were no misspellings in the data.

--No misspellings in either member or casual. This ensures that no results are missing information.
SELECT DISTINCT member_casual
FROM table

--This shows how many casual riders and members used the service; the counts should add up to the number of rows in the dataset.
SELECT member_casual AS member_type, COUNT(*) AS total_riders
FROM table
GROUP BY member_type

--Lists the distinct ride IDs. Every ride has a distinct ID if this returns as many rows as the dataset contains.
SELECT DISTINCT ride_id
FROM table

--Shows that there are no typos in the types of bikes, so no data will be missing from results.
SELECT DISTINCT rideable_type
FROM table
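The cleaning checks above can be tried locally before running them in BigQuery. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for BigQuery; the table name `trips` and the sample rows are hypothetical and are not taken from the dataset.

```python
import sqlite3

# Hypothetical sample rows standing in for one month of trip data.
rows = [
    ("A1", "electric_bike", "member"),
    ("A2", "classic_bike", "casual"),
    ("A3", "docked_bike", "casual"),
    ("A4", "classic_bike", "member"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trips (ride_id TEXT, rideable_type TEXT, member_casual TEXT)"
)
conn.executemany("INSERT INTO trips VALUES (?, ?, ?)", rows)


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


# Total rows, used as the reference for the other checks.
total_rows = scalar("SELECT COUNT(*) FROM trips")

# Ride IDs are unique if the distinct count equals the row count.
distinct_ids = scalar("SELECT COUNT(DISTINCT ride_id) FROM trips")

# Distinct category values; a typo would show up as an extra value.
member_values = sorted(
    r[0] for r in conn.execute("SELECT DISTINCT member_casual FROM trips")
)
bike_values = sorted(
    r[0] for r in conn.execute("SELECT DISTINCT rideable_type FROM trips")
)

# Per-type counts; their sum should equal the total row count.
counts = dict(
    conn.execute("SELECT member_casual, COUNT(*) FROM trips GROUP BY member_casual")
)

print("rows:", total_rows, "distinct ride ids:", distinct_ids)
print("member types:", member_values)
print("bike types:", bike_values)
print("counts by member type:", counts)
```

Comparing `COUNT(DISTINCT ride_id)` with `COUNT(*)` makes the uniqueness check explicit instead of relying on reading the length of a result list by eye.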

PHASE 4 : Analyze

Key objectives: 1. Aggregate Your Data So It's Useful and Accessible - I had to write some SQL code to combine all the data from the different files I had uploaded to BigQuery.

select rideable_type, started_at, ended_at, member_casual from table 1
union all select rideable_type, started_at, ended_at, member_casual from table 2
union all select rideable_type, started_at, ended_at, member_casual from table 3
union all select rideable_type, started_at, ended_at, member_casual from table 4
union all select rideable_type, started_at, ended_at, member_casual from table 5
union all select rideable_type, started_at, ended_at, member_casual from table 6
union all select rideable_type, started_at, ended_at, member_casual from table 7
union all select rideable_type, started_at, ended_at, member_casual from table 8
union all select rideable_type, started_at, ended_at, member_casual from table 9
union all select rideable_type, started_at, ended_at, member_casual from table 10
union all select rideable_type, started_at, ended_at, member_casual from table 11
union all select rideable_type, started_at, ended_at, member_casual from table 12
union all select rideable_type, started_at, ended_at, member_casual from table 13
union all select rideable_type, started_at, ended_at, member_casual from table 14
union all select rideable_type, started_at, ended_at, member_casual from table 15
union all select rideable_type, started_at, ended_at, member_casual from table 16
union all select rideable_type, started_at, ended_at, member_casual from table 17
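Writing seventeen near-identical SELECT statements by hand is error prone, so the combined query can also be generated from a list of table names. This is a rough sketch, assuming SQLite in place of BigQuery; the table names (`trips_2022_01` and so on), the `all_trips` result table and the sample rows are hypothetical.

```python
import sqlite3

# The four columns kept for analysis, as in the query above.
COLUMNS = "rideable_type, started_at, ended_at, member_casual"

# Hypothetical table names, one per monthly file (only three shown here).
monthly_tables = ["trips_2022_01", "trips_2022_02", "trips_2022_03"]

conn = sqlite3.connect(":memory:")
for i, name in enumerate(monthly_tables, start=1):
    conn.execute(
        f"CREATE TABLE {name} (ride_id TEXT, rideable_type TEXT, "
        "started_at TEXT, ended_at TEXT, member_casual TEXT)"
    )
    # One made-up ride per month so the example has data to combine.
    conn.execute(
        f"INSERT INTO {name} VALUES (?, ?, ?, ?, ?)",
        (
            f"R{i}",
            "classic_bike",
            f"2022-{i:02d}-05 08:00:00",
            f"2022-{i:02d}-05 08:20:00",
            "member",
        ),
    )

# Build one SELECT per table and join them with UNION ALL,
# which keeps every row (no de-duplication).
union_sql = "\nUNION ALL\n".join(
    f"SELECT {COLUMNS} FROM {name}" for name in monthly_tables
)
conn.execute(f"CREATE TABLE all_trips AS {union_sql}")

# Sanity check: the combined table should hold every source row.
combined_rows = conn.execute("SELECT COUNT(*) FROM all_trips").fetchone()[0]
source_rows = sum(
    conn.execute(f"SELECT COUNT(*) FROM {name}").fetchone()[0]
    for name in monthly_tables
)
print(union_sql)
print("combined:", combined_rows, "source total:", source_rows)
```

UNION ALL is used rather than UNION because each monthly file holds different rides, so nothing should be dropped as a duplicate.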

Identify trends and relationships - After aggregating the data I had chosen, I ran SQL code to determine the trends and relationships within it. I then uploaded the query results into Google Sheets to make graphs that express those trends and make it easier to identify the key differences between Casual Riders and Annual Members.

--This shows how many casual and annual members used bikes
SELECT member_casual AS member_type, COUNT(*) AS total_riders
FROM Aggregate Data Table
GROUP BY member_type
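The same grouping idea extends to the ride dates that were kept in the aggregated table. Below is an illustrative sketch, not the author's query, that counts rides per month and member type; it assumes SQLite, a hypothetical `all_trips` table and made-up sample rows. SQLite's `strftime` is used to extract the month, and BigQuery uses different date functions for the same step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE all_trips (rideable_type TEXT, started_at TEXT, "
    "ended_at TEXT, member_casual TEXT)"
)

# Made-up rides, for illustration only.
sample = [
    ("classic_bike", "2022-01-03 08:00:00", "2022-01-03 08:15:00", "member"),
    ("electric_bike", "2022-01-09 14:00:00", "2022-01-09 14:50:00", "casual"),
    ("classic_bike", "2022-02-11 17:30:00", "2022-02-11 17:45:00", "member"),
    ("electric_bike", "2022-02-12 12:00:00", "2022-02-12 12:40:00", "casual"),
    ("docked_bike", "2022-02-13 13:00:00", "2022-02-13 14:00:00", "casual"),
]
conn.executemany("INSERT INTO all_trips VALUES (?, ?, ?, ?)", sample)

# Count rides per calendar month and member type.
monthly_counts = conn.execute(
    """
    SELECT strftime('%Y-%m', started_at) AS ride_month,
           member_casual AS member_type,
           COUNT(*) AS total_rides
    FROM all_trips
    GROUP BY ride_month, member_type
    ORDER BY ride_month, member_type
    """
).fetchall()

for ride_month, member_type, total_rides in monthly_counts:
    print(ride_month, member_type, total_rides)
```

A result in this shape (one row per month and member type) can be pasted into a spreadsheet and charted directly.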

![](https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F14378099%2Fe09c3496bf38d323f8323f52f67...
MLB 2016 Pitch-by-Pitch
console.cloud.google.com
Updated Jul 27, 2023
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:Sportradar&hl=fr (2023). MLB 2016 Pitch-by-Pitch [Dataset]. https://console.cloud.google.com/marketplace/product/sportradar-public-data/mlb-pitch-by-pitch?hl=fr
Dataset updated
Jul 27, 2023
Dataset provided by
Sportradar (http://sportradar.com/)
Google (http://google.com/)
Description
This public data includes pitch-by-pitch data for Major League Baseball (MLB) games in 2016. This dataset contains the following tables: games_wide (every pitch, steal, or lineup event for each at-bat in the 2016 regular season), games_post_wide (every pitch, steal, or lineup event for each at-bat in the 2016 post season), and schedules (the schedule for every team in the regular season). The schemas for the games_wide and games_post_wide tables are identical. With this data you can effectively replay a game and rebuild basic statistics for players and teams. Note: This data was built via a denormalization process over raw game log files, which may contain scoring errors and in some cases missing data. For official scoring and statistical information please consult mlb.com, baseball-reference.com, or sportradar.com. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
Malaysia's Weather Data (1996-2024)
kaggle.com
zip
Updated Sep 1, 2024
Cite
Shahmir Varqha (2024). Malaysia's Weather Data (1996-2024) [Dataset]. https://www.kaggle.com/datasets/shahmirvarqha/weather-data-malaysia/code
Available download formats: zip (325977525 bytes)
Dataset updated
Sep 1, 2024
Authors
Shahmir Varqha
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Malaysia
Description
Contains hourly weather, air quality and UV data in Malaysia from 1996 to the present across a variety of locations.

Includes weather factors like temperature, wind speed and UV index. Air quality data includes pollutant_value.

Where is the data from and how is it extracted? See the quality-of-life repo.

Data files

full_weather.csv - Contains main weather data from 1996 to July 2023

air_quality_indicators.csv - Info about air quality indicators

air_quality_warnings.csv - Info about air quality warning levels

uv_info - Info about UV indexes and their dangers

full_locations.csv - Info about each location extracted

The dataset is accessible using BigQuery/SQL or CSV. The cloud dataset is updated every day.

BigQuery (SQL)

Try out the public notebook: https://www.kaggle.com/code/shahmirvarqha/weather-bigquery

There are several fact tables (main data is here):

prod.air_quality - All air quality data

prod.weather - Only airport weather stations

prod.personal_weather - Only personal weather stations

prod.uv - Merges UV data with warnings and labels

prod.full_weather_places - Similar to full_weather.csv file

The following tables have additional information:

prod.state_locations

prod.full_locations - Similar to full_locations.csv file

prod.city_states

prod.city_places

prod.air_quality_warnings

prod.air_quality_indicators

prod.uv_info

*Not all days and hours are available (especially earlier on, there is a lot of missing data).