3 datasets found
  1. Cyclistic Bike Share: A Case Study

    • kaggle.com
    zip
    Updated Jul 25, 2023
    Casey Kellerhals (2023). Cyclistic Bike Share: A Case Study [Dataset]. https://www.kaggle.com/datasets/caskelle/cyclistic-bike-share-a-case-study/code
    Available download formats: zip (269575250 bytes)
    Dataset updated
    Jul 25, 2023
    Authors
    Casey Kellerhals
    Description

    The Mission Statement

    Cyclistic, a bike sharing company, wants to analyze its user data to find the main differences in behavior between its two types of users: Casual Riders, who pay for each ride, and Annual Members, who pay a yearly subscription to the service.

    PHASE 1 : ASK

    Key objectives:

    1. Identify The Business Task:
      • Cyclistic wants to analyze the data to find the key differences between Casual Riders and Annual Members. The goal of this project is to reach out to the casual riders and incentivize them to pay for the annual subscription.

    2. Consider Key Stakeholders:
      • The key stakeholders in this project are the executive team and the director of marketing, Lily Moreno.

    PHASE 2 : Prepare

    Key objectives:

    1. Download Data And Store It Appropriately
      • I downloaded the data as .csv files and saved them in their own folder to keep everything organized. I then uploaded those files into BigQuery for cleaning and analysis. For this project I downloaded all of 2022 and 2023 up to May, as this was the most recent data I had access to.

    2. Identify How It's Organized
      • The data is organized by month, from 01-2022 to 05-2023.

    3. Sort and Filter The Data and Determine The Credibility of The Data
      • I used BigQuery and SQL to sort and filter the data and to assess its credibility. The data is collected first-hand by Cyclistic, and there is a lot of information to work with. I kept only the columns I wanted to work with: the types of bikes, the types of members, and the dates the bikes were used.

    PHASE 3 : Process

    Key objectives:

    1. Clean The Data and Prepare The Data For Analysis:
      • I used some simple SQL code to check that no member types were missing, that no information was repeated, and that there were no misspellings in the data.

    -- Checks for misspellings in member_casual: only "member" and "casual" should appear, so no rows go missing from later results.
    SELECT DISTINCT member_casual
    FROM table

    -- Shows how many casual riders and members used the service; the counts should add up to the number of rows in the dataset.
    SELECT member_casual AS member_type, COUNT(*) AS total_riders
    FROM table
    GROUP BY member_type

    -- Lists the distinct ride IDs. If the number of rows returned equals the number of rows in the dataset, every ride has a distinct ID.
    SELECT DISTINCT ride_id
    FROM table

    -- Shows that there are no typos in the types of bikes, so no data will be missing from results.
    SELECT DISTINCT rideable_type
    FROM table
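The three checks above can be sketched end to end with Python's built-in sqlite3 module standing in for BigQuery. This is an illustration under stated assumptions, not the author's code: the table name `trips`, the sample rows, and the helper names `load_sample` and `check_trips` are all hypothetical. Only the column names `ride_id`, `rideable_type`, and `member_casual` come from the queries above. The duplicate check uses `COUNT(DISTINCT ride_id)` so the comparison with the row count is explicit.

```python
import sqlite3

# Made-up sample rows (ride_id, rideable_type, member_casual); not real Cyclistic data.
SAMPLE_ROWS = [
    ("r1", "classic_bike", "member"),
    ("r2", "electric_bike", "casual"),
    ("r3", "classic_bike", "member"),
    ("r4", "electric_bike", "member"),
]


def load_sample(rows=SAMPLE_ROWS):
    """Create an in-memory table named 'trips' (hypothetical name) holding the rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE trips (ride_id TEXT, rideable_type TEXT, member_casual TEXT)"
    )
    conn.executemany("INSERT INTO trips VALUES (?, ?, ?)", rows)
    return conn


def check_trips(conn):
    """Run the cleaning checks and return their results as a dict."""
    total = conn.execute("SELECT COUNT(*) FROM trips").fetchone()[0]
    # Rows per member type; the values should sum to the row count.
    member_counts = dict(
        conn.execute("SELECT member_casual, COUNT(*) FROM trips GROUP BY member_casual")
    )
    # Distinct bike types, to spot typos by eye.
    bike_types = sorted(
        row[0] for row in conn.execute("SELECT DISTINCT rideable_type FROM trips")
    )
    # Distinct ride IDs; fewer than the row count means repeated rides.
    distinct_rides = conn.execute(
        "SELECT COUNT(DISTINCT ride_id) FROM trips"
    ).fetchone()[0]
    return {
        "total_rows": total,
        "member_counts": member_counts,
        "counts_match_total": sum(member_counts.values()) == total,
        "bike_types": bike_types,
        "duplicate_ride_ids": total - distinct_rides,
    }


if __name__ == "__main__":
    print(check_trips(load_sample()))
```

A non-zero `duplicate_ride_ids` or an unexpected entry in `member_counts` or `bike_types` would point to rows that need cleaning before analysis.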

    PHASE 4 : Analyze

    Key objectives:

    1. Aggregate Your Data So It's Useful and Accessible
      • I had to write some SQL code so that I could combine all the data from the different files I had uploaded onto BigQuery.

    select rideable_type, started_at, ended_at, member_casual from table 1 union all
    select rideable_type, started_at, ended_at, member_casual from table 2 union all
    select rideable_type, started_at, ended_at, member_casual from table 3 union all
    select rideable_type, started_at, ended_at, member_casual from table 4 union all
    select rideable_type, started_at, ended_at, member_casual from table 5 union all
    select rideable_type, started_at, ended_at, member_casual from table 6 union all
    select rideable_type, started_at, ended_at, member_casual from table 7 union all
    select rideable_type, started_at, ended_at, member_casual from table 8 union all
    select rideable_type, started_at, ended_at, member_casual from table 9 union all
    select rideable_type, started_at, ended_at, member_casual from table 10 union all
    select rideable_type, started_at, ended_at, member_casual from table 11 union all
    select rideable_type, started_at, ended_at, member_casual from table 12 union all
    select rideable_type, started_at, ended_at, member_casual from table 13 union all
    select rideable_type, started_at, ended_at, member_casual from table 14 union all
    select rideable_type, started_at, ended_at, member_casual from table 15 union all
    select rideable_type, started_at, ended_at, member_casual from table 16 union all
    select rideable_type, started_at, ended_at, member_casual from table 17
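Writing seventeen near-identical SELECT statements by hand is easy to get wrong, so the query text can instead be generated. The sketch below is one possible way to do that, again with sqlite3 standing in for BigQuery. The table prefix `trips_`, the combined table name `all_trips`, and the helper names are assumptions made for illustration; the four column names are the ones used in the query above.

```python
import sqlite3

COLUMNS = ["rideable_type", "started_at", "ended_at", "member_casual"]


def monthly_table_names(n, prefix="trips_"):
    """Hypothetical naming scheme: trips_1, trips_2, ..., trips_n."""
    return [f"{prefix}{i}" for i in range(1, n + 1)]


def union_all_query(table_names, columns=COLUMNS):
    """Build one SELECT per table and join them with UNION ALL."""
    cols = ", ".join(columns)
    selects = [f"SELECT {cols} FROM {name}" for name in table_names]
    return "\nUNION ALL\n".join(selects)


def build_demo(n_tables=3):
    """Create n small monthly tables (n <= 12 here), then combine them into all_trips."""
    conn = sqlite3.connect(":memory:")
    names = monthly_table_names(n_tables)
    for month, name in enumerate(names, start=1):
        conn.execute(
            f"CREATE TABLE {name} "
            "(rideable_type TEXT, started_at TEXT, ended_at TEXT, member_casual TEXT)"
        )
        # One made-up ride per monthly table.
        conn.execute(
            f"INSERT INTO {name} VALUES (?, ?, ?, ?)",
            (
                "classic_bike",
                f"2022-{month:02d}-01 08:00:00",
                f"2022-{month:02d}-01 08:15:00",
                "member",
            ),
        )
    conn.execute("CREATE TABLE all_trips AS " + union_all_query(names))
    return conn


if __name__ == "__main__":
    # Print the generated query for 17 monthly tables (01-2022 to 05-2023).
    print(union_all_query(monthly_table_names(17)))
```

UNION ALL keeps every row from every table, including repeats, which is the behavior wanted when each monthly file holds different rides.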

    2. Identify trends and relationships
      • After aggregating all of the data I had chosen, I ran SQL code to determine the trends and relationships within it. I then uploaded the results into Google Sheets to make graphs that express those trends and make it easier to identify the key differences between Casual Riders and Annual Members.

    -- This shows how many casual riders and annual members used bikes
    SELECT member_casual AS member_type, COUNT(*) AS total_riders
    FROM Aggregate Data Table
    GROUP BY member_type
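The same grouping idea extends to the other selected columns. As a hedged illustration (the source shows only the member-type count, so the breakdown by bike type and the percentage share are assumptions about what a follow-up query might look like), the sketch below runs against a hypothetical `all_trips` table in sqlite3 with made-up rows.

```python
import sqlite3

# Made-up rows (rideable_type, member_casual) for illustration only.
SAMPLE = [
    ("classic_bike", "member"),
    ("classic_bike", "member"),
    ("electric_bike", "member"),
    ("electric_bike", "casual"),
    ("electric_bike", "casual"),
]


def sample_conn(rows=SAMPLE):
    """In-memory stand-in for the aggregated table; 'all_trips' is a hypothetical name."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE all_trips (rideable_type TEXT, member_casual TEXT)")
    conn.executemany("INSERT INTO all_trips VALUES (?, ?)", rows)
    return conn


def rides_by_member_and_bike(conn):
    """Ride counts for each (member type, bike type) pair."""
    sql = """
        SELECT member_casual AS member_type, rideable_type, COUNT(*) AS total_rides
        FROM all_trips
        GROUP BY member_casual, rideable_type
        ORDER BY member_casual, rideable_type
    """
    return conn.execute(sql).fetchall()


def member_share(conn):
    """Percentage of all rides taken by each member type, rounded to one decimal."""
    total = conn.execute("SELECT COUNT(*) FROM all_trips").fetchone()[0]
    if total == 0:
        return {}
    rows = conn.execute(
        "SELECT member_casual, COUNT(*) FROM all_trips GROUP BY member_casual"
    )
    return {member: round(100.0 * count / total, 1) for member, count in rows}


if __name__ == "__main__":
    conn = sample_conn()
    print(rides_by_member_and_bike(conn))
    print(member_share(conn))
```

Results shaped like these (one row per group) are small enough to paste into a spreadsheet for charting.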

    ![](https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F14378099%2Fe09c3496bf38d323f8323f52f67...

  2. MLB 2016 Pitch-by-Pitch

    • console.cloud.google.com
    Updated Jul 27, 2023
    https://console.cloud.google.com/marketplace/browse?filter=partner:Sportradar&hl=fr (2023). MLB 2016 Pitch-by-Pitch [Dataset]. https://console.cloud.google.com/marketplace/product/sportradar-public-data/mlb-pitch-by-pitch?hl=fr
    Dataset updated
    Jul 27, 2023
    Dataset provided by
    Sportradar (http://sportradar.com/)
    Google (http://google.com/)
    Description

    This public data includes pitch-by-pitch data for Major League Baseball (MLB) games in 2016. This dataset contains the following tables: games_wide (every pitch, steal, or lineup event for each at-bat in the 2016 regular season), games_post_wide (every pitch, steal, or lineup event for each at-bat in the 2016 post season), and schedules (the schedule for every team in the regular season). The schemas for the games_wide and games_post_wide tables are identical. With this data you can effectively replay a game and rebuild basic statistics for players and teams. Note: This data was built via a denormalization process over raw game log files, which may contain scoring errors and, in some cases, missing data. For official scoring and statistical information please consult mlb.com, baseball-reference.com, or sportradar.com. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset.

  3. Malaysia's Weather Data (1996-2024)

    • kaggle.com
    zip
    Updated Sep 1, 2024
    Shahmir Varqha (2024). Malaysia's Weather Data (1996-2024) [Dataset]. https://www.kaggle.com/datasets/shahmirvarqha/weather-data-malaysia/code
    Available download formats: zip (325977525 bytes)
    Dataset updated
    Sep 1, 2024
    Authors
    Shahmir Varqha
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Malaysia
    Description

    Contains hourly weather, air quality and UV data in Malaysia from 1996 to the present across a variety of locations.

    Includes weather factors like temperature, wind speed and UV index. Air quality data includes pollutant_value.

    Where is the data from and how is it extracted? See the quality-of-life repo.

    Data files

    full_weather.csv - Contains main weather data from 1996 to July 2023

    air_quality_indicators.csv - Info about air quality indicators

    air_quality_warnings.csv - Info about air quality warning levels

    uv_info - Info about UV indexes and their dangers

    full_locations.csv - Info about each location extracted

    The dataset is accessible using BigQuery/SQL or CSV. The cloud dataset is updated every day.

    BigQuery (SQL)

    Try out the public notebook: https://www.kaggle.com/code/shahmirvarqha/weather-bigquery

    There are several fact tables (main data is here):

    prod.air_quality - All air quality data

    prod.weather - Only airport weather stations

    prod.personal_weather - Only personal weather stations

    prod.uv - Merges UV data with warnings and labels

    prod.full_weather_places - Similar to full_weather.csv file

    The following tables have additional information:

    prod.state_locations

    prod.full_locations - Similar to full_locations.csv file

    prod.city_states

    prod.city_places

    prod.air_quality_warnings

    prod.air_quality_indicators

    prod.uv_info

    *Not all days and hours are available (especially earlier on, there is a lot of missing data).
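Because some hours are missing, it can help to list the gaps before analysing a station's series. The sketch below is a generic illustration, not part of the dataset's tooling: it assumes the observation times have already been parsed into naive Python `datetime` objects, and the function name `missing_hours` is hypothetical.

```python
from datetime import datetime, timedelta


def missing_hours(timestamps):
    """Return the hourly slots with no observation between the first and last one."""
    # Truncate every observation to the start of its hour.
    seen = {ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps}
    if not seen:
        return []
    current, end = min(seen), max(seen)
    missing = []
    while current <= end:
        if current not in seen:
            missing.append(current)
        current += timedelta(hours=1)
    return missing


if __name__ == "__main__":
    observed = [
        datetime(2023, 7, 1, 0, 5),
        datetime(2023, 7, 1, 1, 0),
        datetime(2023, 7, 1, 4, 30),
    ]
    for gap in missing_hours(observed):
        print(gap.isoformat())
```

For the sample above the gaps are 02:00 and 03:00 on 2023-07-01. Only gaps between the first and last observation are reported, so missing data before a station's first record is not counted.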

