Facebook
TwitterCyclistic, a bike sharing company, wants to analyze their user data to find the main differences in behavior between their two types of users. The Casual Riders are those who pay for each ride and the Annual Member who pays a yearly subscription to the service.
Key objectives: 1.Identify The Business Task: - Cyclistic wants to analyze the data to find the key differences between Casual Riders and Annual Members. The goal of this project is to reach out to the casual riders and incentivize them into paying for the annual subscription.
Key objectives: 1. Download Data And Store It Appropriately - Downloaded the data as .csv files, which were saved in their own folder to keep everything organized. I then uploaded those files into BigQuery for cleaning and analysis. For this project I downloaded all of 2022 and up to May of 2023, as this is the most recent data that I have access to.
Identify How It's Organized
Sort and Filter The Data and Determine The Credibility of The Data
Key objectives: 1.Clean The Data and Prepare The Data For Analysis: -I used some simple SQL code in order to determine that no members were missing, that no information was repeated and that there were no misspellings in the data as well.
--no misspelling in either member or casual. This ensures that all results will not have missing information.
SELECT
DISTINCT member_casual
FROM
table
--This shows how many casual riders and members used the service, should add up to the numb of rows in the dataset SELECT member_casual AS member_type, COUNT(*) AS total_riders FROM table GROUP BY member_type
--Shows that every bike has a distinct ID. SELECT DISTINCT ride_id FROM table
--Shows that there are no typos in the types of bikes, so no data will be missing from results. SELECT DISTINCT rideable_type FROM table
Key objectives: 1. Aggregate Your Data So It's Useful and Accessible -I had to write some SQL code so that I could combine all the data from the different files I had uploaded onto BigQuery
select rideable_type, started_at, ended_at, member_casual from table 1 union all select rideable_type, started_at, ended_at, member_casual from table 2 union all select rideable_type, started_at, ended_at, member_casual from table 3 union all select rideable_type, started_at, ended_at, member_casual from table 4 union all select rideable_type, started_at, ended_at, member_casual from table 5 union all select rideable_type, started_at, ended_at, member_casual from table 6 union all select rideable_type, started_at, ended_at, member_casual from table 7 union all select rideable_type, started_at, ended_at, member_casual from table 8 union all select rideable_type, started_at, ended_at, member_casual from table 9 union all select rideable_type, started_at, ended_at, member_casual from table10 union all select rideable_type, started_at, ended_at, member_casual from table 11 union all select rideable_type, started_at, ended_at, member_casual from table 12 union all select rideable_type, started_at, ended_at, member_casual from table 13 union all select rideable_type, started_at, ended_at, member_casual from table 14 union all select rideable_type, started_at, ended_at, member_casual from table 15 union all select rideable_type, started_at, ended_at, member_casual from table 16 union all select rideable_type, started_at, ended_at, member_casual from table 17
--This shows how many casual and annual members used bikes SELECT member_casual AS member_type, COUNT(*) AS total_riders FROM Aggregate Data Table GROUP BY member_type
Facebook
TwitterThis public data includes pitch-by-pitch data for Major League Baseball (MLB) games in 2016. This dataset contains the following tables: games_wide (every pitch, steal, or lineup event for each at bat in the 2016 regular season), games_post_wide(every pitch, steal, or lineup event for each at-bat in the 2016 post season), and schedules ( the schedule for every team in the regular season). The schemas for the games_wide and games_post_wide tables are identical. With this data you can effectively replay a game and rebuild basic statistics for players and teams. Note: This data was built via a denormalization process over raw game log files which may contain scoring errors and in some cases missing data. For official scoring and statistical information please consult mlb.com , baseball-reference.com , or sportradar.com . This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery .
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Includes weather factors like temperature, wind speed and uv index. Air quality data includes pollutant_value.
Where is the data from and how is it extracted? quality-of-life repo
full_weather.csv - Contains main weather data from 1996-2023 July
air_quality_indicators.csv - Info about air quality indicators
air_quality_warnings.csv - Info about air quality warning levels
uv_info - Info about uv indexes and their dangers
full_locations.csv - Info about each location extracted
The dataset is accessible using BigQuery/SQL or CSV . The cloud dataset is updated everyday.
Try out the public notebook: https://www.kaggle.com/code/shahmirvarqha/weather-bigquery
There are several fact tables (main data is here):
prod.air_quality - All air quality data
prod.weather - Only airport weather stations
prod.personal_weather - Only personal weather stations
prod.uv - Merges UV data with warnings and labels
prod.full_weather_places - Similar to full_weather.csv file
The following tables have additional information:
prod.state_locations
prod.full_locations - Similar to full_locations.csv file
prod.city_states
prod.city_places
prod.air_quality_warnings
prod.air_quality_indicators
prod.uv_info
*Not all days and hours are available (especially earlier on, there is a lot of missing data).
Not seeing a result you expected?
Learn how you can add new datasets to our index.
Facebook
TwitterCyclistic, a bike sharing company, wants to analyze their user data to find the main differences in behavior between their two types of users. The Casual Riders are those who pay for each ride and the Annual Member who pays a yearly subscription to the service.
Key objectives: 1.Identify The Business Task: - Cyclistic wants to analyze the data to find the key differences between Casual Riders and Annual Members. The goal of this project is to reach out to the casual riders and incentivize them into paying for the annual subscription.
Key objectives: 1. Download Data And Store It Appropriately - Downloaded the data as .csv files, which were saved in their own folder to keep everything organized. I then uploaded those files into BigQuery for cleaning and analysis. For this project I downloaded all of 2022 and up to May of 2023, as this is the most recent data that I have access to.
Identify How It's Organized
Sort and Filter The Data and Determine The Credibility of The Data
Key objectives: 1.Clean The Data and Prepare The Data For Analysis: -I used some simple SQL code in order to determine that no members were missing, that no information was repeated and that there were no misspellings in the data as well.
--no misspelling in either member or casual. This ensures that all results will not have missing information.
SELECT
DISTINCT member_casual
FROM
table
--This shows how many casual riders and members used the service, should add up to the numb of rows in the dataset SELECT member_casual AS member_type, COUNT(*) AS total_riders FROM table GROUP BY member_type
--Shows that every bike has a distinct ID. SELECT DISTINCT ride_id FROM table
--Shows that there are no typos in the types of bikes, so no data will be missing from results. SELECT DISTINCT rideable_type FROM table
Key objectives: 1. Aggregate Your Data So It's Useful and Accessible -I had to write some SQL code so that I could combine all the data from the different files I had uploaded onto BigQuery
select rideable_type, started_at, ended_at, member_casual from table 1 union all select rideable_type, started_at, ended_at, member_casual from table 2 union all select rideable_type, started_at, ended_at, member_casual from table 3 union all select rideable_type, started_at, ended_at, member_casual from table 4 union all select rideable_type, started_at, ended_at, member_casual from table 5 union all select rideable_type, started_at, ended_at, member_casual from table 6 union all select rideable_type, started_at, ended_at, member_casual from table 7 union all select rideable_type, started_at, ended_at, member_casual from table 8 union all select rideable_type, started_at, ended_at, member_casual from table 9 union all select rideable_type, started_at, ended_at, member_casual from table10 union all select rideable_type, started_at, ended_at, member_casual from table 11 union all select rideable_type, started_at, ended_at, member_casual from table 12 union all select rideable_type, started_at, ended_at, member_casual from table 13 union all select rideable_type, started_at, ended_at, member_casual from table 14 union all select rideable_type, started_at, ended_at, member_casual from table 15 union all select rideable_type, started_at, ended_at, member_casual from table 16 union all select rideable_type, started_at, ended_at, member_casual from table 17
--This shows how many casual and annual members used bikes SELECT member_casual AS member_type, COUNT(*) AS total_riders FROM Aggregate Data Table GROUP BY member_type