18 datasets found
  1. Cyclistic_data_visualization

    • kaggle.com
    Updated Jun 12, 2021
    Cite
    Mark Woychick (2021). Cyclistic_data_visualization [Dataset]. https://www.kaggle.com/markwoychick/cyclistic-data-visualization
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Mark Woychick
    License
    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    I created these files and analysis as part of working on a case study for the Google Data Analyst certificate.

    Question investigated: Do annual members and casual riders use Cyclistic bikes differently? Why do we want to know? Knowing bike usage/behavior by rider type will allow the Marketing, Analytics, and Executive team stakeholders to design, assess, and approve appropriate strategies that drive profitability.

    Content

    I used the script noted below to clean the files and then added some additional steps to create the visualizations to complete my analysis. The additional steps are noted in the corresponding R Markdown file for this data set.

    Acknowledgements

    Files (the most recent one year of data available): Divvy_Trips_2019_Q2.csv, Divvy_Trips_2019_Q3.csv, Divvy_Trips_2019_Q4.csv, Divvy_Trips_2020_Q1.csv. Source: downloaded from https://divvy-tripdata.s3.amazonaws.com/index.html

    Data cleaning script: I followed this script to clean and merge the files: https://docs.google.com/document/d/1gUs7-pu4iCHH3PTtkC1pMvHfmyQGu0hQBG5wvZOzZkA/copy

    Note: the combined data set has 3,876,042 rows, so you will likely need to run the R analysis on your own computer (e.g., R Console) rather than in the cloud (e.g., RStudio Cloud).
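
    For reference, a minimal pandas sketch of the same merge-and-derive step is below. This is not the linked R script; it assumes the quarterly files have already been renamed to a common schema with started_at and ended_at columns.

    import glob
    import pandas as pd

    # Read and stack the four quarterly Divvy files (assumes a common schema).
    files = glob.glob("Divvy_Trips_*.csv")
    trips = pd.concat(
        (pd.read_csv(f, parse_dates=["started_at", "ended_at"]) for f in files),
        ignore_index=True,
    )

    # Derive ride length and day of week for the member-vs-casual comparison.
    trips["ride_length"] = trips["ended_at"] - trips["started_at"]
    trips["day_of_week"] = trips["started_at"].dt.day_name()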

    Inspiration

    This was my first attempt to conduct an analysis in R and create the R Markdown file. As you might guess, it was an eye-opening experience, with both exciting discoveries and aggravating moments.

    One thing I have not yet been able to figure out is how to add a legend to the map. I was able to get a legend to appear on a separate (empty) map, but not on the map you will see here.

    I am also interested to see what others did with this analysis - what were the findings and insights you found?

  2. DA Analyst Capstone Project

    • kaggle.com
    Updated May 18, 2024
    Cite
    Tara Jacobs (2024). DA Analyst Capstone Project [Dataset]. https://www.kaggle.com/datasets/tarajacobs/mock-user-profiles-from-social-networks
    Available download formats
    zip (8714 bytes)
    Authors
    Tara Jacobs
    License
    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description


    BigQuery | BigQuery data cleaning

    Tableau | Creating visuals with Tableau

    Sheets | Cleaning NULL values, creating data tables

    RStudio | Organizing and cleaning data to create visuals

    SQL (SSMS) | Transforming, cleaning, and manipulating data

    LinkedIn | Survey poll


    Source for the mock dating site: pH7-Social-Dating-CMS. Source for the mock social site: tailwhip99 / social_media_site.


  3. A dataset of 5 million city trees from 63 US cities: species, location, nativity status, health, and more

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    • +2more
    Updated Aug 31, 2022 (more versions available)
    Cite
    Dakota McCoy; Benjamin Goulet-Scott; Weilin Meng; Bulent Atahan; Hana Kiros; Misako Nishino; John Kartesz (2022). A dataset of 5 million city trees from 63 US cities: species, location, nativity status, health, and more. [Dataset]. http://doi.org/10.5061/dryad.2jm63xsrf
    Available download formats
    zip
    Dataset provided by
    Worcester Polytechnic Institute
    Harvard University
    Cornell University
    Stanford University
    The Biota of North America Program (BONAP)
    Authors
    Dakota McCoy; Benjamin Goulet-Scott; Weilin Meng; Bulent Atahan; Hana Kiros; Misako Nishino; John Kartesz
    License
    https://spdx.org/licenses/CC0-1.0.html

    Area covered
    United States
    Description

    Sustainable cities depend on urban forests. City trees -- a pillar of urban forests -- improve our health, clean the air, store CO2, and cool local temperatures. Comparatively less is known about urban forests as ecosystems, particularly their spatial composition, nativity statuses, biodiversity, and tree health. Here, we assembled and standardized a new dataset of N=5,660,237 trees from 63 of the largest US cities. The data comes from tree inventories conducted at the level of cities and/or neighborhoods. Each data sheet includes detailed information on tree location, species, nativity status (whether a tree species is naturally occurring or introduced), health, size, whether it is in a park or urban area, and more (comprising 28 standardized columns per datasheet). This dataset could be analyzed in combination with citizen-science datasets on bird, insect, or plant biodiversity; social and demographic data; or data on the physical environment. Urban forests offer a rare opportunity to intentionally design biodiverse, heterogeneous, rich ecosystems.

    Methods

    See the eLife manuscript for full details. Below, we provide a summary of how the dataset was collected and processed.

    Data Acquisition

    We limited our search to the 150 largest cities in the USA (by census population). To acquire raw data on street tree communities, we used a search protocol on both Google and Google Datasets Search (https://datasetsearch.research.google.com/). We first searched the city name plus each of the following: street trees, city trees, tree inventory, urban forest, and urban canopy (all combinations totaled 20 searches per city, 10 each in Google and Google Datasets Search). We then read the first page of Google results and the top 20 results from Google Datasets Search. If the same named city in the wrong state appeared in the results, we redid the 20 searches adding the state name. If no data were found, we contacted a relevant state official via email or phone with an inquiry about their street tree inventory. Datasheets were received and transformed to .csv format (if they were not already in that format). We received data on street trees from 64 cities. One city, El Paso, had data only in summary format and was therefore excluded from analyses.

    Data Cleaning

    All code used is in the zipped folder Data S5 in the eLife publication. Before cleaning the data, we ensured that all reported trees for each city were located within the greater metropolitan area of the city (for certain inventories, many suburbs were reported - some within the greater metropolitan area, others not).

    First, we renamed all columns in the received .csv sheets, referring to the metadata and according to our standardized definitions (Table S4). To harmonize tree health and condition data across different cities, we inspected metadata from the tree inventories and converted all numeric scores to a descriptive scale including “excellent,” “good”, “fair”, “poor”, “dead”, and “dead/dying”. Some cities included only three points on this scale (e.g., “good”, “poor”, “dead/dying”) while others included five (e.g., “excellent,” “good”, “fair”, “poor”, “dead”).

    Second, we used pandas in Python (W. McKinney & Others, 2011) to correct typos, non-ASCII characters, variable spellings, date format, units used (we converted all units to metric), address issues, and common name format. In some cases, units were not specified for tree diameter at breast height (DBH) and tree height; we determined the units based on typical sizes for trees of a particular species. Wherever diameter was reported, we assumed it was DBH. We standardized health and condition data across cities, preserving the highest granularity available for each city. For our analysis, we converted this variable to a binary (see section Condition and Health). We created a column called “location_type” to label whether a given tree was growing in the built environment or in green space. All of the changes we made, and decision points, are preserved in Data S9.

    Third, we checked the scientific names reported using gnr_resolve in the R library taxize (Chamberlain & Szöcs, 2013), with the option Best_match_only set to TRUE (Data S9). Through an iterative process, we manually checked the results and corrected typos in the scientific names until all names were either a perfect match (n=1771 species) or a partial match with threshold greater than 0.75 (n=453 species). BGS manually reviewed all partial matches to ensure that they were the correct species name, and then we programmatically corrected these partial matches (for example, Magnolia grandifolia -- which is not a species name of a known tree -- was corrected to Magnolia grandiflora, and Pheonix canariensus was corrected to its proper spelling of Phoenix canariensis). Because many of these tree inventories were crowd-sourced or generated in part through citizen science, such typos and misspellings are to be expected.

    Some tree inventories reported species by common names only. Therefore, our fourth step in data cleaning was to convert common names to scientific names. We generated a lookup table by summarizing all pairings of common and scientific names in the inventories for which both were reported. We manually reviewed the common to scientific name pairings, confirming that all were correct. Then we programmatically assigned scientific names to all common names (Data S9).

    Fifth, we assigned native status to each tree through reference to the Biota of North America Project (Kartesz, 2018), which has collected data on all native and non-native species occurrences throughout the US states. Specifically, we determined whether each tree species in a given city was native to that state, not native to that state, or that we did not have enough information to determine nativity (for cases where only the genus was known).

    Sixth, some cities reported only the street address but not latitude and longitude. For these cities, we used the OpenCageGeocoder (https://opencagedata.com/) to convert addresses to latitude and longitude coordinates (Data S9). OpenCageGeocoder leverages open data and is used by many academic institutions (see https://opencagedata.com/solutions/academia).

    Seventh, we trimmed each city dataset to include only the standardized columns we identified in Table S4. After each stage of data cleaning, we performed manual spot checking to identify any issues.
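
    As an illustration of the health/condition harmonization described above (this is not the authors' Data S5 code), a pandas sketch might look like the following; the numeric-to-label mapping and the column names are hypothetical, since each inventory's metadata defines its own scale.

    import pandas as pd

    # Hypothetical mapping from one city's numeric condition scores to the
    # standardized descriptive scale used in the dataset.
    condition_labels = {5: "excellent", 4: "good", 3: "fair", 2: "poor", 1: "dead"}
    healthy_labels = {"excellent", "good", "fair"}

    inventory = pd.read_csv("example_city_trees.csv")
    inventory["condition"] = inventory["condition"].map(condition_labels)

    # Collapse the descriptive scale to the binary used in the analysis.
    inventory["healthy"] = inventory["condition"].isin(healthy_labels)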

  4. Cleaned_Cyclistic_Data

    • kaggle.com
    Updated Dec 22, 2021
    Cite
    Stanley Prawiradjaja (2021). Cleaned_Cyclistic_Data [Dataset]. https://www.kaggle.com/datasets/stanleyprawiradjaja/cleaned-cyclistic-data
    Available download formats
    zip (215550772 bytes)
    Authors
    Stanley Prawiradjaja
    Description

    This is bike-sharing data for the fictitious company Cyclistic. The actual data is based on Divvy, a Chicago bike-sharing service. The original data was cleaned using Postgres and Google Sheets. The data has been cleaned to exclude rows that are missing station IDs and trips with a duration over 24 hours. Several columns were created to calculate trip duration and the day of the week.

    --- Finding duplicates; a duplicated ride_id is assumed to mean duplicated data. Result: no duplicated data found ---

    SELECT ride_id, COUNT(ride_id) AS ride_id_count FROM "Cyclistic" GROUP BY ride_id HAVING COUNT(ride_id)>1

    --- Extract station table for data cleaning ----

    SELECT DISTINCT start_station_name, start_station_id, end_station_id, end_station_name FROM "Cyclistic" ORDER BY start_station_name ;

    Using Google Sheets: clean the start_station_id codes, fill in missing station names, clean station IDs with an extra .0, and assign IDs to NULL station data.

    ---- Update main table with cleaned station names and IDs ----

    UPDATE "Cyclistic" SET end_lng = lng FROM "cleaned_station_info" WHERE start_station_id = id;

    ---- The original data has latitude and longitude values that vary by a small number of decimal places. To make the data more uniform, the latitude and longitude were averaged per station ID, using 8 decimal places for location accuracy. The data was then checked against Google Maps to make sure it matches the nearest Divvy location in Chicago. ---

    SELECT DISTINCT start_station_id, start_station_name, ROUND(AVG(start_lat)::DECIMAL,8) lat, ROUND(AVG(start_lng)::DECIMAL,8) lng FROM "Cyclistic" GROUP BY start_station_id, start_station_name ORDER BY start_station_id

    --- Create a cleaned table for export, excluding rides shorter than 2 minutes or longer than 24 hours. Where the duration is less than 2 minutes, the ride always ends at the same station it started from, so it is assumed that the rider canceled the ride or had trouble using the bike; this data is excluded. For rides over 24 hours, it is assumed that there was an error docking the bicycle or some other problem ending the ride. The new table also excludes rows where start_station_name or end_station_name is missing ---

    SELECT *
    FROM (
      SELECT
        ride_id, member_casual, rideable_type,
        start_station_id, start_station_name,
        end_station_id, end_station_name,
        started_at, ended_at,
        ended_at - started_at AS duration,
        start_lat, start_lng, end_lat, end_lng
      FROM "Cyclistic"
      WHERE start_station_name IS NOT NULL AND end_station_name IS NOT NULL
    ) AS duration_tbl
    WHERE duration >= '00:02' AND duration <= '24:00'

  5. Mesoamerican Pyramid Sample Spreadsheet

    • hub.arcgis.com
    Updated Mar 7, 2019
    Cite
    Tennessee Geographic Alliance (2019). Mesoamerican Pyramid Sample Spreadsheet [Dataset]. https://hub.arcgis.com/documents/239d8d8128f8496181b68367e26eea04
    Dataset authored and provided by
    Tennessee Geographic Alliance
    Area covered
    Mesoamerica
    Description

    Follow these instructions to use the Google Spreadsheet in your own activity.

    1. Begin by copying the Google Spreadsheet into your own Google Drive account.
    2. Prefill the username column for your students/participants. This will help keep the students from overwriting their peers' work.
    3. Change the editing permissions for the spreadsheet and share it with your students/participants.
    4. Demonstrate what data goes into each column from the Wikipedia page. Be sure to demonstrate how to find the latitude and longitude from Wikipedia. For the images, make sure the students copy the URL that ends in the appropriate file type (jpg, png, etc.).
    5. Be prepared for lots of mistakes. This is a great learning opportunity to talk about data quality. When the students are done completing the spreadsheet, check the spreadsheet for obvious errors. Pay special attention to the sign of the longitude: all of those values should be negative.
    6. Download the spreadsheet as a CSV.
    7. Log into your AGO Org account.
    8. Click on the Content tab -> Add item -> From my computer.
    9. Upload the CSV and save it as a feature layer. Be sure to include a few tags (Mesoamerica, pyramid, Aztec, Maya would be good ones).
    10. Once the layer has been uploaded and converted into a feature layer, click the Settings button, check Delete Protection, and save.
    11. From the feature layer Overview tab, change the share settings to share with your students. I usually set up a group (something like Mesoamerica), add the students to the group, then share the feature layer with that group.
    12. From here, explore the data. Symbolize the data by culture to see if there are spatial patterns to their distribution. Symbolize the data by height to see if some cultures built taller pyramids or if taller pyramids were confined to certain regions. Students can also set up the pop-ups to use the image URL in the data.
    13. From here, students can save their maps, add additional data from ArcGIS Online, create story maps, etc.

    If you are looking for more great data, from your ArcGIS Online map, choose Add -> Add Layer from Web and paste the following into the URL: https://services1.arcgis.com/TQSFiGYN0xveoERF/arcgis/rest/services/MesoAmerican_civs/FeatureServer

    Image thumbnail is from Wikipedia.
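
    The spot check in step 5 can also be scripted. The sketch below is illustrative only; the column names (Latitude, Longitude, Image) are assumptions about how the spreadsheet is laid out.

    import pandas as pd

    pyramids = pd.read_csv("mesoamerican_pyramids.csv")

    # Mesoamerica lies west of the prime meridian, so every longitude should be negative.
    bad_longitude = pyramids[pyramids["Longitude"] >= 0]

    # Latitudes should fall roughly within the Mesoamerican region.
    bad_latitude = pyramids[~pyramids["Latitude"].between(10, 25)]

    # Image links should point at an actual image file.
    bad_image = pyramids[~pyramids["Image"].str.lower().str.endswith((".jpg", ".jpeg", ".png"))]

    print(bad_longitude, bad_latitude, bad_image, sep="\n")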

  6. Global Salary DataSet 2022

    • kaggle.com
    Updated Mar 1, 2023
    Cite
    RicardoAUgas (2023). Global Salary DataSet 2022 [Dataset]. https://www.kaggle.com/datasets/ricardoaugas/salary-transparency-dataset-2022/data
    Available download formats
    zip (1444962 bytes)
    Authors
    RicardoAUgas
    Description

    This compensation data originated from a large Google Sheets document that went viral after it was posted on LinkedIn by Christen Nino De Guzman, Program Manager at Google. The LinkedIn post stated the following:

    "Let's talk #SalaryTransparency! A few weeks ago, I encouraged others to share their salaries in an anonymous Google form and now more than 58,000 people have come together to share details of their offers in a google sheet. Everything from sign on bonus, annual salary, diverse identity and age. Many fields that traditional sites like Glassdoor don’t include.

    All of the responses were anonymous and publicly viewable. Huge shoutout to Brennan Pankiw for creating the survey! You can view responses here: https://lnkd.in/gPkYFQsN "

    The Google Sheet became extremely laggy and often crashed because of the number of people accessing the document and its size. Therefore, I downloaded the raw data and took it upon myself to clean it up and create a user-friendly visualization of the compensation data. Using the skills I recently acquired during my Harvard Business Analytics Program, I identified the gaps, typos, variations of company names, and false data using tools such as RStudio and Tableau.

    DATA

    Anyone interested in the raw data as well as the cleaned data presented here, please reach out to Ricardo Ugas.

    CONTACT

    Linkedin: https://www.linkedin.com/in/ugas/ Resume/CV: https://bit.ly/3dIUmCo Email: ricardo.ugas.analytics@gmail.com ricardo.ugasgonzalez@postgrad.manchester.ac.uk ricardo.ugas@mail.analytics.hbs.edu ugasra@miamioh.edu Phone: +1 513 526 6598

  7. Cyclistic Bike-Share Google Capstone Project

    • kaggle.com
    Updated Aug 11, 2022
    Cite
    Dibyajyoti Mehera (2022). Cyclistic Bike-Share Google Capstone Project [Dataset]. https://www.kaggle.com/datasets/dibyajyotimehera/cyclistic-bikeshare-google-capstone-project/code
    Available download formats
    zip (215455465 bytes)
    Authors
    Dibyajyoti Mehera
    License
    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains non-confidential information about the trip details of users of an imaginary bike company called Cyclistic. The dataset collectively contains approximately 5.9 million records. This is a large amount of data to handle in spreadsheet applications such as MS Excel or Google Sheets. Although it is possible to complete the entire analysis in MS Excel, I would recommend using BigQuery or R to clean and analyze the data effectively.

    The dataset contains the following files:

    • 202107-divvy-tripdata.csv
    • 202108-divvy-tripdata.csv
    • 202109-divvy-tripdata.csv
    • 202110-divvy-tripdata.csv
    • 202111-divvy-tripdata.csv
    • 202112-divvy-tripdata.csv
    • 202201-divvy-tripdata.csv
    • 202202-divvy-tripdata.csv
    • 202203-divvy-tripdata.csv
    • 202204-divvy-tripdata.csv
    • 202205-divvy-tripdata.csv
    • 202206-divvy-tripdata.csv

  8. Cyclistic Bike Share: A Case Study

    • kaggle.com
    Updated Jul 25, 2023
    Cite
    Casey Kellerhals (2023). Cyclistic Bike Share: A Case Study [Dataset]. https://www.kaggle.com/datasets/caskelle/cyclistic-bike-share-a-case-study/code
    Available download formats
    zip (269575250 bytes)
    Authors
    Casey Kellerhals
    Description

    The Mission Statement

    Cyclistic, a bike-sharing company, wants to analyze their user data to find the main differences in behavior between their two types of users: Casual Riders, who pay for each ride, and Annual Members, who pay a yearly subscription to the service.

    PHASE 1 : ASK

    Key objectives: 1. Identify The Business Task: Cyclistic wants to analyze the data to find the key differences between Casual Riders and Annual Members. The goal of this project is to reach out to casual riders and incentivize them to purchase the annual subscription.

    2. Consider Key Stakeholders:
      • The key stakeholders in this project are the executive team and the director of marketing, Lily Moreno.

    PHASE 2 : Prepare

    Key objectives: 1. Download Data And Store It Appropriately - I downloaded the data as .csv files, which were saved in their own folder to keep everything organized. I then uploaded those files into BigQuery for cleaning and analysis. For this project I downloaded all of 2022 and up to May of 2023, as this was the most recent data I had access to.

    2. Identify How It's Organized

      • The data is organized into months, from 01-2022 to 05-2023.
    3. Sort and Filter The Data and Determine The Credibility of The Data

      • For this data I used BigQuery and SQL to sort, filter, and assess the credibility of the data. The data is collected firsthand by Cyclistic, and there is a lot of information to work with. I filtered the data down to the fields I wanted to work with: the type of bike, the type of member, and the date the bikes were used.

    PHASE 3 : Process

    Key objectives: 1. Clean The Data and Prepare The Data For Analysis - I used some simple SQL code to confirm that no members were missing, that no information was repeated, and that there were no misspellings in the data.

    -- No misspellings in either member or casual. This ensures that results will not have missing information.
    SELECT DISTINCT member_casual
    FROM table

    -- Shows how many casual riders and members used the service; the counts should add up to the number of rows in the dataset.
    SELECT member_casual AS member_type, COUNT(*) AS total_riders
    FROM table
    GROUP BY member_type

    -- Shows that every ride has a distinct ID.
    SELECT DISTINCT ride_id
    FROM table

    -- Shows that there are no typos in the types of bikes, so no data will be missing from the results.
    SELECT DISTINCT rideable_type
    FROM table

    PHASE 4 : Analyze

    Key objectives: 1. Aggregate Your Data So It's Useful and Accessible - I had to write some SQL code to combine all the data from the different files I had uploaded to BigQuery.

    SELECT rideable_type, started_at, ended_at, member_casual FROM table 1 UNION ALL
    SELECT rideable_type, started_at, ended_at, member_casual FROM table 2 UNION ALL
    SELECT rideable_type, started_at, ended_at, member_casual FROM table 3 UNION ALL
    SELECT rideable_type, started_at, ended_at, member_casual FROM table 4 UNION ALL
    SELECT rideable_type, started_at, ended_at, member_casual FROM table 5 UNION ALL
    SELECT rideable_type, started_at, ended_at, member_casual FROM table 6 UNION ALL
    SELECT rideable_type, started_at, ended_at, member_casual FROM table 7 UNION ALL
    SELECT rideable_type, started_at, ended_at, member_casual FROM table 8 UNION ALL
    SELECT rideable_type, started_at, ended_at, member_casual FROM table 9 UNION ALL
    SELECT rideable_type, started_at, ended_at, member_casual FROM table 10 UNION ALL
    SELECT rideable_type, started_at, ended_at, member_casual FROM table 11 UNION ALL
    SELECT rideable_type, started_at, ended_at, member_casual FROM table 12 UNION ALL
    SELECT rideable_type, started_at, ended_at, member_casual FROM table 13 UNION ALL
    SELECT rideable_type, started_at, ended_at, member_casual FROM table 14 UNION ALL
    SELECT rideable_type, started_at, ended_at, member_casual FROM table 15 UNION ALL
    SELECT rideable_type, started_at, ended_at, member_casual FROM table 16 UNION ALL
    SELECT rideable_type, started_at, ended_at, member_casual FROM table 17

    2. Identify trends and relationships - After I had aggregated all of the data I had chosen, I ran SQL code to determine the trends and relationships contained within the data. I then uploaded the results into Google Sheets to make graphs that express those trends and make it easier to identify the key differences between Casual Riders and Annual Members.

    -- Shows how many casual and annual members used bikes.
    SELECT member_casual AS member_type, COUNT(*) AS total_riders
    FROM Aggregate Data Table
    GROUP BY member_type


  9. Boston Celtics Shooting Variables

    • kaggle.com
    Updated Oct 2, 2023
    Cite
    John Mimnaugh (2023). Boston Celtics Shooting Variables [Dataset]. https://www.kaggle.com/datasets/johnmimnaugh/boston-celtics-shooting-variables/data
    Available download formats
    zip (4738 bytes)
    Authors
    John Mimnaugh
    License
    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Source: All data was collected from the NBA.com website and Basketball-Reference.com. To view the raw data and the steps I took to clean and format it, click the link below: https://docs.google.com/spreadsheets/d/1bJnc1n-pXVjtqKul1NnjOq0mYl9-7FZy_CbM2gTmTLA/edit?usp=sharing

    Context: All data is from the 2022-2023, 82-game regular season.

    Inspiration: I gathered this data to perform an analysis with the goal of answering these questions: - From where did the Boston Celtics shoot the highest field goal percentage? - When did the Boston Celtics shoot the highest field goal percentage? - Under what conditions did the Boston Celtics shoot the highest field goal percentage?

  10. Mobile Legend : Bang Bang Draft Picks

    • kaggle.com
    Updated Jul 9, 2023
    Cite
    Gerry Zani (2023). Mobile Legend : Bang Bang Draft Picks [Dataset]. https://www.kaggle.com/datasets/gerryzani/mlbb-draft-breakdown-patch-1768/
    Available download formats
    zip (3921479 bytes)
    Authors
    Gerry Zani
    License
    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    This data is a transformed version of the following dataset.

    Mobile Legends Match Results Provided by MUHAMMAD RIZQI NUR

    Huge shoutout to Rizqi for his wonderful work in providing this dataset. To see how he obtained this data in the first place, please see the original dataset (linked above).

    Aim: This dataset is for analyzing the draft picks using Tableau to answer these questions: 1. "If I have to play in a party of 2, which draft has the highest chance of winning?" 2. "If I have to play in a party of 3, which draft has the highest chance of winning?" 3. "What are some hero combinations that I need to avoid?"

    Additional Aims: 1. I uploaded this dataset as a way to get some public review of how I cleaned the data. 2. To share my step-by-step process for getting to the final output.

    Notes & Caution: 1. I tried to clean the dataset using Python, but got stuck midway on a sorting problem (link); please excuse my lack of competence. 2. Hence, I started cleaning the data manually using a spreadsheet. 3. Due to the Google Sheets maximum-cell limitation, I had to split the work process across 2 different files. 4. A lot of value-paste copying was needed to get the final output file 'MLBB Draft Sorted Cleaned.xlsx', so please use the formulas with caution.

    Transformation Steps: 1. Remove duplicates by the column 'battleId'. 2. Find and replace "Chang'e" with 'Change' (changing the double quote to a single quote and removing the apostrophe from the original string) because it interferes with the regex formula. 3. Split the data into winning drafts and losing drafts instead of 'left', 'right', and 'win status' for each match (Win = the left side is the winning draft, Lose = the right side is the winning draft). 4. Use regex to split each pick list into an individual cell per hero pick. 5. Create a sheet listing the names of the individual heroes. 6. Change each name into a numeric ID (to speed up the calculation and ensure better sorting). 7. Transpose into wide data (battleId as the column name, hero picks as rows). 8. Sort each column in ascending order. 9. Transpose back to long data. 10. Change the numeric IDs back into the respective hero names. 11. Repeat the process for the losing draft. 12. Combine both the winning draft and the losing draft into a single sheet.
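
    A rough pandas equivalent of steps 1-9 is sketched below. This is not the author's spreadsheet or notebook; apart from battleId, the column names (left_picks, right_picks, win_side) are assumptions for illustration.

    import pandas as pd

    df = pd.read_csv("mlbb_matches.csv")

    # Step 1: remove duplicate battles.
    df = df.drop_duplicates(subset="battleId")

    # Step 2: normalise "Chang'e" so the apostrophe does not break the splitting.
    for col in ("left_picks", "right_picks"):
        df[col] = df[col].str.replace("Chang'e", "Change", regex=False)

    # Steps 3-9: keep the winning and losing drafts, and sort the picks within
    # each draft so identical team compositions line up regardless of pick order.
    def split_and_sort(picks):
        return sorted(p.strip() for p in picks.split(","))

    win_left = df["win_side"].eq("left")
    df["winning_draft"] = df["left_picks"].where(win_left, df["right_picks"]).map(split_and_sort)
    df["losing_draft"] = df["right_picks"].where(win_left, df["left_picks"]).map(split_and_sort)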

  11. GOOGLE Reports & Stock Prices 2004-TODAY

    • kaggle.com
    Updated Nov 21, 2025
    Cite
    Emre Kaan Yılmaz (2025). GOOGLE Reports & Stock Prices 2004-TODAY [Dataset]. https://www.kaggle.com/datasets/emrekaany/googl-stock-price-and-financials
    Available download formats
    zip (117351 bytes)
    Authors
    Emre Kaan Yılmaz
    License
    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    🇺🇸 Alphabet Inc. (GOOGL) Comprehensive Financial Dataset

    📌 Overview

    Welcome to the GOOGL Financial Dataset! This dataset provides clear and easy-to-use quarterly financial statements (income statement, balance sheet, and cash flow) along with daily historical stock prices.

    As a data engineer with a double major in economics, I'll personally analyze and provide constructive feedback on all your work using this dataset. Let's dive in and explore Google's financial journey together!

    🗃 Files Included

    • googl_daily_prices.csv: Historical daily stock prices.
    • googl_income_statement.csv: Quarterly income statements.
    • googl_balance_sheet.csv: Quarterly balance sheets.
    • googl_cash_flow_statement.csv: Quarterly cash flow statements.

    📘 About This Dataset

    This dataset offers a unique blend of long-term market performance and detailed financial metrics:

    • Time Series of Daily Prices: Track the historical performance of GOOGL stock from its early days up until now.
    • Quarterly Financial Statements: Dive into the income statements, balance sheets, and cash flow statements that reflect the company’s financial evolution.
    • Integrated Insights: Ideal for comprehensive financial analyses, forecasting, model building, and exploring the dynamic interplay between market performance and underlying business operations.

    Whether you're building predictive models, performing deep-dive financial analysis, or exploring the evolution of one of the world's most innovative tech giants, this dataset is your go-to resource for clean, well-organized, and rich financial data.

    💡 Tips for Using the Dataset

    • Visualize Stock Trends: Plot daily prices to quickly understand stock movements.
    • Financial Analysis: Compare income, balance sheet, and cash flow data to spot financial trends and health.
    • Predictive Modeling: Use this dataset to build forecasting models and predict future performance.
    • Combine Data: Merge price data with financial statements to analyze relationships and uncover deeper insights (see the sketch below).
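
    A minimal pandas sketch of that last tip, using the file and column names listed in this description (date, fiscalDateEnding); treat it as a starting point rather than an official example.

    import pandas as pd

    prices = pd.read_csv("googl_daily_prices.csv", parse_dates=["date"]).sort_values("date")
    income = pd.read_csv("googl_income_statement.csv", parse_dates=["fiscalDateEnding"]).sort_values("fiscalDateEnding")

    # Attach the most recently reported quarter to each trading day.
    merged = pd.merge_asof(
        prices,
        income,
        left_on="date",
        right_on="fiscalDateEnding",
        direction="backward",
    )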

    🔗 Works Great with My GOOGL News Dataset!

    For a more comprehensive financial analysis, pair this dataset with my other Kaggle dataset:
    👉 Google (Alphabet Inc.) Daily News — 2000 to 2025

    That dataset includes:

    • Daily news articles from Finnhub
    • Headlines, summaries, sources, and timestamps
    • Covering GOOGL from 2000 to 2025

    Combining both datasets unlocks powerful analysis such as:

    • Correlating news sentiment with stock price movements
    • Studying the impact of earnings reports and product launches
    • Developing event-driven forecasting models

    Together, they give you everything you need for news + financial signal modeling.

    📝 Column Descriptions

    📈 googl_daily_prices.csv

    • date: Trading date.
    • 1. open: Opening stock price on the trading day.
    • 2. high: Highest stock price on the trading day.
    • 3. low: Lowest stock price on the trading day.
    • 4. close: Closing stock price on the trading day.
    • 5. volume: Number of shares traded on the day.

    📊 googl_income_statement.csv

    • fiscalDateEnding: Date marking the end of fiscal quarter.
    • reportedCurrency: Currency used in reporting (USD).
    • grossProfit: Revenue minus the cost of goods sold.
    • totalRevenue: Total income generated from operations.
    • costOfRevenue: Direct costs attributable to the production of goods.
    • costofGoodsAndServicesSold: Costs directly associated with goods sold.
    • operatingIncome: Earnings after operating expenses deducted.
    • sellingGeneralAndAdministrative: Administrative and general sales costs.
    • researchAndDevelopment: Expenses related to research and innovation.
    • operatingExpenses: Total operational costs.
    • investmentIncomeNet: Net income from investments.
    • netInterestIncome: Income earned from interest after deducting interest paid.
    • interestIncome: Income generated from interest-bearing investments.
    • interestExpense: Expenses from interest payments.
    • nonInterestIncome: Income from non-interest-bearing activities.
    • otherNonOperatingIncome: Additional income outside regular operations.
    • depreciation: Reduction in value of assets over time.
    • depreciationAndAmortization: Combined depreciation and amortization costs.
    • incomeBeforeTax: Income before taxation.
    • incomeTaxExpense: Taxes paid on earnings.
    • interestAndDebtExpense: Interest paid on debts.
    • netIncomeFromContinuingOperations: Profit from ongoing operations.
    • comprehensiveIncomeNetOfTax: Income after comprehensive expenses.
    • ebit: Earnings before interest and taxes.
    • ebitda: Earnings before interest, taxes, depreciation, and...

  12. Clean dirty containers in Montevideo

    • kaggle.com
    Updated Aug 21, 2021
    Cite
    Rodrigo Laguna (2021). Clean dirty containers in Montevideo [Dataset]. https://www.kaggle.com/rodrigolaguna/clean-dirty-containers-in-montevideo
    Available download formats
    zip (2862653769 bytes)
    Authors
    Rodrigo Laguna
    License
    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Montevideo
    Description

    Context

    It all started during the #StayAtHome period of the 2020 pandemic: some neighbors were worried about the trash around Montevideo's garbage containers.

    The goal is to automatically distinguish clean containers from dirty ones so that maintenance can be requested.

    Want to know more about the entire process? Check out this thread on how it began, and this other one about the version 6 update process.

    Content

    The data is split into independent training and testing sets. However, each split contains several near-duplicate images (typically the same container from different perspectives or days). Image sizes differ a lot.

    There are four major sources:

    • Images taken from Google Street View; they are 600x600 pixels, automatically collected through its API.
    • Images contributed by individual people, most of which I took myself.
    • Images taken from social networks (Twitter & Facebook) and news.
    • Images contributed by pormibarrio.uy - 17-11-2020

    Images were taken of green containers, the most popular kind in Montevideo, which are also widely used in some other cities.

    The current version (clean-dirty-garbage-containers-V6) is also available here, or you can download it as follows:

    wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1mdfJoOrO6MeTc3eMEjIDkAKlwK9bUFg6' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1/p')&id=1mdfJoOrO6MeTc3eMEjIDkAKlwK9bUFg6" -O clean-dirty-garbage-containers-V6.zip && rm -rf /tmp/cookies.txt

    This is especially useful if you want to download the dataset in Google Colab.

    This repo contains the code used during the dataset's construction and documentation, including the baselines for the proposed tasks.

    Dataset on news

    Since this is a hot topic in Montevideo, especially nowadays with elections next week, it has caught some attention from the local press:

    Acknowledgements

    Thanks to every single person who gave me images of their containers. Special thanks to my friend Diego, whose idea of using Google Street View as a source of data really helped grow the dataset. And finally to my wife, who supported me during this project and contributed a lot to this dataset.

    Citation

    If you use these data in a publication, presentation, or other research project or product, please use the following citation:

    Laguna, Rodrigo. 2021. Clean dirty containers in Montevideo - Version 6.1. url: https://www.kaggle.com/rodrigolaguna/clean-dirty-containers-in-montevideo

    @dataset{RLaguna-clean-dirty:2021,
    author = {Rodrigo Laguna},
    title = {Clean dirty containers in Montevideo},
    year = {2021},
    url = {https://www.kaggle.com/rodrigolaguna/clean-dirty-containers-in-montevideo},
    version = {6.1}
    }
    

    Contact

    I'm on Twitter, @ro_laguna_, or write to me at r.laguna.queirolo at outlook.com

    Future steps:

    • Add images from Mapillary, an open-source project similar to Google Street View.
    • Keep adding manually taken images.
    • Add any image from anyone who would like to contribute.
    • Develop & deploy a bot to automatically report container status.
    • Translate the docs to Spanish.
    • Crop images so that each image contains one and only one container, taking up most of the frame.

    Changelog

    • 19-05-2020: V1 - Initial version
    • 20-05-2020: V2 - Include more training samples
    • 12-09-2020: V3 - Include more training (+676) & testing (+64) samples:

      • train/clean from 574 to 1005 (+431)
      • train/dirty from 365 to 610 (+245)
      • test/clean from 100 to 128 (+28)
      • test/dirty from 100 to 136 (+36)
    • 21-12-2020: V4 - Include more training (+367) & testing (+794) samples, including ~400...

  13. Syrian people's interest in social media platforms

    • kaggle.com
    Updated Aug 4, 2021
    Cite
    Zualfekar Al Janzeer (2021). Syrian people's interest in social media platforms [Dataset]. https://www.kaggle.com/datasets/zualfiqaraljanzir/syrian-peoples-interest-in-social-media-platforms
    Available download formats
    zip (1226 bytes)
    Authors
    Zualfekar Al Janzeer
    Area covered
    Syria
    Description

    Context

    Syria is a third-world country and one of the countries that has only recently adopted technology (interest in the Internet, social media platforms, scientific research, etc.). I obtained this data through a survey I conducted on Facebook with Syrian citizens residing both inside and outside Syrian territory. The number of respondents is small; soon the survey will be more comprehensive and will be distributed through all available social media platforms.

    Content

    The survey ran for 10 days using Google Sheets, and 60 samples were obtained. Some answers were incomplete, which will force the researcher to clean the data.

    Acknowledgements

    We wouldn't be here without the help of others. If you owe any attributions or thanks, include them here along with any citations of past research.

    Inspiration

    I am still a beginner, so I would like to gain some experience here and improve my data analysis skills.

  14. KPMG Virtual Internship

    • kaggle.com
    Updated Jan 29, 2023
    Cite
    NEHA RAUTELA (2023). KPMG Virtual Internship [Dataset]. https://www.kaggle.com/datasets/neharautela/kpmg-virtual-internship
    Available download formats
    zip (2000950 bytes)
    Authors
    NEHA RAUTELA
    License
    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Introduction

    This project is a data analytics assignment to analyze the data of a KPMG customer, Sprocket Central Pty Ltd, which sells various brands of bicycles across four states of Australia. They had issues with stagnating sales and needed help with the following queries: • What are the trends in the underlying data? • Which customer segment has the highest customer value? • What do you propose should be Sprocket Central Pty Ltd's marketing and growth strategy? • What additional external datasets may be useful to obtain greater insights into customer preferences and propensity to purchase the products?

    Dataset

    The customer's dataset consisted of the following data. • Transactions: data on transactions in the year 2017, along with transaction ID, product ID, brand, product class, product size, transaction date, product cost, etc. • New Customer List and Customer Demographics: consisting of addresses, job industry, customer names, job title, gender, wealth segment, etc.

    Process

    The dataset was thoroughly cleaned and formatted in spreadsheets to resolve the following data inconsistencies.

    • Transactions Sheet - columns with issues: online order (empty), brand (empty), product size (empty), product class (empty), product line (empty), standard cost (empty), product first sold (empty)

    • Customer Demographic Sheet - columns with issues: gender (empty), DOB (inconsistent data), job industry category (empty)

    • Customer Address Sheet - columns with issues: states (abbreviations of states in place of state names)

    Analysis

    After thoroughly analyzing the cleaned data, attention was paid to the following major points to derive insights and improve the business strategy:
    • State-wise analysis to identify the states with maximum and minimum sales
    • Most-sold bikes by type (i.e., mountain bike, road, etc.)
    • Customers in different job industries
    • Customers in different age groups
    • Customers from different wealth segments

    Presentation

    Insights of the analysis are presented in the presentation below. https://docs.google.com/presentation/d/1ECUmK4rGncjPVrRexL4kWPPIOFjdoXIqJkegYtC_wrk/edit?usp=share_link

  15. Data from: Novel Corona Virus 2019 Dataset

    • kaggle.com
    Updated Jun 26, 2020
    Cite
    shivan kumar (2020). Novel Corona Virus 2019 Dataset [Dataset]. https://www.kaggle.com/shivan118/covid-19-world-jiteega
    Available download formats
    zip (455904 bytes)
    Authors
    shivan kumar
    License
    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Johns Hopkins University has made an excellent dashboard using the affected-case data. Data is extracted from the associated Google Sheets and made available here.

    This data is available as CSV files in the Johns Hopkins GitHub repository. Please refer to the GitHub repository for the Terms of Use details. It is uploaded here for use in Kaggle kernels and for getting insights from the broader DS community.

    Content

    2019 Novel Coronavirus (2019-nCoV) is a virus (more specifically, a coronavirus) identified as the cause of an outbreak of respiratory illness first detected in Wuhan, China. Early on, many of the patients in the outbreak in Wuhan, China reportedly had some link to a large seafood and animal market, suggesting animal-to-person spread. However, a growing number of patients reportedly have not had exposure to animal markets, indicating person-to-person spread is occurring. At this time, it’s unclear how easily or sustainably this virus is spreading between people - CDC

    This dataset has daily level information on the number of affected cases, deaths, and recovery from 2019 novel coronavirus. Please note that this is a time-series data and so the number of cases on any given day is the cumulative number.

    The data is available from 22 Jan 2020 to 28 May 2020.

    Column Description

    The main file in this dataset is covid_19_data_cleaned.csv; the detailed descriptions are below.

    covid_19_data_cleaned.csv

    • ObservationDate - Date of the observation in MM/DD/YYYY
    • Province/State - Province or state of the observation (Could be empty when missing)
    • Country/Region - Country of observation
    • Last Update - Time in UTC at which the row is updated for the given province or country. (Not standardized and so please clean before using it)
    • Confirmed - Cumulative number of confirmed cases till that date
    • Deaths - Cumulative number of deaths till that date
    • Active - Cumulative number of Active cases till that date
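
    Because the counts are cumulative, daily new cases have to be derived by differencing. A minimal pandas sketch using the columns listed above (the file name is taken from the description):

    import pandas as pd

    covid = pd.read_csv("covid_19_data_cleaned.csv", parse_dates=["ObservationDate"])
    covid = covid.sort_values("ObservationDate")

    # Difference the cumulative totals within each country/province series;
    # the first observation of each series is the count itself.
    grouped = covid.groupby(["Country/Region", "Province/State"], dropna=False)
    covid["NewConfirmed"] = grouped["Confirmed"].diff().fillna(covid["Confirmed"])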

  16. Covid19-India-Dataset

    • kaggle.com
    Updated Jun 1, 2020
    Cite
    Santhoshkumar (2020). Covid19-India-Dataset [Dataset]. https://www.kaggle.com/santhoshkumarv/covid19indiadata
    Available download formats
    zip (51146818 bytes)
    Authors
    Santhoshkumar
    Area covered
    India
    Description

    Context

    From World Health Organization - On 31 December 2019, WHO was alerted to several cases of pneumonia in Wuhan City, Hubei Province of China. The virus did not match any other known virus. This raised concern because when a virus is new, we do not know how it affects people.

    So daily level information on the affected people can give some interesting insights when it is made available to the broader data science community.

    Johns Hopkins University has made an excellent dashboard using the affected cases data. Data is extracted from the google sheets associated and made available here.

    Edited: the data is now available as CSV files in the Johns Hopkins GitHub repository. Please refer to the GitHub repository for the Terms of Use details. It is uploaded here for use in Kaggle kernels and for getting insights from the broader DS community.

    Content

    2019 Novel Coronavirus (2019-nCoV) is a virus (more specifically, a coronavirus) identified as the cause of an outbreak of respiratory illness first detected in Wuhan, China. Early on, many of the patients in the outbreak in Wuhan, China reportedly had some link to a large seafood and animal market, suggesting animal-to-person spread. However, a growing number of patients reportedly have not had exposure to animal markets, indicating person-to-person spread is occurring. At this time, it’s unclear how easily or sustainably this virus is spreading between people - CDC

    This dataset has daily level information on the number of affected cases, deaths and recovery from 2019 novel coronavirus. Please note that this is a time series data and so the number of cases on any given day is the cumulative number.

    The data is available from 22 Jan, 2020.

    Column Description

    • Province/State - Province or state of the observation (Could be empty when missing)
    • CountryReg - Country of observation
    • Last Update - Time in UTC at which the row is updated for the given province or country (Not standardised, so please clean before using it)
    • Confirmed - Cumulative number of confirmed cases till that date
    • Deaths - Cumulative number of deaths till that date
    • Recovered - Cumulative number of recovered cases till that date
    • Lon - Longitude
    • Lat - Latitude
    • week - Week Number (1 to 52)
    • Weeks Per Year

    Acknowledgements

    Johns Hopkins University for making the data available for educational and academic research purposes.

    Inspiration

    Your data will be in front of the world's largest data science community. What questions do you want to see answered?

  17. Anime Database for Recommendation system

    • kaggle.com
    Updated Jun 20, 2020
    Cite
    vishal mane (2020). Anime Database for Recommendation system [Dataset]. https://www.kaggle.com/vishalmane109/anime-recommendations-database
    Available download formats
    zip (3705416 bytes)
    Authors
    vishal mane
    License
    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    This dataset contains a total of 16,737 unique anime. The reason for creating this dataset was the need for a clean anime dataset. I found a few datasets on anime; most of them covered the major titles, but some 1) do not have the 'Genre' or 'Synopsis' of each anime (for content-based recommendation, it is helpful to have more information about an anime), 2) contain duplicate data, or 3) represent missing data with different notations.

    Content

    • Anime_id - anime ID (as per myanimelist.net)
    • Title - name of the anime
    • Genre - main genre
    • Synopsis - brief description
    • Type
    • Producer
    • Studio
    • Rating - rating of the anime as per myanimelist.net
    • ScoredBy - total number of users who scored the given anime
    • Popularity - rank of the anime based on popularity
    • Members - number of members who added the given anime to their list
    • Episodes - number of episodes
    • Source
    • Aired
    • Link

    Acknowledgements

    This dataset is a combination of 2 datasets

    1. https://docs.google.com/spreadsheets/d/1brguO5nGfXS-Fr1Xcf3pqPTQoBUPGLTYM_EMAA9yJFw/export?format=csv&id=1brguO5nGfXS-Fr1Xcf3pqPTQoBUPGLTYM_EMAA9yJFw&gid=0
    2. https://www.kaggle.com/CooperUnion/anime-recommendations-database
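
    A rough sketch of combining the two sources and removing duplicates is below; the file names and missing-value markers are assumptions, and the two sources' columns may first need to be renamed to a shared schema.

    import pandas as pd

    sheet = pd.read_csv("anime_from_google_sheet.csv")
    kaggle = pd.read_csv("anime_recommendations_database.csv")

    combined = pd.concat([sheet, kaggle], ignore_index=True)

    # Normalise the different notations used for missing data, then deduplicate.
    combined = combined.replace({"Unknown": pd.NA, "N/A": pd.NA})
    combined = combined.drop_duplicates(subset="Anime_id", keep="first")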

  18. Chicken Republic Lagos Sales Dataset –

    • kaggle.com
    Updated May 31, 2025
    Cite
    Fatolu Peter (2025). Chicken Republic Lagos Sales Dataset – [Dataset]. https://www.kaggle.com/datasets/olagokeblissman/chicken-republic-lagos-sales-dataset
    Available download formats
    zip (132062 bytes)
    Authors
    Fatolu Peter
    License
    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Area covered
    Lagos
    Description

    Chicken Republic Lagos Sales Dataset – Fast Food Sales Analysis (NG)

    📝 Dataset Overview: This dataset captures real-world retail transaction data from Chicken Republic outlets in Lagos, Nigeria. It provides detailed insights into fast food sales performance across different product categories, with columns that track revenue, quantity sold, and profit.

    Ideal for anyone looking to:

    Practice sales analysis

    Build business intelligence dashboards

    Forecast product performance

    Analyze profit margins and pricing

    🔍 Dataset Features:

    • Date - Date of each transaction
    • Location - Outlet or branch where the sale occurred
    • Product Category - Category of the product sold (e.g., Meals, Drinks, Snacks)
    • Product - Name of the specific product
    • Quantity Sold - Number of units sold
    • Unit Price (NGN) - Price per unit in Nigerian Naira
    • Total Sales (NGN) - Quantity × Unit Price
    • Profit (NGN) - Estimated profit from the sale
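
    A minimal pandas sketch of checking the derived Total Sales column and summarising profit by category, assuming the CSV uses the column names above (the file name is a placeholder):

    import pandas as pd

    sales = pd.read_csv("chicken_republic_lagos_sales.csv", parse_dates=["Date"])

    # Total Sales should equal Quantity Sold x Unit Price.
    expected = sales["Quantity Sold"] * sales["Unit Price (NGN)"]
    mismatches = sales[(sales["Total Sales (NGN)"] - expected).abs() > 0.01]

    # Revenue, profit, and margin by product category, highest margin first.
    by_category = sales.groupby("Product Category")[["Total Sales (NGN)", "Profit (NGN)"]].sum()
    by_category["Margin"] = by_category["Profit (NGN)"] / by_category["Total Sales (NGN)"]
    print(by_category.sort_values("Margin", ascending=False))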

    🎯 Use Cases: Build Power BI dashboards with slicers and filters by product category

    Perform profitability analysis per outlet

    Create forecast models to predict sales

    Analyze customer preferences based on high-selling items

    Create data storytelling visuals for retail presentations

    🛠 Tools You Can Use: Excel / Google Sheets

    Power BI / Tableau

    Python (Pandas, Matplotlib, Seaborn)

    SQL for querying sales trends

    👤 Creator: Fatolu Peter (Emperor Analytics) Working actively on real-world retail, healthcare, and social media analytics. This dataset is part of my ongoing data project series (#Project 9 and counting!) 🚀

    ✅ LinkedIn Post: 🚨 New Dataset Drop for Analysts & BI Enthusiasts 📊 Chicken Republic Lagos Sales Dataset – Now on Kaggle! 🔗 Access here

    Whether you’re a student, analyst, or business developer—this dataset gives you a clean structure for performing end-to-end sales analysis:

    ✅ Track daily sales ✅ Visualize profit by product category ✅ Create Power BI dashboards ✅ Forecast best-selling items

    Columns include: Date | Location | Product | Quantity Sold | Unit Price | Total Sales | Profit

    Built with love from Lagos 🧡 Let’s drive real insights with real data. Tag me if you build something amazing—I’d love to see it!

    #SalesAnalytics #ChickenRepublic #PowerBI #RetailData #KaggleDataset #NigerianBusiness #BusinessIntelligence #FatoluPeter #EmperorAnalytics #Project9 #DataForPractice
