13 datasets found
  1. SPORTS_DATA_ANALYSIS_ON_EXCEL

    • kaggle.com
    zip
    Updated Dec 12, 2024
    Cite
    Nil kamal Saha (2024). SPORTS_DATA_ANALYSIS_ON_EXCEL [Dataset]. https://www.kaggle.com/datasets/nilkamalsaha/sports-data-analysis-on-excel
    Explore at:
    Available download formats
    zip (1203633 bytes)
    Dataset updated
    Dec 12, 2024
    Authors
    Nil kamal Saha
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    PROJECT OBJECTIVE

    We are part of XYZ Co Pvt Ltd, a company in the business of organizing sports events at the international level. Countries nominate sportsmen from different departments, and our team has been given the responsibility of systematizing the membership roster and generating different reports as per business requirements.

    Questions (KPIs)

    TASK 1: STANDARDIZING THE DATASET

    • Populate the FULLNAME consisting of the following fields ONLY, in the prescribed format: PREFIX FIRSTNAME LASTNAME (Note: all UPPERCASE).
    • Get the COUNTRY NAME to which these sportsmen belong. Use the LOCATION sheet to get the required data.
    • Populate the LANGUAGE_SPOKEN by the sportsmen. Use the LOCATION sheet to get the required data.
    • Generate the EMAIL ADDRESS for members who speak English in the prescribed format lastname.firstname@xyz.org (Note: all lowercase); for all other members, the format should be lastname.firstname@xyz.com (Note: all lowercase).
    • Populate the SPORT LOCATION of the sport played by each player. Use the SPORT sheet to get the required data.
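
    The FULLNAME and EMAIL ADDRESS rules in Task 1 can be sketched outside Excel as follows. This is only an illustration: the LOCATION lookup, its keys, and the sample member are hypothetical stand-ins for the workbook's sheets, and the .org address is read as lastname.firstname@xyz.org.

```python
def full_name(prefix: str, first: str, last: str) -> str:
    """Build FULLNAME as 'PREFIX FIRSTNAME LASTNAME', all uppercase."""
    return " ".join(part.strip() for part in (prefix, first, last)).upper()


def email_address(first: str, last: str, language: str) -> str:
    """English speakers get the .org domain; all other members get .com."""
    domain = "xyz.org" if language.strip().lower() == "english" else "xyz.com"
    return f"{last.strip()}.{first.strip()}@{domain}".lower()


# Hypothetical lookup standing in for the LOCATION sheet:
# location id -> (country name, language spoken).
LOCATION = {"L01": ("India", "Hindi"), "L02": ("Australia", "English")}

# Hypothetical member row standing in for the SPORTSMEN sheet.
member = {"prefix": "Mr.", "first": "Ravi", "last": "Kumar", "location_id": "L01"}

country, language = LOCATION[member["location_id"]]
print(full_name(member["prefix"], member["first"], member["last"]))  # MR. RAVI KUMAR
print(email_address(member["first"], member["last"], language))      # kumar.ravi@xyz.com
```

    The dictionary lookup plays the role that a lookup formula against the LOCATION sheet would play in the workbook.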

    TASK 2: DATA FORMATTING

    • Display MEMBER ID as a 3-digit number at all times (Note: 001, 002, ..., 020, etc.)
    • Format the BIRTHDATE as dd mmm'yyyy (Prescribed format example: 09 May' 1986)
    • Display the units for the WEIGHT column (Prescribed format example: 80 kg)
    • Format the SALARY to show the data in thousands. If SALARY is less than 100,000, display the data with 2 decimal places; otherwise display it with one decimal place. In both cases the units should be thousands (k), e.g. 87670 -> 87.67 k and 12 250 -> 123.2 k
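
    As a rough cross-check of the Task 2 display rules, the same formatting can be written as small functions. This is a sketch under assumptions: the function names are made up for illustration, and the birthdate follows the dd mmm'yyyy pattern literally (no space after the apostrophe).

```python
from datetime import date

# Fixed English month abbreviations, so the result does not depend on the locale.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def format_member_id(member_id: int) -> str:
    """Always show three digits, e.g. 7 -> '007'."""
    return f"{member_id:03d}"


def format_birthdate(d: date) -> str:
    """Render as dd mmm'yyyy."""
    return f"{d.day:02d} {MONTHS[d.month - 1]}'{d.year}"


def format_weight(kg: float) -> str:
    """Append the unit to the weight."""
    return f"{kg:g} kg"


def format_salary(salary: float) -> str:
    """Show thousands: 2 decimals below 100,000, otherwise 1 decimal."""
    decimals = 2 if salary < 100_000 else 1
    return f"{salary / 1000:.{decimals}f} k"


print(format_member_id(7))                  # 007
print(format_birthdate(date(1986, 5, 9)))   # 09 May'1986
print(format_weight(80))                    # 80 kg
print(format_salary(87670))                 # 87.67 k
print(format_salary(150000))                # 150.0 k
```

    In Excel itself these rules would normally be applied as display formats rather than by changing the stored values; the functions above only mirror the expected output.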

    TASK 3: SUMMARIZE DATA - PIVOT TABLE (Use SPORTSMEN worksheet after attempting TASK 1)

    • Create a PIVOT table in the worksheet ANALYSIS, starting at cell B3, with the following details:

    • In COLUMNS; Group: GENDER.
    • In ROWS; Group: COUNTRY (Note: use COUNTRY NAMES).
    • In VALUES; calculate the count of candidates from each COUNTRY and GENDER type. Remove GRAND TOTALs.
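
    The pivot described in Task 3 is a count of members per COUNTRY and GENDER. A minimal sketch of that cross-tabulation, using hypothetical sample rows in place of the SPORTSMEN worksheet:

```python
from collections import Counter

# Hypothetical rows standing in for the SPORTSMEN worksheet.
sportsmen = [
    {"country": "India", "gender": "Male"},
    {"country": "India", "gender": "Female"},
    {"country": "India", "gender": "Male"},
    {"country": "Australia", "gender": "Female"},
]

# Count members for every (country, gender) pair.
counts = Counter((row["country"], row["gender"]) for row in sportsmen)
countries = sorted({country for country, _ in counts})
genders = sorted({gender for _, gender in counts})

# Rows = COUNTRY, columns = GENDER, values = count; no grand totals are added.
table = {c: {g: counts.get((c, g), 0) for g in genders} for c in countries}

print("COUNTRY".ljust(12) + "".join(g.ljust(8) for g in genders))
for c in countries:
    print(c.ljust(12) + "".join(str(table[c][g]).ljust(8) for g in genders))
```

    The summary table requested in Task 4 asks for the same counts, built there with worksheet functions instead of a pivot table.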

    TASK 4: SUMMARIZE DATA - EXCEL FUNCTIONS (Use SPORTSMEN worksheet after attempting TASK 1)

    • Create a SUMMARY table in the worksheet ANALYSIS, starting at cell G4, with the following details:

    • Starting from cell H4, get the distinct GENDER values. Use the Remove Duplicates option and transpose the data.
    • Starting from cell G5, get the distinct COUNTRY values (Note: use COUNTRY NAMES).
    • In the cross table, get the count of candidates from each COUNTRY and GENDER type.

    TASK 5: GENERATE REPORT - PIVOT TABLE (Use SPORTSMEN worksheet after attempting TASK 1)

    • Create a PIVOT table report in the worksheet REPORT, starting at cell A3, with the following information:

    • Change the report layout to TABULAR form.
    • Remove expand and collapse buttons.
    • Remove GRAND TOTALs.
    • Allow user to filter the data by SPORT LOCATION.

    Process

    • Verified the data for missing values and anomalies, and resolved them.
    • Made sure the data is consistent and clean with respect to data type, data format and values used.
    • Created pivot tables according to the questions asked.
  2. VLOOKUP & PIVOT TABLE

    • kaggle.com
    zip
    Updated May 3, 2023
    Cite
    Derrick Mallison (2023). VLOOKUP & PIVOT TABLE [Dataset]. https://www.kaggle.com/datasets/derrickmallison/vlookup-and-pivot-table
    Explore at:
    Available download formats
    zip (58855 bytes)
    Dataset updated
    May 3, 2023
    Authors
    Derrick Mallison
    Description

    Dataset

    This dataset was created by Derrick Mallison

    Contents

  3. Europe Bike Store Sales

    • kaggle.com
    zip
    Updated Mar 21, 2023
    Cite
    PrepInsta Technologies (2023). Europe Bike Store Sales [Dataset]. https://www.kaggle.com/datasets/prepinstaprime/europe-bike-store-sales/versions/1
    Explore at:
    Available download formats
    zip (1209546 bytes)
    Dataset updated
    Mar 21, 2023
    Authors
    PrepInsta Technologies
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Europe
    Description

    Using the Europe bike store dataset, extract insights into sales for each country, and for each state within those countries, using Excel.

  4. HUD FHA Single Family Portfolio Snapshot

    • datalumos.org
    • openicpsr.org
    Updated Feb 20, 2025
    Cite
    United States Department of Housing and Urban Development (2025). HUD FHA Single Family Portfolio Snapshot [Dataset]. http://doi.org/10.3886/E220223V2
    Explore at:
    Dataset updated
    Feb 20, 2025
    Dataset authored and provided by
    United States Department of Housing and Urban Development (http://www.hud.gov/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Feb 2010 - Nov 2024
    Area covered
    United States of America
    Description

    The Single-Family Portfolio Snapshot consists of a monthly data table and a report generator (Excel pivot table) that can be used to quickly create new reports of interest to the user from the data records. The data records themselves are loan-level records using all of the categorical variables highlighted on the report generator table. Users may download and save the Excel file that contains the data records and the pivot table.

    The report generator sheet consists of an Excel pivot table that gives individual users some ability to analyze monthly trends on dimensions of interest to them. There are six choice dimensions: property state, property county, loan purpose, loan type, property product type, and downpayment source.

    Each report generator selection variable has an associated drop-down menu that is accessed by clicking once on the associated arrows. Only single selections can be made from each menu. For example, users must choose one state or all states, one county or all counties. If a county is chosen that does not correspond with the selected state, the result will be null values.

    The data records include each report generator choice variable plus the property zip code, originating mortgagee (lender) number, sponsor-lender name, sponsor number, nonprofit gift provider tax identification number, interest rate, and FHA insurance endorsement year and month. The report generator only provides output for the dollar amount of loans. Users who desire to analyze other data that are available on the data table, for example, interest rates or sponsor number, must first download the Excel file. See the data definitions (PDF in top folder) for details on each data element.

    Files switch from .zip to Excel in August 2017.

  5. Ambulatory Surgery - Characteristics by Facility (Pivot Profile)

    • data.chhs.ca.gov
    • data.ca.gov
    • +2 more
    .xlsx, xlsx, zip
    Updated Nov 6, 2025
    + more versions
    Cite
    Department of Health Care Access and Information (2025). Ambulatory Surgery - Characteristics by Facility (Pivot Profile) [Dataset]. https://data.chhs.ca.gov/dataset/ambulatory-surgery-characteristics-by-facility-pivot-profile
    Explore at:
    Available download formats
    xlsx(994170), xlsx, zip, xlsx(1016405), xlsx(1048616), xlsx(1029956), xlsx(996303), xlsx(1053446), .xlsx(993946)
    Dataset updated
    Nov 6, 2025
    Dataset authored and provided by
    Department of Health Care Access and Information
    Description

    This dataset contains annual Excel pivot tables that display summaries of the patients treated in each hospital-based and freestanding Ambulatory Surgery Clinic licensed by the California Department of Public Health (CDPH). The summary data includes discharge disposition, expected payer, preferred language spoken, age groups, race groups, sex, principal diagnosis groups, principal procedure groups, and principal external cause of injury/morbidity groups. The data can also be summarized statewide or for a specific facility county, type of control, and/or type of license (hospital or clinic). Note: Physician-owned ambulatory surgery clinics do not report their data to HCAI and, therefore, are not included in the statewide frequencies.

  6. Road Traffic Accident Data

    • data.wu.ac.at
    csv, html, json
    Updated Aug 24, 2018
    + more versions
    Cite
    Calderdale Council (2018). Road Traffic Accident Data [Dataset]. https://data.wu.ac.at/schema/data_gov_uk/MDUzYTY1MjktNmM4Yy00MmFjLWFlMWUtNDU1YjI3MDhlNTM1
    Explore at:
    Available download formats
    json, html, csv
    Dataset updated
    Aug 24, 2018
    Dataset provided by
    Calderdale Council
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    Information on accident casualties across Calderdale. Data includes location, number of people and vehicles involved, road surface, weather conditions and severity of any casualties.

    Please note

    • The Eastings and Northings are generated at the roadside where the accident occurred. Sometimes, due to poor internet connectivity, this data may not be as accurate as it could be. If you notice any errors please contact accident.studies@leeds.gov.uk.

    Due to the format of the report a number of figures in the columns are repeated, these are:

    • Reference Number
    • Easting
    • Northing
    • Number of Vehicles
    • Accident Date
    • Time (24hr)
    • 1st Road Class
    • Road Surface
    • Lighting Conditions
    • Weather Conditions

    Reference Number | Grid Ref: Easting | Grid Ref: Northing | Number of vehicles | Accident Date | Time (24hr)
    21G0539          | 427798            | 426248             | 5                  | 16/01/2015    | 1205
    21G0539          | 427798            | 426248             | 5                  | 16/01/2015    | 1205
    21G1108          | 431142            | 430087             | 1                  | 16/01/2015    | 1732
    21H0565          | 434602            | 436699             | 1                  | 17/01/2015    | 930
    21H0638          | 434254            | 434318             | 2                  | 17/01/2015    | 1315
    21H0638          | 434254            | 434318             | 2                  | 17/01/2015    | 1315

    Therefore the number of vehicles involved in accident 21G0539 was 5, and in accident 21H0638 was 2. Overall, in the example above, a total of 9 vehicles were involved in accidents.
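
    Because the repeated rows would otherwise be double-counted, the total has to be taken once per Reference Number. A small sketch of that de-duplication, using the six example rows above (the tuple layout is just an illustrative choice):

```python
# (reference, easting, northing, vehicles, accident date, time) from the example above.
rows = [
    ("21G0539", 427798, 426248, 5, "16/01/2015", 1205),
    ("21G0539", 427798, 426248, 5, "16/01/2015", 1205),
    ("21G1108", 431142, 430087, 1, "16/01/2015", 1732),
    ("21H0565", 434602, 436699, 1, "17/01/2015", 930),
    ("21H0638", 434254, 434318, 2, "17/01/2015", 1315),
    ("21H0638", 434254, 434318, 2, "17/01/2015", 1315),
]

# Keep one vehicle count per accident reference; repeated rows overwrite the same key.
vehicles_by_accident = {}
for reference, _easting, _northing, vehicles, _date, _time in rows:
    vehicles_by_accident[reference] = vehicles

total_vehicles = sum(vehicles_by_accident.values())
print(vehicles_by_accident)
print(total_vehicles)  # 9
```

    Summing the raw column instead would give 16, since accidents 21G0539 and 21H0638 each appear twice.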

    Excel pivot tables are a useful tool for analysing the data: they help summarise large amounts of data in an easy-to-view table. For further information on pivot tables visit here.

  7. Road traffic accidents

    • data.europa.eu
    • gimi9.com
    csv
    + more versions
    Cite
    Leeds City Council, Road traffic accidents [Dataset]. https://data.europa.eu/data/datasets/road-traffic-accidents?locale=da
    Explore at:
    Available download formats
    csv
    Dataset authored and provided by
    Leeds City Council
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    Information on accidents across Leeds. Data includes location, number of people and vehicles involved, road surface, weather conditions and severity of any casualties.

    Please note

    • The Eastings and Northings are generated at the roadside where the accident occurred. Sometimes, due to poor internet connectivity, this data may not be as accurate as it could be. If you notice any errors please contact accident.studies@leeds.gov.uk.

    Due to the format of the report a number of figures in the columns are repeated, these are:

    • Reference Number
    • Easting
    • Northing
    • Number of Vehicles
    • Accident Date
    • Time (24hr)
    • 1st Road Class
    • Road Surface
    • Lighting Conditions
    • Weather Conditions

    Reference Number | Grid Ref: Easting | Grid Ref: Northing | Number of vehicles | Accident Date | Time (24hr)
    21G0539          | 427798            | 426248             | 5                  | 16/01/2015    | 1205
    21G0539          | 427798            | 426248             | 5                  | 16/01/2015    | 1205
    21G1108          | 431142            | 430087             | 1                  | 16/01/2015    | 1732
    21H0565          | 434602            | 436699             | 1                  | 17/01/2015    | 930
    21H0638          | 434254            | 434318             | 2                  | 17/01/2015    | 1315
    21H0638          | 434254            | 434318             | 2                  | 17/01/2015    | 1315

    Therefore the number of vehicles involved in accident 21G0539 was 5, and in accident 21H0638 was 2. Overall, in the example above, a total of 9 vehicles were involved in accidents.

    Excel pivot tables are a useful tool for analysing the data: they help summarise large amounts of data in an easy-to-view table. For further information on pivot tables visit here.

    Further Information

    • Please see the guidance document for further information on categories.
  8. Scooter Sales - Excel Project

    • kaggle.com
    Updated Jun 8, 2023
    Cite
    Ann Truong (2023). Scooter Sales - Excel Project [Dataset]. https://www.kaggle.com/datasets/bvanntruong/scooter-sales-excel-project
    Explore at:
    Dataset updated
    Jun 8, 2023
    Dataset provided by
    Kaggle
    Authors
    Ann Truong
    Description

    The link for the Excel project to download can be found on GitHub here. It includes the raw data, Pivot Tables, and an interactive dashboard with Pivot Charts and Slicers. The project also includes business questions and the formulas I used to answer them. An image of the business questions is included for ease: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F12904052%2F61e460b5f6a1fa73cfaaa33aa8107bd5%2FBusinessQuestions.png?generation=1686190703261971&alt=media

    The link for the Tableau adjusted dashboard can be found here.

    A screenshot of the interactive Excel dashboard is also included for ease: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F12904052%2Fe581f1fce8afc732f7823904da9e4cce%2FScooter%20Dashboard%20Image.png?generation=1686190815608343&alt=media

  9. Checklist derived from plant species of Benin downloaded from GBIF site

    • gbif.org
    Updated May 9, 2019
    Cite
    Jean GANGLO; Jean GANGLO (2019). Checklist derived from plant species of Benin downloaded from GBIF site [Dataset]. http://doi.org/10.15468/mid3vk
    Explore at:
    Dataset updated
    May 9, 2019
    Dataset provided by
    Global Biodiversity Information Facility (https://www.gbif.org/)
    Laboratory of Forest Sciences (University of Abomey-Calavi)
    Authors
    Jean GANGLO; Jean GANGLO
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1780 - Jan 11, 2019
    Area covered
    Description

    Plant species collected throughout Benin were published on the GBIF site. We downloaded the data on those species from GBIF and used an Excel dynamic pivot table to derive the checklist of plant species of Benin from the downloaded dataset.

  10. Coffee Shop Sales Analysis

    • kaggle.com
    Updated Apr 25, 2024
    Cite
    Monis Amir (2024). Coffee Shop Sales Analysis [Dataset]. https://www.kaggle.com/datasets/monisamir/coffee-shop-sales-analysis
    Explore at:
    Dataset updated
    Apr 25, 2024
    Dataset provided by
    Kaggle
    Authors
    Monis Amir
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Analyzing Coffee Shop Sales: Excel Insights 📈

    In my first data analytics project, I explore the secrets of a fictional coffee shop's success through data-driven analysis. By analyzing a 5-sheet Excel dataset, I've uncovered valuable sales trends, customer preferences, and insights that can guide future business decisions. 📊☕

    DATA CLEANING 🧹

    • REMOVED DUPLICATES OR IRRELEVANT ENTRIES: Thoroughly eliminated duplicate records and irrelevant data to refine the dataset for analysis.

    • FIXED STRUCTURAL ERRORS: Rectified any inconsistencies or structural issues within the data to ensure uniformity and accuracy.

    • CHECKED FOR DATA CONSISTENCY: Verified the integrity and coherence of the dataset by identifying and resolving any inconsistencies or discrepancies.

    DATA MANIPULATION 🛠️

    • UTILIZED LOOKUPS: Used Excel's lookup functions for efficient data retrieval and analysis.

    • IMPLEMENTED INDEX MATCH: Leveraged the Index Match function to perform advanced data searches and matches.

    • APPLIED SUMIFS FUNCTIONS: Utilized SumIFs to calculate totals based on specified criteria.

    • CALCULATED PROFITS: Used relevant formulas and techniques to determine profit margins and insights from the data.

    PIVOTING THE DATA 𝄜

    • CREATED PIVOT TABLES: Utilized Excel's PivotTable feature to pivot the data for in-depth analysis.

    • FILTERED DATA: Utilized pivot tables to filter and analyze specific subsets of data, enabling focused insights. Specially used in “PEAK HOURS” and “TOP 3 PRODUCTS” charts.

    VISUALIZATION 📊

    • KEY INSIGHTS: Unveiled the grand total sales revenue while also analyzing the average bill per person, offering comprehensive insights into the coffee shop's performance and customer spending habits.

    • SALES TREND ANALYSIS: Used Line chart to compute total sales across various time intervals, revealing valuable insights into evolving sales trends.

    • PEAK HOUR ANALYSIS: Leveraged Clustered Column chart to identify peak sales hours, shedding light on optimal operating times and potential staffing needs.

    • TOP 3 PRODUCTS IDENTIFICATION: Utilized Clustered Bar chart to determine the top three coffee types, facilitating strategic decisions regarding inventory management and marketing focus.

    *I also used a Timeline to visualize chronological data trends and identify key patterns over specific times.

    While it's a significant milestone for me, I recognize that there's always room for growth and improvement. Your feedback and insights are invaluable to me as I continue to refine my skills and tackle future projects. I'm eager to hear your thoughts and suggestions on how I can make my next endeavor even more impactful and insightful.

    THANKS TO: WsCube Tech, Mo Chen, Alex Freberg

    TOOLS USED: Microsoft Excel

    #DataAnalytics #DataAnalyst #ExcelProject #DataVisualization #BusinessIntelligence #SalesAnalysis #DataAnalysis #DataDrivenDecisions

  11. GP Practice Prescribing Presentation-level Data - July 2014

    • digital.nhs.uk
    csv, zip
    Updated Oct 31, 2014
    + more versions
    Cite
    (2014). GP Practice Prescribing Presentation-level Data - July 2014 [Dataset]. https://digital.nhs.uk/data-and-information/publications/statistical/practice-level-prescribing-data
    Explore at:
    Available download formats
    csv (1.4 GB), zip (257.7 MB), csv (1.7 MB), csv (275.8 kB)
    Dataset updated
    Oct 31, 2014
    License

    https://digital.nhs.uk/about-nhs-digital/terms-and-conditions

    Time period covered
    Jul 1, 2014 - Jul 31, 2014
    Area covered
    United Kingdom
    Description

    Warning: Large file size (over 1GB). Each monthly data set is large (over 4 million rows), but can be viewed in standard software such as Microsoft WordPad (save by right-clicking on the file name and selecting 'Save Target As', or equivalent on Mac OSX). It is then possible to select the required rows of data and copy and paste the information into another software application, such as a spreadsheet. Alternatively, add-ons to existing software, such as the Microsoft PowerPivot add-on for Excel, can be used to handle larger data sets. The Microsoft PowerPivot add-on for Excel is available from Microsoft: http://office.microsoft.com/en-gb/excel/download-power-pivot-HA101959985.aspx

    Once PowerPivot has been installed, follow the instructions below to load the large files. Note that it may take at least 20 to 30 minutes to load one monthly file.

    1. Start Excel as normal
    2. Click on the PowerPivot tab
    3. Click on the PowerPivot Window icon (top left)
    4. In the PowerPivot Window, click on the "From Other Sources" icon
    5. In the Table Import Wizard, scroll to the bottom and select Text File
    6. Browse to the file you want to open and choose the file extension you require, e.g. CSV

    Once the data has been imported you can view it in a spreadsheet.

    What does the data cover?

    General practice prescribing data is a list of all medicines, dressings and appliances that are prescribed and dispensed each month. A record will only be produced when this has occurred and there is no record for a zero total. For each practice in England, the following information is presented at presentation level for each medicine, dressing and appliance (by presentation name):

    • the total number of items prescribed and dispensed
    • the total net ingredient cost
    • the total actual cost
    • the total quantity

    The data covers NHS prescriptions written in England and dispensed in the community in the UK. Prescriptions written in England but dispensed outside England are included. The data includes prescriptions written by GPs and other non-medical prescribers (such as nurses and pharmacists) who are attached to GP practices.

    GP practices are identified only by their national code, so an additional data file, linked to the first by the practice code, provides further detail in relation to the practice. Presentations are identified only by their BNF code, so an additional data file, linked to the first by the BNF code, provides the chemical name for that presentation.
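
    For files of this size, one alternative to loading everything into a spreadsheet is to stream the CSV row by row and keep only running totals. The sketch below is illustrative only: the column names (PRACTICE, BNF_CODE, ITEMS) and the sample rows are assumptions, not the published file layout.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample standing in for one monthly file; the header names
# and the code values are made up for illustration.
sample = io.StringIO(
    "PRACTICE,BNF_CODE,ITEMS\n"
    "P0001,BNF001,2\n"
    "P0001,BNF002,1\n"
    "P0002,BNF001,4\n"
)


def total_items_by_practice(lines):
    """Sum the ITEMS column per practice code, reading one row at a time."""
    totals = defaultdict(int)
    for row in csv.DictReader(lines):
        totals[row["PRACTICE"]] += int(row["ITEMS"])
    return dict(totals)


totals = total_items_by_practice(sample)
print(totals)  # {'P0001': 3, 'P0002': 4}
```

    For a real file, the same function can be given an open file handle, for example one created with open(path, newline=""), so the rows never need to fit in memory at once.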

  12. Stream Temperature - Gwynns Falls at Gwynnbrook (GFGB) Water year 2006-2007

    • portal.edirepository.org
    • search.dataone.org
    csv
    Updated Apr 1, 2006
    + more versions
    Cite
    Emma Noonan (2006). Stream Temperature - Gwynns Falls at Gwynnbrook (GFGB) Water year 2006-2007 [Dataset]. http://doi.org/10.6073/pasta/f3575da80207f7003c4987ff3901d39b
    Explore at:
    Available download formats
    csv (10 kilobyte)
    Dataset updated
    Apr 1, 2006
    Dataset provided by
    EDI
    Authors
    Emma Noonan
    Time period covered
    Oct 1, 2006 - Sep 30, 2007
    Area covered
    Description

    Metadata for BES site - Stream Temperature: Gwynns Falls at Gwynnbrook (GFGB)

    In the Baltimore urban long-term ecological research (LTER) project (Baltimore Ecosystem Study, BES), we use the watershed approach to evaluate integrated ecosystem function. The LTER research is centered on the Gwynns Falls watershed, a 17,150 ha catchment that traverses a gradient from the urban core of Baltimore, through older urban residential (1900 - 1950) and suburban (1950 - 1980) zones, rapidly suburbanizing areas and a rural/suburban fringe.

    Stream temperature is continuously measured throughout the Gwynns Falls watershed along with supplemental sites around Baltimore County/City. A total of 22 sites contain sensors (HOBO Pro v2 Water Temperature Data Logger - U22-001) that take an instantaneous temperature reading every 2 minutes. These data are downloaded on a monthly basis.

    This dataset is for the Gwynns Falls at Gwynnbrook/Delight. This site samples drainage from approximately 1,000 ha of old and new suburban and suburbanizing land use.

    A detailed description of this site is posted at: http://md.water.usgs.gov/BES/01589197/

    Streamflow data for this site are posted at: http://waterdata.usgs.gov/md/nwis/nwisman?site_no=01589197

    Purpose: Long-term monitoring of stream temperature in a suburban catchment.

    Theme keywords: stream, watershed, temperature, suburban, Baltimore Ecosystem Study

    Coordinates (Lat/Long): 39.4430 (39 26 35), -76.7834 (-76 47 00)

    Review process for BES stream temperature data:

    • Raw data were recorded and logged every 2 minutes using the HOBO Pro v2 Water Temperature Data Logger - U22-001.
    • Data were exported into Microsoft Excel documents, then organized by site and by month.
    • Each month's data were entered into a pivot table in Microsoft Excel, and daily means and counts of daily data points were calculated.
    • Sites in close geographic proximity were plotted on the same graph to illustrate possible outlier data.
    • Missing and odd data were flagged, and notes taken from the field visits are provided where applicable.
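
    The daily means and counts in the review process can be reproduced without a pivot table by grouping readings on their calendar date. A minimal sketch, assuming a hypothetical list of (timestamp, temperature) pairs with made-up values:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical 2-minute readings; timestamps and temperatures are invented.
readings = [
    ("2006-10-01 00:00", 14.2),
    ("2006-10-01 00:02", 14.0),
    ("2006-10-01 00:04", 13.8),
    ("2006-10-02 00:00", 12.5),
    ("2006-10-02 00:02", 12.7),
]

# Group temperatures by calendar day.
by_day = defaultdict(list)
for stamp, temperature in readings:
    day = datetime.strptime(stamp, "%Y-%m-%d %H:%M").date()
    by_day[day].append(temperature)

# For each day keep (mean temperature, number of readings).
daily = {day: (sum(vals) / len(vals), len(vals)) for day, vals in sorted(by_day.items())}

for day, (mean, count) in daily.items():
    print(day, round(mean, 2), count)
```

    A complete day of 2-minute readings has 720 data points, so a lower count is one simple way to spot days with missing data.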
    
  13. 2022 Bikeshare Data -Reduced File Size -All Months

    • kaggle.com
    zip
    Updated Mar 8, 2023
    Cite
    Kendall Marie (2023). 2022 Bikeshare Data -Reduced File Size -All Months [Dataset]. https://www.kaggle.com/datasets/kendallmarie/2022-bikeshare-data-all-months-combined
    Explore at:
    Available download formats
    zip (98884 bytes)
    Dataset updated
    Mar 8, 2023
    Authors
    Kendall Marie
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    This is a condensed version of the raw data obtained through the Google Data Analytics Course, made available by Lyft and the City of Chicago under this license (https://ride.divvybikes.com/data-license-agreement).

    I originally did my study in another platform, and the original files were too large to upload to Posit Cloud in full. Each of the 12 monthly files contained anywhere from 100k to 800k rows. Therefore, I decided to reduce the number of rows drastically by performing grouping, summaries, and thoughtful omissions in Excel for each csv file. What I have uploaded here is the result of that process.

    Data is grouped by: month, day, rider_type, bike_type, and time_of_day. total_rides represents the number of original rows that were combined to make each new summarized row; avg_ride_length is the calculated average of all rides in each grouping.

    Be sure that you use weighted averages if you want to calculate the mean of avg_ride_length for different subgroups as the values in this file are already averages of the summarized groups. You can include the total_rides value in your weighted average calculation to weigh properly.
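
    To illustrate the weighted average mentioned above, here is a short sketch with made-up rows; only the column names total_rides, avg_ride_length and rider_type come from the description.

```python
# Hypothetical summarized rows; the numbers are invented for illustration.
rows = [
    {"rider_type": "member", "total_rides": 10, "avg_ride_length": 12.0},
    {"rider_type": "member", "total_rides": 30, "avg_ride_length": 8.0},
    {"rider_type": "casual", "total_rides": 5, "avg_ride_length": 20.0},
]


def weighted_mean_ride_length(rows):
    """Average of avg_ride_length, weighted by the rides behind each row."""
    total_rides = sum(r["total_rides"] for r in rows)
    weighted_sum = sum(r["avg_ride_length"] * r["total_rides"] for r in rows)
    return weighted_sum / total_rides


members = [r for r in rows if r["rider_type"] == "member"]
print(weighted_mean_ride_length(members))  # 9.0
```

    A plain mean of the two member rows would give 10.0, which overstates the result because the row with the longer average covers far fewer rides.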

    9 Columns:

    • date - year, month, and day in date format - includes all days in 2022
    • day_of_week - Actual day of week as character. Set up a new sort order if needed.
    • rider_type - values are either 'casual', those who pay per ride, or 'member', for riders who have annual memberships.
    • bike_type - Values are 'classic' (non-electric, traditional bikes), or 'electric' (e-bikes).
    • time_of_day - this divides the day into 6 equal time frames, 4 hours each, starting at 12AM. Each individual ride was placed into one of these time frames using the time they STARTED their rides, even if the ride was long enough to end in a later time frame. This column was added to help summarize the original dataset.
    • total_rides - Count of all individual rides in each grouping (row). This column was added to help summarize the original dataset.
    • avg_ride_length - The calculated average of all rides in each grouping (row). Look to total_rides to know how many original ride length values were included in this average. This column was added to help summarize the original dataset.
    • min_ride_length - Minimum ride length of all rides in each grouping (row). This column was added to help summarize the original dataset.
    • max_ride_length - Maximum ride length of all rides in each grouping (row). This column was added to help summarize the original dataset.

    Please note: the time_of_day column has inconsistent spacing. Use mutate(time_of_day = gsub(" ", "", time_of_day)) to remove all spaces.

    Revisions

    Below is the list of revisions I made in Excel before uploading the final csv files to the R environment:

    • Deleted station location columns and lat/long as much of this data was already missing.

    • Deleted ride id column since each observation was unique and I would not be joining with another table on this variable.

    • Deleted rows pertaining to "docked bikes" since there were no member entries for this type and I could not compare member vs casual rider data. I also received no information in the project details about what constitutes a "docked" bike.

    • Used ride start time and end time to calculate a new column called ride_length (by subtracting), and deleted all rows with 0 and 1 minute results, which were explained in the project outline as being related to staff tasks rather than users. An example would be taking a bike out of rotation for maintenance.

    • Placed start time into a range of times (time_of_day) in order to group more observations while maintaining general time data. time_of_day now represents a time frame when the bike ride BEGAN. I created six 4-hour time frames, beginning at 12AM.

    • Added a Day of Week column, with Sunday = 1 and Saturday = 7, then changed from numbers to the actual day names.

    • Used pivot tables to group total_rides, avg_ride_length, min_ride_length, and max_ride_length by date, rider_type, bike_type, and time_of_day.

    • Combined into one csv file with all months, containing less than 9,000 rows (instead of several million)
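
    Two of the revisions above, computing ride_length and assigning a 4-hour time_of_day frame from the start time, can be sketched as follows. The frame labels and sample rides are hypothetical, since the actual time_of_day values are not shown here, and "0 and 1 minute results" is read as ride lengths of at most one minute.

```python
from datetime import datetime

# Hypothetical labels for the six 4-hour frames, beginning at 12AM.
FRAMES = ["12AM-4AM", "4AM-8AM", "8AM-12PM", "12PM-4PM", "4PM-8PM", "8PM-12AM"]


def time_of_day(started_at: datetime) -> str:
    """Pick the frame from the hour in which the ride began."""
    return FRAMES[started_at.hour // 4]


def ride_length_minutes(started_at: datetime, ended_at: datetime) -> float:
    """Ride length as end time minus start time, in minutes."""
    return (ended_at - started_at).total_seconds() / 60


# Invented (start, end) pairs for illustration.
rides = [
    (datetime(2022, 7, 4, 7, 55), datetime(2022, 7, 4, 8, 20)),
    (datetime(2022, 7, 4, 23, 50), datetime(2022, 7, 5, 0, 30)),
    (datetime(2022, 7, 4, 12, 0), datetime(2022, 7, 4, 12, 1)),
]

# Drop rides of one minute or less, then label the rest by start frame.
kept = [
    (time_of_day(start), ride_length_minutes(start, end))
    for start, end in rides
    if ride_length_minutes(start, end) > 1
]
print(kept)  # [('4AM-8AM', 25.0), ('8PM-12AM', 40.0)]
```

    Note that the first ride ends after 8AM and the second ends the next day, yet both are labelled by the frame in which they began.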
