13 datasets found
  1. SPORTS_DATA_ANALYSIS_ON_EXCEL

    • kaggle.com
    zip
    Updated Dec 12, 2024
    Cite
    Nil kamal Saha (2024). SPORTS_DATA_ANALYSIS_ON_EXCEL [Dataset]. https://www.kaggle.com/datasets/nilkamalsaha/sports-data-analysis-on-excel
    Explore at:
    Available download formats: zip (1203633 bytes)
    Dataset updated
    Dec 12, 2024
    Authors
    Nil kamal Saha
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    PROJECT OBJECTIVE

    We are part of XYZ Co Pvt Ltd, a company in the business of organizing sports events at the international level. Countries nominate sportsmen from different departments, and our team has been given the responsibility of systematizing the membership roster and generating different reports as per business requirements.

    Questions (KPIs)

    TASK 1: STANDARDIZING THE DATASET

    • Populate the FULLNAME, consisting of the following fields ONLY, in the prescribed format: PREFIX FIRSTNAME LASTNAME (Note: all UPPERCASE).
    • Get the COUNTRY NAME to which these sportsmen belong. Make use of the LOCATION sheet to get the required data.
    • Populate the LANGUAGE spoken by the sportsmen. Make use of the LOCATION sheet to get the required data.
    • Generate the EMAIL ADDRESS for members who speak English in the prescribed format lastname.firstname@xyz.org (Note: all lowercase); for all other members, the format should be lastname.firstname@xyz.com (Note: all lowercase).
    • Populate the SPORT LOCATION of the sport played by each player. Make use of the SPORT sheet to get the required data.
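The email rule above can be sketched in Python; the function name and the exact language check are illustrative assumptions, not part of the task:

```python
def make_email(firstname: str, lastname: str, language: str) -> str:
    """Build a member email per the task rules (helper name is hypothetical)."""
    local = f"{lastname}.{firstname}".lower()
    # English speakers get the xyz.org domain; all other members get xyz.com
    domain = "xyz.org" if language.strip().lower() == "english" else "xyz.com"
    return f"{local}@{domain}"

print(make_email("John", "Smith", "English"))  # smith.john@xyz.org
```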

    TASK 2: DATA FORMATTING

    • Display the MEMBER ID as always a 3-digit number (Note: 001, 002, ..., 020, etc.)
    • Format the BIRTHDATE as dd mmm'yyyy (Prescribed format example: 09 May'1986)
    • Display the units for the WEIGHT column (Prescribed format example: 80 kg)
    • Format the SALARY to show the data in thousands. If SALARY is less than 100,000 then display the data with 2 decimal places, else display it with 1 decimal place. In both cases the units should be thousands (k), e.g. 87670 -> 87.67 k and 123250 -> 123.2 k
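As a sanity check, the member-ID padding and the salary rule can be mirrored in Python (the helper names are my own):

```python
def format_member_id(member_id: int) -> str:
    # Always show the member ID as a 3-digit number: 1 -> "001", 20 -> "020"
    return f"{member_id:03d}"

def format_salary(salary: float) -> str:
    # Show salary in thousands (k): 2 decimals below 100,000, otherwise 1 decimal
    thousands = salary / 1000
    return f"{thousands:.2f} k" if salary < 100_000 else f"{thousands:.1f} k"

print(format_salary(87670))   # 87.67 k
print(format_salary(123250))  # 123.2 k
```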

    TASK 3: SUMMARIZE DATA - PIVOT TABLE (Use SPORTSMEN worksheet after attempting TASK 1)

    • Create a PIVOT table in the worksheet ANALYSIS, starting at cell B3, with the following details:

    • In COLUMNS; Group : GENDER.
    • In ROWS; Group : COUNTRY (Note: use COUNTRY NAMES).
    • In VALUES; calculate the count of candidates from each COUNTRY and GENDER type. Remove GRAND TOTALs.
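Outside Excel, the same pivot can be reproduced with pandas; the toy rows below are made up, and the column names are assumed from the task:

```python
import pandas as pd

# Toy stand-in for the SPORTSMEN sheet
sportsmen = pd.DataFrame({
    "COUNTRY": ["India", "India", "Brazil", "Brazil", "Brazil"],
    "GENDER": ["M", "F", "M", "M", "F"],
    "FULLNAME": ["A", "B", "C", "D", "E"],
})

# COLUMNS = GENDER, ROWS = COUNTRY, VALUES = count; margins=False drops grand totals
pivot = pd.pivot_table(sportsmen, index="COUNTRY", columns="GENDER",
                       values="FULLNAME", aggfunc="count", margins=False)
print(pivot)
```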

    TASK 4: SUMMARIZE DATA - EXCEL FUNCTIONS (Use SPORTSMEN worksheet after attempting TASK 1)

    • Create a SUMMARY table in the worksheet ANALYSIS, starting at cell G4, with the following details:

    • Starting from cell H4, get the distinct GENDER values. Use the remove-duplicates option and transpose the data.
    • Starting from cell G5, get the distinct COUNTRY values (Note: use COUNTRY NAMES).
    • In the cross table, get the count of candidates from each COUNTRY and GENDER type.
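The function-based cross table (what COUNTIFS would compute in Excel) corresponds to pandas' crosstab; the sample data here is invented:

```python
import pandas as pd

sportsmen = pd.DataFrame({
    "COUNTRY": ["India", "India", "Brazil"],
    "GENDER": ["M", "F", "M"],
})

# One row per distinct COUNTRY, one column per distinct GENDER,
# cells holding the candidate count (missing combinations become 0)
summary = pd.crosstab(sportsmen["COUNTRY"], sportsmen["GENDER"])
print(summary)
```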

    TASK 5: GENERATE REPORT - PIVOT TABLE (Use SPORTSMEN worksheet after attempting TASK 1)

    • Create a PIVOT table report in the worksheet REPORT, starting at cell A3, with the following information:

    • Change the report layout to TABULAR form.
    • Remove expand and collapse buttons.
    • Remove GRAND TOTALs.
    • Allow user to filter the data by SPORT LOCATION.

    Process

    • Verified data for any missing values and anomalies, and sorted out the same.
    • Made sure data is consistent and clean with respect to data type, data format and values used.
    • Created pivot tables according to the questions asked.
  2. Hospital Emergency Department - Characteristics by Facility (Pivot Profile)

    • data.chhs.ca.gov
    • data.ca.gov
    • +2more
    .xlsx, xlsm, xlsx +1
    Updated Nov 7, 2025
    Cite
    Department of Health Care Access and Information (2025). Hospital Emergency Department - Characteristics by Facility (Pivot Profile) [Dataset]. https://data.chhs.ca.gov/dataset/hospital-emergency-department-characteristics-by-facility-pivot-profile
    Explore at:
    Available download formats: zip, xlsx, xlsx(556712), xlsx(561869), xlsx(1341306), xlsx(1351305), xlsx(592486), xlsx(558673), xlsx(1377749), xlsx(551027), xlsx(1333357), xlsx(1347217), xlsm(1346583), xlsx(572109), xlsx(585517), xlsx(1301355), .xlsx(1305598)
    Dataset updated
    Nov 7, 2025
    Dataset authored and provided by
    Department of Health Care Access and Information
    Description

    This dataset contains annual Excel pivot tables that display summaries of the patients treated in each Emergency Department (ED). The Emergency Department data is sourced from two databases, the ED Treat-and-Release Database and the Inpatient Database (i.e. patients treated in the ED and then formally admitted to the hospital). The summary data include number of visits, expected payer, discharge disposition, age groups, sex, preferred language spoken, race groups, principal diagnosis groups, and principal external cause of injury/morbidity groups. The data can also be summarized statewide or for a specific hospital county, ED service level, teaching/rural status, and/or type of control.

  3. Import Excel to Power BI

    • kaggle.com
    zip
    Updated May 15, 2022
    Cite
    Ntemis Tontikopoulos (2022). Import Excel to Power BI [Dataset]. https://www.kaggle.com/datasets/ntemistonti/excel-to-power-bi/versions/1
    Explore at:
    Available download formats: zip (614154 bytes)
    Dataset updated
    May 15, 2022
    Authors
    Ntemis Tontikopoulos
    Description

    HOW TO:
    • Build a hierarchy using the category, subcategory & product fields (columns "Product Category", "Product SubCategory" & "Product Name").
    • Group the values of the column "Region" into 2 groups, alphabetically, based on the name of each region.

    1. Display a table which shows, for each value of the product hierarchy you created above, the total amount of sales ("Sales") and profitability ("Profit").
    2. The same information as the previous point (1), in a bar chart illustration.
    3. Display columns with the total sales amount ("Sales") for each value of the alphabetical grouping of the Region field you created. The color of each column should be derived from the corresponding total shipping cost ("Shipping Cost"). In the tooltip of the illustration, all numeric values should have a currency format.
    4. The same diagram as above (3), with the addition of a visual-level filter that will display only the data subset related to sales with positive values for the field "Profit".
    5. The same diagram as above (3), with the addition of a visual-level filter that will display only the data subset related to sales with negative values for the field "Profit".
    6. Map showing the total amount of sales (size of each point), as well as the total profitability (color of each point). Change the dimensions of the image.
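The alphabetical two-group split of Region can be sketched in Python; the A-M / N-Z boundary is an assumption, since the task only says "alphabetically":

```python
regions = ["West", "East", "Central", "South", "North"]

# Hypothetical split: region names starting with A-M vs N-Z
region_group = {
    r: "Regions A-M" if r[0].upper() <= "M" else "Regions N-Z"
    for r in regions
}
print(region_group)
```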
  4. RAAAP-123 Main Datasets (for Workshop).xlsx

    • figshare.com
    xlsx
    Updated Jun 28, 2024
    Cite
    Simon Kerridge; Melinda Fischer (2024). RAAAP-123 Main Datasets (for Workshop).xlsx [Dataset]. http://doi.org/10.6084/m9.figshare.26123668.v1
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    Jun 28, 2024
    Dataset provided by
    figshare (http://figshare.com/)
    Authors
    Simon Kerridge; Melinda Fischer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This spreadsheet contains a number of sheets. Three sheets contain the main datasets from each of the first three RAAAP surveys. In addition, there is a combined sheet containing data from all three sheets (where the semantics are the same), with an additional field indicating which survey each row is from; this sheet has fewer columns, as it only has the shared variables. There is also a sheet for each survey listing the variables, one showing the mappings between surveys, and one showing the common variables. Finally, there is an example pivot table to show how the data can be easily visualised. This spreadsheet was developed for the RAAAP workshop delivered at the 2023 INORMS Conference in May 2023 in Durban, South Africa. It contains all of the common data from the first 3 RAAAP surveys. These data are presented on separate

  5. Hospital Emergency Department - Characteristics by Facility (Pivot Profile)...

    • gimi9.com
    Updated Jun 30, 2018
    + more versions
    Cite
    (2018). Hospital Emergency Department - Characteristics by Facility (Pivot Profile) | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_hospital-emergency-department-characteristics-by-facility-pivot-profile-cd43a/
    Explore at:
    Dataset updated
    Jun 30, 2018
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains annual Excel pivot tables that display summaries of the patients treated in each Emergency Department (ED). The Emergency Department data is sourced from two databases, the ED Treat-and-Release Database and the Inpatient Database (i.e. patients treated in the ED and then formally admitted to the hospital). The summary data include number of visits, expected payer, discharge disposition, age groups, sex, preferred language spoken, race groups, principal diagnosis groups, and principal external cause of injury/morbidity groups. The data can also be summarized statewide or for a specific hospital county, ED service level, teaching/rural status, and/or type of control.

  6. FiveThirtyEight Mad Men Dataset

    • kaggle.com
    zip
    Updated Apr 26, 2019
    Cite
    FiveThirtyEight (2019). FiveThirtyEight Mad Men Dataset [Dataset]. https://www.kaggle.com/datasets/fivethirtyeight/fivethirtyeight-mad-men-dataset
    Explore at:
    Available download formats: zip (16691 bytes)
    Dataset updated
    Apr 26, 2019
    Dataset authored and provided by
    FiveThirtyEight (https://abcnews.go.com/538)
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Content

    Mad Men

    This directory contains the data behind the story ‘Mad Men’ Is Ending. What’s Next For The Cast?

    The primary file show-data.csv contains data of actors who appeared on at least half the episodes of television shows that were nominated for an Emmy for Outstanding Drama since the year 2000. It contains the following variables:

    Header: Definition
    Performer: The name of the actor, according to IMDb. This is not a unique identifier; two performers appeared in more than one program.
    Show: The television show where this actor appeared in more than half the episodes.
    Show Start: The year the television show began.
    Show End: The year the television show ended; "PRESENT" if the show remains on the air as of May 10.
    Status?: Why the actor is no longer on the program: "END" if the show has concluded, "LEFT" if the show remains on the air.
    CharEnd: The year the character left the show. Equal to "Show End" if the performer stayed on until the final season.
    Years Since: 2015 minus CharEnd.
    #LEAD: The number of leading roles in films the performer has appeared in since and including "CharEnd", according to OpusData.
    #SUPPORT: The number of supporting roles in films the performer has appeared in since and including "CharEnd", according to OpusData.
    #Shows: The number of seasons of television of which the performer appeared in at least half the episodes since and including "CharEnd", according to OpusData.
    Score: #LEAD + #Shows + 0.25*(#SUPPORT)
    Score/Y: "Score" divided by "Years Since".
    lead_notes: The list of films counted in #LEAD.
    support_notes: The list of films counted in #SUPPORT.
    show_notes: The seasons of shows counted in #Shows.
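The Score and Score/Y columns are simple arithmetic over the counts defined above; a direct Python transcription:

```python
def score(n_lead: int, n_shows: int, n_support: int) -> float:
    # Score = #LEAD + #Shows + 0.25 * (#SUPPORT)
    return n_lead + n_shows + 0.25 * n_support

def score_per_year(score_value: float, years_since: int) -> float:
    # Score/Y = Score divided by "Years Since"
    return score_value / years_since

print(score(2, 1, 4))            # 4.0
print(score_per_year(4.0, 2))    # 2.0
```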

    The supplemental file performer-scores.csv is the consolidated data from show-data.csv made into a pivot table.

    Context

    This is a dataset from FiveThirtyEight hosted on their GitHub. Explore FiveThirtyEight data using Kaggle and all of the data sources available through the FiveThirtyEight organization page!

    • Update Frequency: This dataset is updated daily.

    Acknowledgements

    This dataset is maintained using GitHub's API and Kaggle's API.

    This dataset is distributed under the Attribution 4.0 International (CC BY 4.0) license.

  7. Sales Dashboard in Microsoft Excel

    • kaggle.com
    zip
    Updated Apr 14, 2023
    Cite
    Bhavana Joshi (2023). Sales Dashboard in Microsoft Excel [Dataset]. https://www.kaggle.com/datasets/bhavanajoshij/sales-dashboard-in-microsoft-excel/discussion
    Explore at:
    Available download formats: zip (253363 bytes)
    Dataset updated
    Apr 14, 2023
    Authors
    Bhavana Joshi
    Description

    This interactive sales dashboard is designed in Excel for B2C businesses like Dmart, Walmart, Amazon, shops & supermarkets, etc., using Slicers, Pivot Tables & Pivot Charts.

    Dashboard Overview

    1. Sales dashboard ==> designed for B2C businesses like Dmart, Walmart, Amazon, shops & supermarkets, etc.
    2. Slicers ==> used to drill down the data by year, month, sales type, and mode of payment.
    3. Total Sales/Total Profits ==> total sales, total profit, and profit percentage combined into a monthly format; each can be hidden or unhidden to view them individually or comparatively.
    4. Product Visual ==> indicates product-wise sales for the selected period. Only 10 products are visualized at a glance; you can scroll up & down to view the other products in the list.
    5. Daily Sales ==> shows day-wise sales. (Area Chart)
    6. Sales Type/Payment Mode ==> shows the sales percentage contribution based on the type of selling and mode of payment.
    7. Top Product & Category ==> the top-selling product and product category.
    8. Category ==> category-wise sales contribution.

    Datasheets Overview

    1. The dataset has the master data sheet, or you can call it a catalog. It is added in table form.
    2. The first column is the product ID; the list of items in this column is unique.
    3. Then we have the product column. Instead of these two columns we could manage with only one, but I kept them separate because product names can sometimes be the same while other parameters, like price or supplier, differ.
    4. The next column is the category column, which is the product category, like cosmetics, foods, drinks, electronics, etc.
    5. The 4th column is the unit of measure (UOM); you can update it based on the products you have.
    6. The last two columns are buying price and selling price, meaning unit purchasing price and unit selling price.

    Input Sheet

    The first column is the date of selling. The second column is the product ID. The third column is quantity. The fourth column is the sales type: direct selling, purchased by a wholesaler, or ordered online. The fifth column is the mode of payment, which is online or cash; you can update these two as per requirements. The last one is a discount percentage: if you want to offer any discount, you can add it here.

    Analysis Sheet: where all backend calculations are performed.

    So, basically these are the four sheets mentioned above with different tasks.

    However, a sales dashboard enables organizations to visualize their real-time sales data and boost productivity.

    A dashboard is a very useful tool that brings together all the data in the form of charts, graphs, statistics and many more visualizations, which leads to data-driven decision making.

    Questions & Answers

    1. What is the profit ratio of sales in 2021 and 2022? ==> The total profit ratio of sales in 2021 is 19%, with large sales of PRODUCT42, whereas the profit ratio for 2022 is 22%, with large sales of PRODUCT30.
    2. Which top product has the largest number of sales in 2021-2022? ==> The top product in 2021 is PRODUCT42 with total sales of $12,798, whereas in 2022 the top product is PRODUCT30 with total sales of $13,888.
    3. In the Area Chart, which product sold the most on 28th April 2022? ==> The largest number of sales on 28th April 2022 is for PRODUCT14, with a 24% profit ratio.
    4. What sales types and payment modes are present? ==> The sales types and payment modes show the sales percentage contribution based on the type of selling and mode of payment. Here, the sales types are Direct Sales with 52%, Online Sales with 33% and Wholesaler with 15%. The payment modes are Online and Cash, equally distributed at 50% each.
    5. In which month are direct sales highest in 2022? ==> The highest direct sales can be easily identified from the monthly format: November is the month where direct sales are highest, with 28% compared with other months.
    6. Which payment mode was received most in 2021 and 2022? ==> The payments received most in 2021 are cash payments with 52%, compared with online transactions at 48%. Cash payments were highest in March, July and October, with direct sales at 42%, online at 45% and wholesaler at 13%, with large sales of PRODUCT24. ==> The payments received most in 2022 are online payments with 52%, compared with cash payments at 48%. Online payments were highest in Jan, Sept and December, with direct sales at 45%, Online with 37% and whole...
  8. Cyclistic Trip Data Analysis

    • kaggle.com
    zip
    Updated Jan 1, 2025
    Cite
    Fatima Gulraiz (2025). Cyclistic Trip Data Analysis [Dataset]. https://www.kaggle.com/fatimagulraiz/cyclistictripdata
    Explore at:
    Available download formats: zip (46869261 bytes)
    Dataset updated
    Jan 1, 2025
    Authors
    Fatima Gulraiz
    License

    https://www.gnu.org/licenses/gpl-3.0.html

    Description

    About Cyclistic

    Cyclistic is a bike-share program that features more than 5,800 bicycles and 600 docking stations. It aims to make bike-share more inclusive for people with disabilities and riders who can’t use a standard two-wheeled bike. In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime.

    Problem Statement

    The aim of the marketing team is to convert casual riders into annual members. To do so, we need to understand how annual members use the service differently from casual riders, and how often each group uses the service.

    Solution

    For the analysis of this project, we chose Excel, with the mutual consent of our team, to show our work. Following the data analysis process, we started with Ask, then prepared our data according to what the client was asking for, then processed the data to make it clean, organized and easy to access, and finally analyzed the data to get the results.

    As per the requirements of our client, they wanted to increase the number of their annual members. To do so, they wanted to know: how do annual members and casual riders use Cyclistic bikes differently?

    Having the company’s requirement, it was now time to Prepare and Process the data. For this analysis we were told to use only the previous 12 months of Cyclistic trip data. The data has been made available online by Motivate International Inc. We checked the integrity and credibility of the data by making sure that the online source through which it is available is safe and secure.

    While preparing the data, we started by downloading the files to our machine. We saved the files and unzipped them. Then we created subfolders for the .csv and .xls sheets. Before further analysis we cleaned the data. We used the Filter option on the required columns to check for any NULLs or any data that is not supposed to be there.

    While cleaning the data, we found that in some of the monthly files the started_at and ended_at columns had the custom format mm:ss.0. For consistency with all the other spreadsheets we changed the custom format to m/d/yy h:mm. We also found that some spreadsheets had data from other months, but on further analysis we figured out that those rides started in that month and ended in the next month, so the data did belong to that worksheet.

    After cleaning the data, we created 2 new columns in each worksheet to perform our calculations, and named them: a) ride_length
    b) day_of_week

    To create the ride_length column we used a subtraction formula on the started_at and ended_at columns, which gave us the ride length of each ride for every day of the month. To create day_of_week we used the WEEKDAY function. After cleaning the data on a monthly basis, it was time to merge all 12 months into a single spreadsheet. After merging the whole data into a new sheet, it was time to Analyze! Before analyzing, our team made sure one more time that the data was properly organized and formatted and that there were no errors or bugs in our data, so that we would get correct results. To analyze the data we ran a few calculations to get a better sense of the data layout we were using. We calculated: a) mean of ride_length b) max of ride_length c) mode of day_of_week

    To find the mean of ride_length, we used the AVERAGE formula to get an overview of how long rides usually last. With the MAX calculation we found the longest ride length. Last but not least, with the MODE function we calculated the most frequent day of the week on which riders were using the service.
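The ride_length subtraction, the WEEKDAY column, and the three summary statistics translate directly to pandas; the two sample trips below are made up:

```python
import pandas as pd

trips = pd.DataFrame({
    "started_at": pd.to_datetime(["2024-01-01 08:00", "2024-01-01 09:00"]),
    "ended_at": pd.to_datetime(["2024-01-01 08:30", "2024-01-01 09:45"]),
})

# ride_length = ended_at - started_at (the subtraction formula)
trips["ride_length"] = trips["ended_at"] - trips["started_at"]
# day_of_week: the WEEKDAY analogue (here as a day name)
trips["day_of_week"] = trips["started_at"].dt.day_name()

mean_length = trips["ride_length"].mean()  # average ride duration
max_length = trips["ride_length"].max()    # longest ride
mode_day = trips["day_of_week"].mode()[0]  # most frequent riding day
```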

    To support the question asked by our client and to identify trends and relationships, we made a Pivot Table in Excel so that we could present our insights to the client in an easy way. Using the Pivot Table it is clearer to see the trend that annual members use the service more than casual riders, and it also gives a good picture of how often annual members use the service. From the Pivot Table, we saw that the total number of rides for annual members is higher than for casual riders. On the basis of our analysis, we found that the average ride length is higher for casual riders than for annual members, meaning that casual riders ride for longer periods of time than annual members. But annual members use the service more often than casual ri...

  9. New York State Unemployment Insurance Average

    • kaggle.com
    zip
    Updated Jan 7, 2023
    Cite
    The Devastator (2023). New York State Unemployment Insurance Average [Dataset]. https://www.kaggle.com/datasets/thedevastator/new-york-state-unemployment-insurance-average-du
    Explore at:
    Available download formats: zip (92067 bytes)
    Dataset updated
    Jan 7, 2023
    Authors
    The Devastator
    Area covered
    New York
    Description

    New York State Unemployment Insurance Average Duration (2002-Present)

    Regional and County Level Trends

    By State of New York [source]

    About this dataset

    This dataset provides crucial insights into the unemployment benefits of New York State residents, covering the average duration of unemployment insurance they receive during their benefit year. From January 2002 to the present, it reveals trends across ten labor market regions, with detailed information gathered from 62 counties and subdivisions. The data includes columns such as Year, Month, Region, County and Average Duration, from which insight can be gained with proper understanding and interpretation.

    As each region has distinct characteristics, this dataset covers a broad spectrum of cases, ranging from regular unemployment insurance (UI) cases not associated with Federal Employees (UCFE), Veterans (UCX), the Self Employment Assistance Program (SEAP) or other situations, to Shared Work programs, including 599.2 training and Federal extension recipients. Before using the data, make sure you read the Terms of Service in order to understand any legal requirements related to its use. Last updated 2020-09-16, this dataset serves not just passionate researchers but also community impact leaders seeking direction when addressing prevalent social problems.


    How to use the dataset

    This dataset contains data on the average duration of unemployment insurance benefits in New York state from 2002 to present. This data can be useful for analyzing trends in unemployment rates, understanding regional differences, and evaluating labor market changes over time. In this guide we will explore how to use this dataset for your own research and analysis.

    Firstly, you'll need to download the dataset from Kaggle. Once downloaded, you can open it with a spreadsheet program such as Microsoft Excel or Google Sheets to begin exploring the data.

    The available columns are Year, Month, Region, County, and Average Duration. Year and Month indicate the period each figure corresponds to. Region and County represent the geographic areas within New York State being described, while Average Duration indicates how long beneficiaries in each area received their unemployment insurance benefits within their benefit year, on average.

    Using these columns as your guide, you can analyze different aspects of state-level unemployment trends in New York over time, or compare counties’ benefit levels against each other for any given year or month by filtering accordingly, using pivot tables or visualization tools such as Microsoft Power BI or Tableau, depending on what type of analysis you want to conduct (e.g. clustering/k-means algorithms, etc.). You may also consider combining this with other macroeconomic datasets, such as GDP growth rate per county/region, for further insight into the factors influencing benefit duration levels over time. Make sure to review the reference material cited at the bottom, and ensure that all applicable terms & conditions have been read and accepted before proceeding with the research at hand.

    In conclusion, this is a comprehensive yet easy-to-use source if you're looking for a detailed overview of Unemployment Insurance Average Duration across the various geographic regions of New York State from 2002 up until the present day. We hope this guide has been helpful in getting started with this versatile dataset, made available courtesy of the Kaggle platform.
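A minimal pandas version of the pivot-style comparison described above, with made-up numbers in the dataset's column layout:

```python
import pandas as pd

ui = pd.DataFrame({
    "Year": [2020, 2020, 2021, 2021],
    "Month": ["Jan", "Jan", "Jan", "Jan"],
    "Region": ["Capital", "Western NY", "Capital", "Western NY"],
    "County": ["Albany", "Erie", "Albany", "Erie"],
    "Average Duration": [18.2, 20.1, 16.5, 17.9],
})

# Average benefit duration by region and year (a pivot-table-style view)
by_region = ui.pivot_table(index="Region", columns="Year",
                           values="Average Duration")
print(by_region)
```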

    Research Ideas

    • Comparing current to historical unemployment insurance average duration trends (e.g. year over year, month to month).
    • Analyzing correlations between unemployment insurance average duration and other economic factors such as housing prices or wage growth in a particular county or region.
    • Mapping the distributions of unemployment insurance average duration across different regions and counties in New York State, providing useful insights into regional economic differences within the state that can inform policy decision-making by local governments.

    Acknowledgements

    If you use this data...

  10. Google Data Analytics Capstone Project: Netflix

    • kaggle.com
    zip
    Updated Jan 25, 2024
    Cite
    Doga Celik (2024). Google Data Analytics Capstone Project: Netflix [Dataset]. https://www.kaggle.com/datasets/dogacelik/google-data-analytics-capstone-project-netflix
    Explore at:
    Available download formats: zip (59851 bytes)
    Dataset updated
    Jan 25, 2024
    Authors
    Doga Celik
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Introduction:

    In this case study, the skills I acquired from the Google Data Analytics Professional Certificate course are demonstrated. These skills will be used to complete an imagined task given by Netflix. The analysis process for this task consists of the following steps: Ask, Prepare, Process, Analyze, Share and Act.

    Scenario:

    The Netflix Chief Content Officer, Bela Bajaria, believes that the company’s success depends on providing customers what they want. Bajaria stated that the goal of this task is to find the most wanted content for the movies that will be added to the portfolio. Most movie contracts are signed before the films come to theaters, and it is hard to know whether customers really want to watch a movie and whether it will be successful. Therefore, my team wants to understand what type of content a movie’s success depends on. From these insights, my team will design an investment strategy to choose the most popular movies expected to be in theaters in the near future. But first, Netflix executives must approve our recommendations, so we must provide satisfying data insights along with professional data visualizations.

    About the Company:

    At Netflix, we want to entertain the world. Whatever your taste, and no matter where you live, we give you access to best-in-class TV series, documentaries, feature films and games. Our members control what they want to watch, when they want it, in one simple subscription. We’re streaming in more than 30 languages and 190 countries, because great stories can come from anywhere and be loved everywhere. We are the world’s biggest fans of entertainment, and we’re always looking to help you find your next favorite story.

    As a company Netflix knows that it is important to acquire or produce movies that people want to watch.

    Therefore, Bajaria has set a clear goal: define an investment strategy that allows Netflix to offer customers the movies they want to watch, thereby maximizing sales.

    Ask:

    Business Task: Find out what kinds of movies customers want to watch, and whether content type really correlates with a movie's success.

    Stakeholders:

    Bela Bajaria: She joined Netflix in 2016 to oversee unscripted and scripted series. Bajaria is also responsible for content selection and strategy for different regions.

    Netflix content analytics team: A team of data analysts who are responsible for collecting, analyzing, and reporting data that helps guide Netflix content strategy.

    Netflix executive team: The notoriously detail-oriented executive team will decide whether to approve the recommended content program.

    Prepare:

    I start my preparation by downloading every piece of data I'll need for the study. Top 1000 Highest-Grossing Movies of All Time.csv will be used. Additionally, 15 Lowest-Grossing Movies of All Time.csv was found during data research, and this dataset will be analyzed as well. The data has been made available by IMDb and shared at the following two URLs: https://www.imdb.com/list/ls098063263/ and https://www.imdb.com/list/ls069238222/ .

    Process:

    Data Cleaning:

    SQL: To begin the data cleaning process, I loaded both CSV files into SQL and conducted the following operations:

    • Checked for and removed any duplicates.
    • Checked whether there were any null values.
    • Removed the columns that are not necessary.
    • Trimmed the Description column to keep only the gross profit. (This step applied only to the 1000 Highest-Grossing Movies of All Time.csv dataset.)
    • Renamed the Description column to Gross_Profit. (This step applied only to the 1000 Highest-Grossing Movies of All Time.csv dataset.)

    The following SQL queries were used during data cleaning:

    SQL CODE used for Highest Grossing Movies DATASET

    SELECT
      Position,
      SUBSTR(Description, 34, 12) AS Gross_Profit,
      Title,
      IMDb_Rating,
      Runtime_mins_,
      Year,
      Genres,
      Num_Votes,
      Release_Date
    FROM `even-electron-400301.Highest_Gross_Movies.1`

    SQL CODE used for Lowest Grossing Movies DATASET

    SELECT
      Position,
      Title,
      IMDb_Rating,
      Runtime_mins_,
      Year,
      Genres,
      Num_Votes,
      Release_Date
    FROM `even-electron-400301.Lowest_Grossing_Movies.2`
    ORDER BY Position

    Analyze:

    As a starter, I want to reemphasize the business task once again: does content have a big impact on a movie's success?

    To answer this question, there were a few metrics I projected I could pull out and use during my analysis:

    • Average gross profit
    • Number of genres
    • Total gross profit of the most popular genres
    • The distribution of gross income across genres

    I used Microsoft Excel for the bullet points above. The operations used to obtain these values are as follows:

    • AVERAGE function for average gross profit in 1000 Highest-Grossing Movies of All Time.
    • Created a pivot table to work on Genres and Gross_Pr...

  11. Cleaned WJP Rule of Law Index 2021 Dataset

    • kaggle.com
    zip
    Updated Nov 6, 2025
    Hammad Farooq (2025). Cleaned WJP Rule of Law Index 2021 Dataset [Dataset]. https://www.kaggle.com/datasets/hammadfarooq470/cleaned-wjp-rule-of-law-index-2021-dataset/code
    Explore at:
    zip(413876 bytes)Available download formats
    Dataset updated
    Nov 6, 2025
    Authors
    Hammad Farooq
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset contains the complete country-level data from the World Justice Project Rule of Law Index for two benchmark years:

    • 2012–2013 (97 countries, original methodology)
    • 2021 (129 countries, updated methodology)

    The WJP Rule of Law Index is the world’s leading source for original, independent data on the rule of law. It measures how the rule of law is experienced and perceived by the general public across 8 factors and 44–47 sub-factors using household and expert surveys.

    This cleaned CSV combines both years in the original wide format (one column per country) while preserving every single indicator — perfect for longitudinal analysis, ranking, clustering, or visualization.

    Key Features

    • 120+ countries covered
    • Two comparable snapshots almost a decade apart
    • All 8 main factors + every sub-factor (0–1 scale, higher = stronger rule of law)
    • Regional classification included
    • Ready-to-use for research, policy analysis, and machine learning

    The 8 Core Factors

    Factor | What it Measures
    1. Constraints on Government Powers | Checks and balances, sanctions for misconduct
    2. Absence of Corruption | Corruption in executive, judiciary, police, legislature
    3. Order and Security | Crime control, civil conflict, vigilante justice
    4. Fundamental Rights | Equal treatment, life & security, due process, freedoms of expression, religion, assembly, privacy, labor rights
    5. Open Government | Publicized laws, right to information, civic participation
    6. Regulatory Enforcement | Effective, impartial, timely enforcement without improper influence
    7. Civil Justice | Accessible, affordable, impartial, timely, discrimination-free, corruption-free
    8. Criminal Justice | Effective investigation, adjudication, correction; impartiality, no corruption, due process

    Column Descriptions

    Column Name | Description | Data Type | Example Values
    sheet_name | Source sheet / indicator name (includes year) | string | WJP ROL Index 2012-2013 Scores
    Country | Full name of the indicator (factor or sub-factor) | string | Factor 1: Limited Government Powers
    Region | WJP regional grouping (only filled for country rows) | string | EU + EFTA + North America, Sub-Saharan Africa
    Albania, Argentina, Australia, … Zimbabwe | Score (0–1) for that country on that indicator in the corresponding year | float | 0.883, 0.457, 0.724

    *The file is in wide format: each country has its own column, each row is a different indicator/year. This matches the official WJP layout and is ideal for quick country comparisons, pivot tables, and time-series analysis.*
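    The wide layout described above can be reshaped into a tidy long format with a few lines of pandas. This is a minimal sketch on hypothetical sample values (the country scores below are invented for illustration, not taken from the Index):

```python
import pandas as pd

# Hypothetical miniature of the WJP wide layout: one column per country,
# one row per indicator/year (sheet_name carries the year).
wide = pd.DataFrame({
    "sheet_name": ["WJP ROL Index 2012-2013 Scores", "WJP ROL Index 2021 Scores"],
    "Country": ["Factor 1: Limited Government Powers"] * 2,
    "Region": [None, None],
    "Albania": [0.52, 0.49],
    "Australia": [0.88, 0.85],
})

# Melt the country columns into a tidy long format, which is easier to
# filter, join, and plot for longitudinal analysis.
long = wide.melt(
    id_vars=["sheet_name", "Country", "Region"],
    var_name="country_name",
    value_name="score",
)
print(long)
```

    The same pattern scales to the full file: list the three metadata columns as `id_vars` and every remaining column is treated as a country.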

    Data Sources & Citation

    License

    Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)

    Recommended Kaggle Tags

    rule-of-law governance democracy institutions political-science development-economics corruption justice-system human-rights longitudinal-data global WJP open-government criminal-justice

    Perfect for academic papers, policy reports, interactive dashboards, or exploring how rule of law has changed over the past decade!

  12. Credit Card Spending Habits in India

    • kaggle.com
    zip
    Updated Mar 10, 2024
    + more versions
    The Devastator (2024). Credit Card Spending Habits in India [Dataset]. http://www.kaggle.com/datasets/thedevastator/analyzing-credit-card-spending-habits-in-India
    Explore at:
    zip(326253 bytes)Available download formats
    Dataset updated
    Mar 10, 2024
    Authors
    The Devastator
    Area covered
    India
    Description

    Credit Card Spending Habits in India

    Gender, Location, and Transaction Trends

    By Sadat Akash [source]

    About this dataset

    This dataset contains insights into a collection of credit card transactions made in India, offering a comprehensive look at the spending habits of Indians across the nation. From the gender and card type used to carry out each transaction, to which city saw the highest spending and what kinds of expenses were made, this dataset paints an overall picture of how money is being spent in India today. With its variety of variables, researchers have an opportunity to uncover deeper trends in customer spending, as well as interesting correlations between data points that can serve as invaluable business intelligence. Whether you're interested in learning more about customer preferences or simply exploring unbiased data analysis techniques, this data is sure to provide unexpected insight.

    More Datasets

    For more datasets, click here.

    Featured Notebooks

    • 🚨 Your notebook can be here! 🚨!

    How to use the dataset

    • To get started with this dataset, first select the columns you want to analyze. Once your columns are selected, use pivot tables to create a summary of the total amount spent by month, city, or other parameters. Suggested analyses include factors such as gender and seasonality/timing of spending, which can help in understanding Indian consumer behaviour around credit cards and provide insights into personal finance management that could support better financial decisions.
    • Once a summary table is created from the selected columns, it can be useful to add more detailed breakdowns by combining multiple criteria, such as Amount with Exp Type or Date; this way, more informative visuals and summaries can be generated, which in turn help in forming better conclusions about credit card usage trends in India and recommendations for future improvements.
    • Additionally, if available, other external information (e.g. population size, density, income levels) could be compared with these findings so that further actionable areas of focus can be identified at an overall level or attributed to specific buyer personas or cities.
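    The pivot-table workflow described above can also be sketched in pandas. The sample transactions below are hypothetical, not drawn from the dataset; only the column names follow its schema:

```python
import pandas as pd

# Hypothetical sample rows mirroring the dataset's columns
# (City, Date, Card Type, Exp Type, Gender, Amount).
df = pd.DataFrame({
    "City": ["Delhi", "Delhi", "Mumbai", "Mumbai"],
    "Date": pd.to_datetime(["2023-01-05", "2023-02-10",
                            "2023-01-20", "2023-02-15"]),
    "Card Type": ["Gold", "Silver", "Gold", "Platinum"],
    "Exp Type": ["Food", "Travel", "Food", "Bills"],
    "Gender": ["F", "M", "F", "M"],
    "Amount": [1200, 800, 1500, 600],
})

# Summarize the total amount spent by city and month, as suggested above.
df["Month"] = df["Date"].dt.to_period("M")
summary = df.pivot_table(index="City", columns="Month",
                         values="Amount", aggfunc="sum")
print(summary)
```

    Swapping `index`/`columns` for Gender or Exp Type gives the other breakdowns suggested in the bullets.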

    Research Ideas

    • To analyze consumer trends and interests by looking at the type of purchases people make based on their gender and city.
    • To detect potential credit card fraud or malicious activity, such as by analyzing changes in spending habits or unusual purchases, by city and gender.
    • To predict spending patterns for promotional campaigns, such as during festivals or holidays, in order to better target customer segments according to city and gender based spending habits

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    See the dataset description for more information.

    Columns

    File: Credit card transactions - India - Simple.csv

    | Column name | Description |
    |:------------|:------------|
    | City        | The city in which the transaction took place. (String) |
    | Date        | The date of the transaction. (Date) |
    | Card Type   | The type of credit card used for the transaction. (String) |
    | Exp Type    | The type of expense associated with the transaction. (String) |
    | Gender      | The gender of the cardholder. (String) |
    | Amount      | The amount of the transaction. (Number) |

    Acknowledgements

    If you use this dataset in your research, please credit the original author, Sadat Akash.


  13. Timac Fuel Distribution & Sales Dataset –

    • kaggle.com
    zip
    Updated May 31, 2025
    Fatolu Peter (2025). Timac Fuel Distribution & Sales Dataset – [Dataset]. https://www.kaggle.com/datasets/olagokeblissman/timac-fuel-distribution-and-sales-dataset
    Explore at:
    zip(41988 bytes)Available download formats
    Dataset updated
    May 31, 2025
    Authors
    Fatolu Peter
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    📝 Dataset Overview: This dataset represents real-world, enhanced transactional data from Timac Global Concept, one of Nigeria’s prominent players in fuel and petroleum distribution. It includes comprehensive sales records across multiple stations and product categories (AGO, PMS, Diesel, Lubricants, LPG), along with revenue and shift-based operational tracking.

    The dataset is ideal for analysts, BI professionals, and data science students aiming to explore fuel economy trends, pricing dynamics, and operational analytics.

    🔍 Dataset Features:

    Column Name | Description
    Date | Transaction date
    Station_Name | Name of the fuel station
    AGO_Sales (L) | Automotive Gas Oil sold in liters
    PMS_Sales (L) | Premium Motor Spirit sold in liters
    Lubricant_Sales (L) | Lubricant sales in liters
    Diesel_Sales (L) | Diesel sold in liters
    LPG_Sales (kg) | Liquefied Petroleum Gas sold in kilograms
    Total_Revenue (₦) | Total revenue generated in Nigerian Naira
    AGO_Price | Price per liter of AGO
    PMS_Price | Price per liter of PMS
    Lubricant_Price | Unit price of lubricants
    Diesel_Price | Price per liter of diesel
    LPG_Price | Price per kg of LPG
    Product_Category | Fuel product type
    Shift | Work shift (e.g., Morning, Night)
    Supervisor | Supervisor in charge during shift
    Weekday | Day of the week for each transaction

    🎯 Use Cases: Build Power BI dashboards to track fuel sales trends and shifts

    Perform revenue forecasting using time series models

    Analyze price dynamics vs sales volume

    Visualize station-wise performance and weekday sales patterns

    Conduct operational audits per supervisor or shift

    🧰 Best Tools for Analysis: Power BI, Tableau

    Python (Pandas, Matplotlib, Plotly)

    Excel for pivot tables and summaries

    SQL for fuel category insights
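    As a sketch of the station-wise analysis suggested above, here is a minimal pandas example. The station names, shifts, and revenue figures are invented for illustration; only the column names follow the dataset's schema:

```python
import pandas as pd

# Hypothetical rows following the dataset's schema (values are invented).
sales = pd.DataFrame({
    "Station_Name": ["Lekki", "Lekki", "Ikeja", "Ikeja"],
    "Shift": ["Morning", "Night", "Morning", "Night"],
    "Weekday": ["Monday", "Monday", "Tuesday", "Tuesday"],
    "Total_Revenue (₦)": [450000, 380000, 510000, 290000],
})

# Station-wise performance: total revenue per station and shift.
perf = sales.groupby(["Station_Name", "Shift"])["Total_Revenue (₦)"].sum()
print(perf)
```

    Grouping by Weekday or Supervisor instead gives the weekday-pattern and per-supervisor audits listed in the use cases.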

    👤 Created By: Fatolu Peter (Emperor Analytics) Data analyst focused on real-life data transformation in Nigeria’s petroleum, healthcare, and retail sectors. This is Project 11 in my growing portfolio of end-to-end analytics challenges.

    ✅ LinkedIn Post: ⛽ New Dataset Alert – Fuel Economy & Sales Data Now on Kaggle! 📊 Timac Fuel Distribution & Revenue Dataset (Nigeria – 500 Records) 🔗 Explore the data here

    Looking to practice business analytics, revenue forecasting, or operational dashboards?

    This dataset contains:

    Daily sales of AGO, PMS, Diesel, LPG & Lubricants

    Revenue breakdowns by station

    Shift & supervisor tracking

    Fuel prices across product categories

    You can use this to: ✅ Build Power BI sales dashboards ✅ Create fuel trend visualizations ✅ Analyze shift-level profitability ✅ Forecast revenue using Python or Excel

    Let’s put real Nigerian data to real analytical work. Tag me when you build with it—I’d love to celebrate your work!

    #FuelAnalytics #KaggleDatasets #PowerBI #PetroleumIndustry #NigeriaData #RevenueForecasting #EmperorAnalytics #FatoluPeter #Project11 #TimacGlobal #RealWorldData


Nil kamal Saha (2024). SPORTS_DATA_ANALYSIS_ON_EXCEL [Dataset]. https://www.kaggle.com/datasets/nilkamalsaha/sports-data-analysis-on-excel

SPORTS_DATA_ANALYSIS_ON_EXCEL

Explore at:
zip(1203633 bytes)Available download formats
Dataset updated
Dec 12, 2024
Authors
Nil kamal Saha
License

https://creativecommons.org/publicdomain/zero/1.0/

Description

PROJECT OBJECTIVE

We are part of XYZ Co Pvt Ltd, a company in the business of organizing sports events at the international level. Countries nominate sportsmen from different departments, and our team has been given the responsibility to systematize the membership roster and generate reports as per business requirements.

Questions (KPIs)

TASK 1: STANDARDIZING THE DATASET

  • Populate the FULLNAME consisting of the following fields ONLY, in the prescribed format: PREFIX FIRSTNAME LASTNAME (Note: all UPPERCASE).
  • Get the COUNTRY NAME to which these sportsmen belong. Use the LOCATION sheet to get the required data.
  • Populate the LANGUAGE SPOKEN by the sportsmen. Use the LOCATION sheet to get the required data.
  • Generate the EMAIL ADDRESS for members who speak English in the prescribed format lastname.firstname@xyz.org (Note: all lowercase); for all other members, the format should be lastname.firstname@xyz.com (Note: all lowercase).
  • Populate the SPORT LOCATION of the sport played by each player. Use the SPORT sheet to get the required data.

TASK 2: DATA FORMATTING

  • Display MEMBER ID always as a 3-digit number (Note: 001, 002, ..., 020, etc.)
  • Format the BIRTHDATE as dd mmm'yyyy (Prescribed format example: 09 May' 1986)
  • Display the units for the WEIGHT column (Prescribed format example: 80 kg)
  • Format the SALARY to show the data in thousands. If SALARY is less than 100,000 then display the data with 2 decimal places, else display it with 1 decimal place. In both cases the units should be thousands (k), e.g. 87670 -> 87.67 k and 123250 -> 123.2 k.
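The TASK 2 formatting rules can be sketched in Python for clarity; in the actual exercise they would be implemented with Excel custom number formats and TEXT functions. The helper names below are hypothetical:

```python
from datetime import date

def format_member_id(n: int) -> str:
    # Always 3 digits: 1 -> "001", 20 -> "020".
    return f"{n:03d}"

def format_birthdate(d: date) -> str:
    # dd mmm' yyyy, e.g. "09 May' 1986".
    return d.strftime("%d %b' %Y")

def format_weight(kg: float) -> str:
    # Append the unit, e.g. "80 kg".
    return f"{kg:g} kg"

def format_salary(s: float) -> str:
    # Below 100,000: 2 decimal places; otherwise 1. Units are thousands (k).
    k = s / 1000
    return f"{k:.2f} k" if s < 100_000 else f"{k:.1f} k"

print(format_member_id(20))    # 020
print(format_salary(87670))    # 87.67 k
print(format_salary(123250))   # 123.2 k
```

The salary rule mirrors the bullet above: 87670 falls under the 100,000 threshold and keeps two decimals, while 123250 gets one.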

TASK 3: SUMMARIZE DATA - PIVOT TABLE (Use SPORTSMEN worksheet after attempting TASK 1)

• Create a PIVOT table in the worksheet ANALYSIS, starting at cell B3, with the following details:

  • In COLUMNS: group by GENDER.
  • In ROWS: group by COUNTRY (Note: use COUNTRY NAMES).
  • In VALUES: calculate the count of candidates for each COUNTRY and GENDER. Remove GRAND TOTALs.

TASK 4: SUMMARIZE DATA - EXCEL FUNCTIONS (Use SPORTSMEN worksheet after attempting TASK 1)

• Create a SUMMARY table in the worksheet ANALYSIS, starting at cell G4, with the following details:

  • Starting from range H4, get the distinct GENDER values. Use the Remove Duplicates option and transpose the data.
  • Starting from range G5, get the distinct COUNTRY values (Note: use COUNTRY NAMES).
  • In the cross table, get the count of candidates for each COUNTRY and GENDER.

TASK 5: GENERATE REPORT - PIVOT TABLE (Use SPORTSMEN worksheet after attempting TASK 1)

• Create a PIVOT table report in the worksheet REPORT, starting at cell A3, with the following information:

  • Change the report layout to TABULAR form.
  • Remove expand and collapse buttons.
  • Remove GRAND TOTALs.
  • Allow user to filter the data by SPORT LOCATION.

Process

  • Verified the data for missing values and anomalies, and sorted them out.
  • Made sure the data is consistent and clean with respect to data type, data format and values used.
  • Created pivot tables according to the questions asked.