HR Analysis BI Dashboard: performance appraisal and employee wellbeing
The HR department plays several roles in the enterprise. HR acts as a mediator or a bridge between the employees and the management. It's no surprise that the HR department is already burdened with work. Providing HR access to the latest technology and the means to derive insights in real time will help reduce the workload and create a healthy organizational environment.
Problem Statement: Market fluctuations and rapidly changing technology have affected the global market. Many published reports showed that around half of all employees wanted to change jobs. Some market researchers said that flexible working and job security were the primary factors, while a few admitted that a higher salary was the aim.
Different regions saw salaries rise and fall over the years. The increases were meant to retain top-level professional employees, while the pay cuts were due to market fluctuations and were reversed after market conditions improved. HR teams across the globe are hiring new employees, trying to retain existing ones, and working to understand the needs of separated employees (those who left the company).
So, how does the HR department make these decisions in volatile market conditions? It relies on HR analytics to understand the existing situation and develop a modern approach. To meet this requirement, you have been asked to build a Power BI dashboard that addresses these day-to-day challenges of HR teams and gives them an effective way to answer their questions.
Tasks: Use the HR data set for this project and analyze it to understand the data and terminology.
Load data into the Power BI Query Editor and perform the required actions.
Establish the required relationships.
Create the required DAX columns and measures for calculation.
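As a rough illustration of the kind of measure these tasks lead to, here is a minimal pandas sketch; the file name and the Attrition and Department columns are assumptions about the HR data set, and in Power BI the same logic would be written as a DAX measure:

```python
import pandas as pd

# Hypothetical HR data set; Attrition is assumed to hold "Yes"/"No".
hr = pd.read_csv("hr_data.csv")
hr["left_company"] = (hr["Attrition"] == "Yes").astype(int)

# Overall attrition rate, roughly equivalent to a DAX measure such as
# Attrition Rate = DIVIDE(CALCULATE(COUNTROWS(HR), HR[Attrition] = "Yes"), COUNTROWS(HR))
print(hr["left_company"].mean())

# Attrition rate by department, which a Power BI visual would slice automatically
print(hr.groupby("Department")["left_company"].mean())
```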
Terms and conditions: https://digital.nhs.uk/about-nhs-digital/terms-and-conditions
Warning: large file size (over 1GB). Each monthly data set is large (over 4 million rows) but can be viewed in standard software such as Microsoft WordPad (save by right-clicking the file name and selecting 'Save Target As', or the equivalent on Mac OS X). It is then possible to select the required rows of data and copy and paste them into another application, such as a spreadsheet. Alternatively, add-ons to existing software that handle larger data sets, such as the Microsoft PowerPivot add-on for Excel, can be used. The PowerPivot add-on is available from Microsoft: http://office.microsoft.com/en-gb/excel/download-power-pivot-HA101959985.aspx

Once PowerPivot has been installed, follow the instructions below to load the large files. Note that it may take at least 20 to 30 minutes to load one monthly file.
1. Start Excel as normal.
2. Click on the PowerPivot tab.
3. Click on the PowerPivot Window icon (top left).
4. In the PowerPivot Window, click on the "From Other Sources" icon.
5. In the Table Import Wizard, scroll to the bottom and select Text File.
6. Browse to the file you want to open and choose the file extension you require, e.g. CSV.
Once the data has been imported you can view it in a spreadsheet.

What does the data cover? General practice prescribing data is a list of all medicines, dressings and appliances that are prescribed and dispensed each month. A record is only produced when this has occurred; there is no record for a zero total. For each practice in England, the following information is presented at presentation level for each medicine, dressing and appliance (by presentation name):
- the total number of items prescribed and dispensed
- the total net ingredient cost
- the total actual cost
- the total quantity

The data covers NHS prescriptions written in England and dispensed in the community in the UK. Prescriptions written in England but dispensed outside England are included. The data includes prescriptions written by GPs and other non-medical prescribers (such as nurses and pharmacists) who are attached to GP practices. GP practices are identified only by their national code, so an additional data file - linked to the first by the practice code - provides further detail about the practice. Presentations are identified only by their BNF code, so an additional data file - linked to the first by the BNF code - provides the chemical name for that presentation.
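As an alternative to PowerPivot, one of these 4-million-row monthly files can be streamed in chunks with pandas; a minimal sketch, where the file name and the PRACTICE column are placeholders rather than the actual NHS field names:

```python
import pandas as pd

# Stream the large CSV in 500k-row chunks instead of loading it all at once.
totals = {}
for chunk in pd.read_csv("monthly_prescribing.csv", chunksize=500_000):
    for practice, n in chunk["PRACTICE"].value_counts().items():
        totals[practice] = totals.get(practice, 0) + n

# Ten practices with the most prescription records this month
print(sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:10])
```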
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Conduct a thorough examination of the dataset to identify any anomalies or inconsistencies.
Duplicate Removal:
Identify and remove duplicate rows within the dataset.
Ensure data integrity by eliminating redundant entries.
Standardization of Marital Status and Gender:
Replace 'M' with 'Married' and 'S' with 'Single' in the Marital Status column.
Standardize gender data by replacing 'M' with 'Male' and 'F' with 'Female'.
Commute Distance Standardization:
Modify "10+ Miles" to "Above 10 Miles" for uniformity.
Arrange Commute Distance in ascending order to facilitate analysis.
Age Group Classification:
Introduce an additional column named "Age Group" for age categorization.
Calculate ages from existing data, categorizing:
- Below 30 years as "Young Adults".
- Between 31 and 45 years as "Middle-aged Adults".
- Above 45 years as "Old-aged Adults".
Verification and Data Loading:
- Validate all transformations to ensure accuracy and coherence.
- Load the refined dataset back into Excel for further analysis.
A pandas sketch of these cleaning steps follows below.
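A minimal pandas sketch of the steps above, assuming the source workbook uses columns named Marital Status, Gender, Commute Distance, and Age (the file names are placeholders):

```python
import pandas as pd

df = pd.read_excel("hr_raw.xlsx")  # requires openpyxl; placeholder file name

# 1. Remove duplicate rows to ensure data integrity
df = df.drop_duplicates()

# 2. Standardize marital status and gender codes
df["Marital Status"] = df["Marital Status"].replace({"M": "Married", "S": "Single"})
df["Gender"] = df["Gender"].replace({"M": "Male", "F": "Female"})

# 3. Standardize commute distance labels
df["Commute Distance"] = df["Commute Distance"].replace({"10+ Miles": "Above 10 Miles"})

# 4. Derive the Age Group column from Age
df["Age Group"] = pd.cut(
    df["Age"],
    bins=[0, 30, 45, 200],
    labels=["Young Adults", "Middle-aged Adults", "Old-aged Adults"],
)

# 5. Load the refined dataset back into Excel for further analysis
df.to_excel("hr_clean.xlsx", index=False)
```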
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset represents a large tertiary-care hospital with 25 clinical departments and a total capacity of 5500 inpatient beds, including 1080 ICU beds. All data is fully synthetic and created for educational and analytical purposes.
The dataset approximates real-world bed allocation patterns across major clinical specialties such as Emergency Care, Surgery, Pediatrics, ICU, Oncology, and Long-Term Care.
To maintain realism, departments have varying occupancy levels:
- some are under low load (free capacity),
- some operate under normal/medium load,
- several are intentionally modeled as overloaded/high occupancy, to reflect real hospital dynamics.
All metrics simulate plausible hospital operations.
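A minimal pandas sketch of how that load split could be checked; the file name, the Department/Total_Beds/Occupied_Beds columns, and the 70%/90% thresholds are all assumptions:

```python
import pandas as pd

beds = pd.read_csv("hospital_beds.csv")
beds["Occupancy_Rate"] = beds["Occupied_Beds"] / beds["Total_Beds"]

# Bucket departments into the low / medium / overloaded split described above
beds["Load"] = pd.cut(beds["Occupancy_Rate"], bins=[0.0, 0.7, 0.9, 2.0],
                      labels=["low", "medium", "overloaded"])
print(beds.sort_values("Occupancy_Rate", ascending=False)[["Department", "Occupancy_Rate", "Load"]])
```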
CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/
In the beginning, the case was just company data that did not convey any useful information for decision-makers. I had to ask questions that could help extract and explore information to help decision-makers improve and evaluate performance. Before that, I performed some operations on the data so I could analyze it accurately:
1. Understand the data.
2. Clean the data with Power Query.
3. Insert some calculations and columns, such as COGS (cost of goods sold), with Power Query.
4. Model the data and add measures and other columns to support the analysis.
Then I asked these questions:
To enhance customer loyalty:
- What is the ship mode most used by our customers?
- Who are our top 5 customers in terms of sales and order frequency?
To monitor our strengths and weak points:
- Which segment of clients generates the most sales?
- Which city has the most sales value?
- Which state generates the most sales value?
Performance measurement:
- What are the top-performing product categories in terms of sales and profit?
- What is the most profitable product that we sell?
- What is the least profitable product that we sell?
Customer experience:
- On average, how long do orders take to reach our clients, broken down by shipping mode?
Then I started extracting summaries and answers from the pivot tables and designing the data graphics in a dashboard for easy communication and reading of the information. After completing these operations, I made some KPI calculations to measure how far the sales officials achieved their targets.
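For a sense of how a couple of these questions could be answered outside Excel, a minimal pandas sketch; the file and column names (Customer Name, Sales, Ship Mode, Order Date, Ship Date) are assumptions about the workbook:

```python
import pandas as pd

orders = pd.read_csv("store_orders.csv", parse_dates=["Order Date", "Ship Date"])

# Top 5 customers by total sales
print(orders.groupby("Customer Name")["Sales"].sum().nlargest(5))

# Average days from order to delivery, per shipping mode
orders["Days to Ship"] = (orders["Ship Date"] - orders["Order Date"]).dt.days
print(orders.groupby("Ship Mode")["Days to Ship"].mean().round(1))
```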
In the beginning, the case was just company data that did not convey any useful information for decision-makers. I had to ask questions that could help extract and explore information to help decision-makers improve and evaluate performance. Before that, I performed some operations on the data so I could analyze it accurately:
1. Understand the data.
2. Clean the data with Power Query.
3. Insert some calculations and columns with Power Query.
4. Analyze the data and ask some questions.
About distribution:
- What is the number of bikes sold?
- Which region purchases the most bikes?
- What is the average income by gender among bike purchasers?
- How does commute distance relate to purchasing bikes?
- How does age relate to purchasing, and what is the count of bikes sold by age?
About consumer behavior:
- Does home ownership relate to purchasing?
- How do marital status and age relate to purchasing?
- Does car ownership relate to purchasing?
- Does education relate to purchasing?
- Does occupation relate to purchasing?
I noticed the profiles most associated with purchasing bikes are:
- North America (region).
- Commute distance of 0-1 miles.
- Middle-aged, single people (169 bikes).
- People with a Bachelor's degree.
- Males with an average income of $60,124.
- People in professional occupations.
- Home owners (325 bikes).
- People owning 0 or 1 cars.
So, I advise offering those segments more promotions to increase sales.
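A minimal pandas sketch of the kind of cross-tabulation behind these findings; the file name and column names (Purchased Bike, Region, Income, etc.) are assumptions about the source data:

```python
import pandas as pd

bikes = pd.read_csv("bike_buyers.csv")

# Purchases broken down by each behavioral factor
for factor in ["Region", "Commute Distance", "Home Owner", "Education", "Occupation"]:
    print(pd.crosstab(bikes[factor], bikes["Purchased Bike"]), "\n")

# Average income of purchasers by gender
buyers = bikes[bikes["Purchased Bike"] == "Yes"]
print(buyers.groupby("Gender")["Income"].mean().round(0))
```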
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains 999,999 rows of synthetic movie data designed to simulate real-world movie industry metrics and characteristics, with a variety of numeric, categorical, and date fields.
The dataset is ideal for: - Analytical reporting and dashboarding (Power BI, Tableau, Excel) - Exploratory data analysis (EDA), machine learning model development, and visualisation exercises - Understanding relationships between movie metrics, ratings, release timing, and personnel - Building interactive dashboards with filters like genre, release year, and country
| Column Name | Description |
|---|---|
| MovieID | Unique identifier for each movie (integer from 1 to 999,999) |
| Title | Synthetic movie title with natural language style |
| Genre | Primary movie genre (Drama, Action, Comedy, etc.) |
| ReleaseYear | Year of release (1950 to 2025) |
| ReleaseDate | Randomised full release date within the release year (YYYY-MM-DD) |
| Country | Country of production origin |
| BudgetUSD | Estimated production budget in US dollars (range $100k to $300 million) |
| US_BoxOfficeUSD | Gross box office revenue from the US market |
| Global_BoxOfficeUSD | Total global box office revenue |
| Opening_Day_SalesUSD | Estimated US ticket sales revenue on opening day |
| One_Week_SalesUSD | Estimated US ticket sales revenue in first week |
| IMDbRating | IMDb rating on a 1.0 to 10.0 scale |
| RottenTomatoesScore | Rotten Tomatoes rating (percentage between 0 and 100) |
| NumVotesIMDb | Number of user votes on IMDb platform |
| NumVotesRT | Number of user votes on Rotten Tomatoes platform |
| Director | Synthetic name of movie director |
| LeadActor | Synthetic name of lead actor |
Load the dataset in your preferred data analysis tool to: - Explore trends in movie production, box office, and ratings over time - Analyze the impact of budget and talent on movie success - Segment movies by genre, decade, or country - Build predictive models or dashboards highlighting key performance indicators
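A minimal sketch of that workflow in pandas, using the column names from the table above (the CSV file name is a placeholder):

```python
import pandas as pd

movies = pd.read_csv("movies.csv", parse_dates=["ReleaseDate"])

# Trends over time: average rating and global box office by decade
movies["Decade"] = (movies["ReleaseYear"] // 10) * 10
print(movies.groupby("Decade")[["IMDbRating", "Global_BoxOfficeUSD"]].mean())

# Impact of budget: a simple return-on-budget metric by genre
movies["ROI"] = movies["Global_BoxOfficeUSD"] / movies["BudgetUSD"]
print(movies.groupby("Genre")["ROI"].median().sort_values(ascending=False))
```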
This dataset was synthetically generated for educational and demonstration purposes, inspired by real-world movie industry datasets like IMDb and Box Office Mojo.
Feel free to contact the author for questions or collaboration!
By data.world's Admin [source]
This dataset contains data used to analyze the uniquely popular business types in the neighborhoods of Seattle and New York City. We used publicly available neighborhood-level shapefiles to identify neighborhoods, and then crossed that information against Yelp's Business Category API to find businesses operating within each neighborhood. For each category, the ratio of businesses in a neighborhood was compared with the ratio in the entire city to determine significant differences between neighborhoods.
A business with more than one category was counted once under each of its categories, but never twice under the same category. Moreover, if a business type didn't make up at least 1% of a particular neighborhood's businesses overall, it was removed from the analysis altogether.
The data available here is free to use under MIT license, with appropriate attribution given back to Yelp for providing this information. It is an invaluable resource for researchers across different disciplines looking into consumer behavior or clustering within urban areas!
How to Use This Dataset
To get started using this dataset:
- Download the appropriate file for the area you're researching - either top5_Seattle.csv or top5_NewYorkCity.csv - from the Kaggle site which hosts this dataset (https://www.kaggle.com/puddingmagazine/uniquely-popular-businesses).
- Read through each column's information in the Columns section of the Kaggle description.
- Take note of columns relevant to your analysis, such as nCount, which indicates the number of businesses of a type in a neighborhood; rank, which shows how popular that business type is overall; and neighborhoodTotal, which gives the total number of businesses in a particular neighborhood.
- Load your selected file into an application designed for data analysis, such as Jupyter Notebook, Microsoft Excel, or Power BI.
- Begin your analyses: understand where certain unique business types are most common by subsetting rows for specific neighborhoods, or run regression-based analyses of how a business type's rank varies across neighborhoods. If you have any questions about interpreting data from this source, please reach out!
- Analyzing the unique business trends in Seattle and New York City to identify potential investment opportunities.
- Creating a tool that helps businesses understand what local competitions they face by neighborhood.
- Exploring the distinctions between neighborhoods by plotting out the different businesses they have in comparison with each other and with other cities.
If you use this dataset in your research, please credit the original authors. Data Source
See the dataset description for more information.
File: top5_Seattle.csv

| Column name | Description |
|:---|:---|
| neighborhood | Name of the neighborhood. (String) |
| yelpAlias | The Yelp-specified alias for the business type. (String) |
| yelpTitle | The title given to this business type by Yelp. (String) |
| nCount | Number of businesses with this type within a particular neighborhood. (Integer) |
| neighborhoodTotal | Total number of businesses located within that particular region. (Integer) |
| cCount | Number of businesses with this storefront within an entire city. (Integer) |
| cityTotal | Total number of all types of storefronts within an entire city. (Integer) |
...
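A minimal pandas sketch of the over-representation ratio described above, using the file and column names from this table:

```python
import pandas as pd

top5 = pd.read_csv("top5_Seattle.csv")

# Share of a business type within its neighborhood vs. its share city-wide
top5["neighborhoodShare"] = top5["nCount"] / top5["neighborhoodTotal"]
top5["cityShare"] = top5["cCount"] / top5["cityTotal"]
top5["overRepresentation"] = top5["neighborhoodShare"] / top5["cityShare"]

# Most uniquely popular business type per neighborhood
idx = top5.groupby("neighborhood")["overRepresentation"].idxmax()
print(top5.loc[idx, ["neighborhood", "yelpTitle", "overRepresentation"]])
```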
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This is a synthetic dataset generated to mimic real-world e-commerce return management scenarios. Since actual return data is often confidential and unavailable, this dataset was created with realistic assumptions around orders, products, customers, and return behaviors.
It can be used for:
Predictive modeling of return likelihood (classification problems).
Business analytics on profitability loss due to returns.
Sustainability analysis (CO₂ emissions and waste impact from reverse logistics).
📌 Dataset Features (Columns)
Order_ID → Unique order identifier.
Product_ID → Unique product identifier.
User_ID → Unique customer identifier.
Order_Date → Date when the order was placed.
Return_Date → Date when the product was returned (if returned).
Product_Category → Category of the product (e.g., Clothing, Electronics, Books, Toys, etc.).
Product_Price → Price of the product per unit.
Order_Quantity → Number of units purchased in the order.
Discount_Applied → Discount percentage applied on the product.
Return_Status → Whether the order was Returned or Not Returned.
Return_Reason → Reason for return (e.g., Damaged, Wrong Item, Changed Mind).
Days_to_Return → Number of days taken by customer to return (0 if not returned).
User_Age → Age of the customer.
User_Gender → Gender of the customer (Male/Female).
User_Location → City/region of the customer.
Payment_Method → Mode of payment (Credit Card, Debit Card, PayPal, Gift Card, etc.).
Shipping_Method → Chosen shipping type (Standard, Express, Next-Day).
Return_Cost → Estimated logistics cost incurred when a return happens.
Profit_Loss → Net profit or loss for the order, considering product price, discount, and return cost.
CO2_Saved → Estimated CO₂ emissions saved (if return avoided).
Waste_Avoided → Estimated physical waste avoided (in units/items).
💡 Use Cases
MBA & academic projects in Business Analytics and Supply Chain Management.
Training predictive models for return forecasting.
Measuring sustainability KPIs (CO₂ reduction, waste avoidance).
Dashboards in Power BI/Tableau for business decision-making.
Quick Start Example:

import pandas as pd

df = pd.read_csv("/kaggle/input/synthetic-ecommerce-returns/returns_sustainability_dataset.csv")
print(df.head())
print(df.info())
print(df['Return_Status'].value_counts(normalize=True))

# Return_Status holds text ("Returned" / "Not Returned"), so map it to 0/1
# before averaging to get a return rate per product category
df['Is_Returned'] = (df['Return_Status'] == 'Returned').astype(int)
category_returns = df.groupby('Product_Category')['Is_Returned'].mean().sort_values(ascending=False)
print(category_returns)
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
*****Documentation Process*****
1. Data Preparation:
- Upload the data into Power Query to assess quality and identify duplicate values, if any.
- Verify data quality and types for each column, addressing any miswriting or inconsistencies.
2. Data Management:
- Duplicate the original data sheet for future reference and label the new sheet as the "Working File" to preserve the integrity of the original dataset.
3. Understanding Metrics:
- Clarify the meaning of column headers, particularly distinguishing between Impressions and Reach, and comprehend how Engagement Rate is calculated.
- Engagement Rate formula: total likes, comments, and shares divided by Reach.
4. Data Integrity Assurance:
- Recognize that Impressions should outnumber Reach, reflecting total views versus unique audience size.
- Investigate discrepancies between Reach and Impressions to ensure data integrity, identifying and resolving root causes for accurate reporting and analysis.
5. Data Correction:
- Collaborate with the relevant team to rectify data inaccuracies, specifically the discrepancy between Impressions and Reach, and understand its root cause.
- Identify instances where Reach surpasses Impressions, potentially attributable to data transformation errors.
- After rectification, carefully adjust the dataset to reflect the corrected Impressions and Reach values, and recalculate the Engagement Rate post-correction to uphold the credibility of the analysis.
6. Data Enhancement:
- Categorize Audience Age into three groups: "Senior Adults" (45+ years), "Mature Adults" (31-45 years), and "Adolescent Adults" (<30 years) within a new column named "Age Group."
- Split date and time into separate columns using the text-to-columns option for improved analysis.
7. Temporal Analysis:
- Introduce a new column for "Weekend and Weekday," renamed "Weekday Type," to discern patterns and trends in engagement.
- Define time periods by categorizing times into "Morning," "Afternoon," "Evening," and "Night."
8. Sentiment Analysis:
- Populate blank cells in the Sentiment column with "Mixed Sentiment," denoting content containing both positive and negative sentiments or ambiguity.
9. Geographical Analysis:
- Group countries and obtain continent data from an online source (e.g., https://statisticstimes.com/geography/countries-by-continents.php).
- Add a new column for "Audience Continent" and use the XLOOKUP function to retrieve the corresponding continent.
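A minimal pandas version of steps 3, 4, 6, and 7, assuming hypothetical column names (Likes, Comments, Shares, Reach, Impressions, Audience Age, Date) and assumed hour ranges for the time periods:

```python
import pandas as pd

posts = pd.read_csv("social_media.csv", parse_dates=["Date"])

# Step 3: Engagement Rate = (likes + comments + shares) / Reach
posts["Engagement Rate"] = (posts["Likes"] + posts["Comments"] + posts["Shares"]) / posts["Reach"]

# Step 4 sanity check: Impressions should never be below Reach
print(len(posts[posts["Impressions"] < posts["Reach"]]), "rows need investigation")

# Step 6: age groups
posts["Age Group"] = pd.cut(posts["Audience Age"], bins=[0, 30, 45, 120],
                            labels=["Adolescent Adults", "Mature Adults", "Senior Adults"])

# Step 7: weekday type and (assumed) time-of-day periods
posts["Weekday Type"] = posts["Date"].dt.dayofweek.map(lambda d: "Weekend" if d >= 5 else "Weekday")
posts["Time Period"] = pd.cut(posts["Date"].dt.hour, bins=[-1, 5, 11, 17, 23],
                              labels=["Night", "Morning", "Afternoon", "Evening"])
```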
*****Drawing Conclusions and Providing a Summary*****
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains detailed information about companies listed on the Pakistan Stock Exchange (PSX). The PSX is the premier stock exchange in Pakistan, where companies from various sectors are publicly listed for trading. The data was scraped from the official PSX website and includes essential information about each listed company, its representative, and contact details. This dataset can be valuable for anyone interested in financial markets, business research, or investment opportunities within Pakistan.
The dataset contains the following columns:
The dataset includes companies from a wide variety of sectors, reflecting the diversity of industries on the PSX. Some key sectors include: - Automobile Assembler - Cement - Commercial Banks - Fertilizer - Food & Personal Care Products - Pharmaceuticals - Technology & Communication - Textile Composite
And many more, totaling 37 different sectors.
This dataset can be used for multiple purposes: 1. Financial Analysis: Explore the performance of different sectors and companies listed on the PSX. 2. Investment Research: Identify key players in different industries for investment opportunities. 3. Business Development: Build contact lists for companies within a specific sector. 4. Data Science & Machine Learning Projects: Use this dataset for clustering, classification, or sentiment analysis in financial markets.
The dataset is available in CSV format, making it easy to load into data analysis tools like Pandas, Excel, or Power BI. It's structured for easy exploration and can be integrated into financial models or research projects.
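A minimal pandas sketch of loading the CSV; the file name and the Sector column name are assumptions:

```python
import pandas as pd

psx = pd.read_csv("psx_companies.csv")

# Companies per sector - there should be 37 distinct sectors in total
print(psx["Sector"].value_counts().head(10))
print(psx["Sector"].nunique(), "sectors,", len(psx), "companies")
```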
The data was scraped from the official PSX website using a custom Python script. Special thanks to the open-source community for tools like Selenium, BeautifulSoup, and Pandas, which made this project possible.
This dataset is provided for educational and research purposes. Please give proper attribution when using this dataset in your work.
Feel free to explore, analyze, and share your insights!
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains information about malls and retail stores in Riyadh, Saudi Arabia. It includes key details such as names, categories, number of ratings, average ratings, and geographical coordinates. The dataset is useful for businesses, researchers, and developers working on market analysis, geospatial applications, and retail business strategies.
The dataset consists of the following columns:
| Column Name | Data Type | Description |
|---|---|---|
| Name | string | Name of the mall or retail store |
| Type_of_Utility | string | Category of the place (e.g., shopping mall, clothing store) |
| Number_of_Ratings | integer | Total number of reviews received |
| Rating | float | Average rating score (scale: 0-5) |
| Longitude | float | Geographical longitude coordinate |
| Latitude | float | Geographical latitude coordinate |
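A minimal pandas sketch using the columns above (the file name and the 100-review threshold are assumptions); the same pattern applies to the other Riyadh datasets below:

```python
import pandas as pd

malls = pd.read_csv("riyadh_malls.csv")

# Highest-rated places with a meaningful number of reviews
popular = malls[malls["Number_of_Ratings"] >= 100]
print(popular.sort_values("Rating", ascending=False)
             [["Name", "Type_of_Utility", "Rating", "Number_of_Ratings"]].head(10))
```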
For questions or collaboration, reach out via Kaggle comments or email.
CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/
Info
This is a dataset I was given to solve in an interview for a transactions company. It is perfect for practicing DAX measures. Dataset: anonymized sample of credit card deposit attempts over a 12-month period. Main problem: it shows a longitudinally decreasing approval rate from 10/1/2020 to 9/26/2021. Note: this means that the approval rate for credit card deposit attempts has been declining over this time period.
TOOL: You can do this with any tool you like. I used Power BI and consider it one of the best tools for this exercise.
PARAMETER DESCRIPTIONS
Appr? = deposit attempt approved ('1') or declined ('0').
CustomerID
Co Website = online division to which the deposit attempt is directed.
Processing Co = credit card processing company that is processing the transaction (nb: besides processing companies, a few fraud risk filters are also included here).
Issuing Bank = bank that issued the customer's credit card.
Amount
Attempt Timestamp
QUESTIONS (Qs 1-5 & 8 worth 10 points. Qs 6-7 worth 20 points. Total = 100 points)
1) What is the dataset's approval rate by quarter?
2) How many customers attempted a deposit of $50 in Sept 2021?
3) How much did the group identified in QUESTION 2 successfully deposit during the month?
4) Of the top 10 banks with the most deposit attempts between $150.00 and $999.99 in 2021, which has the highest approval rate?
5) Without performing any analysis, which two parameters would you suspect of causing the successive quarterly decrease in approval rate? Why?
6) Identify and describe 2 main causal factors of the decline in approval rates seen in Q3 2021 vs Q4 2020.
7) Choose one of the main factors identified in QUESTION 6. How much of the approval rate decline seen in Q3 2021 vs Q4 2020 is explained by this factor?
8) If you had more time, which other analyses would you like to perform on this dataset to identify causal factors in addition to those identified in QUESTION 6?
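Though the exercise targets DAX, Question 1 translates directly to pandas; a minimal sketch, with the file name assumed and the column names taken from the parameter list above:

```python
import pandas as pd

deposits = pd.read_csv("deposit_attempts.csv", parse_dates=["Attempt Timestamp"])

# Q1: approval rate by quarter ("Appr?" is 1 = approved, 0 = declined)
quarter = deposits["Attempt Timestamp"].dt.to_period("Q")
print(deposits.groupby(quarter)["Appr?"].mean())
```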
POWERBI TIPS:
• Try to add as few columns as possible. There is no problem with this data, but with big datasets more data means slower performance. Make DAX measures instead.
• Redefine each question: picture how to display it and build that in Power BI. Write down what you'll do. Example: 1) What is the dataset's approval rate by quarter? = line graph, title "Approval rate by quarter", x-axis = quarters, y-axis = approval rate.
• Define each column's data type in Power BI, not in the query. This error has persisted over the years: you may define the type in the query, but once you load, it changes back to the default.
• In most datasets, add a calendar table. Very useful.
• GREAT TIP: apply as few filters as possible to the visual and use calculated measures instead. You will need them in the future as the questions become more complex.
• I use this rule for all my reports: measures starting with "Total" are unfiltered. This means that no matter what the filter, they should always return the same value. You will use them a lot.
CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/
This is a beginner-friendly SQLite database designed to help users practice SQL and relational database concepts. The dataset represents a basic business model inspired by NVIDIA and includes interconnected tables covering essential aspects like products, customers, sales, suppliers, employees, and projects. It's perfect for anyone new to SQL or data analytics who wants to learn and experiment with structured data.
Includes details of 15 products (e.g., GPUs, AI accelerators). Attributes: product_id, product_name, category, release_date, price.
Lists 20 fictional customers with their industry and contact information. Attributes: customer_id, customer_name, industry, contact_email, contact_phone.
Contains 100 sales records tied to products and customers. Attributes: sale_id, product_id, customer_id, sale_date, region, quantity_sold, revenue.
Features 50 suppliers and the materials they provide. Attributes: supplier_id, supplier_name, material_supplied, contact_email.
Tracks materials supplied to produce products, proportional to sales. Attributes: supply_chain_id, supplier_id, product_id, supply_date, quantity_supplied.
Lists 5 departments within the business. Attributes: department_id, department_name, location.
Contains data on 30 employees and their roles in different departments. Attributes: employee_id, first_name, last_name, department_id, hire_date, salary.
Describes 10 projects handled by different departments. Attributes: project_id, project_name, department_id, start_date, end_date, budget.
Number of tables: 8. Total rows: around 230 across all tables, ensuring quick queries and easy exploration.
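A minimal sqlite3 sketch against this database; the database file name and the exact table names ("sales", "products") are assumptions inferred from the attribute lists above:

```python
import sqlite3

conn = sqlite3.connect("nvidia_business.db")  # placeholder file name
cur = conn.cursor()

# Revenue by product, joining the sales and products tables
# (column names come from the attribute lists above)
cur.execute("""
    SELECT p.product_name, SUM(s.revenue) AS total_revenue
    FROM sales s
    JOIN products p ON p.product_id = s.product_id
    GROUP BY p.product_name
    ORDER BY total_revenue DESC
    LIMIT 5
""")
for name, revenue in cur.fetchall():
    print(name, revenue)
conn.close()
```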
CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/
This Hospital Patient Treatment Dataset contains simulated data representing patient treatments in a multi-specialty hospital. It includes 200 records with both categorical and numerical fields, designed for data visualization and analysis using tools like Power BI, Tableau, or Python.
The dataset features 3–4 categorical fields: Department (e.g., Cardiology, Neurology, Orthopedics), Treatment Type (Surgery, Medication, Therapy, Observation), Doctor Name, and Gender. It also includes numerical fields such as Age, Treatment Cost, Hospital Stay (Days), and Recovery Score (ranging from 0 to 100).
To get more information about Columns visit: https://colorstech.net/practice-datasets/hospital-patient-treatment-dataset-for-analysis/
This dataset is ideal for healthcare analysts and data enthusiasts who want to practice analyzing treatment efficiency, patient demographics, cost effectiveness, and healthcare outcomes. Potential analyses include cost comparisons by department, gender-based treatment patterns, doctor performance based on recovery scores, and identifying which treatments lead to faster recovery.
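A minimal pandas sketch of a few of those analyses; the file name is a placeholder and the exact column headers are assumed from the description above:

```python
import pandas as pd

patients = pd.read_csv("hospital_treatments.csv")

# Cost comparison by department
print(patients.groupby("Department")["Treatment Cost"].agg(["mean", "median"]))

# Doctor performance based on recovery scores
print(patients.groupby("Doctor Name")["Recovery Score"].mean().sort_values(ascending=False))

# Which treatment types are associated with shorter stays?
print(patients.groupby("Treatment Type")["Hospital Stay (Days)"].mean().sort_values())
```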
It can help simulate real-world healthcare reporting scenarios, such as understanding hospital load, cost optimization, or patient well-being tracking. The inclusion of unique Patient ID allows for easy referencing and segmentation.
This dataset is well-suited for creating KPIs, dashboards, and advanced visualizations to gain insights into hospital operations and patient care outcomes. No real patient data is used—this is a synthetic dataset for educational use only.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains information about supermarkets, grocery stores, and convenience stores in Riyadh, Saudi Arabia. It includes key details such as store names, types, ratings, and geographic coordinates. The dataset is useful for market analysis, business intelligence, and location-based services.
The dataset consists of the following columns:
| Column Name | Data Type | Description |
|---|---|---|
| Name | string | Name of the supermarket or grocery store |
| Type_of_Utility | string | Category of the store (e.g., supermarket, grocery store, convenience store) |
| Number_of_Ratings | integer | Total number of reviews received |
| Rating | float | Average rating score (scale: 0-5) |
| Longitude | float | Geographical longitude coordinate |
| Latitude | float | Geographical latitude coordinate |
For questions or collaboration, reach out via Kaggle comments or email.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains information about metro stations in Riyadh, Saudi Arabia. It includes details such as station names, types, ratings, and geographic coordinates. The dataset is valuable for transportation analysis, urban planning, and navigation applications.
The dataset consists of the following columns:
| Column Name | Data Type | Description |
|---|---|---|
| Name | string | Name of the metro station |
| Type_of_Utility | string | Type of station (Metro Station) |
| Number_of_Ratings | float | Total number of reviews received (some values may be missing) |
| Rating | float | Average rating score (scale: 0-5, some values may be missing) |
| Longitude | float | Geographical longitude coordinate |
| Latitude | float | Geographical latitude coordinate |
For questions or collaboration, reach out via Kaggle comments or email.
CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/
Dive into the ultimate treasure trove for football enthusiasts, data analysts, and gaming aficionados! The Football Manager Players Dataset is a comprehensive collection of player data extracted from a popular football management simulation game, offering an unparalleled look into the virtual world of football talent. This dataset includes detailed attributes for thousands of players across multiple leagues worldwide, making it a goldmine for analyzing player profiles, scouting virtual stars, and building predictive models for football strategies.
Whether you're a data scientist exploring sports analytics, a football fan curious about your favorite virtual players, or a game developer seeking inspiration, this dataset is your ticket to unlocking endless possibilities!
This dataset is a meticulously curated compilation of player statistics from five CSV files, merged into a single, unified dataset (merged_players.csv). It captures a diverse range of attributes for players from various clubs, nations, and leagues, including top-tier competitions like the English Premier Division, Argentina's Premier Division, and lower divisions across the globe.
- File: merged_players.csv (UTF-8 encoded for compatibility with special characters).
- Download merged_players.csv and load it into your favorite tool (Python/pandas, R, Excel, etc.).
- Explore columns like Transfer Value, Position, and Media Description to start your analysis.
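A minimal pandas sketch of that first look (only the file name and the columns mentioned above are relied on; in-game money values are often strings and may need parsing before numeric analysis):

```python
import pandas as pd

players = pd.read_csv("merged_players.csv", encoding="utf-8")
print(players.shape)
print(players.columns.tolist())

# Distribution of players by position, as a first slice of the data
print(players["Position"].value_counts().head(10))
```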
CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/
Netflix is among the most popular platforms for streaming movies and TV shows. As of mid-2021, they have more than 200 million members globally, and their platform offers over 8,000 movies and TV shows. This tabular dataset contains listings of all the movies and TV shows available on Netflix, together with details such as actors, directors, ratings, duration, and year of release.
Analysis ideas:
1. Content Trends Over Time - Examine the annual changes in Netflix's movie and TV show counts.
2. Genre Popularity - Discover the most popular genres and how their popularity changes by location or year.
3. Country Insights - Find out which nations produce the most shows and what kinds of content they contribute.
4. Ratings Distribution - Show how the maturity ratings (G, PG, R, TV-MA) are distributed throughout Netflix material.
5. Best Directors & Actors - Find the actors or directors who show up on Netflix the most.
Data science ideas:
1. Recommendation System Prototype - Create a content-based recommender by utilizing genres and title descriptions.
2. Text Analysis on Descriptions - Apply natural language processing (NLP) to identify trends in the way Netflix characterizes its material, using terms like "crime," "adventure," and "love."
3. Classification Models - Use metadata to determine if a title is a movie or a TV show.
4. Clustering - Using genres, durations, and descriptions, group films and television series into clusters.
5. Trend Forecasting - Forecast future growth of the Netflix library using time-series analysis.
Suggested workflow:
1. Understand the Data (Initial Exploration)
2. Data Cleaning & Preprocessing (e.g., handle missing values in fields such as date_added)
3. Exploratory Data Analysis (EDA)
4. Visualization & Storytelling
5. Advanced Analysis / Data Science Tasks
6. Insights & Reporting
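A minimal pandas sketch of the first EDA steps, assuming the common Kaggle schema for this dataset (netflix_titles.csv with type, date_added, and listed_in columns):

```python
import pandas as pd

netflix = pd.read_csv("netflix_titles.csv")

# Movies vs TV shows added per year
netflix["date_added"] = pd.to_datetime(netflix["date_added"].str.strip(), errors="coerce")
netflix["year_added"] = netflix["date_added"].dt.year
print(netflix.groupby(["year_added", "type"]).size().unstack(fill_value=0))

# Most common genres (listed_in holds comma-separated genre labels)
print(netflix["listed_in"].str.split(", ").explode().value_counts().head(10))
```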
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains information about restaurants and cafes in Riyadh, Saudi Arabia. It includes details such as business names, types, ratings, and geographic coordinates. The dataset is useful for food industry analysis, customer preference insights, and location-based recommendations.
The dataset consists of the following columns:
| Column Name | Data Type | Description |
|---|---|---|
| Name | string | Name of the restaurant or cafe |
| Type_of_Utility | string | Category of the place (e.g., restaurant, cafe) |
| Number_of_Ratings | float | Total number of reviews received |
| Rating | float | Average rating score (scale: 0-5) |
| Longitude | float | Geographical longitude coordinate |
| Latitude | float | Geographical latitude coordinate |
For questions or collaboration, reach out via Kaggle comments or email.