20 datasets found
  1. HR Analysis - Power BI Dashboard

    • kaggle.com
    zip
    Updated Apr 11, 2025
    Cite
    Ramy Elbouhy (2025). HR Analysis - Power BI Dashboard [Dataset]. https://www.kaggle.com/datasets/ramyelbouhy/hr-analysis-power-bi-dashboard/discussion
    Explore at:
zip (21491703 bytes)
    Dataset updated
    Apr 11, 2025
    Authors
    Ramy Elbouhy
    Description

HR Analysis BI Dashboard: performance appraisal and employee wellbeing.

The HR department plays several roles in the enterprise, acting as a mediator, or bridge, between the employees and the management. It's no surprise that the HR department is already burdened with work. Providing it with access to the latest technology and the means to derive insights in real time will help reduce the workload and create a healthy organizational environment.

Problem Statement: Market fluctuations and rapidly changing technology have affected the global market. Many published reports showed that around half of employees wanted to change jobs. While some market researchers said that flexible working and job security were the primary factors, a few admitted that a higher salary was the aim.

Different regions saw salaries both rise and fall over the years. Increases aimed to retain top-level professionals, while pay cuts were due to market fluctuations and were reversed once market conditions improved. HR teams across the globe are hiring new employees, trying to retain existing ones, and working to understand the needs of separated employees (those who left the company).

So, how does the HR department make these decisions in volatile market conditions? It relies on HR analytics to understand the existing situation and develop a modern approach. For this requirement, your company has asked you to build a Power BI dashboard that addresses the following challenges HR teams face and provides an effective way to answer their day-to-day questions.

Tasks: Use the HR dataset for this project and analyze it to understand the data and its terminology.

    Load data into the Power BI Query Editor and perform the required actions.

    Establish the required relationships.

Create the required DAX columns and measures for the calculations (a rough pandas equivalent of one such measure is sketched below).
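As an illustration of the kind of measure you would later express in DAX, here is a minimal pandas sketch that computes an attrition rate and average tenure. The file name and the column names ('Attrition', 'HireDate', 'TerminationDate') are assumptions for illustration, not the actual schema of this dataset.

import pandas as pd

# Hypothetical file and column names; adjust to the real HR dataset.
hr = pd.read_csv("hr_data.csv", parse_dates=["HireDate", "TerminationDate"])

# Attrition rate = separated employees / total employees (a typical dashboard measure)
attrition_rate = (hr["Attrition"] == "Yes").mean()

# Average tenure in years, using today's date for employees still active
tenure_end = hr["TerminationDate"].fillna(pd.Timestamp.today())
hr["TenureYears"] = (tenure_end - hr["HireDate"]).dt.days / 365.25

print(f"Attrition rate: {attrition_rate:.1%}")
print(hr.groupby("Attrition")["TenureYears"].mean())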

  2. GP Practice Prescribing Presentation-level Data - July 2014

    • digital.nhs.uk
    csv, zip
    Updated Oct 31, 2014
    Cite
    (2014). GP Practice Prescribing Presentation-level Data - July 2014 [Dataset]. https://digital.nhs.uk/data-and-information/publications/statistical/practice-level-prescribing-data
    Explore at:
csv (1.4 GB), zip (257.7 MB), csv (1.7 MB), csv (275.8 kB)
    Dataset updated
    Oct 31, 2014
    License

https://digital.nhs.uk/about-nhs-digital/terms-and-conditions

    Time period covered
    Jul 1, 2014 - Jul 31, 2014
    Area covered
    United Kingdom
    Description

Warning: large file size (over 1 GB). Each monthly data set is large (over 4 million rows) but can be viewed in standard software such as Microsoft WordPad (save by right-clicking on the file name and selecting 'Save Target As', or the equivalent on Mac OS X). It is then possible to select the required rows of data and copy and paste the information into another software application, such as a spreadsheet. Alternatively, add-ons to existing software that handle larger data sets, such as the Microsoft PowerPivot add-on for Excel, can be used. The PowerPivot add-on is available from Microsoft: http://office.microsoft.com/en-gb/excel/download-power-pivot-HA101959985.aspx

Once PowerPivot has been installed, follow the instructions below to load the large files. Note that it may take at least 20 to 30 minutes to load one monthly file.

1. Start Excel as normal.
2. Click on the PowerPivot tab.
3. Click on the PowerPivot Window icon (top left).
4. In the PowerPivot Window, click on the "From Other Sources" icon.
5. In the Table Import Wizard, scroll to the bottom and select Text File.
6. Browse to the file you want to open and choose the file extension you require, e.g. CSV.

Once the data has been imported you can view it in a spreadsheet.

What does the data cover? General practice prescribing data is a list of all medicines, dressings and appliances that are prescribed and dispensed each month. A record is only produced when this has occurred; there is no record for a zero total. For each practice in England, the following information is presented at presentation level for each medicine, dressing and appliance (by presentation name):

- the total number of items prescribed and dispensed
- the total net ingredient cost
- the total actual cost
- the total quantity

The data covers NHS prescriptions written in England and dispensed in the community in the UK. Prescriptions written in England but dispensed outside England are included. The data includes prescriptions written by GPs and other non-medical prescribers (such as nurses and pharmacists) who are attached to GP practices. GP practices are identified only by their national code, so an additional data file, linked to the first by the practice code, provides further detail about the practice. Presentations are identified only by their BNF code, so an additional data file, linked to the first by the BNF code, provides the chemical name for that presentation.
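For readers working in Python rather than PowerPivot, here is a minimal pandas sketch of loading the large presentation-level file in manageable chunks and joining the practice and BNF lookup files described above. The file names and the column names ("PRACTICE", "BNF CODE", "ITEMS", "ACT COST") are assumptions for illustration; check them against the actual headers before running.

import pandas as pd

# Hypothetical file names for the three linked monthly files
chunks = pd.read_csv("prescribing_july2014.csv", chunksize=500_000)

# Aggregate items and actual cost per practice/presentation without holding 4M+ rows in memory
parts = []
for chunk in chunks:
    parts.append(chunk.groupby(["PRACTICE", "BNF CODE"], as_index=False)[["ITEMS", "ACT COST"]].sum())
summary = pd.concat(parts).groupby(["PRACTICE", "BNF CODE"], as_index=False).sum()

# Join the practice-detail and chemical-name lookup files (linked by practice code and BNF code)
practices = pd.read_csv("practice_details.csv")
chemicals = pd.read_csv("bnf_chemical_names.csv")
summary = summary.merge(practices, on="PRACTICE", how="left").merge(chemicals, on="BNF CODE", how="left")
print(summary.head())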

  3. Bike Sales

    • kaggle.com
    zip
    Updated Feb 24, 2024
    Cite
    Ali Reda Elblgihy (2024). Bike Sales [Dataset]. https://www.kaggle.com/datasets/aliredaelblgihy/bike-sales
    Explore at:
zip (207203 bytes)
    Dataset updated
    Feb 24, 2024
    Authors
    Ali Reda Elblgihy
    License

MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description
1. Data Import and Quality Check:
   - Utilize Power Query to import the dataset.
   - Conduct a thorough examination of the dataset to identify any anomalies or inconsistencies.

2. Duplicate Removal:
   - Identify and remove duplicate rows within the dataset.
   - Ensure data integrity by eliminating redundant entries.

3. Standardization of Marital Status and Gender:
   - Replace 'M' with 'Married' and 'S' with 'Single' in the Marital Status column.
   - Standardize gender data by replacing 'M' with 'Male' and 'F' with 'Female'.

4. Commute Distance Standardization:
   - Modify "10+ Miles" to "Above 10 Miles" for uniformity.
   - Arrange Commute Distance in ascending order to facilitate analysis.

5. Age Group Classification:
   - Introduce an additional column named "Age Group" for age categorization.
   - Calculate ages from existing data, categorizing: below 30 years as "Young Adults", between 31 and 45 years as "Middle-aged Adults", and above 45 years as "Old-aged Adults".

6. Verification and Data Loading:
   - Validate all transformations to ensure accuracy and coherence.
   - Load the refined dataset back into Excel for further analysis.

(A pandas sketch of these steps follows.)
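A minimal pandas sketch of the same cleaning steps, assuming column names such as 'Marital Status', 'Gender', 'Commute Distance', and 'Age' (adjust them to the actual file):

import pandas as pd

df = pd.read_excel("bike_sales.xlsx")  # hypothetical file name

# Step 2: duplicate removal
df = df.drop_duplicates()

# Step 3: standardize marital status and gender
df["Marital Status"] = df["Marital Status"].replace({"M": "Married", "S": "Single"})
df["Gender"] = df["Gender"].replace({"M": "Male", "F": "Female"})

# Step 4: commute distance standardization
df["Commute Distance"] = df["Commute Distance"].replace({"10+ Miles": "Above 10 Miles"})

# Step 5: age group classification (boundaries follow the grouping above)
df["Age Group"] = pd.cut(df["Age"], bins=[0, 30, 45, 200],
                         labels=["Young Adults", "Middle-aged Adults", "Old-aged Adults"])

# Step 6: load the refined dataset back into Excel
df.to_excel("bike_sales_clean.xlsx", index=False)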

  4. Hospital Bed Capacity and ICU Load Dataset

    • kaggle.com
    zip
    Updated Nov 14, 2025
    Cite
    Endernel (2025). Hospital Bed Capacity and ICU Load Dataset [Dataset]. https://www.kaggle.com/datasets/endernel/capacity-of-hospitals-bed
    Explore at:
zip (1526 bytes)
    Dataset updated
    Nov 14, 2025
    Authors
    Endernel
    License

MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This dataset represents a large tertiary-care hospital with 25 clinical departments and a total capacity of 5500 inpatient beds, including 1080 ICU beds. All data is fully synthetic and created for educational and analytical purposes. It is designed to support:

    • hospital capacity dashboards,
    • healthcare data analytics,
    • emergency surge simulations,
• operational decision-making models,
    • business intelligence (Power BI, Tableau, Looker) projects.

    The dataset approximates real-world bed allocation patterns across major clinical specialties such as Emergency Care, Surgery, Pediatrics, ICU, Oncology, and Long-Term Care.

    To maintain realism, departments have varying occupancy levels: - some are under low load (free capacity), - some operate under normal/medium load, - several are intentionally modeled as overloaded/high occupancy, to reflect real hospital dynamics.

    All metrics simulate plausible hospital operations.
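A minimal pandas sketch of a capacity view over this data, assuming hypothetical column names such as 'Department', 'Total_Beds', and 'Occupied_Beds' (the real columns may differ):

import pandas as pd

df = pd.read_csv("hospital_bed_capacity.csv")  # hypothetical file name

# Occupancy rate per department and a simple load classification (thresholds are illustrative)
df["Occupancy_Rate"] = df["Occupied_Beds"] / df["Total_Beds"]
df["Load"] = pd.cut(df["Occupancy_Rate"], bins=[0, 0.70, 0.90, float("inf")],
                    labels=["Low", "Normal", "Overloaded"])

print(df.sort_values("Occupancy_Rate", ascending=False)[["Department", "Occupancy_Rate", "Load"]].head(10))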

  5. Project Data analysis using excel

    • kaggle.com
    zip
    Updated Jul 2, 2023
    Cite
    Ahmed Samir (2023). Project Data analysis using excel [Dataset]. https://www.kaggle.com/datasets/ahmedsamir11111/project-data-analysis-using-excel/discussion
    Explore at:
zip (4912987 bytes)
    Dataset updated
    Jul 2, 2023
    Authors
    Ahmed Samir
    License

CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

In the beginning, the case was just raw company data that did not convey any useful information to decision-makers. I had to ask questions that would help extract and explore information decision-makers could use to improve and evaluate performance. But before that, I performed some operations on the data so I could analyze it accurately:

1. Understand the data.
2. Clean the data with Power Query.
3. Insert some calculated columns, such as COGS (cost of goods sold), using Power Query.
4. Model the data and add measures and other columns to support the analysis.

Then I asked these questions:

To enhance customer loyalty
- What is the most used ship mode among our customers?
- Who are our top 5 customers in terms of sales and order frequency?

To monitor our strengths and weaknesses
- Which customer segment generates the most sales?
- Which city has the highest sales value?
- Which state generates the most sales value?

Performance measurement
- What are the top-performing product categories in terms of sales and profit?
- What is the most profitable product we sell?
- What is the least profitable product we sell?

Customer experience
- On average, how long do orders take to reach our clients, broken down by shipping mode?

I then extracted summaries and answers from the pivot tables and designed the data graphics in a dashboard for easy communication and reading of the information. After completing these operations, I made some KPI calculations to measure how far sales officials got toward achieving their targets.
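A minimal pandas sketch of a few of these questions, assuming hypothetical column names such as 'Customer Name', 'Order ID', 'Sales', 'Order Date', 'Ship Date', and 'Ship Mode' (the actual workbook may differ):

import pandas as pd

df = pd.read_excel("orders.xlsx", parse_dates=["Order Date", "Ship Date"])  # hypothetical file name

# Top 5 customers by total sales and by order frequency
print(df.groupby("Customer Name")["Sales"].sum().nlargest(5))
print(df.groupby("Customer Name")["Order ID"].nunique().nlargest(5))

# Average days from order to delivery, per shipping mode
df["Days to Ship"] = (df["Ship Date"] - df["Order Date"]).dt.days
print(df.groupby("Ship Mode")["Days to Ship"].mean().sort_values())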

  6. Bikes Buyer Data Analysis using Excel

    • kaggle.com
    zip
    Updated Aug 12, 2023
    Cite
    Ahmed Samir (2023). Bikes Buyer Data Analysis using Excel [Dataset]. https://www.kaggle.com/datasets/ahmedsamir11111/bikes-buyer-data-analysis-using-excel
    Explore at:
zip (2569195 bytes)
    Dataset updated
    Aug 12, 2023
    Authors
    Ahmed Samir
    Description

In the beginning, the case was just raw company data that did not convey any useful information to decision-makers. I had to ask questions that would help extract and explore information decision-makers could use to improve and evaluate performance. But before that, I performed some operations on the data so I could analyze it accurately:

1. Understand the data.
2. Clean the data with Power Query.
3. Insert some calculations and columns using Power Query.
4. Analyze the data and ask some questions.

About distribution
- What is the number of bikes sold?
- Which region purchases the most bikes?
- What is the average income by gender and bike purchase?
- How does commute distance relate to purchasing bikes?
- How do age and purchase status relate to the count of bikes sold?

About consumer behavior
- Home ownership by purchase?
- Marital status and age by purchase?
- Car ownership by purchase?
- Education by purchase?
- Occupation by purchase?

I noticed that bike purchases are most common among:
- the North America region,
- people with a commute distance of 0-1 miles,
- middle-aged, single people (169 bikes),
- people with a Bachelor's degree,
- males with an average income of $60,124,
- people in professional occupations,
- home owners (325 bikes),
- people who own 0 or 1 car.

So I advise giving those segments more offers to increase sales. (A short pandas sketch of such cross-tabulations follows.)
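A minimal pandas sketch of the kind of cross-tabulations behind these findings, assuming hypothetical column names 'Region', 'Gender', 'Income', and 'Purchased Bike':

import pandas as pd

df = pd.read_excel("bike_buyers.xlsx")  # hypothetical file name

# Bikes sold per region
print(pd.crosstab(df["Region"], df["Purchased Bike"]))

# Average income by gender and purchase decision
print(df.pivot_table(index="Gender", columns="Purchased Bike", values="Income", aggfunc="mean"))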

  7. Movie Dataset for Analytics & Visualization

    • kaggle.com
    zip
    Updated Sep 8, 2025
    Cite
    Shubham R Pawar (2025). Movie Dataset for Analytics & Visualization [Dataset]. https://www.kaggle.com/datasets/mjshubham21/movie-dataset-for-analytics-and-visualization/data
    Explore at:
zip (65136772 bytes)
    Dataset updated
    Sep 8, 2025
    Authors
    Shubham R Pawar
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Synthetic Movie Dataset for Analytics and Visualisation

    Dataset Overview

    This dataset contains 999,999 rows of synthetic movie data designed to simulate real-world movie industry metrics and characteristics, with a variety of numeric, categorical, and date fields.

    • Movies span from 1950 through 2025, with random release dates generated within each release year.
    • Contains 8 major genres (Drama, Action, Comedy, Thriller, Romance, Sci-Fi, Horror, Documentary) distributed realistically.
    • Detailed financial data including production budgets, US box office revenue, and global box office revenue, alongside opening day and first week US sales.
    • Audience engagement data reflected by IMDb ratings, Rotten Tomatoes scores, and corresponding user vote counts on both platforms.
    • Contains realistic correlations such as budget impacting box office earnings, box office affecting sales figures, and ratings correlating with vote counts.
    • Includes synthetic but realistic director and lead actor names to enable people-centric analyses.

    The dataset is ideal for: - Analytical reporting and dashboarding (Power BI, Tableau, Excel) - Exploratory data analysis (EDA), machine learning model development, and visualisation exercises - Understanding relationships between movie metrics, ratings, release timing, and personnel - Building interactive dashboards with filters like genre, release year, and country

    Column Descriptions

Column Name | Description
MovieID | Unique identifier for each movie (integer from 1 to 999,999)
Title | Synthetic movie title with natural language style
Genre | Primary movie genre (Drama, Action, Comedy, etc.)
ReleaseYear | Year of release (1950 to 2025)
ReleaseDate | Randomised full release date within the release year (YYYY-MM-DD)
Country | Country of production origin
BudgetUSD | Estimated production budget in US dollars (range $100k to $300 million)
US_BoxOfficeUSD | Gross box office revenue from the US market
Global_BoxOfficeUSD | Total global box office revenue
Opening_Day_SalesUSD | Estimated US ticket sales revenue on opening day
One_Week_SalesUSD | Estimated US ticket sales revenue in the first week
IMDbRating | IMDb rating on a 1.0 to 10.0 scale
RottenTomatoesScore | Rotten Tomatoes rating (percentage between 0 and 100)
NumVotesIMDb | Number of user votes on the IMDb platform
NumVotesRT | Number of user votes on the Rotten Tomatoes platform
Director | Synthetic name of the movie director
LeadActor | Synthetic name of the lead actor

    How to Use

Load the dataset in your preferred data analysis tool to (see the sketch below):

• Explore trends in movie production, box office, and ratings over time
• Analyze the impact of budget and talent on movie success
• Segment movies by genre, decade, or country
• Build predictive models or dashboards highlighting key performance indicators
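A minimal pandas sketch of the first bullet (trends over time), using the column names documented above; the file name is a placeholder for the actual Kaggle path:

import pandas as pd

df = pd.read_csv("movies.csv")  # replace with the actual file path

# Median budget, global box office, and IMDb rating per release year
print(df.groupby("ReleaseYear")[["BudgetUSD", "Global_BoxOfficeUSD", "IMDbRating"]].median().tail(10))

# Top genre by total global box office per decade
df["Decade"] = (df["ReleaseYear"] // 10) * 10
top_genres = df.groupby(["Decade", "Genre"])["Global_BoxOfficeUSD"].sum()
print(top_genres.groupby(level="Decade").idxmax())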

    Citation

    This dataset was synthetically generated for educational and demonstration purposes, inspired by real-world movie industry datasets like IMDb and Box Office Mojo.

    Feel free to contact the author for questions or collaboration!

  8. Uniquely Popular Businesses

    • kaggle.com
    zip
    Updated Jan 22, 2023
    Cite
    The Devastator (2023). Uniquely Popular Businesses [Dataset]. https://www.kaggle.com/datasets/thedevastator/uniquely-popular-businesses
    Explore at:
zip (48480 bytes)
    Dataset updated
    Jan 22, 2023
    Authors
    The Devastator
    Description

    Uniquely Popular Businesses

    Rankings of Business Categories in Seattle & NYC Neighborhoods

    By data.world's Admin [source]

    About this dataset

This dataset contains data used to analyze the uniquely popular business types in the neighborhoods of Seattle and New York City. We used publicly available neighborhood-level shapefiles to identify neighborhoods, and then crossed that information against Yelp's Business Category API to find businesses operating within each neighborhood. The ratio of businesses from each category was studied in comparison to their ratios in the entire city to determine any significant differences between neighborhoods.

Any single business with more than one category was repeated for each one; however, none were ever recorded twice for any single category. Moreover, if a certain business type didn't make up at least 1% of a particular neighborhood's businesses overall, it was removed from the analysis altogether.

    The data available here is free to use under MIT license, with appropriate attribution given back to Yelp for providing this information. It is an invaluable resource for researchers across different disciplines looking into consumer behavior or clustering within urban areas!


    How to use the dataset


To get started using this dataset:

1. Download the appropriate file for the area you're researching, either top5_Seattle.csv or top5_NewYorkCity.csv, from the Kaggle page that hosts this dataset (https://www.kaggle.com/puddingmagazine/uniquely-popular-businesses).
2. Read through the column information in the Columns section of this description.
3. Take note of columns relevant to your analysis, such as nCount (the number of businesses of a given type in a neighborhood), rank (how popular that business type is overall), and neighborhoodTotal (the total number of businesses in a particular neighborhood).
4. Load your selected file into an application designed for data analysis, such as Jupyter Notebook, Microsoft Excel, or Power BI.
5. Begin analyzing where certain unique business types are most common by subsetting rows for specific neighborhoods, or run regression-based analyses of how a business type's rank varies across neighborhoods (see the sketch after this list).
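A minimal pandas sketch of steps 3-5, using the documented columns (nCount, neighborhoodTotal, cCount, cityTotal) to recompute how over-represented a business type is in each neighborhood relative to the whole city:

import pandas as pd

df = pd.read_csv("top5_Seattle.csv")

# Share of each category within the neighborhood vs. within the whole city
df["neighborhood_share"] = df["nCount"] / df["neighborhoodTotal"]
df["city_share"] = df["cCount"] / df["cityTotal"]
df["over_representation"] = df["neighborhood_share"] / df["city_share"]

# Most uniquely popular category per neighborhood
top = df.sort_values("over_representation", ascending=False).groupby("neighborhood").head(1)
print(top[["neighborhood", "yelpTitle", "over_representation"]])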

    If you have any questions about interpreting data from this source please reach out if needed!

    Research Ideas

    • Analyzing the unique business trends in Seattle and New York City to identify potential investment opportunities.
    • Creating a tool that helps businesses understand what local competitions they face by neighborhood.
    • Exploring the distinctions between neighborhoods by plotting out the different businesses they have in comparison with each other and other cities

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    See the dataset description for more information.

    Columns

File: top5_Seattle.csv

| Column name | Description |
|:---|:---|
| neighborhood | Name of the neighborhood. (String) |
| yelpAlias | The Yelp-specified alias for the business type. (String) |
| yelpTitle | The title given to this business type by Yelp. (String) |
| nCount | Number of businesses with this type within a particular neighborhood. (Integer) |
| neighborhoodTotal | Total number of businesses located within that particular region. (Integer) |
| cCount | Number of businesses with this storefront within an entire city. (Integer) |
| cityTotal | Total number of all types of storefronts within an entire city. (Integer) ... |

  9. Synthetic E-Commerce Returns Management Dataset

    • kaggle.com
    zip
    Updated Sep 3, 2025
    Cite
    Sowmihari (2025). Synthetic E-Commerce Returns Management Dataset [Dataset]. https://www.kaggle.com/datasets/sowmihari/returns-management
    Explore at:
zip (203957 bytes)
    Dataset updated
    Sep 3, 2025
    Authors
    Sowmihari
    License

Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This is a synthetic dataset generated to mimic real-world e-commerce return management scenarios. Since actual return data is often confidential and unavailable, this dataset was created with realistic assumptions around orders, products, customers, and return behaviors.

    It can be used for:

    Predictive modeling of return likelihood (classification problems).

    Business analytics on profitability loss due to returns.

    Sustainability analysis (CO₂ emissions and waste impact from reverse logistics).

    📌 Dataset Features (Columns)

    Order_ID → Unique order identifier.

    Product_ID → Unique product identifier.

    User_ID → Unique customer identifier.

    Order_Date → Date when the order was placed.

    Return_Date → Date when the product was returned (if returned).

    Product_Category → Category of the product (e.g., Clothing, Electronics, Books, Toys, etc.).

    Product_Price → Price of the product per unit.

    Order_Quantity → Number of units purchased in the order.

    Discount_Applied → Discount percentage applied on the product.

    Return_Status → Whether the order was Returned or Not Returned.

    Return_Reason → Reason for return (e.g., Damaged, Wrong Item, Changed Mind).

    Days_to_Return → Number of days taken by customer to return (0 if not returned).

    User_Age → Age of the customer.

    User_Gender → Gender of the customer (Male/Female).

    User_Location → City/region of the customer.

    Payment_Method → Mode of payment (Credit Card, Debit Card, PayPal, Gift Card, etc.).

    Shipping_Method → Chosen shipping type (Standard, Express, Next-Day).

    Return_Cost → Estimated logistics cost incurred when a return happens.

    Profit_Loss → Net profit or loss for the order, considering product price, discount, and return cost.

    CO2_Saved → Estimated CO₂ emissions saved (if return avoided).

    Waste_Avoided → Estimated physical waste avoided (in units/items).

    💡 Use Cases

    MBA & academic projects in Business Analytics and Supply Chain Management.

    Training predictive models for return forecasting.

    Measuring sustainability KPIs (CO₂ reduction, waste avoidance).

    Dashboards in Power BI/Tableau for business decision-making.

Quick Start Example:

# Import libraries
import pandas as pd

# Load dataset (replace with the Kaggle path)
df = pd.read_csv("/kaggle/input/synthetic-ecommerce-returns/returns_sustainability_dataset.csv")

# View first 5 rows
print(df.head())

# Summary of columns
print(df.info())

# Check return distribution
print(df['Return_Status'].value_counts(normalize=True))

# Example: average return rate by product category
# Return_Status holds "Returned"/"Not Returned", so map it to 1/0 before averaging
df['Is_Returned'] = (df['Return_Status'] == 'Returned').astype(int)
category_returns = df.groupby('Product_Category')['Is_Returned'].mean().sort_values(ascending=False)
print(category_returns)

  10. Social Media Engagement Report

    • kaggle.com
    zip
    Updated Apr 13, 2024
    Cite
    Ali Reda Elblgihy (2024). Social Media Engagement Report [Dataset]. https://www.kaggle.com/datasets/aliredaelblgihy/social-media-engagement-report
    Explore at:
zip (49114657 bytes)
    Dataset updated
    Apr 13, 2024
    Authors
    Ali Reda Elblgihy
    License

Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

*****Documentation Process*****

1. Data Preparation:
   - Upload the data into Power Query to assess quality and identify any duplicate values.
   - Verify data quality and types for each column, addressing any mis-entries or inconsistencies.
2. Data Management:
   - Duplicate the original data sheet for future reference and label the new sheet the "Working File" to preserve the integrity of the original dataset.
3. Understanding Metrics:
   - Clarify the meaning of column headers, particularly distinguishing between Impressions and Reach, and understand how Engagement Rate is calculated.
   - Engagement Rate formula: total likes, comments, and shares divided by Reach.
4. Data Integrity Assurance:
   - Recognize that Impressions should outnumber Reach, reflecting total views versus unique audience size.
   - Investigate discrepancies between Reach and Impressions to ensure data integrity, identifying and resolving root causes for accurate reporting and analysis.
5. Data Correction:
   - Collaborate with the relevant team to rectify data inaccuracies, specifically the discrepancy between Impressions and Reach, and understand its root cause (potentially attributable to data transformation errors).
   - After rectification, adjust the dataset to reflect the corrected Impressions and Reach values, and recalculate the Engagement Rate to maintain the integrity and credibility of the analysis.
6. Data Enhancement:
   - Categorize Audience Age into three groups in a new "Age Group" column: "Senior Adults" (45+ years), "Mature Adults" (31-45 years), and "Adolescent Adults" (under 30 years).
   - Split date and time into separate columns using the text-to-columns option for improved analysis.
7. Temporal Analysis:
   - Add a "Weekday Type" column (weekend vs. weekday) to discern patterns and trends in engagement.
   - Define time periods by categorizing timestamps into "Morning", "Afternoon", "Evening", and "Night".
8. Sentiment Analysis:
   - Populate blank cells in the Sentiment column with "Mixed Sentiment", denoting content containing both positive and negative sentiments or ambiguity.
9. Geographical Analysis:
   - Group countries using continent data from an online source (e.g., https://statisticstimes.com/geography/countries-by-continents.php).
   - Add an "Audience Continent" column and use the XLOOKUP function to retrieve the corresponding continent.

(A pandas sketch of the engagement-rate and enrichment steps follows.)
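A minimal pandas sketch of the engagement-rate recalculation and the enrichment steps above; the column names ('Likes', 'Comments', 'Shares', 'Reach', 'Post Time', 'Audience Age', 'Sentiment') and the time-period cutoffs are assumptions for illustration:

import pandas as pd

df = pd.read_csv("social_media_engagement.csv")  # hypothetical file name

# Step 3: Engagement Rate = (likes + comments + shares) / reach
df["Engagement Rate"] = (df["Likes"] + df["Comments"] + df["Shares"]) / df["Reach"]

# Step 6: age groups
df["Age Group"] = pd.cut(df["Audience Age"], bins=[0, 30, 45, 200],
                         labels=["Adolescent Adults", "Mature Adults", "Senior Adults"])

# Step 7: time periods (assumed hour cutoffs)
hours = pd.to_datetime(df["Post Time"]).dt.hour
df["Time Period"] = pd.cut(hours, bins=[-1, 5, 11, 17, 23],
                           labels=["Night", "Morning", "Afternoon", "Evening"])

# Step 8: fill blank sentiment values
df["Sentiment"] = df["Sentiment"].fillna("Mixed Sentiment")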

    *****Drawing Conclusions and Providing a Summary*****

    • The data is equally distributed across different categories, platforms, and over the years.
    • Most of our audience comprises senior adults (aged 45 and above).
    • Most of our audience exhibit mixed sentiments about our posts. However, an equal portion expresses consistent sentiments.
    • The majority of our posts were located in Africa.
    • The number of posts increased from the first year to the second year and remained relatively consistent for the third year.
    • The optimal time for posting is during the night on weekdays.
• The highest engagement rates were observed in Croatia, followed by Malawi.
    • The number of posts targeting senior adults is significantly higher than the other two categories. However, the engagement rates for mature and adolescent adults are also noteworthy, based on the number of targeted posts.
  11. Pakistan Stock Exchange Listed Companies Dataset

    • kaggle.com
    zip
    Updated Oct 10, 2024
    Cite
    Nauman Ali Murad (2024). Pakistan Stock Exchange Listed Companies Dataset [Dataset]. https://www.kaggle.com/datasets/naumanalimurad/psx-listed-companies-dataset
    Explore at:
zip (95145 bytes)
    Dataset updated
    Oct 10, 2024
    Authors
    Nauman Ali Murad
    License

MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    PSX Listed Companies Dataset

    Overview

    This dataset contains detailed information about companies listed on the Pakistan Stock Exchange (PSX). The PSX is the premier stock exchange in Pakistan, where companies from various sectors are publicly listed for trading. The data was scraped from the official PSX website and includes essential information about each listed company, its representative, and contact details. This dataset can be valuable for anyone interested in financial markets, business research, or investment opportunities within Pakistan.

    Data Features

    The dataset contains the following columns:

    • Representative: Name of the company representative.
    • Designation: The representative's role or title within the company.
    • Company: Name of the listed company.
    • Address: Official mailing address of the company.
    • Phone: Primary contact number.
    • Phone 2: Additional contact number (if available).
    • Fax: Fax number (if available).
    • Date of Listing: The date when the company was listed on the PSX.
    • Email: The contact email address of the representative or company.
    • URL: The company's website.
    • Registrar Details: Information about the company's registrar, which handles the company's share-related matters.

    Sectors Represented

    The dataset includes companies from a wide variety of sectors, reflecting the diversity of industries on the PSX. Some key sectors include: - Automobile Assembler - Cement - Commercial Banks - Fertilizer - Food & Personal Care Products - Pharmaceuticals - Technology & Communication - Textile Composite

    And many more, totaling 37 different sectors.

    Usage

    This dataset can be used for multiple purposes: 1. Financial Analysis: Explore the performance of different sectors and companies listed on the PSX. 2. Investment Research: Identify key players in different industries for investment opportunities. 3. Business Development: Build contact lists for companies within a specific sector. 4. Data Science & Machine Learning Projects: Use this dataset for clustering, classification, or sentiment analysis in financial markets.

    Data Format

    The dataset is available in CSV format, making it easy to load into data analysis tools like Pandas, Excel, or Power BI. It's structured for easy exploration and can be integrated into financial models or research projects.

    Acknowledgements

    The data was scraped from the official PSX website using a custom Python script. Special thanks to the open-source community for tools like Selenium, BeautifulSoup, and Pandas, which made this project possible.

    License

    This dataset is provided for educational and research purposes. Please give proper attribution when using this dataset in your work.

    Feel free to explore, analyze, and share your insights!

  12. Riyadh Mall

    • kaggle.com
    zip
    Updated Mar 20, 2025
    Cite
    Meshal Alsanari (2025). Riyadh Mall [Dataset]. https://www.kaggle.com/datasets/meshalalsanari/riyadh-mall
    Explore at:
zip (61304 bytes)
    Dataset updated
    Mar 20, 2025
    Authors
    Meshal Alsanari
    License

MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Area covered
    Riyadh
    Description

    Riyadh Malls Dataset

    Dataset Overview

    This dataset contains information about malls and retail stores in Riyadh, Saudi Arabia. It includes key details such as names, categories, number of ratings, average ratings, and geographical coordinates. The dataset is useful for businesses, researchers, and developers working on market analysis, geospatial applications, and retail business strategies.

    Dataset Contents

    The dataset consists of the following columns:

Column Name | Data Type | Description
Name | string | Name of the mall or retail store
Type_of_Utility | string | Category of the place (e.g., shopping mall, clothing store)
Number_of_Ratings | integer | Total number of reviews received
Rating | float | Average rating score (scale: 0-5)
Longitude | float | Geographical longitude coordinate
Latitude | float | Geographical latitude coordinate

    Potential Use Cases

    • Retail Market Analysis: Identify popular shopping malls and stores based on ratings and reviews.
    • Business Intelligence: Analyze retail performance and customer preferences.
    • Geospatial Applications: Develop shopping assistant and navigation services.
    • Investment Opportunities: Find strategic locations for new retail businesses.

    How to Use

1. Load the dataset into a data analysis tool like Python (pandas), R, or Excel (see the sketch after this list).
    2. Filter or group data based on ratings, store categories, or locations.
    3. Use visualization tools like matplotlib, seaborn, or Power BI for insights.
    4. Integrate with GIS software for geospatial mapping.
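A minimal pandas sketch of steps 1-3, using the documented columns; the file name is a placeholder for the actual Kaggle path:

import pandas as pd

df = pd.read_csv("riyadh_malls.csv")  # replace with the actual file path

# Utility types ranked by review volume, with average rating
by_type = df.groupby("Type_of_Utility").agg(
    places=("Name", "count"),
    avg_rating=("Rating", "mean"),
    total_reviews=("Number_of_Ratings", "sum"),
).sort_values("total_reviews", ascending=False)
print(by_type.head(10))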

    License & Acknowledgments

    • Data sourced from publicly available platforms.
    • This dataset is open for non-commercial research and analysis purposes.
    • Proper attribution is required when using this dataset in research or publications.

    Contact Information

    For questions or collaboration, reach out via Kaggle comments or email.

  13. Transactions Exercise

    • kaggle.com
    zip
    Updated Apr 1, 2023
    Cite
    Pedro Israel (2023). Transactions Exercise [Dataset]. https://www.kaggle.com/datasets/pedroisrael/transactions-exercise
    Explore at:
zip (3290813 bytes)
    Dataset updated
    Apr 1, 2023
    Authors
    Pedro Israel
    License

CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Info

This is a dataset I was given to solve for an interview with a transactions company. It is perfect for practicing DAX measures. Dataset: anonymized sample of credit card deposit attempts over a 12-month period. Main problem: it shows a longitudinally decreasing approval rate from 10/1/2020 to 9/26/2021; that is, the approval rate for credit card deposit attempts declined over this period.

TOOL: You can do this with any tool you like. I used Power BI and consider it one of the best tools for this exercise.

    PARAMETER DESCRIPTIONS

- Appr? = deposit attempt outcome: '1' (approved) or '0' (declined).
- CustomerID
- Co Website = online division to which the deposit attempt is directed.
- Processing Co = credit card processing company that is processing the transaction (besides processing companies, a few fraud risk filters are also included here).
- Issuing Bank = bank that has issued the customer's credit card.
- Amount
- Attempt Timestamp

    QUESTIONS (Qs 1-5 & 8 worth 10 points. Qs 6-7 worth 20 points. Total = 100 points)

1) What is the dataset's approval rate by quarter?

2) How many customers attempted a deposit of $50 in Sept 2021?

3) How much did the group identified in QUESTION 2 successfully deposit during the month?

4) What is the highest approval rate among the top 10 banks with the most deposit attempts between $150.00 and $999.99 in 2021?

5) Without performing any analysis, which two parameters would you suspect of causing the successive quarterly decrease in approval rate? Why?

6) Identify and describe 2 main causal factors of the decline in approval rates seen in Q3 2021 vs Q4 2020.

7) Choose one of the main factors identified in QUESTION 6. How much of the approval rate decline seen in Q3 2021 vs Q4 2020 is explained by this factor?

8) If you had more time, which other analyses would you like to perform on this dataset to identify additional causal factors to those identified in QUESTION 6?

    POWERBI TIPS:

• Try to add as few columns as possible. There is no problem with this data, but with big datasets more columns mean slower performance; create DAX measures instead.
• Redefine each question: picture how to display it and build it in Power BI, and write down what you'll do. Example for question 1 (approval rate by quarter): a line graph titled "Approval rate by quarter", x-axis = quarters, y-axis = approval rate.
• Define each column's data type in Power BI, not in the query. This issue persists year after year: you may define the type in the query, but once you load the data it changes back to the default.
• In most datasets, add a calendar table. Very useful.
• GREAT TIP: apply as few filters as possible to the visual and use calculated measures instead. You will need them later as the questions become more complex.
• A rule I use in all my reports: measures starting with "Total" are unfiltered, meaning they should return the same value no matter what filters are applied. You will use them a lot.

(A pandas sketch of question 1 follows.)
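A minimal pandas sketch of question 1 (approval rate by quarter), assuming the columns are named 'Appr?' and 'Attempt Timestamp' as in the parameter list above; in Power BI the same result would be a single DAX measure sliced by a calendar table:

import pandas as pd

df = pd.read_csv("transactions.csv", parse_dates=["Attempt Timestamp"])  # hypothetical file name

# Approval rate per calendar quarter ('Appr?' is 1 for approved, 0 for declined)
df["Quarter"] = df["Attempt Timestamp"].dt.to_period("Q")
print(df.groupby("Quarter")["Appr?"].mean())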

  14. Nvidia Database

    • kaggle.com
    zip
    Updated Jan 30, 2025
    Cite
    Ajay Tom (2025). Nvidia Database [Dataset]. https://www.kaggle.com/datasets/ajayt0m/nvidia-database
    Explore at:
zip (8712 bytes)
    Dataset updated
    Jan 30, 2025
    Authors
    Ajay Tom
    License

CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    This is a beginner-friendly SQLite database designed to help users practice SQL and relational database concepts. The dataset represents a basic business model inspired by NVIDIA and includes interconnected tables covering essential aspects like products, customers, sales, suppliers, employees, and projects. It's perfect for anyone new to SQL or data analytics who wants to learn and experiment with structured data.

    Tables and Their Contents:

    Products:

    Includes details of 15 products (e.g., GPUs, AI accelerators). Attributes: product_id, product_name, category, release_date, price.

    Customers:

    Lists 20 fictional customers with their industry and contact information. Attributes: customer_id, customer_name, industry, contact_email, contact_phone.

    Sales:

    Contains 100 sales records tied to products and customers. Attributes: sale_id, product_id, customer_id, sale_date, region, quantity_sold, revenue.

    Suppliers:

    Features 50 suppliers and the materials they provide. Attributes: supplier_id, supplier_name, material_supplied, contact_email.

    Supply Chain:

    Tracks materials supplied to produce products, proportional to sales. Attributes: supply_chain_id, supplier_id, product_id, supply_date, quantity_supplied.

    Departments:

    Lists 5 departments within the business. Attributes: department_id, department_name, location.

    Employees:

    Contains data on 30 employees and their roles in different departments. Attributes: employee_id, first_name, last_name, department_id, hire_date, salary.

    Projects:

    Describes 10 projects handled by different departments. Attributes: project_id, project_name, department_id, start_date, end_date, budget.

    Why Use This Dataset?

    • Perfect for Beginners: The dataset is simple and easy to understand.
    • Interconnected Tables: Provides a basic introduction to relational database concepts like joins and foreign keys.
    • SQL Practice: Run basic queries, filter data, and perform simple aggregations or calculations.
    • Learning Tool: Great for small projects and understanding business datasets.

    Potential Use Cases:

    • Practice SQL queries (SELECT, INSERT, UPDATE, DELETE, JOIN).
    • Understand how to design and query relational databases.
    • Analyze basic sales and supply chain data for patterns and trends.
    • Learn how to use databases in analytics tools like Excel, Power BI, or Tableau.

    Data Size:

Number of Tables: 8. Total Rows: around 230 across all tables, ensuring quick queries and easy exploration. (A sqlite3 query sketch follows.)
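A minimal sketch of querying the database from Python with sqlite3, using the table and column names listed above; the database file name is an assumption:

import sqlite3
import pandas as pd

conn = sqlite3.connect("nvidia.db")  # hypothetical file name

# Revenue per product category: a simple JOIN between sales and products
query = """
SELECT p.category, SUM(s.revenue) AS total_revenue, SUM(s.quantity_sold) AS units_sold
FROM sales s
JOIN products p ON p.product_id = s.product_id
GROUP BY p.category
ORDER BY total_revenue DESC;
"""
print(pd.read_sql_query(query, conn))
conn.close()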

  15. Hospital Patient Treatment Dataset

    • kaggle.com
    zip
    Updated Jun 26, 2025
    Cite
    Slidescope (2025). Hospital Patient Treatment Dataset [Dataset]. https://www.kaggle.com/datasets/slidescope/hospital-patient-treatment-dataset/data
    Explore at:
zip (3675 bytes)
    Dataset updated
    Jun 26, 2025
    Authors
    Slidescope
    License

CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    This Hospital Patient Treatment Dataset contains simulated data representing patient treatments in a multi-specialty hospital. It includes 200 records with both categorical and numerical fields, designed for data visualization and analysis using tools like Power BI, Tableau, or Python.

    The dataset features 3–4 categorical fields: Department (e.g., Cardiology, Neurology, Orthopedics), Treatment Type (Surgery, Medication, Therapy, Observation), Doctor Name, and Gender. It also includes numerical fields such as Age, Treatment Cost, Hospital Stay (Days), and Recovery Score (ranging from 0 to 100).

    To get more information about Columns visit: https://colorstech.net/practice-datasets/hospital-patient-treatment-dataset-for-analysis/

    This dataset is ideal for healthcare analysts and data enthusiasts who want to practice analyzing treatment efficiency, patient demographics, cost effectiveness, and healthcare outcomes. Potential analyses include cost comparisons by department, gender-based treatment patterns, doctor performance based on recovery scores, and identifying which treatments lead to faster recovery.

    It can help simulate real-world healthcare reporting scenarios, such as understanding hospital load, cost optimization, or patient well-being tracking. The inclusion of unique Patient ID allows for easy referencing and segmentation.

    This dataset is well-suited for creating KPIs, dashboards, and advanced visualizations to gain insights into hospital operations and patient care outcomes. No real patient data is used—this is a synthetic dataset for educational use only.

  16. Riyadh SuperMarket and Groceries

    • kaggle.com
    zip
    Updated Mar 20, 2025
    Cite
    Meshal Alsanari (2025). Riyadh SuperMarket and Groceries [Dataset]. https://www.kaggle.com/datasets/meshalalsanari/riyadh-supermarket-and-groceries
    Explore at:
zip (118229 bytes)
    Dataset updated
    Mar 20, 2025
    Authors
    Meshal Alsanari
    License

MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Area covered
    Riyadh
    Description

    Riyadh Supermarkets & Grocery Stores Dataset

    Dataset Overview

    This dataset contains information about supermarkets, grocery stores, and convenience stores in Riyadh, Saudi Arabia. It includes key details such as store names, types, ratings, and geographic coordinates. The dataset is useful for market analysis, business intelligence, and location-based services.

    Dataset Contents

    The dataset consists of the following columns:

Column Name | Data Type | Description
Name | string | Name of the supermarket or grocery store
Type_of_Utility | string | Category of the store (e.g., supermarket, grocery store, convenience store)
Number_of_Ratings | integer | Total number of reviews received
Rating | float | Average rating score (scale: 0-5)
Longitude | float | Geographical longitude coordinate
Latitude | float | Geographical latitude coordinate

    Potential Use Cases

    • Retail Market Analysis: Identify high-rated supermarkets and grocery stores.
    • Consumer Preference Insights: Study customer behavior based on store ratings.
    • Business Expansion: Find potential areas for new supermarkets or convenience stores.
    • Geospatial Applications: Develop store locator and delivery services.

    How to Use

    1. Load the dataset into a data analysis tool like Python (pandas), R, or Excel.
    2. Filter or group data based on rating scores, store types, or locations.
    3. Use visualization tools like matplotlib, seaborn, or Power BI for insights.
    4. Apply machine learning for demand forecasting and recommendation systems.

    License & Acknowledgments

    • Data sourced from publicly available platforms.
    • This dataset is open for non-commercial research and analysis purposes.
    • Proper attribution is required when using this dataset in research or publications.

    Contact Information

    For questions or collaboration, reach out via Kaggle comments or email.

  17. Riyadh Metro Stations

    • kaggle.com
    zip
    Updated Mar 20, 2025
    Cite
    Meshal Alsanari (2025). Riyadh Metro Stations [Dataset]. https://www.kaggle.com/meshalalsanari/riyadh-metro-stations
    Explore at:
zip (15843 bytes)
    Dataset updated
    Mar 20, 2025
    Authors
    Meshal Alsanari
    License

MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Area covered
    Riyadh
    Description

    Riyadh Metro Stations Dataset

    Dataset Overview

    This dataset contains information about metro stations in Riyadh, Saudi Arabia. It includes details such as station names, types, ratings, and geographic coordinates. The dataset is valuable for transportation analysis, urban planning, and navigation applications.

    Dataset Contents

    The dataset consists of the following columns:

Column Name | Data Type | Description
Name | string | Name of the metro station
Type_of_Utility | string | Type of station (Metro Station)
Number_of_Ratings | float | Total number of reviews received (some values may be missing)
Rating | float | Average rating score (scale: 0-5, some values may be missing)
Longitude | float | Geographical longitude coordinate
Latitude | float | Geographical latitude coordinate

    Potential Use Cases

    • Urban Mobility Analysis: Study metro station distribution and accessibility.
    • Transportation Planning: Analyze station usage based on ratings and reviews.
    • Navigation & Mapping: Enhance public transit applications with station locations.
    • Service Optimization: Identify areas needing better metro services.

    How to Use

    1. Load the dataset into a data analysis tool like Python (pandas), R, or Excel.
    2. Filter or group data based on ratings, locations, or number of reviews.
    3. Use visualization tools like matplotlib, seaborn, or Power BI for insights.
    4. Integrate with GIS software for geospatial mapping.

    License & Acknowledgments

    • Data sourced from publicly available platforms.
    • This dataset is open for non-commercial research and analysis purposes.
    • Proper attribution is required when using this dataset in research or publications.

    Contact Information

    For questions or collaboration, reach out via Kaggle comments or email.

  18. Football Manager 2023: 90k+ Player Stats

    • kaggle.com
    zip
    Updated Oct 1, 2025
    Cite
    Siddhraj Thakor (2025). Football Manager 2023: 90k+ Player Stats [Dataset]. https://www.kaggle.com/datasets/siddhrajthakor/football-manager-2023-dataset
    Explore at:
zip (9373378 bytes)
    Dataset updated
    Oct 1, 2025
    Authors
    Siddhraj Thakor
    License

CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Football Manager Players Dataset

    Overview

    Dive into the ultimate treasure trove for football enthusiasts, data analysts, and gaming aficionados! The Football Manager Players Dataset is a comprehensive collection of player data extracted from a popular football management simulation game, offering an unparalleled look into the virtual world of football talent. This dataset includes detailed attributes for thousands of players across multiple leagues worldwide, making it a goldmine for analyzing player profiles, scouting virtual stars, and building predictive models for football strategies.

    Whether you're a data scientist exploring sports analytics, a football fan curious about your favorite virtual players, or a game developer seeking inspiration, this dataset is your ticket to unlocking endless possibilities!

    Dataset Description

    This dataset is a meticulously curated compilation of player statistics from five CSV files, merged into a single, unified dataset (merged_players.csv). It captures a diverse range of attributes for players from various clubs, nations, and leagues, including top-tier competitions like the English Premier Division, Argentina's Premier Division, and lower divisions across the globe.

    Key Features

    • Rich Player Attributes: Over 70 columns covering essential metrics such as:
      • Basic Info: UID, Name, Date of Birth (DOB), Nationality, Height, Weight, Age
      • Club & Position: Club, Position (e.g., AM, DM, GK), Based (league/division)
      • Performance Stats: Caps, Appearances (AT Apps), Goals (AT Gls), League Appearances, League Goals
      • Technical Skills: Acceleration, Passing, Dribbling, Finishing, Tackling, and more
      • Mental Attributes: Work Rate, Vision, Leadership, Determination
      • Physical Attributes: Pace, Strength, Stamina, Agility
      • Market Value: Transfer Value (e.g., $0 to millions)
      • Miscellaneous: Preferred Foot, Media Handling, Injury Proneness
    • Global Coverage: Players from diverse regions, including Europe (England, Spain, Italy), South America (Argentina, Brazil), Asia (South Korea, China), Africa (Ivory Coast, Burkina Faso), and North America (USA, Mexico).
    • Varied Player Types: From young prospects (15–18 years old) to veteran stars (up to 45 years old), including amateurs, youth players, and professionals.
    • Realistic Insights: Includes attributes like Media Description (e.g., "Young winger," "Veteran striker") and injury status, mirroring real-world football dynamics.

    Dataset Size

    • Rows: Thousands of player records (exact count depends on deduplication).
    • Columns: 70+ attributes per player.
    • File: merged_players.csv (UTF-8 encoded for compatibility with special characters).

    Potential Use Cases

    • Sports Analytics:
      • Analyze player attributes to identify key traits for success by position (e.g., what makes a top goalkeeper?).
      • Predict transfer values based on skills, age, and performance stats.
      • Cluster players by playing style or potential using machine learning.
    • Scouting & Strategy:
      • Build a dream team by filtering players based on specific attributes (e.g., high Pace and Dribbling for wingers).
      • Compare young talents vs. experienced veterans for team-building strategies.
    • Gaming & Modding:
      • Create custom Football Manager databases or mods.
      • Analyze game balance by studying attribute distributions.
    • Visualization:
      • Develop interactive dashboards to explore player stats by league, nationality, or position.
      • Map player origins to visualize global football talent distribution.
    • Education & Research:
      • Use as a teaching tool for data science, exploring data cleaning, merging, and analysis.
      • Study correlations between mental/physical attributes and in-game performance.

    Why This Dataset Stands Out

    • Comprehensive: Covers every aspect of a player's profile, from technical skills to personality traits.
    • Diverse: Includes players from top-tier to lower divisions, offering a broad spectrum of talent.
    • Engaging: Perfect for football fans and data enthusiasts alike, blending gaming with real-world analytics.
    • Ready-to-Use: Merged and cleaned for immediate analysis, with consistent column structure across all records.

    Getting Started

    1. Download: Grab merged_players.csv and load it into your favorite tool (Python/pandas, R, Excel, etc.).
    2. Explore: Check out columns like Transfer Value, Position, and Media Description to start your analysis.
3. Analyze: Use Python (e.g., pandas, scikit-learn) or visualization tools (e.g., Tableau, Power BI) to uncover insights (see the sketch after this list).
    4. Share: Build models, visualizations, or scouting reports and share your findings with the Kaggle community!
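A minimal pandas sketch of steps 2-3, assuming the attribute columns are named as in the Key Features list (e.g., 'Name', 'Club', 'Position', 'Age', 'Pace', 'Dribbling'); verify the exact headers in merged_players.csv before running:

import pandas as pd

df = pd.read_csv("merged_players.csv", encoding="utf-8")

# Quick look at the shape and a few key columns
print(df.shape)
print(df[["Name", "Club", "Position", "Age"]].head())

# Example filter: fast, skilful young players (attribute thresholds are illustrative)
young_talents = df[(df["Age"] < 21) & (df["Pace"] >= 15) & (df["Dribbling"] >= 15)]
print(young_talents.sort_values("Pace", ascending=False).head(10))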

    Example Questions to Explore

    • Which young players (<18 years) have the highest poten...
  19. Netflix Movies & TV Shows dataset

    • kaggle.com
    Updated Oct 3, 2025
    Cite
    Zubaira Maimona (2025). Netflix Movies & TV Shows dataset [Dataset]. https://www.kaggle.com/datasets/zubairamuti/netflix-movies-and-tv-shows-dataset
    Explore at:
Croissant. Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant.
    Dataset updated
    Oct 3, 2025
    Dataset provided by
    Kaggle
    Authors
    Zubaira Maimona
    License

CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Content

    Regarding this dataset, Netflix is among the most popular websites for streaming movies and videos. They have more than 200 million members globally as of the middle of 2021, and their platform offers over 8,000 movies and TV shows. This tabular dataset contains listings of all the movies and TV shows available on Netflix, together with details about the actors, directors, ratings, length, year of release, and other details.

Interesting task ideas for people from different backgrounds

For Data Analysts

1. Content Trends Over Time - Examine the annual changes in Netflix's movie and TV show counts.
2. Genre Popularity - Discover the most popular genres and how their popularity changes by location or year.
3. Country Insights - Find out which nations produce the most shows and what kinds of content they contribute.
4. Ratings Distribution - Show how maturity ratings (G, PG, R, TV-MA) are distributed throughout Netflix material.
5. Best Directors & Actors - Find the actors or directors who appear on Netflix the most.

    For Data Scientists

    1. Recommendation System Prototype - Build a content-based recommender from genres and title descriptions.
    2. Text Analysis on Descriptions - Apply natural language processing (NLP) to find patterns in how Netflix describes its content, with terms like "crime," "adventure," and "love."
    3. Classification Models - Use metadata to predict whether a title is a movie or a TV show.
    4. Clustering - Group films and TV series into clusters using genres, durations, and descriptions.
    5. Trend Forecasting - Use time-series analysis to forecast future growth of the Netflix library.
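
    As a sketch of the recommender prototype (idea 1), the snippet below builds TF-IDF vectors over the description column and returns the most similar titles. It assumes scikit-learn is available and that the file follows the common netflix_titles.csv layout:

      import pandas as pd
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.metrics.pairwise import cosine_similarity

      df = pd.read_csv("netflix_titles.csv")
      df["description"] = df["description"].fillna("")

      # Vectorize the plot descriptions; genres (listed_in) could be appended for a richer model.
      tfidf = TfidfVectorizer(stop_words="english")
      matrix = tfidf.fit_transform(df["description"])

      def similar_titles(title, n=5):
          # Return the n titles whose descriptions are closest to the given one.
          idx = df.index[df["title"] == title][0]
          scores = cosine_similarity(matrix[idx], matrix).ravel()
          best = scores.argsort()[::-1][1:n + 1]  # skip the title itself
          return df.iloc[best]["title"].tolist()

      print(similar_titles(df["title"].iloc[0]))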

    For Students (Study Assignments)

    1. Data Cleaning & Preprocessing - Handle missing values (such as directors or countries) and standardize formats.
    2. Exploratory Data Analysis (EDA) - Build notebooks or dashboards full of visuals that illustrate Netflix trends.
    3. Data Visualization Practice - Create imaginative graphics such as word clouds or heatmaps using Matplotlib, Seaborn, or Plotly.
    4. Storytelling with Data - Write a data story about how Netflix evolved from a DVD rental service into a global streaming giant.
    5. Beginner Machine Learning - Start small: predict the maturity rating from genre or description.
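
    A starter cell for assignments 1 and 2, again assuming the common netflix_titles.csv layout (date_added, country, director, type, rating):

      import pandas as pd

      df = pd.read_csv("netflix_titles.csv")

      # Assignment 1: handle missing values and standardize the date format
      df["country"] = df["country"].fillna("Unknown")
      df["director"] = df["director"].fillna("Unknown")
      df["date_added"] = pd.to_datetime(df["date_added"].str.strip(), errors="coerce")

      # Assignment 2: headline numbers to visualize in an EDA notebook
      print(df["type"].value_counts())
      print(df["rating"].value_counts().head())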

    Approach to the Netflix Dataset

    1. Understand the Data (Initial Exploration)

      • Load the dataset and check its size, columns, and data types.
      • Get a sense of the key fields: title, type, country, release_year, rating, etc.
      • Look for unique values (e.g., how many genres, countries, ratings).
    2. Data Cleaning & Preprocessing

      • Handle missing values (some entries don’t have directors or countries).
      • Standardize inconsistent formats (e.g., dates in date_added).
      • Split multi-valued columns (like genres or cast) if needed.
      • Convert durations into numeric values (minutes or seasons); a pandas sketch of steps 2 and 3 appears after this list.
    3. Exploratory Data Analysis (EDA)

      • Compare Movies vs. TV Shows count.
      • Analyze content growth trend by release year or date added.
      • Study genre popularity across different countries.
      • Explore rating distribution (family-friendly vs. mature content).
      • Identify most frequent directors, actors, and countries.
    4. Visualization & Storytelling

      • Create bar charts, pie charts, heatmaps, and timelines.
      • Use word clouds for descriptions and genres.
      • Highlight interesting trends (e.g., rise of international TV shows).
    5. Advanced Analysis / Data Science Tasks

      • Build a recommendation system (based on genres & descriptions).
      • Perform sentiment/keyword analysis on descriptions.
      • Apply clustering to group similar shows/movies.
      • Predict whether a title is a movie or TV show from metadata.
    6. Insights & Reporting

      • Summarize key findings (e.g., “TV shows are growing faster than movies,” “US and India dominate Netflix content”).
      • Create dashboards (Tableau, Power BI, or Python libraries like Plotly).
      • Share a story rather than just numbers—make it human and relatable.
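
    A minimal pandas sketch of steps 2 and 3 above (file and column names are assumptions based on the common Netflix titles layout):

      import pandas as pd

      df = pd.read_csv("netflix_titles.csv")

      # Step 2: cleaning - parse date_added and pull the numeric part out of duration
      df["date_added"] = pd.to_datetime(df["date_added"].str.strip(), errors="coerce")
      df["duration_num"] = df["duration"].str.extract(r"(\d+)", expand=False).astype(float)

      # Step 3: EDA - Movies vs. TV Shows and titles added per year
      print(df["type"].value_counts())
      print(df["date_added"].dt.year.value_counts().sort_index())
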
  20. Riyadh Restaurants

    • kaggle.com
    zip
    Updated Mar 20, 2025
    Cite
    Meshal Alsanari (2025). Riyadh Restaurants [Dataset]. https://www.kaggle.com/meshalalsanari/riyadh-restaurants
    Explore at:
    zip (1149091 bytes). Available download formats
    Dataset updated
    Mar 20, 2025
    Authors
    Meshal Alsanari
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically.

    Area covered
    Riyadh
    Description

    Riyadh Restaurants & Cafes Dataset

    Dataset Overview

    This dataset contains information about restaurants and cafes in Riyadh, Saudi Arabia. It includes details such as business names, types, ratings, and geographic coordinates. The dataset is useful for food industry analysis, customer preference insights, and location-based recommendations.

    Dataset Contents

    The dataset consists of the following columns:

    Column Name        | Data Type | Description
    Name               | string    | Name of the restaurant or cafe
    Type_of_Utility    | string    | Category of the place (e.g., restaurant, cafe)
    Number_of_Ratings  | float     | Total number of reviews received
    Rating             | float     | Average rating score (scale: 0-5)
    Longitude          | float     | Geographical longitude coordinate
    Latitude           | float     | Geographical latitude coordinate

    Potential Use Cases

    • Food Industry Analysis: Identify top-rated restaurants and cafes.
    • Customer Sentiment Analysis: Study customer preferences based on ratings.
    • Business Expansion: Find potential areas for new restaurants or cafes.
    • Geospatial Applications: Develop food recommendation and delivery services.
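
    The geospatial idea can be sketched with pandas and matplotlib; the column names come from the table above, while the file name riyadh_restaurants.csv is an assumption:

      import pandas as pd
      import matplotlib.pyplot as plt

      df = pd.read_csv("riyadh_restaurants.csv")

      # Plot every venue by location, coloured by its average rating.
      plt.scatter(df["Longitude"], df["Latitude"], c=df["Rating"], cmap="viridis", s=10)
      plt.colorbar(label="Rating (0-5)")
      plt.xlabel("Longitude")
      plt.ylabel("Latitude")
      plt.title("Riyadh restaurants and cafes by rating")
      plt.show()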

    How to Use

    1. Load the dataset into a data analysis tool like Python (pandas), R, or Excel.
    2. Filter or group data based on rating scores, restaurant types, or locations.
    3. Use visualization tools like matplotlib, seaborn, or Power BI for insights.
    4. Apply machine learning for trend analysis and recommendation systems.
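
    A short pandas sketch of steps 1 and 2 (the column names come from the table above; the file name is an assumption):

      import pandas as pd

      df = pd.read_csv("riyadh_restaurants.csv")

      # Group by venue type and compare average rating and review volume.
      summary = (
          df.groupby("Type_of_Utility")[["Rating", "Number_of_Ratings"]]
            .mean()
            .sort_values("Rating", ascending=False)
      )
      print(summary)

      # Top-rated venues that also have a meaningful number of reviews.
      top = df[df["Number_of_Ratings"] >= 100].nlargest(10, "Rating")
      print(top[["Name", "Type_of_Utility", "Rating", "Number_of_Ratings"]])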

    License & Acknowledgments

    • Data sourced from publicly available platforms.
    • This dataset is open for non-commercial research and analysis purposes.
    • Proper attribution is required when using this dataset in research or publications.

    Contact Information

    For questions or collaboration, reach out via Kaggle comments or email.

  21. Not seeing a result you expected?
    Learn how you can add new datasets to our index.
