The HR dataset contains employee-related information, such as personal details, job roles, salaries, and performance metrics. It's used by organizations to manage human resources, make informed staffing decisions, and analyze workforce trends. The dataset aids in optimizing employee satisfaction, productivity, and organizational growth.
Dashboard preview: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F15826402%2F6f621dd7a72a2d8c6d0df659c6604189%2FHR%20Dashboard.jpg?generation=1692882310646646&alt=media
License: CC0 1.0 Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)
Netflix is a popular streaming service that offers a vast catalog of movies, TV shows, and original content. This dataset is a cleaned version of the original, which can be found here. The data consists of content added to Netflix from 2008 to 2021; the oldest title dates from 1925 and the newest from 2021. The dataset will be cleaned with PostgreSQL and visualized with Tableau. The purpose of this dataset is to test my data cleaning and visualization skills. The cleaned data can be found below, and the Tableau dashboard can be found here.
We are going to:
1. Treat the nulls
2. Treat the duplicates
3. Populate missing rows
4. Drop unneeded columns
5. Split columns
Extra steps and more explanation of the process are given in the code comments.
--View dataset
SELECT *
FROM netflix;
--The show_id column is the unique id for the dataset, therefore we are going to check for duplicates
SELECT show_id, COUNT(*)
FROM netflix
GROUP BY show_id
ORDER BY show_id DESC;
--No duplicates
--Check null values across columns
SELECT COUNT(*) FILTER (WHERE show_id IS NULL) AS showid_nulls,
COUNT(*) FILTER (WHERE type IS NULL) AS type_nulls,
COUNT(*) FILTER (WHERE title IS NULL) AS title_nulls,
COUNT(*) FILTER (WHERE director IS NULL) AS director_nulls,
COUNT(*) FILTER (WHERE movie_cast IS NULL) AS movie_cast_nulls,
COUNT(*) FILTER (WHERE country IS NULL) AS country_nulls,
COUNT(*) FILTER (WHERE date_added IS NULL) AS date_added_nulls,
COUNT(*) FILTER (WHERE release_year IS NULL) AS release_year_nulls,
COUNT(*) FILTER (WHERE rating IS NULL) AS rating_nulls,
COUNT(*) FILTER (WHERE duration IS NULL) AS duration_nulls,
COUNT(*) FILTER (WHERE listed_in IS NULL) AS listed_in_nulls,
COUNT(*) FILTER (WHERE description IS NULL) AS description_nulls
FROM netflix;
We can see that there are NULLS.
director_nulls = 2634
movie_cast_nulls = 825
country_nulls = 831
date_added_nulls = 10
rating_nulls = 4
duration_nulls = 3
Nulls make up about 30% of the director column, so I will not delete them. Instead, I will find another column to populate it from. To populate the director column, we want to find out whether there is a relationship between the movie_cast column and the director column.
-- Below, we find out if some directors are likely to work with particular cast
WITH cte AS
(
SELECT title, CONCAT(director, '---', movie_cast) AS director_cast
FROM netflix
)
SELECT director_cast, COUNT(*) AS count
FROM cte
GROUP BY director_cast
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC;
With this, we can now populate the NULL director rows using their associated movie_cast records.
UPDATE netflix
SET director = 'Alastair Fothergill'
WHERE movie_cast = 'David Attenborough'
AND director IS NULL ;
--Repeat this step to populate the rest of the director nulls
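Rather than repeating that UPDATE for every director/cast pair, the same result can be sketched in a single pass with a self-join. This is a sketch, not part of the original walkthrough, and it assumes each movie_cast value maps to a single director, which is what the relationship query above suggests:

```sql
--Sketch: fill every NULL director from another row sharing the same movie_cast
UPDATE netflix AS nt
SET director = nt2.director
FROM netflix AS nt2
WHERE nt.movie_cast = nt2.movie_cast
  AND nt.show_id <> nt2.show_id
  AND nt2.director IS NOT NULL
  AND nt.director IS NULL;
```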
--Populate the remaining NULL director values as "Not Given"
UPDATE netflix
SET director = 'Not Given'
WHERE director IS NULL;
--When I was doing this, I found a less complex and faster way to populate a column which I will use next
Just like the director column, I will not delete the nulls in country. Since the country column is related to director and movie, we are going to populate it using the director column.
--Populate the country using the director column
SELECT COALESCE(nt.country,nt2.country)
FROM netflix AS nt
JOIN netflix AS nt2
ON nt.director = nt2.director
AND nt.show_id <> nt2.show_id
WHERE nt.country IS NULL;
UPDATE netflix
SET country = nt2.country
FROM netflix AS nt2
WHERE netflix.director = nt2.director and netflix.show_id <> nt2.show_id
AND netflix.country IS NULL;
--Confirm whether any rows still have a NULL country after the update
SELECT director, country, date_added
FROM netflix
WHERE country IS NULL;
--Populate the remaining NULL country values as "Not Given"
UPDATE netflix
SET country = 'Not Given'
WHERE country IS NULL;
The date_added column has only 10 nulls out of over 8,000 rows, so deleting them will not affect our analysis or visualization.
--Show date_added nulls
SELECT show_id, date_added
FROM netflix
WHERE date_added IS NULL;
--Delete the date_added nulls
DELETE FROM netflix
WHERE date_added IS NULL;
License: MIT (https://opensource.org/licenses/MIT)
📊 Sales & Customer Analytics – Tableau Dashboard (PDF & Interactive) 🔍 Overview This dataset includes a Tableau project analysing sales trends & customer insights with an interactive dashboard switch.
The dashboards provide actionable insights into:
✅ Sales performance & revenue trends 📈
✅ Top-performing products & regions 🌍
✅ Customer segmentation & behavior analysis 🛍️
✅ Retention strategies & marketing impact 🎯

📂 Files Included:
📄 Sales & Customer Analytics Dashboard (PDF Report) – A full summary of insights.
🎨 Tableau Workbook (.twbx) – The interactive dashboards (requires Tableau).
🖼️ Screenshots – Previews of the dashboards.
🔗 Explore the Interactive Dashboards on Tableau Public:
Sales Dashboard: https://public.tableau.com/app/profile/egbe.grace/viz/SalesCustomerDashboardsDynamic_17385906491570/CustomerDashboard
Customer Dashboard: https://public.tableau.com/app/profile/egbe.grace/viz/SalesCustomerDashboardsDynamic_17385906491570/CustomerDashboard
📌 Key Insights from the Dashboards ✅ Revenue trends show peak sales periods & seasonal demand shifts. ✅ Top-selling products & regions help businesses optimize their strategies. ✅ Customer segmentation identifies high-value buyers for targeted marketing. ✅ Retention analysis provides insights into repeat customer behaviour.
💡 How This Can Help: This dataset and Tableau project can help businesses & analysts uncover key patterns in sales and customer behavior, allowing them to make data-driven decisions to improve growth and customer retention.
💬 Would love to hear your feedback! Let’s discuss the impact of sales analytics in business strategy.
📢 #DataAnalytics #Tableau #SalesAnalysis #CustomerInsights #BusinessIntelligence #DataVisualization
License: CC BY-NC 4.0 (https://creativecommons.org/licenses/by-nc/4.0/)
About the Datasets:
Domain: Finance
Project: Bank loan of customers
Datasets: Finance_1.xlsx & Finance_2.xlsx
Dataset Type: Excel Data
Dataset Size: Each Excel file has 39k+ records

KPIs:
1. Year-wise loan amount stats
2. Grade- and sub-grade-wise revol_bal
3. Total payment for Verified status vs. total payment for Non-Verified status
4. State-wise loan status
5. Month-wise loan status
6. Get more insights based on your understanding of the data

Process:
1. Understanding the problem
2. Data collection
3. Data cleaning
4. Exploring and analyzing the data
5. Interpreting the results

The dashboard uses bar charts, text, stacked bar charts, horizontal bars, donut charts, area charts, treemaps, slicers, tables, and images.
License: MIT (https://opensource.org/licenses/MIT)
Dataset Description:
The myusabank.csv dataset contains daily financial data for a fictional bank (MyUSA Bank) over a two-year period. It includes various key financial metrics such as interest income, interest expense, average earning assets, net income, total assets, shareholder equity, operating expenses, operating income, market share, and stock price. The data is structured to simulate realistic scenarios in the banking sector, including outliers, duplicates, and missing values for educational purposes.
Potential Student Tasks:
- Data Cleaning and Preprocessing
- Exploratory Data Analysis (EDA)
- Calculating Key Performance Indicators (KPIs)
- Building Tableau Dashboards
- Forecasting and Predictive Modeling
- Business Insights and Reporting
Educational Goals:
The dataset aims to provide hands-on experience in data preprocessing, analysis, and visualization within the context of banking and finance. It encourages students to apply data science techniques to real-world financial data, enhancing their skills in data-driven decision-making and strategic analysis.
In this project, I have made a dashboard about the world's libraries and their expenses.
License: CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0/)
Dashboard preview: https://github.com/ssrAiLab/IMDB-2020-Tableau-Dashboard/blob/main/Dashboard%20Screenshot.png?raw=true
The IMDB Top 1000 Movies of 2020 dataset provides a rich canvas for exploring the world of cinema — and this Tableau project transforms that data into stunning visuals and insights.
I’ve designed a dynamic and visually appealing dashboard using Tableau that highlights movie trends, ratings, genres, and key metrics from 2020’s cinematic landscape.
✅ Top 20 Movies by IMDB Rating
✅ Distribution of Movies by Genre
✅ Top Directors with Most Hits
✅ Language & Country-wise Movie Count
✅ Gross Earnings vs Ratings
✅ Runtime Distribution Analysis
✅ Certificate-wise Movie Breakdown
✅ Year-wise Trend in Popularity
| File | Description |
|---|---|
| IMDB_2020_Dashboard.twb | Tableau workbook file |
| imdb_top_1000.csv | Cleaned dataset used |
| Dashboard Screenshot.png | Snapshot of the final dashboard |
| archive.zip | Contains all the files in one place |
Sahil Raj
Data Analyst | Tableau Storyteller | Movie Enthusiast 🎥
🔗 LinkedIn | GitHub | Kaggle
“Cinema is more than entertainment — it’s culture, storytelling, and data waiting to be visualized.”
📌 This project is for educational and portfolio purposes only. IMDB data is publicly available and curated for non-commercial use.
License: CDLA Sharing 1.0 (https://cdla.io/sharing-1-0/)
The Superstore Sales Data dataset, available in Excel format as "Superstore.xlsx," is a comprehensive collection of sales and customer-related information from a retail superstore. This dataset comprises three distinct tables, each providing specific insights into the store's operations and customer interactions.
License: CC0 1.0 Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)
This interactive Tableau dashboard provides a detailed analysis of car sales trends from 2022 to 2023. It explores key metrics such as total sales, average car prices, and sales distribution by car type, color, and region.
Key Features:
📊 Sales Overview: Total sales, quantity, and price analysis.
📈 Monthly Trends: A time-series visualization of sales growth.
🎨 Car Color Preferences: Pie chart showing distribution by color.
🌍 Regional Sales Breakdown: Geospatial analysis of sales across the U.S.
🏆 Model-wise Performance: Sales comparison across different car brands.
⚙️ Engine & Transmission Impact: Filtering options to analyze impact by car type.
This dashboard is ideal for automotive industry analysts, data enthusiasts, and business decision-makers interested in sales performance insights.
📌 Tools Used: Tableau, Data Cleaning & Preparation.
License: CC0 1.0 Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)
The reference for the dataset and the dashboard was the YouTube channel codebasics. I have used a fictitious company called Atlix, whose Sales Director wants the sales data in a proper format that can help in decision making.
We have a total of 5 tables, namely customers, products, markets, date & transactions. The data is exported from MySQL to Tableau.
In Tableau, inner joins were used.
In the transactions table, we notice that some sales amount figures are either negative or zero while the sales qty is 1 or more. This cannot be right. Therefore, we filter the sales amount in Tableau so that the minimum sales amount is 1.
When the currency column from the transactions table was grouped in MySQL, we could see ‘USD’ and ‘INR’ showing up. We cannot have sales data showing two currencies. This was rectified by converting the USD sales amounts into INR, taking the latest exchange rate of Rs. 81.
We make the above change in Tableau by creating a new calculated field called ‘Normalised Sales Amount’: IF [Currency] = 'USD' THEN [Sales Amount] * 81 ELSE [Sales Amount] END.
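The same normalisation could alternatively be done upstream in MySQL before the export. A sketch only: the table and column names (transactions, currency, sales_amount) are assumptions, not confirmed by the source.

```sql
-- Sketch: normalise USD rows to INR before exporting to Tableau.
-- Table/column names are assumed, not taken from the source.
SELECT t.*,
       CASE WHEN t.currency = 'USD'
            THEN t.sales_amount * 81  -- exchange rate of Rs. 81, as above
            ELSE t.sales_amount
       END AS normalised_sales_amount
FROM transactions AS t
WHERE t.sales_amount >= 1;  -- drop the zero/negative amounts noted earlier
```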
Conclusion: The dashboard is interactive, with filters. For example, by clicking on Mumbai under “Sales by Markets”, the other charts change to show results pertaining only to Mumbai. The same can be done by year, month, customer, product, etc. A parameter with a filter has also been created for top customers and top products; it produces a slider that can be used to view, for instance, the top 10 customers and products.
Following information can be passed on to the sales team or director.
Total Sales: From Jun ’17 to Feb ’20, total sales were INR 12.83 million. There was a drop of 57% in sales revenue from 2018 to 2019. The year 2020 has not been considered, as it accounts for only 2 months of data.
Markets: Mumbai, the top-performing market, accounts for 51% of total sales but saw a drop of almost 64% from 2018 to 2019.
Top Customers: Path was in 2nd position by sales in 2018, accounting for 19% of total sales, after Electricalslytical, which accounted for 21%. But in 2019, Electricalslytical and Path were the 2nd and 4th highest customers by sales. By targeting specific markets and customers with new ideas such as promotions and discounts, we can look to reverse the trend of decreasing sales.
Various leaders at Airbnb want to understand some important insights, based on attributes in the dataset, that can help increase revenue, such as:
- Which types of hosts to acquire more of, and where?
- The categorization of customers based on their preferences.
- Which neighborhoods to target?
- Which pricing ranges are preferred by customers?
- The kinds of properties that exist, w.r.t. customer preferences.
- Adjustments to existing properties to make them more customer-oriented.
- The most popular localities and properties in New York currently.
- How to get unpopular properties more traction? And so on...
To prepare for the next best steps Airbnb needs to take as a business, you have been asked to analyze a dataset of various Airbnb listings in New York. Based on this analysis, two presentations need to be given to the following groups: 1. Data Analysis Managers and the Lead Data Analyst; 2. the Head of Acquisitions and Operations, NYC, and the Head of User Experience, NYC.
License: CC0 1.0 Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)
The Brand Affiliate Dataset is a comprehensive collection covering a three-month feedback analysis, from January to March 2024, for five brands: Nesine, Bilyoner, Idda, Betboo, and CommissionLounge.
These brands provided various insights into Affiliate Impressions, Clicks, Signups, and Earnings, offering valuable analysis of a paid advertising/marketing campaign.
Several measures were generated to ascertain the performance KPIs of each of the products:
- Return on Investment (ROI)
- New Customer Acquisition Rate
- Earnings per Click (EPC)
- Net Revenue per Click (RPC)
- Conversion Rate (CR)
- Click-Through Rate (CTR)
- Net Revenue
Metrics/Columns:
Brand
Brand ID
Month and Year
Affiliate
Impressions
Clicks
Signups NDC
fdt
Net Revenue
Earnings
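With those columns, several of the listed KPIs reduce to simple ratios. A sketch in SQL, assuming the data is loaded into a table named brand_affiliate with snake_case column names (the table and column names are assumptions, not part of the dataset description):

```sql
-- Sketch: CTR, conversion rate, EPC and RPC per brand.
-- Table and column names are assumed, not taken from the source.
SELECT brand,
       1.0 * SUM(clicks)      / NULLIF(SUM(impressions), 0) AS ctr,
       1.0 * SUM(signups_ndc) / NULLIF(SUM(clicks), 0)      AS conversion_rate,
       SUM(earnings)    / NULLIF(SUM(clicks), 0) AS earnings_per_click,
       SUM(net_revenue) / NULLIF(SUM(clicks), 0) AS net_revenue_per_click
FROM brand_affiliate
GROUP BY brand;
```

NULLIF guards against division by zero for brand-months with no impressions or clicks.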
Updated 30 January 2023
There has been some confusion around licensing for this data set. Dr. Carla Patalano and Dr. Rich Huebner are the original authors of this dataset.
We provide a license to anyone who wishes to use this dataset for learning or teaching. For the purposes of sharing, please follow this license:
CC-BY-NC-ND This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
https://rpubs.com/rhuebner/hrd_cb_v14
PLEASE NOTE -- I recently updated the codebook - please use the above link. A few minor discrepancies were identified between the codebook and the dataset. Please feel free to contact me through LinkedIn (www.linkedin.com/in/RichHuebner) to report discrepancies and make requests.
HR data can be hard to come by, and HR professionals generally lag behind with respect to analytics and data visualization competency. Thus, Dr. Carla Patalano and I set out to create our own HR-related dataset, which is used in one of our graduate MSHRM courses called HR Metrics and Analytics, at New England College of Business. We created this data set ourselves. We use the data set to teach HR students how to use and analyze the data in Tableau Desktop - a data visualization tool that's easy to learn.
This version provides a variety of features that are useful for both data visualization AND creating machine learning / predictive analytics models. We are working on expanding the data set even further by generating even more records and a few additional features. We will be keeping this as one file/one data set for now. There is a possibility of creating a second file perhaps down the road where you can join the files together to practice SQL/joins, etc.
Note that this dataset isn't perfect. By design, there are some issues that are present. It is primarily designed as a teaching data set - to teach human resources professionals how to work with data and analytics.
We have reduced the complexity of the dataset down to a single data file (v14). The CSV revolves around a fictitious company and the core data set contains names, DOBs, age, gender, marital status, date of hire, reasons for termination, department, whether they are active or terminated, position title, pay rate, manager name, and performance score.
Recent additions to the data include: - Absences - Most Recent Performance Review Date - Employee Engagement Score
Dr. Carla Patalano provided the baseline idea for creating this synthetic data set, which has been used now by over 200 Human Resource Management students at the college. Students in the course learn data visualization techniques with Tableau Desktop and use this data set to complete a series of assignments.
We've included some open-ended questions that you can explore and try to address through creating Tableau visualizations, or R or Python analyses. Good luck and enjoy the learning!
There are so many other interesting questions that could be addressed through this interesting data set. Dr. Patalano and I look forward to seeing what we can come up with.
If you have any questions or comments about the dataset, please do not hesitate to reach out to me on LinkedIn: http://www.linkedin.com/in/RichHuebner
You can also reach me via email at: Richard.Huebner@go.cambridgecollege.edu
License: ODC Database Contents License (DbCL) v1.0 (http://opendatacommons.org/licenses/dbcl/1.0/)
In the case study titled "Blinkit: Grocery Product Analysis," a dataset called 'Grocery Sales' contains 12 columns with information on sales of grocery items across different outlets. Using Tableau, you as a data analyst can uncover customer behavior insights, track sales trends, and gather feedback. These insights will drive operational improvements, enhance customer satisfaction, and optimize product offerings and store layout. Tableau enables data-driven decision-making for positive outcomes at Blinkit.
The Grocery Sales table is a .CSV file with the following columns:
• Item_Identifier: A unique ID for each product in the dataset.
• Item_Weight: The weight of the product.
• Item_Fat_Content: Indicates whether the product is low fat or not.
• Item_Visibility: The percentage of the total display area in the store that is allocated to the specific product.
• Item_Type: The category or type of product.
• Item_MRP: The maximum retail price (list price) of the product.
• Outlet_Identifier: A unique ID for each store in the dataset.
• Outlet_Establishment_Year: The year in which the store was established.
• Outlet_Size: The size of the store in terms of ground area covered.
• Outlet_Location_Type: The type of city or region in which the store is located.
• Outlet_Type: Indicates whether the store is a grocery store or a supermarket.
• Item_Outlet_Sales: The sales of the product in the particular store. This is the outcome variable that we want to predict.
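Although the case study is built in Tableau, the same aggregations could be sketched in SQL once the CSV is loaded into a table (the table name grocery_sales is an assumption made for illustration):

```sql
-- Sketch: total item sales and average list price by outlet type.
-- grocery_sales is an assumed table name for the loaded CSV.
SELECT Outlet_Type,
       SUM(Item_Outlet_Sales) AS total_sales,
       AVG(Item_MRP)          AS avg_mrp
FROM grocery_sales
GROUP BY Outlet_Type
ORDER BY total_sales DESC;
```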
License: ODC Public Domain Dedication and Licence (PDDL) v1.0 (http://www.opendatacommons.org/licenses/pddl/1.0/)
Context The World Happiness Report is a landmark survey of the state of global happiness. The first report was published in 2012, the second in 2013, the third in 2015, and the fourth in the 2016 Update. The World Happiness 2017, which ranks 155 countries by their happiness levels, was released at the United Nations at an event celebrating International Day of Happiness on March 20th. The report continues to gain global recognition as governments, organizations and civil society increasingly use happiness indicators to inform their policy-making decisions. Leading experts across fields – economics, psychology, survey analysis, national statistics, health, public policy and more – describe how measurements of well-being can be used effectively to assess the progress of nations. The reports review the state of happiness in the world today and show how the new science of happiness explains personal and national variations in happiness.
Content The happiness scores and rankings use data from the Gallup World Poll. The scores are based on answers to the main life evaluation question asked in the poll. This question, known as the Cantril ladder, asks respondents to think of a ladder with the best possible life for them being a 10 and the worst possible life being a 0 and to rate their own current lives on that scale. The scores are from nationally representative samples for the years 2013-2016 and use the Gallup weights to make the estimates representative. The columns following the happiness score estimate the extent to which each of six factors – economic production, social support, life expectancy, freedom, absence of corruption, and generosity – contribute to making life evaluations higher in each country than they are in Dystopia, a hypothetical country that has values equal to the world’s lowest national averages for each of the six factors. They have no impact on the total score reported for each country, but they do explain why some countries rank higher than others.
Indicators/Factors Explained:
1. Rank: the country's ranking
2. Score: the country's happiness score
3. GDP: the country's gross domestic product
4. Family: an indicator of the family support available to each citizen in the country
5. Life Expectancy: shows the healthiness level of the country
6. Freedom: an indicator of citizens' freedom to choose their life path, job, etc.
7. Trust: the level of citizens' trust in the government (influenced by the corruption level and performance of the government)
8. Generosity: an indicator of the generosity level of the country's citizens
Source: The World Happiness Report is a publication of the Sustainable Development Solutions Network, powered by the Gallup World Poll data.
License: CC0 1.0 Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)
Context
Tourism and travel account for more than 10% of GDP worldwide, and the industry is trending towards capturing a higher stake of the global pie. At the same time, it generates huge volumes of data, and taking advantage of them could help businesses stand out from the crowd.
Content
The dataset provides reservations data for two consecutive seasons (2021 - 2023) of a luxury hotel.
Source
ChatGPT 3.5 (OpenAI) is the main creator of the dataset. Minor adjustments were performed by myself to ensure that the dataset contains the desired fields and values.
Inspiration
• How effectively is the hotel performing across key metrics?
• How are bookings distributed across different channels (e.g., Booking Platform, Phone, Walk-in, and Website)?
• What is the current occupancy rate and how does it compare to the same period last year?
• What are the demographics of the current guests (e.g., nationality)?
• What is the average daily rate (ADR) per room?
These are examples of interesting questions that could be answered by analyzing this dataset.
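For instance, the ADR question could be sketched in SQL against a hypothetical reservations table. The table name and its columns (room_revenue, nights, channel) are assumptions for illustration, not the dataset's actual schema:

```sql
-- Sketch: average daily rate (ADR) per booking channel.
-- ADR = room revenue / room nights sold; schema is hypothetical.
SELECT channel,
       SUM(room_revenue) / NULLIF(SUM(nights), 0) AS avg_daily_rate
FROM reservations
GROUP BY channel;
```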
If you are interested, please have a look at the Tableau dashboard that I have created to help answer the above questions. Tableau dashboard: https://public.tableau.com/app/profile/dimitris.angelides/viz/HotelExecutiveDashboards/HotelExecutiveSummaryReport?publish=yes
License: CC0 1.0 Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset is a cleaned version of the Chicago Crime Dataset, which can be found here. All rights for the dataset go to the original owners. The purpose of this dataset is to display my skills in visualizations and creating dashboards. To be specific, I will attempt to create a dashboard that will allow users to see metrics for a specific crime within a given year using filters and metrics. Due to this, there will not be much of a focus on the analysis of the data, but there will be portions discussing the validity of the dataset, the steps I took to clean the data, and how I organized it. The cleaned datasets can be found below, the Query (which utilized BigQuery) can be found here and the Tableau dashboard can be found here.
The dataset comes directly from the City of Chicago's website under the page "City Data Catalog." The data is gathered directly from the Chicago Police's CLEAR (Citizen Law Enforcement Analysis and Reporting) and is updated daily to present the information accurately. This means that a crime on a specific date may be changed to better display the case. The dataset represents crimes starting all the way from 2001 to seven days prior to today's date.
Using the ROCCC method, we can see that:
* The data has high reliability: The data covers the entirety of Chicago for a little over two decades. It covers all the wards within Chicago and even gives street names. While we may not know how big the sample size is, I believe the dataset has high reliability since it geographically covers the entirety of Chicago.
* The data has high originality: The dataset was obtained directly from the Chicago Police Department's database, so we can say this dataset is original.
* The data is somewhat comprehensive: While we do have important information such as the types of crimes committed and their geographic locations, I do not think this gives us proper insight into why these crimes take place. We can pinpoint the location of a crime, but we are limited by the information we have. How hot was the day of the crime? Did the crime take place in a low-income neighborhood? These missing factors prevent us from getting proper insights into why these crimes take place, so I would say this dataset is subpar in how comprehensive it is.
* The data is current: The dataset is updated frequently to display crimes up to seven days prior to today's date and may even update past crimes as more information comes to light. Due to the frequent updates, I do believe the data is current.
* The data is cited: As mentioned, the data is collected directly from the police's CLEAR system, so we can say the data is cited.
The purpose of this step is to clean the dataset such that there are no outliers in the dashboard. To do this, we are going to do the following: * Check for any null values and determine whether we should remove them. * Update any values where there may be typos. * Check for outliers and determine if we should remove them.
The following steps will be explained in the code segments below. (I used BigQuery for this so the coding will follow BigQuery's syntax) ```
-- Preview the dataset
SELECT *
FROM `portfolioproject-350601.ChicagoCrime.Crime`
LIMIT 1000;

-- Check for null values in the key columns
SELECT *
FROM `portfolioproject-350601.ChicagoCrime.Crime`
WHERE
  unique_key IS NULL OR
  case_number IS NULL OR
  date IS NULL OR
  primary_type IS NULL OR
  location_description IS NULL OR
  arrest IS NULL OR
  longitude IS NULL OR
  latitude IS NULL;

-- Delete the rows containing nulls
DELETE FROM `portfolioproject-350601.ChicagoCrime.Crime`
WHERE
  unique_key IS NULL OR
  case_number IS NULL OR
  date IS NULL OR
  primary_type IS NULL OR
  location_description IS NULL OR
  arrest IS NULL OR
  longitude IS NULL OR
  latitude IS NULL;

-- Check for duplicate unique_key values
SELECT unique_key, COUNT(unique_key)
FROM `portfolioproject-350601.ChicagoCrime.Crime`
GROUP BY unique_key
HAVING COUNT(unique_key) > 1;
```
INTRODUCTION: This is my first data analysis project. It aims to find the average life expectancy in each country. The dataset used is Life Expectancy, which is freely available on Kaggle. I used R and Tableau in this project.
License: CC0 1.0 Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)
This data set is perfect for practicing your analytical skills in Power BI, Tableau, or Excel, or you can transform it into a CSV to practice SQL.
This use case mimics transactions for a fictional eCommerce website named EverMart Online. The 3 tables in this data set are all logically connected together with IDs.
My Power BI Use Case Explanation - Using Microsoft Power BI, I made dynamic data visualizations for revenue reporting and customer behavior reporting.
Revenue Reporting Visuals
- Data Card Visual that dynamically shows Total Products Listed, Total Unique Customers, Total Transactions, and Total Revenue by Total Sales, Product Sales, or Categorical Sales.
- Line Graph Visual that shows Total Revenue by Month for the entire year. This graph also recalculates Total Revenue by Month for Total Sales by Product and Total Sales by Category if selected.
- Bar Graph Visual showcasing Total Sales by Product.
- Donut Chart Visual showcasing Total Sales by Category of Product.

Customer Behavior Reporting Visuals
- Data Card Visual that dynamically shows Total Products Listed, Total Unique Customers, Total Transactions, and Total Revenue by Total or by continent selected on the map.
- Interactive Map Visual showing key statistics for the selected continent. The key statistics are presented in the tooltip when you select a continent: Continent Name, Customer Total, Percentage of Products Sold, Percentage of Total Customers, Percentage of Total Transactions, and Percentage of Total Revenue.
License: CC0 1.0 Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)
A dataset I generated to showcase a sample set of user data for a fictional streaming service. This data is great for practicing SQL, Excel, Tableau, or Power BI.
1000 rows and 25 columns of connected data.
See below for column descriptions.
Enjoy :)