This Power BI dashboard shows the COVID-19 vaccination rate by key demographics, including age groups, race and ethnicity, and sex, for Tempe zip codes.

Data Source: Maricopa County GIS Open Data weekly count of COVID-19 vaccinations. The data were reformatted from the source data to accommodate the dashboard configuration. The Maricopa County Department of Public Health (MCDPH) releases the COVID-19 vaccination data for each zip code and city in Maricopa County at ~12:00 PM weekly on Wednesdays via the Maricopa County GIS Open Data website (https://data-maricopa.opendata.arcgis.com/). More information about the data is available on the Maricopa County COVID-19 Vaccine Data page (https://www.maricopa.gov/5671/Public-Vaccine-Data#dashboard). The dashboard's values are refreshed at 3:00 PM weekly on Wednesdays. The most recent date included on the dashboard is available by hovering over the last point on the right-hand side of each chart. Please note that the times when MCDPH releases weekly COVID-19 vaccine data may vary. If data are not released by the time of the scheduled dashboard refresh, the values may appear on the dashboard with the next data release, which may be one or more days after the last scheduled release.

Dates: Updated data shows publishing dates, which represent values from the previous calendar week (Sunday through Saturday). For more details on data reporting, please see the Maricopa County COVID-19 data reporting notes at https://www.maricopa.gov/5460/Coronavirus-Disease-2019.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset contains anonymized customer complaint records used to build a Complaint Tracking and Analytics Dashboard in Power BI. The data can be used for learning SQL data cleaning and Power BI visualization. It simulates real-world customer complaints from a banking context and enables analysis of complaint trends, categories, and resolutions. Features:
- SQL-to-Power BI linking
- Data refresh
- Multiple charts
- Storytelling
Power BI Dataflow: rs_lginform_metrics

LG Inform is the local area benchmarking tool from the Local Government Association. LG Inform Plus makes available a large number of metrics about a wide range of areas from different data sources in one place, accessible through an API. This dataflow contains the metric values for metric types within the WMCA Types of interest view of LG Inform Plus Metric Types, covering areas of interest at a regional comparison level (regions and local authorities in England) and at MSOA, LSOA and Ward level within the West Midlands metropolitan area. It also contains the associated dimensional tables for metric types, datasets, collections and sources, queried at source from the LG Inform Plus API web services at https://home.esd.org.uk/. The dataflow is manually refreshed when new data metrics become available. Last refresh: 04/10/2023.
https://creativecommons.org/publicdomain/zero/1.0/
Netflix is a popular streaming service that offers a vast catalog of movies, TV shows, and original content. This dataset is a cleaned version of the original, which can be found here. The data consist of content added to Netflix from 2008 to 2021; the oldest title dates from 1925 and the newest from 2021. The dataset will be cleaned with PostgreSQL and visualized with Tableau. The purpose of this dataset is to test my data cleaning and visualization skills. The cleaned data can be found below, and the Tableau dashboard can be found here.
We are going to:
1. Treat the nulls
2. Treat the duplicates
3. Populate missing rows
4. Drop unneeded columns
5. Split columns
Extra steps and further explanation of the process are given in the code comments.
--View dataset
SELECT *
FROM netflix;
--The show_id column is the unique id for the dataset, therefore we are going to check for duplicates
SELECT show_id, COUNT(*)
FROM netflix
GROUP BY show_id
HAVING COUNT(*) > 1;
--No duplicates
--Check null values across columns
SELECT COUNT(*) FILTER (WHERE show_id IS NULL) AS showid_nulls,
COUNT(*) FILTER (WHERE type IS NULL) AS type_nulls,
COUNT(*) FILTER (WHERE title IS NULL) AS title_nulls,
COUNT(*) FILTER (WHERE director IS NULL) AS director_nulls,
COUNT(*) FILTER (WHERE movie_cast IS NULL) AS movie_cast_nulls,
COUNT(*) FILTER (WHERE country IS NULL) AS country_nulls,
COUNT(*) FILTER (WHERE date_added IS NULL) AS date_added_nulls,
COUNT(*) FILTER (WHERE release_year IS NULL) AS release_year_nulls,
COUNT(*) FILTER (WHERE rating IS NULL) AS rating_nulls,
COUNT(*) FILTER (WHERE duration IS NULL) AS duration_nulls,
COUNT(*) FILTER (WHERE listed_in IS NULL) AS listed_in_nulls,
COUNT(*) FILTER (WHERE description IS NULL) AS description_nulls
FROM netflix;
We can see that there are NULLS.
director_nulls = 2634
movie_cast_nulls = 825
country_nulls = 831
date_added_nulls = 10
rating_nulls = 4
duration_nulls = 3
The director column's nulls make up about 30% of the column, so I will not delete them; instead I will use another column to populate it. To populate the director column, we want to find out whether there is a relationship between the movie_cast column and the director column.
-- Below, we find out if some directors are likely to work with particular cast
WITH cte AS
(
SELECT title, CONCAT(director, '---', movie_cast) AS director_cast
FROM netflix
)
SELECT director_cast, COUNT(*) AS count
FROM cte
GROUP BY director_cast
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC;
With this, we can now populate the NULL director rows using their associated movie_cast records.
UPDATE netflix
SET director = 'Alastair Fothergill'
WHERE movie_cast = 'David Attenborough'
AND director IS NULL;
--Repeat this step to populate the rest of the director nulls
--Populate the rest of the NULL in director as "Not Given"
UPDATE netflix
SET director = 'Not Given'
WHERE director IS NULL;
--While doing this, I found a simpler and faster way to populate a column, which I will use next
Just like the director column, I will not delete the nulls in country. Since the country column is related to the director and movie columns, we are going to populate the country column using the director column.
--Populate the country using the director column
SELECT COALESCE(nt.country,nt2.country)
FROM netflix AS nt
JOIN netflix AS nt2
ON nt.director = nt2.director
AND nt.show_id <> nt2.show_id
WHERE nt.country IS NULL;
UPDATE netflix
SET country = nt2.country
FROM netflix AS nt2
WHERE netflix.director = nt2.director and netflix.show_id <> nt2.show_id
AND netflix.country IS NULL;
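To sanity-check the self-join populate technique above without a PostgreSQL instance, here is a minimal sketch in Python with sqlite3. It uses a correlated subquery, which is the portable equivalent of the PostgreSQL `UPDATE ... FROM` self-join; the table and column names follow the queries above, and the sample rows are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE netflix (show_id TEXT, director TEXT, country TEXT)")
cur.executemany(
    "INSERT INTO netflix VALUES (?, ?, ?)",
    [
        ("s1", "Alastair Fothergill", "United Kingdom"),  # known country
        ("s2", "Alastair Fothergill", None),              # NULL to be populated
        ("s3", "Not Given", None),                        # no match -> stays NULL
    ],
)

# Fill a NULL country from another row with the same director
# (correlated-subquery form of the UPDATE ... FROM self-join).
cur.execute("""
    UPDATE netflix
    SET country = (
        SELECT nt2.country
        FROM netflix AS nt2
        WHERE nt2.director = netflix.director
          AND nt2.show_id <> netflix.show_id
          AND nt2.country IS NOT NULL
    )
    WHERE country IS NULL
""")

rows = dict(cur.execute("SELECT show_id, country FROM netflix"))
print(rows)  # s2 inherits 'United Kingdom'; s3 stays None
```

Note that when a director has several titles with different countries, this picks an arbitrary one, which matches the behavior of the PostgreSQL version above.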
--Confirm whether any rows with a director still have a NULL country
SELECT director, country, date_added
FROM netflix
WHERE country IS NULL;
--Populate the rest of the NULLs in country as "Not Given"
UPDATE netflix
SET country = 'Not Given'
WHERE country IS NULL;
The date_added column has just 10 nulls out of over 8,000 rows, so deleting them will not affect our analysis or visualization.
--Show date_added nulls
SELECT show_id, date_added
FROM netflix
WHERE date_added IS NULL;
--Delete the nulls
DELETE FROM netflix
WHERE date_added IS NULL;
DA_Avocado_PJ is a personal data analysis project, created from the original Avocado Prices data from the Hass Avocado Board (a U.S. avocado database) posted on Kaggle by Justin Kiggins (2018), updated to 2020 by Timofei Kornev, and finally updated to 2022 by me.
In this project, I analyze the avocado market in the US, helping businesses understand how the market has developed over the years and how to orient their business in the future, by analyzing the price, volume sold, and revenue of avocados in the U.S.
In this analysis I will solve 3 main problems. The dataset has the following fields:
date: The date of the observation
geography: The city or region of the observation
total_volume: Total number of avocados sold
average_price: The average price of a single avocado
_4046,_4225,_4770: Total number of avocados with PLU 4046,4225,4770 sold
type : Conventional or organic
First, I need to update this data to 2022, because the original data only covers 2015 to 2020.
After that, I categorize the dataset into 2 types:
avocado_isUS_2022: a dataset representing totals across the United States
avocado_notUS_2022: a dataset covering only individual cities and regions in the United States
However, after looking through the data, I recognized that the geography column in avocado_notUS_2022 mixed regions and cities.
For example, "midsouth" is a region that includes many cities, while "Hartford/Springfield" refers to two big cities in Connecticut. So I decided to separate them.
Then I reviewed and removed some blank and negative values in the two final datasets.
Finally, we have:
avocado_isUS_2022: overall data on the US, used to assess the avocado market in the US from 2015 to 2022
avocado_detail: data covering only cities from 2015 to 2022
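The blank/negative-value cleanup step described above can be sketched in plain Python. The column names follow the field list earlier in this write-up; the sample rows are invented for illustration:

```python
# Drop rows with a blank geography or negative volume/price.
# Sample rows are made up; column names follow the dataset's field list.
rows = [
    {"geography": "Seattle", "total_volume": 1200.0, "average_price": 1.45},
    {"geography": "", "total_volume": 900.0, "average_price": 1.10},           # blank -> drop
    {"geography": "Portland", "total_volume": -50.0, "average_price": 1.30},   # negative -> drop
    {"geography": "New York", "total_volume": 3000.0, "average_price": 1.60},
]

def is_valid(row):
    """Keep only rows with a non-blank geography and non-negative numbers."""
    return (
        row["geography"].strip() != ""
        and row["total_volume"] >= 0
        and row["average_price"] >= 0
    )

cleaned = [r for r in rows if is_valid(r)]
print([r["geography"] for r in cleaned])  # ['Seattle', 'New York']
```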
The results show that the US avocado market has just gone through a major crisis in 2020 and is showing signs of recovery. This sign of recovery is strongly expressed in Organic avocados, especially in the 4770 type. The analysis also shows that there is a trend towards organic avocado varieties after the crisis, even though they are more expensive. The analysis results show that the best time to sell avocados is from early spring to the end of summer.
In this analysis we will only focus on the Organic variety, because of its prominence in the previous analysis. In addition, 2020 will be the base mark for this analysis, to show how the recovery level of each city varies.
Top 5 cities with the highest revenue from Organic avocados in the last 3 years 1. New York 2. Los Angeles 3. San Francisco 4. Seattle 5. Portland
The analysis results show that Seattle is really a potential city for participating in the avocado market in the US, with the dominance in volume as well as the highest selling price in 2022.
In this project I also created a dynamic dashboard in Power BI, but sadly it is a .pbix file, and Microsoft limits dashboard exports to .pbix or PDF, so I can't share it here 😭😭😭
Attributes: product_id, product_name, category, release_date, price.
Lists 20 fictional customers with their industry and contact information. Attributes: customer_id, customer_name, industry, contact_email, contact_phone.
Contains 100 sales records tied to products and customers. Attributes: sale_id, product_id, customer_id, sale_date, region, quantity_sold, revenue.
Features 50 suppliers and the materials they provide. Attributes: supplier_id, supplier_name, material_supplied, contact_email.
Tracks materials supplied to produce products, proportional to sales. Attributes: supply_chain_id, supplier_id, product_id, supply_date, quantity_supplied.
Lists 5 departments within the business. Attributes: department_id, department_name, location.
Contains data on 30 employees and their roles in different departments. Attributes: employee_id, first_name, last_name, department_id, hire_date, salary.
Describes 10 projects handled by different departments. Attributes: project_id, project_name, department_id, start_date, end_date, budget.
Number of Tables: 8. Total Rows: around 230 across all tables, ensuring quick queries and easy exploration.
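As a quick illustration of how these tables link together, here is a sketch in Python with sqlite3 that joins two of the eight tables (products and sales) on product_id. The schemas follow the attribute lists above; the sample rows are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Schemas follow the attribute lists in the dataset description.
cur.execute("""CREATE TABLE products (
    product_id INTEGER, product_name TEXT, category TEXT,
    release_date TEXT, price REAL)""")
cur.execute("""CREATE TABLE sales (
    sale_id INTEGER, product_id INTEGER, customer_id INTEGER,
    sale_date TEXT, region TEXT, quantity_sold INTEGER, revenue REAL)""")

# Invented sample rows for illustration.
cur.execute("INSERT INTO products VALUES (1, 'Widget', 'Hardware', '2024-01-01', 9.99)")
cur.executemany("INSERT INTO sales VALUES (?, 1, 1, '2024-02-01', 'West', ?, ?)",
                [(1, 10, 99.9), (2, 5, 49.95)])

# Total revenue per product across all sales records.
name, total_revenue = cur.execute("""
    SELECT p.product_name, SUM(s.revenue)
    FROM sales AS s
    JOIN products AS p ON p.product_id = s.product_id
    GROUP BY p.product_name
""").fetchone()
print(name, round(total_revenue, 2))
```

The same pattern extends to the other foreign keys (customer_id, supplier_id, department_id) for customer-, supplier-, and department-level rollups.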
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This Kaggle dataset comes from an output dataset that powers my March Madness Data Analysis dashboard in Domo. - Click here to view this dashboard: Dashboard Link - Click here to view this dashboard's features in a Domo blog post: Hoops, Data, and Madness: Unveiling the Ultimate NCAA Dashboard
This dataset offers one of the most robust resources you will find for discovering key insights through data science and data analytics using historical NCAA Division 1 men's basketball data. This data, sourced from KenPom, goes as far back as 2002 and is updated with the latest 2025 data. This dataset is meticulously structured to provide every piece of information that I could pull from the site as an open-source tool for March Madness analysis.
Key features of the dataset include: - Historical Data: Provides all historical KenPom data from 2002 to 2025 from the Efficiency, Four Factors (Offense & Defense), Point Distribution, Height/Experience, and Misc. Team Stats endpoints from KenPom's website. Please note that the Height/Experience data only goes as far back as 2007, but every other source contains data from 2002 onward. - Data Granularity: This dataset features an individual line item for every NCAA Division 1 men's basketball team in every season that contains every KenPom metric that you can possibly think of. This dataset has the ability to serve as a single source of truth for your March Madness analysis and provide you with the granularity necessary to perform any type of analysis you can think of. - 2025 Tournament Insights: Contains all seed and region information for the 2025 NCAA March Madness tournament. Please note that I will continually update this dataset with the seed and region information for previous tournaments as I continue to work on this dataset.
These datasets were created by downloading the raw CSV files for each season for the various sections on KenPom's website (Efficiency, Offense, Defense, Point Distribution, Summary, Miscellaneous Team Stats, and Height). All of these raw files were uploaded to Domo and imported into a dataflow using Domo's Magic ETL. In these dataflows, the column headers for each of the previous seasons are standardized to the current 2025 naming structure so all of the historical data can be viewed under the exact same field names. These cleaned datasets are then appended together, and some additional clean-up takes place before ultimately creating the intermediate (INT) datasets that are uploaded to this Kaggle dataset.

Once all of the INT datasets were created, I joined all of the tables together on the team name and season so all of these different metrics can be viewed in one single view. From there, I joined an NCAAM Conference & ESPN Team Name Mapping table to add a conference field, in both its full length and the acronyms it is known by, as well as the team name that ESPN currently uses. Please note that this reference table is an aggregated view of all of the different conferences a team has been a part of since 2002 and the different team names that KenPom has used historically, so this mapping table is necessary to map all of the teams properly and differentiate historical conferences from current conferences.

I then join a reference table that includes all of the current NCAAM coaches and their active coaching lengths, because current coaching length typically correlates with a team's success in the March Madness tournament. I also join a reference table with the historical post-season tournament teams in the March Madness, NIT, CBI, and CIT tournaments, and another reference table to flag the teams that were ranked in the top 12 of the AP Top 25 during week 6 of the respective NCAA season.
After some additional data clean-up, all of this cleaned data exports into the "DEV _ March Madness" file that contains the consolidated view of all of this data.
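The header-standardization-and-append step described above can be sketched in plain Python: rename each historical season's headers to the 2025 names, then append the seasons together. The column names and mapping here are hypothetical placeholders, not KenPom's actual headers:

```python
# Hypothetical mapping: old season column names -> current 2025 names.
HEADER_MAP_2025 = {"AdjO": "AdjOE", "AdjD": "AdjDE"}

def standardize(row):
    """Rename a season row's keys to the current 2025 naming structure."""
    return {HEADER_MAP_2025.get(k, k): v for k, v in row.items()}

# Invented sample rows: one historical season, one current season.
season_2003 = [{"Team": "Kansas", "Season": 2003, "AdjO": 114.7}]
season_2025 = [{"Team": "Kansas", "Season": 2025, "AdjOE": 118.2}]

# Standardize historical rows, then append all seasons into one table,
# so every season can be viewed under the exact same field names.
combined = [standardize(r) for r in season_2003] + season_2025
print(sorted(combined[0].keys()))
```

With the headers unified, a join on (Team, Season) against the reference tables described above becomes a straightforward key lookup.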
This dataset provides users with the flexibility to export data for further analysis in platforms such as Domo, Power BI, Tableau, Excel, and more. This dataset is designed for users who wish to conduct their own analysis, develop predictive models, or simply gain a deeper understanding of the intricacies that result in the excitement that Division 1 men's college basketball provides every year in March. Whether you are using this dataset for academic research, personal interest, or professional interest, I hope this dataset serves as a foundational tool for exploring the vast landscape of college basketball's most riveting and anticipated event of its season.
Use our interactive dashboard to explore the data: https://app.powerbi.com/view?r=eyJrIjoiMDQ1MmRlMjEtMThlMy00MWIxLThmNTEtMzU4M2I5ODNmYTJlIiwidCI6ImJmMzQ2ODEwLTljN2QtNDNkZS1hODcyLTI0YTJlZjM5OTVhOCJ9
For queries please contact planning.statistics@communities.gov.uk.
<p class="gem-c-attachment_metadata"><span class="gem-c-attachment_attribute"><abbr title="OpenDocument Spreadsheet" class="gem-c-attachment_abbr">ODS</abbr></span>, <span class="gem-c-attachment_attribute">248 KB</span></p>
<p class="gem-c-attachment_metadata">
This file is in an <a href="https://www.gov.uk/guidance/using-open-document-formats-odf-in-your-organisation" target="_self" class="govuk-link">OpenDocument</a> format
Local authority level statistics from table P124A are available in fully open and linkable data formats at http://opendatacommunities.org/def/concept/folders/themes/planning">Open Data Communities.
<p class="gem-c-attachment_metadata"><span class="gem-c-attachment_attribute"><abbr title="OpenDocument Spreadsheet" class="gem-c-attachment_abbr">ODS</abbr></span>, <span class="gem-c-attachment_attribute">904 KB</span></p>
<p class="gem-c-attachment_metadata">
This file is in an <a href="https://www.gov.uk/guidance/using-open-document-formats-odf-in-your-organisation" target="_self" class="govuk-link">OpenDocument</a> format
https://creativecommons.org/publicdomain/zero/1.0/
Welcome to my first real-world football dataset, scraped from Transfermarkt, containing detailed market value data for 499 Premier League players (2025).
This dataset includes the following attributes for each player:
Each field was carefully extracted and cleaned from public sources using custom Python scripts (available on GitHub below).
This is just Phase 1. My goal is to:
Typically, e-commerce datasets are proprietary and consequently hard to find among publicly available data. However, the UCI Machine Learning Repository has made available this dataset containing actual transactions from 2010 and 2011. The dataset is maintained on their site, where it can be found under the title "Online Retail".
"This is a transnational data set which contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retail.The company mainly sells unique all-occasion gifts. Many customers of the company are wholesalers."
Per the UCI Machine Learning Repository, this data was made available by Dr Daqing Chen, Director: Public Analytics group. chend '@' lsbu.ac.uk, School of Engineering, London South Bank University, London SE1 0AA, UK.
Image from stocksnap.io.
Analyses for this dataset could include time series, clustering, classification and more.
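As a starting point for the clustering angle, a common first step with this dataset is computing RFM (recency, frequency, monetary) features per customer. A minimal sketch in Python, assuming per-transaction fields for customer ID, invoice date, quantity, and unit price (the sample transactions below are invented):

```python
from datetime import date
from collections import defaultdict

# Invented sample transactions: (customer_id, invoice_date, quantity, unit_price)
transactions = [
    (17850, date(2011, 12, 1), 6, 2.55),
    (17850, date(2011, 12, 5), 2, 3.39),
    (13047, date(2011, 11, 20), 12, 1.25),
]
snapshot = date(2011, 12, 10)  # reference date just after the observed window

# Aggregate recency (days since last purchase), frequency (number of
# transactions), and monetary value (total spend) per customer.
rfm = defaultdict(lambda: {"recency": None, "frequency": 0, "monetary": 0.0})
for cust, d, qty, price in transactions:
    row = rfm[cust]
    row["frequency"] += 1
    row["monetary"] += qty * price
    days = (snapshot - d).days
    if row["recency"] is None or days < row["recency"]:
        row["recency"] = days

print(dict(rfm))
```

These three features per customer are then a natural input to k-means or another clustering method for customer segmentation.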