This dataset was created by Truong Dai
Original dataset: https://community.tableau.com/docs/DOC-1236
https://cdla.io/sharing-1-0/
The Superstore Sales Data dataset, available in Excel format as "Superstore.xlsx," is a comprehensive collection of sales and customer-related information from a retail superstore. The dataset comprises three distinct tables, each providing specific insights into the store's operations and customer interactions.
https://creativecommons.org/publicdomain/zero/1.0/
A dataset I generated to showcase a sample set of user data for a fictional streaming service. This data is great for practicing SQL, Excel, Tableau, or Power BI.
1000 rows and 25 columns of connected data.
See below for column descriptions.
Enjoy :)
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This file is used by the SampleQC Tableau workbook to provide insights on which samples passed QC. It is a subset of the file generated by the RNASeq pipeline, with all the genes dropped out.
The HR dataset contains employee-related information, such as personal details, job roles, salaries, and performance metrics. It's used by organizations to manage human resources, make informed staffing decisions, and analyze workforce trends. The dataset aids in optimizing employee satisfaction, productivity, and organizational growth.
Socrata datasets, including private datasets, can be accessed through a unique OData endpoint, allowing users to seamlessly connect to their data through a number of different tools, including Tableau Desktop.
See the links below for relevant documentation.
Note that Socrata OData endpoints support basic filtering, for example to retrieve just the currently active Edmonton Public School (EPSB) ward boundaries from the dataset that contains the historical data as well:
https://data.edmonton.ca/OData.svc/y5qu-dj6t?$filter=effdt_type eq 'Current'
To retrieve the EPSB ward boundaries as they were in 2014:
https://data.edmonton.ca/OData.svc/y5qu-dj6t?$filter=year(effective_start_date) le 2014 and year(effective_end_date) gt 2014
This kind of filtering may be better achieved in Tableau though.
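As a rough sketch of how filtered endpoint URLs like the ones above could be assembled programmatically, the Python snippet below percent-encodes an OData `$filter` expression and appends it to the dataset endpoint. The helper name `odata_url` is hypothetical, the comparison operators are written in lowercase on the assumption that the endpoint follows the usual OData spelling, and the snippet only builds the URL without sending a request.

```python
from urllib.parse import quote

# Endpoint taken from the examples above.
BASE = "https://data.edmonton.ca/OData.svc/y5qu-dj6t"

def odata_url(base: str, filter_expr: str) -> str:
    """Return the base URL with a percent-encoded $filter option (hypothetical helper)."""
    # Spaces and quotes are encoded; parentheses are left readable.
    return f"{base}?$filter={quote(filter_expr, safe='()')}"

# Currently active ward boundaries.
current_wards = odata_url(BASE, "effdt_type eq 'Current'")

# Ward boundaries as they were in 2014.
wards_2014 = odata_url(
    BASE,
    "year(effective_start_date) le 2014 and year(effective_end_date) gt 2014",
)

print(current_wards)
print(wards_2014)
```

Encoding the expression this way avoids relying on the client to escape spaces and quotes, which some tools do not do automatically.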
This dataset comes from Tableau, where it is the default sample data for dashboard development. It is useful for a wide range of analyses.
It contains Orders, Returns, and Customer details.
Suggested tasks: combine the Orders, Returns, and Customers data; explore how to improve average month-over-month (MoM) revenue and how to reduce returns; predict returned products and the resulting losses.
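One way to approach the month-over-month (MoM) revenue question is sketched below in Python. The order rows, the date format, and the figures are all made up for illustration and are not taken from the Tableau sample data; the sketch also assumes every month in the range has at least one order.

```python
from collections import defaultdict

# Hypothetical order rows: (order date as YYYY-MM-DD, sales amount).
orders = [
    ("2023-01-05", 120.0),
    ("2023-01-20", 80.0),
    ("2023-02-03", 250.0),
    ("2023-03-11", 200.0),
]

def monthly_revenue(rows):
    """Sum sales per calendar month, keyed by 'YYYY-MM' in chronological order."""
    totals = defaultdict(float)
    for order_date, sales in rows:
        totals[order_date[:7]] += sales
    return dict(sorted(totals.items()))

def mom_change(monthly):
    """Fractional change of each month relative to the previous listed month."""
    months = list(monthly)
    return {
        cur: (monthly[cur] - monthly[prev]) / monthly[prev]
        for prev, cur in zip(months, months[1:])
        if monthly[prev] != 0  # skip months with no baseline revenue
    }

revenue = monthly_revenue(orders)
changes = mom_change(revenue)
print(revenue)
print(changes)
```

The same grouping can be done in Tableau itself with a table calculation; the sketch is only meant to make the arithmetic explicit.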
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This article discusses how to make statistical graphics a more prominent element of the undergraduate statistics curricula. The focus is on several different types of assignments that exemplify how to incorporate graphics into a course in a pedagogically meaningful way. These assignments include having students deconstruct and reconstruct plots, copy masterful graphs, create one-minute visual revelations, convert tables into “pictures,” and develop interactive visualizations, for example, with the virtual earth as a plotting canvas. In addition to describing the goals and details of each assignment, we also discuss the broader topic of graphics and key concepts that we think warrant inclusion in the statistics curricula. We advocate that more attention needs to be paid to this fundamental field of statistics at all levels, from introductory undergraduate through graduate level courses. With the rapid rise of tools to visualize data, for example, Google trends, GapMinder, ManyEyes, and Tableau, and the increased use of graphics in the media, understanding the principles of good statistical graphics, and having the ability to create informative visualizations is an ever more important aspect of statistics education. Supplementary materials containing code and data for the assignments are available online.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Six datasets that have been used to evaluate reasoning with a Controlled Natural Language. A seventh file contains the grammar specification of the Controlled Natural Language.
* 'BNF Grammar.pdf' contains the grammar specification of the Controlled Natural Language in Backus-Naur Form.
* 'syllogism_dataset.csv' contains the adapted Kaggle dataset.
* 'puzzles_dataset.csv' contains logical puzzles found on the internet and reformulated in the Controlled Natural Language if necessary.
* 'gpt_dataset_easy.csv', 'gpt_dataset_hard.csv' and 'deepseek_dataset.csv' contain reasoning examples generated by Large Language Models.
* 'interface_examples.csv' contains the inference example used in the user tests.
The data here include SFI research programmes from 2011 that were managed end-to-end in SFI’s Grants and Awards Management System. Programmes were gradually managed through the Grants and Awards Management System from 2011, and therefore awards made under programmes prior to 2011 were excluded as these data were not available. Furthermore, non-research funded programmes (e.g. education and public engagement grants) and programmes where SFI simply provided the funding to another organisation that solicits and processes the applications (for example Wellcome, Royal Society, etc.) were also excluded.
The data include awards offered by SFI, irrespective of whether the award was accepted or declined by the applicant, as this best represents completion of the SFI peer review process. Where awards were transferred or underwent different ownership after their inception, data were based on the lead applicant’s self-declared gender at the time the award decision was made and currently reflects a binary categorisation of gender, e.g. male or female (with exclusions as described previously) between 2011 and 2021.
https://creativecommons.org/publicdomain/zero/1.0/
Netflix is a popular streaming service that offers a vast catalog of movies, TV shows, and original content. This dataset is a cleaned version of the original version which can be found here. The data consist of content added to Netflix from 2008 to 2021. The oldest title dates from 1925 and the newest from 2021. The dataset is cleaned with PostgreSQL and visualized with Tableau. Its purpose is to test my data cleaning and visualization skills. The cleaned data can be found below and the Tableau dashboard can be found here.
We are going to:
1. Treat the nulls
2. Treat the duplicates
3. Populate missing rows
4. Drop unneeded columns
5. Split columns
Extra steps and further explanation of the process are given in the code comments.
--View dataset
SELECT *
FROM netflix;
--The show_id column is the unique id for the dataset, so we check it for duplicates
--Any show_id returned here appears more than once
SELECT show_id, COUNT(*)
FROM netflix
GROUP BY show_id
HAVING COUNT(*) > 1
ORDER BY show_id DESC;
--No duplicates
--Check null values across columns
SELECT COUNT(*) FILTER (WHERE show_id IS NULL) AS showid_nulls,
COUNT(*) FILTER (WHERE type IS NULL) AS type_nulls,
COUNT(*) FILTER (WHERE title IS NULL) AS title_nulls,
COUNT(*) FILTER (WHERE director IS NULL) AS director_nulls,
COUNT(*) FILTER (WHERE movie_cast IS NULL) AS movie_cast_nulls,
COUNT(*) FILTER (WHERE country IS NULL) AS country_nulls,
COUNT(*) FILTER (WHERE date_added IS NULL) AS date_added_nulls,
COUNT(*) FILTER (WHERE release_year IS NULL) AS release_year_nulls,
COUNT(*) FILTER (WHERE rating IS NULL) AS rating_nulls,
COUNT(*) FILTER (WHERE duration IS NULL) AS duration_nulls,
COUNT(*) FILTER (WHERE listed_in IS NULL) AS listed_in_nulls,
COUNT(*) FILTER (WHERE description IS NULL) AS description_nulls
FROM netflix;
We can see that there are NULLS.
director_nulls = 2634
movie_cast_nulls = 825
country_nulls = 831
date_added_nulls = 10
rating_nulls = 4
duration_nulls = 3
Nulls make up about 30% of the director column, so I will not delete those rows. Instead, I will look for another column that can be used to populate it. To do so, we first check whether there is a relationship between the movie_cast column and the director column.
-- Below, we find out if some directors are likely to work with particular cast
WITH cte AS
(
SELECT title, CONCAT(director, '---', movie_cast) AS director_cast
FROM netflix
)
SELECT director_cast, COUNT(*) AS count
FROM cte
GROUP BY director_cast
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC;
With this, we can now populate the NULL rows in director using the matching movie_cast records.
UPDATE netflix
SET director = 'Alastair Fothergill'
WHERE movie_cast = 'David Attenborough'
AND director IS NULL ;
--Repeat this step to populate the rest of the director nulls
--Populate the rest of the NULL in director as "Not Given"
UPDATE netflix
SET director = 'Not Given'
WHERE director IS NULL;
--When I was doing this, I found a less complex and faster way to populate a column which I will use next
Just like the director column, I will not delete the nulls in country. Since the country column is related to the director and the movie, we are going to populate the country column using the director column.
--Preview the countries that can be filled in from another title by the same director
SELECT COALESCE(nt.country, nt2.country)
FROM netflix AS nt
JOIN netflix AS nt2
ON nt.director = nt2.director
AND nt.show_id <> nt2.show_id
WHERE nt.country IS NULL;
--Populate the country using the director column
--(only copy from rows that actually have a country)
UPDATE netflix
SET country = nt2.country
FROM netflix AS nt2
WHERE netflix.director = nt2.director
AND netflix.show_id <> nt2.show_id
AND nt2.country IS NOT NULL
AND netflix.country IS NULL;
--Check whether any rows still have a NULL country after the update
SELECT director, country, date_added
FROM netflix
WHERE country IS NULL;
--Populate the rest of the NULLs in country as "Not Given"
UPDATE netflix
SET country = 'Not Given'
WHERE country IS NULL;
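The fill-from-matching-director step can be tried out on a toy table. The sketch below uses Python's built-in sqlite3 module rather than PostgreSQL, so it swaps the `UPDATE ... FROM` form for a correlated subquery; the three sample rows are invented for illustration and the table keeps only the columns needed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE netflix (show_id TEXT PRIMARY KEY, director TEXT, country TEXT)"
)
conn.executemany(
    "INSERT INTO netflix VALUES (?, ?, ?)",
    [
        ("s1", "Director A", "India"),  # invented rows
        ("s2", "Director A", None),     # can be filled from s1
        ("s3", "Director B", None),     # no other title by this director
    ],
)

# Copy a known country from another title by the same director.
conn.execute(
    """
    UPDATE netflix
    SET country = (
        SELECT nt2.country
        FROM netflix AS nt2
        WHERE nt2.director = netflix.director
          AND nt2.show_id <> netflix.show_id
          AND nt2.country IS NOT NULL
        LIMIT 1
    )
    WHERE country IS NULL
    """
)

# Anything still missing gets the placeholder used in the walkthrough.
conn.execute("UPDATE netflix SET country = 'Not Given' WHERE country IS NULL")

result = dict(conn.execute("SELECT show_id, country FROM netflix ORDER BY show_id"))
print(result)
```

Running the two updates in this order matters: the placeholder must be written last, otherwise 'Not Given' would itself be copied to other titles by the same director.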
Only 10 of the more than 8,000 rows have a NULL date_added, so deleting them will not meaningfully affect our analysis or visualization.
--Show date_added nulls
SELECT show_id, date_added
FROM netflix_clean
WHERE date_added IS NULL;
--DELETE nulls
DELETE F...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cancer, the second-leading cause of mortality, kills 16% of people worldwide. Unhealthy lifestyles, smoking, alcohol abuse, obesity, and a lack of exercise have been linked to cancer incidence and mortality. However, it is hard. Cancer and lifestyle correlation analysis and cancer incidence and mortality prediction in the next several years are used to guide people’s healthy lives and target medical financial resources. Two key research areas of this paper are:
Data preprocessing and sample expansion design: Using experimental analysis and comparison, this study chooses the best cubic spline interpolation technology on the original data from 32 entry points to 420 entry points and converts annual data into monthly data to solve the problem of insufficient correlation analysis and prediction. Factor analysis is possible because data sources indicate changing factors.
TSA-LSTM two-stage attention design: A popular tool with advanced visualization functions, Tableau, simplifies this paper’s study. Tableau’s testing findings indicate it cannot analyze and predict this paper’s time series data. LSTM is utilized by the TSA-LSTM optimization model. By commencing with input feature attention, this model attention technique guarantees that the model encoder converges to a subset of input sequence features during the prediction of output sequence features. As a result, the model’s natural learning trend and prediction quality are enhanced. The second step, time performance attention, maintains We can choose network features and improve forecasts based on real-time performance.
Validating the data source with factor correlation analysis and trend prediction using the TSA-LSTM model: Most cancers have overlapping risk factors, and excessive drinking, lack of exercise, and obesity can cause breast, colorectal, and colon cancer. A poor lifestyle directly promotes lung, laryngeal, and oral cancers, according to visual tests.
Cancer incidence is expected to climb 18–21% between 2020 and 2025, according to 2021. Long-term projection accuracy is 98.96 percent, and smoking and obesity may be the main cancer causes.
Data for replication package
Paper Title: Drivers of firm-government engagement for technology ventures
Authors: Lauren Lanahan; Iman Hemmatian; Amol M. Joshi; Evan E. Johnson
Lead Author of Data Curation, Code, & Analysis: Lauren Lanahan (llanahan@uoregon.edu)
Software: Stata 18; Tableau
Computational Requirements: We utilized a powerful server (32-core processors, 384 GB memory, 32 TB storage) to construct the sample and run the analyses. We provide the code and data for all empirical assessments. For additional transparency, we provide the log file for the set of empirical assessments (i.e., descriptive statistics and regression assessments).
Replication package includes:
[1] do file
Do File_READ FIRST.do
[3] dta files
Descriptive Statistics Data File.dta (Table 4, Table 5; S1 Table; S3 Table)
Regression Data File.dta (Table 6; S2 Table; S4 Table; S5 Table; S6 Table; Table 7; Table 8)
Data for Tableau.dta (Figure 2)
[1] log file
Replication Log.smcl
Comment: Code is organized in a manner that reflects the ordering of the empirical results presented in the paper. Note that we report the raw data for Table 1 in the table itself.
https://creativecommons.org/publicdomain/zero/1.0/
Domain-Specific Dataset and Visualization Guide
This package contains 20 realistic datasets in CSV format across different industries, along with 20 text files suggesting visualization ideas. Each dataset includes about 300 rows of synthetic but domain-appropriate data. They are designed for data analysis, visualization practice, machine learning projects, and dashboard building.
What’s inside
20 CSV files, one for each domain:
20 TXT files, each listing 10 relevant graphing options for the dataset.
MASTER_INDEX.csv, which summarizes all domains with their column names.
Use cases
Example
Education dataset has columns like StudentName, Class, Subject, Marks, AttendancePercent. Suggested graphs: bar chart of average marks by subject, scatter plot of marks vs attendance percent, line chart of attendance over time.
E-Commerce dataset has columns like OrderDate, Product, Category, Price, Quantity, Total. Suggested graphs: line chart of revenue trend, bar chart of revenue by category, pie chart of payment mode share.
https://www.technavio.com/content/privacy-notice
The visual analytics market has the potential to grow by USD 4.39 billion during 2021-2025, and the market’s growth momentum will accelerate at a CAGR of 11.32%.
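For readers who want to see how a compound annual growth rate (CAGR) relates to an absolute increase, a small Python sketch follows. It assumes the USD 4.39 billion increment accrues over four annual compounding periods (2021 to 2025); the text above does not state the base-year market size or the exact period count, so the implied base printed here illustrates the formula and is not a figure from the report.

```python
def growth_factor(cagr: float, years: int) -> float:
    """Total multiplicative growth after compounding `cagr` for `years` periods."""
    return (1.0 + cagr) ** years

def implied_base(increment: float, cagr: float, years: int) -> float:
    """Starting size that would grow by `increment` at the given CAGR (illustrative)."""
    return increment / (growth_factor(cagr, years) - 1.0)

CAGR = 0.1132        # 11.32%, from the text above
INCREMENT_BN = 4.39  # USD billion, from the text above
YEARS = 4            # assumption: 2021 -> 2025 treated as four periods

base = implied_base(INCREMENT_BN, CAGR, YEARS)
print(f"growth factor: {growth_factor(CAGR, YEARS):.4f}")
print(f"implied base (USD bn): {base:.2f}")
```

Changing `YEARS` to 5 shows how sensitive the implied base is to the assumed number of compounding periods.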
This visual analytics market research report provides valuable insights on the post COVID-19 impact on the market, which will help companies evaluate their business approaches. Furthermore, this report extensively covers market segmentation by end-user (BFSI, CPG and retail, healthcare, manufacturing, and others) and geography (North America, APAC, Europe, MEA, and South America). The visual analytics market report also offers information on several market vendors, including Altair Engineering Inc., Alteryx Inc., Arcadia Data Inc., Datameer Inc., International Business Machines Corp., Microsoft Corp., QlikTech international AB, SAP SE, SAS Institute Inc., and Tableau Software LLC among others.
What will the Visual Analytics Market Size be in 2021?
Visual Analytics Market: Key Drivers and Trends
The growing availability and complexity of data are notably driving the visual analytics market growth, although factors such as data privacy and security concerns may impede market growth. Our research analysts have studied the historical data and deduced the key market drivers and the COVID-19 pandemic impact on the visual analytics industry. The holistic analysis of the drivers will help in deducing end goals and refining marketing strategies to gain a competitive edge.
The growing availability and complexity of data will fuel the growth of the visual analytics market size.
The availability of a large volume of data and rapidly growing data complexity in organizations are the major drivers for the development of various intelligence-based data analysis techniques.
Intelligent techniques involving technologies such as ML and AI can help companies retrieve the huge amount of complex data in a useful manner and use that data to enhance their services and business processes. This, in turn, is expected to drive the growth of the market for visual analytics.
The increased dependency on the Internet for critical operations will drive the visual analytics market growth during the forecast period.
E-commerce vendors are posting advertisements on search engines and other websites to attract several customers. This will increase the demand for visual analytics to help e-commerce vendors track customers, analyze customer behavior, and ensure proper decision-making.
With the rising popularity and use of e-commerce, the number of digital media advertisements by e-commerce vendors is expected to increase, which will drive the growth of the market during the forecast period.
This visual analytics market analysis report also provides detailed information on other upcoming trends and challenges that will have a far-reaching effect on the market growth. The actionable insights on the trends and challenges will help companies evaluate and develop growth strategies for 2021-2025.
Who are the Major Visual Analytics Market Vendors?
The report analyzes the market’s competitive landscape and offers information on several market vendors, including:
Altair Engineering Inc.
Alteryx Inc.
Arcadia Data Inc.
Datameer Inc.
International Business Machines Corp.
Microsoft Corp.
QlikTech international AB
SAP SE
SAS Institute Inc.
Tableau Software LLC
This statistical study of the visual analytics market encompasses successful business strategies deployed by the key vendors. The visual analytics market is fragmented and the vendors are deploying growth strategies such as providing customized solutions to compete in the market.
To make the most of the opportunities and recover from post COVID-19 impact, market vendors should focus more on the growth prospects in the fast-growing segments, while maintaining their positions in the slow-growing segments.
The visual analytics market forecast report offers in-depth insights into key vendor profiles. The profiles include information on the production, sustainability, and prospects of the leading companies.
Which are the Key Regions for Visual Analytics Market?
35% of the market’s growth will originate from North America during the forecast period. The US is a key market for visual analytics in North America. Market growth in this region will be faster than the growth of the market in Europe, MEA, and South America.
This market research report entails detailed information on the competitive intelligence, marketing gaps, and regional opportunities in store for vendors, which will assist in creating efficient business plans.
What are the Revenue-generating End-user Segments in the Visual Analy
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset shows different breakdowns of London's resident population by their country of birth. Data used comes from ONS' Annual Population Survey (APS). The APS has a sample of around 320,000 people in the UK (around 28,000 in London). As such all figures must be treated with some caution. 95% confidence interval levels are provided. Numbers have been rounded to the nearest thousand and figures for smaller populations have been suppressed.
Four files are available for download:
* Country of Birth - Borough: Shows country of birth estimates in their broad groups such as European Union, South East Asia, North Africa, etc. broken down to borough level.
* Detailed Country of Birth - London: Shows country of birth estimates for specific countries such as France, Bangladesh, Nigeria, etc. available for London as a whole.
* Demography Update 09-2015: A GLA Demography report that uses APS data to analyse the trends in London for the period 2004 to 2014. A supporting data file is also provided.
* Country of Birth Borough 2004-2016 Analysis Tool: A tool produced by GLA Demography that allows users to explore different breakdowns of country of birth data.
An accompanying Tableau visualisation tool has also been produced which maps data from 2004 to 2015.
Nationality data can be found here: https://data.london.gov.uk/dataset/nationality
Nationality refers to that stated by the respondent during the interview. Country of birth is the country in which they were born. It is possible that an individual’s nationality may change, but the respondent’s country of birth cannot change. This means that country of birth gives a more robust estimate of change over time.
https://creativecommons.org/publicdomain/zero/1.0/
This interactive Tableau dashboard provides a detailed analysis of car sales trends from 2022 to 2023. It explores key metrics such as total sales, average car prices, and sales distribution by car type, color, and region.
Key Features:
📊 Sales Overview: Total sales, quantity, and price analysis.
📈 Monthly Trends: A time-series visualization of sales growth.
🎨 Car Color Preferences: Pie chart showing distribution by color.
🌍 Regional Sales Breakdown: Geospatial analysis of sales across the U.S.
🏆 Model-wise Performance: Sales comparison across different car brands.
⚙️ Engine & Transmission Impact: Filtering options to analyze impact by car type.
This dashboard is ideal for automotive industry analysts, data enthusiasts, and business decision-makers interested in sales performance insights.
📌 Tools Used: Tableau, Data Cleaning & Preparation.
This dataset shows different breakdowns of London's resident population by their nationality. Data used comes from ONS' Annual Population Survey (APS). The APS has a sample of around 320,000 people in the UK (around 28,000 in London). As such all figures must be treated with some caution. 95% confidence interval levels are provided. Numbers have been rounded to the nearest thousand and figures for smaller populations have been suppressed.
Two files are available to download:
* Nationality - Borough: Shows nationality estimates in their broad groups such as European Union, South East Asia, North Africa, etc. broken down to borough level.
* Detailed Nationality - London: Shows nationality estimates for specific countries such as France, Bangladesh, Nigeria, etc. available for London as a whole.
A Tableau visualisation tool is also available.
Country of Birth data can be found here: https://data.london.gov.uk/dataset/country-of-birth
Nationality refers to that stated by the respondent during the interview. Country of birth is the country in which they were born. It is possible that an individual’s nationality may change, but the respondent’s country of birth cannot change. This means that country of birth gives a more robust estimate of change over time.
Health and Wellbeing of 15-year-olds in England - results from the What About YOUth? survey. Data has been collected on general health, diet, use of free time, physical activity, smoking, drinking, emotional wellbeing, drugs and bullying.
What About YOUth? 2014 (WAY 2014) is a newly-established survey designed to collect robust local authority (LA) level data on a range of health behaviours amongst 15-year-olds. WAY 2014 is the first survey of its kind to be conducted, and it is hoped that the survey will be repeated in order to form a time series of comparable data on a range of indicators for 15-year-olds across England.
Questionnaire packs were sent to 295,245 young people in England and 120,115 of these responded with usable data, giving an unadjusted response rate of 40 per cent (based on the issued sample) and an adjusted response rate of 41 per cent.
Participants for WAY 2014 were sampled from the Department for Education’s National Pupil Database (NPD). The NPD is a near full population database (with the exception that independent schools are not included).
See this data visualised in this Tableau report. More information is available from the Health and Social Care Information Centre (HSCIC) website, and data downloads are available from PHE Fingertips.