9 datasets found

Data Visualization Cheat sheets and Resources
kaggle.com
zip
Updated May 31, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Kash (2022). Data Visualization Cheat sheets and Resources [Dataset]. https://www.kaggle.com/kaushiksuresh147/data-visualization-cheat-cheats-and-resources
Explore at:
zip(133638507 bytes)Available download formats
Dataset updated
May 31, 2022
Authors
Kash
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
The Data Visualization Corpus

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F1430847%2F29f7950c3b7daf11175aab404725542c%2FGettyImages-1187621904-600x360.jpg?generation=1601115151722854&alt=media" alt="">

Data Visualization

Data visualization is the graphical representation of information and data. By using visual elements like charts, graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data.

In the world of Big Data, data visualization tools and technologies are essential to analyze massive amounts of information and make data-driven decisions

The Data Visualizaion Copus

The Data Visualization corpus consists:

32 cheat sheets: This includes A-Z about the techniques and tricks that can be used for visualization, Python and R visualization cheat sheets, Types of charts, and their significance, Storytelling with data, etc..

32 Charts: The corpus also consists of a significant amount of data visualization charts information along with their python code, d3.js codes, and presentations relation to the respective charts explaining in a clear manner!

Some recommended books for data visualization every data scientist's should read:

Beautiful Visualization by Julie Steele and Noah Iliinsky

Information Dashboard Design by Stephen Few

Knowledge is beautiful by David McCandless (Short abstract)

The Functional Art: An Introduction to Information Graphics and Visualization by Alberto Cairo

The Visual Display of Quantitative Information by Edward R. Tufte

storytelling with data: a data visualization guide for business professionals by cole Nussbaumer knaflic

Research paper - Cheat Sheets for Data Visualization Techniques by Zezhong Wang, Lovisa Sundin, Dave Murray-Rust, Benjamin Bach

Suggestions:

In case, if you find any books, cheat sheets, or charts missing and if you would like to suggest some new documents please let me know in the discussion sections!

Resources:

Charts: I personally recommend data viz catalogue, it's easy to understand with their explanation!

Python codes: Plotly for python and Python graph gallery

R codes for charts:Plotly for R

d3 codes: Visualization codes using d3

Request to kaggle users:

A kind request to kaggle users to create notebooks on different visualization charts as per their interest by choosing a dataset of their own as many beginners and other experts could find it useful!

To create interactive EDA using animation with a combination of data visualization charts to give an idea about how to tackle data and extract the insights from the data

Suggestion and queries:

Feel free to use the discussion platform of this data set to ask questions or any queries related to the data visualization corpus and data visualization techniques

Kindly upvote the dataset if you find it useful or if you wish to appreciate the effort taken to gather this corpus! Thank you and have a great day!
f
Data from: PSManalyst: A Dashboard for Visual Quality Control of FragPipe...
acs.figshare.com
zip
Updated Aug 15, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Alison Felipe Alencar Chaves (2025). PSManalyst: A Dashboard for Visual Quality Control of FragPipe Results [Dataset]. http://doi.org/10.1021/acs.jproteome.5c00557.s001
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.1021/acs.jproteome.5c00557.s001
Dataset updated
Aug 15, 2025
Dataset provided by
ACS Publications
Authors
Alison Felipe Alencar Chaves
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
FragPipe is recognized as one of the fastest computational platforms in proteomics, making it a practical solution for the rapid quality control of high-throughput sample analyses. Starting with version 23.0, FragPipe introduced the “Generate Summary Report” feature, offering .pdf reports with essential quality control metrics to address the challenge of intuitively assessing large-scale proteomics data. While traditional spreadsheet formats (e.g., tsv files) are accessible, the complexity of the data often limits user-friendly interpretation. To further enhance accessibility, PSManalyst, a Shiny-based R application, was developed to process FragPipe output files (psm.tsv, protein.tsv, and combined_protein.tsv) and provide interactive, code-free data visualization. Users can filter peptide-spectrum matches (PSMs) by quality scores, visualize protease cleavage fingerprints as heatmaps and SeqLogos, and access a range of quality control metrics and representations such as peptide length distributions, ion densities, mass errors, and wordclouds for overrepresented peptides. The tool facilitates seamless switching between PSM and protein data visualization, offering insights into protein abundance discrepancies, samplewise similarity metrics, protein coverage, and contaminants evaluation. PSManalyst leverages several R libraries (lsa, vegan, ggfortify, ggseqlogo, wordcloud2, tidyverse, ggpointdensity, and plotly) and runs on Windows, MacOS, and Linux, requiring only a local R setup and an IDE. The app is available at (https://github.com/41ison/PSManalyst.
m
Dataset of development of business during the COVID-19 crisis
data.mendeley.com
narcis.nl
Updated Nov 9, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Tatiana N. Litvinova (2020). Dataset of development of business during the COVID-19 crisis [Dataset]. http://doi.org/10.17632/9vvrd34f8t.1
Explore at:
Unique identifier
https://doi.org/10.17632/9vvrd34f8t.1
Dataset updated
Nov 9, 2020
Authors
Tatiana N. Litvinova
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
To create the dataset, the top 10 countries leading in the incidence of COVID-19 in the world were selected as of October 22, 2020 (on the eve of the second full of pandemics), which are presented in the Global 500 ranking for 2020: USA, India, Brazil, Russia, Spain, France and Mexico. For each of these countries, no more than 10 of the largest transnational corporations included in the Global 500 rating for 2020 and 2019 were selected separately. The arithmetic averages were calculated and the change (increase) in indicators such as profitability and profitability of enterprises, their ranking position (competitiveness), asset value and number of employees. The arithmetic mean values of these indicators for all countries of the sample were found, characterizing the situation in international entrepreneurship as a whole in the context of the COVID-19 crisis in 2020 on the eve of the second wave of the pandemic. The data is collected in a general Microsoft Excel table. Dataset is a unique database that combines COVID-19 statistics and entrepreneurship statistics. The dataset is flexible data that can be supplemented with data from other countries and newer statistics on the COVID-19 pandemic. Due to the fact that the data in the dataset are not ready-made numbers, but formulas, when adding and / or changing the values in the original table at the beginning of the dataset, most of the subsequent tables will be automatically recalculated and the graphs will be updated. This allows the dataset to be used not just as an array of data, but as an analytical tool for automating scientific research on the impact of the COVID-19 pandemic and crisis on international entrepreneurship. The dataset includes not only tabular data, but also charts that provide data visualization. The dataset contains not only actual, but also forecast data on morbidity and mortality from COVID-19 for the period of the second wave of the pandemic in 2020. The forecasts are presented in the form of a normal distribution of predicted values and the probability of their occurrence in practice. This allows for a broad scenario analysis of the impact of the COVID-19 pandemic and crisis on international entrepreneurship, substituting various predicted morbidity and mortality rates in risk assessment tables and obtaining automatically calculated consequences (changes) on the characteristics of international entrepreneurship. It is also possible to substitute the actual values identified in the process and following the results of the second wave of the pandemic to check the reliability of pre-made forecasts and conduct a plan-fact analysis. The dataset contains not only the numerical values of the initial and predicted values of the set of studied indicators, but also their qualitative interpretation, reflecting the presence and level of risks of a pandemic and COVID-19 crisis for international entrepreneurship.
Retail data analysis project (excel)
kaggle.com
zip
Updated Dec 9, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Soe Yan Naung (2024). Retail data analysis project (excel) [Dataset]. https://www.kaggle.com/datasets/ericyang19/retail-data-analysis-project-excel
Explore at:
zip(4306415 bytes)Available download formats
Dataset updated
Dec 9, 2024
Authors
Soe Yan Naung
License
Open Database License (ODbL) v1.0https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Description
In this project, I conducted a comprehensive analysis of retail and warehouse sales data to derive actionable insights. The primary objective was to understand sales trends, evaluate performance across channels, and identify key contributors to overall business success.

To achieve this, I transformed raw data into interactive Excel dashboards that highlight sales performance and channel contributions, providing a clear and concise representation of business metrics.

Key Highlights of the Project:

Created two dashboards: Sales Dashboard and Contribution Dashboard. Answered critical business questions, such as monthly trends, channel performance, and top contributors. Presented actionable insights with professional visuals, making it easy for stakeholders to make data-driven decisions.
Open Data Visualized from Police Spreadsheets: A Case Study of Where Force,...
figshare.com
zip
Updated Oct 13, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
JKevin Byrne (2020). Open Data Visualized from Police Spreadsheets: A Case Study of Where Force, Race and Space Collided Item [Dataset]. http://doi.org/10.6084/m9.figshare.13087235.v2
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.13087235.v2
Dataset updated
Oct 13, 2020
Dataset provided by
Figsharehttp://figshare.com/
figshare
Authors
JKevin Byrne
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Assets:data tables each for city of Indianapolis (IN) and Baltimore (MD) in MS Excel and MS Word, ditto shapefiles in ESRI format; all files ZIPPEDContext – abstract of report published in a blog at the serial Geospatial World:This case study called into question police conduct and policy injustice discovered in two American big cities. My path to learn about Indianapolis (IN) and Baltimore (MD) policing patterns and crime events was due to availability of open data focused on use-of-force (UOF). My specific goal was to conduct geospatial data analytics aimed at these two cities using location and other key variables. Two spreadsheets captured small data that laid acceptable statistical groundwork for iterations of exploratory spatial data analysis (ESDA). Bivariate scatterplots revealed possible police misconduct. Parallel coordinate plotting – an innovative multivariate tool – was then used to display co-occurrences of plotted UOF and racial variables associated with key police districts in Indianapolis and Baltimore. A final summary visualization sought to cartographically and dramatically compare force and race variables by way of comparative plots, graphs, and maps. I closed with three action items pertaining to a “social-justice” framework for future data-visualization, to the heightening of standards for law enforcement reform, and for a need to make a hypothetical “citizen’s arrest” of police misconduct.
Google Certificate BellaBeats Capstone Project
kaggle.com
zip
Updated Jan 5, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jason Porzelius (2023). Google Certificate BellaBeats Capstone Project [Dataset]. https://www.kaggle.com/datasets/jasonporzelius/google-certificate-bellabeats-capstone-project
Explore at:
zip(169161 bytes)Available download formats
Dataset updated
Jan 5, 2023
Authors
Jason Porzelius
Description
Introduction: I have chosen to complete a data analysis project for the second course option, Bellabeats, Inc., using a locally hosted database program, Excel for both my data analysis and visualizations. This choice was made primarily because I live in a remote area and have limited bandwidth and inconsistent internet access. Therefore, completing a capstone project using web-based programs such as R Studio, SQL Workbench, or Google Sheets was not a feasible choice. I was further limited in which option to choose as the datasets for the ride-share project option were larger than my version of Excel would accept. In the scenario provided, I will be acting as a Junior Data Analyst in support of the Bellabeats, Inc. executive team and data analytics team. This combined team has decided to use an existing public dataset in hopes that the findings from that dataset might reveal insights which will assist in Bellabeat's marketing strategies for future growth. My task is to provide data driven insights to business tasks provided by the Bellabeats, Inc.'s executive and data analysis team. In order to accomplish this task, I will complete all parts of the Data Analysis Process (Ask, Prepare, Process, Analyze, Share, Act). In addition, I will break each part of the Data Analysis Process down into three sections to provide clarity and accountability. Those three sections are: Guiding Questions, Key Tasks, and Deliverables. For the sake of space and to avoid repetition, I will record the deliverables for each Key Task directly under the numbered Key Task using an asterisk (*) as an identifier.

Section 1 - Ask:

A. Guiding Questions:
1. Who are the key stakeholders and what are their goals for the data analysis project? 2. What is the business task that this data analysis project is attempting to solve?

B. Key Tasks: 1. Identify key stakeholders and their goals for the data analysis project *The key stakeholders for this project are as follows: -Urška Sršen and Sando Mur - co-founders of Bellabeats, Inc. -Bellabeats marketing analytics team. I am a member of this team.

Identify the business task. *The business task is: -As provided by co-founder Urška Sršen, the business task for this project is to gain insight into how consumers are using their non-BellaBeats smart devices in order to guide upcoming marketing strategies for the company which will help drive future growth. Specifically, the researcher was tasked with applying insights driven by the data analysis process to 1 BellaBeats product and presenting those insights to BellaBeats stakeholders.

Section 2 - Prepare:

A. Guiding Questions: 1. Where is the data stored and organized? 2. Are there any problems with the data? 3. How does the data help answer the business question?

B. Key Tasks:

Research and communicate the source of the data, and how it is stored/organized to stakeholders. *The data source used for our case study is FitBit Fitness Tracker Data. This dataset is stored in Kaggle and was made available through user Mobius in an open-source format. Therefore, the data is public and available to be copied, modified, and distributed, all without asking the user for permission. These datasets were generated by respondents to a distributed survey via Amazon Mechanical Turk reportedly (see credibility section directly below) between 03/12/2016 thru 05/12/2016.
*Reportedly (see credibility section directly below), thirty eligible Fitbit users consented to the submission of personal tracker data, including output related to steps taken, calories burned, time spent sleeping, heart rate, and distance traveled. This data was broken down into minute, hour, and day level totals. This data is stored in 18 CSV documents. I downloaded all 18 documents into my local laptop and decided to use 2 documents for the purposes of this project as they were files which had merged activity and sleep data from the other documents. All unused documents were permanently deleted from the laptop. The 2 files used were: -sleepDay_merged.csv -dailyActivity_merged.csv

Identify and communicate to stakeholders any problems found with the data related to credibility and bias. *As will be more specifically presented in the Process section, the data seems to have credibility issues related to the reported time frame of the data collected. The metadata seems to indicate that the data collected covered roughly 2 months of FitBit tracking. However, upon my initial data processing, I found that only 1 month of data was reported. *As will be more specifically presented in the Process section, the data has credibility issues related to the number of individuals who reported FitBit data. Specifically, the metadata communicates that 30 individual users agreed to report their tracking data. My initial data processing uncovered 33 individual ...
Bird Migration Dataset (Data Visualization / EDA)
kaggle.com
zip
Updated May 13, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sahir Maharaj (2025). Bird Migration Dataset (Data Visualization / EDA) [Dataset]. https://www.kaggle.com/datasets/sahirmaharajj/bird-migration-dataset-data-visualization-eda
Explore at:
zip(3249826 bytes)Available download formats
Dataset updated
May 13, 2025
Authors
Sahir Maharaj
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
This dataset contains 10,000 synthetic records simulating the migratory behavior of various bird species across global regions. Each entry represents a single bird tagged with a tracking device and includes detailed information such as flight distance, speed, altitude, weather conditions, tagging information, and migration outcomes.

The data was entirely synthetically generated using randomized yet realistic values based on known ranges from ornithological studies. It is ideal for practicing data analysis and visualization techniques without privacy concerns or real-world data access restrictions. Because it’s artificial, the dataset can be freely used in education, portfolio projects, demo dashboards, machine learning pipelines, or business intelligence training.

With over 40 columns, this dataset supports a wide array of analysis types. Analysts can explore questions like “Do certain species migrate in larger flocks?”, “How does weather impact nesting success?”, or “What conditions lead to migration interruptions?”. Users can also perform geospatial mapping of start and end locations, cluster birds by behavior, or build time series models based on migration months and environmental factors.

For data visualization, tools like Power BI, Python (Matplotlib/Seaborn/Plotly), or Excel can be used to create insightful dashboards and interactive charts.

Join the Fabric Community DataViz Contest | May 2025: https://community.fabric.microsoft.com/t5/Power-BI-Community-Blog/%EF%B8%8F-Fabric-Community-DataViz-Contest-May-2025/ba-p/4668560
Cleaned-Data Pakistan's Largest Ecommerce Dataset
kaggle.com
Updated Mar 25, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
umaraziz97 (2023). Cleaned-Data Pakistan's Largest Ecommerce Dataset [Dataset]. https://www.kaggle.com/datasets/umaraziz97/cleaned-data-pakistans-largest-ecommerce-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Mar 25, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
umaraziz97
Area covered
Pakistan
Description
Pakistan’s largest ecommerce data – Power BI Report

Dataset Link: pakistan’s_largest_ecommerce_dataset Cleaned Data: Cleaned_Pakistan’s_largest_ecommerce_dataset

Raw Data:

Rows: 584525 **Columns: **21

Process:

All the raw data transformed and saved in new Excel file Working – Pakistan Largest Ecommerce Dataset

Processed Data:

Rows: 582250 Columns: 22 Visualization: Here is the link of Visualization report link: Pakistan-s-largest-ecommerce-data-Power-BI-Data-Visualization-Report

Conclusion:

In categories Mobiles & Tables make more money by selling highest no of products and also providing highest amount of discount on products. On the other side Men’s Fashion Category has sell second highest no of products but it can’t generate money with that ratio, may be the prices of individual products is a good reason behind that. And in orders details we experience Mobiles & Tablets have highest no of canceled orders but completed orders are almost same as Men’s Fashion. We have mostly completed orders but have huge no of canceled orders. In payment methods cod has most no of completed order and mostly canceled orders have payment method Easyaxis.
10000 Indian Companies and their Basic Information
kaggle.com
zip
Updated Mar 13, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Shiv Prakash (2022). 10000 Indian Companies and their Basic Information [Dataset]. https://www.kaggle.com/datasets/shivprakash21/10000-indian-companies-and-their-basic-information
Explore at:
zip(221965 bytes)Available download formats
Dataset updated
Mar 13, 2022
Authors
Shiv Prakash
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Area covered
India
Description
Context

List of companies available in India with some additional details.

Content

This dataset contain a list of companies along with additional details like (name, type, average rating, review count, company age, company headquarters and number of employee working on that company). The whole list of company is web scrapped from the website AmbitionBox.com.

Acknowledgements

Data Source: https://www.ambitionbox.com/list-of-companies This dataset wouldn't be made without data available at ambitionbox.com. So a big thanks to the whole team of ambitionbox from the whole kaggle community.

Inspiration

My intension to create this dataset was to enlist the companies available in India and do some analysis on that.
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

Kash (2022). Data Visualization Cheat sheets and Resources [Dataset]. https://www.kaggle.com/kaushiksuresh147/data-visualization-cheat-cheats-and-resources

Data Visualization Cheat sheets and Resources

Corpus of 32 DV cheat sheets, 32 DV charts and 7 recommended DV books

Explore at:

zip(133638507 bytes)Available download formats

Dataset updated

May 31, 2022

Authors

Kash

License

https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

Description

The Data Visualization Corpus

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F1430847%2F29f7950c3b7daf11175aab404725542c%2FGettyImages-1187621904-600x360.jpg?generation=1601115151722854&alt=media" alt="">

Data Visualization

Data visualization is the graphical representation of information and data. By using visual elements like charts, graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data.

In the world of Big Data, data visualization tools and technologies are essential to analyze massive amounts of information and make data-driven decisions

The Data Visualizaion Copus

The Data Visualization corpus consists:

32 cheat sheets: This includes A-Z about the techniques and tricks that can be used for visualization, Python and R visualization cheat sheets, Types of charts, and their significance, Storytelling with data, etc..
32 Charts: The corpus also consists of a significant amount of data visualization charts information along with their python code, d3.js codes, and presentations relation to the respective charts explaining in a clear manner!
Some recommended books for data visualization every data scientist's should read:
1. Beautiful Visualization by Julie Steele and Noah Iliinsky
2. Information Dashboard Design by Stephen Few
3. Knowledge is beautiful by David McCandless (Short abstract)
4. The Functional Art: An Introduction to Information Graphics and Visualization by Alberto Cairo
5. The Visual Display of Quantitative Information by Edward R. Tufte
6. storytelling with data: a data visualization guide for business professionals by cole Nussbaumer knaflic
7. Research paper - Cheat Sheets for Data Visualization Techniques by Zezhong Wang, Lovisa Sundin, Dave Murray-Rust, Benjamin Bach

Suggestions:

In case, if you find any books, cheat sheets, or charts missing and if you would like to suggest some new documents please let me know in the discussion sections!

Resources:

Charts: I personally recommend data viz catalogue, it's easy to understand with their explanation!
Python codes: Plotly for python and Python graph gallery
R codes for charts:Plotly for R
d3 codes: Visualization codes using d3

Request to kaggle users:

A kind request to kaggle users to create notebooks on different visualization charts as per their interest by choosing a dataset of their own as many beginners and other experts could find it useful!
To create interactive EDA using animation with a combination of data visualization charts to give an idea about how to tackle data and extract the insights from the data

Suggestion and queries:

Feel free to use the discussion platform of this data set to ask questions or any queries related to the data visualization corpus and data visualization techniques

Kindly upvote the dataset if you find it useful or if you wish to appreciate the effort taken to gather this corpus! Thank you and have a great day!

Clear search

Close search

Google apps

Main menu

Data Visualization Cheat sheets and Resources

The Data Visualization Corpus

Data Visualization

The Data Visualizaion Copus

The Data Visualization corpus consists:

Suggestions:

Resources:

Request to kaggle users:

Suggestion and queries:

Kindly upvote the dataset if you find it useful or if you wish to appreciate the effort taken to gather this corpus! Thank you and have a great day!

Data from: PSManalyst: A Dashboard for Visual Quality Control of FragPipe...

Dataset of development of business during the COVID-19 crisis

Retail data analysis project (excel)

Open Data Visualized from Police Spreadsheets: A Case Study of Where Force,...

Google Certificate BellaBeats Capstone Project

Bird Migration Dataset (Data Visualization / EDA)

Cleaned-Data Pakistan's Largest Ecommerce Dataset

Pakistan’s largest ecommerce data – Power BI Report

Raw Data:

Process:

Processed Data:

Conclusion:

10000 Indian Companies and their Basic Information

Context

Content

Acknowledgements

Inspiration

Data Visualization Cheat sheets and Resources

Corpus of 32 DV cheat sheets, 32 DV charts and 7 recommended DV books

The Data Visualization Corpus

Data Visualization

The Data Visualizaion Copus

The Data Visualization corpus consists:

Suggestions:

Resources:

Request to kaggle users:

Suggestion and queries:

Kindly upvote the dataset if you find it useful or if you wish to appreciate the effort taken to gather this corpus! Thank you and have a great day!