45 datasets found

Super Market dataset
kaggle.com
zip
Updated Nov 4, 2025
Chiamaka Ndubuisi (2025). Super Market dataset [Dataset]. https://www.kaggle.com/datasets/chiamakandubuisi/super-market-dataset
Available download formats
zip (215497 bytes)
Dataset updated
Nov 4, 2025
Authors
Chiamaka Ndubuisi
License
Public Domain Dedication (CC0 1.0) https://creativecommons.org/publicdomain/zero/1.0/
Description
Problem Statements for Data Visualization – Supermarket Sales Dataset

1. Sales Performance Across Branches
Management wants to understand how sales performance varies across the supermarket branches in Lagos, Abuja, Ogun, and Port Harcourt, to identify the best-performing locations and the areas that need improvement.
Suggested visualizations:
• Bar chart comparing total sales and profit by branch
• Map chart showing sales by city
• KPI cards: total sales, profit, and average transaction value per branch

2. Customer Purchase Behavior
The marketing team needs insights into how customer type (Member vs. Normal) and gender influence purchase trends and average spending.
Suggested visualizations:
• Pie chart of customer type distribution
• Bar chart of average spend by gender
• Segmented comparison of total sales by customer type

3. Product Line Performance
The business wants to know which product categories drive the highest revenue, quantity sold, and customer satisfaction, in order to optimize stock levels and marketing focus.
Suggested visualizations:
• Bar chart of total sales by product line
• Column chart comparing average rating per product line
• Profit margin chart by product line

4. Sales Trends Over Time
The management team wants to monitor sales over time to identify peak periods, track seasonal variations, and plan future promotions accordingly.
Suggested visualizations:
• Line chart of monthly or weekly sales
• Seasonal breakdown (sales by month)
• Trendline showing revenue growth

5. Payment Method Analysis
The finance department needs to evaluate payment method usage (Cash, E-wallet, Credit Card) across cities to improve payment convenience and reduce transaction delays.
Suggested visualizations:
• Donut or bar chart showing the share of each payment method
• City-level breakdown of preferred payment type
• Comparison of average transaction value by payment method

6. Customer Satisfaction Insights
The customer experience team wants to explore how customer ratings relate to sales amount, product type, and branch performance, to identify drivers of customer satisfaction.
Suggested visualizations:
• Scatter plot of rating vs. total purchase amount
• Heat map of average rating by branch and product line
• KPI card showing average customer rating
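The branch KPI cards in problem statement 1 and the payment-method shares in problem statement 5 reduce to simple group-by aggregations. The sketch below assumes each row is one transaction, stored as a dict with the hypothetical keys 'branch', 'total', 'profit' and 'payment' (the dataset's real column names may differ), and takes average transaction value to mean total sales divided by the number of transactions.

```python
from collections import Counter, defaultdict

def branch_kpis(rows):
    """Total sales, profit and average transaction value per branch.

    Assumes one row per transaction with hypothetical keys
    'branch', 'total' and 'profit'.
    """
    sales = defaultdict(float)
    profit = defaultdict(float)
    transactions = Counter()
    for row in rows:
        branch = row["branch"]
        sales[branch] += row["total"]
        profit[branch] += row["profit"]
        transactions[branch] += 1
    return {
        branch: {
            "total_sales": sales[branch],
            "profit": profit[branch],
            # Average transaction value = total sales / number of transactions.
            "avg_transaction_value": sales[branch] / transactions[branch],
        }
        for branch in sales
    }

def payment_share(rows):
    """Fraction of transactions per payment method (hypothetical key 'payment')."""
    counts = Counter(row["payment"] for row in rows)
    total = sum(counts.values())
    return {method: count / total for method, count in counts.items()}

rows = [
    {"branch": "Lagos", "total": 100.0, "profit": 10.0, "payment": "Cash"},
    {"branch": "Lagos", "total": 50.0, "profit": 5.0, "payment": "E-wallet"},
    {"branch": "Abuja", "total": 30.0, "profit": 3.0, "payment": "Cash"},
]
print(branch_kpis(rows)["Lagos"])
# {'total_sales': 150.0, 'profit': 15.0, 'avg_transaction_value': 75.0}
```

The same grouping pattern extends to customer type, gender or product line by swapping the key used for grouping.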
Car-Sales-Analysis-Excel-Dashboard
kaggle.com
zip
Updated Feb 11, 2025
Ibrahimryk (2025). Car-Sales-Analysis-Excel-Dashboard [Dataset]. https://www.kaggle.com/datasets/ibrahimryk/car-sales-analysis-excel-dashboard/code
Available download formats
zip (496747 bytes)
Dataset updated
Feb 11, 2025
Authors
Ibrahimryk
License
Public Domain Dedication (CC0 1.0) https://creativecommons.org/publicdomain/zero/1.0/
Description
This project involves creating an interactive Excel dashboard for SwiftAuto Traders to analyze and visualize car sales data. The dashboard includes several visualizations that provide insights into car sales, profits, and performance across different models and manufacturers, and it uses various Excel charts and slicers for the analysis.

Objective: The primary goal of this project is to show how car sales data can be manipulated and visualized effectively in Excel. The dashboard aims to provide:

• Profit and sales analysis for each dealer
• Sales performance across car models and manufacturers
• Resale value analysis comparing prices and resale values
• Insights into retention percentage by car model

Files in this Project:
• Car_Sales_Kaggle_DV0130EN_Lab3_Start.xlsx: the original dataset used to create the dashboard.
• dashboards.xlsx: the final Excel file containing the complete dashboard with interactive charts and slicers.

Key Visualizations:
• Average Price and Year Resale Value: a bar chart comparing the average price and resale value of various car models.
• Power Performance Factor: a column chart displaying performance across car models.
• Unit Sales by Model: a donut chart showing unit sales by car model.
• Retention Percentage: a pie chart illustrating customer retention by car model.

Tools Used:
• Microsoft Excel for creating and organizing the visualizations and dashboard
• Excel slicers for interactive filtering
• Charts: bar charts, pie charts, column charts, and sunburst charts

How to Use:
• Download the dataset: download Car_Sales_Kaggle_DV0130EN_Lab3_Start.xlsx from Kaggle and follow the steps to create a similar dashboard in Excel.
• Open the dashboard: dashboards.xlsx contains the final version of the dashboard. Open it in Excel and start exploring the interactive charts and slicers.
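The description does not define retention percentage. One common reading, assumed here and not confirmed by the source, is resale value expressed as a percentage of the original price. Under that assumption, the per-model figure behind the retention chart can be sketched as follows; the keys 'model', 'price' and 'resale_value' are hypothetical and may not match the workbook's column headers.

```python
from collections import defaultdict
from statistics import fmean

def retention_percentage(price, resale_value):
    """Percentage of the original price retained at resale.

    Hypothetical definition: resale value / price * 100.
    """
    if price <= 0:
        raise ValueError("price must be positive")
    return resale_value / price * 100

def average_retention_by_model(rows):
    """Mean retention percentage per model, skipping rows without a resale value.

    Rows are dicts with hypothetical keys 'model', 'price' and 'resale_value'.
    """
    by_model = defaultdict(list)
    for row in rows:
        if row["resale_value"] is None:
            continue  # no resale figure for this row; leave it out
        by_model[row["model"]].append(
            retention_percentage(row["price"], row["resale_value"])
        )
    return {model: fmean(values) for model, values in by_model.items()}
```

Skipping rows with no resale value, rather than treating them as zero, keeps missing data from dragging a model's average down.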
User Profile for Ads Project in Power BI
kaggle.com
zip
Updated Jul 4, 2024
Sanjana Murthy (2024). User Profile for Ads Project in Power BI [Dataset]. https://www.kaggle.com/datasets/sanjanamurthy392/user-profile-for-ads-project-in-power-bi/code
Available download formats
zip (784750 bytes)
Dataset updated
Jul 4, 2024
Authors
Sanjana Murthy
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
About Dataset:

Domain: Marketing
Project: User Profiling and Segmentation
Dataset: user_profile_for_ads
Dataset type: Excel data
Dataset size: 16k+ records

KPIs:

Distribution of key demographic variables, such as:
a. Count of Age
b. Count of Gender
c. Count of Education Level
d. Count of Income Level
e. Count of Device Usage

Understanding online behavior, such as:
a. Count of Time Spent Online (hrs/Weekday)
b. Count of Time Spent Online (hrs/Weekend)

Ad interaction metrics:
a. Count of Likes and Reactions
b. Count of Click-Through Rates (CTR)
c. Count of Conversion Rate
d. Count of Ad Interaction Time (secs)
e. Count of Ad Interaction Time by Top Interests

Process:
1. Understanding the problem
2. Data collection
3. Exploring and analyzing the data
4. Interpreting the results

The project uses stacked column charts, stacked bar charts, pie charts, a dashboard, slicers, and page navigation buttons.
Petre_Slide_CategoricalScatterplotFigShare.pptx
figshare.com
pptx
Updated Sep 19, 2016
Benj Petre; Aurore Coince; Sophien Kamoun (2016). Petre_Slide_CategoricalScatterplotFigShare.pptx [Dataset]. http://doi.org/10.6084/m9.figshare.3840102.v1
Available download formats
pptx
Unique identifier
https://doi.org/10.6084/m9.figshare.3840102.v1
Dataset updated
Sep 19, 2016
Dataset provided by
Figshare, http://figshare.com/
Authors
Benj Petre; Aurore Coince; Sophien Kamoun
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Categorical scatterplots with R for biologists: a step-by-step guide

Benjamin Petre¹, Aurore Coince², Sophien Kamoun¹

¹ The Sainsbury Laboratory, Norwich, UK; ² Earlham Institute, Norwich, UK

Weissgerber and colleagues (2015) recently stated that ‘as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies’. They called for more scatterplot and boxplot representations in scientific papers, which ‘allow readers to critically evaluate continuous data’ (Weissgerber et al., 2015). In the Kamoun Lab at The Sainsbury Laboratory, we recently implemented a protocol to generate categorical scatterplots (Petre et al., 2016; Dagdas et al., 2016). Here we describe the three steps of this protocol: 1) formatting of the data set in a .csv file, 2) execution of the R script to generate the graph, and 3) export of the graph as a .pdf file.

Protocol

• Step 1: format the data set as a .csv file. Store the data in a three-column Excel file as shown in the PowerPoint slide. The first column, ‘Replicate’, indicates the biological replicates; in the example, it gives the month and year in which each replicate was performed. The second column, ‘Condition’, indicates the experimental conditions (in the example, a wild type and two mutants called A and B). The third column, ‘Value’, contains the continuous values. Save the Excel file as a .csv file (File -> Save as -> in ‘File Format’, select .csv). This .csv file is the input file to import into R.

• Step 2: execute the R script (see Notes 1 and 2). Copy the script shown in the PowerPoint slide, paste it into the R console, and execute it. In the dialog box, select the input .csv file from Step 1. The categorical scatterplot appears in a separate window. Dots represent the values for each sample and colors indicate replicates. Boxplots are superimposed; black dots indicate outliers.

• Step 3: save the graph as a .pdf file. Resize the window as needed and save the graph as a .pdf file (File -> Save as). See the PowerPoint slide for an example.
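The boxplot layer drawn in Step 2 follows ggplot2's default rule for geom_boxplot: the box spans the first to third quartiles, the whiskers reach the most extreme values within 1.5 × IQR of the box, and points beyond that are drawn as outliers. The Python sketch below is not the authors' R script; it only restates that rule for data laid out as in Step 1 (columns Replicate, Condition, Value). It uses statistics.quantiles with method="inclusive", which interpolates the same way as R's default quantile type. The example values are made up.

```python
import csv
import io
from collections import defaultdict
from statistics import quantiles

def read_values_by_condition(csv_text):
    """Group the 'Value' column by 'Condition' for a Replicate,Condition,Value CSV."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["Condition"]].append(float(row["Value"]))
    return dict(groups)

def boxplot_stats(values):
    """Box, whiskers and outliers using the 1.5 * IQR rule (needs at least two values)."""
    # method="inclusive" interpolates like R's default quantile (type 7).
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers stop at the most extreme observations inside the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }

example = """Replicate,Condition,Value
Jan2016,WT,1
Jan2016,WT,2
Feb2016,WT,3
Feb2016,WT,4
Mar2016,WT,100
Jan2016,A,5
Feb2016,A,6
"""
groups = read_values_by_condition(example)
print(boxplot_stats(groups["WT"])["outliers"])  # [100.0]
```

Because the jittered dots are drawn for every value, an outlier appears both as a black boxplot dot and as a coloured replicate dot in the R figure.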

Notes

• Note 1: install the ggplot2 package. The R script requires the ‘ggplot2’ package. To install it, go to Packages & Data -> Package Installer, enter ‘ggplot2’ in the Package Search field, and click ‘Get List’. Select ‘ggplot2’ in the Package column, click ‘Install Selected’, and install all dependencies as well.

• Note 2: use a log scale for the y-axis. To use a log scale for the y-axis of the graph, replace line #7 of the script with the line below.

# 7 Display the graph in a separate window. Dot colors indicate replicates
graph + geom_boxplot(outlier.colour='black', colour='black') + geom_jitter(aes(col=Replicate)) + scale_y_log10() + theme_bw()

References

Dagdas YF, Belhaj K, Maqbool A, Chaparro-Garcia A, Pandey P, Petre B, et al. (2016) An effector of the Irish potato famine pathogen antagonizes a host autophagy cargo receptor. eLife 5:e10856.

Petre B, Saunders DGO, Sklenar J, Lorrain C, Krasileva KV, Win J, et al. (2016) Heterologous Expression Screens in Nicotiana benthamiana Identify a Candidate Effector of the Wheat Yellow Rust Pathogen that Associates with Processing Bodies. PLoS ONE 11(2):e0149035.

Weissgerber TL, Milic NM, Winham SJ, Garovic VD (2015) Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm. PLoS Biol 13(4):e1002128.

https://cran.r-project.org/

http://ggplot2.org/
Tableau Dummy Dataset for Practice
kaggle.com
Updated Aug 21, 2025
Piush Dave (2025). Tableau Dummy Dataset for Practice [Dataset]. https://www.kaggle.com/datasets/piyushdave/tableau-dummy-dataset-for-practice
Available download formats
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Aug 21, 2025
Dataset provided by
Kaggle, http://kaggle.com/
Authors
Piush Dave
License
Public Domain Dedication (CC0 1.0) https://creativecommons.org/publicdomain/zero/1.0/
Description
Domain-Specific Dataset and Visualization Guide

This package contains 20 realistic datasets in CSV format across different industries, along with 20 text files suggesting visualization ideas. Each dataset includes about 300 rows of synthetic but domain-appropriate data. They are designed for data analysis, visualization practice, machine learning projects, and dashboard building.

What’s inside

20 CSV files, one for each domain:

Education

E-Commerce

Healthcare

Finance

Retail

Social Media

Manufacturing

Sports

Transport

Hospitality

Telecom

Banking

Real Estate

Gaming

Agriculture

Automobile

Energy

Insurance

Government

Entertainment

20 TXT files, each listing 10 relevant graphing options for the dataset.

MASTER_INDEX.csv, which summarizes all domains with their column names.

Use cases

Practice data cleaning, exploration, and visualization in Excel, Tableau, Power BI, or Python.

Build dashboards for specific industries.

Train beginner-level machine learning models such as classification and regression.

Use in classroom teaching or workshops as ready-made datasets.

Example

The Education dataset has columns such as StudentName, Class, Subject, Marks, and AttendancePercent. Suggested graphs: a bar chart of average marks by subject, a scatter plot of marks vs. attendance percent, and a line chart of attendance over time.

The E-Commerce dataset has columns such as OrderDate, Product, Category, Price, Quantity, and Total. Suggested graphs: a line chart of the revenue trend, a bar chart of revenue by category, and a pie chart of payment mode share.
Summary for Policymakers of the Working Group I Contribution to the IPCC...
data-search.nerc.ac.uk
catalogue.ceda.ac.uk
Updated Jul 1, 2021
(2021). Summary for Policymakers of the Working Group I Contribution to the IPCC Sixth Assessment Report - data for Figure SPM.4 (v20210809) [Dataset]. https://data-search.nerc.ac.uk/geonetwork/srv/search?keyword=scenarios
Dataset updated
Jul 1, 2021
Description
Data for Figure SPM.4 from the Summary for Policymakers (SPM) of the Working Group I (WGI) Contribution to the Intergovernmental Panel on Climate Change (IPCC) Sixth Assessment Report (AR6). Figure SPM.4 panel a shows global emissions projections for CO2 and a set of key non-CO2 climate drivers, for the core set of five IPCC AR6 scenarios. Figure SPM.4 panel b shows attributed warming in 2081-2100 relative to 1850-1900 for total anthropogenic, CO2, other greenhouse gases, and other anthropogenic forcings for five Shared Socio-economic Pathway (SSP) scenarios.

How to cite this dataset

When citing this dataset, please include both the data citation below (under 'Citable as') and the following citation for the report component from which the figure originates: IPCC, 2021: Summary for Policymakers. In: Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change [Masson-Delmotte, V., P. Zhai, A. Pirani, S.L. Connors, C. Péan, S. Berger, N. Caud, Y. Chen, L. Goldfarb, M.I. Gomis, M. Huang, K. Leitzell, E. Lonnoy, J.B.R. Matthews, T.K. Maycock, T. Waterfield, O. Yelekçi, R. Yu, and B. Zhou (eds.)]. Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA, pp. 3–32, doi:10.1017/9781009157896.001.

Figure subpanels

The figure has two panels, with data provided for all panels in subdirectories named panel_a and panel_b.

List of data provided

This dataset contains:
- Projected emissions from 2015 to 2100 for the five scenarios of the AR6 WGI core scenario set (SSP1-1.9, SSP1-2.6, SSP2-4.5, SSP3-7.0, SSP5-8.5)
- Projected warming for all anthropogenic forcers, CO2 only, non-CO2 greenhouse gases (GHGs) only, and other anthropogenic components for 2081-2100 relative to 1850-1900, for SSP1-1.9, SSP1-2.6, SSP2-4.5, SSP3-7.0 and SSP5-8.5

The five illustrative SSP (Shared Socio-economic Pathway) scenarios are described in Box SPM.1 of the Summary for Policymakers and Section 1.6.1.1 of Chapter 1.

Data provided in relation to figure

Panel a: the first column contains the years; the remaining columns contain the data per scenario and per climate forcer for the line graphs.
- Carbon_dioxide_Gt_CO2_yr.csv: carbon dioxide emissions panel
- Methane_Mt_CO2_yr.csv: methane emissions panel
- Nitrous_oxide_Mt N2O_yr.csv: nitrous oxide emissions panel
- Sulfur_dioxide_Mt SO2_yr.csv: sulfur dioxide emissions panel

Panel b:
- ts_warming_ranges_1850-1900_base_panel_b.csv: rows 2 to 5 relate to the first bar chart (cyan), rows 6 to 9 to the second bar chart (blue), rows 10 to 13 to the third bar chart (orange), rows 14 to 17 to the fourth bar chart (red), and rows 18 to 21 to the fifth bar chart (brown).

Sources of additional information

The following weblinks are provided in the Related Documents section of this catalogue record:
- Link to the report webpage, which includes the report component containing the figure (Summary for Policymakers) and the Supplementary Material for Chapter 1, which contains details on the input data used in Table 1.SM.1 (Cross-Chapter Box 1.4, Figure 2).
- Link to the related publication for the input data used in panel a.
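The panel b description maps fixed row ranges of ts_warming_ranges_1850-1900_base_panel_b.csv to the five bars. Assuming the row numbers count the header as row 1 (as in a spreadsheet), the sketch below splits the file into five groups of four rows, keyed by the bar colours given in the description. The record does not say what each row within a group holds or which scenario each colour stands for, so the sketch does not label them.

```python
import csv
import io

# Bar colours in the order given in the dataset description.
BAR_COLOURS = ["cyan", "blue", "orange", "red", "brown"]
ROWS_PER_BAR = 4

def group_panel_b_rows(csv_text):
    """Split the panel b CSV into five groups of four rows, one per bar chart.

    Assumes spreadsheet-style numbering: the header is row 1, so
    "rows 2 to 5" are list indices 1 to 4.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    groups = {}
    for i, colour in enumerate(BAR_COLOURS):
        start = 1 + i * ROWS_PER_BAR
        chunk = rows[start:start + ROWS_PER_BAR]
        if len(chunk) != ROWS_PER_BAR:
            raise ValueError(
                f"expected {ROWS_PER_BAR} rows for the {colour} bar, found {len(chunk)}"
            )
        groups[colour] = chunk
    return groups
```

To use it on the downloaded file, open ts_warming_ranges_1850-1900_base_panel_b.csv with newline="" and pass its contents to group_panel_b_rows; the length check fails loudly if the file layout differs from the one described.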
Classification of web-based Digital Humanities projects leveraging...
zenodo.org
data-staging.niaid.nih.gov
csv, tsv
Updated Nov 10, 2025
Tommaso Battisti (2025). Classification of web-based Digital Humanities projects leveraging information visualisation techniques [Dataset]. http://doi.org/10.5281/zenodo.14192758
Available download formats
tsv, csv
Unique identifier
https://doi.org/10.5281/zenodo.14192758
Dataset updated
Nov 10, 2025
Dataset provided by
Zenodo
Authors
Tommaso Battisti
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description

This dataset contains a list of 186 Digital Humanities projects leveraging information visualisation techniques. Each project has been classified according to visualisation and interaction methods, narrativity and narrative solutions, domain, methods for the representation of uncertainty and interpretation, and the employment of critical and custom approaches to visually represent humanities data.

Classification schema: categories and columns

The project_id column contains unique internal identifiers assigned to each project. Meanwhile, the last_access column records the most recent date (in DD/MM/YYYY format) on which each project was reviewed based on the web address specified in the url column.
The remaining columns can be grouped into descriptive categories aimed at characterising projects according to different aspects:

Narrativity. It reports the presence of information visualisation techniques employed within narrative structures. Here, the term narrative encompasses both author-driven linear data stories and more user-directed experiences where the narrative sequence is determined by user exploration [1]. We define two columns to identify projects that use visualisation techniques in narrative or non-narrative sections. Both conditions can be true for projects employing visualisations in both contexts. Columns:

non_narrative (boolean)

narrative (boolean)

Domain. The humanities domain to which the project is related. We rely on [2] and the chapters of the first part of [3] to abstract a set of general domains. Column:

domain (categorical):

History and archaeology

Art and art history

Language and literature

Music and musicology

Multimedia and performing arts

Philosophy and religion

Other: both extra-list domains and cases of collections without a unique or specific thematic focus.

Visualisation of uncertainty and interpretation. Building upon the frameworks proposed by [4] and [5], a set of categories was identified, highlighting a distinction between precise and impressional communication of uncertainty. Precise methods explicitly represent quantifiable uncertainty such as missing, unknown, or uncertain data, precisely locating and categorising it using visual variables and positioning. The two sub-categories are: interactive distinction, when uncertain data is not visually distinguishable from the rest of the data but can be dynamically isolated or included/excluded categorically through interaction techniques (usually filters); and visual distinction, when uncertainty visually “emerges” from the representation by means of dedicated glyphs and spatial or visual cues and variables. Impressional methods, on the other hand, communicate the constructed and situated nature of data [6], exposing the interpretative layer of the visualisation and indicating more abstract and unquantifiable uncertainty using graphical aids or interpretative metrics. The two sub-categories are: ambiguation, when graphical expedients such as permeable glyph boundaries or broken lines visually convey the ambiguity of a phenomenon; and interpretative metrics, when expressive, non-scientific, or non-punctual metrics are used to build a visualisation. Column:

uncertainty_interpretation (categorical):

Interactive distinction

Visual distinction

Ambiguation

Interpretative metrics

Critical adaptation. We identify projects in which, with regard to at least one visualisation, the following criteria are fulfilled: 1) it avoids repurposing prepackaged, generic-use, or ready-made solutions; 2) it is tailored and unique, reflecting the peculiarities of the phenomena at hand; 3) it avoids simplifications in order to embrace and depict complexity, promoting time-consuming visualisation-based inquiry. Column:

critical_adaptation (boolean)

Non-temporal visualisation techniques. We adopt and partially adapt the terminology and definitions from [7]. A column is defined for each type of visualisation and accounts for its presence within a project, also including stacked layouts and more complex variations. Columns and inclusion criteria:

plot (boolean): visual representations that map data points onto a two-dimensional coordinate system.

cluster_or_set (boolean): sets or cluster-based visualisations used to unveil possible inter-object similarities.

map (boolean): geographical maps used to show spatial insights. While we do not specify the variants of maps (e.g., pin maps, dot density maps, flow maps, etc.), we make an exception for maps where each data point is represented by another visualisation (e.g., a map where each data point is a pie chart) by accounting for the presence of both in their respective columns.

network (boolean): visual representations highlighting relational aspects through nodes connected by links or edges.

hierarchical_diagram (boolean): tree-like structures such as tree diagrams, radial trees, and dendrograms. They differ from networks in their strictly hierarchical structure and absence of closed connection loops.

treemap (boolean): still hierarchical, but highlighting quantities expressed by means of area size. It also includes circle packing variants.

word_cloud (boolean): clouds of words, where each instance’s size is proportional to its frequency in a related context.

bars (boolean): includes bar charts, histograms, and variants. It coincides with “bar charts” in [7] but with a more generic term to refer to all bar-based visualisations.

line_chart (boolean): the display of information as sequential data points connected by straight-line segments.

area_chart (boolean): similar to a line chart but with a filled area below the segments. It also includes density plots.

pie_chart (boolean): circular graphs divided into slices which can also use multi-level solutions.

plot_3d (boolean): plots that use a third dimension to encode an additional variable.

proportional_area (boolean): representations used to compare values through area size. Typically, using circle- or square-like shapes.

other (boolean): it includes all other types of non-temporal visualisations that do not fall into the aforementioned categories.
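Because every technique column above is a boolean flag, this part of the schema lends itself to simple counts. The sketch below reads the TSV export with Python's csv module and tallies how many projects use each non-temporal technique, how many projects fall in each domain, and the most recent last_access date (parsed as DD/MM/YYYY, as documented). The schema does not say how booleans are serialised, so treating TRUE, true, 1 and yes as true is an assumption, and summarise_projects is a hypothetical name.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Boolean columns for non-temporal visualisation techniques, as listed in the schema.
TECHNIQUE_COLUMNS = [
    "plot", "cluster_or_set", "map", "network", "hierarchical_diagram",
    "treemap", "word_cloud", "bars", "line_chart", "area_chart",
    "pie_chart", "plot_3d", "proportional_area", "other",
]

# Assumption: booleans are stored as text such as TRUE/FALSE, true/false or 1/0.
TRUE_STRINGS = {"true", "1", "yes"}

def is_true(cell):
    return (cell or "").strip().lower() in TRUE_STRINGS

def summarise_projects(tsv_text):
    """Count projects per technique and per domain, and find the latest review date."""
    technique_counts = Counter()
    domain_counts = Counter()
    latest_access = None
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        for column in TECHNIQUE_COLUMNS:
            # row.get returns None for columns absent from the file.
            if is_true(row.get(column)):
                technique_counts[column] += 1
        domain_counts[row["domain"]] += 1
        # last_access is documented as DD/MM/YYYY.
        accessed = datetime.strptime(row["last_access"], "%d/%m/%Y").date()
        if latest_access is None or accessed > latest_access:
            latest_access = accessed
    return technique_counts, domain_counts, latest_access
```

A project that uses several techniques is counted once in each matching column, so the technique counts can sum to more than the number of projects.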

Temporal visualisations and encodings. In addition to non-temporal visualisations, a group of techniques to encode temporality is considered in order to enable comparisons with [7]. Columns:

timeline (boolean): the display of a list of data points or spans in chronological order. They include timelines working either with a scale or simply displaying events in sequence. As in [7], we also include structured solutions resembling Gantt chart layouts.

temporal_dimension (boolean): to report when time is mapped to any dimension of a visualisation, with the exclusion of timelines. We use the term “dimension” rather than “axis” (as in [7]) because it is more appropriate for radial layouts or more complex representational choices.

animation (boolean): temporality is perceived through an animation changing the visualisation according to time flow.

visual_variable (boolean): another visual encoding strategy is used to represent any temporality-related variable (e.g., colour).

Interactions. A set of categories to assess the interactions afforded, based on the concept of user intent [8] and user-allowed perceptualisation data actions [9]. The following categories roughly match the manipulative subset of methods of “how” an interaction is performed in the conception of [10]. Only interactions that affect the appearance of the visualisation or the visual representation of its data points, symbols, and glyphs are taken into consideration. Columns:

basic_selection (boolean): the demarcation of an element either for the duration of the interaction or more permanently until the occurrence of another selection.

advanced_selection (boolean): the demarcation involves both the selected element and connected elements within the visualisation or leads to brush and link effects across views. Basic selection is tacitly implied.

navigation (boolean): interactions that allow moving, zooming, panning, rotating, and scrolling the view but only when applied to the visualisation and not to the web page. It also includes “drill” interactions (to navigate through different levels or portions of data detail, often generating a new view that replaces or accompanies the original) and “expand” interactions generating new perspectives on data by expanding and collapsing nodes.

arrangement (boolean): the organisation of visualisation elements (symbols, glyphs, etc.) or multi-visualisation layouts spatially through drag and drop or
DATS 6401 - Final Project - Yon ho Cheong.zip
figshare.com
zip
Updated Dec 15, 2018
Yon ho Cheong (2018). DATS 6401 - Final Project - Yon ho Cheong.zip [Dataset]. http://doi.org/10.6084/m9.figshare.7471007.v1
Available download formats
zip
Unique identifier
https://doi.org/10.6084/m9.figshare.7471007.v1
Dataset updated
Dec 15, 2018
Dataset provided by
Figshare, http://figshare.com/
Authors
Yon ho Cheong
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract

The H1B is an employment-based visa category for temporary foreign workers in the United States. To hire an H1B worker, a U.S. employer must submit a petition to the U.S. immigration department, which receives over 200,000 petitions every year and selects 85,000 of them through a random process. This is the most common visa status sought by international students once they complete college or higher education and begin working in a full-time position. The project provides essential information on job titles, preferred regions of settlement, foreign applicants, and employer trends for H1B visa applications. Because location, employer, job title, and salary range characterize most H1B petitions, several visualization tools are used to analyze and interpret H1B visa trends and to provide recommendations to applicants. This report is the basis of a project for the Visualization of Complex Data class at the George Washington University. The examples in this project analyze the relevant variables (Case Status, Employer Name, SOC name, Job Title, Prevailing Wage, Worksite, and Latitude and Longitude information) from Kaggle and the Office of Foreign Labor Certification (OFLC) to show how H1B visa petitions have changed in recent years.

Keywords: H1B visa, Data Analysis, Visualization of Complex Data, HTML, JavaScript, CSS, Tableau, D3.js

Dataset

The dataset contains 10 columns and covers a total of 3 million records spanning 2011 to 2016. The relevant columns include case status, employer name, SOC name, job title, full time position, prevailing wage, year, worksite, and latitude and longitude information.
Link to dataset: https://www.kaggle.com/nsharan/h-1b-visa
Link to dataset (FY2017): https://www.foreignlaborcert.doleta.gov/performancedata.cfm

Running the code

Open Index.html.

Data Processing

• Preprocess the raw data to transform it into an understandable format.
• Find and combine other external datasets, such as the FY2017 dataset, to enrich the analysis.
• Develop the variables needed for each visualization and compile them into the visualization programs.
• Draw a geo map and a scatter plot to compare the fastest growth in absolute numbers and in percentages.
• Extract key aspects, analyze changes in employers’ preferences, and forecast future trends.

Visualizations

• Combo chart: overall volume of receipts and approval rate.
• Scatter plot: beneficiary country of birth.
• Geo map: H1B petitions filed in all states.
• Line chart: top 10 states by H1B petitions filed.
• Pie chart: comparison of education level and occupations for petitions, FY2011 vs. FY2017.
• Tree map: top employers by number of applications submitted.
• Side-by-side bar chart: overall comparison of Data Scientist and Data Analyst positions.
• Highlight table: mean wage of Data Scientists and Data Analysts with case status Certified.
• Bubble chart: top 10 companies for Data Scientist and Data Analyst positions.

Related Research

• The H-1B Visa Debate, Explained (Harvard Business Review): https://hbr.org/2017/05/the-h-1b-visa-debate-explained
• Foreign Labor Certification Data Center: https://www.foreignlaborcert.doleta.gov
• Key facts about the U.S. H-1B visa program (Pew Research Center): http://www.pewresearch.org/fact-tank/2017/04/27/key-facts-about-the-u-s-h-1b-visa-program/
• H1B visa News and Updates from The Economic Times: https://economictimes.indiatimes.com/topic/H1B-visa/news
• H-1B visa (Wikipedia): https://en.wikipedia.org/wiki/H-1B_visa

Key Findings

• According to the analysis, the government cut the number of H1B approvals in 2017.
• Over the past decade, driven by demand for high-skilled workers, visa holders have clustered in STEM fields and have come mostly from Asian countries such as China and India.
• Technical jobs, such as Computer Systems Analyst and Software Developer, make up the majority of the top 10 jobs among foreign workers.
• Employers located in metropolitan areas strive to find foreign workers who can fill the technical positions in their organizations.
• States such as California, New York, Washington, New Jersey, Massachusetts, Illinois, and Texas are prime locations for foreign workers and provide many job opportunities.
• The companies that submit the most H1B visa applications, such as Infosys, Tata, and IBM India, are India-based software and IT services firms.
• Data Scientist positions have seen exponential growth in H1B visa applications, and these jobs are most heavily clustered in the West region.

Visualization programs used

HTML, JavaScript, CSS, D3.js, Google API, Python, R, and Tableau
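The combo chart's approval rate and the highlight table's mean wage can both be derived from the case-level records. The sketch assumes the file's column headers are CASE_STATUS, YEAR, JOB_TITLE and PREVAILING_WAGE and that the certified status is spelled CERTIFIED; check these against the actual download before relying on it. It counts only exact CERTIFIED cases, so a status such as CERTIFIED-WITHDRAWN, if present, is treated as not certified. That is a design choice of this sketch, not something the project states.

```python
from collections import defaultdict
from statistics import fmean

def is_certified(row):
    # Assumption: only the exact status CERTIFIED counts as certified.
    return row["CASE_STATUS"].strip().upper() == "CERTIFIED"

def parse_wage(text):
    """Return the wage as a float, or None for blanks and markers such as 'NA'."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None

def certification_rate_by_year(rows):
    """Share of petitions per YEAR whose CASE_STATUS is CERTIFIED."""
    totals = defaultdict(int)
    certified = defaultdict(int)
    for row in rows:
        totals[row["YEAR"]] += 1
        if is_certified(row):
            certified[row["YEAR"]] += 1
    return {year: certified[year] / totals[year] for year in sorted(totals)}

def mean_certified_wage(rows, job_title):
    """Mean PREVAILING_WAGE of certified petitions for one job title (case-insensitive)."""
    wages = []
    for row in rows:
        if not is_certified(row):
            continue
        if row["JOB_TITLE"].strip().upper() != job_title.upper():
            continue
        wage = parse_wage(row["PREVAILING_WAGE"])
        if wage is not None:
            wages.append(wage)
    return fmean(wages) if wages else None
```

Both functions take any iterable of dicts, so rows produced by csv.DictReader over the downloaded CSV can be passed in directly.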
Ecommerce Visualization
kaggle.com
zip
Updated Feb 26, 2023
Cite
Dr. Alok Yadav at YBI Foundation (2023). Ecommerce Visualization [Dataset]. https://www.kaggle.com/datasets/ybifoundation/ecommerce-visualization
Download format: zip (7240238 bytes)
Authors
Dr. Alok Yadav at YBI Foundation
Description
Ecommerce transaction analysis is a great way to learn data visualization with Power BI or Tableau. Your visualizations should reveal customer sales, product sales, regional sales, monthly sales, and time-of-day sales to gain valuable insights for business planning. You may use Combo Charts, Cards, Bar Charts, Tables, or Line Charts; for the customer segmentation page, you could use Column Charts, Bubble Charts, Point Maps, Tables, etc.
Power BI Sales Data
kaggle.com
zip
Updated May 8, 2024
Cite
Sanjana Murthy (2024). Power BI Sales Data [Dataset]. https://www.kaggle.com/datasets/sanjanamurthy392/power-bi-sales-data
Download format: zip (7202740 bytes)
Authors
Sanjana Murthy
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
This data contains: Index, Text box, Button, Slicer, Image, Card, Multi-row card, Table, Matrix, Conditional Formatting, Stacked Column Chart, Clustered Column Chart, Stacked Bar Chart, 100% Stacked Column Chart, background image, Line Chart, Donut Chart, Gauge, Filters & Bookmarks, Maps, Scatter Chart, Anomalies, Tooltip, Animated Bar Chart Race, Enlighten Aquarium, Scroller, Measures, DAX, All DAX, Switch DAX, Waterfall Chart, Treemap.
Myntra Dataset Analysis
kaggle.com
zip
Updated Sep 16, 2024
Cite
Vivek Singh (2024). Myntra Dataset Analysis [Dataset]. https://www.kaggle.com/datasets/vivek052/myntra-dataset-analysis
Download format: zip (18601507 bytes)
Authors
Vivek Singh
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
The dataset contains information on nearly 150,000 products listed on Myntra. Each entry includes:

product_name: The name of the product.

brand_name: The brand of the product.

rating: The product's rating.

rating_count: The number of ratings the product has received.

marked_price: The original price of the product.

discounted_price: The price after discount.

sizes: Available sizes for the product.

product_link: URL of the product.

img_link: URL of the product image.

product_tag: Tags associated with the product.

This data has been scraped from the Myntra website.

Data Analysis on Myntra Dataset

Data analysis on the Myntra dataset, presented using pivot tables and an interactive dashboard.

In this data analysis project, I undertook a comprehensive approach to enhance and visualize the Myntra real-time dataset. The key steps involved in the process were as follows:

Data Cleaning and Preparation:

Remove Unwanted Columns: I meticulously reviewed the dataset to identify and eliminate irrelevant columns, such as size and discounted amount, that did not contribute to the analysis objectives. This step streamlined the dataset, focusing on the most pertinent data.

Data Cleaning: Addressed inconsistencies, missing values, and errors within the dataset. This involved standardizing data formats, correcting inaccuracies, and filling in or removing incomplete records to ensure the dataset's integrity.
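A minimal pandas sketch of these cleaning steps, assuming the column names listed in the description above. Which columns to drop, and the rule of discarding rows without a product name or price, are choices made here for illustration; the rows are hypothetical.

```python
import pandas as pd

# Hypothetical rows using the column names from the description.
products = pd.DataFrame({
    "product_name": ["Slim Fit Shirt", "Slim Fit Shirt", None, "Cotton Kurta"],
    "brand_name": [" Roadster ", "Roadster", "Puma", "Libas"],
    "rating": [4.2, 4.2, 4.0, None],
    "rating_count": [120, 120, 50, 0],
    "marked_price": [1999, 1999, 2999, 1299],
    "discounted_price": [999, 999, 1499, 1299],
    "sizes": ["S,M,L", "S,M,L", "8,9", "M,L"],
    "product_link": ["link-1", "link-2", "link-3", "link-4"],
    "img_link": ["img-1", "img-2", "img-3", "img-4"],
})


def clean_products(df):
    # 1. Remove columns that do not feed the analysis (an illustrative choice).
    df = df.drop(columns=["sizes", "product_link", "img_link"])
    # 2. Standardise text formats: strip stray whitespace from brand names.
    df = df.assign(brand_name=df["brand_name"].str.strip())
    # 3. Remove incomplete records that lack a name or a price.
    df = df.dropna(subset=["product_name", "marked_price", "discounted_price"])
    # 4. Remove rows that became exact duplicates after the steps above.
    return df.drop_duplicates().reset_index(drop=True)


clean = clean_products(products)
print(clean)
```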

Data Analysis:

Pivot Tables Creation: Developed pivot tables to summarize and analyze key metrics. This allowed for the aggregation of data across various dimensions such as product categories, sales performance, and customer demographics, providing insightful summaries and trends.
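Pivot-table summaries like these can also be reproduced in pandas. The sketch assumes a cleaned frame with brand_name, rating, marked_price and discounted_price; the discount_pct column is a derived metric introduced here, not part of the original dataset, and the rows are made up.

```python
import pandas as pd

# Hypothetical cleaned rows (column names from the description).
products = pd.DataFrame({
    "brand_name": ["Roadster", "Roadster", "Libas", "Puma"],
    "rating": [4.0, 4.4, 3.8, 4.1],
    "marked_price": [2000, 1000, 1500, 4000],
    "discounted_price": [1000, 800, 1500, 3000],
})

# Derived metric: discount as a percentage of the marked price.
products["discount_pct"] = (
    1 - products["discounted_price"] / products["marked_price"]
) * 100

# Pivot table: one row per brand with mean rating and mean discount.
summary = products.pivot_table(
    index="brand_name", values=["rating", "discount_pct"], aggfunc="mean"
)
# Add the number of products per brand, aligned on the brand index.
summary["n_products"] = products.groupby("brand_name").size()
summary = summary.sort_values("discount_pct", ascending=False)
print(summary)
```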

Interactive Dashboard:

Dashboard Development: Created an interactive dashboard to visualize real-time data. The dashboard includes dynamic charts, filters, and visualizations that let users interact with the dataset, supporting real-time insights and decision-making.

Visualization: Implemented various types of visualizations, such as bar charts and column charts, to communicate data trends and patterns effectively.

Overall, this project aimed to deliver a clean, organized, and insightful view of the Myntra dataset through advanced analysis and interactive visualization techniques. The resulting dashboard offers a powerful tool for monitoring and analyzing real-time data, supporting data-driven decision-making processes.
Bank Loan Analysis_Excel
kaggle.com
zip
Updated Sep 8, 2024
Cite
Sthitaprangya Madhuchhanda Devi (2024). Bank Loan Analysis_Excel [Dataset]. https://www.kaggle.com/datasets/sthitaprangya4707/bank-loan-analysis
Download format: zip (38732536 bytes)
Authors
Sthitaprangya Madhuchhanda Devi
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
About Datasets:

Domain: Finance
Project: Bank loan of customers
Datasets: Finance_1.xlsx & Finance_2.xlsx
Dataset Type: Excel Data
Dataset Size: Each Excel file has 39k+ records

KPIs:
1. Year-wise loan amount stats
2. Grade- and sub-grade-wise revolving balance
3. Total payment for Verified status vs total payment for Non-Verified status
4. State-wise loan status
5. Month-wise loan status
6. Further insights based on your understanding of the data

Process:
1. Understanding the problem
2. Data collection
3. Data cleaning
4. Exploring and analyzing the data
5. Interpreting the results

This data contains Power Query, Power Pivot, Merge data, Clustered Bar Chart, Clustered Column Chart, Line Chart, 3D Pie chart, Dashboard, slicers, timeline, formatting techniques.
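The project itself uses Power Query and Power Pivot in Excel. As a rough pandas equivalent of the "Merge data" step and KPIs 1 and 3, here is a sketch under stated assumptions: the join key id and the column names loan_amnt, issue_d, verification_status and total_pymnt are hypothetical (the description does not list the workbook columns), and the rows are invented.

```python
import pandas as pd

# Hypothetical extracts of the two workbooks. With real files these would come from
# pd.read_excel("Finance_1.xlsx") and pd.read_excel("Finance_2.xlsx") (needs openpyxl).
finance_1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "loan_amnt": [10000, 5000, 12000, 8000],
    "issue_d": pd.to_datetime(["2010-03-01", "2010-07-01", "2011-01-01", "2011-05-01"]),
    "verification_status": ["Verified", "Not Verified", "Source Verified", "Not Verified"],
})
finance_2 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "total_pymnt": [11000.0, 5200.0, 12500.0, 8100.0],
})

# Merge data: join the two files on their shared (assumed) loan id.
loans = finance_1.merge(finance_2, on="id", how="inner")

# KPI 1: year-wise loan amount.
yearly_amount = loans.groupby(loans["issue_d"].dt.year)["loan_amnt"].sum()

# KPI 3: total payment, Verified vs Non-Verified.
# Counting "Source Verified" as verified is an assumption of this sketch.
status = loans["verification_status"].where(
    loans["verification_status"] == "Not Verified", "Verified"
)
payment_by_status = loans.groupby(status)["total_pymnt"].sum()

print(yearly_amount)
print(payment_by_status)
```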
Customer Sale Dataset for Data Visualization
kaggle.com
Updated Jun 6, 2025
Cite
Atul (2025). Customer Sale Dataset for Data Visualization [Dataset]. https://www.kaggle.com/datasets/atulkgoyl/customer-sale-dataset-for-visualization
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Atul
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
This synthetic dataset is designed specifically for practicing data visualization and exploratory data analysis (EDA) using popular Python libraries like Seaborn, Matplotlib, and Pandas.

Unlike most public datasets, this one includes a diverse mix of column types:

📅 Date columns (for time series and trend plots)
🔢 Numerical columns (for histograms, boxplots, scatter plots)
🏷️ Categorical columns (for bar charts, group analysis)

Whether you are a beginner learning how to visualize data or an intermediate user testing new charting techniques, this dataset offers a versatile playground.

Feel free to:

Create EDA notebooks
Practice plotting techniques
Experiment with filtering, grouping, and aggregations

🛠️ No missing values and no data cleaning needed: just download and start exploring!
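As a starting point for the filtering, grouping, and aggregation practice suggested above, here is a small pandas sketch. The description does not name the columns, so order_date, category and amount (and their values) are hypothetical stand-ins for the date, categorical and numerical columns.

```python
import pandas as pd

# Hypothetical rows: one date, one categorical and one numerical column.
sales = pd.DataFrame({
    "order_date": pd.to_datetime(
        ["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-28", "2024-03-10"]
    ),
    "category": ["Electronics", "Clothing", "Electronics", "Clothing", "Electronics"],
    "amount": [250.0, 40.0, 300.0, 60.0, 150.0],
})

# Filtering: orders above a threshold.
large_orders = sales[sales["amount"] > 50]

# Grouping and aggregation: total and average amount per category.
by_category = sales.groupby("category")["amount"].agg(["sum", "mean"])

# Time series: monthly totals ("MS" = calendar month start), ready for a trend plot.
monthly = sales.set_index("order_date")["amount"].resample("MS").sum()

print(by_category)
print(monthly)
```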

Hope you find this helpful. Looking forward to hearing from you all.
dataset_for_sales
kaggle.com
zip
Updated Aug 29, 2023
Cite
Andri Lesmana (2023). dataset_for_sales [Dataset]. https://www.kaggle.com/datasets/andrilesmana/dataset-for-sales/discussion
Download format: zip (2504483 bytes)
Authors
Andri Lesmana
Description
We start by cleaning our data. Tasks in this section include:
- Dropping NaN values from the DataFrame
- Removing rows based on a condition
- Changing the type of columns (to_numeric, to_datetime, astype)
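A minimal sketch of these cleaning steps in pandas. It assumes the raw files contain fully empty rows and repeated header rows (where the "Order Date" cell literally reads "Order Date"); the column names and rows are illustrative assumptions, since the description does not list them.

```python
import pandas as pd

# Hypothetical raw rows: one fully empty row and one repeated header row.
raw = pd.DataFrame({
    "Order ID": ["176558", None, "176559", "Order ID"],
    "Product": ["USB-C Charging Cable", None, "Bose SoundSport Headphones", "Product"],
    "Quantity Ordered": ["2", None, "1", "Quantity Ordered"],
    "Price Each": ["11.95", None, "99.99", "Price Each"],
    "Order Date": ["04/19/19 08:46", None, "04/07/19 22:30", "Order Date"],
})


def clean_sales(df):
    # Drop NaN values: rows where every field is missing.
    df = df.dropna(how="all")
    # Remove rows based on a condition: repeated header lines.
    df = df[df["Order Date"] != "Order Date"].copy()
    # Change column types with to_numeric, to_datetime and astype.
    df["Quantity Ordered"] = pd.to_numeric(df["Quantity Ordered"])
    df["Price Each"] = pd.to_numeric(df["Price Each"])
    df["Order Date"] = pd.to_datetime(df["Order Date"], format="%m/%d/%y %H:%M")
    df = df.astype({"Order ID": "int64"})
    return df.reset_index(drop=True)


sales = clean_sales(raw)
print(sales.dtypes)
```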

Once we have cleaned up our data a bit, we move to the data exploration section. In this section we explore 5 high-level business questions related to our data:
- What was the best month for sales? How much was earned that month?
- What city sold the most product?
- What time should we display advertisements to maximize the likelihood of customers buying product?
- What products are most often sold together?
- What product sold the most? Why do you think it sold the most?
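The "sold together" question can be answered by counting product pairs within each order. This sketch assumes one row per order line with "Order ID" and "Product" columns (hypothetical names and rows) and uses itertools.combinations with collections.Counter.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Hypothetical order lines: several rows can share one Order ID.
orders = pd.DataFrame({
    "Order ID": [1, 1, 2, 2, 3, 3, 3, 4],
    "Product": [
        "iPhone", "Lightning Charging Cable",
        "iPhone", "Lightning Charging Cable",
        "Google Phone", "USB-C Charging Cable", "Wired Headphones",
        "AAA Batteries (4-pack)",
    ],
})

pair_counts = Counter()
for _, products in orders.groupby("Order ID")["Product"]:
    # Sorting makes (A, B) and (B, A) the same key; set() ignores repeated lines.
    for pair in combinations(sorted(set(products)), 2):
        pair_counts[pair] += 1

for pair, count in pair_counts.most_common(3):
    print(count, pair)
```

Single-item orders contribute no pairs, so they simply drop out of the count.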

To answer these questions we walk through many different pandas & matplotlib methods. They include:
- Concatenating multiple CSVs together to create a new DataFrame (pd.concat)
- Adding columns
- Parsing cells as strings to make new columns (.str)
- Using the .apply() method
- Using groupby to perform aggregate analysis
- Plotting bar charts and line graphs to visualize our results
- Labeling our graphs
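Several of the listed methods fit in one short pipeline. The sketch below reads two small in-memory CSV strings (via pd.read_csv and io.StringIO, standing in for monthly files on disk), concatenates them, adds Month, Hour, Sales and City columns, and aggregates with groupby. The column names and the "street, city, STATE zip" address format are assumptions.

```python
import io

import pandas as pd

# Two hypothetical monthly extracts; in practice each would be a CSV file on disk.
APRIL = """Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
176558,USB-C Charging Cable,2,11.95,04/19/19 08:46,"917 1st St, Dallas, TX 75001"
176559,Bose SoundSport Headphones,1,99.99,04/07/19 22:30,"682 Chestnut St, Boston, MA 02215"
"""
MAY = """Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
194095,Wired Headphones,1,11.99,05/16/19 17:14,"669 2nd St, New York City, NY 10001"
194096,AA Batteries (4-pack),2,3.84,05/19/19 14:43,"844 Walnut St, Dallas, TX 75001"
"""

# Concatenate the monthly files into one DataFrame.
sales = pd.concat(
    [pd.read_csv(io.StringIO(text)) for text in (APRIL, MAY)], ignore_index=True
)

# Add columns.
sales["Order Date"] = pd.to_datetime(sales["Order Date"], format="%m/%d/%y %H:%M")
sales["Month"] = sales["Order Date"].dt.month
sales["Hour"] = sales["Order Date"].dt.hour
sales["Sales"] = sales["Quantity Ordered"] * sales["Price Each"]

# Parse the address with .str, then use .apply() to attach the state code,
# so that cities sharing a name in different states stay separate.
city = sales["Purchase Address"].str.split(", ").str[1]
state = sales["Purchase Address"].apply(lambda address: address.split(", ")[2].split(" ")[0])
sales["City"] = city + " (" + state + ")"

# Aggregate with groupby.
sales_by_month = sales.groupby("Month")["Sales"].sum()
units_by_city = sales.groupby("City")["Quantity Ordered"].sum()
orders_by_hour = sales.groupby("Hour")["Order ID"].count()

print("Best month:", sales_by_month.idxmax(), round(sales_by_month.max(), 2))
print("Top city:", units_by_city.idxmax())
print(orders_by_hour)
```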
Restaurant Dish Orders in Power BI
kaggle.com
zip
Updated Oct 30, 2024
Cite
Fords (2024). Restaurant Dish Orders in Power BI [Dataset]. https://www.kaggle.com/datasets/fords001/restaurant-dish-orders
Download format: zip (620177 bytes)
Authors
Fords
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
In this data analysis, I used the dataset ‘Restaurant Orders’ from https://mavenanalytics.io/data-playground, which is released under a Public Domain license. Public-domain work is free for anyone to use for any purpose without restriction under copyright law, since no one owns or controls the material in any way. The ‘Restaurant Orders’ dataset has 3 dataframes in CSV format:
- ‘restaurant_db_data_dictionary.csv’: a description of the tables and the relationships between them.
- ‘order_details.csv’: columns order_details_id, order_id, order_date, order_time, item_id.
- ‘menu_items.csv’: columns menu_item_id, item_name, category, price.

Using the 3 dataframes, we will create a new dataframe ‘order_details_table’ (the result dataframe in the Power BI file restaurant_orders_result.pbix). Based on this new dataframe, we will generate various chart visualizations in the file restaurant_orders_result_charts.pbix and also attach the charts here. Below is a more detailed description of how I created the new dataframe ‘order_details_table’ and the visualizations, including bar charts and pie charts.

I will use Power BI in this project.
1. Delete all rows where the value in the column ‘item_id’ is ‘NULL’ from the dataframe ‘order_details’. For this, I use the Power Query Editor and the ‘Keep Rows’ function, keeping all rows except those with ‘NULL’ values.
2. Combine the 2 columns ‘order_date’ and ‘order_time’ into 1 column ‘order_date_time’ in the format MM/DD/YY HH:MM:SS.
3. Merge the two dataframes into one dataframe ‘order_details_table’ using the ‘Merge Queries’ function in the Power Query Editor with an inner join (only matching rows). The dataframe ‘restaurant_db_data_dictionary.csv’ states that the column ‘item_id’ in the ‘order_details’ table matches ‘menu_item_id’ in the ‘menu_items’ table, so the 2 tables are combined on the common id columns ‘menu_item_id’ and ‘item_id’.
4. Remove the columns we don’t need, and create a new ‘order_id’ with a unique number for each order.

As a result, the new dataframe ‘order_details_table’ has 6 columns:
- order_details_id: a unique identifier for each dish within an order
- order_id: the unique identifier for each order or transaction
- order_date_time: the date and time when the order was created, in the format MM/DD/YY HH:MM:SS
- menu_item_category: the category to which the dish belongs
- menu_item_name: the name of the dish on the menu
- menu_item_price: the price of the dish
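For readers without Power BI, the same four Power Query steps can be approximated in pandas. This is a sketch, not the author's method: the rows and dish names are hypothetical, the date and time strings are assumed to be in ISO form, and the new order_id from step 4 is not reproduced. When the real CSVs are read with pd.read_csv, the literal string "NULL" is parsed as NaN by default, which is what step 1 relies on here.

```python
import pandas as pd

# Hypothetical rows following the column names described above.
order_details = pd.DataFrame({
    "order_details_id": [1, 2, 3, 4],
    "order_id": [1, 1, 2, 2],
    "order_date": ["2023-01-01", "2023-01-01", "2023-01-01", "2023-01-01"],
    "order_time": ["11:38:36", "11:38:36", "11:57:40", "11:57:40"],
    "item_id": [109, 108, None, 124],
})
menu_items = pd.DataFrame({
    "menu_item_id": [108, 109, 124],
    "item_name": ["Dish A", "Dish B", "Dish C"],
    "category": ["Italian", "Italian", "Asian"],
    "price": [17.95, 17.95, 14.50],
})

# Step 1: drop rows whose item_id is missing, then restore an integer key.
details = order_details.dropna(subset=["item_id"]).astype({"item_id": "int64"})

# Step 2: combine date and time into one MM/DD/YY HH:MM:SS text column.
stamp = pd.to_datetime(details["order_date"] + " " + details["order_time"])
details = details.assign(order_date_time=stamp.dt.strftime("%m/%d/%y %H:%M:%S"))

# Step 3: inner join on item_id = menu_item_id (only matching rows survive).
merged = details.merge(
    menu_items, left_on="item_id", right_on="menu_item_id", how="inner"
)

# Step 4: keep and rename the six output columns.
order_details_table = merged[[
    "order_details_id", "order_id", "order_date_time", "category", "item_name", "price",
]].rename(columns={
    "category": "menu_item_category",
    "item_name": "menu_item_name",
    "price": "menu_item_price",
})
print(order_details_table)
```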

Table order_details_table from the Power BI file restaurant_orders_result.pbix: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F13670445%2F1098315c0e34255b67ad3419aa113bf0%2Fdataframe.png?generation=1730269164808705&alt=media

I have also created bar charts and pie charts to display the results from the new dataframe. These plots are included in the file ‘restaurant_orders_result_charts.pbix’, and pictures of the charts are linked below.

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F13670445%2F4254696bbd3d7e0fc5f456c226c39114%2Fpicture_1.png?generation=1730269227195114&alt=media

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F13670445%2F71092cf769862cf7364fe1ccac9fad83%2Fpicture_2.png?generation=1730269249147687&alt=media

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F13670445%2F528ef51ecf21f006b0c21b65503e03fa%2Fpicture_3.png?generation=1730269284640753&alt=media

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F13670445%2F147c240da4be5bfe9da057a8bc5d5939%2Fpicture_4.png?generation=1730269300346146&alt=media

I have also attached the original and new files to this project. Thank you.
Law and Order TV Series Dataset
kaggle.com
zip
Updated Dec 8, 2023
Cite
The Devastator (2023). Law and Order TV Series Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/law-and-order-tv-series-dataset
Download format: zip (1443584 bytes)
Authors
The Devastator
Description
Law and Order TV Series Dataset

Law and Order TV Series Data

By Gove Allen [source]

About this dataset

The Law and Order Dataset is a comprehensive collection of data related to the popular television series Law and Order that aired from 1990 to 2010. This dataset, compiled by IMDB.com, provides detailed information about each episode of the show, including its title, summary, airdate, director, writer, guest stars, and IMDb rating.

With over 450 episodes spanning 20 seasons of the original series as well as its spin-offs like Law and Order: Special Victims Unit, this dataset offers a wealth of information for analyzing various facets of criminal justice and law enforcement portrayed in the show. Whether you are a student or researcher studying crime-related topics or simply an avid fan interested in exploring behind-the-scenes details about your favorite episodes or actors involved in them, this dataset can be a valuable resource.

By examining this extensive collection of data using SQL queries or other analytical techniques, one can gain insights into patterns such as common tropes used in different seasons or characters that appeared most frequently throughout the series. Additionally, researchers can investigate correlations between factors like episode directors/writers and their impact on viewer ratings.

This dataset allows users to dive deep into aspects such as the crime types covered in episodes (e.g., homicide cases versus white-collar crimes), how often certain guest stars appeared (including famous actors who had early roles on the show), or which writers/directors most consistently contributed high-rated episodes. Such analyses provide opportunities for uncovering trends over time within Law and Order's narrative structure while also shedding light on societal issues addressed by the series.

This dataset is made available for educational purposes at the college level, specifically to teach SQL, a powerful tool widely used in data analysis. The intention is to empower students with real-world examples they can explore hands-on while honing their database querying abilities. The graphical representation accompanying this dataset further enhances understanding by providing visualizations that illustrate key relationships between different variables.

Whether you are a seasoned data analyst, a budding criminologist, or simply looking to understand the intricacies of one of the most successful crime dramas in television history, the Law and Order Dataset offers a vast array of information ripe for exploration and analysis.

How to use the dataset

Understanding the Columns

Before diving into analyzing the data, it's important to understand what each column represents. Here is an overview:

Episode: The episode number within its respective season.

Title: The title of each episode.

Season: The season number in which each episode belongs.

Year: The year in which each episode was released.

Rating: IMDB rating for each episode (on a scale from 0-10).

Votes: Number of votes received by each episode on IMDB.

Description: Brief summary or description of each episode's plot.

Director: Director(s) responsible for directing an episode.

Writers: Writer(s) credited for writing an episode.

Stars: Actor(s) who starred in an individual episode.

Exploring Episode Data

The dataset allows you to explore various aspects of individual episodes as well as broader trends throughout different seasons:

1. Analyzing Ratings:

- Examine how ratings vary across seasons using aggregation functions such as average (AVG), minimum (MIN), and maximum (MAX), depending on your analytical goals.
- Identify popular episodes by sorting on the highest ratings or the most votes received.

2. Trends over Time:

- Investigate how ratings have changed over time by visualizing them with line charts or bar graphs based on release years or seasons.
- Examine whether there are any significant fluctuations in ratings across different seasons or years.

3. Directors and Writers:

- Identify episodes directed by a specific director or written by particular writers by filtering the dataset on their names.
- Analyze the impact of different directors or writers on episode ratings.

4. Popular Actors:

- Explore episodes featuring popular actors from the show, such as Mariska Hargitay (Olivia Benson) and Christopher Meloni (Elliot Stabler).
- Investigate whether episodes with popular actors received higher ratings compared to ...
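Since the dataset is aimed at teaching SQL, here is a self-contained sketch of the rating and director analyses using Python's built-in sqlite3 module. The episodes table, its rows, and the single-valued Director column are hypothetical simplifications of the columns described above.

```python
import sqlite3

# In-memory database with a simplified, hypothetical episodes table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE episodes (Season INTEGER, Episode INTEGER, Title TEXT, "
    "Year INTEGER, Rating REAL, Votes INTEGER, Director TEXT)"
)
conn.executemany(
    "INSERT INTO episodes VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 1, "Episode A", 1990, 7.5, 900, "Director X"),
        (1, 2, "Episode B", 1990, 7.9, 850, "Director Y"),
        (2, 1, "Episode C", 1991, 8.1, 700, "Director X"),
        (2, 2, "Episode D", 1991, 7.3, 650, "Director Z"),
    ],
)

# Ratings per season with AVG / MIN / MAX.
season_stats = conn.execute(
    "SELECT Season, AVG(Rating), MIN(Rating), MAX(Rating) "
    "FROM episodes GROUP BY Season ORDER BY Season"
).fetchall()

# Most popular episodes: highest rating first, votes as the tie-breaker.
top_episodes = conn.execute(
    "SELECT Title, Rating, Votes FROM episodes "
    "ORDER BY Rating DESC, Votes DESC LIMIT 2"
).fetchall()

# Average rating and episode count per director.
director_avg = conn.execute(
    "SELECT Director, AVG(Rating) AS avg_rating, COUNT(*) AS n "
    "FROM episodes GROUP BY Director ORDER BY avg_rating DESC"
).fetchall()

for row in season_stats:
    print(row)
conn.close()
```

The same queries run unchanged in any SQL client once the real CSV has been imported into a table with these column names.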
Global Land and Surface Temperature Trends
kaggle.com
zip
Updated Jan 11, 2023
Cite
The Devastator (2023). Global Land and Surface Temperature Trends [Dataset]. https://www.kaggle.com/datasets/thedevastator/global-land-and-surface-temperature-trends-analy
Download format: zip (16000936 bytes)
Authors
The Devastator
Description
Global Land and Surface Temperature Trends Analysis

Assessing climate change year by year

By IBM Watson AI XPRIZE - Environment [source]

About this dataset

This dataset from Kaggle contains global land and surface temperature data from major cities around the world. By relying on the raw temperature reports that form the foundation of their averaging system, researchers are able to accurately track climate change over time. With this dataset, we can observe monthly averages and create detailed gridded temperature fields to analyze localized data on a country-by-country basis. The information in this dataset has allowed us to gain a better understanding of our changing planet and how certain regions are being impacted more than others by climate change. With such insights, we can look towards developing better responses and strategies as temperatures continue to increase over time.


How to use the dataset

Introduction

This guide will show you how to use this dataset to explore global climate change trends over time.

Exploring the Dataset

1. Select one or more countries with df[df['Country'] == 'countryname'] to filter out records that are not related to those countries.

2. Use df.groupby('City')['AverageTemperature'] to group the cities together with their respective average temperatures.

3. Compute basic summary statistics for each group, such as the mean or median, with .mean() or .median() (for example, df.groupby('City')['AverageTemperature'].mean()), depending on the statistic you need.

4. Plot the results as line plots or bar charts with the pandas plot function, e.g. df[column].plot(kind='line') or df[column].plot(kind='bar'), to help visualize trends across these groups.
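The four steps above can be combined into one short pandas sketch. It assumes the city-level file has dt, AverageTemperature, City and Country columns; the rows and temperatures are invented, and the plot call is left as a comment because it requires matplotlib.

```python
import pandas as pd

# Hypothetical rows in the assumed shape of the city-level file.
temps = pd.DataFrame({
    "dt": ["1990-01-01", "2000-01-01", "2000-07-01", "2010-01-01", "2010-07-01",
           "2000-01-01", "2010-01-01"],
    "AverageTemperature": [None, 5.0, 25.0, 6.0, 27.0, 26.0, 27.5],
    "City": ["Madrid", "Madrid", "Madrid", "Madrid", "Madrid", "Lagos", "Lagos"],
    "Country": ["Spain", "Spain", "Spain", "Spain", "Spain", "Nigeria", "Nigeria"],
})
temps["dt"] = pd.to_datetime(temps["dt"])
# Drop rows that have no temperature reading.
temps = temps.dropna(subset=["AverageTemperature"])

# Step 1: filter to one country.
spain = temps[temps["Country"] == "Spain"]

# Steps 2-3: group by city and compute mean and median.
city_summary = spain.groupby("City")["AverageTemperature"].agg(["mean", "median"])

# Yearly averages are a convenient input for a trend line.
yearly = spain.groupby(spain["dt"].dt.year)["AverageTemperature"].mean()

# Step 4 (requires matplotlib): yearly.plot(kind="line")
print(city_summary)
print(yearly)
```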

You can also use the latitude/longitude coordinates provided with every record to break the records down further by location, for example with the folium library in Python, whose zoomable maps offer many rendering options, such as shading or sizing markers according to different parameters. These are just some ways to visualize the data; there are plenty more possibilities.

Research Ideas

Analyzing temperature changes across different countries to identify regional climate trends and abnormalities.

Investigating how global warming is affecting urban areas by looking at the average temperatures of major cities over time.

Comparing historic average temperatures for a given region to current day average temperatures to quantify the magnitude of global warming in that region.

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

License: Dataset copyright by authors
- You are free to:
  - Share: copy and redistribute the material in any medium or format for any purpose, even commercially.
  - Adapt: remix, transform, and build upon the material for any purpose, even commercially.
- You must:
  - Give appropriate credit: provide a link to the license, and indicate if changes were made.
  - ShareAlike: distribute your contributions under the same license as the original.
  - Keep intact: all notices that refer to this license, including copyright notices.

Columns

File: GlobalLandTemperaturesByCountry.csv

| Column name | Description |
|:--|:--|
| dt | Date of the temperature measurement. (Date) |
| AverageTemperature | Average temperature for the given date. (Float) |
| AverageTemperatureUncertainty | Uncertainty of the average temperature measurement. (Float) |
| Country | Country where the temperature measurement was taken. (String) |

File: GlobalLandTemperaturesByMajorCity.csv

| Column name | Description |
|:--|:--|
| dt | Date...
HBO and HBO Max Content Dataset
kaggle.com
zip
Updated Dec 3, 2023
Cite
The Devastator (2023). HBO and HBO Max Content Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/hbo-and-hbo-max-content-dataset
Download format: zip (100874 bytes)
Authors
The Devastator
Description
HBO and HBO Max Content Dataset

Genres, Ratings, and Platforms of HBO and HBO Max TV Shows and Movies

By Hunter Kempf [source]

About this dataset

The dataset HBO Content provides comprehensive information on the TV shows and movies available on HBO and HBO Max. It contains details about various aspects of the content, such as the title, type (whether it's a TV show or a movie), year of release, rating (indicating appropriate age group), IMDb score (a measure of popularity and quality), rotten_score (if available), decade (the decade in which the content was released), and IMDb score bucket (categorizing popularity range).

Additionally, it includes binary values indicating whether the content belongs to specific genres: Action/Adventure, Animation, Biography, Children, Comedy, Crime, Cult, Documentary, Drama, Family, Fantasy, Food, Game Show, History, Horror, Independent, LGBTQ, Musical, Mystery, Reality, Romance, Science Fiction, Sport, Stand-up/Talk, Thriller, and Travel. These genre indicators allow users to filter content based on their preferences.

The dataset also provides information about the various platforms where the content can be accessed. These platforms include Acorntv, Amazon Prime, Cinemax, Epix, Fandor, Free, Fubo TV, HBO, HBO Max, Hoopla, Hulu Plus, Kanopy, Netflix, Shout Factory TV, Sundance Now, Syfy TV Everywhere, TLC Go, Viceland TV Everywhere, Adult Swim TV Everywhere, AMC, AMC Premiere, BBC America TVE, BritBox, Cartoon Network, CBS All Access, Comedy Central TVE, Criterion Channel, Crunchyroll Premium, CuriosityStream, DC Universe, Funimation, NBC TVEverywhere, Showtime, Shudder, Starz, TNT, truTV TVEverywhere, Urban Movie Channel, Velocity Go, Watch TCM, and TBS.

The availability of each platform is indicated by binary values for each platform column. If a value is 1 (true) for a particular platform column, it means that the content is available on that platform.

This comprehensive dataset captures vital information about HBO's extensive library of TV shows and movies. It not only helps users discover content according to their preferred genres but also lets them determine which platforms offer access to their desired titles. The IMDb score and rating further aid in making informed decisions about the popularity and appropriateness of the content.

How to use the dataset

Understanding the Columns: Familiarize yourself with the columns in the dataset to comprehend what each column represents. The column names are self-explanatory and provide information about various aspects of the content like title, type (TV show or movie), year of release, rating, IMDb score, rotten score (if available), decade it belongs to, genres it falls into (like Action/Adventure or Drama), and platforms where it can be accessed.

Exploring Genres: The dataset includes several genre-related columns, such as genres_Action_Adventure, genres_Drama, and genres_Thriller. You can analyze these columns to identify trends in popular genres among HBO and HBO Max content.

Genre-based Filtering: If you're interested in a specific genre, such as Action/Adventure or Documentary content available on these platforms, you can use boolean filtering to select the rows where the corresponding genre column is set to 1.

Platform Availability: The dataset indicates which platforms offer access to each content item through platform-related columns such as platforms_hbo_max and platforms_netflix. You can filter the data on platform availability if you only want to explore shows or movies accessible through certain platforms.

Ratings Analysis: Use the rating column for analyzing content suitable for specific age groups or audience preferences.

IMDb Scores: The imdb_score column contains IMDb ratings from 0 to 10 for each TV show or movie in this dataset. You can analyze this field across different dimensions, such as average scores per genre, platform, or year, and identify highly rated titles within specific categories using boolean filtering.
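A short sketch of the boolean filtering described above, plus an average IMDb score per genre computed across all genres_* indicator columns. The rows are hypothetical, and only a few of the genre and platform columns are included; the real file may name or encode them differently.

```python
import pandas as pd

# Hypothetical rows using a subset of the described column names.
titles = pd.DataFrame({
    "title": ["Show A", "Movie B", "Show C", "Movie D"],
    "imdb_score": [8.5, 6.9, 7.8, 7.2],
    "genres_Drama": [1, 1, 0, 0],
    "genres_Documentary": [0, 0, 1, 1],
    "platforms_hbo_max": [1, 1, 1, 0],
    "platforms_netflix": [0, 1, 0, 1],
})

# Boolean filtering: dramas that are available on HBO Max.
hbo_max_dramas = titles[(titles["genres_Drama"] == 1) & (titles["platforms_hbo_max"] == 1)]

# Average IMDb score per genre, looping over every genres_* indicator column.
genre_cols = [col for col in titles.columns if col.startswith("genres_")]
mean_score_by_genre = pd.Series({
    col[len("genres_"):]: titles.loc[titles[col] == 1, "imdb_score"].mean()
    for col in genre_cols
}).sort_values(ascending=False)

# Highly rated titles (score of 8 or more) available on HBO Max.
top_on_max = titles.loc[
    (titles["imdb_score"] >= 8) & (titles["platforms_hbo_max"] == 1), "title"
]
print(mean_score_by_genre)
```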

Data Visualization: Visualize the dataset using charts or graphs to gain insights visually and interpret trends more effectively. You can create bar charts, pie charts, scatter plots, line graphs, or any other visualization technique that suits your analysis requirements.

Combining Datasets: If you have similar datasets from other platforms or services like Netflix or Amazon Prime Video, you can combine them with this dataset to perform comparative analyses across different streaming platforms.

Predictive Analysis: Use various machine learning algorithms such as regression models, classification models, or clustering algorithms to explore patterns and pred...
Adidas_Sales_Analysis
kaggle.com
zip
Updated Mar 11, 2023
Cite
Archis Rudra (2023). Adidas_Sales_Analysis [Dataset]. https://www.kaggle.com/datasets/archisrudra/adidas-sales-analysis/versions/1
Download format: zip (1863030 bytes)
Authors
Archis Rudra
Description
Portfolio_Adidas_Dataset: a set of real-world data tasks completed using the Python Pandas and Matplotlib libraries.

Background Information: In this portfolio, we use Python Pandas & Python Matplotlib to analyze and answer business questions about 5 products worth of sales data. The data contains hundreds of thousands of footwear store purchases broken down by product type, cost, region, state, city, and so on.

We start by cleaning our data. Tasks during this section include:

Dropping NaN values from the DataFrame

Removing columns based on a condition

Changing the column name

Removing rows based on a condition

Reindexing rows based on a condition

Adding Month and Year column (to_datetime)

Conversion of data types from string to integer (to_numeric)
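A minimal pandas sketch of these cleaning steps. The column names (Retailer, Invoice Date, Product, Price per Unit, Units Sold), the empty "Unnamed: 0" column, and the "$1,200"-style text values are all assumptions made for illustration; the real file may already store these fields as numbers.

```python
import pandas as pd

# Hypothetical raw rows with text-formatted numbers and an empty column.
raw = pd.DataFrame({
    "Retailer": ["Foot Locker", "Walmart", None, "Foot Locker"],
    "Invoice Date": ["2020-01-01", "2021-03-15", None, "2021-06-30"],
    "Product": ["Men's Street Footwear", "Women's Apparel", None, "Men's Athletic Footwear"],
    "Price per Unit": ["$50.00", "$40.00", None, "$45.00"],
    "Units Sold": ["1,200", "800", None, "1,000"],
    "Unnamed: 0": [None, None, None, None],
})


def clean_adidas(df):
    df = (
        df.dropna(how="all")            # drop rows where every value is NaN
        .dropna(axis=1, how="all")      # remove columns based on a condition: entirely empty
        .rename(columns={"Price per Unit": "Unit Price"})  # change a column name
    )
    # Convert text such as "$1,200" to numbers (to_numeric).
    for col in ["Unit Price", "Units Sold"]:
        df[col] = pd.to_numeric(df[col].str.replace(r"[$,]", "", regex=True))
    # Add Month and Year columns (to_datetime).
    df["Invoice Date"] = pd.to_datetime(df["Invoice Date"])
    df["Year"] = df["Invoice Date"].dt.year
    df["Month"] = df["Invoice Date"].dt.month
    # Reindex rows after the removals.
    return df.reset_index(drop=True)


clean = clean_adidas(raw)
print(clean)
```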

Once we have cleaned up our data a bit, we move to the data exploration section. In this section we explore 5 high-level business questions related to our data:

In which year were sales highest?

What product sold the most? Why do you think it sold the most?

What was the average price for each product? And the overall average price of all products?

What was the best retailer for sales? How much did that retailer earn?

What method is most efficient for sales?

To answer these questions we walk through many different openpyxl, pandas, and matplotlib methods. They include:

Using groupby to perform aggregate analysis

Plotting bar charts, line graphs, and pie charts to visualize our results

Labeling our graphs
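The business questions above map onto groupby aggregations. This sketch assumes a cleaned frame with Retailer, Year, Product, Price per Unit, Units Sold and Sales Method columns (hypothetical names and values) and computes Total Sales as price times units.

```python
import pandas as pd

# Hypothetical cleaned rows.
sales = pd.DataFrame({
    "Retailer": ["Foot Locker", "Walmart", "Foot Locker", "West Gear", "Walmart"],
    "Year": [2020, 2020, 2021, 2021, 2021],
    "Product": [
        "Men's Street Footwear", "Women's Apparel", "Men's Street Footwear",
        "Women's Apparel", "Men's Athletic Footwear",
    ],
    "Price per Unit": [50.0, 40.0, 55.0, 45.0, 60.0],
    "Units Sold": [1200, 800, 1000, 500, 300],
    "Sales Method": ["In-store", "Online", "Online", "Outlet", "Online"],
})
sales["Total Sales"] = sales["Price per Unit"] * sales["Units Sold"]

# Year with the highest total sales.
best_year = sales.groupby("Year")["Total Sales"].sum().idxmax()
# Product with the most units sold.
top_product = sales.groupby("Product")["Units Sold"].sum().idxmax()
# Average price per product, and across all rows.
avg_price_by_product = sales.groupby("Product")["Price per Unit"].mean()
overall_avg_price = sales["Price per Unit"].mean()
# Retailers ranked by total sales.
retailer_sales = sales.groupby("Retailer")["Total Sales"].sum().sort_values(ascending=False)
# Total sales per sales method.
method_sales = sales.groupby("Sales Method")["Total Sales"].sum()

print(best_year, top_product, overall_avg_price)
print(retailer_sales)
print(method_sales)
```

Each result can be passed to `.plot(kind="bar")` or `.plot(kind="pie")` for the charts described above if matplotlib is installed.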
COVID-19 Global Case and Death Data
kaggle.com
zip
Updated Dec 4, 2023
Cite
The Devastator (2023). COVID-19 Global Case and Death Data [Dataset]. https://www.kaggle.com/datasets/thedevastator/covid-19-global-case-and-death-data
Download format: zip (81724234 bytes)
Authors
The Devastator
Description
COVID-19 Global Case and Death Data

Global COVID-19 Cases and Deaths Over Time

By Coronavirus (COVID-19) Data Hub [source]

About this dataset

The COVID-19 Global Time Series Case and Death Data is a comprehensive collection of global COVID-19 case and death information recorded over time. This dataset includes data from various sources such as JHU CSSE COVID-19 Data and The New York Times.

The dataset consists of several columns providing detailed information on different aspects of the COVID-19 situation. The COUNTRY_SHORT_NAME column represents the short name of the country where the data is recorded, while the Data_Source column indicates the source from which the data was obtained.

Other important columns include Cases, which denotes the number of COVID-19 cases reported, and Difference, which indicates the difference in case numbers compared to the previous day. Additionally, there are columns such as CONTINENT_NAME, DATA_SOURCE_NAME, COUNTRY_ALPHA_3_CODE, COUNTRY_ALPHA_2_CODE that provide additional details about countries and continents.

Furthermore, this dataset also includes information on deaths related to COVID-19. The column PEOPLE_DEATH_NEW_COUNT shows the number of new deaths reported on a specific date.

To provide more context, certain columns offer demographic details about locations. For instance, Population_Count provides population counts for different areas. Moreover, a FIPS code is available for provinces/states for identification purposes.

It is important to note that this dataset covers both confirmed cases (Case_Type: confirmed) as well as probable cases (Case_Type: probable). These classifications help differentiate between various types of COVID-19 infections.

Overall, this dataset offers a comprehensive picture of the global COVID-19 situation, providing accurate and up-to-date information on cases and deaths, demographic details (such as population count or FIPS code), source references (such as JHU CSSE or the NY Times), and geographical information (country names coded with ALPHA codes). This makes it useful for researchers studying patterns and trends associated with the pandemic.

How to use the dataset

Understanding the Dataset Structure:

The dataset is available in two files: COVID-19 Activity.csv and COVID-19 Cases.csv.

The two files contain different sets of columns describing COVID-19 cases and deaths.

Some important columns to look out for are:
a. PEOPLE_POSITIVE_CASES_COUNT: the total number of confirmed positive COVID-19 cases.
b. COUNTY_NAME: the name of the county where the data was recorded.
c. PROVINCE_STATE_NAME: the name of the province or state where the data was recorded.
d. REPORT_DATE: the date the data was reported.
e. CONTINENT_NAME: the name of the continent where the data was recorded.
f. DATA_SOURCE_NAME: the name of the data source.
g. PEOPLE_DEATH_NEW_COUNT: the number of new deaths reported on a specific date.
h. COUNTRY_ALPHA_3_CODE: the three-letter alpha code identifying the country.
i. Lat, Long: the latitude and longitude coordinates of the location.
j. Country_Region or COUNTRY_SHORT_NAME: the country or region where cases were reported.

Choosing Relevant Columns: Decide which columns are relevant to your analysis or research question before proceeding further.
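One way to act on that choice is to read only the columns you need and fail early if a name is missing, since the two files do not share every column. This sketch uses Python's standard csv module; the function name is hypothetical.

```python
import csv


def select_columns(f, wanted):
    """Read a CSV file object and keep only the wanted columns.

    Raises KeyError up front if any wanted column is absent from the header,
    which catches names that exist in one file but not the other.
    """
    reader = csv.DictReader(f)
    header = reader.fieldnames or []
    missing = [name for name in wanted if name not in header]
    if missing:
        raise KeyError(f"columns not in file: {missing}")
    return [{name: row[name] for name in wanted} for row in reader]
```

To use it on one of the files, open the file with open(path, newline="") and pass the file object; the csv module documentation recommends newline="" for files read this way.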

Exploring Data Patterns: Use summary statistics and visualizations (e.g., bar charts, line graphs) to explore patterns in different variables over time or across regions and countries.
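For a first look at patterns, simple per-country summary statistics are often enough. The sketch below computes the total, mean and maximum of PEOPLE_DEATH_NEW_COUNT for each country using only the standard library. It assumes one row per country per date; with county- or province-level rows, the mean would be per row rather than per day. summarize_new_deaths is a hypothetical name.

```python
from collections import defaultdict
from statistics import mean


def summarize_new_deaths(rows):
    """Per-country total, mean and max of PEOPLE_DEATH_NEW_COUNT.

    Assumes one row per country per date; blank counts are treated as 0.
    """
    per_country = defaultdict(list)
    for row in rows:
        count = int(float(row["PEOPLE_DEATH_NEW_COUNT"] or 0))
        per_country[row["COUNTRY_SHORT_NAME"]].append(count)
    return {
        country: {"total": sum(counts), "mean": mean(counts), "max": max(counts)}
        for country, counts in per_country.items()
    }
```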

Filtering Data: Filter the dataset on columns such as COUNTRY_SHORT_NAME, CONTINENT_NAME, or PROVINCE_STATE_NAME to focus on specific countries, continents, or regions of interest.
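Filtering on a column such as CONTINENT_NAME amounts to keeping the rows whose values match. A minimal sketch, assuming rows are dicts (for instance from csv.DictReader) and exact string matches; filter_rows is a hypothetical helper, and the set-of-values option is an added convenience.

```python
def filter_rows(rows, **criteria):
    """Keep rows whose columns equal every given value.

    Example: filter_rows(rows, CONTINENT_NAME="Europe").
    A value may also be a set, meaning "any of these values".
    """
    def matches(row):
        for column, wanted in criteria.items():
            value = row.get(column)
            if isinstance(wanted, (set, frozenset)):
                if value not in wanted:
                    return False
            elif value != wanted:
                return False
        return True

    return [row for row in rows if matches(row)]
```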

Combining Data: Combine related data (e.g., COVID-19 cases and deaths, or columns from the two files) to perform more advanced analysis or create more informative visualizations.
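Combining cases and deaths usually means joining rows that describe the same place and date. The sketch below is a plain inner join on key columns. The column list above notes that the country may appear as Country_Region or COUNTRY_SHORT_NAME, so key names are passed separately for each side; matching date formats across files is assumed, not verified. inner_join is a hypothetical helper.

```python
def inner_join(left, right, left_keys, right_keys):
    """Inner-join two lists of dict rows on the given key columns.

    left_keys and right_keys name the same logical keys in each input, so a
    country column may be called differently on each side. If a key repeats
    in right, the last row wins; on a column-name clash, right's value wins.
    """
    # Index the right-hand rows by their key tuple for O(1) lookups.
    index = {tuple(row[k] for k in right_keys): row for row in right}
    joined = []
    for row in left:
        match = index.get(tuple(row[k] for k in left_keys))
        if match is not None:
            joined.append({**row, **match})
    return joined
```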

Analyzing Trends: Use the dataset to analyze and identify trends in COVID-19 cases and deaths over time. You can examine factors such as population count, testing count, hospitalization count, etc., to gain deeper insights into the impact of the virus.
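Daily counts fluctuate from one report to the next, so a trailing 7-day moving average is a common way to expose the underlying trend. This sketch assumes the values are already sorted by date with one value per day and no gaps; the window length of 7 is a conventional choice, not something the dataset specifies, and rolling_mean is a hypothetical name.

```python
from collections import deque


def rolling_mean(values, window=7):
    """Trailing moving average over a date-ordered sequence of daily values.

    Positions before the first full window are None, so the output lines up
    one-to-one with the input.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque()
    total = 0
    averages = []
    for value in values:
        recent.append(value)
        total += value
        # Drop the oldest value once the window is exceeded.
        if len(recent) > window:
            total -= recent.popleft()
        averages.append(total / window if len(recent) == window else None)
    return averages
```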

Comparing Countries/Regions: Compare COVID-19

Research Ideas

Trend Analysis: This dataset can be used to analyze and track the trends of COVID-19 cases and deaths over time. It provides comprehensive global data, allowing researchers and po...



Super Market dataset


Description

Problem Statements for Data Visualization – Supermarket Sales Dataset

1. Sales Performance Across Branches
Management wants to understand how sales performance varies across supermarket branches in Lagos, Abuja, Ogun, and Port Harcourt to identify the best-performing locations and areas that need improvement.
Suggested Visualizations:
• Bar chart comparing total sales and profit by branch
• Map chart showing sales by city
• KPI cards: Total Sales, Profit, and Average Transaction Value per branch

2. Customer Purchase Behavior
The marketing team needs insights into how different customer types (Member vs Normal) and genders influence purchase trends and average spending.
Suggested Visualizations:
• Pie chart for customer type distribution
• Bar chart for average spend by gender
• Segmented comparison of total sales by customer type

3. Product Line Performance
The business wants to know which product categories drive the highest revenue, quantity sold, and customer satisfaction to optimize stock levels and marketing focus.
Suggested Visualizations:
• Bar chart showing total sales by product line
• Column chart comparing average rating per product line
• Profit margin chart by product line

4. Sales Trends Over Time
The management team wants to monitor sales trends over time to identify peak periods, track seasonal variations, and plan future promotions accordingly.
Suggested Visualizations:
• Line chart showing monthly or weekly sales trend
• Seasonal decomposition (sales by month)
• Trendline showing revenue growth

5. Payment Method Analysis
The finance department needs to evaluate payment method usage (Cash, E-wallet, Credit Card) across cities to improve payment convenience and reduce transaction delays.
Suggested Visualizations:
• Donut or bar chart showing share of payment methods
• City-level breakdown of preferred payment type
• Correlation between payment method and average transaction value

6. Customer Satisfaction Insights
The customer experience team wants to explore how customer ratings relate to sales amount, product type, and branch performance to identify drivers of customer satisfaction.
Suggested Visualizations:
• Scatter plot of rating vs total purchase amount
• Heat map of average rating by branch and product line
• KPI card showing average customer rating
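The KPI cards in the first problem statement (Total Sales and Average Transaction Value per branch) reduce to a group-by over transactions. This sketch assumes one row per transaction and uses hypothetical column names City and Total, since the dataset's actual headers are not given here; pass the real names as arguments. branch_kpis is likewise a hypothetical helper.

```python
from collections import defaultdict


def branch_kpis(rows, branch_col="City", amount_col="Total"):
    """Total sales, transaction count and average transaction value per branch.

    branch_col and amount_col are hypothetical defaults; pass the dataset's
    real header names. Assumes one row per transaction.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        branch = row[branch_col]
        totals[branch] += float(row[amount_col])
        counts[branch] += 1
    return {
        branch: {
            "total_sales": totals[branch],
            "transactions": counts[branch],
            "avg_transaction": totals[branch] / counts[branch],
        }
        for branch in totals
    }
```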


Super Market dataset

Car-Sales-Analysis-Excel-Dashboard

User Profile for Ads Project in Power BI

Petre_Slide_CategoricalScatterplotFigShare.pptx

7 Display the graph in a separate window. Dot colors indicate

Tableau Dummy Dataset for Practice

Summary for Policymakers of the Working Group I Contribution to the IPCC...

Classification of web-based Digital Humanities projects leveraging...

Description

Classification schema: categories and columns

DATS 6401 - Final Project - Yon ho Cheong.zip

Ecommerce Visualization

Power BI Sales Data

Myntra Dataset Analysis

Bank Loan Analysis_Excel

Customer Sale Dataset for Data Visualization

dataset_for_sales

Restaurant Dish Orders in Power BI

Law and Order TV Series Dataset


Law and Order TV Series Data

About this dataset

How to use the dataset

Understanding the Columns

Exploring Episode Data

1. Analyzing Ratings:

2.Trends over Time:

3. Directors and Writers:

4. Popular Actors:

Global Land and Surface Temperature Trends

Global Land and Surface Temperature Trends Analysis

Assessing climate change year by year

About this dataset

More Datasets

Featured Notebooks

How to use the dataset

Introduction

Exploring the Dataset

Research Ideas

Acknowledgements

License

Columns

HBO and HBO Max Content Dataset


Genres, Ratings, and Platforms of HBO and HBO Max TV Shows and Movies

About this dataset

How to use the dataset

Adidas_Sales_Analysis

COVID-19 Global Case and Death Data


Global COVID-19 Cases and Deaths Over Time

About this dataset

How to use the dataset

Research Ideas
