41 datasets found
  1. Tableau Dummy Dataset for Practice

    • kaggle.com
    Updated Aug 21, 2025
    Cite
    Piush Dave (2025). Tableau Dummy Dataset for Practice [Dataset]. https://www.kaggle.com/datasets/piyushdave/tableau-dummy-dataset-for-practice
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 21, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Piush Dave
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Domain-Specific Dataset and Visualization Guide

    This package contains 20 realistic datasets in CSV format across different industries, along with 20 text files suggesting visualization ideas. Each dataset includes about 300 rows of synthetic but domain-appropriate data. They are designed for data analysis, visualization practice, machine learning projects, and dashboard building.

    What’s inside

    • 20 CSV files, one for each domain:

      1. Education
      2. E-Commerce
      3. Healthcare
      4. Finance
      5. Retail
      6. Social Media
      7. Manufacturing
      8. Sports
      9. Transport
      10. Hospitality
      11. Telecom
      12. Banking
      13. Real Estate
      14. Gaming
      15. Agriculture
      16. Automobile
      17. Energy
      18. Insurance
      19. Government
      20. Entertainment

    • 20 TXT files, each listing 10 relevant graphing options for the dataset.

    • MASTER_INDEX.csv, which summarizes all domains with their column names.

    Use cases

    • Practice data cleaning, exploration, and visualization in Excel, Tableau, Power BI, or Python.
    • Build dashboards for specific industries.
    • Train beginner-level machine learning models such as classification and regression.
    • Use in classroom teaching or workshops as ready-made datasets.

    Example

    • Education dataset has columns like StudentName, Class, Subject, Marks, AttendancePercent. Suggested graphs: bar chart of average marks by subject, scatter plot of marks vs attendance percent, line chart of attendance over time.

    • E-Commerce dataset has columns like OrderDate, Product, Category, Price, Quantity, Total. Suggested graphs: line chart of revenue trend, bar chart of revenue by category, pie chart of payment mode share.
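    As a quick illustration of these suggestions, here is a minimal Python sketch for the Education dataset. The column names come from the description above; the file name Education.csv is an assumption, so adjust it to the actual file in the package.

      # Minimal sketch: bar chart and scatter plot for the Education dataset.
      # "Education.csv" is an assumed file name; columns follow the description above.
      import pandas as pd
      import matplotlib.pyplot as plt

      edu = pd.read_csv("Education.csv")

      # Bar chart of average marks by subject
      edu.groupby("Subject")["Marks"].mean().plot(kind="bar", title="Average marks by subject")
      plt.ylabel("Average marks")
      plt.show()

      # Scatter plot of marks vs attendance percent
      edu.plot(kind="scatter", x="AttendancePercent", y="Marks", title="Marks vs attendance")
      plt.show()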

  2. Super Market dataset

    • kaggle.com
    zip
    Updated Nov 4, 2025
    Cite
    Chiamaka Ndubuisi (2025). Super Market dataset [Dataset]. https://www.kaggle.com/datasets/chiamakandubuisi/super-market-dataset
    Explore at:
    zip (215497 bytes)
    Dataset updated
    Nov 4, 2025
    Authors
    Chiamaka Ndubuisi
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Problem Statements for Data Visualization – Supermarket Sales Dataset

    1. Sales Performance Across Branches
       Management wants to understand how sales performance varies across supermarket branches in Lagos, Abuja, Ogun, and Port Harcourt to identify the best-performing locations and areas that need improvement.
       Suggested visualizations:
       • Bar chart comparing total sales and profit by branch
       • Map chart showing sales by city
       • KPI cards: Total Sales, Profit, and Average Transaction Value per branch

    2. Customer Purchase Behavior
       The marketing team needs insights into how different customer types (Member vs Normal) and genders influence purchase trends and average spending.
       Suggested visualizations:
       • Pie chart for customer type distribution
       • Bar chart for average spend by gender
       • Segmented comparison of total sales by customer type

    3. Product Line Performance
       The business wants to know which product categories drive the highest revenue, quantity sold, and customer satisfaction to optimize stock levels and marketing focus.
       Suggested visualizations:
       • Bar chart showing total sales by product line
       • Column chart comparing average rating per product line
       • Profit margin chart by product line

    4. Sales Trends Over Time
       The management team wants to monitor sales trends over time to identify peak periods, track seasonal variations, and plan future promotions accordingly.
       Suggested visualizations:
       • Line chart showing monthly or weekly sales trend
       • Seasonal decomposition (sales by month)
       • Trendline showing revenue growth

    5. Payment Method Analysis
       The finance department needs to evaluate payment method usage (Cash, E-wallet, Credit Card) across cities to improve payment convenience and reduce transaction delays.
       Suggested visualizations:
       • Donut or bar chart showing share of payment methods
       • City-level breakdown of preferred payment type
       • Correlation between payment method and average transaction value

    6. Customer Satisfaction Insights
       The customer experience team wants to explore how customer ratings relate to sales amount, product type, and branch performance to identify drivers of customer satisfaction.
       Suggested visualizations:
       • Scatter plot of rating vs total purchase amount
       • Heat map of average rating by branch and product line
       • KPI card showing average customer rating
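    As a starting point for problem 1, here is a minimal pandas sketch. The file name supermarket_sales.csv and the column names Branch and Total are assumptions; check the actual header before running.

      # Sketch for problem 1: sales performance by branch.
      # "supermarket_sales.csv", "Branch" and "Total" are assumed names.
      import pandas as pd
      import matplotlib.pyplot as plt

      sales = pd.read_csv("supermarket_sales.csv")

      by_branch = (sales.groupby("Branch")["Total"]
                        .agg(total_sales="sum", avg_transaction="mean")
                        .sort_values("total_sales", ascending=False))
      print(by_branch)

      by_branch["total_sales"].plot(kind="bar", title="Total sales by branch")
      plt.ylabel("Total sales")
      plt.show()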

  3. Petre_Slide_CategoricalScatterplotFigShare.pptx

    • figshare.com
    pptx
    Updated Sep 19, 2016
    Cite
    Benj Petre; Aurore Coince; Sophien Kamoun (2016). Petre_Slide_CategoricalScatterplotFigShare.pptx [Dataset]. http://doi.org/10.6084/m9.figshare.3840102.v1
    Explore at:
    pptx
    Dataset updated
    Sep 19, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Benj Petre; Aurore Coince; Sophien Kamoun
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Categorical scatterplots with R for biologists: a step-by-step guide

    Benjamin Petre1, Aurore Coince2, Sophien Kamoun1

    1 The Sainsbury Laboratory, Norwich, UK; 2 Earlham Institute, Norwich, UK

    Weissgerber and colleagues (2015) recently stated that ‘as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies’. They called for more scatterplot and boxplot representations in scientific papers, which ‘allow readers to critically evaluate continuous data’ (Weissgerber et al., 2015). In the Kamoun Lab at The Sainsbury Laboratory, we recently implemented a protocol to generate categorical scatterplots (Petre et al., 2016; Dagdas et al., 2016). Here we describe the three steps of this protocol: 1) formatting of the data set in a .csv file, 2) execution of the R script to generate the graph, and 3) export of the graph as a .pdf file.

    Protocol

    • Step 1: format the data set as a .csv file. Store the data in a three-column Excel file as shown in the PowerPoint slide. The first column 'Replicate' indicates the biological replicates. In the example, the month and year during which the replicate was performed is indicated. The second column 'Condition' indicates the conditions of the experiment (in the example, a wild type and two mutants called A and B). The third column 'Value' contains the continuous values. Save the Excel file as a .csv file (File -> Save as -> in 'File Format', select .csv). This .csv file is the input file to import into R.

    • Step 2: execute the R script (see Notes 1 and 2). Copy the script shown in the PowerPoint slide and paste it into the R console. Execute the script. In the dialog box, select the input .csv file from step 1. The categorical scatterplot will appear in a separate window. Dots represent the values for each sample; colors indicate replicates. Boxplots are superimposed; black dots indicate outliers.

    • Step 3: save the graph as a .pdf file. Shape the window at your convenience and save the graph as a .pdf file (File -> Save as). See the PowerPoint slide for an example.

    Notes

    • Note 1: install the ggplot2 package. The R script requires the package ‘ggplot2’ to be installed. To install it, Packages & Data -> Package Installer -> enter ‘ggplot2’ in the Package Search space and click on ‘Get List’. Select ‘ggplot2’ in the Package column and click on ‘Install Selected’. Install all dependencies as well.

    • Note 2: use a log scale for the y-axis. To use a log scale for the y-axis of the graph, use the command line below in place of command line #7 in the script.

    # 7 Display the graph in a separate window. Dot colors indicate replicates
    graph + geom_boxplot(outlier.colour='black', colour='black') + geom_jitter(aes(col=Replicate)) + scale_y_log10() + theme_bw()

    References

    Dagdas YF, Belhaj K, Maqbool A, Chaparro-Garcia A, Pandey P, Petre B, et al. (2016) An effector of the Irish potato famine pathogen antagonizes a host autophagy cargo receptor. eLife 5:e10856.

    Petre B, Saunders DGO, Sklenar J, Lorrain C, Krasileva KV, Win J, et al. (2016) Heterologous Expression Screens in Nicotiana benthamiana Identify a Candidate Effector of the Wheat Yellow Rust Pathogen that Associates with Processing Bodies. PLoS ONE 11(2):e0149035

    Weissgerber TL, Milic NM, Winham SJ, Garovic VD (2015) Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm. PLoS Biol 13(4):e1002128

    https://cran.r-project.org/

    http://ggplot2.org/

  4. Car-Sales-Analysis-Excel-Dashboard

    • kaggle.com
    zip
    Updated Feb 11, 2025
    Cite
    Ibrahimryk (2025). Car-Sales-Analysis-Excel-Dashboard [Dataset]. https://www.kaggle.com/datasets/ibrahimryk/car-sales-analysis-excel-dashboard/code
    Explore at:
    zip (496747 bytes)
    Dataset updated
    Feb 11, 2025
    Authors
    Ibrahimryk
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This project involves the creation of an interactive Excel dashboard for SwiftAuto Traders to analyze and visualize car sales data. The dashboard includes several visualizations to provide insights into car sales, profits, and performance across different models and manufacturers. The project makes use of various charts and slicers in Excel for the analysis.

    Objective: The primary goal of this project is to showcase the ability to manipulate and visualize car sales data effectively using Excel. The dashboard aims to provide:

    • Profit and Sales Analysis for each dealer.
    • Sales Performance across various car models and manufacturers.
    • Resale Value Analysis comparing prices and resale values.
    • Insights into Retention Percentage by car models.

    Files in this Project:
    • Car_Sales_Kaggle_DV0130EN_Lab3_Start.xlsx: the original dataset used to create the dashboard.
    • dashboards.xlsx: the final Excel file that contains the complete dashboard with interactive charts and slicers.

    Key Visualizations:
    • Average Price and Year Resale Value: a bar chart comparing the average price and resale value of various car models.
    • Power Performance Factor: a column chart displaying the performance across different car models.
    • Unit Sales by Model: a donut chart showcasing unit sales by car model.
    • Retention Percentage: a pie chart illustrating customer retention by car model.

    Tools Used:
    • Microsoft Excel for creating and organizing the visualizations and dashboard.
    • Excel Slicers for interactive filtering.
    • Charts: bar charts, pie charts, column charts, and sunburst charts.

    How to Use:
    • Download the dataset: download the Car_Sales_Kaggle_DV0130EN_Lab3_Start.xlsx file from Kaggle and follow the steps to create a similar dashboard in Excel.
    • Open the dashboard: the dashboards.xlsx file contains the final version of the dashboard. Simply open it in Excel and start exploring the interactive charts and slicers.

  5. Ecommerce Visualization

    • kaggle.com
    zip
    Updated Feb 26, 2023
    Cite
    Dr. Alok Yadav at YBI Foundation (2023). Ecommerce Visualization [Dataset]. https://www.kaggle.com/datasets/ybifoundation/ecommerce-visualization
    Explore at:
    zip (7240238 bytes)
    Dataset updated
    Feb 26, 2023
    Authors
    Dr. Alok Yadav at YBI Foundation
    Description

    Ecommerce transaction analysis is a great way to learn data visualization with Power BI or Tableau. Your visualization should reveal customer sales, product sales, regional sales, monthly sales, and time-of-day sales to gain valuable insights for business planning. You may use combo charts, cards, bar charts, tables, or line charts; for the customer segmentation page, you could employ column charts, bubble charts, point maps, tables, etc.
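    Because the record above does not list the dataset's columns, the sketch below uses purely illustrative names (OrderDate, Region, Sales); substitute the real column names from the file.

      # Illustrative sketch: monthly and regional sales summaries.
      # "ecommerce_transactions.csv", "OrderDate", "Region" and "Sales" are hypothetical names.
      import pandas as pd
      import matplotlib.pyplot as plt

      orders = pd.read_csv("ecommerce_transactions.csv", parse_dates=["OrderDate"])

      # Monthly sales trend
      monthly = orders.groupby(orders["OrderDate"].dt.to_period("M"))["Sales"].sum()
      monthly.plot(kind="line", title="Monthly sales")
      plt.show()

      # Regional sales comparison
      orders.groupby("Region")["Sales"].sum().plot(kind="bar", title="Sales by region")
      plt.show()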

  6. Summary for Policymakers of the Working Group I Contribution to the IPCC...

    • catalogue.ceda.ac.uk
    • data-search.nerc.ac.uk
    Updated Mar 9, 2024
    Cite
    Joeri Rogelj; Chris Smith; Gian-Kasper Plattner; Malte Meinshausen; Sophie Szopa; Sebastian Milinski; Jochem Marotzke (2024). Summary for Policymakers of the Working Group I Contribution to the IPCC Sixth Assessment Report - data for Figure SPM.4 (v20210809) [Dataset]. https://catalogue.ceda.ac.uk/uuid/bd65331b1d344ccca44852e495d3a049
    Explore at:
    Dataset updated
    Mar 9, 2024
    Dataset provided by
    Centre for Environmental Data Analysis (http://www.ceda.ac.uk/)
    Authors
    Joeri Rogelj; Chris Smith; Gian-Kasper Plattner; Malte Meinshausen; Sophie Szopa; Sebastian Milinski; Jochem Marotzke
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2015 - Dec 31, 2100
    Area covered
    Earth
    Description

    Data for Figure SPM.4 from the Summary for Policymakers (SPM) of the Working Group I (WGI) Contribution to the Intergovernmental Panel on Climate Change (IPCC) Sixth Assessment Report (AR6).

    Figure SPM.4 panel a shows global emissions projections for CO2 and a set of key non-CO2 climate drivers, for the core set of five IPCC AR6 scenarios. Figure SPM.4 panel b shows attributed warming in 2081-2100 relative to 1850-1900 for total anthropogenic, CO2, other greenhouse gases, and other anthropogenic forcings for five Shared Socio-economic Pathway (SSP) scenarios.

    How to cite this dataset

    When citing this dataset, please include both the data citation below (under 'Citable as') and the following citation for the report component from which the figure originates:

    IPCC, 2021: Summary for Policymakers. In: Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change [Masson-Delmotte, V., P. Zhai, A. Pirani, S.L. Connors, C. Péan, S. Berger, N. Caud, Y. Chen, L. Goldfarb, M.I. Gomis, M. Huang, K. Leitzell, E. Lonnoy, J.B.R. Matthews, T.K. Maycock, T. Waterfield, O. Yelekçi, R. Yu, and B. Zhou (eds.)]. Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA, pp. 3−32, doi:10.1017/9781009157896.001.

    Figure subpanels

    The figure has two panels, with data provided for all panels in subdirectories named panel_a and panel_b.

    List of data provided

    This dataset contains:

    • Projected emissions from 2015 to 2100 for the five scenarios of the AR6 WGI core scenario set (SSP1-1.9, SSP1-2.6, SSP2-4.5, SSP3-7.0, SSP5-8.5)
    • Projected warming for all anthropogenic forcers, CO2 only, non-CO2 greenhouse gases (GHGs) only, and other anthropogenic components for 2081-2100 relative to 1850-1900, for SSP1-1.9, SSP1-2.6, SSP2-4.5, SSP3-7.0 and SSP5-8.5.

    The five illustrative SSP (Shared Socio-economic Pathway) scenarios are described in Box SPM.1 of the Summary for Policymakers and Section 1.6.1.1 of Chapter 1.

    Data provided in relation to figure

    Panel a:

    The first column includes the years, while the next columns include the data per scenario and per climate forcer for the line graphs.

    • Data file: Carbon_dioxide_Gt_CO2_yr.csv relates to the Carbon dioxide emissions panel
    • Data file: Methane_Mt_CO2_yr.csv relates to the Methane emissions panel
    • Data file: Nitrous_oxide_Mt N2O_yr.csv relates to the Nitrous oxide emissions panel
    • Data file: Sulfur_dioxide_Mt SO2_yr.csv relates to the Sulfur dioxide emissions panel

      Panel b:

    • Data file: ts_warming_ranges_1850-1900_base_panel_b.csv. [Rows 2 to 5 relate to the first bar chart (cyan). Rows 6 to 9 relate to the second bar chart (blue). Rows 10 to 13 relate to the third bar chart (orange). Rows 14 to 17 relate to the fourth bar chart (red). Rows 18 to 21 relate to the fifth bar chart (brown).].
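    As an example of working with panel a, here is a minimal pandas sketch that assumes the layout described above (first column = years, remaining columns = one series per scenario); the relative path is an assumption.

      # Sketch: plot projected CO2 emissions per scenario from panel a.
      # Assumes the first column holds the years and each remaining column is a scenario.
      import pandas as pd
      import matplotlib.pyplot as plt

      co2 = pd.read_csv("panel_a/Carbon_dioxide_Gt_CO2_yr.csv")
      co2 = co2.set_index(co2.columns[0])   # use the years column as the index

      co2.plot(title="Projected CO2 emissions by scenario (Gt CO2/yr)")
      plt.xlabel("Year")
      plt.ylabel("Gt CO2 / yr")
      plt.show()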

    Sources of additional information

    The following weblinks are provided in the Related Documents section of this catalogue record:
    - Link to the report webpage, which includes the report component containing the figure (Summary for Policymakers) and the Supplementary Material for Chapter 1, which contains details on the input data used in Table 1.SM.1 (Cross-Chapter Box 1.4, Figure 2).
    - Link to the related publication for the input data used in panel a.

  7. DATS 6401 - Final Project - Yon ho Cheong.zip

    • figshare.com
    zip
    Updated Dec 15, 2018
    Cite
    Yon ho Cheong (2018). DATS 6401 - Final Project - Yon ho Cheong.zip [Dataset]. http://doi.org/10.6084/m9.figshare.7471007.v1
    Explore at:
    zip
    Dataset updated
    Dec 15, 2018
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Yon ho Cheong
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract

    The H1B is an employment-based visa category for temporary foreign workers in the United States. Every year, the US immigration department receives over 200,000 petitions and selects 85,000 applications through a random process; the U.S. employer must submit a petition for an H1B visa to the US immigration department. This is the most common visa status applied for by international students once they complete college or higher education and begin working in a full-time position. The project provides essential information on job titles, preferred regions of settlement, and trends among foreign applicants and employers for H1B visa applications. Because locations, employers, job titles, and salary ranges make up most of the H1B petitions, different visualization tools are used to analyze and interpret H1B visa trends and provide recommendations to applicants. This report is the basis of the project for the Visualization of Complex Data class at the George Washington University; it analyzes the relevant variables (Case Status, Employer Name, SOC Name, Job Title, Prevailing Wage, Worksite, and Latitude and Longitude information) from Kaggle and the Office of Foreign Labor Certification (OFLC) in order to see how the H1B visa has changed over the past several decades.

    Keywords: H1B visa, Data Analysis, Visualization of Complex Data, HTML, JavaScript, CSS, Tableau, D3.js

    Dataset

    The dataset contains 10 columns and covers a total of 3 million records spanning 2011-2016. The relevant columns include case status, employer name, SOC name, job title, full-time position, prevailing wage, year, worksite, and latitude and longitude information.

    Link to dataset: https://www.kaggle.com/nsharan/h-1b-visa
    Link to dataset (FY2017): https://www.foreignlaborcert.doleta.gov/performancedata.cfm

    Running the code

    Open Index.html.

    Data Processing

    • Preprocess the raw data to transform it into an understandable format.
    • Find and combine other external datasets, such as the FY2017 dataset, to enrich the analysis.
    • Develop the variables needed for the visualizations and compile them into the visualization programs.
    • Draw a geo map and scatter plot to compare the fastest growth in fixed value and in percentages.
    • Extract aspects of interest and analyze the changes in employers' preferences as well as forecasts for future trends.

    Visualizations

    • Combo chart: shows the overall volume of receipts and the approval rate.
    • Scatter plot: shows the beneficiary country of birth.
    • Geo map: shows all states with H1B petitions filed.
    • Line chart: shows the top 10 states by H1B petitions filed.
    • Pie chart: shows a comparison of education level and occupations for petitions, FY2011 vs FY2017.
    • Tree map: shows the top employers who submitted the greatest number of applications.
    • Side-by-side bar chart: shows an overall comparison of Data Scientist and Data Analyst.
    • Highlight table: shows the mean wage of a Data Scientist and Data Analyst with case status certified.
    • Bubble chart: shows the top 10 companies for Data Scientist and Data Analyst.

    Related Research

    • The H-1B Visa Debate, Explained - Harvard Business Review: https://hbr.org/2017/05/the-h-1b-visa-debate-explained
    • Foreign Labor Certification Data Center: https://www.foreignlaborcert.doleta.gov
    • Key facts about the U.S. H-1B visa program: http://www.pewresearch.org/fact-tank/2017/04/27/key-facts-about-the-u-s-h-1b-visa-program/
    • H1B visa News and Updates from The Economic Times: https://economictimes.indiatimes.com/topic/H1B-visa/news
    • H-1B visa - Wikipedia: https://en.wikipedia.org/wiki/H-1B_visa

    Key Findings

    • From the analysis, the government cut down the number of H1B approvals in 2017.
    • In the past decade, due to the demand for high-skilled workers, visa holders have clustered in STEM fields and come mostly from countries in Asia such as China and India.
    • Technical jobs such as Computer Systems Analyst and Software Developer make up the majority of the top 10 jobs among foreign workers.
    • Employers located in metro areas strive to find foreign workers who can fill the technical positions in their organizations.
    • States like California, New York, Washington, New Jersey, Massachusetts, Illinois, and Texas are prime locations for foreign workers and provide many job opportunities.
    • Top companies that submit the most H1B visa applications, such as Infosys, Tata, and IBM India, are based in India and associated with software and IT services.
    • The Data Scientist position has experienced exponential growth in H1B visa applications, with jobs clustered most heavily in the West region.

    Visualization utilizing programs

    HTML, JavaScript, CSS, D3.js, Google API, Python, R, and Tableau
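    For readers reproducing parts of the analysis in pandas rather than the HTML/D3.js front end, here is a minimal sketch of the aggregations behind the tree map and line chart. The column names (EMPLOYER_NAME, WORKSITE, YEAR) follow the Kaggle H1B dataset, and the file name is an assumption; verify both against the downloaded file.

      # Sketch: top employers and petitions per state per year.
      # "h1b_kaggle.csv" and the exact column names are assumptions.
      import pandas as pd

      h1b = pd.read_csv("h1b_kaggle.csv")

      # Top 10 employers by number of petitions (tree map input)
      print(h1b["EMPLOYER_NAME"].value_counts().head(10))

      # Petitions filed per state per year (line chart input);
      # WORKSITE values look like "CITY, STATE", so take the part after the comma
      h1b["STATE"] = h1b["WORKSITE"].str.split(",").str[-1].str.strip()
      by_state_year = h1b.groupby(["YEAR", "STATE"]).size().unstack("STATE")
      print(by_state_year.head())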

  8. Heart_disease_patients_details

    • kaggle.com
    zip
    Updated Jul 22, 2021
    Cite
    Luv Harish Khati (2021). Heart_disease_patients_details [Dataset]. https://www.kaggle.com/luvharishkhati/heart-disease-patients-details
    Explore at:
    zip (3371 bytes)
    Dataset updated
    Jul 22, 2021
    Authors
    Luv Harish Khati
    Description

    Hello all. This dataset involves various factors affecting heart disease, and based upon those factors I have created histograms of the various columns of the table. A histogram is a bar-graph-like representation of data that buckets a range of outcomes into columns along the x-axis. The y-axis represents the count or percentage of occurrences in the data for each column and can be used to visualize the data distribution. Finally, I created a combined histogram of the entire table that covers all of the columns. Adding titles, x-axis names, y-axis names, sizes, and colors is also done in this notebook.
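    A minimal sketch of the histogram approach described above follows; the file name and the example column "age" are assumptions, so substitute any numeric column from the table.

      # Sketch: histogram of one column plus combined histograms of all columns.
      # "heart_disease_patients_details.csv" and the column "age" are assumed names.
      import pandas as pd
      import matplotlib.pyplot as plt

      df = pd.read_csv("heart_disease_patients_details.csv")

      # Single-column histogram with title, axis names, size and color
      df["age"].plot(kind="hist", bins=20, color="steelblue", figsize=(8, 5),
                     title="Distribution of patient age")
      plt.xlabel("Age")
      plt.ylabel("Count")
      plt.show()

      # Combined histograms for every numeric column in the table
      df.hist(figsize=(12, 8))
      plt.tight_layout()
      plt.show()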

  9. Law and Order TV Series Dataset

    • kaggle.com
    zip
    Updated Dec 8, 2023
    Cite
    The Devastator (2023). Law and Order TV Series Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/law-and-order-tv-series-dataset
    Explore at:
    zip (1443584 bytes)
    Dataset updated
    Dec 8, 2023
    Authors
    The Devastator
    Description

    Law and Order TV Series Dataset

    Law and Order TV Series Data

    By Gove Allen [source]

    About this dataset

    The Law and Order Dataset is a comprehensive collection of data related to the popular television series Law and Order that aired from 1990 to 2010. This dataset, compiled by IMDB.com, provides detailed information about each episode of the show, including its title, summary, airdate, director, writer, guest stars, and IMDb rating.

    With over 450 episodes spanning 20 seasons of the original series as well as its spin-offs like Law and Order: Special Victims Unit, this dataset offers a wealth of information for analyzing various facets of criminal justice and law enforcement portrayed in the show. Whether you are a student or researcher studying crime-related topics or simply an avid fan interested in exploring behind-the-scenes details about your favorite episodes or actors involved in them, this dataset can be a valuable resource.

    By examining this extensive collection of data using SQL queries or other analytical techniques, one can gain insights into patterns such as common tropes used in different seasons or characters that appeared most frequently throughout the series. Additionally, researchers can investigate correlations between factors like episode directors/writers and their impact on viewer ratings.

    This dataset allows users to dive deep into analyzing aspects like crime types covered within episodes (e.g., homicide cases versus white-collar crimes), how often certain guest stars made appearances (including famous actors who had early roles on the show), or which writers/directors contributed most consistently high-rated episodes. Such analyses provide opportunities for uncovering trends over time within Law and Order's narrative structure while also shedding light on societal issues addressed by the series.

    By making this dataset available for educational purposes at collegiate levels specifically aimed at teaching SQL skills—a powerful tool widely used in data analysis—the intention is to empower students with real-world examples they can explore hands-on while honing their database querying abilities. The graphical representation accompanying this dataset further enhances understanding by providing visualizations that illustrate key relationships between different variables.

    Whether you are a seasoned data analyst, a budding criminologist, or simply looking to understand the intricacies of one of the most successful crime dramas in television history, the Law and Order Dataset offers you a vast array of information ripe for exploration and analysis

    How to use the dataset

    Understanding the Columns

    Before diving into analyzing the data, it's important to understand what each column represents. Here is an overview:

    • Episode: The episode number within its respective season.
    • Title: The title of each episode.
    • Season: The season number in which each episode belongs.
    • Year: The year in which each episode was released.
    • Rating: IMDB rating for each episode (on a scale from 0-10).
    • Votes: Number of votes received by each episode on IMDB.
    • Description: Brief summary or description of each episode's plot.
    • Director: Director(s) responsible for directing an episode.
    • Writers: Writer(s) credited for writing an episode.
    • Stars : Actor(s) who starred in an individual episode.

    Exploring Episode Data

    The dataset allows you to explore various aspects of individual episodes as well as broader trends throughout different seasons:

    1. Analyzing Ratings:

    - You can examine how ratings vary across seasons using aggregation functions like average (AVG), minimum (MIN), maximum (MAX), etc., depending on your analytical goals.
    - Identify popular episodes by sorting based on highest ratings or most votes received.
    

    2. Trends over Time:

    - Investigate how ratings have changed over time by visualizing them using line charts or bar graphs based on release years or seasons.
    - Examine if there are any significant fluctuations in ratings across different seasons or years.
    

    3. Directors and Writers:

    - Identify episodes directed by a specific director or written by particular writers by filtering the dataset based on their names.
    - Analyze the impact of different directors or writers on episode ratings.
    

    4. Popular Actors:

    - Explore episodes featuring popular actors from the show such as Mariska Hargitay (Olivia Benson), Christopher Meloni (Elliot Stabler), etc.
    - Investigate whether episodes with popular actors received higher ratings compared to ...
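    A minimal pandas sketch of points 1 and 2 above is given below; it assumes the columns listed earlier (Season, Year, Rating, Votes, Title) and a hypothetical file name, so verify both against the actual download.

      # Sketch: ratings by season, most popular episodes, and rating trend over time.
      import pandas as pd
      import matplotlib.pyplot as plt

      eps = pd.read_csv("law_and_order_episodes.csv")  # hypothetical file name

      # 1. Ratings by season (average, minimum, maximum)
      print(eps.groupby("Season")["Rating"].agg(["mean", "min", "max"]))

      # Most popular episodes by rating and by votes
      print(eps.sort_values("Rating", ascending=False)[["Title", "Season", "Rating"]].head(10))
      print(eps.sort_values("Votes", ascending=False)[["Title", "Season", "Votes"]].head(10))

      # 2. Trend of average rating over release years
      eps.groupby("Year")["Rating"].mean().plot(kind="line", title="Average rating by year")
      plt.show()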
    
  10. Global Land and Surface Temperature Trends

    • kaggle.com
    zip
    Updated Jan 11, 2023
    Cite
    The Devastator (2023). Global Land and Surface Temperature Trends [Dataset]. https://www.kaggle.com/datasets/thedevastator/global-land-and-surface-temperature-trends-analy
    Explore at:
    zip (16000936 bytes)
    Dataset updated
    Jan 11, 2023
    Authors
    The Devastator
    Description

    Global Land and Surface Temperature Trends Analysis

    Assessing climate change year by year

    By IBM Watson AI XPRIZE - Environment [source]

    About this dataset

    This dataset from Kaggle contains global land and surface temperature data from major cities around the world. By relying on the raw temperature reports that form the foundation of their averaging system, researchers are able to accurately track climate change over time. With this dataset, we can observe monthly averages and create detailed gridded temperature fields to analyze localized data on a country-by-country basis. The information in this dataset has allowed us to gain a better understanding of our changing planet and how certain regions are being impacted more than others by climate change. With such insights, we can look towards developing better responses and strategies as our temperatures continue to increase over time


    How to use the dataset

    Introduction

    This guide will show you how to use this dataset to explore global climate change trends over time.

    Exploring the Dataset

    • Select one or more countries with a filter such as df[df['Country']=='countryname'] to drop information that is not related to those countries.

    • Use df.groupby('City')['AverageTemperature'] to group the cities together with their respective average temperatures.

    • Compute basic summary statistics for each group, such as the mean or median, with df['AverageTemperature'].mean() or df['AverageTemperature'].median(), according to your statistical requirements.

    • Plot a graph comparing these results as line plots or bar charts with the pandas plot function, such as df[column].plot(kind='line') or df[column].plot(kind='bar'), which can help visualize trends associated with these groups.

    You can also use the latitude/longitude coordinates provided along with every record to further decompose the records by location using the folium library in Python. Folium provides zoomable maps and many rendering options, such as mapping locations with different color shades and marker sizes based on the parameters you choose. These are just some ways you could visualize your data; there are plenty more possibilities!
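    A consolidated sketch of the steps above, assuming the by-country file and the column names listed in the Columns section of this record (dt, AverageTemperature, Country):

      # Sketch: filter to one country, aggregate by year, and plot the trend.
      import pandas as pd
      import matplotlib.pyplot as plt

      df = pd.read_csv("GlobalLandTemperaturesByCountry.csv", parse_dates=["dt"])

      # 1. Filter to one country and drop missing measurements
      india = df[df["Country"] == "India"].dropna(subset=["AverageTemperature"])

      # 2-3. Group by year and compute the mean temperature
      yearly = india.groupby(india["dt"].dt.year)["AverageTemperature"].mean()

      # 4. Plot the trend
      yearly.plot(kind="line", title="Average temperature in India by year")
      plt.xlabel("Year")
      plt.ylabel("Average temperature (°C)")
      plt.show()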

    Research Ideas

    • Analyzing temperature changes across different countries to identify regional climate trends and abnormalities.
    • Investigating how global warming is affecting urban areas by looking at the average temperatures of major cities over time.
    • Comparing historic average temperatures for a given region to current day average temperatures to quantify the magnitude of global warming in that region.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: Dataset copyright by authors.

    You are free to:
    • Share: copy and redistribute the material in any medium or format for any purpose, even commercially.
    • Adapt: remix, transform, and build upon the material for any purpose, even commercially.

    You must:
    • Give appropriate credit: provide a link to the license, and indicate if changes were made.
    • ShareAlike: you must distribute your contributions under the same license as the original.
    • Keep intact: all notices that refer to this license, including copyright notices.

    Columns

    File: GlobalLandTemperaturesByCountry.csv

    | Column name | Description |
    |:---|:---|
    | dt | Date of the temperature measurement. (Date) |
    | AverageTemperature | Average temperature for the given date. (Float) |
    | AverageTemperatureUncertainty | Uncertainty of the average temperature measurement. (Float) |
    | Country | Country where the temperature measurement was taken. (String) |

    File: GlobalLandTemperaturesByMajorCity.csv

    | Column name | Description |
    |:---|:---|
    | dt | Date... |

  11. US Recorded Music Revenue by Format

    • kaggle.com
    zip
    Updated Dec 19, 2023
    + more versions
    Cite
    The Devastator (2023). US Recorded Music Revenue by Format [Dataset]. https://www.kaggle.com/thedevastator/us-recorded-music-revenue-by-format
    Explore at:
    zip (21740 bytes)
    Dataset updated
    Dec 19, 2023
    Authors
    The Devastator
    Description

    US Recorded Music Revenue by Format

    Recorded music revenue in the US by format and week 10

    By Throwback Thursday [source]

    About this dataset

    This dataset offers a comprehensive analysis of recorded music revenue in the United States, specifically focusing on the 10th week of the year. The data is meticulously categorized based on different formats, shedding light on the diverse ways in which music is consumed and purchased. The dataset includes key columns that provide relevant information, such as Format, Year, Units, Revenue, and Revenue (Inflation Adjusted). These columns identify the specific format of music being consumed or purchased, the year in which the data was recorded, the number of units of music sold within each format category, and both the total revenue generated from sales and its inflation-adjusted amount. Analyzing this dataset, with its extensive information about recorded music revenue across formats during a specific week of each year in the US market, can help reveal meaningful patterns and trends that industry professionals can use to make informed decisions about marketing strategies or investments.

    How to use the dataset

    Introduction:

    • Familiarize Yourself with Columns:

      • Format: This column categorizes how music is consumed or purchased.
      • Year: This column represents the year when each data point was recorded.
      • Units: The number of units of music sold within a particular format during a given week.
      • Revenue: The total revenue generated from sales of music within a specific format during a given week.
      • Revenue (Inflation Adjusted): The total revenue generated from sales of music adjusted for inflation within a specific format during a given week.
    • Understanding Categorical Formats: In this dataset, formats refer to different ways in which music is consumed or purchased. Examples include physical formats like CDs and vinyl records, as well as digital formats such as downloads and streaming services.

    • Analyzing Trends over Time: By exploring data across multiple years, you can identify trends and patterns related to how formats have evolved over time. Use statistical techniques or visualization tools like line graphs or bar charts to gain insights into any fluctuations or consistent growth.

    • Comparing Units Sold vs Revenue Generated: Analyze both units sold and revenue generated columns simultaneously to understand if there are any significant differences between different formats' popularity versus their financial performance.

    • Examining Adjusted Revenue for Inflation Effects: Comparison between Revenue and Revenue (Inflation Adjusted) can provide insights into whether changes in revenue are due solely to changes in purchasing power caused by inflation or influenced by other factors affecting format popularity.

    • Identifying Format Preferences: Explore how units and revenue differ across various formats to determine whether consumer preferences are shifting towards digital formats or experiencing a resurgence in physical formats like vinyl.

    • Comparing Revenue Performance Between Formats: Use statistical analysis or data visualization techniques to compare revenue performance between different formats. Identify which format generates the highest revenue and whether there have been any changes in dominance over time.

    • Supplementary Research Opportunities: Combine this dataset with external sources on music industry trends, technological advancements, or major events like album releases to gain a deeper understanding of the factors influencing recorded music sales
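    As a concrete illustration of points 3 and 7, here is a minimal pandas sketch using the columns named in the description (Format, Year, Revenue); the CSV file name is an assumption.

      # Sketch: revenue by format over time and total revenue per format.
      import pandas as pd
      import matplotlib.pyplot as plt

      rev = pd.read_csv("us_recorded_music_revenue.csv")  # assumed file name

      # Revenue trend per format (one line per format)
      pivot = rev.pivot_table(index="Year", columns="Format", values="Revenue", aggfunc="sum")
      pivot.plot(kind="line", title="US recorded music revenue by format")
      plt.ylabel("Revenue")
      plt.show()

      # Which format has generated the most revenue overall?
      print(rev.groupby("Format")["Revenue"].sum().sort_values(ascending=False))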

    Research Ideas

    • Trend analysis: This dataset can be used to analyze the trends in recorded music revenue by format over the years. By examining the revenue and units sold for each format, one can identify which formats are growing in popularity and which ones are declining.
    • Comparison of revenue vs inflation-adjusted revenue: The dataset includes both total revenue and inflation-adjusted revenue for each format. This allows for a comparison of the actual revenue generated with the potential impact of inflation on that revenue. It can provide insights into whether the increase or decrease in revenue is solely due to changes in market demand or if it is influenced by changes in purchasing power.
    • Format preference analysis: By analyzing the units sold for each format, one can identify which formats are preferred by consumers during a particular week. This information can be useful for music industry professionals and marketers to under...
  12. dataset_for_sales

    • kaggle.com
    zip
    Updated Aug 29, 2023
    Cite
    Andri Lesmana (2023). dataset_for_sales [Dataset]. https://www.kaggle.com/datasets/andrilesmana/dataset-for-sales/discussion
    Explore at:
    zip (2504483 bytes)
    Dataset updated
    Aug 29, 2023
    Authors
    Andri Lesmana
    Description

    We start by cleaning our data. Tasks in this section include:
    - Dropping NaN values from the DataFrame
    - Removing rows based on a condition
    - Changing the type of columns (to_numeric, to_datetime, astype)

    Once we have cleaned up our data a bit, we move to the data exploration section. In this section we explore five high-level business questions related to our data:
    - What was the best month for sales? How much was earned that month?
    - What city sold the most product?
    - What time should we display advertisements to maximize the likelihood of customers buying products?
    - What products are most often sold together?
    - What product sold the most? Why do you think it sold the most?

    To answer these questions we walk through many different pandas and matplotlib methods, including:
    - Concatenating multiple CSVs together to create a new DataFrame (pd.concat)
    - Adding columns
    - Parsing cells as strings to make new columns (.str)
    - Using the .apply() method
    - Using groupby to perform aggregate analysis
    - Plotting bar charts and line graphs to visualize our results
    - Labeling our graphs
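    Here is a minimal sketch of the first question (best month for sales) that strings these methods together. The folder and column names ("Sales_Data", "Order Date", "Quantity Ordered", "Price Each") are assumptions; adjust them to the actual files.

      # Sketch: concatenate the CSVs, clean the columns, and find the best month for sales.
      import glob
      import pandas as pd
      import matplotlib.pyplot as plt

      # Concatenate multiple CSVs into one DataFrame (pd.concat)
      df = pd.concat(pd.read_csv(f) for f in glob.glob("Sales_Data/*.csv"))

      # Clean: drop NaN rows and fix column types
      df = df.dropna(how="all")
      df["Quantity Ordered"] = pd.to_numeric(df["Quantity Ordered"], errors="coerce")
      df["Price Each"] = pd.to_numeric(df["Price Each"], errors="coerce")
      df["Order Date"] = pd.to_datetime(df["Order Date"], errors="coerce")
      df = df.dropna(subset=["Order Date", "Quantity Ordered", "Price Each"])

      # Add columns and aggregate with groupby
      df["Sales"] = df["Quantity Ordered"] * df["Price Each"]
      df["Month"] = df["Order Date"].dt.month
      df.groupby("Month")["Sales"].sum().plot(kind="bar", title="Total sales by month")
      plt.ylabel("Sales")
      plt.show()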

  13. COVID-19 Global Case and Death Data

    • kaggle.com
    zip
    Updated Dec 4, 2023
    Cite
    The Devastator (2023). COVID-19 Global Case and Death Data [Dataset]. https://www.kaggle.com/datasets/thedevastator/covid-19-global-case-and-death-data
    Explore at:
    zip (81724234 bytes)
    Dataset updated
    Dec 4, 2023
    Authors
    The Devastator
    Description

    COVID-19 Global Case and Death Data

    Global COVID-19 Cases and Deaths Over Time

    By Coronavirus (COVID-19) Data Hub [source]

    About this dataset

    The COVID-19 Global Time Series Case and Death Data is a comprehensive collection of global COVID-19 case and death information recorded over time. This dataset includes data from various sources such as JHU CSSE COVID-19 Data and The New York Times.

    The dataset consists of several columns providing detailed information on different aspects of the COVID-19 situation. The COUNTRY_SHORT_NAME column represents the short name of the country where the data is recorded, while the Data_Source column indicates the source from which the data was obtained.

    Other important columns include Cases, which denotes the number of COVID-19 cases reported, and Difference, which indicates the difference in case numbers compared to the previous day. Additionally, there are columns such as CONTINENT_NAME, DATA_SOURCE_NAME, COUNTRY_ALPHA_3_CODE, COUNTRY_ALPHA_2_CODE that provide additional details about countries and continents.

    Furthermore, this dataset also includes information on deaths related to COVID-19. The column PEOPLE_DEATH_NEW_COUNT shows the number of new deaths reported on a specific date.

    To provide more context to the data, certain columns offer demographic details about locations. For instance, Population_Count provides population counts for different areas. Moreover, a FIPS code is available for provincial/state regions for identification purposes.

    It is important to note that this dataset covers both confirmed cases (Case_Type: confirmed) as well as probable cases (Case_Type: probable). These classifications help differentiate between various types of COVID-19 infections.

    Overall, this dataset offers a comprehensive picture of the global COVID-19 situation by providing accurate and up-to-date information on cases, deaths, demographic details (such as population count or FIPS code), source references (such as JHU CSSE or The New York Times), and geographical information (country names with alpha codes), making it useful for researchers studying patterns and trends associated with this pandemic.

    How to use the dataset

    • Understanding the Dataset Structure:

      • The dataset is available in two files: COVID-19 Activity.csv and COVID-19 Cases.csv.
      • Both files contain different columns that provide information about the COVID-19 cases and deaths.
      • Some important columns to look out for are:
        a. PEOPLE_POSITIVE_CASES_COUNT: the total number of confirmed positive COVID-19 cases.
        b. COUNTY_NAME: the name of the county where the data is recorded.
        c. PROVINCE_STATE_NAME: the name of the province or state where the data is recorded.
        d. REPORT_DATE: the date when the data was reported.
        e. CONTINENT_NAME: the name of the continent where the data is recorded.
        f. DATA_SOURCE_NAME: the name of the data source.
        g. PEOPLE_DEATH_NEW_COUNT: the number of new deaths reported on a specific date.
        h. COUNTRY_ALPHA_3_CODE: the three-letter alpha code representing the country.
        i. Lat, Long: latitude and longitude coordinates representing the location.
        j. Country_Region or COUNTRY_SHORT_NAME: the country or region where cases were reported.
    • Choosing Relevant Columns: It's important to determine which columns are relevant to your analysis or research question before proceeding with further analysis.

    • Exploring Data Patterns: Use various statistical techniques like summarizing statistics, creating visualizations (e.g., bar charts, line graphs), etc., to explore patterns in different variables over time or across regions/countries.

    • Filtering Data: You can filter your dataset based on specific criteria using column(s) such as COUNTRY_SHORT_NAME, CONTINENT_NAME, or PROVINCE_STATE_NAME to focus on specific countries, continents, or regions of interest.

    • Combining Data: You can combine data from different sources (e.g., COVID-19 cases and deaths) to perform advanced analysis or create insightful visualizations.

    • Analyzing Trends: Use the dataset to analyze and identify trends in COVID-19 cases and deaths over time. You can examine factors such as population count, testing count, hospitalization count, etc., to gain deeper insights into the impact of the virus.

    • Comparing Countries/Regions: Compare COVID-19
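    A minimal sketch of the filtering and trend analysis steps above, assuming the column names listed earlier (REPORT_DATE, COUNTRY_SHORT_NAME, PEOPLE_DEATH_NEW_COUNT) appear in COVID-19 Activity.csv; verify against the actual header.

      # Sketch: new COVID-19 deaths per day for a single country.
      import pandas as pd
      import matplotlib.pyplot as plt

      covid = pd.read_csv("COVID-19 Activity.csv", parse_dates=["REPORT_DATE"])

      # Filter to one country and aggregate new deaths per day
      us = covid[covid["COUNTRY_SHORT_NAME"] == "United States"]
      deaths_per_day = us.groupby("REPORT_DATE")["PEOPLE_DEATH_NEW_COUNT"].sum()

      deaths_per_day.plot(kind="line", title="New COVID-19 deaths per day (United States)")
      plt.ylabel("New deaths")
      plt.show()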

    Research Ideas

    • Trend Analysis: This dataset can be used to analyze and track the trends of COVID-19 cases and deaths over time. It provides comprehensive global data, allowing researchers and po...
  14. Vitamin D Deficiency Dataset Lifestyle Data

    • kaggle.com
    zip
    Updated Aug 6, 2025
    Cite
    Vincent James (2025). Vitamin D Deficiency Dataset Lifestyle Data [Dataset]. https://www.kaggle.com/datasets/cabdimahomed/vitamin-d-deficiency-dataset-lifestyle-data
    Explore at:
    zip (4555951 bytes)
    Dataset updated
    Aug 6, 2025
    Authors
    Vincent James
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This file is a dataset likely used to study or predict Vitamin D deficiency using lifestyle and demographic data from different individuals.

    🧾 Possible Columns in the Dataset (Example):

    • Age: age of the individual
    • Gender: Male or Female
    • BMI: Body Mass Index (based on height and weight)
    • Sun Exposure: amount of daily sunlight exposure
    • Diet Type: type of diet followed (e.g., vegetarian, balanced)
    • Physical Activity: level of physical exercise per day/week
    • Vitamin D Level: blood vitamin D level (e.g., Normal, Deficient, Insufficient)

    🎯 Purpose of the Dataset: This dataset can be used to:

    Analyze how lifestyle choices impact Vitamin D levels

    Conduct health research

    Train machine learning models to predict if a person is at risk of Vitamin D deficiency

    🔬 Example Insights You Can Discover: Whether people under 30 have less sun exposure

    If females are more likely to be deficient

    How diet and physical activity affect Vitamin D levels

    ✅ What You Can Do with It: Summary statistics

    Build prediction models (e.g., using machine learning)

    Visualizations like:

    Bar graphs (e.g., deficiency by gender)

    Pie charts (e.g., distribution of vitamin D levels)

    Correlation heatmaps (e.g., link between BMI and deficiency)
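    A minimal sketch of two of the visualizations listed above, assuming the example columns described earlier (Gender, Vitamin D Level) and a hypothetical file name:

      # Sketch: deficiency status by gender (bar) and overall status distribution (pie).
      import pandas as pd
      import matplotlib.pyplot as plt

      vd = pd.read_csv("vitamin_d_deficiency.csv")  # hypothetical file name

      # Bar graph: vitamin D status counts by gender
      counts = vd.groupby(["Gender", "Vitamin D Level"]).size().unstack(fill_value=0)
      counts.plot(kind="bar", title="Vitamin D status by gender")
      plt.ylabel("Number of individuals")
      plt.show()

      # Pie chart: distribution of vitamin D levels
      vd["Vitamin D Level"].value_counts().plot(kind="pie", autopct="%1.0f%%",
                                                title="Distribution of vitamin D levels")
      plt.ylabel("")
      plt.show()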

  15. User Profile for Ads Project in Power BI

    • kaggle.com
    zip
    Updated Jul 4, 2024
    Cite
    Sanjana Murthy (2024). User Profile for Ads Project in Power BI [Dataset]. https://www.kaggle.com/datasets/sanjanamurthy392/user-profile-for-ads-project-in-power-bi/code
    Explore at:
    zip (784750 bytes)
    Dataset updated
    Jul 4, 2024
    Authors
    Sanjana Murthy
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    About Dataset:

    Domain: Marketing
    Project: User Profiling and Segmentation
    Dataset: user_profile_for_ads
    Dataset Type: Excel Data
    Dataset Size: 16k+ records

    KPI's:

    1. Distribution of Key Demographic Variables:
       a. Count of Age
       b. Count of Gender
       c. Count of Education Level
       d. Count of Income Level
       e. Count of Device Usage

    2. Understanding Online Behavior:
       a. Count of Time Spent Online (hrs/Weekday)
       b. Count of Time Spent Online (hrs/Weekend)

    3. Ad Interaction Metrics:
       a. Count of Likes and Reactions
       b. Count of Click-Through Rates (CTR)
       c. Count of Conversion Rate
       d. Count of Ad Interaction Time (secs)
       e. Count of Ad Interaction Time by Top Interests

    Process:
    1. Understanding the problem
    2. Data collection
    3. Exploring and analyzing the data
    4. Interpreting the results

    The report built on this data contains a stacked column chart, a stacked bar chart, a pie chart, a dashboard, slicers, and a page navigation button.

  16. Restaurant Dish Orders in Power BI

    • kaggle.com
    zip
    Updated Oct 30, 2024
    Cite
    Fords (2024). Restaurant Dish Orders in Power BI [Dataset]. https://www.kaggle.com/datasets/fords001/restaurant-dish-orders
    Explore at:
    zip (620177 bytes)
    Dataset updated
    Oct 30, 2024
    Authors
    Fords
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    In this data analysis, I used the 'Restaurant Orders' dataset from https://mavenanalytics.io/data-playground, which has a Public Domain license. Public domain work is free for anyone to use for any purpose without restriction under copyright law; it is the most open/free form of licensing, since no one owns or controls the material in any way. The 'Restaurant Orders' dataset has 3 dataframes in CSV format: 'restaurant_db_data_dictionary.csv', which describes the relationships between the tables; 'order_details.csv', which has the columns order_details_id, order_id, order_date, order_time, and item_id; and 'menu_items.csv', which has the columns menu_item_id, item_name, category, and price.

    Using these 3 dataframes we will create a new dataframe, 'order_details_table' (the result dataframe in the Power BI file restaurant_orders_result.pbix). Based on this new dataframe, we will generate various chart visualizations in the file restaurant_orders_result_charts.pbix and also attach the charts here. Below is a more detailed description of how I created the new dataframe 'order_details_table' and the visualizations, including bar charts and pie charts.

    I will use Power BI in this project.
    1. Delete all rows where the value is 'NULL' in the column 'item_id' of the dataframe 'order_details'. For this, I use the Power Query Editor and the 'Keep Rows' function, keeping all rows except the 'NULL' values.
    2. Combine the two columns 'order_date' and 'order_time' into one column 'order_date_time' in the format MM/DD/YY HH:MM:SS.
    3. Merge the two dataframes into one dataframe 'order_details_table' using the 'Merge Queries' function in the Power Query Editor, choosing an inner join (only matching rows). In the dataframe 'restaurant_db_data_dictionary.csv' we find that the column 'item_id' from the 'order_details' table matches 'menu_item_id' in the 'menu_items' table, so we combine the two tables on the common id columns 'menu_item_id' and 'item_id'.
    4. Remove the columns that we don't need and create a new 'order_id' with a unique number for each order.

    As a result, the new dataframe 'order_details_table' has 6 columns:
    • order_details_id: a unique identifier for each dish within an order
    • order_id: the unique identifier for each order or transaction
    • order_date_time: the date and time when the order was created (MM/DD/YY HH:MM:SS)
    • menu_item_category: the category to which the dish belongs
    • menu_item_name: the name of the dish on the menu
    • menu_item_price: the price of the dish
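    For readers who prefer code to Power Query, here is a pandas sketch of the same transformation (the project itself performs these steps in Power BI); the file and column names follow the description above.

      # Sketch: reproduce the 'order_details_table' join in pandas.
      import pandas as pd

      order_details = pd.read_csv("order_details.csv")
      menu_items = pd.read_csv("menu_items.csv")

      # 1. Drop rows where item_id is missing or the literal string 'NULL'
      order_details = order_details[order_details["item_id"].notna()]
      order_details = order_details[order_details["item_id"].astype(str) != "NULL"]

      # 2. Combine order_date and order_time into one datetime column
      order_details["order_date_time"] = pd.to_datetime(
          order_details["order_date"].astype(str) + " " + order_details["order_time"].astype(str)
      )

      # 3. Inner join on item_id / menu_item_id
      merged = order_details.merge(menu_items, left_on="item_id",
                                   right_on="menu_item_id", how="inner")

      # 4. Keep and rename the columns described above
      order_details_table = merged[["order_details_id", "order_id", "order_date_time",
                                    "category", "item_name", "price"]].rename(
          columns={"category": "menu_item_category",
                   "item_name": "menu_item_name",
                   "price": "menu_item_price"})
      print(order_details_table.head())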

    Table 'order_details_table' from the Power BI file restaurant_orders_result.pbix: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F13670445%2F1098315c0e34255b67ad3419aa113bf0%2Fdataframe.png?generation=1730269164808705&alt=media

    I have also created bar charts and pie charts to display the results from the new dataframe. These plots are included in the file ‘restaurant_orders_result_charts.pbix’ . And you can find pictures of charts below.

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F13670445%2F4254696bbd3d7e0fc5f456c226c39114%2Fpicture_1.png?generation=1730269227195114&alt=media

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F13670445%2F71092cf769862cf7364fe1ccac9fad83%2Fpicture_2.png?generation=1730269249147687&alt=media

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F13670445%2F528ef51ecf21f006b0c21b65503e03fa%2Fpicture_3.png?generation=1730269284640753&alt=media

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F13670445%2F147c240da4be5bfe9da057a8bc5d5939%2Fpicture_4.png?generation=1730269300346146&alt=media

    I also attached the original and new files to this project, thank you.

  17. Credit Rating History Dataset

    • kaggle.com
    zip
    Updated Dec 4, 2023
    Cite
    The Devastator (2023). Credit Rating History Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/credit-rating-history-dataset
    Explore at:
    zip (26498 bytes)
    Dataset updated
    Dec 4, 2023
    Authors
    The Devastator
    Description

    Credit Rating History Dataset

    Credit Rating History

    By Center for Municipal Finance [source]

    About this dataset

    The project that led to the creation of this dataset received funding from the Center for Corporate and Securities Law at the University of San Diego School of Law. The dataset itself can be accessed through a GitHub repository or on its dedicated website.

    In terms of columns contained in this dataset, it encompasses a range of variables relevant to analyzing credit ratings. However, specific details about these columns are not provided in the given information. To acquire a more accurate understanding of the column labels and their corresponding attributes or measurements present in this dataset, further exploration or referencing additional resources may be required

    How to use the dataset

    • Understanding the Data

      The dataset consists of several columns that provide essential information about credit ratings and fixed income securities. Familiarize yourself with the column names and their meanings to better understand the data:

      • Column 1: [Credit Agency]
      • Column 2: [Issuer Name]
      • Column 3: [CUSIP/ISIN]
      • Column 4: [Rating Type]
      • Column 5: [Rating Source]
      • Column 6: [Rating Date]
    • Exploratory Data Analysis (EDA)

      Before diving into detailed analysis, start by performing exploratory data analysis to get an overview of the dataset.

      • Identify Unique Values: Explore each column's unique values to understand rating agencies, issuers, rating types, sources, etc.

      • Frequency Distribution: Analyze the frequency distribution of various attributes like credit agencies or rating types to identify any imbalances or biases in the data.

    • Data Visualization

      Visualizing your data can provide insights that are difficult to derive from tabular representation alone. Utilize various visualization techniques such as bar charts, pie charts, histograms, or line graphs based on your specific objectives.

      For example:

      • Plotting a histogram of each credit agency's ratings can help you understand their distribution across different categories.
      • A time-series line graph can show how ratings have evolved over time for specific issuers or industries.
    • Analyzing Ratings Performance

      One of the main objectives of using credit rating datasets is to assess the performance and accuracy of different credit agencies. Conducting a thorough analysis can help you understand how ratings have changed over time and evaluate the consistency of each agency's ratings.

      • Rating Changes Over Time: Analyze how ratings for specific issuers or industries have changed over different periods.

      • Comparing Rating Agencies: Compare ratings from different agencies to identify any discrepancies or trends. Are there consistent differences in their assessments?

    • Detecting Rating Trends

      The dataset allows you to detect trends and correlations between factors such as rating agency, issuer, and rating date over time. A short pandas sketch of these exploration and visualization steps is given below.
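
    Here is a minimal pandas sketch of the exploration, visualization, and agency-comparison steps above. The file name is an assumption, and the column labels follow the list in “Understanding the Data”; adjust both to the actual CSV in the download.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Assumed file name for the credit rating history CSV.
df = pd.read_csv("credit_rating_history.csv", parse_dates=["Rating Date"])

# Identify unique values and frequency distributions.
print(df["Credit Agency"].value_counts())
print(df["Rating Type"].value_counts())

# Compare agencies: number of rating actions per agency per year.
actions = (
    df.groupby([df["Rating Date"].dt.year, "Credit Agency"])
    .size()
    .unstack(fill_value=0)
)
actions.plot(kind="line", marker="o")
plt.ylabel("Rating actions")
plt.title("Rating activity by agency over time")
plt.tight_layout()
plt.show()
```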

    Research Ideas

    • Credit Rating Analysis: This dataset can be used for analyzing credit ratings and trends of various fixed income securities. It provides historical credit rating data from different rating agencies, allowing researchers to study the performance, accuracy, and consistency of these ratings over time.
    • Comparative Analysis: The dataset allows for comparative analysis between different agencies' credit ratings for a specific security or issuer. Researchers can compare the ratings assigned by different agencies and identify any discrepancies or differences in their assessments. This analysis can help in understanding variations in methodologies and improving the transparency of credit rating processes.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data source: Center for Municipal Finance.

    License

    License: Dataset copyright by authors.
    You are free to:
    • Share: copy and redistribute the material in any medium or format for any purpose, even commercially.
    • Adapt: remix, transform, and build upon the material for any purpose, even commercially.
    You must:
    • Give appropriate credit: provide a link to the license, and indicate if changes were made.
    • ShareAlike: distribute your contributions under the same license as the original.
    • Keep intact: all ...

  18. US Tobacco Use Prevalence

    • kaggle.com
    zip
    Updated Dec 19, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    The Devastator (2023). US Tobacco Use Prevalence [Dataset]. https://www.kaggle.com/datasets/thedevastator/us-tobacco-use-prevalence
    Explore at:
    zip(32112 bytes)Available download formats
    Dataset updated
    Dec 19, 2023
    Authors
    The Devastator
    Description

    US Tobacco Use Prevalence

    US Tobacco Use Prevalence by Year, State, Type, and Age

    By Throwback Thursday [source]

    About this dataset

    This dataset contains comprehensive information on tobacco use in the United States from 2011 to 2016. The data is sourced from the CDC Behavioral Risk Factor Survey, a reliable and extensive survey that captures important data about tobacco use behaviors across different states in the United States.

    The dataset includes various key variables such as the year of data collection, state abbreviation indicating where the data was collected, and specific tobacco types explored in the survey. It also provides valuable insight into the prevalence of tobacco use through quantitative measures represented by numeric values. The unit of measurement for these values, such as percentages or numbers, is included as well.

    Moreover, this dataset offers an understanding of how different age groups are affected by tobacco use, with age being categorized into distinct groups. This ensures that researchers and analysts can assess variations in tobacco consumption and its associated health implications across different age demographics.

    With all these informative attributes arranged in a convenient tabular format, this dataset serves as a valuable resource for investigating patterns and trends related to tobacco use within varying contexts over a six-year period.

    How to use the dataset


    Step 1: Familiarize Yourself with the Columns

    Before diving into any analysis, it is important to understand the structure of the dataset by familiarizing yourself with its columns. Here are the key columns in this dataset:

    • Year: The year in which the data was collected (Numeric)
    • State Abbreviation: The abbreviation of the state where the data was collected (String)
    • Tobacco Type: The type of tobacco product used (String)
    • Data Value: The percentage or number representing prevalence of tobacco use (Numeric)
    • Data Value Unit: The unit of measurement for data value (e.g., percentage, number) (String)
    • Age: The age group to which the data value corresponds (String)

    Step 2: Determine Your Research Questions or Objectives

    To make effective use of this dataset, it is essential to clearly define your research questions or objectives. Some potential research questions related to this dataset could be:

    • How has tobacco use prevalence changed over time?
    • Which states have the highest and lowest rates of tobacco use?
    • What are the most commonly used types of tobacco products?
    • Is there a correlation between age group and tobacco use?

    By defining your research questions or objectives upfront, you can focus your analysis accordingly.

    Step 3: Analyzing Trends Over Time

    To analyze trends over time using this dataset:
    • Group and aggregate the relevant columns, such as Year and Data Value.
    • Plot the data using line graphs or bar charts to visualize changes in tobacco use prevalence over time.
    • Interpret the trends and draw conclusions from your analysis.
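
    A minimal pandas sketch of this aggregation and plotting, assuming a placeholder file name and the column names from Step 1:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Assumed file name for the CDC prevalence data.
df = pd.read_csv("us_tobacco_use_prevalence.csv")

# Average prevalence (Data Value) per year across all states.
trend = df.groupby("Year")["Data Value"].mean()

trend.plot(kind="line", marker="o")
plt.ylabel("Average prevalence (Data Value)")
plt.title("Tobacco use prevalence over time")
plt.tight_layout()
plt.show()
```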

    Step 4: Comparing States

    To compare states and their tobacco use prevalence:
    • Group and aggregate the relevant columns, such as State Abbreviation and Data Value.
    • Sort the data based on prevalence rates to identify states with the highest and lowest rates of tobacco use.
    • Visualize this comparison using bar charts or maps for a clearer understanding.
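
    The state comparison could be sketched the same way (same placeholder file name as above):

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("us_tobacco_use_prevalence.csv")  # assumed file name

# Average prevalence per state, sorted from highest to lowest.
by_state = (
    df.groupby("State Abbreviation")["Data Value"]
    .mean()
    .sort_values(ascending=False)
)
print(by_state.head(5))  # states with the highest rates
print(by_state.tail(5))  # states with the lowest rates

by_state.plot(kind="bar", figsize=(12, 4))
plt.ylabel("Average prevalence (Data Value)")
plt.tight_layout()
plt.show()
```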

    Step 5: Understanding Tobacco Types

    To gain insights into the different types of tobacco products used:
    • Analyze the Tobacco Type column to identify which products are reported most often.
    • Compare prevalence values (Data Value) across tobacco types, for example with a bar chart.
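
    A short sketch of the breakdown by tobacco type (same placeholder file name as above):

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("us_tobacco_use_prevalence.csv")  # assumed file name

# Average prevalence for each tobacco type, highest first.
by_type = df.groupby("Tobacco Type")["Data Value"].mean().sort_values(ascending=False)

by_type.plot(kind="bar")
plt.ylabel("Average prevalence (Data Value)")
plt.title("Prevalence by tobacco type")
plt.tight_layout()
plt.show()
```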

    Research Ideas

    • Analyzing trends in tobacco use: This dataset can be used to analyze the prevalence of tobacco use over time and across different states. It can help identify patterns and trends in tobacco consumption, which can be valuable for public health research and policy-making.
    • Assessing the impact of anti-smoking campaigns: Researchers or organizations working on anti-smoking campaigns can use this dataset to evaluate the effectiveness of their interventions. By comparing the data before and after a campaign, they can determine whether there has been a decrease in tobacco use and if specific groups or regions have responded better to the campaign.
    • Understanding demographic factors related to tobacco use: The dataset includes information on age groups, allowing for analysis of how different age demographics are affected by tobacco use. By examining variations in data values across age groups, researchers can identify which populations are most vulnerable to smoking-related issues and design targeted prevention programs.

  19. Myntra Dataset Analysis

    • kaggle.com
    zip
    Updated Sep 16, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Vivek Singh (2024). Myntra Dataset Analysis [Dataset]. https://www.kaggle.com/datasets/vivek052/myntra-dataset-analysis
    Explore at:
    zip(18601507 bytes)Available download formats
    Dataset updated
    Sep 16, 2024
    Authors
    Vivek Singh
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The dataset contains information on nearly 150,000 products listed on Myntra. Each entry includes:

    1. product_name: The name of the product.
    2. brand_name: The brand of the product.
    3. rating: The product's rating.
    4. rating_count: The number of ratings the product has received.
    5. marked_price: The original price of the product.
    6. discounted_price: The price after discount.
    7. sizes: Available sizes for the product.
    8. product_link: URL of the product.
    9. img_link: URL of the product image.
    10. product_tag: Tags associated with the product.

    This data has been scraped from the Myntra website.

    Data Analysis on Myntra Dataset

    Data analysis on the Myntra dataset, presented using pivot tables and an interactive dashboard.

    In this data analysis project, I undertook a comprehensive approach to enhance and visualize the Myntra real-time dataset. The key steps involved in the process were as follows:

    Data Cleaning and Preparation:

    Remove Unwanted Columns: I reviewed the dataset to identify and eliminate irrelevant columns, such as sizes and the discounted amount, that did not contribute to the analysis objectives. This step streamlined the dataset, keeping the focus on the most pertinent data.

    Data Cleaning: Addressed inconsistencies, missing values, and errors within the dataset. This involved standardizing data formats, correcting inaccuracies, and filling in or removing incomplete records to ensure the dataset's integrity.
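
    In pandas, the column removal and cleaning described above might look roughly like this sketch. The file name is an assumed placeholder, and the columns dropped are examples only; the actual column names come from the dataset description above.

```python
import pandas as pd

# Assumed placeholder file name for the scraped Myntra product data.
df = pd.read_csv("myntra_products.csv")

# Remove columns that are not needed for the analysis (example columns only).
df = df.drop(columns=["sizes", "img_link"], errors="ignore")

# Standardize numeric formats and handle missing or invalid values.
for col in ["rating", "rating_count", "marked_price", "discounted_price"]:
    df[col] = pd.to_numeric(df[col], errors="coerce")
df = df.dropna(subset=["product_name", "brand_name", "discounted_price"])
```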

    Data Analysis:

    Pivot Tables Creation: Developed pivot tables to summarize and analyze key metrics. This allowed for the aggregation of data across various dimensions such as product categories, sales performance, and customer demographics, providing insightful summaries and trends.
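
    The pivot-table summaries could be reproduced with pandas pivot_table, for example aggregating average rating and price by brand. This is a sketch under the same assumed file name, not the author's exact workbook.

```python
import pandas as pd

df = pd.read_csv("myntra_products.csv")  # assumed file name

# Summarize key metrics per brand, similar to an Excel pivot table.
brand_summary = pd.pivot_table(
    df,
    index="brand_name",
    values=["rating", "discounted_price", "rating_count"],
    aggfunc={"rating": "mean", "discounted_price": "mean", "rating_count": "sum"},
)
print(brand_summary.sort_values("rating_count", ascending=False).head(10))
```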

    Interactive Dashboard:

    Dashboard Development: Created an interactive dashboard to visualize real-time data. The dashboard includes dynamic charts, filters, and visualizations that let users interact with the dataset, supporting real-time insights and decision-making.

    Visualization: Implemented various chart types, such as bar charts and column charts, to communicate data trends and patterns effectively.

    Overall, this project aimed to deliver a clean, organized, and insightful view of the Myntra dataset through advanced analysis and interactive visualization techniques. The resulting dashboard offers a powerful tool for monitoring and analyzing real-time data, supporting data-driven decision-making processes.

  20. Power BI Sales Data

    • kaggle.com
    zip
    Updated May 8, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Sanjana Murthy (2024). Power BI Sales Data [Dataset]. https://www.kaggle.com/datasets/sanjanamurthy392/power-bi-sales-data
    Explore at:
    zip(7202740 bytes)Available download formats
    Dataset updated
    May 8, 2024
    Authors
    Sanjana Murthy
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This data is used to demonstrate the following Power BI features: Index, Text Box, Button, Slicer, Image, Card, Multi-row Card, Table, Matrix, Conditional Formatting, Stacked Column Chart, Clustered Column Chart, Stacked Bar Chart, 100% Stacked Column Chart, Background Image, Line Chart, Donut Chart, Gauge, Filters & Bookmarks, Maps, Scatter Chart, Anomalies, Tooltip, Animated Bar Chart Race, Enlighten Aquarium, Scroller, Measures, DAX (including ALL and SWITCH), Waterfall Chart, and Treemap.
