The data is taken from the MachineHack site. It addresses the problem of identifying traffic problems in metro cities, and of regulating the movement of cabs to bring those traffic problems under control.
Modern cities are changing. The rise of vehicular traffic has been changing the design of our cities. It is very important to know how traffic moves in a city and how it changes during different times in a week. Hence it is very important to analyse and gain insights from traffic data. We invite data scientists, analysts and people from all technical interests to analyse the traffic data from Bengaluru. The data gives us some information about how traffic moves from source to destination under various circumstances. The data is sourced from Uber Movement. Uber Movement provides anonymized data from over two billion trips to help urban planning around the world.
https://cdla.io/permissive-1-0/
This dataset was adapted from 'Credit Card Churn Prediction' (https://www.kaggle.com/datasets/anwarsan/credit-card-bank-churn) for visualization in our university project. We modified the customer information and spending behavior, and added revenue targets.
Scenario 🕶️
In 2019, the marketing team launched a campaign to attract millennial customers (born 1980-1996) with the goal of increasing revenue and enhancing the brand's appeal to a younger audience.
As the BI team, your task is to create a dashboard for users.
1. The Vice President of Sales wants to view the performance of the credit business.
2. The marketing team is interested in understanding customer segments and customer spending to measure Customer Lifetime Value (CLV) and Marketing Cost per Acquired Customer (MCAC).
⚠️Note: This is just a suggestion to guide the creation of the dashboard
Example in Tableau
Executive summary
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10099382%2F508a2d2d89dabdfd368743f86c2a71e1%2Fexecutive%20overview.JPG?generation=1696110593484137&alt=media
Customer behavior
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10099382%2F1e4a1f62a25eab3c6707d002243894c7%2Fcustomer_behaviour.JPG?generation=1696110689732332&alt=media
Click on any of the images below to explore an interactive data visualization:
This dataset was created by Ankush Tiwari
https://academictorrents.com/nolicensespecified
The Hubway trip history data includes every trip taken through Nov 2013, with date, time, origin and destination stations, plus the bike number and more. Data from 2011/07 through 2013/11.
The Hubway trip history data
Every time a Hubway user checks a bike out from a station, the system records basic information about the trip. Those anonymous data points have been exported into the spreadsheet. Please note, all private data including member names have been removed from these files.
What can the data tell us?
The CSV file contains data for every Hubway trip from the system launch on July 28th, 2011, through the end of September, 2012. The file contains the data points listed below for each trip. We've also posed some of the questions you could answer with this dataset; we're sure you'll have lots more of your own.
Duration: Duration of trip. What's the average trip duration for annual members vs. casual users?
Start date: Includes start date and time. What are the peak Hubway hours?
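The two questions posed above (average trip duration by rider type, and peak hours) can be sketched in a few lines of Python. This is only an illustration: the column names `duration`, `start_date` and `subsc_type`, the rider labels, the timestamp format, and the five sample rows are all assumptions made up for the example, so they will need adjusting to match the real CSV header.

```python
import csv
import io
from collections import Counter, defaultdict
from datetime import datetime

# Made-up sample rows; column names and labels are assumptions, not the real schema.
SAMPLE_CSV = """duration,start_date,subsc_type
600,2012-07-02 08:15:00,Registered
900,2012-07-02 08:40:00,Casual
300,2012-07-02 17:05:00,Registered
1500,2012-07-03 17:30:00,Casual
450,2012-07-03 17:45:00,Registered
"""

def summarize_trips(csv_text):
    """Return (mean duration per rider type, hour of day with the most trip starts)."""
    totals = defaultdict(lambda: [0.0, 0])  # rider type -> [sum of durations, trip count]
    trips_per_hour = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        bucket = totals[row["subsc_type"]]
        bucket[0] += float(row["duration"])
        bucket[1] += 1
        # Assumed timestamp format; change the pattern to match the actual file.
        started = datetime.strptime(row["start_date"], "%Y-%m-%d %H:%M:%S")
        trips_per_hour[started.hour] += 1
    mean_duration = {kind: total / count for kind, (total, count) in totals.items()}
    peak_hour = trips_per_hour.most_common(1)[0][0]
    return mean_duration, peak_hour

mean_duration, peak_hour = summarize_trips(SAMPLE_CSV)
print(mean_duration, peak_hour)
```

To run this against the actual export, read the file contents and pass them to `summarize_trips` in place of `SAMPLE_CSV`.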
This dataset was created by Ankush Tiwari
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by beautifulbanker
Released under CC0: Public Domain
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The purpose of this code is to produce a line graph visualization of COVID-19 data. This Jupyter notebook was built and run on Google Colab. This code will serve mostly as a guide and will need to be adapted where necessary to be run locally. The separate COVID-19 datasets uploaded to this Dataverse can be used with this code. This upload is made up of the IPYNB and PDF files of the code.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 1 row and is filtered where the book is Social data visualization with HTML5 and JavaScript : leverage the power of HTML5 and JavaScript to build compelling visualizations of social data from Twitter, Facebook, and more. It features 7 columns including author, publication date, language, and book publisher.
https://cubig.ai/store/terms-of-service
1) Data Introduction
• The Power BI Sample Data is a financial sample dataset provided for Power BI practice and data visualization exercises. It includes a variety of financial metrics and transaction information, including sales, profits, and expenses.
2) Data Utilization
(1) Characteristics of the Power BI Sample Data:
• The dataset consists of numerical and categorical variables such as transaction date, region, product category, sales, profit, and cost, and is well suited to aggregation, analysis, and visualization.
(2) The Power BI Sample Data can be used for:
• Sales and Profit Analysis: Analyze sales and profit data by region, product, and period to understand business performance and trends.
• Power BI Dashboard Practice: Use the financial metrics and transaction data to design and practice dashboards, reports, and visualization charts directly in Power BI.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Ryon James
Released under Apache 2.0
This site collates and visualizes critical indicators within the automotive vehicles and parts markets to enable firms to develop export strategies and identify target markets. These data include trade flows (exports and imports) of New Passenger Vehicles and Light Trucks, Medium- and Heavy-Duty Trucks, Used Vehicles, and Automotive Parts.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
As high-throughput methods become more common, training undergraduates to analyze data must include having them generate informative summaries of large datasets. This flexible case study provides an opportunity for undergraduate students to become familiar with the capabilities of R programming in the context of high-throughput evolutionary data collected using macroarrays. The story line introduces a recent graduate hired at a biotech firm and tasked with analysis and visualization of changes in gene expression from 20,000 generations of the Lenski Lab’s Long-Term Evolution Experiment (LTEE). Our main character is not familiar with R and is guided by a coworker to learn about this platform. Initially this involves a step-by-step analysis of the small Iris dataset built into R which includes sepal and petal length of three species of irises. Practice calculating summary statistics and correlations, and making histograms and scatter plots, prepares the protagonist to perform similar analyses with the LTEE dataset. In the LTEE module, students analyze gene expression data from the long-term evolutionary experiments, developing their skills in manipulating and interpreting large scientific datasets through visualizations and statistical analysis. Prerequisite knowledge is basic statistics, the Central Dogma, and basic evolutionary principles. The Iris module provides hands-on experience using R programming to explore and visualize a simple dataset; it can be used independently as an introduction to R for biological data or skipped if students already have some experience with R. Both modules emphasize understanding the utility of R, rather than creation of original code. Pilot testing showed the case study was well-received by students and faculty, who described it as a clear introduction to R and appreciated the value of R for visualizing and analyzing large datasets.
The level structure is harmonized according to the standards of the INSPIRE directive 2007/2/EC of 14 March 2007, starting from the regional information level "Administrative limits". The INSPIRE level with areal geometry consists of a hierarchical representation of the three types of administrative area present: 4thOrder (Municipality), 3rdOrder (Province), 2ndOrder (Region). The information level has been updated with the renaming of the Municipality of Ortonovo to Luni (L.R. n.5/2017) and the merger of the geometries of the municipalities of Montalto Ligure and Carpasio into the new municipality of Montalto Carpasio (L.R. 21/2017).
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Krimi Mohamed
Released under CC0: Public Domain
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
📌**Context**
The Healthcare Workforce Mental Health Dataset is designed to explore workplace mental health challenges in the healthcare industry, an environment known for high stress and burnout rates.
This dataset enables users to analyze key trends related to:
💠 Workplace Stressors: Examining the impact of heavy workloads, poor work environments, and emotional demands.
💠 Mental Health Outcomes: Understanding how stress and burnout influence job satisfaction, absenteeism, and turnover intention.
💠 Educational & Analytical Applications: A valuable resource for data analysts, students, and career changers looking to practice skills in data exploration and data visualization.
To help users gain deeper insights, this dataset is fully compatible with a Power BI Dashboard, available as part of a complete analytics bundle for enhanced visualization and reporting.
📌**Source**
This dataset was synthetically generated using the following methods:
💠 Python & Data Science Techniques: Probabilistic modeling to simulate realistic data distributions. Industry-informed variable relationships based on healthcare workforce studies.
💠 Guidance & Validation Using AI (ChatGPT): Assisted in refining dataset realism and logical mappings.
💠 Industry Research & Reports: Based on insights from WHO, CDC, OSHA, and academic studies on workplace stress and mental health in healthcare settings.
📌**Inspiration**
This dataset was inspired by ongoing discussions in healthcare regarding burnout, mental health, and staff retention. The goal is to bridge the gap between raw data and actionable insights by providing a structured, analyst-friendly dataset.
For those who want a ready-to-use reporting solution, a Power BI Dashboard Template is available, designed for interactive data exploration, workforce insights, and stress factor analysis.
📌**Important Note** This dataset is synthetic and intended for educational purposes only. It is not real-world employee data and should not be used for actual decision-making or policy implementation.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset contains 10,000 synthetic records simulating the migratory behavior of various bird species across global regions. Each entry represents a single bird tagged with a tracking device and includes detailed information such as flight distance, speed, altitude, weather conditions, tagging information, and migration outcomes.
The data was entirely synthetically generated using randomized yet realistic values based on known ranges from ornithological studies. It is ideal for practicing data analysis and visualization techniques without privacy concerns or real-world data access restrictions. Because it’s artificial, the dataset can be freely used in education, portfolio projects, demo dashboards, machine learning pipelines, or business intelligence training.
With over 40 columns, this dataset supports a wide array of analysis types. Analysts can explore questions like “Do certain species migrate in larger flocks?”, “How does weather impact nesting success?”, or “What conditions lead to migration interruptions?”. Users can also perform geospatial mapping of start and end locations, cluster birds by behavior, or build time series models based on migration months and environmental factors.
For data visualization, tools like Power BI, Python (Matplotlib/Seaborn/Plotly), or Excel can be used to create insightful dashboards and interactive charts.
Join the Fabric Community DataViz Contest | May 2025: https://community.fabric.microsoft.com/t5/Power-BI-Community-Blog/%EF%B8%8F-Fabric-Community-DataViz-Contest-May-2025/ba-p/4668560
The following datasets were each created and used to create the data visualizations (see https://www.lukas-grosserhode.com/). Raw data sets:
This dataset is based on train and test dataset from this competition: https://www.kaggle.com/competitions/widsdatathon2024-challenge1 .
What did I change?
1. I dropped 2 columns that contained too little data.
2. Using machine learning, I imputed "payer_type", "patient_race" and "bmi".
3. Using "patient_zip3", I filled missing values in "patient_state", "Region" and "Division".
4. Using SimpleImputer, I imputed the few missing numeric values in "Ozone", "PM2.5" and other columns.
5. I created some new features, based on demographic features, that may be a bit more informative.
6. I tokenized the 'breast_cancer_diagnosis_desc' column
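Steps 3 and 4 can be illustrated with a small plain-Python sketch. It is not the code from the notebooks: the four records are made-up values, the helper names `fill_state_from_zip3` and `impute_median` are hypothetical, and the median is only an assumed fill strategy (scikit-learn's `SimpleImputer` supports it via `strategy="median"`, but the source does not say which strategy was used).

```python
from statistics import median

# Made-up records for illustration; only the column names come from the dataset.
records = [
    {"patient_zip3": "100", "patient_state": "NY", "Ozone": 40.0},
    {"patient_zip3": "100", "patient_state": None, "Ozone": None},
    {"patient_zip3": "900", "patient_state": "CA", "Ozone": 44.0},
    {"patient_zip3": "900", "patient_state": None, "Ozone": 42.0},
]

def fill_state_from_zip3(rows):
    """Fill a missing patient_state from other rows that share the same patient_zip3."""
    lookup = {}
    for row in rows:
        if row["patient_state"] is not None:
            # Keep the first state seen for each zip3 prefix.
            lookup.setdefault(row["patient_zip3"], row["patient_state"])
    for row in rows:
        if row["patient_state"] is None:
            # Stays None when no row with this zip3 has a known state.
            row["patient_state"] = lookup.get(row["patient_zip3"])
    return rows

def impute_median(rows, column):
    """Replace missing values in a numeric column with the median of the observed values."""
    observed = [row[column] for row in rows if row[column] is not None]
    fill_value = median(observed)
    for row in rows:
        if row[column] is None:
            row[column] = fill_value
    return rows

fill_state_from_zip3(records)
impute_median(records, "Ozone")
print(records)
```

The same lookup idea would extend to "Region" and "Division", provided each zip3 prefix maps to a single value in the rows where it is known.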
If you're interested in how I did that, check this notebook: https://www.kaggle.com/code/anopsy/ml-for-missing-values . For "bmi" and the new features, check this one: https://www.kaggle.com/code/anopsy/fe-and-xgb-on-clean-data
According to the description of the original dataset, it's a "39k record dataset (split into training and test sets) representing patients and their characteristics (age, race, BMI, zip code), their diagnosis and treatment information (breast cancer diagnosis code, metastatic cancer diagnosis code, metastatic cancer treatments, … etc.), their geo (zip-code level) demographic data (income, education, rent, race, poverty, …etc), as well as toxic air quality data (Ozone, PM25 and NO2)."
In this project, I used a dataset of a company's bike sales, cleaned the data in Excel, created some pivot tables of interesting insights, and used the pivot tables to create a dashboard.