45 datasets found
  1. Data Cleaning Sample

    • borealisdata.ca
    Updated Jul 13, 2023
    Cite
    Rong Luo (2023). Data Cleaning Sample [Dataset]. http://doi.org/10.5683/SP3/ZCN177
    Explore at:
    Croissant
    Dataset updated
    Jul 13, 2023
    Dataset provided by
    Borealis
    Authors
    Rong Luo
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Sample data for exercises in Further Adventures in Data Cleaning.

  2. Data Cleaning Excel Tutorial

    • kaggle.com
    Updated Jul 22, 2023
    Cite
    Mohamed Khaled Idris (2023). Data Cleaning Excel Tutorial [Dataset]. https://www.kaggle.com/datasets/mohamedkhaledidris/data-cleaning-excel-tutorial
    Explore at:
    Croissant
    Dataset updated
    Jul 22, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Mohamed Khaled Idris
    Description

    This dataset was created by Mohamed Khaled Idris.

  3. Navigating Stats Can Data & Scrubbing Data Clean with Excel Workshop

    • search.dataone.org
    • borealisdata.ca
    Updated Jul 31, 2024
    Cite
    Costanzo, Lucia; Jadon, Vivek (2024). Navigating Stats Can Data & Scrubbing Data Clean with Excel Workshop [Dataset]. http://doi.org/10.5683/SP3/FF6AI9
    Explore at:
    Dataset updated
    Jul 31, 2024
    Dataset provided by
    Borealis
    Authors
    Costanzo, Lucia; Jadon, Vivek
    Description

    Ahoy, data enthusiasts! Join us for a hands-on workshop where you will hoist your sails and navigate through the Statistics Canada website, uncovering hidden treasures in the form of data tables. With the wind at your back, you’ll master the art of downloading these invaluable Stats Can datasets while braving the occasional squall of data cleaning challenges using Excel with your trusty captains Vivek and Lucia at the helm.

  4. Excel-project: Glassdoor Data Cleaning

    • kaggle.com
    Updated Sep 26, 2023
    Cite
    Luis Lira (2023). Excel-project: Glassdoor Data Cleaning [Dataset]. https://www.kaggle.com/datasets/luisliraportfolio/excel-project-clean-dataset
    Explore at:
    Croissant
    Dataset updated
    Sep 26, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Luis Lira
    Description

    This dataset was created by Luis Lira.

  5. Data from: Facebook Data for Sentiment Analysis

    • live.european-language-grid.eu
    • lindat.mff.cuni.cz
    • +1more
    binary format
    Updated Jul 16, 2013
    Cite
    (2013). Facebook Data for Sentiment Analysis [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/1057
    Explore at:
    Available download formats: binary format
    Dataset updated
    Jul 16, 2013
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0), https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Corpus consisting of 10,000 Facebook posts manually annotated for sentiment (2,587 positive, 5,174 neutral, 1,991 negative and 248 bipolar posts). The archive contains data and statistics in an Excel file (FBData.xlsx) and gold data in two text files with posts (gold-posts.txt) and labels (gold-labels.txt) on corresponding lines.
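    Since the gold data are two plain-text files with posts and labels on corresponding lines, they can be paired with a few lines of Python. A minimal sketch, assuming the files sit in the working directory and are UTF-8 encoded (the label codes themselves are not documented in this listing):

    from collections import Counter

    # The two files align line by line: one post per line, one label per line.
    with open("gold-posts.txt", encoding="utf-8") as f_posts, \
         open("gold-labels.txt", encoding="utf-8") as f_labels:
        pairs = [(post.rstrip("\n"), label.strip()) for post, label in zip(f_posts, f_labels)]

    print(len(pairs))                             # expected: 10,000 posts
    print(Counter(label for _, label in pairs))   # sentiment label distribution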

  6. Data from: Cleaning Data with Open Refine

    • explore.openaire.eu
    Updated Jan 1, 2016
    Cite
    Dr Richard Berry; Dr Luc Small; Dr Jeff Christiansen (2016). Cleaning Data with Open Refine [Dataset]. http://doi.org/10.5281/zenodo.6423839
    Explore at:
    Dataset updated
    Jan 1, 2016
    Authors
    Dr Richard Berry; Dr Luc Small; Dr Jeff Christiansen
    Description

    About this course

    Do you have messy data from multiple inconsistent sources, or open-responses to questionnaires? Do you want to improve the quality of your data by refining it and using the power of the internet? Open Refine is the perfect partner to Excel. It is a powerful, free tool for exploring, normalising and cleaning datasets, and extending data by accessing the internet through APIs. In this course we’ll work through the various features of Refine, including importing data, faceting, clustering, and calling remote APIs, by working on a fictional but plausible humanities research project.

    Learning Outcomes

    • Download, install and run Open Refine
    • Import data from csv, text or online sources and create projects
    • Navigate data using the Open Refine interface
    • Explore data by using facets
    • Clean data using clustering
    • Parse data using GREL syntax
    • Extend data using Application Programming Interfaces (APIs)
    • Export project for use in other applications

    Prerequisites

    The course has no prerequisites.

    Licence

    Copyright © 2021 Intersect Australia Ltd. All rights reserved.

  7. popular baby names with data cleaning

    • kaggle.com
    Updated Jun 11, 2023
    Cite
    Real Sourabh Singhal (2023). popular baby names with data cleaning [Dataset]. https://www.kaggle.com/datasets/realsourabhsinghal/popular-baby-names-with-data-cleaning/code
    Explore at:
    Croissant
    Dataset updated
    Jun 11, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Real Sourabh Singhal
    Description

    A fully cleaned Excel data file, prepared to support accurate data analysis with proper visualization.

  8. Call Center Performance MS Excel Analysis

    • kaggle.com
    Updated Oct 25, 2023
    Cite
    Oluwabori Abiodun-Johnson (2023). Call Center Performance MS Excel Analysis [Dataset]. https://www.kaggle.com/datasets/oluwaboriaj/call-center-dataset-analysis/data
    Explore at:
    Croissant
    Dataset updated
    Oct 25, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Oluwabori Abiodun-Johnson
    License

    Apache License v2.0, https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    --------CALL CENTER PERFORMANCE DATASET ANALYSIS--------

    This is a self-guided project.

    The Call Center dataset contained customer data such as caller ID, customer name, date, call channel, city, state, reason for calling, call duration, etc.

    I tasked myself with identifying trends and patterns in order to create a summary overview of the data that gives both technical and non-technical viewers a high-level understanding of it.

    OBJECTIVES: Create a dashboard (using charts, slicers and KPIs) which can be used to statistically track, monitor and visualize the performance of a Call Center.

    SOFTWARE TOOLS USED: Microsoft Excel

    ANALYTICAL ACTIONS PERFORMED: Data Importation, Data Processing, Data Cleaning, VLOOKUP, Pivot Tables, Data Visualization (Dashboard creation), Connection Reporting (connecting slicers to Dashboard)

  9. Agriculture Sample Census Survey 2002-2003 - Tanzania

    • catalog.ihsn.org
    • datacatalog.ihsn.org
    • +1more
    Updated Mar 29, 2019
    + more versions
    Cite
    National Bureau of Statistics (2019). Agriculture Sample Census Survey 2002-2003 - Tanzania [Dataset]. https://catalog.ihsn.org/catalog/1086
    Explore at:
    Dataset updated
    Mar 29, 2019
    Dataset provided by
    Office of Chief Government Statistician-Zanzibar
    National Bureau of Statistics
    Time period covered
    2004
    Area covered
    Tanzania
    Description

    Abstract

    The 2003 Agriculture Sample Census was designed to meet the data needs of a wide range of users down to district level including policy makers at local, regional and national levels, rural development agencies, funding institutions, researchers, NGOs, farmer organisations, etc. As a result the dataset is both more numerous in its sample and detailed in its scope compared to previous censuses and surveys. To date this is the most detailed Agricultural Census carried out in Africa.

    The census was carried out in order to:

    • Identify structural changes if any, in the size of farm household holdings, crop and livestock production, farm input and implement use. It also seeks to determine if there are any improvements in rural infrastructure and in the level of agriculture household living conditions;
    • Provide benchmark data on productivity, production and agricultural practices in relation to policies and interventions promoted by the Ministry of Agriculture and Food Security and other stake holders.
    • Establish baseline data for the measurement of the impact of high level objectives of the Agriculture Sector Development Programme (ASDP), National Strategy for Growth and Reduction of Poverty (NSGRP) and other rural development programs and projects.
    • Obtain benchmark data that will be used to address specific issues such as: food security, rural poverty, gender, agro-processing, marketing, service delivery, etc.

    Geographic coverage

    Tanzania Mainland and Zanzibar

    Analysis unit

    • Households
    • Individuals

    Universe

    Large scale, small scale and community farms.

    Kind of data

    Census/enumeration data [cen]

    Sampling procedure

    The Mainland sample consisted of 3,221 villages. These villages were drawn from the National Master Sample (NMS) developed by the National Bureau of Statistics (NBS) to serve as a national framework for the conduct of household based surveys in the country. The National Master Sample was developed from the 2002 Population and Housing Census. The total Mainland sample was 48,315 agricultural households. In Zanzibar a total of 317 enumeration areas (EAs) were selected and 4,755 agriculture households were covered. Nationwide, all regions and districts were sampled with the exception of three urban districts (two from Mainland and one from Zanzibar).

    In both Mainland and Zanzibar, a stratified two stage sample was used. The number of villages/EAs selected for the first stage was based on a probability proportional to the number of villages in each district. In the second stage, 15 households were selected from a list of farming households in each selected Village/EA, using systematic random sampling, with the village chairpersons assisting to locate the selected households.
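    The second-stage selection described above (15 households drawn systematically from a list of farming households) can be illustrated with a short Python sketch. This is purely illustrative: the census used its own field procedures, and the district figures and household labels below are made up.

    import random

    rng = random.Random(42)  # fixed seed so the illustration is reproducible

    def allocate_villages(district_village_counts, total_to_sample):
        """First stage (illustrative): allocate sample villages to districts in
        proportion to the number of villages each district contains."""
        total = sum(district_village_counts.values())
        return {d: round(total_to_sample * n / total) for d, n in district_village_counts.items()}

    def systematic_sample(households, n=15):
        """Second stage (illustrative): systematic random sampling of n households."""
        step = len(households) / n
        start = rng.uniform(0, step)
        return [households[int(start + i * step)] for i in range(n)]

    districts = {"District A": 120, "District B": 80, "District C": 40}
    print(allocate_villages(districts, total_to_sample=30))
    print(systematic_sample([f"HH-{i:03d}" for i in range(1, 201)], n=15))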

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The census covered agriculture in detail as well as many other aspects of rural development and was conducted using three different questionnaires:

    • Small scale questionnaire
    • Community level questionnaire
    • Large scale farm questionnaire

    The small scale farm questionnaire was the main census instrument and it includes questions related to crop and livestock production and practices; population demographics; access to services, resources and infrastructure; and issues on poverty, gender and subsistence versus profit making production unit.

    The community level questionnaire was designed to collect village level data such as access and use of common resources, community tree plantation and seasonal farm gate prices.

    The large scale farm questionnaire was administered to large farms either privately or corporately managed.

    Questionnaire Design

    The questionnaires were designed following user meetings to ensure that the questions asked were in line with users data needs. Several features were incorporated into the design of the questionnaires to increase the accuracy of the data:

    • Where feasible all variables were extensively coded to reduce post enumeration coding error.
    • The definitions for each section were printed on the opposite page so that the enumerator could easily refer to the instructions whilst interviewing the farmer.
    • The responses to all questions were placed in boxes printed on the questionnaire, with one box per character. This feature made it possible to use scanning and Intelligent Character Recognition (ICR) technologies for data entry.
    • Skip patterns were used to reduce unnecessary and incorrect coding of sections which do not apply to the respondent.
    • Each section was clearly numbered, which facilitated the use of skip patterns and provided a reference for data type coding for the programming of CSPro, SPSS and the dissemination applications.

    Cleaning operations

    Data processing consisted of the following processes:

    • Data entry
    • Data structure formatting
    • Batch validation
    • Tabulation

    Data Entry

    Scanning and ICR data capture technology for the small holder questionnaire were used on the Mainland. This not only increased the speed of data entry, it also increased the accuracy due to the reduction of keystroke errors. Interactive validation routines were incorporated into the ICR software to track errors during the verification process. The scanning operation was so successful that it is highly recommended for adoption in future censuses/surveys. In Zanzibar all data was entered manually using CSPro.

    Prior to scanning, all questionnaires underwent a manual cleaning exercise. This involved checking that the questionnaire had a full set of pages, correct identification and good handwriting. A score was given to each questionnaire based on the legibility and the completeness of enumeration. This score will be used to assess the quality of enumeration and supervision in order to select the best field staff for future censuses/surveys.

    CSPro was used for data entry of all Large Scale Farm and community based questionnaires due to the relatively small number of questionnaires. It was also used to enter data from the 2,880 small holder questionnaires that were rejected by the ICR extraction application.

    Data Structure Formatting

    A program was developed in visual basic to automatically alter the structure of the output from the scanning/extraction process in order to harmonise it with the manually entered data. The program automatically checked and changed the number of digits for each variable, the record type code, the number of questionnaires in the village, the consistency of the Village ID Code and saved the data of one village in a file named after the village code.

    Batch Validation

    A batch validation program was developed in order to identify inconsistencies within a questionnaire. This is in addition to the interactive validation during the ICR extraction process. The procedures varied from simple range checking within each variable to the more complex checking between variables. It took six months to screen, edit and validate the data from the smallholder questionnaires. After the long process of data cleaning, tabulations were prepared based on a pre-designed tabulation plan.

    Tabulations

    Statistical Package for Social Sciences (SPSS) was used to produce the Census tabulations and Microsoft Excel was used to organize the tables and compute additional indicators. Excel was also used to produce charts while ArcView and Freehand were used for the maps.

    Analysis and Report Preparation

    The analysis in this report focuses on regional comparisons, time series and national production estimates. Microsoft Excel was used to produce charts; ArcView and Freehand were used for maps, whereas Microsoft Word was used to compile the report.

    Data Quality

    A great deal of emphasis was placed on data quality throughout the whole exercise from planning, questionnaire design, training, supervision, data entry, validation and cleaning/editing. As a result of this, it is believed that the census is highly accurate and representative of what was experienced at field level during the Census year. With very few exceptions, the variables in the questionnaire are within the norms for Tanzania and they follow expected time series trends when compared to historical data. Standard Errors and Coefficients of Variation for the main variables are presented in the Technical Report (Volume I).

    Sampling error estimates

    Sampling error estimates are presented on pages 21-22 of the Technical Report for the Agriculture Sample Census Survey 2002-2003.

  10. Enhancing UNCDF Operations: Power BI Dashboard Development and Data Mapping

    • figshare.com
    Updated Jan 6, 2025
    Cite
    Maryam Binti Haji Abdul Halim (2025). Enhancing UNCDF Operations: Power BI Dashboard Development and Data Mapping [Dataset]. http://doi.org/10.6084/m9.figshare.28147451.v1
    Explore at:
    Dataset updated
    Jan 6, 2025
    Dataset provided by
    figshare
    Authors
    Maryam Binti Haji Abdul Halim
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This project focuses on data mapping, integration, and analysis to support the development and enhancement of six UNCDF operational applications: OrgTraveler, Comms Central, Internal Support Hub, Partnership 360, SmartHR, and TimeTrack. These apps streamline workflows for travel claims, internal support, partnership management, and time tracking within UNCDF.

    Key Features and Tools:

    • Data Mapping for Salesforce CRM Migration: Structured and mapped data flows to ensure compatibility and seamless migration to Salesforce CRM.
    • Python for Data Cleaning and Transformation: Utilized pandas, numpy, and APIs to clean, preprocess, and transform raw datasets into standardized formats (see the sketch below).
    • Power BI Dashboards: Designed interactive dashboards to visualize workflows and monitor performance metrics for decision-making.
    • Collaboration Across Platforms: Integrated Google Colab for code collaboration and Microsoft Excel for data validation and analysis.
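    The kind of pandas/numpy cleaning step named above can be pictured with a minimal sketch. This is illustrative only: the project's actual scripts are not part of this record, and the file and column names below are hypothetical.

    import numpy as np
    import pandas as pd

    # Hypothetical raw export from one of the operational apps.
    raw = pd.read_csv("travel_claims_raw.csv")

    clean = (
        raw.rename(columns=lambda c: c.strip().lower().replace(" ", "_"))  # standardize headers
           .drop_duplicates(subset="claim_id")                             # drop duplicate claims
           .assign(
               submitted_on=lambda df: pd.to_datetime(df["submitted_on"], errors="coerce"),
               amount_usd=lambda df: pd.to_numeric(df["amount_usd"], errors="coerce"),
           )
           .replace({"": np.nan, "N/A": np.nan})                           # normalize missing values
           .dropna(subset=["claim_id", "submitted_on"])                    # keep usable rows only
    )

    clean.to_csv("travel_claims_standardized.csv", index=False)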

  11. Data from: Data cleaning and enrichment through data integration: networking...

    • search.dataone.org
    • data.niaid.nih.gov
    • +1more
    Updated Feb 25, 2025
    Cite
    Irene Finocchi; Alessio Martino; Blerina Sinaimeri; Fariba Ranjbar (2025). Data cleaning and enrichment through data integration: networking the Italian academia [Dataset]. http://doi.org/10.5061/dryad.wpzgmsbwj
    Explore at:
    Dataset updated
    Feb 25, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Irene Finocchi; Alessio Martino; Blerina Sinaimeri; Fariba Ranjbar
    Description

    We describe a bibliometric network characterizing co-authorship collaborations in the entire Italian academic community. The network, consisting of 38,220 nodes and 507,050 edges, is built upon two distinct data sources: faculty information provided by the Italian Ministry of University and Research and publications available in Semantic Scholar. Both nodes and edges are associated with a large variety of semantic data, including gender, bibliometric indexes, authors' and publications' research fields, and temporal information. While linking data between the two original sources posed many challenges, the network has been carefully validated to assess its reliability and to understand its graph-theoretic characteristics. By resembling several features of social networks, our dataset can be profitably leveraged in experimental studies in the wide social network analytics domain as well as in more specific bibliometric contexts.

    The proposed network is built starting from two distinct data sources:

    • the entire dataset dump from Semantic Scholar (with particular emphasis on the authors and papers datasets)
    • the entire list of Italian faculty members as maintained by Cineca (under appointment by the Italian Ministry of University and Research).

    By means of a custom name-identity recognition algorithm (details are available in the accompanying paper published in Scientific Data), the names of the authors in the Semantic Scholar dataset have been mapped against the names contained in the Cineca dataset and authors with no match (e.g., because of not being part of an Italian university) have been discarded. The remaining authors will compose the nodes of the network, which have been enriched with node-related (i.e., author-related) attributes. In order to build the network edges, we leveraged the papers dataset from Semantic Scholar: specifically, any two authors are said to be connected if there is at least one pap...

    Data cleaning and enrichment through data integration: networking the Italian academia

    https://doi.org/10.5061/dryad.wpzgmsbwj

    Manuscript published in Scientific Data with DOI .

    Description of the data and file structure

    This repository contains two main data files:

    • edge_data_AGG.csv, the full network in comma-separated edge list format (this file contains mainly temporal co-authorship information);
    • Coauthorship_Network_AGG.graphml, the full network in GraphML format.

    along with several supplementary data, listed below, useful only to build the network (i.e., for reproducibility only):

    • University-City-match.xlsx, an Excel file that maps the name of a university to the city where its headquarters is located;
    • Areas-SS-CINECA-match.xlsx, an Excel file that maps the research areas in Cineca against the research areas in Semantic Scholar.

    Description of the main data files

    The `Coauthorship_Networ...
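    The two main files listed above can be inspected directly with pandas and networkx. A minimal sketch, assuming the files have been downloaded to the working directory (only the file names come from this record; the edge-list columns are not documented here):

    import networkx as nx
    import pandas as pd

    # Edge list with temporal co-authorship information.
    edges = pd.read_csv("edge_data_AGG.csv")
    print(edges.shape, edges.columns.tolist())

    # Full network with node and edge attributes (gender, bibliometric indexes, research fields, ...).
    G = nx.read_graphml("Coauthorship_Network_AGG.graphml")
    print(G.number_of_nodes(), G.number_of_edges())  # the description reports 38,220 nodes and 507,050 edges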

  12. Hospital Excel Dataset

    • kaggle.com
    Updated Apr 17, 2025
    Cite
    Omolola Labiyi (2025). Hospital Excel Dataset [Dataset]. https://www.kaggle.com/datasets/t0ut0u/hospital-excel-dataset
    Explore at:
    Croissant
    Dataset updated
    Apr 17, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Omolola Labiyi
    License

    https://cdla.io/sharing-1-0/

    Description

    📌 Project Overview This project analyzes hospital admissions, patient stays, and cost trends using Excel. The dataset contains information on patient demographics, hospital names, insurance providers, and treatment costs. Key insights were derived using PivotTables, charts, and formulas.

    📊 Key Insights & Visualizations ✅ Top Hospitals by Admissions → Bar Chart ✅ Insurance Provider with Most Patients → Pie Chart ✅ Cost per Day Trends → Line Chart ✅ Average Length of Stay per Hospital → Bar Chart

    🛠 Excel Analysis Techniques Used PivotTables for summarizing patient data

    Conditional Formatting to highlight cost trends

    Bar, Pie, and Line Charts for visualization

    Statistical Analysis (Average length of stay, cost trends)

    📂 Files Included 📌 hospital_analysis.xlsx – The full Excel analysis file 📌 hospital_summary.pdf – Summary of key findings

    #Healthcare #HospitalData #ExcelAnalysis #DataVisualization #PivotTables #DataCleaning #MedicalAnalytics #PatientTrends #CostAnalysis #AdmissionsAnalysis #InsuranceData #DataAnalysis #ExcelDashboards #HealthTech

  13. Data from: Skepticism in science and punitive attitudes

    • openicpsr.org
    delimited
    Updated May 4, 2025
    Cite
    Jason Rydberg; Luke DeZago (2025). Skepticism in science and punitive attitudes [Dataset]. http://doi.org/10.3886/E228541V1
    Explore at:
    Available download formats: delimited
    Dataset updated
    May 4, 2025
    Dataset provided by
    University of Massachusetts Lowell
    Authors
    Jason Rydberg; Luke DeZago
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Replication materials for the manuscript "Skepticism in Science and Punitive Attitudes", published in the Journal of Criminal Justice. Note that the GSS repeated cross sections for 1972 to 2018 are too large to upload here, but they can be accessed from https://gss.norc.org/content/dam/gss/get-the-data/documents/spss/GSS_spss.zip

    Included here are:

    • A link to the repeated cross-sections data
    • Each of the 3 wave panels (2006-2010; 2008-2012; 2010-2014)
    • A replication R script for the repeated cross sections cleaning and analysis
    • A replication R script for the panel data cleaning and analysis
    • An Excel spreadsheet with Uniform Crime Report data to merge to the cross sections (a possible loading approach is sketched below)
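    The replication code in this record is written in R. Purely as an illustration of how the pieces fit together, a rough pandas equivalent of loading the GSS cross sections and merging the Uniform Crime Report spreadsheet might look like the sketch below (file names other than GSS_spss.zip, and the merge key, are assumptions; the actual join logic is defined in the R scripts):

    import pandas as pd

    # Cross sections: the .sav file extracted from GSS_spss.zip (exact name depends on the release).
    # pandas.read_spss requires the pyreadstat package.
    gss = pd.read_spss("gss_cumulative.sav", convert_categoricals=False)

    # Uniform Crime Report data distributed here as an Excel spreadsheet (hypothetical file/column names).
    ucr = pd.read_excel("ucr_data.xlsx")

    # Merge UCR measures onto the cross sections; "year" is a plausible key, not the documented one.
    merged = gss.merge(ucr, on="year", how="left")
    print(merged.shape)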

  14. Correspondence Metadata from the Digital Scholarly Edition of Edvard Munch's...

    • search.dataone.org
    • dataverse.azure.uit.no
    • +1more
    Updated Sep 25, 2024
    Cite
    Rockenberger, Annika; Sjølie, Loke; Bøe, Hilde (2024). Correspondence Metadata from the Digital Scholarly Edition of Edvard Munch's Writings [Dataset]. http://doi.org/10.18710/TAFUSV
    Explore at:
    Dataset updated
    Sep 25, 2024
    Dataset provided by
    DataverseNO
    Authors
    Rockenberger, Annika; Sjølie, Loke; Bøe, Hilde
    Time period covered
    Jan 1, 1874 - Jan 1, 1944
    Description

    The eMunch dataset contains correspondence metadata for 8,527 letters to and from the Norwegian painter Edvard Munch (1863-1944). The dataset is derived from the digital scholarly edition of Edvard Munch's Writings, eMunch.no, edited by Hilde Bøe, The Munch Museum, Oslo. The eMunch dataset is part of NorKorr - Norwegian Correspondences, a project that aims to collect metadata from all correspondences in collections of Norwegian academic and cultural heritage institutions (project website on GitHub). A Python script was developed to parse the XML files on eMunch.no and supplementary data files (an Excel spreadsheet with updated dates, a CSV file with GeoNames IDs for places) and extract the following metadata: sender's name, receiver's name, place name, date, and letter ID in the scholarly edition. These metadata were then converted into the Correspondence Metadata Interchange Format (CMIF). The entire dataset has been integrated into the international CorrespSearch search service for scholarly editions of letters, hosted by the Berlin-Brandenburg Academy of Sciences.
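    The extraction step can be pictured with a small Python sketch. Everything below is an assumption for illustration: the element names, folder, and CSV output are stand-ins, since the real script targets the edition's own markup and produces CMIF rather than CSV.

    import csv
    import xml.etree.ElementTree as ET
    from pathlib import Path

    def extract_letter_metadata(xml_path):
        """Pull the fields named in the description from one (hypothetical) letter file."""
        root = ET.parse(xml_path).getroot()
        return {
            "letter_id": root.get("id", Path(xml_path).stem),
            "sender": root.findtext(".//sender", default=""),
            "receiver": root.findtext(".//receiver", default=""),
            "place": root.findtext(".//place", default=""),
            "date": root.findtext(".//date", default=""),
        }

    rows = [extract_letter_metadata(p) for p in sorted(Path("emunch_xml").glob("*.xml"))]

    with open("correspondence_metadata.csv", "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=["letter_id", "sender", "receiver", "place", "date"])
        writer.writeheader()
        writer.writerows(rows)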

  15. Cleaning Robot Market Size, Share, Growth Analysis, By Product(Vacuum...

    • skyquestt.com
    Updated Apr 16, 2024
    Cite
    SkyQuest Technology (2024). Cleaning Robot Market Size, Share, Growth Analysis, By Product(Vacuum Cleaning Robots, Floor Cleaning Robots, Window Cleaning Robots, Pool Cleaning Robots), By Application(Residential, Commercial, Industrial, and others), By Sales Channel(Online, Offline, and Others), By Region - Industry Forecast 2024-2031 [Dataset]. https://www.skyquestt.com/report/cleaning-robot-market
    Explore at:
    Dataset updated
    Apr 16, 2024
    Dataset authored and provided by
    SkyQuest Technology
    License

    https://www.skyquestt.com/privacy/

    Time period covered
    2024 - 2031
    Area covered
    Global
    Description

    Global Cleaning Robot Market size was valued at USD 4.19 billion in 2022 and is poised to grow from USD 4.97 billion in 2023 to USD 12.81 billion by 2031, growing at a CAGR of 22.9% in the forecast period (2024-2031).

  16. Pivot Tables and Charts with HR Data

    • kaggle.com
    Updated Feb 11, 2025
    Cite
    Carina Cruz (2025). Pivot Tables and Charts with HR Data [Dataset]. https://www.kaggle.com/datasets/carinacruz/hr-data-using-pivot-tables
    Explore at:
    Croissant
    Dataset updated
    Feb 11, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Carina Cruz
    Description

    This project demonstrates the use of data cleaning techniques, Pivot Tables and charts in Excel to answer 3 main questions:

    • What is the employee age distribution?
    • What is the workforce gender distribution?
    • What is the workforce tenure distribution?

    It includes 5 sheets:

    • Employee Data: Raw employee demographics data.
    • Employee Data_Edited: Raw data in table format and after data cleaning.
    • Age: Pivot table summarizing data for workforce age distribution and the respective chart.
    • Gender: Pivot table summarizing data for workforce gender distribution and the respective chart.
    • Tenure: Pivot table summarizing data for workforce tenure distribution and the respective chart.

    You can download the Excel file with all formatting.

  17. Coffee Shop Sales Analysis

    • kaggle.com
    Updated Apr 25, 2024
    Cite
    Monis Amir (2024). Coffee Shop Sales Analysis [Dataset]. https://www.kaggle.com/datasets/monisamir/coffee-shop-sales-analysis/code
    Explore at:
    Croissant
    Dataset updated
    Apr 25, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Monis Amir
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Analyzing Coffee Shop Sales: Excel Insights 📈

    In my first data analytics project, I discover the secrets of a fictional coffee shop's success with my data-driven analysis. By analyzing a 5-sheet Excel dataset, I've uncovered valuable sales trends, customer preferences, and insights that can guide future business decisions. 📊☕

    DATA CLEANING 🧹

    • REMOVED DUPLICATES OR IRRELEVANT ENTRIES: Thoroughly eliminated duplicate records and irrelevant data to refine the dataset for analysis.

    • FIXED STRUCTURAL ERRORS: Rectified any inconsistencies or structural issues within the data to ensure uniformity and accuracy.

    • CHECKED FOR DATA CONSISTENCY: Verified the integrity and coherence of the dataset by identifying and resolving any inconsistencies or discrepancies.

    DATA MANIPULATION 🛠️

    • UTILIZED LOOKUPS: Used Excel's lookup functions for efficient data retrieval and analysis.

    • IMPLEMENTED INDEX MATCH: Leveraged the Index Match function to perform advanced data searches and matches.

    • APPLIED SUMIFS FUNCTIONS: Utilized SUMIFS to calculate totals based on specified criteria (a pandas equivalent is sketched after this list).

    • CALCULATED PROFITS: Used relevant formulas and techniques to determine profit margins and insights from the data.
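    For readers who think in Python rather than Excel, the lookup and SUMIFS steps above have rough pandas equivalents. This is an illustration only: the project itself was done entirely in Excel, and the tiny tables and column names below are invented.

    import pandas as pd

    sales = pd.DataFrame({
        "order_id": [1, 2, 3, 4],
        "product_id": ["P1", "P2", "P1", "P3"],
        "hour": [8, 9, 8, 16],
        "quantity": [2, 1, 3, 1],
        "unit_price": [3.5, 4.0, 3.5, 2.5],
    })
    products = pd.DataFrame({"product_id": ["P1", "P2", "P3"],
                             "product_name": ["Latte", "Mocha", "Espresso"]})

    # VLOOKUP / INDEX-MATCH equivalent: bring the product name onto each sale.
    sales = sales.merge(products, on="product_id", how="left")

    # SUMIFS equivalent: sum revenue subject to grouping criteria.
    sales["revenue"] = sales["quantity"] * sales["unit_price"]
    revenue_by_product = sales.groupby("product_name")["revenue"].sum()
    revenue_by_hour = sales.groupby("hour")["revenue"].sum()  # feeds a peak-hours style summary

    print(revenue_by_product, revenue_by_hour, sep="\n")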

    PIVOTING THE DATA 𝄜

    • CREATED PIVOT TABLES: Utilized Excel's PivotTable feature to pivot the data for in-depth analysis.

    • FILTERED DATA: Utilized pivot tables to filter and analyze specific subsets of data, enabling focused insights. Especially used in “PEAK HOURS” and “TOP 3 PRODUCTS” charts.

    VISUALIZATION 📊

    • KEY INSIGHTS: Unveiled the grand total sales revenue while also analyzing the average bill per person, offering comprehensive insights into the coffee shop's performance and customer spending habits.

    • SALES TREND ANALYSIS: Used Line chart to compute total sales across various time intervals, revealing valuable insights into evolving sales trends.

    • PEAK HOUR ANALYSIS: Leveraged Clustered Column chart to identify peak sales hours, shedding light on optimal operating times and potential staffing needs.

    • TOP 3 PRODUCTS IDENTIFICATION: Utilized Clustered Bar chart to determine the top three coffee types, facilitating strategic decisions regarding inventory management and marketing focus.

    I also used a Timeline to visualize chronological data trends and identify key patterns over specific times.

    While it's a significant milestone for me, I recognize that there's always room for growth and improvement. Your feedback and insights are invaluable to me as I continue to refine my skills and tackle future projects. I'm eager to hear your thoughts and suggestions on how I can make my next endeavor even more impactful and insightful.

    THANKS TO: WsCube Tech, Mo Chen, Alex Freberg

    TOOLS USED: Microsoft Excel

    #DataAnalytics #DataAnalyst #ExcelProject #DataVisualization #BusinessIntelligence #SalesAnalysis #DataAnalysis #DataDrivenDecisions

  18. Supporting data for "Using the scanning fluid dynamic gauging device to...

    • repository.cam.ac.uk
    bin, xls
    Updated Sep 16, 2015
    Cite
    Ali, Akin; Ward, Glenn; Alam, Zayeem; Wilson, David Ian (2015). Supporting data for "Using the scanning fluid dynamic gauging device to understand the cleaning of baked lard layers" (Ali et al., Journal of Surfactants and Detergents) [Dataset]. http://doi.org/10.17863/CAM.68933
    Explore at:
    Available download formats: xls (182272 bytes), bin (493625 bytes), bin (302421 bytes), bin (123870 bytes), bin (234148 bytes)
    Dataset updated
    Sep 16, 2015
    Dataset provided by
    Apollo
    University of Cambridge
    Authors
    Ali, Akin; Ward, Glenn; Alam, Zayeem; Wilson, David Ian
    License

    Attribution-ShareAlike 2.0 (CC BY-SA 2.0), https://creativecommons.org/licenses/by-sa/2.0/
    License information was derived automatically

    Description

    These are Microsoft Excel files which contain the data used to generate the plots in the paper. The files are labelled by Figure number: a complete description is given in the paper.

  19. Shanghai experiment of consequence conditions on effort - Dataset -...

    • catalogue.data.govt.nz
    Updated Feb 1, 2001
    Cite
    (2001). Shanghai experiment of consequence conditions on effort - Dataset - data.govt.nz - discover and use data [Dataset]. https://catalogue.data.govt.nz/dataset/oai-figshare-com-article-10277999
    Explore at:
    Dataset updated
    Feb 1, 2001
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Shanghai
    Description

    This data set supports the journal paper "Manipulating the consequences of tests: How Shanghai teens react to different consequences", published in Educational Research and Evaluation, v26 (n5-6), pp. 221-251. The data were obtained to test the impact of different levels of consequence for taking a test on student test-taking effort. The data are part of the PhD project of Anran Zhao, supervised by Brown & Meissel. The data set is in MS Excel format. Sheet 1 provides an anonymous wide-format data set post-cleaning and missing value analysis of the data. Sheet 2 provides a description of each variable.

  20. Bank Loan Analysis Project in Excel

    • kaggle.com
    Updated May 4, 2024
    Cite
    Sanjana Murthy (2024). Bank Loan Analysis Project in Excel [Dataset]. https://www.kaggle.com/datasets/sanjanamurthy392/bank-loan-analysis-project/data
    Explore at:
    Croissant
    Dataset updated
    May 4, 2024
    Dataset provided by
    Kaggle
    Authors
    Sanjana Murthy
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    About Datasets:

    • Domain: Finance
    • Project: Bank loan of customers
    • Datasets: Finance_1.xlsx & Finance_2.xlsx
    • Dataset Type: Excel Data
    • Dataset Size: Each Excel file has 39k+ records

    KPIs:

    1. Year wise loan amount stats
    2. Grade and sub grade wise revol_bal
    3. Total Payment for Verified Status vs Total Payment for Non Verified Status
    4. State wise loan status
    5. Month wise loan status
    6. Get more insights based on your understanding of the data

    Process:

    1. Understanding the problem
    2. Data Collection
    3. Data Cleaning
    4. Exploring and analyzing the data
    5. Interpreting the results

    The workbook makes use of Power Query, Power Pivot, merged data, a clustered bar chart, a clustered column chart, a line chart, a 3D pie chart, a dashboard, slicers, a timeline, and formatting techniques.
