45 datasets found
  1. Data Cleaning Sample

    • borealisdata.ca
    Updated Jul 13, 2023
    Cite
    Rong Luo (2023). Data Cleaning Sample [Dataset]. http://doi.org/10.5683/SP3/ZCN177
    Explore at:
    Croissant
    Dataset updated
    Jul 13, 2023
    Dataset provided by
    Borealis
    Authors
    Rong Luo
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Sample data for exercises in Further Adventures in Data Cleaning.

  2. Data Cleaning Excel Tutorial

    • kaggle.com
    Updated Jul 22, 2023
    Cite
    Mohamed Khaled Idris (2023). Data Cleaning Excel Tutorial [Dataset]. https://www.kaggle.com/datasets/mohamedkhaledidris/data-cleaning-excel-tutorial
    Explore at:
    Croissant
    Dataset updated
    Jul 22, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Mohamed Khaled Idris
    Description

    This dataset was created by Mohamed Khaled Idris.

  3. Navigating Stats Can Data & Scrubbing Data Clean with Excel Workshop

    • search.dataone.org
    • borealisdata.ca
    Updated Jul 31, 2024
    Cite
    Costanzo, Lucia; Jadon, Vivek (2024). Navigating Stats Can Data & Scrubbing Data Clean with Excel Workshop [Dataset]. http://doi.org/10.5683/SP3/FF6AI9
    Explore at:
    Dataset updated
    Jul 31, 2024
    Dataset provided by
    Borealis
    Authors
    Costanzo, Lucia; Jadon, Vivek
    Description

    Ahoy, data enthusiasts! Join us for a hands-on workshop where you will hoist your sails and navigate through the Statistics Canada website, uncovering hidden treasures in the form of data tables. With the wind at your back, you’ll master the art of downloading these invaluable Stats Can datasets while braving the occasional squall of data cleaning challenges using Excel with your trusty captains Vivek and Lucia at the helm.

  4. Excel-project: Glassdoor Data Cleaning

    • kaggle.com
    Updated Sep 26, 2023
    Cite
    Luis Lira (2023). Excel-project: Glassdoor Data Cleaning [Dataset]. https://www.kaggle.com/datasets/luisliraportfolio/excel-project-clean-dataset
    Explore at:
    Croissant
    Dataset updated
    Sep 26, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Luis Lira
    Description

    This dataset was created by Luis Lira.

  5. Data from: Facebook Data for Sentiment Analysis

    • live.european-language-grid.eu
    • lindat.mff.cuni.cz
    • +1more
    binary format
    Updated Jul 16, 2013
    Cite
    (2013). Facebook Data for Sentiment Analysis [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/1057
    Explore at:
    Available download formats: binary format
    Dataset updated
    Jul 16, 2013
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0), https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Corpus consisting of 10,000 Facebook posts manually annotated for sentiment (2,587 positive, 5,174 neutral, 1,991 negative and 248 bipolar posts). The archive contains data and statistics in an Excel file (FBData.xlsx) and gold data in two text files with posts (gold-posts.txt) and labels (gold-labels.txt) on corresponding lines.
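    Since the gold data are two plain-text files with posts and labels on corresponding lines, they can be paired with a few lines of Python. A minimal sketch, assuming the files sit in the working directory and are UTF-8 encoded (the label codes themselves are not documented in this listing):

    from collections import Counter

    # The two files align line by line: one post per line, one label per line.
    with open("gold-posts.txt", encoding="utf-8") as f_posts, \
         open("gold-labels.txt", encoding="utf-8") as f_labels:
        pairs = [(post.rstrip("\n"), label.strip()) for post, label in zip(f_posts, f_labels)]

    print(len(pairs))                             # expected: 10,000 posts
    print(Counter(label for _, label in pairs))   # sentiment label distribution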

  6. Data from: Cleaning Data with Open Refine

    • explore.openaire.eu
    Updated Jan 1, 2016
    Cite
    Dr Richard Berry; Dr Luc Small; Dr Jeff Christiansen (2016). Cleaning Data with Open Refine [Dataset]. http://doi.org/10.5281/zenodo.6423839
    Explore at:
    Dataset updated
    Jan 1, 2016
    Authors
    Dr Richard Berry; Dr Luc Small; Dr Jeff Christiansen
    Description

    About this course

    Do you have messy data from multiple inconsistent sources, or open-responses to questionnaires? Do you want to improve the quality of your data by refining it and using the power of the internet? Open Refine is the perfect partner to Excel. It is a powerful, free tool for exploring, normalising and cleaning datasets, and extending data by accessing the internet through APIs. In this course we’ll work through the various features of Refine, including importing data, faceting, clustering, and calling remote APIs, by working on a fictional but plausible humanities research project.

    Learning Outcomes

    • Download, install and run Open Refine
    • Import data from csv, text or online sources and create projects
    • Navigate data using the Open Refine interface
    • Explore data by using facets
    • Clean data using clustering
    • Parse data using GREL syntax
    • Extend data using Application Programming Interfaces (APIs)
    • Export project for use in other applications

    Prerequisites

    The course has no prerequisites.

    Licence

    Copyright © 2021 Intersect Australia Ltd. All rights reserved.

  7. popular baby names with data cleaning

    • kaggle.com
    Updated Jun 11, 2023
    Cite
    Real Sourabh Singhal (2023). popular baby names with data cleaning [Dataset]. https://www.kaggle.com/datasets/realsourabhsinghal/popular-baby-names-with-data-cleaning/code
    Explore at:
    Croissant
    Dataset updated
    Jun 11, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Real Sourabh Singhal
    Description

    A fully cleaned Excel data file, prepared to support accurate data analysis with proper visualization.

  8. Call Center Performance MS Excel Analysis

    • kaggle.com
    Updated Oct 25, 2023
    Cite
    Oluwabori Abiodun-Johnson (2023). Call Center Performance MS Excel Analysis [Dataset]. https://www.kaggle.com/datasets/oluwaboriaj/call-center-dataset-analysis/data
    Explore at:
    Croissant
    Dataset updated
    Oct 25, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Oluwabori Abiodun-Johnson
    License

    Apache License v2.0, https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    --------CALL CENTER PERFORMANCE DATASET ANALYSIS--------

    This is a self-guided project.

    The Call Center dataset contained customer data such as caller ID, customer name, date, call channel, city, state, reason for calling, call duration, etc.

    I tasked myself with identifying trends and patterns in order to create a summary overview of the data that gives both technical and non-technical viewers a high-level understanding of it.

    OBJECTIVES: Create a dashboard (using charts, slicers and KPIs) which can be used to statistically track, monitor and visualize the performance of a Call Center.

    SOFTWARE TOOLS USED: Microsoft Excel

    ANALYTICAL ACTIONS PERFORMED: Data Importation, Data Processing, Data Cleaning, VLOOKUP, Pivot Tables, Data Visualization (Dashboard creation), Connection Reporting (connecting slicers to Dashboard)

  9. Agriculture Sample Census Survey 2002-2003 - Tanzania

    • catalog.ihsn.org
    • datacatalog.ihsn.org
    • +1more
    Updated Mar 29, 2019
    + more versions
    Cite
    National Bureau of Statistics (2019). Agriculture Sample Census Survey 2002-2003 - Tanzania [Dataset]. https://catalog.ihsn.org/catalog/1086
    Explore at:
    Dataset updated
    Mar 29, 2019
    Dataset provided by
    Office of Chief Government Statistician-Zanzibar
    National Bureau of Statistics
    Time period covered
    2004
    Area covered
    Tanzania
    Description

    Abstract

    The 2003 Agriculture Sample Census was designed to meet the data needs of a wide range of users down to district level including policy makers at local, regional and national levels, rural development agencies, funding institutions, researchers, NGOs, farmer organisations, etc. As a result the dataset is both more numerous in its sample and detailed in its scope compared to previous censuses and surveys. To date this is the most detailed Agricultural Census carried out in Africa.

    The census was carried out in order to:

    • Identify structural changes if any, in the size of farm household holdings, crop and livestock production, farm input and implement use. It also seeks to determine if there are any improvements in rural infrastructure and in the level of agriculture household living conditions;
    • Provide benchmark data on productivity, production and agricultural practices in relation to policies and interventions promoted by the Ministry of Agriculture and Food Security and other stake holders.
    • Establish baseline data for the measurement of the impact of high level objectives of the Agriculture Sector Development Programme (ASDP), National Strategy for Growth and Reduction of Poverty (NSGRP) and other rural development programs and projects.
    • Obtain benchmark data that will be used to address specific issues such as: food security, rural poverty, gender, agro-processing, marketing, service delivery, etc.

    Geographic coverage

    Tanzania Mainland and Zanzibar

    Analysis unit

    • Households
    • Individuals

    Universe

    Large scale, small scale and community farms.

    Kind of data

    Census/enumeration data [cen]

    Sampling procedure

    The Mainland sample consisted of 3,221 villages. These villages were drawn from the National Master Sample (NMS) developed by the National Bureau of Statistics (NBS) to serve as a national framework for the conduct of household based surveys in the country. The National Master Sample was developed from the 2002 Population and Housing Census. The total Mainland sample was 48,315 agricultural households. In Zanzibar a total of 317 enumeration areas (EAs) were selected and 4,755 agriculture households were covered. Nationwide, all regions and districts were sampled with the exception of three urban districts (two from Mainland and one from Zanzibar).

    In both Mainland and Zanzibar, a stratified two stage sample was used. The number of villages/EAs selected for the first stage was based on a probability proportional to the number of villages in each district. In the second stage, 15 households were selected from a list of farming households in each selected Village/EA, using systematic random sampling, with the village chairpersons assisting to locate the selected households.
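    The second-stage selection described above (15 households drawn systematically from a list of farming households) can be illustrated with a short Python sketch. This is purely illustrative: the census used its own field procedures, and the district figures and household labels below are made up.

    import random

    rng = random.Random(42)  # fixed seed so the illustration is reproducible

    def allocate_villages(district_village_counts, total_to_sample):
        """First stage (illustrative): allocate sample villages to districts in
        proportion to the number of villages each district contains."""
        total = sum(district_village_counts.values())
        return {d: round(total_to_sample * n / total) for d, n in district_village_counts.items()}

    def systematic_sample(households, n=15):
        """Second stage (illustrative): systematic random sampling of n households."""
        step = len(households) / n
        start = rng.uniform(0, step)
        return [households[int(start + i * step)] for i in range(n)]

    districts = {"District A": 120, "District B": 80, "District C": 40}
    print(allocate_villages(districts, total_to_sample=30))
    print(systematic_sample([f"HH-{i:03d}" for i in range(1, 201)], n=15))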

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The census covered agriculture in detail as well as many other aspects of rural development and was conducted using three different questionnaires:

    • Small scale questionnaire
    • Community level questionnaire
    • Large scale farm questionnaire

    The small scale farm questionnaire was the main census instrument and it includes questions related to crop and livestock production and practices; population demographics; access to services, resources and infrastructure; and issues on poverty, gender and subsistence versus profit making production unit.

    The community level questionnaire was designed to collect village level data such as access and use of common resources, community tree plantation and seasonal farm gate prices.

    The large scale farm questionnaire was administered to large farms either privately or corporately managed.

    Questionnaire Design

    The questionnaires were designed following user meetings to ensure that the questions asked were in line with users data needs. Several features were incorporated into the design of the questionnaires to increase the accuracy of the data:

    • Where feasible all variables were extensively coded to reduce post enumeration coding error.
    • The definitions for each section were printed on the opposite page so that the enumerator could easily refer to the instructions whilst interviewing the farmer.
    • The responses to all questions were placed in boxes printed on the questionnaire, with one box per character. This feature made it possible to use scanning and Intelligent Character Recognition (ICR) technologies for data entry.
    • Skip patterns were used to reduce unnecessary and incorrect coding of sections which do not apply to the respondent.
    • Each section was clearly numbered, which facilitated the use of skip patterns and provided a reference for data type coding for the programming of CSPro, SPSS and the dissemination applications.

    Cleaning operations

    Data processing consisted of the following processes:

    • Data entry
    • Data structure formatting
    • Batch validation
    • Tabulation

    Data Entry

    Scanning and ICR data capture technology for the small holder questionnaire were used on the Mainland. This not only increased the speed of data entry, it also increased the accuracy due to the reduction of keystroke errors. Interactive validation routines were incorporated into the ICR software to track errors during the verification process. The scanning operation was so successful that it is highly recommended for adoption in future censuses/surveys. In Zanzibar all data was entered manually using CSPro.

    Prior to scanning, all questionnaires underwent a manual cleaning exercise. This involved checking that the questionnaire had a full set of pages, correct identification and good handwriting. A score was given to each questionnaire based on the legibility and the completeness of enumeration. This score will be used to assess the quality of enumeration and supervision in order to select the best field staff for future censuses/surveys.

    CSPro was used for data entry of all Large Scale Farm and community based questionnaires due to the relatively small number of questionnaires. It was also used to enter data from the 2,880 small holder questionnaires that were rejected by the ICR extraction application.

    Data Structure Formatting

    A program was developed in visual basic to automatically alter the structure of the output from the scanning/extraction process in order to harmonise it with the manually entered data. The program automatically checked and changed the number of digits for each variable, the record type code, the number of questionnaires in the village, the consistency of the Village ID Code and saved the data of one village in a file named after the village code.

    Batch Validation

    A batch validation program was developed in order to identify inconsistencies within a questionnaire. This is in addition to the interactive validation during the ICR extraction process. The procedures varied from simple range checking within each variable to the more complex checking between variables. It took six months to screen, edit and validate the data from the smallholder questionnaires. After the long process of data cleaning, tabulations were prepared based on a pre-designed tabulation plan.

    Tabulations

    Statistical Package for Social Sciences (SPSS) was used to produce the Census tabulations and Microsoft Excel was used to organize the tables and compute additional indicators. Excel was also used to produce charts while ArcView and Freehand were used for the maps.

    Analysis and Report Preparation

    The analysis in this report focuses on regional comparisons, time series and national production estimates. Microsoft Excel was used to produce charts; ArcView and Freehand were used for maps, whereas Microsoft Word was used to compile the report.

    Data Quality

    A great deal of emphasis was placed on data quality throughout the whole exercise from planning, questionnaire design, training, supervision, data entry, validation and cleaning/editing. As a result of this, it is believed that the census is highly accurate and representative of what was experienced at field level during the Census year. With very few exceptions, the variables in the questionnaire are within the norms for Tanzania and they follow expected time series trends when compared to historical data. Standard Errors and Coefficients of Variation for the main variables are presented in the Technical Report (Volume I).

    Sampling error estimates

    Sampling error estimates are presented on pages 21-22 of the Technical Report for the Agriculture Sample Census Survey 2002-2003.

  10. Enhancing UNCDF Operations: Power BI Dashboard Development and Data Mapping

    • figshare.com
    Updated Jan 6, 2025
    Cite
    Maryam Binti Haji Abdul Halim (2025). Enhancing UNCDF Operations: Power BI Dashboard Development and Data Mapping [Dataset]. http://doi.org/10.6084/m9.figshare.28147451.v1
    Explore at:
    Dataset updated
    Jan 6, 2025
    Dataset provided by
    figshare
    Authors
    Maryam Binti Haji Abdul Halim
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This project focuses on data mapping, integration, and analysis to support the development and enhancement of six UNCDF operational applications: OrgTraveler, Comms Central, Internal Support Hub, Partnership 360, SmartHR, and TimeTrack. These apps streamline workflows for travel claims, internal support, partnership management, and time tracking within UNCDF.

    Key Features and Tools:

    • Data Mapping for Salesforce CRM Migration: Structured and mapped data flows to ensure compatibility and seamless migration to Salesforce CRM.
    • Python for Data Cleaning and Transformation: Utilized pandas, numpy, and APIs to clean, preprocess, and transform raw datasets into standardized formats (see the sketch below).
    • Power BI Dashboards: Designed interactive dashboards to visualize workflows and monitor performance metrics for decision-making.
    • Collaboration Across Platforms: Integrated Google Colab for code collaboration and Microsoft Excel for data validation and analysis.
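    The kind of pandas/numpy cleaning step named above can be pictured with a minimal sketch. This is illustrative only: the project's actual scripts are not part of this record, and the file and column names below are hypothetical.

    import numpy as np
    import pandas as pd

    # Hypothetical raw export from one of the operational apps.
    raw = pd.read_csv("travel_claims_raw.csv")

    clean = (
        raw.rename(columns=lambda c: c.strip().lower().replace(" ", "_"))  # standardize headers
           .drop_duplicates(subset="claim_id")                             # drop duplicate claims
           .assign(
               submitted_on=lambda df: pd.to_datetime(df["submitted_on"], errors="coerce"),
               amount_usd=lambda df: pd.to_numeric(df["amount_usd"], errors="coerce"),
           )
           .replace({"": np.nan, "N/A": np.nan})                           # normalize missing values
           .dropna(subset=["claim_id", "submitted_on"])                    # keep usable rows only
    )

    clean.to_csv("travel_claims_standardized.csv", index=False)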

  11. Data from: Data cleaning and enrichment through data integration: networking...

    • search.dataone.org
    • data.niaid.nih.gov
    • +1more
    Updated Feb 25, 2025
    Cite
    Irene Finocchi; Alessio Martino; Blerina Sinaimeri; Fariba Ranjbar (2025). Data cleaning and enrichment through data integration: networking the Italian academia [Dataset]. http://doi.org/10.5061/dryad.wpzgmsbwj
    Explore at:
    Dataset updated
    Feb 25, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Irene Finocchi; Alessio Martino; Blerina Sinaimeri; Fariba Ranjbar
    Description

    We describe a bibliometric network characterizing co-authorship collaborations in the entire Italian academic community. The network, consisting of 38,220 nodes and 507,050 edges, is built upon two distinct data sources: faculty information provided by the Italian Ministry of University and Research and publications available in Semantic Scholar. Both nodes and edges are associated with a large variety of semantic data, including gender, bibliometric indexes, authors' and publications' research fields, and temporal information. While linking data between the two original sources posed many challenges, the network has been carefully validated to assess its reliability and to understand its graph-theoretic characteristics. By resembling several features of social networks, our dataset can be profitably leveraged in experimental studies in the wide social network analytics domain as well as in more specific bibliometric contexts.

    The proposed network is built starting from two distinct data sources:

    • the entire dataset dump from Semantic Scholar (with particular emphasis on the authors and papers datasets)
    • the entire list of Italian faculty members as maintained by Cineca (under appointment by the Italian Ministry of University and Research).

    By means of a custom name-identity recognition algorithm (details are available in the accompanying paper published in Scientific Data), the names of the authors in the Semantic Scholar dataset have been mapped against the names contained in the Cineca dataset and authors with no match (e.g., because of not being part of an Italian university) have been discarded. The remaining authors will compose the nodes of the network, which have been enriched with node-related (i.e., author-related) attributes. In order to build the network edges, we leveraged the papers dataset from Semantic Scholar: specifically, any two authors are said to be connected if there is at least one pap...

    Data cleaning and enrichment through data integration: networking the Italian academia

    https://doi.org/10.5061/dryad.wpzgmsbwj

    Manuscript published in Scientific Data with DOI .

    Description of the data and file structure

    This repository contains two main data files:

    • edge_data_AGG.csv, the full network in comma-separated edge list format (this file contains mainly temporal co-authorship information);
    • Coauthorship_Network_AGG.graphml, the full network in GraphML format.

    along with several supplementary data, listed below, useful only to build the network (i.e., for reproducibility only):

    • University-City-match.xlsx, an Excel file that maps the name of a university to the city where its headquarters is located;
    • Areas-SS-CINECA-match.xlsx, an Excel file that maps the research areas in Cineca against the research areas in Semantic Scholar.

    Description of the main data files

    The `Coauthorship_Networ...
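    The two main files listed above can be inspected directly with pandas and networkx. A minimal sketch, assuming the files have been downloaded to the working directory (only the file names come from this record; the edge-list columns are not documented here):

    import networkx as nx
    import pandas as pd

    # Edge list with temporal co-authorship information.
    edges = pd.read_csv("edge_data_AGG.csv")
    print(edges.shape, edges.columns.tolist())

    # Full network with node and edge attributes (gender, bibliometric indexes, research fields, ...).
    G = nx.read_graphml("Coauthorship_Network_AGG.graphml")
    print(G.number_of_nodes(), G.number_of_edges())  # the description reports 38,220 nodes and 507,050 edges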

  12. Hospital Excel Dataset

    • kaggle.com
    Updated Apr 17, 2025
    Cite
    Omolola Labiyi (2025). Hospital Excel Dataset [Dataset]. https://www.kaggle.com/datasets/t0ut0u/hospital-excel-dataset
    Explore at:
    Croissant
    Dataset updated
    Apr 17, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Omolola Labiyi
    License

    https://cdla.io/sharing-1-0/

    Description

    📌 Project Overview This project analyzes hospital admissions, patient stays, and cost trends using Excel. The dataset contains information on patient demographics, hospital names, insurance providers, and treatment costs. Key insights were derived using PivotTables, charts, and formulas.

    📊 Key Insights & Visualizations ✅ Top Hospitals by Admissions → Bar Chart ✅ Insurance Provider with Most Patients → Pie Chart ✅ Cost per Day Trends → Line Chart ✅ Average Length of Stay per Hospital → Bar Chart

    🛠 Excel Analysis Techniques Used PivotTables for summarizing patient data

    Conditional Formatting to highlight cost trends

    Bar, Pie, and Line Charts for visualization

    Statistical Analysis (Average length of stay, cost trends)

    📂 Files Included 📌 hospital_analysis.xlsx – The full Excel analysis file 📌 hospital_summary.pdf – Summary of key findings

    #Healthcare #HospitalData #ExcelAnalysis #DataVisualization #PivotTables #DataCleaning #MedicalAnalytics #PatientTrends #CostAnalysis #AdmissionsAnalysis #InsuranceData #DataAnalysis #ExcelDashboards #HealthTech

  13. Data from: Skepticism in science and punitive attitudes

    • openicpsr.org
    delimited
    Updated May 4, 2025
    Cite
    Jason Rydberg; Luke DeZago (2025). Skepticism in science and punitive attitudes [Dataset]. http://doi.org/10.3886/E228541V1
    Explore at:
    Available download formats: delimited
    Dataset updated
    May 4, 2025
    Dataset provided by
    University of Massachusetts Lowell
    Authors
    Jason Rydberg; Luke DeZago
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Replication materials for the manuscript "Skepticism in Science and Punitive Attitudes", published in the Journal of Criminal Justice. Note that the GSS repeated cross sections for 1972 to 2018 are too large to upload here, but they can be accessed from https://gss.norc.org/content/dam/gss/get-the-data/documents/spss/GSS_spss.zip

    Included here are:

    • A link to the repeated cross-sections data
    • Each of the 3 wave panels (2006-2010; 2008-2012; 2010-2014)
    • A replication R script for the repeated cross sections cleaning and analysis
    • A replication R script for the panel data cleaning and analysis
    • An Excel spreadsheet with Uniform Crime Report data to merge to the cross sections (a possible loading approach is sketched below)
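    The replication code in this record is written in R. Purely as an illustration of how the pieces fit together, a rough pandas equivalent of loading the GSS cross sections and merging the Uniform Crime Report spreadsheet might look like the sketch below (file names other than GSS_spss.zip, and the merge key, are assumptions; the actual join logic is defined in the R scripts):

    import pandas as pd

    # Cross sections: the .sav file extracted from GSS_spss.zip (exact name depends on the release).
    # pandas.read_spss requires the pyreadstat package.
    gss = pd.read_spss("gss_cumulative.sav", convert_categoricals=False)

    # Uniform Crime Report data distributed here as an Excel spreadsheet (hypothetical file/column names).
    ucr = pd.read_excel("ucr_data.xlsx")

    # Merge UCR measures onto the cross sections; "year" is a plausible key, not the documented one.
    merged = gss.merge(ucr, on="year", how="left")
    print(merged.shape)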

  14. Correspondence Metadata from the Digital Scholarly Edition of Edvard Munch's...

    • search.dataone.org
    • dataverse.azure.uit.no
    • +1more
    Updated Sep 25, 2024
    Cite
    Rockenberger, Annika; Sjølie, Loke; Bøe, Hilde (2024). Correspondence Metadata from the Digital Scholarly Edition of Edvard Munch's Writings [Dataset]. http://doi.org/10.18710/TAFUSV
    Explore at:
    Dataset updated
    Sep 25, 2024
    Dataset provided by
    DataverseNO
    Authors
    Rockenberger, Annika; Sjølie, Loke; Bøe, Hilde
    Time period covered
    Jan 1, 1874 - Jan 1, 1944
    Description

    The eMunch dataset contains correspondence metadata for 8,527 letters to and from the Norwegian painter Edvard Munch (1863-1944). The dataset is derived from the digital scholarly edition of Edvard Munch's Writings, eMunch.no, edited by Hilde Bøe, The Munch Museum, Oslo. The eMunch dataset is part of NorKorr - Norwegian Correspondences, a project that aims to collect metadata from all correspondences in collections of Norwegian academic and cultural heritage institutions (project website on GitHub). A Python script was developed to parse the XML files on eMunch.no and supplementary data files (an Excel spreadsheet with updated dates, a CSV file with GeoNames IDs for places) and extract the following metadata: sender's name, receiver's name, place name, date, and letter ID in the scholarly edition. These metadata were then converted into the Correspondence Metadata Interchange Format (CMIF). The entire dataset has been integrated into the international CorrespSearch search service for scholarly editions of letters, hosted by the Berlin-Brandenburg Academy of Sciences.
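    The extraction step can be pictured with a small Python sketch. Everything below is an assumption for illustration: the element names, folder, and CSV output are stand-ins, since the real script targets the edition's own markup and produces CMIF rather than CSV.

    import csv
    import xml.etree.ElementTree as ET
    from pathlib import Path

    def extract_letter_metadata(xml_path):
        """Pull the fields named in the description from one (hypothetical) letter file."""
        root = ET.parse(xml_path).getroot()
        return {
            "letter_id": root.get("id", Path(xml_path).stem),
            "sender": root.findtext(".//sender", default=""),
            "receiver": root.findtext(".//receiver", default=""),
            "place": root.findtext(".//place", default=""),
            "date": root.findtext(".//date", default=""),
        }

    rows = [extract_letter_metadata(p) for p in sorted(Path("emunch_xml").glob("*.xml"))]

    with open("correspondence_metadata.csv", "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=["letter_id", "sender", "receiver", "place", "date"])
        writer.writeheader()
        writer.writerows(rows)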

  15. Cleaning Robot Market Size, Share, Growth Analysis, By Product(Vacuum...

    • skyquestt.com
    Updated Apr 16, 2024
    Cite
    SkyQuest Technology (2024). Cleaning Robot Market Size, Share, Growth Analysis, By Product(Vacuum Cleaning Robots, Floor Cleaning Robots, Window Cleaning Robots, Pool Cleaning Robots), By Application(Residential, Commercial, Industrial, and others), By Sales Channel(Online, Offline, and Others), By Region - Industry Forecast 2024-2031 [Dataset]. https://www.skyquestt.com/report/cleaning-robot-market
    Explore at:
    Dataset updated
    Apr 16, 2024
    Dataset authored and provided by
    SkyQuest Technology
    License

    https://www.skyquestt.com/privacy/

    Time period covered
    2024 - 2031
    Area covered
    Global
    Description

    Global Cleaning Robot Market size was valued at USD 4.19 billion in 2022 and is poised to grow from USD 4.97 billion in 2023 to USD 12.81 billion by 2031, growing at a CAGR of 22.9% in the forecast period (2024-2031).

  16. Pivot Tables and Charts with HR Data

    • kaggle.com
    Updated Feb 11, 2025
    Cite
    Carina Cruz (2025). Pivot Tables and Charts with HR Data [Dataset]. https://www.kaggle.com/datasets/carinacruz/hr-data-using-pivot-tables
    Explore at:
    Croissant
    Dataset updated
    Feb 11, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Carina Cruz
    Description

    This project demonstrates the use of data cleaning techniques, Pivot Tables and charts in Excel to answer 3 main questions:

    • What is the employee age distribution?
    • What is the workforce gender distribution?
    • What is the workforce tenure distribution?

    It includes 5 sheets:

    • Employee Data: Raw employee demographics data.
    • Employee Data_Edited: Raw data in table format and after data cleaning.
    • Age: Pivot table summarizing data for workforce age distribution and the respective chart.
    • Gender: Pivot table summarizing data for workforce gender distribution and the respective chart.
    • Tenure: Pivot table summarizing data for workforce tenure distribution and the respective chart.

    You can download the Excel file with all formatting.

  17. Coffee Shop Sales Analysis

    • kaggle.com
    Updated Apr 25, 2024
    Cite
    Monis Amir (2024). Coffee Shop Sales Analysis [Dataset]. https://www.kaggle.com/datasets/monisamir/coffee-shop-sales-analysis/code
    Explore at:
    Croissant
    Dataset updated
    Apr 25, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Monis Amir
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Analyzing Coffee Shop Sales: Excel Insights 📈

    In my first data analytics project, I discover the secrets of a fictional coffee shop's success with my data-driven analysis. By analyzing a 5-sheet Excel dataset, I've uncovered valuable sales trends, customer preferences, and insights that can guide future business decisions. 📊☕

    DATA CLEANING 🧹

    • REMOVED DUPLICATES OR IRRELEVANT ENTRIES: Thoroughly eliminated duplicate records and irrelevant data to refine the dataset for analysis.

    • FIXED STRUCTURAL ERRORS: Rectified any inconsistencies or structural issues within the data to ensure uniformity and accuracy.

    • CHECKED FOR DATA CONSISTENCY: Verified the integrity and coherence of the dataset by identifying and resolving any inconsistencies or discrepancies.

    DATA MANIPULATION 🛠️

    • UTILIZED LOOKUPS: Used Excel's lookup functions for efficient data retrieval and analysis.

    • IMPLEMENTED INDEX MATCH: Leveraged the Index Match function to perform advanced data searches and matches.

    • APPLIED SUMIFS FUNCTIONS: Utilized SUMIFS to calculate totals based on specified criteria (a pandas equivalent is sketched after this list).

    • CALCULATED PROFITS: Used relevant formulas and techniques to determine profit margins and insights from the data.
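    For readers who think in Python rather than Excel, the lookup and SUMIFS steps above have rough pandas equivalents. This is an illustration only: the project itself was done entirely in Excel, and the tiny tables and column names below are invented.

    import pandas as pd

    sales = pd.DataFrame({
        "order_id": [1, 2, 3, 4],
        "product_id": ["P1", "P2", "P1", "P3"],
        "hour": [8, 9, 8, 16],
        "quantity": [2, 1, 3, 1],
        "unit_price": [3.5, 4.0, 3.5, 2.5],
    })
    products = pd.DataFrame({"product_id": ["P1", "P2", "P3"],
                             "product_name": ["Latte", "Mocha", "Espresso"]})

    # VLOOKUP / INDEX-MATCH equivalent: bring the product name onto each sale.
    sales = sales.merge(products, on="product_id", how="left")

    # SUMIFS equivalent: sum revenue subject to grouping criteria.
    sales["revenue"] = sales["quantity"] * sales["unit_price"]
    revenue_by_product = sales.groupby("product_name")["revenue"].sum()
    revenue_by_hour = sales.groupby("hour")["revenue"].sum()  # feeds a peak-hours style summary

    print(revenue_by_product, revenue_by_hour, sep="\n")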

    PIVOTING THE DATA 𝄜

    • CREATED PIVOT TABLES: Utilized Excel's PivotTable feature to pivot the data for in-depth analysis.

    • FILTERED DATA: Utilized pivot tables to filter and analyze specific subsets of data, enabling focused insights. Especially used in “PEAK HOURS” and “TOP 3 PRODUCTS” charts.

    VISUALIZATION 📊

    • KEY INSIGHTS: Unveiled the grand total sales revenue while also analyzing the average bill per person, offering comprehensive insights into the coffee shop's performance and customer spending habits.

    • SALES TREND ANALYSIS: Used Line chart to compute total sales across various time intervals, revealing valuable insights into evolving sales trends.

    • PEAK HOUR ANALYSIS: Leveraged Clustered Column chart to identify peak sales hours, shedding light on optimal operating times and potential staffing needs.

    • TOP 3 PRODUCTS IDENTIFICATION: Utilized Clustered Bar chart to determine the top three coffee types, facilitating strategic decisions regarding inventory management and marketing focus.

    I also used a Timeline to visualize chronological data trends and identify key patterns over specific times.

    While it's a significant milestone for me, I recognize that there's always room for growth and improvement. Your feedback and insights are invaluable to me as I continue to refine my skills and tackle future projects. I'm eager to hear your thoughts and suggestions on how I can make my next endeavor even more impactful and insightful.

    THANKS TO: WsCube Tech, Mo Chen, Alex Freberg

    TOOLS USED: Microsoft Excel

    #DataAnalytics #DataAnalyst #ExcelProject #DataVisualization #BusinessIntelligence #SalesAnalysis #DataAnalysis #DataDrivenDecisions

  18. Supporting data for "Using the scanning fluid dynamic gauging device to...

    • repository.cam.ac.uk
    bin, xls
    Updated Sep 16, 2015
    Cite
    Ali, Akin; Ward, Glenn; Alam, Zayeem; Wilson, David Ian (2015). Supporting data for "Using the scanning fluid dynamic gauging device to understand the cleaning of baked lard layers" (Ali et al., Journal of Surfactants and Detergents) [Dataset]. http://doi.org/10.17863/CAM.68933
    Explore at:
    Available download formats: xls (182272 bytes), bin (493625 bytes), bin (302421 bytes), bin (123870 bytes), bin (234148 bytes)
    Dataset updated
    Sep 16, 2015
    Dataset provided by
    Apollo
    University of Cambridge
    Authors
    Ali, Akin; Ward, Glenn; Alam, Zayeem; Wilson, David Ian
    License

    Attribution-ShareAlike 2.0 (CC BY-SA 2.0), https://creativecommons.org/licenses/by-sa/2.0/
    License information was derived automatically

    Description

    These are Microsoft Excel files which contain the data used to generate the plots in the paper. The files are labelled by Figure number: a complete description is given in the paper.

  19. Shanghai experiment of consequence conditions on effort - Dataset -...

    • catalogue.data.govt.nz
    Updated Feb 1, 2001
    Cite
    (2001). Shanghai experiment of consequence conditions on effort - Dataset - data.govt.nz - discover and use data [Dataset]. https://catalogue.data.govt.nz/dataset/oai-figshare-com-article-10277999
    Explore at:
    Dataset updated
    Feb 1, 2001
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Shanghai
    Description

    This data set supports the journal paper "Manipulating the consequences of tests: How Shanghai teens react to different consequences", published in Educational Research and Evaluation, v26 (n5-6), pp. 221-251. The data were obtained to test the impact of different levels of consequence for taking a test on student test-taking effort. The data are part of the PhD project of Anran Zhao, supervised by Brown & Meissel. The data set is in MS Excel format. Sheet 1 provides an anonymous wide-format data set post-cleaning and missing value analysis of the data. Sheet 2 provides a description of each variable.

  20. Bank Loan Analysis Project in Excel

    • kaggle.com
    Updated May 4, 2024
    Cite
    Sanjana Murthy (2024). Bank Loan Analysis Project in Excel [Dataset]. https://www.kaggle.com/datasets/sanjanamurthy392/bank-loan-analysis-project/data
    Explore at:
    Croissant
    Dataset updated
    May 4, 2024
    Dataset provided by
    Kaggle
    Authors
    Sanjana Murthy
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    About Datasets:

    • Domain: Finance
    • Project: Bank loan of customers
    • Datasets: Finance_1.xlsx & Finance_2.xlsx
    • Dataset Type: Excel Data
    • Dataset Size: Each Excel file has 39k+ records

    KPIs:

    1. Year wise loan amount stats
    2. Grade and sub grade wise revol_bal
    3. Total Payment for Verified Status vs Total Payment for Non Verified Status
    4. State wise loan status
    5. Month wise loan status
    6. Get more insights based on your understanding of the data

    Process:

    1. Understanding the problem
    2. Data Collection
    3. Data Cleaning
    4. Exploring and analyzing the data
    5. Interpreting the results

    The workbook makes use of Power Query, Power Pivot, merged data, a clustered bar chart, a clustered column chart, a line chart, a 3D pie chart, a dashboard, slicers, a timeline, and formatting techniques.
