59 datasets found

Messy data for data cleaning exercise - Dataset - openAFRICA
open.africa
Updated Oct 6, 2021
Cite
(2021). Messy data for data cleaning exercise - Dataset - openAFRICA [Dataset]. https://open.africa/dataset/messy-data-for-data-cleaning-exercise
Dataset updated
Oct 6, 2021
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A messy dataset for demonstrating how to clean data using a spreadsheet. The dataset was intentionally formatted to be messy for demonstration purposes. It was collated from https://openafrica.net/dataset/historic-and-projected-rainfall-and-runoff-for-4-lake-victoria-sub-regions
Data Cleaning Sample
borealisdata.ca
Updated Jul 13, 2023
Cite
Rong Luo (2023). Data Cleaning Sample [Dataset]. http://doi.org/10.5683/SP3/ZCN177
Unique identifier
https://doi.org/10.5683/SP3/ZCN177
Dataset updated
Jul 13, 2023
Dataset provided by
Borealis
Authors
Rong Luo
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Sample data for exercises in Further Adventures in Data Cleaning.
Navigating Stats Can Data & Scrubbing Data Clean with Excel Workshop
search.dataone.org
borealisdata.ca
Updated Jul 31, 2024
Cite
Costanzo, Lucia; Jadon, Vivek (2024). Navigating Stats Can Data & Scrubbing Data Clean with Excel Workshop [Dataset]. http://doi.org/10.5683/SP3/FF6AI9
Unique identifier
https://doi.org/10.5683/SP3/FF6AI9
Dataset updated
Jul 31, 2024
Dataset provided by
Borealis
Authors
Costanzo, Lucia; Jadon, Vivek
Description
Ahoy, data enthusiasts! Join us for a hands-on workshop where you will hoist your sails and navigate through the Statistics Canada website, uncovering hidden treasures in the form of data tables. With the wind at your back, you’ll master the art of downloading these invaluable Stats Can datasets while braving the occasional squall of data cleaning challenges using Excel with your trusty captains Vivek and Lucia at the helm.
Data from: Facebook Data for Sentiment Analysis
live.european-language-grid.eu
lindat.mff.cuni.cz
binary format
Updated Jul 16, 2013
Cite
(2013). Facebook Data for Sentiment Analysis [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/1057
Available download formats: binary format
Dataset updated
Jul 16, 2013
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0), https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
Corpus consisting of 10,000 Facebook posts manually annotated on sentiment (2,587 positive, 5,174 neutral, 1,991 negative and 248 bipolar posts). The archive contains data and statistics in an Excel file (FBData.xlsx) and gold data in two text files with posts (gold-posts.txt) and labels (gols-labels.txt) on corresponding lines.
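Because the posts and labels sit on corresponding lines of the two gold files, they can be paired line by line. The sketch below is only an illustration of that idea, not the corpus authors' tooling: the label strings and the in-memory sample text are assumptions, and real use would open the two text files from the archive instead.

```python
from collections import Counter
from io import StringIO


def pair_posts_with_labels(posts_file, labels_file):
    """Pair each post with the label found on the same line number."""
    posts = [line.rstrip("\n") for line in posts_file]
    labels = [line.strip() for line in labels_file]
    if len(posts) != len(labels):
        raise ValueError(
            f"line count mismatch: {len(posts)} posts vs {len(labels)} labels"
        )
    return list(zip(posts, labels))


# Tiny in-memory stand-ins for the two gold files (hypothetical content).
sample_posts = StringIO("great day\nmeeting at noon\nworst service ever\n")
sample_labels = StringIO("positive\nneutral\nnegative\n")

pairs = pair_posts_with_labels(sample_posts, sample_labels)
label_counts = Counter(label for _, label in pairs)

# Class sizes as stated in the description; they should add up to the corpus size.
stated_counts = {"positive": 2587, "neutral": 5174, "negative": 1991, "bipolar": 248}
stated_total = sum(stated_counts.values())
```

Checking the line counts before zipping matters here, since a single missing line would silently shift every later label onto the wrong post.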
RAAAP-2 Datasets (17 linked datasets)
figshare.com
bin
Updated May 30, 2023
Cite
Simon Kerridge; Patrice Ajai-Ajagbe; Cindy Kiel; Jennifer Shambrook; BRYONY WAKEFIELD (2023). RAAAP-2 Datasets (17 linked datasets) [Dataset]. http://doi.org/10.6084/m9.figshare.18972935.v2
Available download formats: bin
Unique identifier
https://doi.org/10.6084/m9.figshare.18972935.v2
Dataset updated
May 30, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Simon Kerridge; Patrice Ajai-Ajagbe; Cindy Kiel; Jennifer Shambrook; BRYONY WAKEFIELD
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This collection contains the 17 anonymised datasets from the RAAAP-2 international survey of research management and administration professionals undertaken in 2019. To preserve anonymity, the data are presented in 17 datasets linked only by AnalysisRegionofEmployment, because many of the textual responses, even though redacted to remove institutional affiliation, could be used to identify some individuals if linked to the other data. Each dataset is presented in the original SPSS format, suitable for further analyses, as well as an Excel equivalent for ease of viewing. Additional files in this collection show the questionnaire and the mappings to the datasets, together with the SPSS scripts used to produce the datasets. These data follow on from, but are not directly linked to, the first RAAAP survey undertaken in 2016, data from which can also be found in FigShare.

Errata (16/5/23): an error in v13 of the main Data Cleansing syntax file (now updated to v14) meant that two variables were missing their value labels (the underlying codes were correct). The Main Dataset (SPSS & Excel) has been updated to a new version.
Data from: Cleaning Data with Open Refine
explore.openaire.eu
Updated Jan 1, 2016
Cite
Dr Richard Berry; Dr Luc Small; Dr Jeff Christiansen (2016). Cleaning Data with Open Refine [Dataset]. http://doi.org/10.5281/zenodo.6423839
Unique identifier
https://doi.org/10.5281/zenodo.6423839
Dataset updated
Jan 1, 2016
Authors
Dr Richard Berry; Dr Luc Small; Dr Jeff Christiansen
Description
About this course

Do you have messy data from multiple inconsistent sources, or open-responses to questionnaires? Do you want to improve the quality of your data by refining it and using the power of the internet? Open Refine is the perfect partner to Excel. It is a powerful, free tool for exploring, normalising and cleaning datasets, and extending data by accessing the internet through APIs. In this course we’ll work through the various features of Refine, including importing data, faceting, clustering, and calling remote APIs, by working on a fictional but plausible humanities research project.

Learning Outcomes

Download, install and run Open Refine
Import data from csv, text or online sources and create projects
Navigate data using the Open Refine interface
Explore data by using facets
Clean data using clustering
Parse data using GREL syntax
Extend data using Application Programming Interfaces (APIs)
Export project for use in other applications

Prerequisites

The course has no prerequisites.

Licence

Copyright © 2021 Intersect Australia Ltd. All rights reserved.
Hospital Excel Dataset
kaggle.com
Updated Apr 17, 2025
Cite
Omolola Labiyi (2025). Hospital Excel Dataset [Dataset]. https://www.kaggle.com/datasets/t0ut0u/hospital-excel-dataset
Dataset updated
Apr 17, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Omolola Labiyi
License
https://cdla.io/sharing-1-0/
Description
📌 Project Overview

This project analyzes hospital admissions, patient stays, and cost trends using Excel. The dataset contains information on patient demographics, hospital names, insurance providers, and treatment costs. Key insights were derived using PivotTables, charts, and formulas.

📊 Key Insights & Visualizations

✅ Top Hospitals by Admissions → Bar Chart

✅ Insurance Provider with Most Patients → Pie Chart

✅ Cost per Day Trends → Line Chart

✅ Average Length of Stay per Hospital → Bar Chart

🛠 Excel Analysis Techniques Used

PivotTables for summarizing patient data

Conditional Formatting to highlight cost trends

Bar, Pie, and Line Charts for visualization

Statistical Analysis (Average length of stay, cost trends)

📂 Files Included

📌 hospital_analysis.xlsx – The full Excel analysis file

📌 hospital_summary.pdf – Summary of key findings

#Healthcare #HospitalData #ExcelAnalysis #DataVisualization #PivotTables #DataCleaning #MedicalAnalytics #PatientTrends #CostAnalysis #AdmissionsAnalysis #InsuranceData #DataAnalysis #ExcelDashboards #HealthTech
Agriculture Sample Census Survey 2002-2003 - Tanzania
catalog.ihsn.org
datacatalog.ihsn.org
Updated Mar 29, 2019
Cite
Office of Chief Government Statistician-Zanzibar (2019). Agriculture Sample Census Survey 2002-2003 - Tanzania [Dataset]. https://catalog.ihsn.org/catalog/1086
Dataset updated
Mar 29, 2019
Dataset provided by
National Bureau of Statistics
Office of Chief Government Statistician-Zanzibar
Time period covered
2004
Area covered
Tanzania
Description
Abstract

The 2003 Agriculture Sample Census was designed to meet the data needs of a wide range of users down to district level including policy makers at local, regional and national levels, rural development agencies, funding institutions, researchers, NGOs, farmer organisations, etc. As a result the dataset is both more numerous in its sample and detailed in its scope compared to previous censuses and surveys. To date this is the most detailed Agricultural Census carried out in Africa.

The census was carried out in order to:
· Identify structural changes, if any, in the size of farm household holdings, crop and livestock production, and farm input and implement use. It also seeks to determine whether there are any improvements in rural infrastructure and in the level of agricultural household living conditions;
· Provide benchmark data on productivity, production and agricultural practices in relation to policies and interventions promoted by the Ministry of Agriculture and Food Security and other stakeholders;
· Establish baseline data for the measurement of the impact of high level objectives of the Agriculture Sector Development Programme (ASDP), National Strategy for Growth and Reduction of Poverty (NSGRP) and other rural development programs and projects;
· Obtain benchmark data that will be used to address specific issues such as food security, rural poverty, gender, agro-processing, marketing, service delivery, etc.

Geographic coverage

Tanzania Mainland and Zanzibar

Analysis unit

Households

Individuals

Universe

Large scale, small scale and community farms.

Kind of data

Census/enumeration data [cen]

Sampling procedure

The Mainland sample consisted of 3,221 villages. These villages were drawn from the National Master Sample (NMS) developed by the National Bureau of Statistics (NBS) to serve as a national framework for the conduct of household based surveys in the country. The National Master Sample was developed from the 2002 Population and Housing Census. The total Mainland sample was 48,315 agricultural households. In Zanzibar a total of 317 enumeration areas (EAs) were selected and 4,755 agriculture households were covered. Nationwide, all regions and districts were sampled with the exception of three urban districts (two from Mainland and one from Zanzibar).

In both Mainland and Zanzibar, a stratified two stage sample was used. The number of villages/EAs selected for the first stage was based on a probability proportional to the number of villages in each district. In the second stage, 15 households were selected from a list of farming households in each selected Village/EA, using systematic random sampling, with the village chairpersons assisting to locate the selected households.
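The two-stage design described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the census's actual procedure: it reads the first stage as allocating the village sample across districts in proportion to their village counts, and the district names, counts and household list are all hypothetical.

```python
import random


def allocate_villages(villages_per_district, total_sample):
    """First stage: share the village sample across districts in proportion
    to each district's number of villages (largest-remainder rounding)."""
    total = sum(villages_per_district.values())
    exact = {d: total_sample * n / total for d, n in villages_per_district.items()}
    alloc = {d: int(x) for d, x in exact.items()}
    leftover = total_sample - sum(alloc.values())
    # Hand any remaining units to the districts with the largest fractional parts.
    by_fraction = sorted(exact, key=lambda d: exact[d] - alloc[d], reverse=True)
    for d in by_fraction[:leftover]:
        alloc[d] += 1
    return alloc


def systematic_sample(units, n, rng):
    """Second stage: systematic random sampling of n units from an ordered list."""
    if n > len(units):
        raise ValueError("sample size exceeds list length")
    step = len(units) / n           # sampling interval
    start = rng.random() * step     # random start inside the first interval
    last = len(units) - 1
    return [units[min(int(start + i * step), last)] for i in range(n)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
districts = {"A": 120, "B": 60, "C": 20}            # hypothetical village counts
village_quota = allocate_villages(districts, total_sample=10)

households = [f"hh{i:03d}" for i in range(1, 91)]   # hypothetical list of 90 farming households
chosen = systematic_sample(households, 15, rng)      # 15 households per village, as in the text
```

A fractional sampling interval is used so that the list length does not have to be a multiple of 15; the selected households still spread evenly across the whole list.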

Mode of data collection

Face-to-face [f2f]

Research instrument

The census covered agriculture in detail as well as many other aspects of rural development and was conducted using three different questionnaires:
• Small scale questionnaire
• Community level questionnaire
• Large scale farm questionnaire

The small scale farm questionnaire was the main census instrument and it includes questions related to crop and livestock production and practices; population demographics; access to services, resources and infrastructure; and issues on poverty, gender and subsistence versus profit making production unit.

The community level questionnaire was designed to collect village level data such as access and use of common resources, community tree plantation and seasonal farm gate prices.

The large scale farm questionnaire was administered to large farms either privately or corporately managed.

Questionnaire Design

The questionnaires were designed following user meetings to ensure that the questions asked were in line with users' data needs. Several features were incorporated into the design of the questionnaires to increase the accuracy of the data:
• Where feasible, all variables were extensively coded to reduce post enumeration coding error.
• The definitions for each section were printed on the opposite page so that the enumerator could easily refer to the instructions whilst interviewing the farmer.
• The responses to all questions were placed in boxes printed on the questionnaire, with one box per character. This feature made it possible to use scanning and Intelligent Character Recognition (ICR) technologies for data entry.
• Skip patterns were used to reduce unnecessary and incorrect coding of sections which do not apply to the respondent.
• Each section was clearly numbered, which facilitated the use of skip patterns and provided a reference for data type coding for the programming of CSPro, SPSS and the dissemination applications.

Cleaning operations

Data processing consisted of the following processes:
· Data entry
· Data structure formatting
· Batch validation
· Tabulation

Data Entry

Scanning and ICR data capture technology for the small holder questionnaire were used on the Mainland. This not only increased the speed of data entry, it also increased the accuracy due to the reduction of keystroke errors. Interactive validation routines were incorporated into the ICR software to track errors during the verification process. The scanning operation was so successful that it is highly recommended for adoption in future censuses/surveys. In Zanzibar all data was entered manually using CSPro.

Prior to scanning, all questionnaires underwent a manual cleaning exercise. This involved checking that the questionnaire had a full set of pages, correct identification and good handwriting. A score was given to each questionnaire based on the legibility and the completeness of enumeration. This score will be used to assess the quality of enumeration and supervision in order to select the best field staff for future censuses/surveys.

CSPro was used for data entry of all Large Scale Farm and community based questionnaires due to the relatively small number of questionnaires. It was also used to enter data from the 2,880 small holder questionnaires that were rejected by the ICR extraction application.

Data Structure Formatting

A program was developed in Visual Basic to automatically alter the structure of the output from the scanning/extraction process in order to harmonise it with the manually entered data. The program automatically checked and changed the number of digits for each variable, the record type code, the number of questionnaires in the village and the consistency of the Village ID Code, and saved the data of each village in a file named after the village code.
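The original program was written in Visual Basic and is not reproduced in the description. As a rough illustration of the same kind of harmonisation, the Python sketch below pads codes to a fixed number of digits and groups records by village code. The field names and the six-digit width are assumptions made for the example.

```python
from collections import defaultdict


def pad_code(value, width):
    """Left-pad a numeric code with zeros so every record uses the same number of digits."""
    text = str(value).strip()
    if not text.isdigit():
        raise ValueError(f"non-numeric code: {value!r}")
    if len(text) > width:
        raise ValueError(f"code {text} is longer than {width} digits")
    return text.zfill(width)


def group_by_village(records, width=6):
    """Normalise each record's village code and collect the records per village."""
    grouped = defaultdict(list)
    for rec in records:
        code = pad_code(rec["village_id"], width)
        grouped[code].append({**rec, "village_id": code})
    return dict(grouped)


# Hypothetical records: the same village appears with differently formatted codes.
records = [
    {"village_id": "123", "crop": "maize"},
    {"village_id": 123, "crop": "rice"},
    {"village_id": "000456", "crop": "cassava"},
]
by_village = group_by_village(records, width=6)
```

Each key of `by_village` could then serve as the file name for that village's records, mirroring the one-file-per-village layout the text describes.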

Batch Validation

A batch validation program was developed in order to identify inconsistencies within a questionnaire. This is in addition to the interactive validation during the ICR extraction process. The procedures varied from simple range checking within each variable to the more complex checking between variables. It took six months to screen, edit and validate the data from the smallholder questionnaires. After the long process of data cleaning, tabulations were prepared based on a pre-designed tabulation plan.
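The two kinds of checks mentioned here, range checks within a variable and consistency checks between variables, can be sketched as follows. This is an illustrative outline only; the variable names, limits and the cross-variable rule are hypothetical, since the census's actual rules are not given in the text.

```python
def validate_record(record, range_rules, cross_rules):
    """Return a list of human-readable problems found in one questionnaire record."""
    problems = []
    for field, (low, high) in range_rules.items():
        value = record.get(field)
        if value is None:
            problems.append(f"{field}: missing")
        elif not low <= value <= high:
            problems.append(f"{field}: {value} outside [{low}, {high}]")
    for description, check in cross_rules:
        try:
            ok = check(record)
        except (KeyError, TypeError):
            ok = False  # a rule that cannot be evaluated counts as a failure
        if not ok:
            problems.append(f"cross-check failed: {description}")
    return problems


# Hypothetical rules, for illustration only.
RANGE_RULES = {"household_size": (1, 50), "area_planted_ha": (0.0, 500.0)}
CROSS_RULES = [
    ("area harvested cannot exceed area planted",
     lambda r: r["area_harvested_ha"] <= r["area_planted_ha"]),
]


def batch_validate(records):
    """Run every rule over every record and keep only the records with problems."""
    report = {}
    for rec in records:
        problems = validate_record(rec, RANGE_RULES, CROSS_RULES)
        if problems:
            report[rec["questionnaire_id"]] = problems
    return report


records = [
    {"questionnaire_id": "Q1", "household_size": 6,
     "area_planted_ha": 2.0, "area_harvested_ha": 1.5},
    {"questionnaire_id": "Q2", "household_size": 0,
     "area_planted_ha": 1.0, "area_harvested_ha": 3.0},
]
report = batch_validate(records)
```

Keeping the rules as data rather than hard-coding them in the loop makes it easy to review and extend the checks without touching the validation logic.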

Tabulations

Statistical Package for Social Sciences (SPSS) was used to produce the Census tabulations and Microsoft Excel was used to organize the tables and compute additional indicators. Excel was also used to produce charts while ArcView and Freehand were used for the maps.

Analysis and Report Preparation

The analysis in this report focuses on regional comparisons, time series and national production estimates. Microsoft Excel was used to produce charts; ArcView and Freehand were used for maps, whereas Microsoft Word was used to compile the report.

Data Quality

A great deal of emphasis was placed on data quality throughout the whole exercise from planning, questionnaire design, training, supervision, data entry, validation and cleaning/editing. As a result of this, it is believed that the census is highly accurate and representative of what was experienced at field level during the Census year. With very few exceptions, the variables in the questionnaire are within the norms for Tanzania and they follow expected time series trends when compared to historical data. Standard Errors and Coefficients of Variation for the main variables are presented in the Technical Report (Volume I).

Sampling error estimates

Sampling error estimates are given on pages 21 to 22 of the Technical Report for the Agriculture Sample Census Survey 2002-2003.
Pivot Tables and Charts with HR Data
kaggle.com
Updated Feb 11, 2025
Cite
Carina Cruz (2025). Pivot Tables and Charts with HR Data [Dataset]. https://www.kaggle.com/datasets/carinacruz/hr-data-using-pivot-tables
Dataset updated
Feb 11, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Carina Cruz
Description
This project demonstrates the use of data cleaning techniques, Pivot Tables and charts in Excel to answer 3 main questions:

What is the employee age distribution?

What is the workforce gender distribution?

What is the workforce tenure distribution?

It includes 5 sheets:

Employee Data: Raw employee demographics data.

Employee Data_Edited: Raw data in table format and after data cleaning.

Age: Pivot table summarizing data for workforce age distribution and the respective chart.

Gender: Pivot table summarizing data for workforce gender distribution and the respective chart.

Tenure: Pivot table summarizing data for workforce tenure distribution and the respective chart.

You can download the Excel file with all formatting.
Enhancing UNCDF Operations: Power BI Dashboard Development and Data Mapping
figshare.com
Updated Jan 6, 2025
Cite
Maryam Binti Haji Abdul Halim (2025). Enhancing UNCDF Operations: Power BI Dashboard Development and Data Mapping [Dataset]. http://doi.org/10.6084/m9.figshare.28147451.v1
Unique identifier
https://doi.org/10.6084/m9.figshare.28147451.v1
Dataset updated
Jan 6, 2025
Dataset provided by
figshare
Authors
Maryam Binti Haji Abdul Halim
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This project focuses on data mapping, integration, and analysis to support the development and enhancement of six UNCDF operational applications: OrgTraveler, Comms Central, Internal Support Hub, Partnership 360, SmartHR, and TimeTrack. These apps streamline workflows for travel claims, internal support, partnership management, and time tracking within UNCDF.

Key Features and Tools:

Data Mapping for Salesforce CRM Migration: Structured and mapped data flows to ensure compatibility and seamless migration to Salesforce CRM.

Python for Data Cleaning and Transformation: Utilized pandas, numpy, and APIs to clean, preprocess, and transform raw datasets into standardized formats.

Power BI Dashboards: Designed interactive dashboards to visualize workflows and monitor performance metrics for decision-making.

Collaboration Across Platforms: Integrated Google Colab for code collaboration and Microsoft Excel for data validation and analysis.
Coffee Shop Sales Analysis
kaggle.com
Updated Apr 25, 2024
Cite
Monis Amir (2024). Coffee Shop Sales Analysis [Dataset]. https://www.kaggle.com/datasets/monisamir/coffee-shop-sales-analysis/code
Dataset updated
Apr 25, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Monis Amir
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Analyzing Coffee Shop Sales: Excel Insights 📈

In my first data analytics project, I explore the secrets of a fictional coffee shop's success through data-driven analysis. By analyzing a 5-sheet Excel dataset, I've uncovered valuable sales trends, customer preferences, and insights that can guide future business decisions. 📊☕

DATA CLEANING 🧹

• REMOVED DUPLICATES OR IRRELEVANT ENTRIES: Thoroughly eliminated duplicate records and irrelevant data to refine the dataset for analysis.

• FIXED STRUCTURAL ERRORS: Rectified any inconsistencies or structural issues within the data to ensure uniformity and accuracy.

• CHECKED FOR DATA CONSISTENCY: Verified the integrity and coherence of the dataset by identifying and resolving any inconsistencies or discrepancies.

DATA MANIPULATION 🛠️

• UTILIZED LOOKUPS: Used Excel's lookup functions for efficient data retrieval and analysis.

• IMPLEMENTED INDEX MATCH: Leveraged the Index Match function to perform advanced data searches and matches.

• APPLIED SUMIFS FUNCTIONS: Utilized SumIFs to calculate totals based on specified criteria.

• CALCULATED PROFITS: Used relevant formulas and techniques to determine profit margins and insights from the data.
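For readers more at home in code than in spreadsheets, the lookup and conditional-sum steps listed above have simple counterparts. The sketch below is a rough Python analogue of an exact-match INDEX/MATCH and an equality-only SUMIFS; it is not the author's workbook, and the order and product rows are hypothetical.

```python
def sumifs(rows, sum_field, **criteria):
    """Sum sum_field over rows where every criterion field equals the given value
    (comparable to SUMIFS with equality criteria only)."""
    return sum(
        row[sum_field]
        for row in rows
        if all(row[field] == wanted for field, wanted in criteria.items())
    )


def index_match(rows, lookup_field, lookup_value, return_field):
    """Return return_field from the first row whose lookup_field equals lookup_value
    (comparable to INDEX/MATCH with an exact match)."""
    for row in rows:
        if row[lookup_field] == lookup_value:
            return row[return_field]
    raise LookupError(f"{lookup_value!r} not found in {lookup_field}")


# Hypothetical sheets represented as lists of dictionaries.
orders = [
    {"product_id": "P1", "country": "US", "quantity": 2},
    {"product_id": "P2", "country": "UK", "quantity": 1},
    {"product_id": "P1", "country": "US", "quantity": 3},
]
products = [
    {"product_id": "P1", "coffee_type": "Arabica"},
    {"product_id": "P2", "coffee_type": "Robusta"},
]

us_p1_quantity = sumifs(orders, "quantity", product_id="P1", country="US")
p2_coffee_type = index_match(products, "product_id", "P2", "coffee_type")
```

Raising an error on a failed lookup plays the role of Excel's `#N/A`, making missing keys visible instead of silently returning a blank.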

PIVOTING THE DATA 𝄜

• CREATED PIVOT TABLES: Utilized Excel's PivotTable feature to pivot the data for in-depth analysis.

• FILTERED DATA: Utilized pivot tables to filter and analyze specific subsets of data, enabling focused insights. This was used in particular for the “PEAK HOURS” and “TOP 3 PRODUCTS” charts.

VISUALIZATION 📊

• KEY INSIGHTS: Unveiled the grand total sales revenue while also analyzing the average bill per person, offering comprehensive insights into the coffee shop's performance and customer spending habits.

• SALES TREND ANALYSIS: Used Line chart to compute total sales across various time intervals, revealing valuable insights into evolving sales trends.

• PEAK HOUR ANALYSIS: Leveraged Clustered Column chart to identify peak sales hours, shedding light on optimal operating times and potential staffing needs.

• TOP 3 PRODUCTS IDENTIFICATION: Utilized Clustered Bar chart to determine the top three coffee types, facilitating strategic decisions regarding inventory management and marketing focus.

*I also used a Timeline to visualize chronological data trends and identify key patterns over specific times.

While it's a significant milestone for me, I recognize that there's always room for growth and improvement. Your feedback and insights are invaluable to me as I continue to refine my skills and tackle future projects. I'm eager to hear your thoughts and suggestions on how I can make my next endeavor even more impactful and insightful.

THANKS TO: WsCube Tech Mo Chen Alex Freberg

TOOLS USED: Microsoft Excel

#DataAnalytics #DataAnalyst #ExcelProject #DataVisualization #BusinessIntelligence #SalesAnalysis #DataAnalysis #DataDrivenDecisions
Data from: Data cleaning and enrichment through data integration: networking...
search.dataone.org
data.niaid.nih.gov
Updated Feb 25, 2025
Cite
Irene Finocchi; Alessio Martino; Blerina Sinaimeri; Fariba Ranjbar (2025). Data cleaning and enrichment through data integration: networking the Italian academia [Dataset]. http://doi.org/10.5061/dryad.wpzgmsbwj
Unique identifier
https://doi.org/10.5061/dryad.wpzgmsbwj
Dataset updated
Feb 25, 2025
Dataset provided by
Dryad Digital Repository
Authors
Irene Finocchi; Alessio Martino; Blerina Sinaimeri; Fariba Ranjbar
Description
We describe a bibliometric network characterizing co-authorship collaborations in the entire Italian academic community. The network, consisting of 38,220 nodes and 507,050 edges, is built upon two distinct data sources: faculty information provided by the Italian Ministry of University and Research and publications available in Semantic Scholar. Both nodes and edges are associated with a large variety of semantic data, including gender, bibliometric indexes, authors' and publications' research fields, and temporal information. While linking data between the two original sources posed many challenges, the network has been carefully validated to assess its reliability and to understand its graph-theoretic characteristics. By resembling several features of social networks, our dataset can be profitably leveraged in experimental studies in the wide social network analytics domain as well as in more specific bibliometric contexts.

The proposed network is built starting from two distinct data sources:

the entire dataset dump from Semantic Scholar (with particular emphasis on the authors and papers datasets);

the entire list of Italian faculty members as maintained by Cineca (under appointment by the Italian Ministry of University and Research).

By means of a custom name-identity recognition algorithm (details are available in the accompanying paper published in Scientific Data), the names of the authors in the Semantic Scholar dataset have been mapped against the names contained in the Cineca dataset and authors with no match (e.g., because of not being part of an Italian university) have been discarded. The remaining authors will compose the nodes of the network, which have been enriched with node-related (i.e., author-related) attributes. In order to build the network edges, we leveraged the papers dataset from Semantic Scholar: specifically, any two authors are said to be connected if there is at least one pap...

Data cleaning and enrichment through data integration: networking the Italian academia

https://doi.org/10.5061/dryad.wpzgmsbwj

Manuscript published in Scientific Data with DOI .

Description of the data and file structure

This repository contains two main data files:

edge_data_AGG.csv, the full network in comma-separated edge list format (this file contains mainly temporal co-authorship information);

Coauthorship_Network_AGG.graphml, the full network in GraphML format.

along with several supplementary data, listed below, useful only to build the network (i.e., for reproducibility only):

University-City-match.xlsx, an Excel file that maps the name of a university against the city where its headquarters are located;

Areas-SS-CINECA-match.xlsx, an Excel file that maps the research areas in Cineca against the research areas in Semantic Scholar.
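The edge rule in the description (two authors are connected once they share a paper, which is our reading of the truncated sentence) can be sketched as below. This is a simplified illustration, not the authors' pipeline: the paper identifiers and author names are hypothetical, and the name-matching step against Cineca is assumed to have already happened.

```python
from collections import Counter
from itertools import combinations


def build_coauthorship_edges(papers):
    """Map each unordered author pair to the number of papers they share."""
    edges = Counter()
    for authors in papers.values():
        # Sorting gives every pair a single canonical orientation;
        # set() guards against an author listed twice on one paper.
        for a, b in combinations(sorted(set(authors)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical papers: paper id -> list of already-matched author names.
papers = {
    "p1": ["rossi", "bianchi", "verdi"],
    "p2": ["rossi", "bianchi"],
    "p3": ["neri"],  # single-author papers contribute no edges
}
edges = build_coauthorship_edges(papers)
```

Storing a count per pair rather than a bare set of pairs keeps the door open for edge weights, although the description itself only requires at least one shared paper for a link.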

Description of the main data files

The `Coauthorship_Networ...
Vaginal Pulse Amplitude Data Cleaning Guide
osf.io
Updated Oct 9, 2022
Cite
Kirstin Clephane; Tierney Lorenz (2022). Vaginal Pulse Amplitude Data Cleaning Guide [Dataset]. http://doi.org/10.17605/OSF.IO/T67QN
Unique identifier
https://doi.org/10.17605/OSF.IO/T67QN
Dataset updated
Oct 9, 2022
Dataset provided by
Center For Open Science
Authors
Kirstin Clephane; Tierney Lorenz
Description
This is a how-to guide for cleaning vaginal photoplethysmography (VPP) signal using AcqKnowledge (v. 5.0.5) and Excel (or equivalent database management software, like OpenOffice).
Data from: Skepticism in science and punitive attitudes
openicpsr.org
delimited
Updated May 4, 2025
Cite
Jason Rydberg; Luke DeZago (2025). Skepticism in science and punitive attitudes [Dataset]. http://doi.org/10.3886/E228541V1
Available download formats: delimited
Unique identifier
https://doi.org/10.3886/E228541V1
Dataset updated
May 4, 2025
Dataset provided by
University of Massachusetts Lowell
Authors
Jason Rydberg; Luke DeZago
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Replication materials for the manuscript "Skepticism in Science and Punitive Attitudes", published in the Journal of Criminal Justice.

Note that the GSS repeated cross sections for 1972 to 2018 are too large to upload here, but they can be accessed from https://gss.norc.org/content/dam/gss/get-the-data/documents/spss/GSS_spss.zip

Included here are:

(A link to the repeated cross-sections data)

Each of the 3 wave panels (2006-2010; 2008-2012; 2010-2014)

Replication R script for the repeated cross sections cleaning and analysis

Replication R script for the panel data cleaning and analysis

An Excel spreadsheet with Uniform Crime Report data to merge to the cross sections.
Data from: Automating Knowledge: A Case Study of Library Automation in of...
data.mendeley.com
Updated Mar 10, 2025
Cite
RUCHI SINHA (2025). Automating Knowledge: A Case Study of Library Automation in of College Libraries of Dadra and Nagar Haveli [Dataset]. http://doi.org/10.17632/h2c2w5sgbx.1
Unique identifier
https://doi.org/10.17632/h2c2w5sgbx.1
Dataset updated
Mar 10, 2025
Authors
RUCHI SINHA
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Dadra and Nagar Haveli
Description
Research Design: Mixed-methods approach, combining quantitative and qualitative methods.

Data Collection:
- Survey questionnaire (Google Forms) with 500 respondents from 10 college libraries.
- In-depth interviews with 20 librarians and library administrators.
- Observational studies in 5 college libraries.

Data Analysis:
- Descriptive statistics (mean, median, mode, standard deviation).
- Inferential statistics (t-tests, ANOVA).
- Thematic analysis for qualitative data.

Instruments and Software:
- Google Forms
- Microsoft Excel
- SPSS
- NVivo

Protocols:
- Survey protocol: pilot-tested with a small group.
- Interview protocol: used an interview guide.

Workflows:
- Data cleaning and validation.
Titanic Analysis Project
kaggle.com
Updated Oct 9, 2023
Cite
Ahmed Samir (2023). Titanic Analysis Project [Dataset]. https://www.kaggle.com/datasets/ahmedsamir11111/titanic-analysis-project/data
Dataset updated
Oct 9, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Ahmed Samir
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Titanic Analysis

May they all rest in peace...

I extracted some statistics based on the dataset available about the passengers of the sunken Titanic ship. The workflow included the following stages:

Data collection. Data understanding. Data cleaning. Analysis and posing questions. Drawing answers to the questions and extracting results. Creating a visualization of those results on a dashboard. Power Query. Power Pivot.
s
Cleaning Robot Market Size, Share, Growth Analysis, By Product(Vacuum...
skyquestt.com
Updated Apr 16, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
SkyQuest Technology (2024). Cleaning Robot Market Size, Share, Growth Analysis, By Product(Vacuum Cleaning Robots, Floor Cleaning Robots, Window Cleaning Robots, Pool Cleaning Robots), By Application(Residential, Commercial, Industrial, and others), By Sales Channel(Online, Offline, and Others), By Region - Industry Forecast 2024-2031 [Dataset]. https://www.skyquestt.com/report/cleaning-robot-market
Dataset updated
Apr 16, 2024
Dataset authored and provided by
SkyQuest Technology
License
https://www.skyquestt.com/privacy/
Time period covered
2024 - 2031
Area covered
Global
Description
Global Cleaning Robot Market size was valued at USD 4.19 billion in 2022 and is poised to grow from USD 4.97 billion in 2023 to USD 12.81 billion by 2031, growing at a CAGR of 22.9% in the forecast period (2024-2031).
Dataset for "Cognitive behavioural therapy self-help intervention...
zenodo.org
data.niaid.nih.gov
Updated Jul 16, 2024
Cite
Chelsea Coumoundouros; Paul Farrand; Alexander Hamilton; Louise Von Essen; Robbert Sanderman; Joanne Woodford (2024). Dataset for "Cognitive behavioural therapy self-help intervention preferences among informal caregivers of adults with chronic kidney disease: an online cross-sectional survey" [Dataset]. http://doi.org/10.5281/zenodo.7104638
Unique identifier
https://doi.org/10.5281/zenodo.7104638
Dataset updated
Jul 16, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Chelsea Coumoundouros; Paul Farrand; Alexander Hamilton; Louise Von Essen; Robbert Sanderman; Joanne Woodford
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data and R code used in the analysis for the publication: Coumoundouros et al., Cognitive behavioural therapy self-help intervention preferences among informal caregivers of adults with chronic kidney disease: an online cross-sectional survey. BMC Nephrology.

Summary of study

An online cross-sectional survey for informal caregivers (e.g. family and friends) of people living with chronic kidney disease in the United Kingdom. The study aimed to examine informal caregivers' cognitive behavioural therapy self-help intervention preferences, and to describe the caregiving situation (e.g. types of care activities) and informal caregivers' mental health (depression, anxiety and stress symptoms).

Participants were eligible to participate if they were at least 18 years old, lived in the United Kingdom, and provided unpaid care to someone living with chronic kidney disease who was at least 18 years old.

The online survey included questions regarding (1) informal caregivers' characteristics; (2) care recipients' characteristics; (3) intervention preferences (e.g. content, delivery format); and (4) informal caregivers' mental health. Informal caregivers' mental health was assessed using the 21-item Depression, Anxiety, and Stress Scale (DASS-21), which is composed of three subscales measuring depression, anxiety, and stress, respectively.

Sixty-five individuals participated in the survey.

See the published article for full study details.

Description of uploaded files

1. ENTWINE_ESR14_Kidney Carer Survey Data_FULL_2022-08-30: Excel file with the complete, raw survey data. Note: the first half of each participant's postal code was collected; however, these data were removed from the uploaded dataset to ensure participant anonymity.

2. ENTWINE_ESR14_Kidney Carer Survey Data_Clean DASS-21 Data_2022-08-30: Excel file with cleaned data for the DASS-21 scale. Data cleaning involved imputing missing data when a participant was missing data for one item within a subscale of the DASS-21. The missing value was imputed as the mean of all other items within the relevant subscale.

3. ENTWINE_ESR14_Kidney Carer Survey_KEY_2022-08-30: Excel file with key linking item labels in uploaded datasets with the corresponding survey question.

4. R Code for Kidney Carer Survey_2022-08-30: R file of R code used to analyse survey data.

5. R code for Kidney Carer Survey_PDF_2022-08-30: PDF file of R code used to analyse survey data.
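
The imputation rule described for the cleaned DASS-21 file can be sketched as follows. This is a minimal illustration in Python, not the authors' R code: the function name `impute_subscale` and the representation of missing answers as `None` are assumptions made here for illustration.

```python
from statistics import mean
from typing import List, Optional, Sequence


def impute_subscale(items: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Impute a single missing item in one subscale (hypothetical helper).

    `items` holds one participant's answers for one subscale, with None
    marking a missing answer. If exactly one item is missing, it is
    replaced by the mean of the other items in that subscale. In every
    other case the answers are returned unchanged.
    """
    result = list(items)
    missing = [i for i, value in enumerate(result) if value is None]
    observed = [value for value in result if value is not None]

    # Only impute when exactly one item is missing and others exist.
    if len(missing) == 1 and observed:
        result[missing[0]] = mean(observed)
    return result


# One missing item: filled with the mean of the remaining six answers.
print(impute_subscale([1, 2, None, 3, 0, 1, 2]))

# Two missing items: left as-is, since the rule covers one item only.
print(impute_subscale([1, None, None, 3, 0, 1, 2]))
```

Leaving rows with more than one missing item untouched mirrors the description above, which only states what happens when a single item is missing.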
Bank Loan Analysis Project in Excel
kaggle.com
Updated May 4, 2024
Cite
Sanjana Murthy (2024). Bank Loan Analysis Project in Excel [Dataset]. https://www.kaggle.com/datasets/sanjanamurthy392/bank-loan-analysis-project/data
Dataset updated
May 4, 2024
Dataset provided by
Kaggle
Authors
Sanjana Murthy
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
About Datasets:
- Domain: Finance
- Project: Bank loan of customers
- Datasets: Finance_1.xlsx & Finance_2.xlsx
- Dataset Type: Excel Data
- Dataset Size: Each Excel file has 39k+ records

KPIs:
1. Year-wise loan amount stats
2. Grade- and sub-grade-wise revol_bal
3. Total payment for Verified status vs. total payment for Non-Verified status
4. State-wise loan status
5. Month-wise loan status
6. Get more insights based on your understanding of the data

Process:
1. Understanding the problem
2. Data collection
3. Data cleaning
4. Exploring and analyzing the data
5. Interpreting the results

This project uses Power Query, Power Pivot, merged data, a clustered bar chart, a clustered column chart, a line chart, a 3D pie chart, a dashboard, slicers, a timeline, and formatting techniques.
Cleaned NHANES 1988-2018
figshare.com
Updated Feb 18, 2025
Cite
Vy Nguyen; Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet (2025). Cleaned NHANES 1988-2018 [Dataset]. http://doi.org/10.6084/m9.figshare.21743372.v9
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.21743372.v9
Dataset updated
Feb 18, 2025
Dataset provided by
figshare
Authors
Vy Nguyen; Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The National Health and Nutrition Examination Survey (NHANES) provides data that have considerable potential for studying the health and environmental exposure of the non-institutionalized US population. However, because NHANES data are plagued with multiple inconsistencies, these data must be processed before new insights can be derived through large-scale analyses. Thus, we developed a set of curated and unified datasets by merging 614 separate files and harmonizing unrestricted data across NHANES III (1988-1994) and Continuous (1999-2018), totaling 135,310 participants and 5,078 variables. The variables convey demographics (281 variables), dietary consumption (324 variables), physiological functions (1,040 variables), occupation (61 variables), questionnaires (1,444 variables, e.g., physical activity, medical conditions, diabetes, reproductive health, blood pressure and cholesterol, early childhood), medications (29 variables), mortality information linked from the National Death Index (15 variables), survey weights (857 variables), environmental exposure biomarker measurements (598 variables), and chemical comments indicating which measurements are below or above the lower limit of detection (505 variables).

csv Data Record: The curated NHANES datasets and the data dictionaries include 23 .csv files and 1 Excel file. The curated NHANES datasets comprise 20 .csv files, two for each module, with one as the uncleaned version and the other as the cleaned version. The modules are labeled as follows: 1) mortality, 2) dietary, 3) demographics, 4) response, 5) medications, 6) questionnaire, 7) chemicals, 8) occupation, 9) weights, and 10) comments. "dictionary_nhanes.csv" is a dictionary that lists the variable name, description, module, category, units, CAS Number, comment use, chemical family, chemical family shortened, number of measurements, and cycles available for all 5,078 variables in NHANES. "dictionary_harmonized_categories.csv" contains the harmonized categories for the categorical variables. "dictionary_drug_codes.csv" contains the dictionary of descriptors for the drug codes. "nhanes_inconsistencies_documentation.xlsx" is an Excel file that contains the cleaning documentation, which records all the inconsistencies for all affected variables to help curate each of the NHANES modules.

R Data Record: For researchers who want to conduct their analysis in the R programming language, the cleaned NHANES modules and the data dictionaries (but not the uncleaned versions) can be downloaded as a .zip file, which includes an .RData file and an .R file. "w - nhanes_1988_2018.RData" contains all the aforementioned datasets as R data objects. We make available all R scripts for the customized functions that were written to curate the data. "m - nhanes_1988_2018.R" shows how we used the customized functions (i.e. our pipeline) to curate the original NHANES data.

Example starter codes: The set of starter code to help users conduct exposome analysis consists of four R markdown files (.Rmd). We recommend going through the tutorials in order. "example_0 - merge_datasets_together.Rmd" demonstrates how to merge the curated NHANES datasets together. "example_1 - account_for_nhanes_design.Rmd" demonstrates how to conduct a linear regression model, a survey-weighted regression model, a Cox proportional hazard model, and a survey-weighted Cox proportional hazard model. "example_2 - calculate_summary_statistics.Rmd" demonstrates how to calculate summary statistics for one variable and multiple variables with and without accounting for the NHANES sampling design. "example_3 - run_multiple_regressions.Rmd" demonstrates how to run multiple regression models with and without adjusting for the sampling design.
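
To illustrate what "with and without accounting for the sampling design" can mean for a summary statistic, the sketch below contrasts a plain mean with a survey-weighted mean. It is a simplified Python illustration, not taken from the dataset's R tutorials: the function names and the sample values and weights are hypothetical, and it covers only the point estimate, not the design-based variance that a full survey analysis would also need.

```python
from typing import Sequence


def unweighted_mean(values: Sequence[float]) -> float:
    """Plain arithmetic mean that ignores the sampling design."""
    return sum(values) / len(values)


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean: sum(w * x) / sum(w).

    Each weight is read as the number of people in the population
    that one surveyed participant represents.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical measurements and survey weights for three participants.
measurements = [120.0, 130.0, 110.0]
survey_weights = [1000.0, 3000.0, 1000.0]

print(unweighted_mean(measurements))                 # 120.0
print(weighted_mean(measurements, survey_weights))   # 124.0
```

The second participant represents three times as many people as the others, so the weighted estimate shifts toward that participant's value.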
