64 datasets found

Data Cleaning Sample
borealisdata.ca
Updated Jul 13, 2023
Cite
Rong Luo (2023). Data Cleaning Sample [Dataset]. http://doi.org/10.5683/SP3/ZCN177
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.5683/SP3/ZCN177
Dataset updated
Jul 13, 2023
Dataset provided by
Borealis
Authors
Rong Luo
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Sample data for exercises in Further Adventures in Data Cleaning.
Navigating Stats Can Data & Scrubbing Data Clean with Excel Workshop
search.dataone.org
borealisdata.ca
Updated Jul 31, 2024
Cite
Costanzo, Lucia; Jadon, Vivek (2024). Navigating Stats Can Data & Scrubbing Data Clean with Excel Workshop [Dataset]. http://doi.org/10.5683/SP3/FF6AI9
Explore at:
Unique identifier
https://doi.org/10.5683/SP3/FF6AI9
Dataset updated
Jul 31, 2024
Dataset provided by
Borealis
Authors
Costanzo, Lucia; Jadon, Vivek
Description
Ahoy, data enthusiasts! Join us for a hands-on workshop where you will hoist your sails and navigate through the Statistics Canada website, uncovering hidden treasures in the form of data tables. With the wind at your back, you’ll master the art of downloading these invaluable Stats Can datasets while braving the occasional squall of data cleaning challenges using Excel with your trusty captains Vivek and Lucia at the helm.
Excel-project: Glassdoor Data Cleaning
kaggle.com
Updated Sep 26, 2023
Cite
Luis Lira (2023). Excel-project: Glassdoor Data Cleaning [Dataset]. https://www.kaggle.com/datasets/luisliraportfolio/excel-project-clean-dataset
Explore at:
Croissant
Dataset updated
Sep 26, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Luis Lira
Description
Dataset

This dataset was created by Luis Lira

Contents
Data Cleaning Excel Tutorial
kaggle.com
Updated Jul 22, 2023
Cite
Mohamed Khaled Idris (2023). Data Cleaning Excel Tutorial [Dataset]. https://www.kaggle.com/datasets/mohamedkhaledidris/data-cleaning-excel-tutorial
Explore at:
Croissant
Dataset updated
Jul 22, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Mohamed Khaled Idris
Description
Dataset

This dataset was created by Mohamed Khaled Idris

Contents
Data from: Facebook Data for Sentiment Analysis
live.european-language-grid.eu
lindat.mff.cuni.cz
binary format
Updated Jul 16, 2013
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2013). Facebook Data for Sentiment Analysis [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/1057
Explore at:
Available download formats: binary format
Dataset updated
Jul 16, 2013
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0) (https://creativecommons.org/licenses/by-sa/3.0/)
License information was derived automatically
Description
Corpus consisting of 10,000 Facebook posts manually annotated on sentiment (2,587 positive, 5,174 neutral, 1,991 negative and 248 bipolar posts). The archive contains data and statistics in an Excel file (FBData.xlsx) and gold data in two text files with posts (gold-posts.txt) and labels (gols-labels.txt) on corresponding lines.
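Because the posts and labels are stored on corresponding lines of two text files, they can be paired by line position. The following is a minimal sketch of that idea, not the corpus authors' tooling: the function name, the demo strings, and the label codes are hypothetical, and the actual label encoding in the archive is not stated in the description. The final lines simply check that the four class counts quoted above add up to the stated 10,000 posts.

```python
from collections import Counter


def pair_posts_with_labels(posts, labels):
    """Pair each post with the label found at the same line position."""
    posts = [p.rstrip("\n") for p in posts]
    labels = [lab.strip() for lab in labels]
    if len(posts) != len(labels):
        raise ValueError(
            f"line counts differ: {len(posts)} posts vs {len(labels)} labels"
        )
    return list(zip(posts, labels))


# In-memory stand-ins for the two text files (made-up content and label codes).
demo_posts = ["great day\n", "just a status update\n", "awful service\n"]
demo_labels = ["p\n", "n\n", "neg\n"]

pairs = pair_posts_with_labels(demo_posts, demo_labels)
label_counts = Counter(label for _, label in pairs)

# Class counts as reported in the description; they should sum to 10,000.
reported = {"positive": 2587, "neutral": 5174, "negative": 1991, "bipolar": 248}
total_reported = sum(reported.values())
```

With the real files, the same function could be called on the lists returned by reading each file's lines, provided both files are opened with the same text encoding.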
Data from: Cleaning Data with Open Refine
explore.openaire.eu
Updated Jan 1, 2016
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dr Richard Berry; Dr Luc Small; Dr Jeff Christiansen (2016). Cleaning Data with Open Refine [Dataset]. http://doi.org/10.5281/zenodo.6423839
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.6423839
Dataset updated
Jan 1, 2016
Authors
Dr Richard Berry; Dr Luc Small; Dr Jeff Christiansen
Description
About this course

Do you have messy data from multiple inconsistent sources, or open-responses to questionnaires? Do you want to improve the quality of your data by refining it and using the power of the internet? Open Refine is the perfect partner to Excel. It is a powerful, free tool for exploring, normalising and cleaning datasets, and extending data by accessing the internet through APIs. In this course we’ll work through the various features of Refine, including importing data, faceting, clustering, and calling remote APIs, by working on a fictional but plausible humanities research project.

Learning Outcomes

Download, install and run Open Refine
Import data from csv, text or online sources and create projects
Navigate data using the Open Refine interface
Explore data by using facets
Clean data using clustering
Parse data using GREL syntax
Extend data using Application Programming Interfaces (APIs)
Export project for use in other applications

Prerequisites

The course has no prerequisites.

Licence

Copyright © 2021 Intersect Australia Ltd. All rights reserved.
popular baby names with data cleaning
kaggle.com
Updated Jun 11, 2023
Cite
Real Sourabh Singhal (2023). popular baby names with data cleaning [Dataset]. https://www.kaggle.com/datasets/realsourabhsinghal/popular-baby-names-with-data-cleaning/code
Explore at:
Croissant
Dataset updated
Jun 11, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Real Sourabh Singhal
Description
A fully cleaned Excel file, intended to support accurate data analysis with proper visualization.
Call Center Performance MS Excel Analysis
kaggle.com
Updated Oct 25, 2023
Cite
Oluwabori Abiodun-Johnson (2023). Call Center Performance MS Excel Analysis [Dataset]. https://www.kaggle.com/datasets/oluwaboriaj/call-center-dataset-analysis/data
Explore at:
Croissant
Dataset updated
Oct 25, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Oluwabori Abiodun-Johnson
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
--------CALL CENTER PERFORMANCE DATASET ANALYSIS--------

This is a self-guided project.

The Call Center dataset contained customer data such as caller id, customer name, date, call channel, city, state, reason for calling, call duration, etc.

I tasked myself with identifying trends and patterns in order to create a summary of the data that gives both technical and non-technical viewers an overview-level understanding.

OBJECTIVES: Create a dashboard (using charts, slicers and KPIs) which can be used to statistically track, monitor and visualize the performance of a Call Center.

SOFTWARE TOOLS USED: Microsoft Excel

ANALYTICAL ACTIONS PERFORMED: Data Importation, Data Processing, Data Cleaning, VLOOKUP, Pivot Tables, Data Visualization (Dashboard creation), Connection Reporting (connecting slicers to Dashboard)
Enhancing UNCDF Operations: Power BI Dashboard Development and Data Mapping
figshare.com
Updated Jan 6, 2025
Cite
Maryam Binti Haji Abdul Halim (2025). Enhancing UNCDF Operations: Power BI Dashboard Development and Data Mapping [Dataset]. http://doi.org/10.6084/m9.figshare.28147451.v1
Explore at:
Unique identifier
https://doi.org/10.6084/m9.figshare.28147451.v1
Dataset updated
Jan 6, 2025
Dataset provided by
figshare
Authors
Maryam Binti Haji Abdul Halim
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This project focuses on data mapping, integration, and analysis to support the development and enhancement of six UNCDF operational applications: OrgTraveler, Comms Central, Internal Support Hub, Partnership 360, SmartHR, and TimeTrack. These apps streamline workflows for travel claims, internal support, partnership management, and time tracking within UNCDF.

Key Features and Tools:

Data Mapping for Salesforce CRM Migration: Structured and mapped data flows to ensure compatibility and seamless migration to Salesforce CRM.
Python for Data Cleaning and Transformation: Utilized pandas, numpy, and APIs to clean, preprocess, and transform raw datasets into standardized formats.
Power BI Dashboards: Designed interactive dashboards to visualize workflows and monitor performance metrics for decision-making.
Collaboration Across Platforms: Integrated Google Colab for code collaboration and Microsoft Excel for data validation and analysis.
Agriculture Sample Census Survey 2002-2003 - Tanzania
catalog.ihsn.org
datacatalog.ihsn.org
Updated Mar 29, 2019
Cite
National Bureau of Statistics (2019). Agriculture Sample Census Survey 2002-2003 - Tanzania [Dataset]. https://catalog.ihsn.org/catalog/1086
Explore at:
Dataset updated
Mar 29, 2019
Dataset provided by
National Bureau of Statistics
Office of Chief Government Statistician-Zanzibar
Time period covered
2004
Area covered
Tanzania
Description
Abstract

The 2003 Agriculture Sample Census was designed to meet the data needs of a wide range of users down to district level including policy makers at local, regional and national levels, rural development agencies, funding institutions, researchers, NGOs, farmer organisations, etc. As a result the dataset is both more numerous in its sample and detailed in its scope compared to previous censuses and surveys. To date this is the most detailed Agricultural Census carried out in Africa.

The census was carried out in order to:

· Identify structural changes, if any, in the size of farm household holdings, crop and livestock production, farm input and implement use. It also seeks to determine if there are any improvements in rural infrastructure and in the level of agriculture household living conditions;
· Provide benchmark data on productivity, production and agricultural practices in relation to policies and interventions promoted by the Ministry of Agriculture and Food Security and other stakeholders;
· Establish baseline data for the measurement of the impact of high level objectives of the Agriculture Sector Development Programme (ASDP), National Strategy for Growth and Reduction of Poverty (NSGRP) and other rural development programs and projects;
· Obtain benchmark data that will be used to address specific issues such as: food security, rural poverty, gender, agro-processing, marketing, service delivery, etc.

Geographic coverage

Tanzania Mainland and Zanzibar

Analysis unit

Households

Individuals

Universe

Large scale, small scale and community farms.

Kind of data

Census/enumeration data [cen]

Sampling procedure

The Mainland sample consisted of 3,221 villages. These villages were drawn from the National Master Sample (NMS) developed by the National Bureau of Statistics (NBS) to serve as a national framework for the conduct of household based surveys in the country. The National Master Sample was developed from the 2002 Population and Housing Census. The total Mainland sample was 48,315 agricultural households. In Zanzibar a total of 317 enumeration areas (EAs) were selected and 4,755 agriculture households were covered. Nationwide, all regions and districts were sampled with the exception of three urban districts (two from Mainland and one from Zanzibar).

In both Mainland and Zanzibar, a stratified two stage sample was used. The number of villages/EAs selected for the first stage was based on a probability proportional to the number of villages in each district. In the second stage, 15 households were selected from a list of farming households in each selected Village/EA, using systematic random sampling, with the village chairpersons assisting to locate the selected households.
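The second-stage selection described above can be sketched as follows. This is an illustrative sketch of generic systematic random sampling, not the census team's actual procedure: the function name, the household identifiers, and the way the fractional sampling interval is handled are assumptions. The last lines check that 15 households per village/EA reproduces the household totals quoted in the sampling description.

```python
import random


def systematic_sample(units, n, rng=None):
    """Select n units from an ordered list by systematic random sampling."""
    rng = rng or random.Random()
    if n <= 0 or n > len(units):
        raise ValueError("n must be between 1 and len(units)")
    step = len(units) / n          # sampling interval (may be fractional)
    start = rng.random() * step    # random start within the first interval
    # Take every step-th unit from the random start; clamp guards rounding.
    return [units[min(int(start + i * step), len(units) - 1)] for i in range(n)]


# Hypothetical list of farming households in one selected village.
households = [f"HH{i:03d}" for i in range(100)]
selected = systematic_sample(households, 15, random.Random(0))

# Consistency check against the figures quoted in the description.
mainland_households = 3221 * 15   # villages x households per village
zanzibar_households = 317 * 15    # EAs x households per EA
```

Because the interval is at least one unit whenever n does not exceed the list length, the selected households are distinct and keep the order of the original list.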

Mode of data collection

Face-to-face [f2f]

Research instrument

The census covered agriculture in detail as well as many other aspects of rural development and was conducted using three different questionnaires:

• Small scale questionnaire
• Community level questionnaire
• Large scale farm questionnaire

The small scale farm questionnaire was the main census instrument and it includes questions related to crop and livestock production and practices; population demographics; access to services, resources and infrastructure; and issues on poverty, gender and subsistence versus profit making production unit.

The community level questionnaire was designed to collect village level data such as access and use of common resources, community tree plantation and seasonal farm gate prices.

The large scale farm questionnaire was administered to large farms either privately or corporately managed.

Questionnaire Design

The questionnaires were designed following user meetings to ensure that the questions asked were in line with users' data needs. Several features were incorporated into the design of the questionnaires to increase the accuracy of the data:

• Where feasible all variables were extensively coded to reduce post enumeration coding error.
• The definitions for each section were printed on the opposite page so that the enumerator could easily refer to the instructions whilst interviewing the farmer.
• The responses to all questions were placed in boxes printed on the questionnaire, with one box per character. This feature made it possible to use scanning and Intelligent Character Recognition (ICR) technologies for data entry.
• Skip patterns were used to reduce unnecessary and incorrect coding of sections which do not apply to the respondent.
• Each section was clearly numbered, which facilitated the use of skip patterns and provided a reference for data type coding for the programming of CSPro, SPSS and the dissemination applications.

Cleaning operations

Data processing consisted of the following processes:

· Data entry
· Data structure formatting
· Batch validation
· Tabulation

Data Entry

Scanning and ICR data capture technology was used for the small holder questionnaire on the Mainland. This not only increased the speed of data entry, it also increased the accuracy due to the reduction of keystroke errors. Interactive validation routines were incorporated into the ICR software to track errors during the verification process. The scanning operation was so successful that it is highly recommended for adoption in future censuses/surveys. In Zanzibar all data was entered manually using CSPro.

Prior to scanning, all questionnaires underwent a manual cleaning exercise. This involved checking that the questionnaire had a full set of pages, correct identification and good handwriting. A score was given to each questionnaire based on the legibility and the completeness of enumeration. This score will be used to assess the quality of enumeration and supervision in order to select the best field staff for future censuses/surveys.

CSPro was used for data entry of all Large Scale Farm and community based questionnaires due to the relatively small number of questionnaires. It was also used to enter data from the 2,880 small holder questionnaires that were rejected by the ICR extraction application.

Data Structure Formatting

A program was developed in Visual Basic to automatically alter the structure of the output from the scanning/extraction process in order to harmonise it with the manually entered data. The program automatically checked and changed the number of digits for each variable, the record type code, the number of questionnaires in the village and the consistency of the Village ID Code, and saved the data of one village in a file named after the village code.

Batch Validation

A batch validation program was developed in order to identify inconsistencies within a questionnaire. This is in addition to the interactive validation during the ICR extraction process. The procedures varied from simple range checking within each variable to the more complex checking between variables. It took six months to screen, edit and validate the data from the smallholder questionnaires. After the long process of data cleaning, tabulations were prepared based on a pre-designed tabulation plan.
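The two kinds of checks mentioned in the batch validation step (range checking within a variable and consistency checking between variables) can be illustrated with a short sketch. This is not the census's validation program: the field names, the permitted ranges, and the single cross-variable rule are all hypothetical and chosen only for illustration.

```python
def validate_record(record, ranges):
    """Return a list of error messages for one questionnaire record."""
    errors = []

    # Range checks: each listed variable must be present and within bounds.
    for field, (low, high) in ranges.items():
        value = record.get(field)
        if value is None:
            errors.append(f"{field}: missing")
        elif not (low <= value <= high):
            errors.append(f"{field}: {value} outside [{low}, {high}]")

    # Cross-variable check (hypothetical rule): harvested area cannot
    # exceed planted area.
    planted = record.get("area_planted")
    harvested = record.get("area_harvested")
    if planted is not None and harvested is not None and harvested > planted:
        errors.append("area_harvested exceeds area_planted")

    return errors


# Hypothetical permitted ranges for three variables.
RANGES = {
    "household_size": (1, 30),
    "area_planted": (0, 500),
    "area_harvested": (0, 500),
}

clean = {"household_size": 5, "area_planted": 2.0, "area_harvested": 1.5}
dirty = {"household_size": 0, "area_planted": 1.0, "area_harvested": 3.0}

clean_errors = validate_record(clean, RANGES)
dirty_errors = validate_record(dirty, RANGES)
```

Running a function like this over every record in a village file and collecting the non-empty error lists gives a batch report that editors could work through.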

Tabulations

Statistical Package for Social Sciences (SPSS) was used to produce the Census tabulations and Microsoft Excel was used to organize the tables and compute additional indicators. Excel was also used to produce charts while ArcView and Freehand were used for the maps.

Analysis and Report Preparation

The analysis in this report focuses on regional comparisons, time series and national production estimates. Microsoft Excel was used to produce charts; ArcView and Freehand were used for maps, whereas Microsoft Word was used to compile the report.

Data Quality

A great deal of emphasis was placed on data quality throughout the whole exercise from planning, questionnaire design, training, supervision, data entry, validation and cleaning/editing. As a result of this, it is believed that the census is highly accurate and representative of what was experienced at field level during the Census year. With very few exceptions, the variables in the questionnaire are within the norms for Tanzania and they follow expected time series trends when compared to historical data. Standard Errors and Coefficients of Variation for the main variables are presented in the Technical Report (Volume I).

Sampling error estimates

Sampling error estimates are given on pages 21 to 22 of the Technical Report for the Agriculture Sample Census Survey 2002-2003.
Data from: Data cleaning and enrichment through data integration: networking...
search.dataone.org
data.niaid.nih.gov
Updated Feb 25, 2025
Cite
Irene Finocchi; Alessio Martino; Blerina Sinaimeri; Fariba Ranjbar (2025). Data cleaning and enrichment through data integration: networking the Italian academia [Dataset]. http://doi.org/10.5061/dryad.wpzgmsbwj
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.wpzgmsbwj
Dataset updated
Feb 25, 2025
Dataset provided by
Dryad Digital Repository
Authors
Irene Finocchi; Alessio Martino; Blerina Sinaimeri; Fariba Ranjbar
Description
We describe a bibliometric network characterizing co-authorship collaborations in the entire Italian academic community. The network, consisting of 38,220 nodes and 507,050 edges, is built upon two distinct data sources: faculty information provided by the Italian Ministry of University and Research and publications available in Semantic Scholar. Both nodes and edges are associated with a large variety of semantic data, including gender, bibliometric indexes, authors' and publications' research fields, and temporal information. While linking data between the two original sources posed many challenges, the network has been carefully validated to assess its reliability and to understand its graph-theoretic characteristics. By resembling several features of social networks, our dataset can be profitably leveraged in experimental studies in the wide social network analytics domain as well as in more specific bibliometric contexts.

The proposed network is built starting from two distinct data sources:

the entire dataset dump from Semantic Scholar (with particular emphasis on the authors and papers datasets);

the entire list of Italian faculty members as maintained by Cineca (under appointment by the Italian Ministry of University and Research).

By means of a custom name-identity recognition algorithm (details are available in the accompanying paper published in Scientific Data), the names of the authors in the Semantic Scholar dataset have been mapped against the names contained in the Cineca dataset and authors with no match (e.g., because of not being part of an Italian university) have been discarded. The remaining authors will compose the nodes of the network, which have been enriched with node-related (i.e., author-related) attributes. In order to build the network edges, we leveraged the papers dataset from Semantic Scholar: specifically, any two authors are said to be connected if there is at least one pap...

Data cleaning and enrichment through data integration: networking the Italian academia

https://doi.org/10.5061/dryad.wpzgmsbwj

Manuscript published in Scientific Data with DOI .

Description of the data and file structure

This repository contains two main data files:

edge_data_AGG.csv, the full network in comma-separated edge list format (this file contains mainly temporal co-authorship information);

Coauthorship_Network_AGG.graphml, the full network in GraphML format.

along with several supplementary data, listed below, useful only to build the network (i.e., for reproducibility only):

University-City-match.xlsx, an Excel file that maps the name of a university against the city where its respective headquarter is located;

Areas-SS-CINECA-match.xlsx, an Excel file that maps the research areas in Cineca against the research areas in Semantic Scholar.

Description of the main data files

The `Coauthorship_Networ...
Supporting data for "Using the scanning fluid dynamic gauging device to...
repository.cam.ac.uk
bin, xls
Updated Sep 16, 2015
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ali, Akin; Ward, Glenn; Alam, Zayeem; Wilson, David Ian (2015). Supporting data for "Using the scanning fluid dynamic gauging device to understand the cleaning of baked lard layers" (Ali et al., Journal of Surfactants and Detergents) [Dataset]. http://doi.org/10.17863/CAM.68933
Explore at:
Available download formats: xls (182272 bytes), bin (493625 bytes), bin (302421 bytes), bin (123870 bytes), bin (234148 bytes)
Unique identifier
https://doi.org/10.17863/CAM.68933
Dataset updated
Sep 16, 2015
Dataset provided by
Apollo
University of Cambridge
Authors
Ali, Akin; Ward, Glenn; Alam, Zayeem; Wilson, David Ian
License
Attribution-ShareAlike 2.0 (CC BY-SA 2.0) (https://creativecommons.org/licenses/by-sa/2.0/)
License information was derived automatically
Description
These are Microsoft Excel files which contain the data used to generate the plots in the paper. The files are labelled by Figure number: a complete description is given in the paper.
Data from: Functional morphology and efficiency of the antenna cleaner in...
datadryad.org
data.niaid.nih.gov
zip
Updated Jun 26, 2015
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Alexander Hackmann; Henry Delacave; Adam Robinson; David Labonte; Walter Federle (2015). Functional morphology and efficiency of the antenna cleaner in Camponotus rufifemur ants [Dataset]. http://doi.org/10.5061/dryad.88q18
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.88q18
Dataset updated
Jun 26, 2015
Dataset provided by
Dryad
Authors
Alexander Hackmann; Henry Delacave; Adam Robinson; David Labonte; Walter Federle
Time period covered
2015
Area covered
Cambridge, UK
Description
Data for manuscript “Functional morphology and efficiency of the antenna cleaner in Camponotus rufifemur ants”. The Excel file includes 3 data sheets, one sheet for each experiment. The corresponding figures from the manuscript are mentioned above the actual data.

Manuscript data.xlsx
Use of Open Government Data by Brazilian Public Institutions - Dataset
datarepositorium.uminho.pt
tsv
Updated Aug 23, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Repositório de Dados da Universidade do Minho (2024). Use of Open Government Data by Brazilian Public Institutions - Dataset [Dataset]. http://doi.org/10.34622/datarepositorium/YSZBRR
Explore at:
Available download formats: tsv (72782), tsv (132726), tsv (3978), tsv (2825), tsv (50775), tsv (91790), tsv (47002), tsv (4169)
Unique identifier
https://doi.org/10.34622/datarepositorium/YSZBRR
Dataset updated
Aug 23, 2024
Dataset provided by
Repositório de Dados da Universidade do Minho
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
This dataset contains the results of a survey about the use of open government data applied to public agents working in public institutions in Brazil. It has two sets, one with questionnaire responses and metadata and the second with a coding table with interview extracts: 1) In the first dataset, each row holds a response to a questionnaire about the public agent's perceptions of the use and reuse of open government data in Brazilian public institutions. Columns store the questionnaire questions. Data were collected between 8 June and 13 July 2021, and this sample is composed of responses from 40 federal, state, and municipal public administrators. Thus, this dataset contains 40 rows and 158 columns. Data were collected on the LimeSurvey platform, where it was screened for missing values and incomplete responses. After cleaning, data were exported to Excel in tabular format. Questionnaire responses are provided in two files ResultsSurvey_OGDUseBRPubInstitutions_DataSet_PT and ResultsSurvey_OGDUseBRPubInstitutions_DataSet_EN. They contain the same information in Portuguese and English. 2) The second dataset records the code table of the interviews about the benefits, barriers, enablers, and drivers of open government data (OGD) use in Brazilian public institutions. A questionnaire applied to public agents working in Brazilian public institutions was followed up by interviews to broaden an understanding of the use of OGD. Nine interviews were conducted between May 17-31, 2022. This dataset represents the perspective of these public agents. The dataset contains 97 lines and six columns. Each row of the dataset lists the factor code used in the questionnaire, the factor descriptions in Portuguese and English, the interviewee code, the transcription extract of an interviewee narration collected in Portuguese, and the English translation. After collection in Portuguese, interviews were automatically transcribed using the NVivo Transcription software. 
Then, they were anonymized, and a human reviewed the transcriptions. Interviews were coded using NVivo and used the questionnaire factors to guide coding. Coded extracts were translated to English using Google and Microsoft translators. Then, translated extracts were revised by a human and were used for reporting. The coding table was exported to Excel. Interviews extracts are provided in one file, InterviewsExtracts_OGDUseBR_PublicInstitutions_Dataset.
Data from: Skepticism in science and punitive attitudes
openicpsr.org
delimited
Updated May 4, 2025
Cite
Jason Rydberg; Luke DeZago (2025). Skepticism in science and punitive attitudes [Dataset]. http://doi.org/10.3886/E228541V1
Explore at:
Available download formats: delimited
Unique identifier
https://doi.org/10.3886/E228541V1
Dataset updated
May 4, 2025
Dataset provided by
University of Massachusetts Lowell
Authors
Jason Rydberg; Luke DeZago
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Replication materials for the manuscript "Skepticism in Science and Punitive Attitudes", published in the Journal of Criminal Justice.

Note that the GSS repeated cross sections for 1972 to 2018 are too large to upload here, but they can be accessed from https://gss.norc.org/content/dam/gss/get-the-data/documents/spss/GSS_spss.zip

Included here are:

(A link to the repeated cross-sections data)
Each of the 3 wave panels (2006-2010; 2008-2012; 2010-2014)
Replication R script for the repeated cross sections cleaning and analysis
Replication R script for the panel data cleaning and analysis
An Excel spreadsheet with Uniform Crime Report data to merge to the cross sections.
Data from: Cleaner fish are potential super-spreaders
researchdata.edu.au
Updated Aug 2, 2022
Cite
Narvaez Pauline; Pauline Narvaez (2022). Cleaner fish are potential super-spreaders [Dataset]. http://doi.org/10.25903/TXVF-1216
Explore at:
Unique identifier
https://doi.org/10.25903/TXVF-1216
Dataset updated
Aug 2, 2022
Dataset provided by
James Cook University
Authors
Narvaez Pauline; Pauline Narvaez
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Time period covered
Jan 15, 2019 - Mar 15, 2020
Description
Abstract [Related publication]:

Cleaning symbiosis is critical for maintaining healthy biological communities in tropical marine ecosystems. However, potential negative impacts of mutualism, such as the transmission of pathogens and parasites during cleaning interactions, have rarely been evaluated. Here, we investigated whether the dedicated bluestreak cleaner wrasse Labroides dimidiatus, is susceptible to, and can transmit generalist ectoparasites between client fish. In laboratory experiments, L. dimidiatus were exposed to infective stages of three generalist ectoparasite species with contrasting life-histories. Labroides dimidiatus were susceptible to infection by the gnathiid isopod, Gnathia aureamaculosa, but significantly less susceptible to the ciliate protozoan, Cryptocaryon irritans, and the monogenean flatworm, Neobenedenia girellae, compared to control host species (Coris batuensis or Lates calcarifer). The potential for parasite transmission from a client fish to the cleaner fish was simulated using experimentally transplanted mobile adult (i.e., egg-producing) monogenean flatworms on L. dimidiatus. Parasites remained attached to cleaners for an average of two days, during which parasite egg production continued, but was reduced compared to control fish. Over this timespan, a wild cleaner may engage in several thousand cleaning interactions, providing numerous opportunities for mobile parasites to exploit cleaners as vectors. Our study provides the first experimental evidence that L. dimidiatus exhibits resistance to infective stages of some parasites yet has the potential to temporarily transport adult parasites. We propose that some parasites that evade being eaten by cleaner fish could exploit cleaning interactions as a mechanism for transmission and spread.

Data methods:

In laboratory experiments, we first tested the susceptibility of L. dimidiatus to three generalist parasites with contrasting life-histories. To do so, we exposed 20 L. dimidiatus and 20 control individuals (Coris batuensis or Lates calcarifer) to infective stages of a gnathiid isopod (Gnathia aureamaculosa), a monogenean flatworm (Neobenedenia girellae) and a ciliate protozoan (Cryptocaryon irritans). We then tested whether adult N. girellae remained attached and produced viable eggs when manually transplanted from a donor host to the skin of live L. dimidiatus. Finally, we tested how long adult N. girellae could survive on L. dimidiatus after being manually transplanted. All data analyses were performed in R version 4.0.2 (R Core Team 2020).
The full methodology is available in the publication shown in the Related Publication link below.

Software/equipment used to create/collect the data: Excel version 2205

Software/equipment used to manipulate/analyse the data: R Studio 2021.09.0
Cleaning Robot Market Size, Share, Growth Analysis, By Product(Vacuum...
skyquestt.com
Updated Apr 16, 2024
Cite
SkyQuest Technology (2024). Cleaning Robot Market Size, Share, Growth Analysis, By Product(Vacuum Cleaning Robots, Floor Cleaning Robots, Window Cleaning Robots, Pool Cleaning Robots), By Application(Residential, Commercial, Industrial, and others), By Sales Channel(Online, Offline, and Others), By Region - Industry Forecast 2024-2031 [Dataset]. https://www.skyquestt.com/report/cleaning-robot-market
Dataset updated
Apr 16, 2024
Dataset authored and provided by
SkyQuest Technology
License
https://www.skyquestt.com/privacy/
Time period covered
2024 - 2031
Area covered
Global
Description
Global Cleaning Robot Market size was valued at USD 4.19 billion in 2022 and is poised to grow from USD 4.97 billion in 2023 to USD 12.81 billion by 2031, growing at a CAGR of 22.9% in the forecast period (2024-2031).
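For readers who want to check a growth rate of this kind, a compound annual growth rate (CAGR) can be computed from a start value, an end value and a number of annual periods. The sketch below is illustrative only: the function name `cagr` is hypothetical, the USD figures are taken from the description above, and treating 2023 to 2031 as eight annual periods is an assumption. The report may use a different base year or method, so the result of this formula need not equal the rate it states.

```python
def cagr(start_value: float, end_value: float, periods: int) -> float:
    """Return the compound annual growth rate as a fraction (0.10 means 10%)."""
    if start_value <= 0 or end_value <= 0:
        raise ValueError("start_value and end_value must be positive")
    if periods <= 0:
        raise ValueError("periods must be a positive integer")
    # CAGR = (end / start) ** (1 / periods) - 1
    return (end_value / start_value) ** (1.0 / periods) - 1.0


if __name__ == "__main__":
    # Figures from the description; the period count (2023 -> 2031) is an assumption.
    rate = cagr(4.97, 12.81, 8)
    print(f"Implied CAGR over 8 periods: {rate:.1%}")
```

Changing `periods` shows how sensitive the implied rate is to the choice of base year.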
Shanghai experiment of consequence conditions on effort - Dataset -...
catalogue.data.govt.nz
Updated Feb 1, 2001
Cite
(2001). Shanghai experiment of consequence conditions on effort - Dataset - data.govt.nz - discover and use data [Dataset]. https://catalogue.data.govt.nz/dataset/oai-figshare-com-article-10277999
Dataset updated
Feb 1, 2001
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Shanghai
Description
This data set supports the journal paper "Manipulating the consequences of tests: How Shanghai teens react to different consequences", published in Educational Research and Evaluation, v26 (n5-6), pp. 221-251. The data were obtained to test the impact of different levels of consequence for taking a test on student test-taking effort. The data are part of the PhD project of Anran Zhao, supervised by Brown & Meissel. The data set is in MS Excel format. Sheet 1 provides an anonymous wide-format data set post-cleaning and missing value analysis of the data. Sheet 2 provides a description of each variable.
Pivot Tables and Charts with HR Data
kaggle.com
Updated Feb 11, 2025
Cite
Carina Cruz (2025). Pivot Tables and Charts with HR Data [Dataset]. https://www.kaggle.com/datasets/carinacruz/hr-data-using-pivot-tables
Dataset updated
Feb 11, 2025
Dataset provided by
Kaggle http://kaggle.com/
Authors
Carina Cruz
Description
This project demonstrates the use of data cleaning techniques, Pivot Tables and charts in Excel to answer 3 main questions:

What is the employee age distribution?

What is the workforce gender distribution?

What is the workforce tenure distribution?

It includes 5 sheets:

Employee Data: Raw employee demographics data.

Employee Data_Edited: Raw data in table format and after data cleaning.

Age: Pivot table summarizing data for workforce age distribution and the respective chart.

Gender: Pivot table summarizing data for workforce gender distribution and the respective chart.

Tenure: Pivot table summarizing data for workforce tenure distribution and the respective chart.

You can download the Excel file with all formatting.
KAP WASH 2019 in South Sudan's Ajuong Thok and Pamir Camps - South Sudan
datacatalog.ihsn.org
microdata.unhcr.org
Updated Oct 14, 2021
Cite
Samaritan's Purse (2021). KAP WASH 2019 in South Sudan's Ajuong Thok and Pamir Camps - South Sudan [Dataset]. https://datacatalog.ihsn.org/catalog/9787
Dataset updated
Oct 14, 2021
Dataset provided by
United Nations High Commissioner for Refugees http://www.unhcr.org/
Samaritan's Purse
Time period covered
2019
Area covered
South Sudan
Description
Abstract

A Knowledge, Attitudes and Practices (KAP) survey was conducted in Ajuong Thok and Pamir Refugee Camps in October 2019 to determine the current Water, Sanitation and Hygiene (WASH) conditions as well as hygiene attitudes and practices within the households (HHs) surveyed. The assessment utilized a systematic random sampling method, and a total of 1,474 HHs (735 HHs in Ajuong Thok and 739 HHs in Pamir) were surveyed using mobile data collection (MDC) within a period of 21 days. Data was cleaned and analyzed in Excel. The summary of the results is presented in this report.

The findings show that the overall average number of liters of water per person per day was 23.4 in both Ajuong Thok and Pamir Camps, which was slightly higher than the recommended United Nations High Commissioner for Refugees (UNHCR) minimum standard of at least 20 liters of water available per person per day. This is a slight improvement from the 21 liters reported the previous year. The average HH size was six people. Women comprised 83% of the surveyed respondents and men 17%. Almost all the respondents were refugees, constituting 99.5% (n=1,466). The refugees were aware of the key health and hygiene practices, possibly as a result of routine health and hygiene messages delivered to them by Samaritan's Purse (SP) and other health partners. Most refugees had knowledge about keeping the water containers clean, washing hands during critical times, safe excreta disposal and disease prevention.

Geographic coverage

Ajuong Thok and Pamir Refugee Camps

Analysis unit

Households

Universe

All households in Ajuong Thok and Pamir Refugee Camps

Kind of data

Sample survey data [ssd]

Sampling procedure

Households were selected using systematic random sampling. Enumerators systematically walked through the camp block by block, row by row, in such a way as to pass each HH. Within blocks, enumerators started at one corner, then systematically used the sampling interval as they walked up and down each of the rows throughout the block, covering every block in Ajuong Thok and Pamir.

In each location, the first HH sampled in a block was generated using an Excel tool customized by UNHCR which generated a Random Start and Sampling Interval.
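The sampling procedure described above, a random start followed by a fixed sampling interval, can be sketched in Python as follows. This is a minimal illustration and not the Excel tool customized by UNHCR: the function name `systematic_sample`, its parameters, and the use of integer division to derive the interval are all assumptions made for the example.

```python
import random


def systematic_sample(households, sample_size, seed=None):
    """Select `sample_size` items using a random start and a fixed interval.

    `households` is an ordered list (for example, in walking order through a block).
    """
    total = len(households)
    if sample_size <= 0 or sample_size > total:
        raise ValueError("sample_size must be between 1 and len(households)")

    rng = random.Random(seed)
    # Sampling interval: how many households to step over between selections.
    interval = total // sample_size
    # Random start: an index within the first interval.
    start = rng.randrange(interval)
    # Take every `interval`-th household from the random start.
    return [households[start + i * interval] for i in range(sample_size)]


if __name__ == "__main__":
    block = [f"HH-{n:03d}" for n in range(1, 101)]  # hypothetical block of 100 households
    print(systematic_sample(block, 10, seed=42))
```

Integer division keeps every selected index inside the list; when the total is not a multiple of the sample size, the last few households are never selected, which a production tool might handle differently.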

Mode of data collection

Face-to-face [f2f]

Research instrument

The survey questionnaire used to collect the data consists of the following sections:

- Demographics
- Water collection and storage
- Drinking water hygiene
- Hygiene
- Sanitation
- Messaging
- Distribution (NFI)
- Diarrhea prevalence, knowledge and health seeking behaviour
- Menstrual hygiene

Cleaning operations

The data collected was uploaded to a server at the end of each day. IFormBuilder generated a Microsoft (MS) Excel spreadsheet dataset which was then cleaned and analyzed using MS Excel.

Given that SP is currently implementing a WASH program in Ajuong Thok and Pamir, the assessment data collected in these camps will not only serve as the endline for UNHCR 2018 programming but also as the baseline for 2019 programming.

Data was anonymized through decoding and local suppression.

Cite

Rong Luo (2023). Data Cleaning Sample [Dataset]. http://doi.org/10.5683/SP3/ZCN177

151 scholarly articles cite this dataset (View in Google Scholar)
