Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A messy dataset for demonstrating "how to clean data using a spreadsheet". It was intentionally formatted to be messy for demonstration purposes, and was collated from https://openafrica.net/dataset/historic-and-projected-rainfall-and-runoff-for-4-lake-victoria-sub-regions
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The Restaurant Sales Dataset with Dirt contains data for 17,534 transactions. The data introduces realistic inconsistencies ("dirt") to simulate real-world scenarios where data may have missing or incomplete information. The dataset includes sales details across multiple categories, such as starters, main dishes, desserts, drinks, and side dishes.
This dataset is suitable for:
- Practicing data cleaning tasks, such as handling missing values and deducing missing information.
- Conducting exploratory data analysis (EDA) to study restaurant sales patterns.
- Feature engineering to create new variables for machine learning tasks.
Column Name | Description | Example Values |
---|---|---|
Order ID | A unique identifier for each order. | ORD_123456 |
Customer ID | A unique identifier for each customer. | CUST_001 |
Category | The category of the purchased item. | Main Dishes, Drinks |
Item | The name of the purchased item. May contain missing values due to data dirt. | Grilled Chicken, None |
Price | The static price of the item. May contain missing values. | 15.0, None |
Quantity | The quantity of the purchased item. May contain missing values. | 1, None |
Order Total | The total price for the order (Price * Quantity). May contain missing values. | 45.0, None |
Order Date | The date when the order was placed. Always present. | 2022-01-15 |
Payment Method | The payment method used for the transaction. May contain missing values due to data dirt. | Cash, None |
Data Dirtiness:
- Missing values in five columns (Item, Price, Quantity, Order Total, Payment Method) simulate real-world challenges.
- A missing Item can often be deduced when Price is present, since each item has a static price.
- A missing Price can be deduced when Quantity and Order Total are present.
- When Price or Quantity is missing, the other is used to deduce the missing value (e.g., Order Total / Quantity).

Menu Categories and Items:
- Starters: Chicken Melt, French Fries, ...
- Main Dishes: Grilled Chicken, Steak, ...
- Desserts: Chocolate Cake, Ice Cream, ...
- Drinks: Coca Cola, Water, ...
- Side Dishes: Mashed Potatoes, Garlic Bread, ...

Time Range:
- Orders span from January 1, 2022, to December 31, 2023.

Handle Missing Values:
- Fill in Order Total or Quantity using the formula: Order Total = Price * Quantity.
- Deduce Price from Order Total / Quantity if both are available.

Validate Data Consistency:
- Check that Price, Quantity, and Order Total agree (Order Total = Price * Quantity).

Analyze Missing Patterns:
- Examine which columns are missing values and how often.
Category | Item | Price |
---|---|---|
Starters | Chicken Melt | 8.0 |
Starters | French Fries | 4.0 |
Starters | Cheese Fries | 5.0 |
Starters | Sweet Potato Fries | 5.0 |
Starters | Beef Chili | 7.0 |
Starters | Nachos Grande | 10.0 |
Main Dishes | Grilled Chicken | 15.0 |
Main Dishes | Steak | 20.0 |
Main Dishes | Pasta Alfredo | 12.0 |
Main Dishes | Salmon | 18.0 |
Main Dishes | Vegetarian Platter | 14.0 |
Desserts | Chocolate Cake | 6.0 |
Desserts | Ice Cream | 5.0 |
Desserts | Fruit Salad | 4.0 |
Desserts | Cheesecake | 7.0 |
Desserts | Brownie | 6.0 |
Drinks | Coca Cola | 2.5 |
Drinks | Orange Juice | 3.0 |
Drinks ... |
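The deduction rules described above can be sketched in pandas. The mini-sample below is hypothetical, standing in for the real file; only the column names (Item, Price, Quantity, Order Total) follow the dataset's schema:

```python
import pandas as pd

# Hypothetical mini-sample mirroring the dataset's columns.
df = pd.DataFrame({
    "Item": ["Steak", "Coca Cola", "Ice Cream"],
    "Price": [20.0, None, 5.0],
    "Quantity": [2, 3, None],
    "Order Total": [None, 7.5, 10.0],
})

# Order Total = Price * Quantity where Order Total is missing.
mask = df["Order Total"].isna() & df["Price"].notna() & df["Quantity"].notna()
df.loc[mask, "Order Total"] = df.loc[mask, "Price"] * df.loc[mask, "Quantity"]

# Price = Order Total / Quantity where Price is missing.
mask = df["Price"].isna() & df["Order Total"].notna() & df["Quantity"].notna()
df.loc[mask, "Price"] = df.loc[mask, "Order Total"] / df.loc[mask, "Quantity"]

# Quantity = Order Total / Price where Quantity is missing.
mask = df["Quantity"].isna() & df["Order Total"].notna() & df["Price"].notna()
df.loc[mask, "Quantity"] = df.loc[mask, "Order Total"] / df.loc[mask, "Price"]
```

Rows where two of the three numeric fields are missing cannot be repaired this way and are left as NaN.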
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Sample data for exercises in Further Adventures in Data Cleaning.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset presents a dual-version representation of employment-related data from India, crafted to highlight the importance of data cleaning and transformation in any real-world data science or analytics project.
It includes two parallel datasets:
1. Messy Dataset (Raw) – Represents a typical unprocessed dataset often encountered in data collection from surveys, databases, or manual entries.
2. Cleaned Dataset – Demonstrates how proper data preprocessing can significantly enhance the quality and usability of data for analytical and visualization purposes.
Each record captures multiple attributes related to individuals in the Indian job market, including:
- Age Group
- Employment Status (Employed/Unemployed)
- Monthly Salary (INR)
- Education Level
- Industry Sector
- Years of Experience
- Location
- Perceived AI Risk
- Date of Data Recording
The raw dataset underwent comprehensive transformations to convert it into its clean, analysis-ready form:
- Missing Values: Identified and handled using either row elimination (where critical data was missing) or imputation techniques.
- Duplicate Records: Identified using row comparison and removed to prevent analytical skew.
- Inconsistent Formatting: Unified inconsistent naming in columns (like 'monthly_salary_(inr)' → 'Monthly Salary (INR)'), capitalization, and string spacing.
- Incorrect Data Types: Converted columns like salary from string/object to float for numerical analysis.
- Outliers: Detected and handled based on domain logic and distribution analysis.
- Categorization: Converted numeric ages into grouped age categories for comparative analysis.
- Standardization: Applied uniform labels for employment status, industry names, education, and AI risk levels for visualization clarity.
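Several of these transformations can be sketched in pandas. The toy frame below is hypothetical; only the column rename ('monthly_salary_(inr)' → 'Monthly Salary (INR)') comes from the description above, the rest is assumed for illustration:

```python
import pandas as pd

# Hypothetical raw frame with duplicates, a messy column name,
# inconsistent capitalization/spacing, and a missing critical value.
df = pd.DataFrame({
    "monthly_salary_(inr)": ["50000", "60000", "60000", None],
    "employment status": ["employed", "Employed ", "Employed ", "unemployed"],
})

df = df.drop_duplicates()  # remove duplicate records

# Unify column naming, then convert salary from string to float.
df = df.rename(columns={"monthly_salary_(inr)": "Monthly Salary (INR)"})
df["Monthly Salary (INR)"] = df["Monthly Salary (INR)"].astype(float)

# Standardize labels: trim whitespace, unify capitalization.
df["Employment Status"] = df["employment status"].str.strip().str.title()
df = df.drop(columns=["employment status"])

# Eliminate rows where critical data is missing.
df = df.dropna(subset=["Monthly Salary (INR)"])
```

Imputation, outlier handling, and age bucketing would follow the same pattern (e.g., `fillna`, domain-based filters, and `pd.cut`).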
This dataset is ideal for learners and professionals who want to understand:
- The impact of messy data on visualization and insights
- How transformation steps can dramatically improve data interpretation
- Practical examples of preprocessing techniques before feeding into ML models or BI tools
It's also useful for:
- Training ML models with clean inputs
- Data storytelling with visual clarity
- Demonstrating reproducibility in data cleaning pipelines
By examining both the messy and clean datasets, users gain a deeper appreciation for why “garbage in, garbage out” rings true in the world of data science.
Access and clean an open source herbarium dataset using Excel or RStudio.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
An unclean copy of my GoodReads dataset (as of 2024/02/11) in CSV format with 406 entries.
Data types included are integers, floats, strings, date/time and booleans (in both TRUE/FALSE and 0/1 formats).
This is a good dataset to practice cleaning and analysing as it contains missing values, inconsistent formats and outliers.
Disclaimer: GoodReads flags duplicate entries when you add them, so the original export contained no duplicates; for the purposes of this project I asked an AI to add 20 random duplicate entries to the dataset.
This dataset was created by Narenrdra Panwar
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset is not valid, but my purpose in uploading it was to fill a gap I felt—the lack of a truly messy dataset. A major part of data science, beyond choosing algorithms and other techniques, is cleaning and preprocessing data. Therefore, this dataset can serve as good practice for learning how to clean a messy dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The datasets contain pixel-level hyperspectral data for six snow and glacier classes, extracted from a hyperspectral image. The dataset "data.csv" has 5417 samples with 142 band values each, belonging to the classes Clean snow, Dirty ice, Firn, Glacial ice, Ice mixed debris, and Water body. The dataset "_labels1.csv" holds the corresponding labels for "data.csv". The dataset "RGB.csv" has the same 5417 samples but with only three band values, whereas "data.csv" has 142.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are a few use cases for this project:
Maintenance Monitoring: This model could be implemented in drones or satellite imaging systems to monitor the cleanliness of large solar panel installations. These systems could regularly analyze the status of the panels and notify maintenance staff when cleaning is required to maintain optimal efficiency.
Efficiency Optimization: Determining how much grime or dirt is on a solar panel can help estimate the reduction in efficiency. Using this model, energy companies can better plan cleanups to optimize energy production.
Damage Detection: The identification of dirt and grime on panels can also potentially assist in detecting physical damage or irregularities that could be a sign of bigger issues.
Automated Cleaning: Autonomous cleaning robots could utilize this model to identify dirty panels in real time and target specific areas that need to be cleaned, improving their efficiency and effectiveness.
Environmental Impact Studies: By identifying dirty solar panels, environmental scientists and researchers can analyze patterns, such as dust deposition over time or environmental impact, that might help in furthering research on solar panel placement strategies and environmental adjustments.
This dataset contains NYC Street Centerline (CSCL) physical_IDs which represent segments of streets and the date and time those street segments were last visited by a mechanical broom.
This dataset is connected to SweepNYC (nyc.gov/sweepnyc), a tool maintained by the NYC Department of Sanitation (DSNY) that allows New Yorkers to track the progress of DSNY mechanical brooms. The mechanical broom, also known as a street sweeper, is New York City's first line of defense against dirty curbs. Each one picks up 1,500 lbs. of litter on a single shift. For information on how to file a street sweeping complaint see the article on NYC 311.
This dataset was created by Michael Metter
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are a few use cases for this project:
Airport Maintenance Monitoring: The Aircraft Cleanliness model can be used by airport authorities to monitor the cleanliness of aircraft and ensure timely cleaning services. This can help maintain a high standard of hygiene and visual appearance for airplanes while also reducing the risk of corrosion or damage due to accumulated dirt.
Airline Quality Control: Airlines can use the model to monitor and compare the cleanliness of their fleet, ensuring consistent quality associated with their brand. It can be employed to hold cleaning crews accountable and establish benchmarks for cleanliness quality.
Passenger Experience Enhancement: Airline ratings and review platforms can integrate the Aircraft Cleanliness model to rate airlines based on the cleanliness of their airplanes. This information can then be provided to passengers, helping them make informed decisions when choosing airlines.
Cleaning Service Optimization: Cleaning companies specializing in aircraft maintenance can utilize this model to optimize their cleaning services. By detecting specific dirt classes and focusing on those areas, they can save time and resources while providing a more effective cleaning process.
Environmental Impact Analysis: Researchers can use the Aircraft Cleanliness model to study the impact of different environmental conditions on the accumulation of dirt on airplanes. This information can lead to the development of new materials or coatings that help reduce the rate at which dirt and contaminants adhere to the aircraft surface, minimizing cleaning requirements and environmental impacts.
https://cubig.ai/store/terms-of-service
1) Data Introduction
• The Solar Photovoltaics Panel for Dust Detection Dataset is an image dataset designed to classify the presence of dust on the surface of solar panels. It consists of images of clean and dusty (dirty) panels.

2) Data Utilization
(1) Characteristics of the Solar Photovoltaics Panel for Dust Detection Dataset:
• The dataset contains images capturing the clean and dirty states of solar panels, which can be used to train AI models that detect performance degradation caused by dust accumulation.
• The images were collected in outdoor environments, accurately reflecting the real-world conditions of solar power systems.

(2) Applications of the Solar Photovoltaics Panel for Dust Detection Dataset:
• Development of automated solar panel diagnostic models: The dataset can be used to train deep learning classification models that automatically determine the cleanliness of solar panels and predict appropriate maintenance timing.
• Smart solar power plant monitoring systems: It can support the development of AI-powered monitoring systems that detect dusty panels in real time based on camera data collected from solar power facilities.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are a few use cases for this project:
Hygiene Monitoring and Alert System: Implement the "dropletdataset-01-03" model in public spaces, such as restrooms and kitchens, to detect droplets and hands. This can assist in promoting proper handwashing and hygiene practices by automatically alerting facility managers to spills or unclean surfaces.
Hand-droplet Interaction Analysis: Use this model in laboratory settings to study the dynamics of droplets and their interaction with hands. This can help understand the implications of various contact scenarios and inform safety protocols for hazardous materials or in medical environments.
Dry Erase Board Maintenance Assistance: Use the model to identify when a dry erase board has been wiped clean by detecting the presence of droplets and hands. This can be employed in educational settings to automatically trigger reminders for board cleanup or to evaluate the cleanliness of a board after use.
Artistic Rendering Assistance: Employ the "dropletdataset-01-03" model in computer-aided design software to help artists replicate realistic droplet textures and hand markings when creating digital or physical artwork, particularly in scenarios where the artwork involves fluid-like materials or hand gestures.
Robotics and Automation: Incorporate the model in robotic and automated cleaning systems to differentiate between droplets and hands during cleaning processes. This can improve precision and accuracy in maintaining cleanliness while minimizing the chances of unwanted interactions with human operators.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are a few use cases for this project:
Solar Panel Maintenance: The model could be used by solar panel service providers to automate the process of assessment and maintenance. By analyzing the state of the panels (clean, unclean, or dusty) it can help them identify which panels need immediate cleaning or service.
Industrial Inspection: In facilities with a large number of solar panels such as solar farms, the model could assist in streamlining routine checks. Rather than manual inspection, images can be taken and analyzed for cleanliness, helping to efficiently allocate cleaning resources and maintain optimum efficiency.
Home Automation Systems: The model could be integrated into smart home systems to alert homeowners when their solar panels are dirty or dusty. It can act as a smart tool for homes using solar energy as one of their primary energy sources.
Drone-based Inspection: For large scale solar installations in hard-to-reach areas (e.g. large roofs, deserts), drones equipped with cameras and the computer vision model can perform inspections. This can be safer and more effective, with the AI determining the status of each panel.
Educational Purposes: This computer vision model could be used as a teaching tool in educational institutions for courses related to renewable energy. It can demonstrate the importance of solar panel cleanliness in energy efficiency, encouraging students to engage with practical, real-world issues in their learning.
Aims: Evaluate the microbiocidal efficacy of a cleaning and disinfection (C&D) treatment using stainless steel coupons applied to three common types of animal mortality transport vehicles when exposed to agricultural conditions.

Methods: Metal test coupons, inoculated with bacteriophage MS2, were affixed to the undercarriage of three types of animal mortality transport vehicles at various locations. Coupons were grimed by maneuvering the test vehicles down a series of wet dirt roads. Coupons were attached and extracted at various points to evaluate C&D performance with and without grime. C&D efficacy using a water-supplied pressure washing system and a dilute sodium hypochlorite (NaOCl) solution was determined by comparing the difference in recovered viable virus between positive control coupons and test coupons.

This dataset is associated with the following publication: Boe, T., W. Calfee, P. Lemieux, S. Serre, A. Abdel-Hady, M. Monge, D. Aslett, B. Akers, and J. Howard. Evaluation of Cleaning and Disinfection Protocols for Commercial Farm Equipment Following a Foreign Animal Disease Outbreak. Remediation Journal. John Wiley & Sons, Inc., Hoboken, NJ, USA, 33(4): 379-387, (2023).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘US Minimum Wage by State from 1968 to 2020’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/lislejoem/us-minimum-wage-by-state-from-1968-to-2017 on 12 November 2021.
--- Dataset description provided by original source is as follows ---
What is this? In the United States, states and the federal government set minimum hourly pay ("minimum wage") that workers can receive to ensure that citizens experience a minimum quality of life. This dataset provides the minimum wage data set by each state and the federal government from 1968 to 2020.
Why did you put this together? While looking online for a clean dataset for minimum wage data by state, I was having trouble finding one. I decided to create one myself and provide it to the community.
Who do we thank for this data? The United States Department of Labor compiles a table of this data on their website. I took the time to clean it up and provide it here for you. :) The GitHub repository (with R Code for the cleaning process) can be found here!
This is a cleaned dataset of US state and federal minimum wages from 1968 to 2020 (including 2020 equivalency values). The data was scraped from the United States Department of Labor's table of minimum wage by state.
The values in the dataset are as follows:
- Year: The year of the data. All minimum wage values are as of January 1 except 1968 and 1969, which are as of February 1.
- State: The state or territory of the data.
- State.Minimum.Wage: The state's actual minimum wage on January 1 of Year.
- State.Minimum.Wage.2020.Dollars: The State.Minimum.Wage in 2020 dollars.
- Federal.Minimum.Wage: The federal minimum wage on January 1 of Year.
- Federal.Minimum.Wage.2020.Dollars: The Federal.Minimum.Wage in 2020 dollars.
- Effective.Minimum.Wage: The minimum wage enforced in State on January 1 of Year. Because the federal minimum wage takes effect if the state's minimum wage is lower than the federal minimum wage, this is the higher of the two.
- Effective.Minimum.Wage.2020.Dollars: The Effective.Minimum.Wage in 2020 dollars.
- CPI.Average: The average value of the Consumer Price Index in Year. When I pulled the data from the Bureau of Labor Statistics, I selected the dataset with "all items in U.S. city average, all urban consumers, not seasonally adjusted".
- Department.Of.Labor.Uncleaned.Data: The unclean, scraped value from the Department of Labor's website.
- Department.Of.Labor.Cleaned.Low.Value: The state's lowest enforced minimum wage on January 1 of Year. If there is only one minimum wage, this and Department.Of.Labor.Cleaned.High.Value are identical. (Some states enforce different minimum wage laws depending on the size of the business; in states where this is the case, smaller businesses generally have slightly lower minimum wage requirements.)
- Department.Of.Labor.Cleaned.Low.Value.2020.Dollars: The Department.Of.Labor.Cleaned.Low.Value in 2020 dollars.
- Department.Of.Labor.Cleaned.High.Value: The state's highest enforced minimum wage on January 1 of Year. If there is only one minimum wage, this and Department.Of.Labor.Cleaned.Low.Value are identical.
- Department.Of.Labor.Cleaned.High.Value.2020.Dollars: The Department.Of.Labor.Cleaned.High.Value in 2020 dollars.
- Footnote: The footnote provided on the Department of Labor's website. See more below.
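The Effective.Minimum.Wage rule above reduces to a simple maximum. A minimal sketch (the helper function is hypothetical, not part of the dataset):

```python
def effective_minimum_wage(state_wage: float, federal_wage: float) -> float:
    """The federal minimum takes effect whenever the state's minimum is
    lower, so the enforced wage is the higher of the two."""
    return max(state_wage, federal_wage)

# A state minimum below the federal floor is superseded by it;
# a state minimum above the floor is enforced as-is.
low = effective_minimum_wage(5.15, 7.25)
high = effective_minimum_wage(15.00, 7.25)
```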
As laws differ significantly from territory to territory, especially regarding who is protected by minimum wage laws, the following footnotes are located throughout the data in Footnote to add more context to the minimum wage. The original footnotes can be found here.
--- Original source retains full ownership of the source dataset ---
Abstract copyright UK Data Service and data collection copyright owner.

The Hygiene Council Global Survey on Personal and Household Hygiene, 2011 is the first study to highlight the role of manners, orderliness and routine in hygiene behaviours. A global survey on the determinants of personal and household hygiene, with particular reference to hand-washing with soap and cleaning of household surfaces, was conducted in 1000 households in each of twelve countries across the world. A structural equation model of hygiene behaviour and its consequences, derived from theory, was then estimated for both behaviours. The analysis showed that the frequency of hand washing with soap is strongly tied to how automatically it is performed. Whether or not someone is busy, or tired, can also affect whether they stop to wash hands. Surface cleaning was strongly linked to possessing a cleaning routine, so, like hand washing, it is primarily determined by non-cognitive causes. It is also inspired by the perception that one is living in a dirty environment, especially if one has a strong sense of contamination, as well as a need to keep one's surroundings tidy. Being concerned with good manners is also linked to the performance of these behaviours. Those who see others around them as practicing surface cleaning are also more likely to do so themselves.

Main Topics: Global determinants of personal and household hygiene behaviour.

Sampling: Multi-stage stratified random sample. At least one country was chosen to represent each of the seven continents (UK, USA, Canada, France, Germany, Australia, South Africa, Malaysia, Brazil, Middle East), with the addition of two of the most populated countries in the world (China and India). Within each country, samples were based on standard representative splits of gender, age, household income and geographical region.

Methods: Face-to-face interview; telephone interview; web-based survey.
This hands-on workshop has two parts. The first part covers working with SAS and the Postal Code Conversion File Plus. You'll start with Postal Codes, and leave with Census geography that can be linked to Census demographics. The second part introduces OpenRefine, an open source software platform for cleaning up messy data files. Initially developed by Google, OpenRefine will open your eyes to the beauty of clean data! No previous experience required.