34 datasets found
  1. Messy data for data cleaning exercise - Dataset - openAFRICA

    • open.africa
    Updated Oct 6, 2021
    Cite
    (2021). Messy data for data cleaning exercise - Dataset - openAFRICA [Dataset]. https://open.africa/dataset/messy-data-for-data-cleaning-exercise
    Explore at:
    Dataset updated
    Oct 6, 2021
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A messy dataset for demonstrating "how to clean data using spreadsheet". It was intentionally formatted to be messy for the purpose of demonstration. It was collated from https://openafrica.net/dataset/historic-and-projected-rainfall-and-runoff-for-4-lake-victoria-sub-regions

  2. Restaurant Sales-Dirty Data for Cleaning Training

    • kaggle.com
    Updated Jan 25, 2025
    Cite
    Ahmed Mohamed (2025). Restaurant Sales-Dirty Data for Cleaning Training [Dataset]. https://www.kaggle.com/datasets/ahmedmohamed2003/restaurant-sales-dirty-data-for-cleaning-training
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 25, 2025
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Ahmed Mohamed
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Restaurant Sales Dataset with Dirt Documentation

    Overview

    The Restaurant Sales Dataset with Dirt contains data for 17,534 transactions. The data introduces realistic inconsistencies ("dirt") to simulate real-world scenarios where data may have missing or incomplete information. The dataset includes sales details across multiple categories, such as starters, main dishes, desserts, drinks, and side dishes.

    Dataset Use Cases

    This dataset is suitable for:
    - Practicing data cleaning tasks, such as handling missing values and deducing missing information.
    - Conducting exploratory data analysis (EDA) to study restaurant sales patterns.
    - Feature engineering to create new variables for machine learning tasks.

    Columns Description

    Column Name | Description | Example Values
    Order ID | A unique identifier for each order. | ORD_123456
    Customer ID | A unique identifier for each customer. | CUST_001
    Category | The category of the purchased item. | Main Dishes, Drinks
    Item | The name of the purchased item. May contain missing values due to data dirt. | Grilled Chicken, None
    Price | The static price of the item. May contain missing values. | 15.0, None
    Quantity | The quantity of the purchased item. May contain missing values. | 1, None
    Order Total | The total price for the order (Price * Quantity). May contain missing values. | 45.0, None
    Order Date | The date when the order was placed. Always present. | 2022-01-15
    Payment Method | The payment method used for the transaction. May contain missing values due to data dirt. | Cash, None

    Key Characteristics

    1. Data Dirtiness:

      • Missing values in key columns (Item, Price, Quantity, Order Total, Payment Method) simulate real-world challenges.
      • At least one of the following conditions is ensured for each record to identify an item:
        • Item is present.
        • Price is present.
        • Both Quantity and Order Total are present.
      • If Price or Quantity is missing, the other is used to deduce the missing value (e.g., Order Total / Quantity).
    2. Menu Categories and Items:

      • Items are divided into five categories:
        • Starters: E.g., Chicken Melt, French Fries.
        • Main Dishes: E.g., Grilled Chicken, Steak.
        • Desserts: E.g., Chocolate Cake, Ice Cream.
        • Drinks: E.g., Coca Cola, Water.
        • Side Dishes: E.g., Mashed Potatoes, Garlic Bread.

    3. Time Range:

      • Orders span from January 1, 2022, to December 31, 2023.

    Cleaning Suggestions

    1. Handle Missing Values:

      • Fill missing Order Total or Quantity using the formula: Order Total = Price * Quantity.
      • Deduce missing Price from Order Total / Quantity if both are available.
    2. Validate Data Consistency:

      • Ensure that calculated values (Order Total = Price * Quantity) match.
    3. Analyze Missing Patterns:

      • Study the distribution of missing values across categories and payment methods.
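
    The cleaning suggestions above can be sketched in plain Python. This is only an illustration under stated assumptions: rows are represented as dictionaries keyed by the column names from the description, None marks a missing value, MENU_PRICES is a small excerpt of the menu map, and the helper names (fill_missing, is_consistent, missing_counts) are hypothetical and not part of the dataset.

```python
# Small excerpt of the menu map from the dataset description.
MENU_PRICES = {"Grilled Chicken": 15.0, "Steak": 20.0, "French Fries": 4.0}


def fill_missing(record: dict) -> dict:
    """Return a copy of one order row with Price, Quantity, or Order Total deduced."""
    row = dict(record)
    # Menu prices are described as static, so a known item gives the price directly.
    if row.get("Price") is None and row.get("Item") in MENU_PRICES:
        row["Price"] = MENU_PRICES[row["Item"]]
    price, qty, total = row.get("Price"), row.get("Quantity"), row.get("Order Total")
    if total is None and price is not None and qty is not None:
        row["Order Total"] = price * qty          # Order Total = Price * Quantity
    elif qty is None and price and total is not None:
        row["Quantity"] = round(total / price)    # Quantity = Order Total / Price
    elif price is None and qty and total is not None:
        row["Price"] = total / qty                # Price = Order Total / Quantity
    return row


def is_consistent(row: dict, tol: float = 1e-9) -> bool:
    """Check that Order Total matches Price * Quantity when all three are present."""
    price, qty, total = row.get("Price"), row.get("Quantity"), row.get("Order Total")
    if price is None or qty is None or total is None:
        return True  # nothing to validate yet
    return abs(price * qty - total) <= tol


def missing_counts(rows: list) -> dict:
    """Count missing values per column, as a first look at missing patterns."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            if value is None:
                counts[column] = counts.get(column, 0) + 1
    return counts
```

    Rows where fewer than two of Price, Quantity, and Order Total can be determined are returned unchanged, so they can be reviewed separately.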

    Menu Map with Prices and Categories

    Category | Item | Price
    Starters | Chicken Melt | 8.0
    Starters | French Fries | 4.0
    Starters | Cheese Fries | 5.0
    Starters | Sweet Potato Fries | 5.0
    Starters | Beef Chili | 7.0
    Starters | Nachos Grande | 10.0
    Main Dishes | Grilled Chicken | 15.0
    Main Dishes | Steak | 20.0
    Main Dishes | Pasta Alfredo | 12.0
    Main Dishes | Salmon | 18.0
    Main Dishes | Vegetarian Platter | 14.0
    Desserts | Chocolate Cake | 6.0
    Desserts | Ice Cream | 5.0
    Desserts | Fruit Salad | 4.0
    Desserts | Cheesecake | 7.0
    Desserts | Brownie | 6.0
    Drinks | Coca Cola | 2.5
    Drinks | Orange Juice | 3.0
    Drinks ...
  3. Data Cleaning Sample

    • borealisdata.ca
    Updated Jul 13, 2023
    Cite
    Rong Luo (2023). Data Cleaning Sample [Dataset]. http://doi.org/10.5683/SP3/ZCN177
    Explore at:
    Dataset updated
    Jul 13, 2023
    Dataset provided by
    Borealis
    Authors
    Rong Luo
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Sample data for exercises in Further Adventures in Data Cleaning.

  4. Employment Of India CLeaned and Messy Data

    • kaggle.com
    Updated Apr 7, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    SONIA SHINDE (2025). Employment Of India CLeaned and Messy Data [Dataset]. https://www.kaggle.com/datasets/soniaaaaaaaa/employment-of-india-cleaned-and-messy-data
    Explore at:
    Dataset updated
    Apr 7, 2025
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    SONIA SHINDE
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Area covered
    India
    Description

    This dataset presents a dual-version representation of employment-related data from India, crafted to highlight the importance of data cleaning and transformation in any real-world data science or analytics project.

    🔹 Dataset Composition:

    It includes two parallel datasets:
    1. Messy Dataset (Raw) – Represents a typical unprocessed dataset often encountered in data collection from surveys, databases, or manual entries.
    2. Cleaned Dataset – This version demonstrates how proper data preprocessing can significantly enhance the quality and usability of data for analytical and visualization purposes.

    Each record captures multiple attributes related to individuals in the Indian job market, including:
    - Age Group
    - Employment Status (Employed/Unemployed)
    - Monthly Salary (INR)
    - Education Level
    - Industry Sector
    - Years of Experience
    - Location
    - Perceived AI Risk
    - Date of Data Recording

    Transformations & Cleaning Applied:

    The raw dataset underwent comprehensive transformations to convert it into its clean, analysis-ready form:
    - Missing Values: Identified and handled using either row elimination (where critical data was missing) or imputation techniques.
    - Duplicate Records: Identified using row comparison and removed to prevent analytical skew.
    - Inconsistent Formatting: Unified inconsistent naming in columns (like 'monthly_salary_(inr)' → 'Monthly Salary (INR)'), capitalization, and string spacing.
    - Incorrect Data Types: Converted columns like salary from string/object to float for numerical analysis.
    - Outliers: Detected and handled based on domain logic and distribution analysis.
    - Categorization: Converted numeric ages into grouped age categories for comparative analysis.
    - Standardization: Uniform labels for employment status, industry names, education, and AI risk levels were applied for visualization clarity.
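
    A few of these transformations can be illustrated with a short Python sketch. It is not the author's actual pipeline: the function names, the age bins, and the accepted status spellings are all assumptions made for illustration, since the description does not state them.

```python
import re
from typing import Optional


def tidy_column_name(name: str) -> str:
    """Turn a raw header such as 'monthly_salary_(inr)' into 'Monthly Salary (INR)'."""
    words = name.strip().replace("_", " ").split()
    tidy = []
    for word in words:
        # Parenthesised units such as '(inr)' are upper-cased; other words are capitalised.
        if word.startswith("(") and word.endswith(")"):
            tidy.append(word.upper())
        else:
            tidy.append(word.capitalize())
    return " ".join(tidy)


def parse_salary(raw: Optional[str]) -> Optional[float]:
    """Convert a salary stored as text to float; return None when it cannot be parsed."""
    if raw is None:
        return None
    cleaned = re.sub(r"[^0-9.\-]", "", str(raw))  # drop currency symbols, commas, spaces
    try:
        return float(cleaned)
    except ValueError:
        return None


# Hypothetical bins: the dataset description does not give the actual cut-offs.
AGE_BINS = [(18, 25), (26, 35), (36, 50)]


def age_group(age: int) -> str:
    """Map a numeric age to a grouped age category."""
    for low, high in AGE_BINS:
        if low <= age <= high:
            return f"{low}-{high}"
    return "Under 18" if age < 18 else "51+"


# Assumed spellings; the description only names the Employed/Unemployed labels.
STATUS_LABELS = {"employed": "Employed", "unemployed": "Unemployed"}


def standardize_status(raw: str) -> Optional[str]:
    """Normalise capitalisation and spacing of an employment status label."""
    return STATUS_LABELS.get(raw.strip().lower())
```

    Returning None for unparseable salaries or unknown labels keeps the decision between row elimination and imputation as a separate, explicit step.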

    Purpose & Utility:

    This dataset is ideal for learners and professionals who want to understand:
    - The impact of messy data on visualization and insights
    - How transformation steps can dramatically improve data interpretation
    - Practical examples of preprocessing techniques before feeding into ML models or BI tools

    It's also useful for:
    - Training ML models with clean inputs
    - Data storytelling with visual clarity
    - Demonstrating reproducibility in data cleaning pipelines

    By examining both the messy and clean datasets, users gain a deeper appreciation for why “garbage in, garbage out” rings true in the world of data science.

  5. Cleaning Biodiversity Data: A Botanical Example Using Excel or RStudio

    • qubeshub.org
    Updated Jul 16, 2020
    Cite
    Shelly Gaynor (2020). Cleaning Biodiversity Data: A Botanical Example Using Excel or RStudio [Dataset]. http://doi.org/10.25334/DRGD-F069
    Explore at:
    Dataset updated
    Jul 16, 2020
    Dataset provided by
    QUBES
    Authors
    Shelly Gaynor
    Description

    Access and clean an open source herbarium dataset using Excel or RStudio.

  6. GoodReads Small Dataset

    • kaggle.com
    Updated Feb 13, 2024
    Cite
    Maria Fitas (2024). GoodReads Small Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/7619407
    Explore at:
    Dataset updated
    Feb 13, 2024
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Maria Fitas
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    An unclean copy of my GoodReads dataset (as of 2024/02/11) in CSV format with 406 entries.

    Data types included are integers, floats, strings, date/time and booleans (both in TRUE/FALSE and 0/1 formats).

    This is a good dataset to practice cleaning and analysing as it contains missing values, inconsistent formats and outliers.

    Disclaimer: Since GoodReads notifies you when there are duplicate entries, which meant I had no duplicate entries, I asked an AI to add 20 random duplicate entries to the data set for the purpose of this project.
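
    Two of the issues mentioned here, mixed boolean encodings and duplicate entries, lend themselves to a small Python sketch. The column names used in the usage example ("Title", "Read") are hypothetical, and the helpers below are one possible approach rather than anything shipped with the dataset.

```python
from typing import Optional


def to_bool(value) -> Optional[bool]:
    """Map TRUE/FALSE and 0/1 encodings to a Python bool; None for anything else."""
    text = str(value).strip().lower()
    if text in {"true", "1"}:
        return True
    if text in {"false", "0"}:
        return False
    return None  # missing or unrecognised value


def drop_duplicates(rows: list) -> list:
    """Keep the first occurrence of each fully identical row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # order-independent fingerprint of the row
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


rows = [
    {"Title": "A", "Read": "TRUE"},
    {"Title": "A", "Read": "TRUE"},
    {"Title": "B", "Read": "0"},
]
deduplicated = drop_duplicates(rows)
flags = [to_bool(row["Read"]) for row in deduplicated]
```

    Normalising the boolean column before deduplication would additionally catch rows that differ only in how the same flag is encoded.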

  7. messy data after cleaning

    • kaggle.com
    Updated Mar 2, 2025
    Cite
    Narenrdra Panwar (2025). messy data after cleaning [Dataset]. https://www.kaggle.com/datasets/narenrdrapanwar/messy-data-after-cleaning
    Explore at:
    Dataset updated
    Mar 2, 2025
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Narenrdra Panwar
    Description

    Dataset

    This dataset was created by Narenrdra Panwar

    Contents

  8. A Messy House Price Dataset

    • kaggle.com
    Updated Mar 13, 2025
    Cite
    Sina Fadakhah (2025). A Messy House Price Dataset [Dataset]. https://www.kaggle.com/datasets/sinafadakhah/a-messy-house-price-dataset
    Explore at:
    Dataset updated
    Mar 13, 2025
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Sina Fadakhah
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset is not valid, but my purpose in uploading it was to fill a gap I felt—the lack of a truly messy dataset. A major part of data science, beyond choosing algorithms and other techniques, is cleaning and preprocessing data. Therefore, this dataset can serve as good practice for learning how to clean a messy dataset.

  9. _labels1.csv. This data set representss the label of the corresponding...

    • figshare.com
    txt
    Updated Oct 9, 2023
    Cite
    naillah gul (2023). _labels1.csv. This data set representss the label of the corresponding samples in data.csv file [Dataset]. http://doi.org/10.6084/m9.figshare.24270088.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Oct 9, 2023
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    naillah gul
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The datasets contain pixel-level hyperspectral data of six snow and glacier classes. They have been extracted from a Hyperspectral image. The dataset "data.csv" has 5417 * 142 samples belonging to the classes: Clean snow, Dirty ice, Firn, Glacial ice, Ice mixed debris, and Water body. The dataset "_labels1.csv" has corresponding labels of the "data.csv" file. The dataset "RGB.csv" has only 5417 * 3 samples. There are only three band values in this file while "data.csv" has 142 band values.

  10. Solar Panels Rgb Dataset

    • universe.roboflow.com
    zip
    Updated Jan 17, 2023
    Cite
    earthbook (2023). Solar Panels Rgb Dataset [Dataset]. https://universe.roboflow.com/earthbook-zdvbx/solar-panels-rgb/model/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 17, 2023
    Dataset authored and provided by
    earthbook
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Dirty Bounding Boxes
    Description

    Here are a few use cases for this project:

    1. Maintenance Monitoring: This model could be implemented in drones or satellite imaging systems to monitor the cleanliness of large solar panel installations. These systems could regularly analyze the status of the panels and notify maintenance staff when cleaning is required to maintain optimal efficiency.

    2. Efficiency Optimization: Determining how much grime or dirt is on a solar panel can help estimate the reduction in efficiency. Using this model, energy companies can better plan cleanups to optimize energy production.

    3. Damage Detection: The identification of dirt and grime on panels can also potentially assist in detecting physical damage or irregularities that could be a sign of bigger issues.

    4. Automated Cleaning: Autonomous cleaning robots could utilize this model to identify dirty panels in real time and target specific areas that need to be cleaned, improving their efficiency and effectiveness.

    5. Environmental Impact Studies: By identifying dirty solar panels, environmental scientists and researchers can analyze patterns, such as dust deposition over time or environmental impact, that might help in furthering research on solar panel placement strategies and environmental adjustments.

  11. SweepNYC Street Cleaning

    • data.cityofnewyork.us
    • gimi9.com
    Updated Jul 31, 2025
    Cite
    Department of Sanitation (DSNY) (2025). SweepNYC Street Cleaning [Dataset]. https://data.cityofnewyork.us/w/c23c-uwsm/25te-f2tw?cur=wyBtfPFI4hG
    Explore at:
    Available download formats: application/rdfxml, csv, xml, json, application/rssxml, tsv
    Dataset updated
    Jul 31, 2025
    Dataset authored and provided by
    Department of Sanitation (DSNY)
    Description

    This dataset contains NYC Street Centerline (CSCL) physical_IDs which represent segments of streets and the date and time those street segments were last visited by a mechanical broom.

    This dataset is connected to SweepNYC (nyc.gov/sweepnyc), a tool maintained by the NYC Department of Sanitation (DSNY) that allows New Yorkers to track the progress of DSNY mechanical brooms. The mechanical broom, also known as a street sweeper, is New York City's first line of defense against dirty curbs. Each one picks up 1,500 lbs. of litter on a single shift. For information on how to file a street sweeping complaint see the article on NYC 311.

  12. Flights

    • kaggle.com
    zip
    Updated Jul 3, 2018
    Cite
    Michael Metter (2018). Flights [Dataset]. https://kaggle.com/mmetter/flights
    Explore at:
    Available download formats: zip (52177108 bytes)
    Dataset updated
    Jul 3, 2018
    Authors
    Michael Metter
    Description

    Dataset

    This dataset was created by Michael Metter

    Contents

  13. Aircraft Cleanliness Dataset

    • universe.roboflow.com
    zip
    Updated Aug 25, 2022
    Cite
    Aircraft Detection (2022). Aircraft Cleanliness Dataset [Dataset]. https://universe.roboflow.com/aircraft-detection-hpzth/aircraft-cleanliness/dataset/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 25, 2022
    Dataset authored and provided by
    Aircraft Detection
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Dirt Bounding Boxes
    Description

    Here are a few use cases for this project:

    1. Airport Maintenance Monitoring: The Aircraft Cleanliness model can be used by airport authorities to monitor the cleanliness of aircraft and ensure timely cleaning services. This can help maintain a high standard of hygiene and visual appearance for airplanes while also reducing the risk of corrosion or damage due to accumulated dirt.

    2. Airline Quality Control: Airlines can use the model to monitor and compare the cleanliness of their fleet, ensuring consistent quality associated with their brand. It can be employed to hold cleaning crews accountable and establish benchmarks for cleanliness quality.

    3. Passenger Experience Enhancement: Airline ratings and review platforms can integrate the Aircraft Cleanliness model to rate airlines based on the cleanliness of their airplanes. This information can then be provided to passengers, helping them make informed decisions when choosing airlines.

    4. Cleaning Service Optimization: Cleaning companies specializing in aircraft maintenance can utilize this model to optimize their cleaning services. By detecting specific dirt classes and focusing on those areas, they can save time and resources while providing a more effective cleaning process.

    5. Environmental Impact Analysis: Researchers can use the Aircraft Cleanliness model to study the impact of different environmental conditions on the accumulation of dirt on airplanes. This information can lead to the development of new materials or coatings that help reduce the rate at which dirt and contaminants adhere to the aircraft surface, minimizing cleaning requirements and environmental impacts.

  14. Solar Photovoltaics Panel for Dust Detection Dataset

    • cubig.ai
    Updated Oct 12, 2024
    Cite
    CUBIG (2024). Solar Photovoltaics Panel for Dust Detection Dataset [Dataset]. https://cubig.ai/store/products/518/solar-photovoltaics-panel-for-dust-detection-dataset
    Explore at:
    Dataset updated
    Oct 12, 2024
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-service

    Measurement technique
    Privacy-preserving data transformation via differential privacy, Synthetic data generation using AI techniques for model training
    Description

    1) Data Introduction
    • The Solar Photovoltaics Panel for Dust Detection Dataset is an image dataset designed to classify the presence of dust on the surface of solar panels. It consists of images of clean and dusty (dirty) panels.

    2) Data Utilization
    (1) Characteristics of the Solar Photovoltaics Panel for Dust Detection Dataset:
    • The dataset contains images capturing the clean and dirty states of solar panels, which can be used to train AI models that detect performance degradation caused by dust accumulation.
    • The images were collected in outdoor environments, accurately reflecting the real-world conditions of solar power systems.

    (2) Applications of the Solar Photovoltaics Panel for Dust Detection Dataset:
    • Development of automated solar panel diagnostic models: The dataset can be used to train deep learning classification models that automatically determine the cleanliness of solar panels and predict appropriate maintenance timing.
    • Smart solar power plant monitoring systems: It can support the development of AI-powered monitoring systems that detect dusty panels in real time based on camera data collected from solar power facilities.

  15. Dropletdataset 01 03 Dataset

    • universe.roboflow.com
    zip
    Updated Sep 1, 2022
    Cite
    Jianan (2022). Dropletdataset 01 03 Dataset [Dataset]. https://universe.roboflow.com/jianan/dropletdataset-01-03/dataset/6
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 1, 2022
    Dataset authored and provided by
    Jianan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Droplet Bounding Boxes
    Description

    Here are a few use cases for this project:

    1. Hygiene Monitoring and Alert System: Implement the "dropletdataset-01-03" model in public spaces, such as restrooms and kitchens, to detect droplets and hands. This can assist in promoting proper handwashing and hygiene practices by automatically alerting facility managers to spills or unclean surfaces.

    2. Hand-droplet Interaction Analysis: Use this model in laboratory settings to study the dynamics of droplets and their interaction with hands. This can help understand the implications of various contact scenarios and inform safety protocols for hazardous materials or in medical environments.

    3. Dry Erase Board Maintenance Assistance: Use the model to identify when a dry erase board has been wiped clean by detecting the presence of droplets and hands. This can be employed in educational settings to automatically trigger reminders for board cleanup or to evaluate the cleanliness of a board after use.

    4. Artistic Rendering Assistance: Employ the "dropletdataset-01-03" model in computer-aided design software to help artists replicate realistic droplet textures and hand markings when creating digital or physical artwork, particularly in scenarios where the artwork involves fluid-like materials or hand gestures.

    5. Robotics and Automation: Incorporate the model in robotic and automated cleaning systems to differentiate between droplets and hands during cleaning processes. This can improve precision and accuracy in maintaining cleanliness while minimizing the chances of unwanted interactions with human operators.

  16. Solar_panel_combine Dataset

    • universe.roboflow.com
    zip
    Updated Sep 4, 2022
    Cite
    home (2022). Solar_panel_combine Dataset [Dataset]. https://universe.roboflow.com/home-ocdun/solar_panel_combine/model/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 4, 2022
    Dataset authored and provided by
    home
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Solar Panel Solar Panel1 Bounding Boxes
    Description

    Here are a few use cases for this project:

    1. Solar Panel Maintenance: The model could be used by solar panel service providers to automate the process of assessment and maintenance. By analyzing the state of the panels (clean, unclean, or dusty) it can help them identify which panels need immediate cleaning or service.

    2. Industrial Inspection: In facilities with a large number of solar panels such as solar farms, the model could assist in streamlining routine checks. Rather than manual inspection, images can be taken and analyzed for cleanliness, helping to efficiently allocate cleaning resources and maintain optimum efficiency.

    3. Home Automation Systems: The model could be integrated into smart home systems to alert homeowners when their solar panels are dirty or dusty. It can act as a smart tool for homes using solar energy as one of their primary energy sources.

    4. Drone-based Inspection: For large scale solar installations in hard-to-reach areas (e.g. large roofs, deserts), drones equipped with cameras and the computer vision model can perform inspections. This can be safer and more effective, with the AI determining the status of each panel.

    5. Educational Purposes: This computer vision model could be used as a teaching tool in educational institutions for courses related to renewable energy. It can demonstrate the importance of solar panel cleanliness in energy efficiency, encouraging students to engage with practical, real-world issues in their learning.

  17. Data from: Evaluation of Cleaning and Disinfection Protocols for Commercial...

    • catalog.data.gov
    • s.cnmilf.com
    Updated Sep 1, 2024
    Cite
    U.S. EPA Office of Research and Development (ORD) (2024). Evaluation of Cleaning and Disinfection Protocols for Commercial Farm Equipment Following a Foreign Animal Disease Outbreak [Dataset]. https://catalog.data.gov/dataset/evaluation-of-cleaning-and-disinfection-protocols-for-commercial-farm-equipment-following-
    Explore at:
    Dataset updated
    Sep 1, 2024
    Dataset provided by
    United States Environmental Protection Agency: http://www.epa.gov/
    Description

    Aims: Evaluate the microbiocidal efficacy of a cleaning and disinfection (C&D) treatment using stainless steel coupons applied to three common types of animal mortality transport vehicles when exposed to agricultural conditions.

    Methods: Metal test coupons, inoculated with bacteriophage MS2, were affixed to the undercarriage of three types of animal mortality transport vehicles at various locations. Coupons were grimed by maneuvering the test vehicles down a series of wet dirt roads. Coupons were attached and extracted at various points to evaluate C&D performance with and without grime. C&D efficacy using a water-supplied pressure washing system and a dilute sodium hypochlorite (NaOCl) solution was determined by comparing the difference in recovered viable virus between positive control coupons and test coupons.

    This dataset is associated with the following publication: Boe, T., W. Calfee, P. Lemieux, S. Serre, A. Abdel-Hady, M. Monge, D. Aslett, B. Akers, and J. Howard. Evaluation of Cleaning and Disinfection Protocols for Commercial Farm Equipment Following a Foreign Animal Disease Outbreak. Remediation Journal. John Wiley & Sons, Inc., Hoboken, NJ, USA, 33(4): 379-387, (2023).

  18. ‘US Minimum Wage by State from 1968 to 2020’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Nov 12, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘US Minimum Wage by State from 1968 to 2020’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-us-minimum-wage-by-state-from-1968-to-2020-850a/04ae742e/?iid=018-239&v=presentation
    Explore at:
    Dataset updated
    Nov 12, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United States
    Description

    Analysis of ‘US Minimum Wage by State from 1968 to 2020’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/lislejoem/us-minimum-wage-by-state-from-1968-to-2017 on 12 November 2021.

    --- Dataset description provided by original source is as follows ---

    US Minimum Wage by State from 1968 to 2020

    The Basics

    • What is this? In the United States, states and the federal government set minimum hourly pay ("minimum wage") that workers can receive to ensure that citizens experience a minimum quality of life. This dataset provides the minimum wage data set by each state and the federal government from 1968 to 2020.

    • Why did you put this together? While looking online for a clean dataset for minimum wage data by state, I was having trouble finding one. I decided to create one myself and provide it to the community.

    • Who do we thank for this data? The United States Department of Labor compiles a table of this data on their website. I took the time to clean it up and provide it here for you. :) The GitHub repository (with R Code for the cleaning process) can be found here!

    Content

    This is a cleaned dataset of US state and federal minimum wages from 1968 to 2020 (including 2020 equivalency values). The data was scraped from the United States Department of Labor's table of minimum wage by state.

    Description of Data

    The values in the dataset are as follows:

    - Year: The year of the data. All minimum wage values are as of January 1 except 1968 and 1969, which are as of February 1.
    - State: The state or territory of the data.
    - State.Minimum.Wage: The actual State's minimum wage on January 1 of Year.
    - State.Minimum.Wage.2020.Dollars: The State.Minimum.Wage in 2020 dollars.
    - Federal.Minimum.Wage: The federal minimum wage on January 1 of Year.
    - Federal.Minimum.Wage.2020.Dollars: The Federal.Minimum.Wage in 2020 dollars.
    - Effective.Minimum.Wage: The minimum wage that is enforced in State on January 1 of Year. Because the federal minimum wage takes effect if the State's minimum wage is lower than the federal minimum wage, this is the higher of the two.
    - Effective.Minimum.Wage.2020.Dollars: The Effective.Minimum.Wage in 2020 dollars.
    - CPI.Average: The average value of the Consumer Price Index in Year. When I pulled the data from the Bureau of Labor Statistics, I selected the dataset with "all items in U.S. city average, all urban consumers, not seasonally adjusted".
    - Department.Of.Labor.Uncleaned.Data: The unclean, scraped value from the Department of Labor's website.
    - Department.Of.Labor.Cleaned.Low.Value: The State's lowest enforced minimum wage on January 1 of Year. If there is only one minimum wage, this and the value for Department.Of.Labor.Cleaned.High.Value are identical. (Some states enforce different minimum wage laws depending on the size of the business. In states where this is the case, generally, smaller businesses have slightly lower minimum wage requirements.)
    - Department.Of.Labor.Cleaned.Low.Value.2020.Dollars: The Department.Of.Labor.Cleaned.Low.Value in 2020 dollars.
    - Department.Of.Labor.Cleaned.High.Value: The State's higher enforced minimum wage on January 1 of Year. If there is only one minimum wage, this and the value for Department.Of.Labor.Cleaned.Low.Value are identical.
    - Department.Of.Labor.Cleaned.High.Value.2020.Dollars: The Department.Of.Labor.Cleaned.High.Value in 2020 dollars.
    - Footnote: The footnote provided on the Department of Labor's website. See more below.
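
The rule described for Effective.Minimum.Wage (the higher of the state and federal values) can be sketched in a few lines of Python. This is an illustrative sketch only, not the dataset author's code. The function names are hypothetical, the fallback to the federal wage when a state value is missing is an assumption, and the CPI-ratio formula in `to_2020_dollars` is the usual inflation adjustment, which the description does not confirm was the method used. The figures in the demo are placeholders, not values from the dataset.

```python
from typing import Optional


def effective_minimum_wage(state_wage: Optional[float], federal_wage: float) -> float:
    """Return the higher of the state and federal minimum wage.

    Assumption: a missing state wage (None) falls back to the federal wage.
    """
    if state_wage is None:
        return federal_wage
    return max(state_wage, federal_wage)


def to_2020_dollars(nominal: float, cpi_year: float, cpi_2020: float) -> float:
    """Scale a nominal amount by the ratio of CPI averages.

    Assumption: this standard CPI-ratio adjustment is not stated in the
    dataset description; it is shown only to illustrate the idea.
    """
    if cpi_year <= 0:
        raise ValueError("cpi_year must be positive")
    return nominal * cpi_2020 / cpi_year


if __name__ == "__main__":
    # Placeholder figures for illustration; not taken from the dataset.
    print(effective_minimum_wage(5.15, 7.25))
    print(effective_minimum_wage(12.00, 7.25))
    print(to_2020_dollars(1.00, 100.0, 200.0))
```

The same comparison applies row by row to State.Minimum.Wage and Federal.Minimum.Wage when checking the Effective.Minimum.Wage column.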

    Data Footnotes

    Because laws differ significantly from territory to territory, especially regarding who is protected by minimum wage laws, the Footnote column carries footnotes that add context to the minimum wage values. The original footnotes can be found here.

    --- Original source retains full ownership of the source dataset ---

  19. e

    Hygiene Council Global Survey on Personal and Household Hygiene, 2011 -...

    • b2find.eudat.eu
    Updated May 6, 2023
    Cite
    (2023). Hygiene Council Global Survey on Personal and Household Hygiene, 2011 - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/867d9e07-9298-5248-aec1-56da0caedc84
    Explore at:
    Dataset updated
    May 6, 2023
    Description

    Abstract copyright UK Data Service and data collection copyright owner.

    The Hygiene Council Global Survey on Personal and Household Hygiene, 2011 is the first study to highlight the role of manners, orderliness and routine in hygiene behaviours. A global survey on the determinants of personal and household hygiene, with particular reference to hand-washing with soap and cleaning of household surfaces, was conducted in 1000 households in each of twelve countries across the world. A structural equation model of hygiene behaviour and its consequences, derived from theory, was then estimated for both behaviours. The analysis showed that the frequency of hand washing with soap is strongly tied to how automatically it is performed. Whether or not someone is busy or tired can also affect whether they stop to wash their hands. Surface cleaning was strongly linked to possessing a cleaning routine, so, like hand washing, it is primarily determined by non-cognitive causes. It is also inspired by the perception that one is living in a dirty environment, especially if one has a strong sense of contamination, as well as a need to keep one’s surroundings tidy. Being concerned with good manners is also linked to the performance of these behaviours. Those who see others around them practising surface cleaning are also more likely to do so themselves.

    Main Topics: Global determinants of personal and household hygiene behaviour.

    Multi-stage stratified random sample. At least one country was chosen to represent each of the seven continents (UK, USA, Canada, France, Germany, Australia, South Africa, Malaysia, Brazil, Middle East), with the addition of two of the most populated countries in the world (China and India). Within each country, samples were based on standard representative splits of gender, age, household income and geographical region.

    Face-to-face interview; telephone interview; web-based survey.

  20. d

    PCCF+: Postal Code Conversion File Plus

    • dataone.org
    Updated Dec 28, 2023
    Cite
    Jeff Moon (2023). PCCF+: Postal Code Conversion File Plus [Dataset]. http://doi.org/10.5683/SP3/SXIQPW
    Explore at:
    Dataset updated
    Dec 28, 2023
    Dataset provided by
    Borealis
    Authors
    Jeff Moon
    Description

    This hands-on workshop has two parts. The first part covers working with SAS and the Postal Code Conversion File Plus. You'll start with Postal Codes, and leave with Census geography that can be linked to Census demographics. The second part introduces OpenRefine, an open source software platform for cleaning up messy data files. Initially developed by Google, OpenRefine will open your eyes to the beauty of clean data! No previous experience required.
