100+ datasets found

o
Messy data for data cleaning exercise - Dataset - openAFRICA
open.africa
Updated Oct 6, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2021). Messy data for data cleaning exercise - Dataset - openAFRICA [Dataset]. https://open.africa/dataset/messy-data-for-data-cleaning-exercise
Explore at:
Dataset updated
Oct 6, 2021
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A messy data for demonstrating "how to clean data using spreadsheet". This dataset was intentionally formatted to be messy, for the purpose of demonstration. It was collated from here - https://openafrica.net/dataset/historic-and-projected-rainfall-and-runoff-for-4-lake-victoria-sub-regions
B
Data Cleaning Sample
borealisdata.ca
Updated Jul 13, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Rong Luo (2023). Data Cleaning Sample [Dataset]. http://doi.org/10.5683/SP3/ZCN177
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.5683/SP3/ZCN177
Dataset updated
Jul 13, 2023
Dataset provided by
Borealis
Authors
Rong Luo
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Sample data for exercises in Further Adventures in Data Cleaning.
D
Data Cleaning Tools Market Report | Global Forecast From 2025 To 2033
dataintelo.com
csv, pdf, pptx
Updated Jan 7, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dataintelo (2025). Data Cleaning Tools Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/data-cleaning-tools-market
Explore at:
pptx, pdf, csvAvailable download formats
Dataset updated
Jan 7, 2025
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policyhttps://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Data Cleaning Tools Market Outlook

As of 2023, the global market size for data cleaning tools is estimated at $2.5 billion, with projections indicating that it will reach approximately $7.1 billion by 2032, reflecting a robust CAGR of 12.1% during the forecast period. This growth is primarily driven by the increasing importance of data quality in business intelligence and analytics workflows across various industries.

The growth of the data cleaning tools market can be attributed to several critical factors. Firstly, the exponential increase in data generation across industries necessitates efficient tools to manage data quality. Poor data quality can result in significant financial losses, inefficient business processes, and faulty decision-making. Organizations recognize the value of clean, accurate data in driving business insights and operational efficiency, thereby propelling the adoption of data cleaning tools. Additionally, regulatory requirements and compliance standards also push companies to maintain high data quality standards, further driving market growth.

Another significant growth factor is the rising adoption of AI and machine learning technologies. These advanced technologies rely heavily on high-quality data to deliver accurate results. Data cleaning tools play a crucial role in preparing datasets for AI and machine learning models, ensuring that the data is free from errors, inconsistencies, and redundancies. This surge in the use of AI and machine learning across various sectors like healthcare, finance, and retail is driving the demand for efficient data cleaning solutions.

The proliferation of big data analytics is another critical factor contributing to market growth. Big data analytics enables organizations to uncover hidden patterns, correlations, and insights from large datasets. However, the effectiveness of big data analytics is contingent upon the quality of the data being analyzed. Data cleaning tools help in sanitizing large datasets, making them suitable for analysis and thus enhancing the accuracy and reliability of analytics outcomes. This trend is expected to continue, fueling the demand for data cleaning tools.

In terms of regional growth, North America holds a dominant position in the data cleaning tools market. The region's strong technological infrastructure, coupled with the presence of major market players and a high adoption rate of advanced data management solutions, contributes to its leadership. However, the Asia Pacific region is anticipated to witness the highest growth rate during the forecast period. The rapid digitization of businesses, increasing investments in IT infrastructure, and a growing focus on data-driven decision-making are key factors driving the market in this region.

As organizations strive to maintain high data quality standards, the role of an Email List Cleaning Service becomes increasingly vital. These services ensure that email databases are free from invalid addresses, duplicates, and outdated information, thereby enhancing the effectiveness of marketing campaigns and communications. By leveraging sophisticated algorithms and validation techniques, email list cleaning services help businesses improve their email deliverability rates and reduce the risk of being flagged as spam. This not only optimizes marketing efforts but also protects the reputation of the sender. As a result, the demand for such services is expected to grow alongside the broader data cleaning tools market, as companies recognize the importance of maintaining clean and accurate contact lists.

Component Analysis

The data cleaning tools market can be segmented by component into software and services. The software segment encompasses various tools and platforms designed for data cleaning, while the services segment includes consultancy, implementation, and maintenance services provided by vendors.

The software segment holds the largest market share and is expected to continue leading during the forecast period. This dominance can be attributed to the increasing adoption of automated data cleaning solutions that offer high efficiency and accuracy. These software solutions are equipped with advanced algorithms and functionalities that can handle large volumes of data, identify errors, and correct them without manual intervention. The rising adoption of cloud-based data cleaning software further bolsters this segment, as it offers scalability and ease of
food data cleaning
kaggle.com
zip
Updated Apr 13, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
AbdElRahman16 (2024). food data cleaning [Dataset]. https://www.kaggle.com/datasets/abdelrahman16/food-n
Explore at:
zip(0 bytes)Available download formats
Dataset updated
Apr 13, 2024
Authors
AbdElRahman16
Description
Dataset

This dataset was created by AbdElRahman16

Contents
h
codeparrot-clean
huggingface.co
Updated Dec 7, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
CodeParrot (2021). codeparrot-clean [Dataset]. https://huggingface.co/datasets/codeparrot/codeparrot-clean
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 7, 2021
Dataset provided by
Good Engineering, Inc
Authors
CodeParrot
Description
CodeParrot 🦜 Dataset Cleaned

What is it?

A dataset of Python files from Github. This is the deduplicated version of the codeparrot.

Processing

The original dataset contains a lot of duplicated and noisy data. Therefore, the dataset was cleaned with the following steps:

Deduplication Remove exact matches

Filtering Average line length < 100 Maximum line length < 1000 Alpha numeric characters fraction > 0.25 Remove auto-generated files (keyword search)

For… See the full description on the dataset page: https://huggingface.co/datasets/codeparrot/codeparrot-clean.
q
Cleaning Biodiversity Data: A Botanical Example Using Excel or RStudio
qubeshub.org
Updated Jul 16, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Shelly Gaynor (2020). Cleaning Biodiversity Data: A Botanical Example Using Excel or RStudio [Dataset]. http://doi.org/10.25334/DRGD-F069
Explore at:
Unique identifier
https://doi.org/10.25334/DRGD-F069
Dataset updated
Jul 16, 2020
Dataset provided by
QUBES
Authors
Shelly Gaynor
Description
Access and clean an open source herbarium dataset using Excel or RStudio.
Employment Of India CLeaned and Messy Data
kaggle.com
Updated Apr 7, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
SONIA SHINDE (2025). Employment Of India CLeaned and Messy Data [Dataset]. https://www.kaggle.com/datasets/soniaaaaaaaa/employment-of-india-cleaned-and-messy-data
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 7, 2025
Dataset provided by
Kagglehttp://kaggle.com/
Authors
SONIA SHINDE
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Area covered
India
Description
This dataset presents a dual-version representation of employment-related data from India, crafted to highlight the importance of data cleaning and transformation in any real-world data science or analytics project.

🔹 Dataset Composition:

It includes two parallel datasets: 1. Messy Dataset (Raw) – Represents a typical unprocessed dataset often encountered in data collection from surveys, databases, or manual entries. 2. Cleaned Dataset – This version demonstrates how proper data preprocessing can significantly enhance the quality and usability of data for analytical and visualization purposes.

Each record captures multiple attributes related to individuals in the Indian job market, including: - Age Group
- Employment Status (Employed/Unemployed)
- Monthly Salary (INR)
- Education Level
- Industry Sector
- Years of Experience
- Location
- Perceived AI Risk
- Date of Data Recording

Transformations & Cleaning Applied:

The raw dataset underwent comprehensive transformations to convert it into its clean, analysis-ready form: - Missing Values: Identified and handled using either row elimination (where critical data was missing) or imputation techniques. - Duplicate Records: Identified using row comparison and removed to prevent analytical skew. - Inconsistent Formatting: Unified inconsistent naming in columns (like 'monthly_salary_(inr)' → 'Monthly Salary (INR)'), capitalization, and string spacing. - Incorrect Data Types: Converted columns like salary from string/object to float for numerical analysis. - Outliers: Detected and handled based on domain logic and distribution analysis. - Categorization: Converted numeric ages into grouped age categories for comparative analysis. - Standardization: Uniform labels for employment status, industry names, education, and AI risk levels were applied for visualization clarity.

Purpose & Utility:

This dataset is ideal for learners and professionals who want to understand: - The impact of messy data on visualization and insights - How transformation steps can dramatically improve data interpretation - Practical examples of preprocessing techniques before feeding into ML models or BI tools

It's also useful for: - Training ML models with clean inputs
- Data storytelling with visual clarity
- Demonstrating reproducibility in data cleaning pipelines

By examining both the messy and clean datasets, users gain a deeper appreciation for why “garbage in, garbage out” rings true in the world of data science.
Students Performance | Clean Dataset
kaggle.com
Updated Oct 29, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Muhammad Roshan Riaz (2024). Students Performance | Clean Dataset [Dataset]. https://www.kaggle.com/datasets/muhammadroshaanriaz/students-performance-dataset-cleaned
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Oct 29, 2024
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Muhammad Roshan Riaz
License
https://cdla.io/sharing-1-0/https://cdla.io/sharing-1-0/
Description
https://www.insidehighered.com/sites/default/files/2024-02/GettyImages-1072191138.jpg" alt="image"> The dataset includes:

Gender: Useful for analyzing performance differences between male and female students.

Race/Ethnicity: Allows analysis of academic performance trends across different racial or ethnic groups.

Parental Level of Education: Indicates the educational background of the student's family.

Lunch: Shows whether students receive a free or reduced lunch, which is often a socioeconomic indicator.

Test Preparation Course: This tells whether students completed a test prep course, which could impact their performance.

Math Score: Provides a measure of each student’s performance in math, used to calculate averages or trends across various demographics.

Reading Score: Measures performance in reading, allowing for insights into literacy and comprehension levels among students.

Writing Score: Evaluates students' writing skills, which can be analyzed to assess overall literacy and expression.
d
Daily Tasks Park Cleaning Records
catalog.data.gov
data.cityofnewyork.us
+2more
Updated Jul 12, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
data.cityofnewyork.us (2025). Daily Tasks Park Cleaning Records [Dataset]. https://catalog.data.gov/dataset/daily-tasks-park-cleaning-records
Explore at:
Dataset updated
Jul 12, 2025
Dataset provided by
data.cityofnewyork.us
Description
This data contains cleaning records for City property under the jurisdiction of or maintained by NYC Parks. It includes records of tasks such as opening parks, cleaning and restocking comfort stations, and removing graffiti, litter and natural debris. For the User Guide, please follow this link For the Data Dictionary, please follow this link
h
alpaca-cleaned
huggingface.co
kaggle.com
Updated Mar 30, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Gene Ruebsamen (2023). alpaca-cleaned [Dataset]. https://huggingface.co/datasets/yahma/alpaca-cleaned
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Mar 30, 2023
Authors
Gene Ruebsamen
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Card for Alpaca-Cleaned

Repository: https://github.com/gururise/AlpacaDataCleaned

Dataset Description

This is a cleaned version of the original Alpaca Dataset released by Stanford. The following issues have been identified in the original release and fixed in this dataset:

Hallucinations: Many instructions in the original dataset had instructions referencing data on the internet, which just caused GPT3 to hallucinate an answer.

"instruction":"Summarize the… See the full description on the dataset page: https://huggingface.co/datasets/yahma/alpaca-cleaned.
B
Navigating Stats Can Data & Scrubbing Data Clean with Excel Workshop
borealisdata.ca
search.dataone.org
Updated Jul 19, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Lucia Costanzo; Vivek Jadon (2024). Navigating Stats Can Data & Scrubbing Data Clean with Excel Workshop [Dataset]. http://doi.org/10.5683/SP3/FF6AI9
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.5683/SP3/FF6AI9
Dataset updated
Jul 19, 2024
Dataset provided by
Borealis
Authors
Lucia Costanzo; Vivek Jadon
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Canada
Description
Ahoy, data enthusiasts! Join us for a hands-on workshop where you will hoist your sails and navigate through the Statistics Canada website, uncovering hidden treasures in the form of data tables. With the wind at your back, you’ll master the art of downloading these invaluable Stats Can datasets while braving the occasional squall of data cleaning challenges using Excel with your trusty captains Vivek and Lucia at the helm.
d
Coresignal | Clean Data | Company Data | AI-Enriched Datasets | Global /...
datarade.ai
.json, .csv
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Coresignal, Coresignal | Clean Data | Company Data | AI-Enriched Datasets | Global / 35M+ Records / Updated Weekly [Dataset]. https://datarade.ai/data-products/coresignal-clean-data-company-data-ai-enriched-datasets-coresignal
Explore at:
.json, .csvAvailable download formats
Dataset authored and provided by
Coresignal
Area covered
Hungary, Guatemala, Guinea-Bissau, Saint Barthélemy, Niue, Panama, Chile, Namibia, Guadeloupe, Andorra
Description
This clean dataset is a refined version of our company datasets, consisting of 35M+ data records.

It’s an excellent data solution for companies with limited data engineering capabilities and those who want to reduce their time to value. You get filtered, cleaned, unified, and standardized B2B data. After cleaning, this data is also enriched by leveraging a carefully instructed large language model (LLM).

AI-powered data enrichment offers more accurate information in key data fields, such as company descriptions. It also produces over 20 additional data points that are very valuable to B2B businesses. Enhancing and highlighting the most important information in web data contributes to quicker time to value, making data processing much faster and easier.

For your convenience, you can choose from multiple data formats (Parquet, JSON, JSONL, or CSV) and select suitable delivery frequency (quarterly, monthly, or weekly).

Coresignal is a leading public business data provider in the web data sphere with an extensive focus on firmographic data and public employee profiles. More than 3B data records in different categories enable companies to build data-driven products and generate actionable insights. Coresignal is exceptional in terms of data freshness, with 890M+ records updated monthly for unprecedented accuracy and relevance.
w
Dataset of book subjects that contain Data cleaning and exploration with...
workwithdata.com
Updated Nov 7, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Work With Data (2024). Dataset of book subjects that contain Data cleaning and exploration with machine learning : clean data with machine learning algorithms and techniques [Dataset]. https://www.workwithdata.com/datasets/book-subjects?f=1&fcol0=j0-book&fop0=%3D&fval0=Data+cleaning+and+exploration+with+machine+learning+:+clean+data+with+machine+learning+algorithms+and+techniques&j=1&j0=books
Explore at:
Dataset updated
Nov 7, 2024
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about book subjects. It has 3 rows and is filtered where the books is Data cleaning and exploration with machine learning : clean data with machine learning algorithms and techniques. It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.
R
Unclean Floor Dataset
universe.roboflow.com
zip
Updated Jan 29, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
CompVision (2025). Unclean Floor Dataset [Dataset]. https://universe.roboflow.com/compvision-bfglv/clean-unclean-floor
Explore at:
zipAvailable download formats
Dataset updated
Jan 29, 2025
Dataset authored and provided by
CompVision
Variables measured
Unclean Floor Bounding Boxes
Description
Unclean Floor

## Overview Unclean Floor is a dataset for object detection tasks - it contains Unclean Floor annotations for 1,194 images. ## Getting Started You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
D
CNVVE Dataset clean audio samples
darus.uni-stuttgart.de
Updated Feb 13, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ramin Hedeshy; Raphael Menges; Steffen Staab (2024). CNVVE Dataset clean audio samples [Dataset]. http://doi.org/10.18419/DARUS-3898
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.18419/DARUS-3898
Dataset updated
Feb 13, 2024
Dataset provided by
DaRUS
Authors
Ramin Hedeshy; Raphael Menges; Steffen Staab
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset funded by
BMBF
BMWK/ESF
Description
This CNVVE Dataset contains clean audio samples encompassing six distinct classes of voice expressions, namely “Uh-huh” or “mm-hmm”, “Uh-uh” or “mm-mm”, “Hush” or “Shh”, “Psst”, “Ahem”, and Continuous humming, e.g., “hmmm.” Audio samples of each class are found in the respective folders. These audio samples have undergone a thorough cleaning process. The raw samples are published in https://doi.org/10.18419/darus-3897. Initially, we applied the Google WebRTC voice activity detection (VAD) algorithm on the given audio files to remove noise or silence from the collected voice signals. The intensity was set to "2", which could be a value between "1" and "3". However, because of variations in the data, some files required additional manual cleaning. These outliers, characterized by sharp click sounds (such as those occurring at the end of recordings), were addressed. The samples are recorded through a dedicated website for data collection that defines the purpose and type of voice data by providing example recordings to participants as well as the expressions’ written equivalent, e.g., “Uh-huh”. Audio recordings were automatically saved in the .wav format and kept anonymous, with a sampling rate of 48 kHz and a bit depth of 32 bits. For more info, please check the paper or feel free to contact the authors for any inquiries.
Dataset for DIY Air Cleaner Efficacy Testing
catalog.data.gov
res1catalogd-o-tdatad-o-tgov.vcapture.xyz
Updated Dec 3, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. EPA Office of Research and Development (ORD) (2022). Dataset for DIY Air Cleaner Efficacy Testing [Dataset]. https://catalog.data.gov/dataset/dataset-for-diy-air-cleaner-efficacy-testing
Explore at:
Dataset updated
Dec 3, 2022
Dataset provided by
United States Environmental Protection Agencyhttp://www.epa.gov/
Description
This dataset includes air cleaner performance data in the form of clean air delivery rates for Do-it-yourself (DIY) air cleaners of varying design. Data on the usability characteristics, such as the noise level, power consumption, and costs to purchase and operate are included. This dataset is associated with the following publication: Holder, A., H. Halliday, and P. Virtaranta. Impact of do-it-yourself air cleaner design on the reduction of simulated wildfire smoke in a controlled chamber environment. INDOOR AIR. Blackwell Publishing, Malden, MA, USA, 32(11): NA, (2022).
Mushroom Cleaned dataset
kaggle.com
Updated Nov 26, 2020
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yogesh Singla (2020). Mushroom Cleaned dataset [Dataset]. https://www.kaggle.com/datasets/yogeshkumarsingla/mushroom-cleaned-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Nov 26, 2020
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Yogesh Singla
Description
Dataset

This dataset was created by Yogesh Singla

Contents
h
gpt4-llm-cleaned-chatml
huggingface.co
Updated Jul 26, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Aleksey Korshuk (2023). gpt4-llm-cleaned-chatml [Dataset]. https://huggingface.co/datasets/AlekseyKorshuk/gpt4-llm-cleaned-chatml
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 26, 2023
Authors
Aleksey Korshuk
Description
Dataset Card for "gpt4-llm-cleaned-chatml"

Data preprocessing pipeline: https://github.com/AlekseyKorshuk/chat-data-pipeline
h
github-code-clean
huggingface.co
opendatalab.com
Updated Sep 2, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
CodeParrot (2024). github-code-clean [Dataset]. https://huggingface.co/datasets/codeparrot/github-code-clean
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 2, 2024
Dataset authored and provided by
CodeParrot
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
The GitHub Code clean dataset in a more filtered version of codeparrot/github-code dataset, it consists of 115M code files from GitHub in 32 programming languages with 60 extensions totaling in almost 1TB of text data.
R
Clean Dataset
universe.roboflow.com
zip
Updated Jun 19, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
cleanroom (2024). Clean Dataset [Dataset]. https://universe.roboflow.com/cleanroom-at4av/clean-g93xl/dataset/11
Explore at:
zipAvailable download formats
Dataset updated
Jun 19, 2024
Dataset authored and provided by
cleanroom
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Variables measured
Ok Bounding Boxes
Description
Clean

## Overview Clean is a dataset for object detection tasks - it contains Ok annotations for 290 images. ## Getting Started You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model. ## License This dataset is available under the [Public Domain license](https://creativecommons.org/licenses/Public Domain).

Facebook

Twitter

Click to copy link

Link copied

Cite

(2021). Messy data for data cleaning exercise - Dataset - openAFRICA [Dataset]. https://open.africa/dataset/messy-data-for-data-cleaning-exercise

Messy data for data cleaning exercise - Dataset - openAFRICA

Explore at:

Dataset updated

Oct 6, 2021

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

A messy data for demonstrating "how to clean data using spreadsheet". This dataset was intentionally formatted to be messy, for the purpose of demonstration. It was collated from here - https://openafrica.net/dataset/historic-and-projected-rainfall-and-runoff-for-4-lake-victoria-sub-regions

Clear search

Close search

Google apps

Main menu

Messy data for data cleaning exercise - Dataset - openAFRICA

Data Cleaning Sample

Data Cleaning Tools Market Report | Global Forecast From 2025 To 2033

Data Cleaning Tools Market Outlook

Component Analysis

food data cleaning

Dataset

Contents

codeparrot-clean

Cleaning Biodiversity Data: A Botanical Example Using Excel or RStudio

Employment Of India CLeaned and Messy Data

🔹 Dataset Composition:

Transformations & Cleaning Applied:

Purpose & Utility:

Students Performance | Clean Dataset

Daily Tasks Park Cleaning Records

alpaca-cleaned

Navigating Stats Can Data & Scrubbing Data Clean with Excel Workshop

Coresignal | Clean Data | Company Data | AI-Enriched Datasets | Global /...

Dataset of book subjects that contain Data cleaning and exploration with...

Unclean Floor Dataset

Unclean Floor

CNVVE Dataset clean audio samples

Dataset for DIY Air Cleaner Efficacy Testing

Mushroom Cleaned dataset

Dataset

Contents

gpt4-llm-cleaned-chatml

github-code-clean

Clean Dataset

Clean

Messy data for data cleaning exercise - Dataset - openAFRICA