https://creativecommons.org/publicdomain/zero/1.0/
This synthetic dataset contains 4,362 rows and five columns, including both numerical and categorical data. It is designed for data cleaning, imputation, and analysis tasks, featuring structured missing values at varying percentages (63%, 4%, 47%, 31%, and 9%).
The dataset includes:
- Category (Categorical): Product category (A, B, C, D)
- Price (Numerical): Randomized product prices
- Rating (Numerical): Ratings between 1 and 5
- Stock (Categorical): Availability status (In Stock, Out of Stock)
- Discount (Numerical): Discount percentage
This dataset is ideal for practicing missing data handling, exploratory data analysis (EDA), and machine learning preprocessing.
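As a sketch of what that practice might look like, the snippet below imputes a tiny frame with the same five columns. The values are invented; filling numeric columns with the median and categorical columns with the mode is just one reasonable choice:

```python
import numpy as np
import pandas as pd

# Invented stand-in for the dataset described above: same five columns,
# a few missing entries in each type of column.
df = pd.DataFrame({
    "Category": ["A", "B", None, "A"],
    "Price": [10.0, np.nan, 25.5, 40.0],
    "Rating": [4.0, 3.0, np.nan, 5.0],
    "Stock": ["In Stock", None, "Out of Stock", "In Stock"],
    "Discount": [5.0, 10.0, np.nan, 0.0],
})

# Numerical columns: fill with the median.
for col in ["Price", "Rating", "Discount"]:
    df[col] = df[col].fillna(df[col].median())

# Categorical columns: fill with the mode (most frequent value).
for col in ["Category", "Stock"]:
    df[col] = df[col].fillna(df[col].mode()[0])
```

With structured missingness at the high percentages quoted above (e.g. 63%), simple statistics like these become unreliable, which is part of what the dataset is meant to demonstrate.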
Data cleaning (or data cleansing) means cleaning the data by imputing missing values, smoothing noisy data, and identifying or removing outliers. In general, missing values arise because of collection errors or corrupted data.
Here is some more detail: Feature Engineering - Handling Missing Values.
The Wine_Quality.csv dataset has numerical missing data, and the students_Performance.mv.csv dataset has both numerical and categorical missing data.
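A dtype-driven sketch can cover both cases: numeric-only gaps (as in Wine_Quality.csv) and mixed numeric/categorical gaps (as in students_Performance.mv.csv). The frame below is an invented stand-in for either file:

```python
import numpy as np
import pandas as pd

# Invented sample with one numeric and one categorical column.
df = pd.DataFrame({
    "score": [55.0, np.nan, 72.0, 64.0],
    "grade": ["B", "A", None, "B"],
})

# Split columns by dtype so the same code handles both datasets.
num_cols = df.select_dtypes(include="number").columns
cat_cols = df.select_dtypes(exclude="number").columns

# Numeric gaps: fill with the column mean; categorical gaps: the mode.
df[num_cols] = df[num_cols].fillna(df[num_cols].mean())
df[cat_cols] = df[cat_cols].fillna(df[cat_cols].mode().iloc[0])
```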
This dataset contains 1,000 employee records across different departments and cities, designed for practicing data cleaning, preprocessing, and handling missing values in real-world scenarios.
This dataset demonstrates handling missing values with Python libraries such as NumPy and pandas, including the use of NaN and None values and the detecting, dropping, and filling of null values.
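The three operations mentioned (detecting, dropping, and filling null values) might be sketched like this; pandas treats both np.nan and None as missing:

```python
import numpy as np
import pandas as pd

# A small series containing both NaN and None.
s = pd.Series([1.0, np.nan, 3.0, None])

detected = s.isna()          # boolean mask: True where missing
dropped = s.dropna()         # the series with null rows removed
filled = s.fillna(s.mean())  # nulls replaced with the mean of the rest
```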
This dataset was created by Safacan Metin
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Tuệ Nguyễn
Released under Apache 2.0
This dataset was created by Pankesh Patel
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Zaid Mohammed Ibrahim
Released under MIT
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Feroz Shinwari
Released under Apache 2.0
https://creativecommons.org/publicdomain/zero/1.0/
This dataset is designed specifically for beginners and intermediate learners to practice data cleaning techniques using Python and Pandas.
It includes 500 rows of simulated employee data with intentional errors such as:
Missing values in Age and Salary
Typos in email addresses (@gamil.com)
Inconsistent city name casing (e.g., lahore, Karachi)
Extra spaces in department names (e.g., " HR ")
✅ Skills You Can Practice:
Detecting and handling missing data
String cleaning and formatting
Removing duplicates
Validating email formats
Standardizing categorical data
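A hedged sketch of these practice tasks follows. The column names (age, salary, email, city, department) mirror the description but are assumptions, and the rows are invented:

```python
import numpy as np
import pandas as pd

# Invented employee rows exhibiting the errors listed above:
# missing age/salary, @gamil.com typos, mixed city casing,
# padded department names, and an exact duplicate row.
df = pd.DataFrame({
    "age": [25, np.nan, 30, 25],
    "salary": [50000.0, 60000.0, np.nan, 50000.0],
    "email": ["a@gamil.com", "b@gmail.com", "c@gmail.com", "a@gamil.com"],
    "city": ["lahore", "Karachi", "LAHORE", "lahore"],
    "department": [" HR ", "IT", "Finance", " HR "],
})

df["department"] = df["department"].str.strip()   # extra spaces
df["city"] = df["city"].str.title()               # inconsistent casing
df["email"] = df["email"].str.replace("@gamil.com", "@gmail.com", regex=False)
df["age"] = df["age"].fillna(df["age"].median())  # missing values
df["salary"] = df["salary"].fillna(df["salary"].median())
df = df.drop_duplicates()                         # exact duplicates

# Validate email format with a simple (deliberately loose) regex.
valid = df["email"].str.match(r"^[\w.+-]+@[\w-]+\.[\w.]+$")
```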
You can use this dataset to build your own data cleaning notebook, or use it in interviews, assessments, and tutorials.
This dataset is the final solution for dealing with missing values in the Spaceship Titanic competition. Kaggle Notebook: https://www.kaggle.com/sardorabdirayimov/best-way-of-dealing-with-missing-values-titanic-2/
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset for the paper "Identifying missing data handling methods with text mining".
It contains the type of missing data handling method used by a given paper.
id: ID of the article
origin: Source journal
pub_year: Publication year
discipline: Discipline category of the article based on origin
about_missing: Is the article about missing data handling? (0 - no, 1 - yes)
imputation: Was some kind of imputation technique used in the article? (0 - no, 1 - yes)
advanced: Was some kind of advanced imputation technique used in the article? (0 - no, 1 - yes)
deletion: Was some kind of deletion technique used in the article? (0 - no, 1 - yes)
text_tokens: Text snippets extracted from the original articles
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
• This dataset is designed for learning how to identify missing data in Python.
• It focuses on techniques to detect null, NaN, and incomplete values.
• It includes examples of visualizing missing data patterns using Python libraries.
• Useful for beginners practicing data preprocessing and data cleaning.
• Helps users understand missing data handling methods for machine learning workflows.
• Supports practical exploration of datasets before model training.
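One way to sketch the identification step is to count nulls per column and per row and tally which missingness patterns occur (the frame is illustrative only):

```python
import numpy as np
import pandas as pd

# Small frame with both NaN and None as missing markers.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": ["x", "y", None, None],
})

per_column = df.isna().sum()         # nulls per column
per_row = df.isna().sum(axis=1)      # nulls per row
patterns = df.isna().value_counts()  # frequency of each missingness pattern
# For a visual overview, libraries such as matplotlib or missingno can
# plot df.isna() as a matrix; that step is omitted here.
```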
This dataset was created by Himanshu Kumar
https://creativecommons.org/publicdomain/zero/1.0/
Activity Title: "Fix the Gaps: Data Hospital Simulation" (an activity created for students to practice techniques for handling missing data)
Description: Provide each team with a “broken patient record” dataset (incomplete entries with NaNs or blanks). Teams act as data doctors:
• Diagnose the type of missingness (MCAR, MAR, MNAR)
• Choose suitable imputation techniques (mean, median, KNN, regression)
• Compare outcomes from different methods
Tools: Jupyter notebook / Pandas
Outcome: Group presentation on the impact of imputation and justification of the method used.
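A minimal version of the exercise imputes the same broken column two ways and compares the outcomes. Here blood_pressure is a hypothetical column; KNN and regression imputation (e.g. via scikit-learn's KNNImputer) follow the same pattern but are omitted to keep the sketch self-contained:

```python
import numpy as np
import pandas as pd

# A "broken patient record" column with two missing readings.
bp = pd.Series([120.0, np.nan, 140.0, 200.0, np.nan], name="blood_pressure")

mean_imputed = bp.fillna(bp.mean())
median_imputed = bp.fillna(bp.median())

# The outlier (200) pulls the mean above the median, so the two
# methods disagree on the imputed rows -- exactly the kind of
# difference teams are asked to present and justify.
```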
This dataset was created by Deep Jani
Released under Data files © Original Authors
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The Dirty Cafe Sales dataset contains 10,000 rows of synthetic data representing sales transactions in a cafe. This dataset is intentionally "dirty," with missing values, inconsistent data, and errors introduced to provide a realistic scenario for data cleaning and exploratory data analysis (EDA). It can be used to practice cleaning techniques, data wrangling, and feature engineering.
dirty_cafe_sales.csv
| Column Name | Description | Example Values |
|---|---|---|
| Transaction ID | A unique identifier for each transaction. Always present and unique. | TXN_1234567 |
| Item | The name of the item purchased. May contain missing or invalid values (e.g., "ERROR"). | Coffee, Sandwich |
| Quantity | The quantity of the item purchased. May contain missing or invalid values. | 1, 3, UNKNOWN |
| Price Per Unit | The price of a single unit of the item. May contain missing or invalid values. | 2.00, 4.00 |
| Total Spent | The total amount spent on the transaction. Calculated as Quantity * Price Per Unit. | 8.00, 12.00 |
| Payment Method | The method of payment used. May contain missing or invalid values (e.g., None, "UNKNOWN"). | Cash, Credit Card |
| Location | The location where the transaction occurred. May contain missing or invalid values. | In-store, Takeaway |
| Transaction Date | The date of the transaction. May contain missing or incorrect values. | 2023-01-01 |
Missing Values: Some columns (Item, Payment Method, Location) may contain missing values represented as None or empty cells.
Invalid Values: Some entries contain "ERROR" or "UNKNOWN" to simulate real-world data issues.
Price Consistency: The dataset includes the following menu items with their respective prices:
| Item | Price($) |
|---|---|
| Coffee | 2 |
| Tea | 1.5 |
| Sandwich | 4 |
| Salad | 5 |
| Cake | 3 |
| Cookie | 1 |
| Smoothie | 4 |
| Juice | 3 |
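One way to use this menu as a lookup table is to repair the Price Per Unit column wherever the recorded price is missing or disagrees with the canonical price (the sample rows below are invented):

```python
import numpy as np
import pandas as pd

# Canonical prices taken from the menu table above.
menu = {"Coffee": 2.0, "Tea": 1.5, "Sandwich": 4.0, "Salad": 5.0,
        "Cake": 3.0, "Cookie": 1.0, "Smoothie": 4.0, "Juice": 3.0}

# Invented rows: one missing price, one inconsistent price.
df = pd.DataFrame({
    "Item": ["Coffee", "Tea", "Cake"],
    "Price Per Unit": [2.0, np.nan, 9.0],
})

canonical = df["Item"].map(menu)
mismatch = df["Price Per Unit"].isna() | (df["Price Per Unit"] != canonical)
df.loc[mismatch, "Price Per Unit"] = canonical[mismatch]
```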
This dataset is suitable for: - Practicing data cleaning techniques such as handling missing values, removing duplicates, and correcting invalid entries. - Exploring EDA techniques like visualizations and summary statistics. - Performing feature engineering for machine learning workflows.
To clean this dataset, consider the following steps:
1. Handle Missing Values: Fill missing numeric values with the median or mean; replace missing categorical values with the mode or "Unknown."
2. Handle Invalid Values: Replace "ERROR" and "UNKNOWN" with NaN or appropriate values.
3. Date Consistency: Check the Transaction Date column for missing or incorrect dates.
4. Feature Engineering: Derive new columns, such as Day of the Week or Transaction Month, for further analysis.
This dataset is released under the CC BY-SA 4.0 License. You are free to use, share, and adapt it, provided you give appropriate credit.
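The cleaning steps above might be sketched as follows on a few invented rows shaped like dirty_cafe_sales.csv:

```python
import numpy as np
import pandas as pd

# Invented rows following the dirty_cafe_sales.csv data dictionary.
df = pd.DataFrame({
    "Item": ["Coffee", "ERROR", "Sandwich"],
    "Quantity": ["1", "UNKNOWN", "3"],
    "Price Per Unit": ["2.00", "4.00", "4.00"],
    "Transaction Date": ["2023-01-01", None, "2023-01-03"],
})

# Invalid values: map "ERROR"/"UNKNOWN" to NaN first.
df = df.replace({"ERROR": np.nan, "UNKNOWN": np.nan})

# Missing values: numeric -> median, categorical -> "Unknown".
df["Quantity"] = pd.to_numeric(df["Quantity"])
df["Quantity"] = df["Quantity"].fillna(df["Quantity"].median())
df["Item"] = df["Item"].fillna("Unknown")

# Date consistency: parse dates, coercing bad entries to NaT.
df["Transaction Date"] = pd.to_datetime(df["Transaction Date"], errors="coerce")

# Feature engineering: derive Day of the Week.
df["Day of the Week"] = df["Transaction Date"].dt.day_name()
```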
If you have any questions or feedback, feel free to reach out through the dataset's discussion board on Kaggle.
The original dataset shared on GitHub can be found here. These are hands-on practice datasets linked through the Coursera Guided Project certificate course Handling Missing Values in R, part of the Coursera Project Network. The dataset links were shared by the course's original author and instructor, Arimoro Olayinka Imisioluwa.
Things you could do with this dataset: As a beginner in R, these datasets helped me get the hang of making data clean and tidy and of handling (numeric-only) missing values in R. They are good for anyone looking for a beginner-to-intermediate understanding of these subjects.
Here are my notebooks as kernels using these datasets, plus a few more datasets preloaded in R, as suggested by the instructor: TidY DatA Practice and MissinG DatA HandlinG - NumeriC.
https://creativecommons.org/publicdomain/zero/1.0/
The dataset contains sales records from a café. Initially, it was messy, with missing values represented as NaN, UNKNOWN, and ERROR. The following cleaning steps were applied:
1. Handling Missing Values Replaced missing values with appropriate statistics: (i) mode for categorical columns (Item, Payment Method, and Location); (ii) mean for numerical columns (Quantity); (iii) median for temporal data (Transaction Date).
2. Price Standardization Inconsistent values in the Price per Unit column were corrected by filling them with the appropriate consistent price from the dataset.
3. Data Type Conversion Converted all columns to their appropriate data types (e.g., datetime for transaction dates, numeric for quantities and prices, categorical for items, payment methods, and locations).
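The data type conversion step might look like this in pandas (the sample rows are invented stand-ins for the cleaned café data):

```python
import pandas as pd

# Invented cleaned rows, all still stored as strings.
df = pd.DataFrame({
    "Transaction Date": ["2023-01-01", "2023-01-02"],
    "Quantity": ["2", "5"],
    "Item": ["Coffee", "Tea"],
})

# Convert each column to its appropriate dtype.
df["Transaction Date"] = pd.to_datetime(df["Transaction Date"])
df["Quantity"] = pd.to_numeric(df["Quantity"])
df["Item"] = df["Item"].astype("category")
```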
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dive into this specially curated dataset on credit card applications 📊.
An interesting approach to privacy has been taken in this dataset: every name and value has been creatively altered to ensure confidentiality 🔒.
What's inside?
A diverse collection of data that's sure to pique your interest. You'll encounter a range of continuous variables, giving you a glimpse into quantitative insights 📈.
Then, there are categorical variables – some with just a handful of options offering a neat, compact view, and others with a plethora of choices, adding layers of complexity and richness.
But here's where it gets even more intriguing – the dataset has been intentionally peppered with additional missing values 💡.
This isn't your average dataset; it's a playground for those who love a good data challenge.
The goal?
To equip you with real-world skills in handling and imputing missing data 🧩. You'll learn to navigate through these informational gaps, employing various imputation techniques to unveil the hidden stories within the data.
This dataset isn't just about understanding credit card applications 💳. It's a journey into the heart of data analysis and machine learning 🤖.
Whether you're a beginner eager to learn the ropes or an experienced data enthusiast looking to refine your skills, this dataset offers a unique opportunity. It challenges you to apply theoretical knowledge to practical scenarios, transforming abstract concepts into tangible skills.
So, if you're ready to test your mettle against real-world data puzzles, this is your chance. Unleash your analytical prowess, explore diverse imputation strategies, and uncover the secrets hidden in incomplete data. Welcome to a world where data tells a story, and you're the storyteller 🌐