License: CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
What is Pandas?
Pandas is a Python library used for working with data sets.
It has functions for analyzing, cleaning, exploring, and manipulating data.
The name "Pandas" refers to both "Panel Data" and "Python Data Analysis". The library was created by Wes McKinney in 2008.
Why Use Pandas?
Pandas allows us to analyze large data sets and draw conclusions based on statistical methods.
Pandas can clean messy data sets and make them readable and relevant.
Relevant data is very important in data science.
What Can Pandas Do?
Pandas can answer questions about the data, such as:
Is there a correlation between two or more columns?
What is the average value?
What is the maximum value?
What is the minimum value?
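As a minimal sketch of how pandas answers these questions, the snippet below builds a small DataFrame from made-up numbers. The column names `Duration`, `Pulse`, and `Calories` and all values are hypothetical, chosen only for illustration. It computes the correlation between every pair of columns, then the average, maximum, and minimum of one column.

```python
import pandas as pd

# Hypothetical example data; not taken from any dataset described here.
df = pd.DataFrame({
    "Duration": [60, 45, 30, 60, 45],
    "Pulse": [110, 117, 103, 109, 117],
    "Calories": [409.1, 479.0, 340.0, 282.4, 406.0],
})

# Pairwise Pearson correlation between all (numeric) columns.
correlations = df.corr()

# Summary statistics for a single column.
average_calories = df["Calories"].mean()
max_calories = df["Calories"].max()
min_calories = df["Calories"].min()

print(correlations)
print(average_calories, max_calories, min_calories)
```

`df.describe()` reports the mean, min, and max (plus other statistics) for every numeric column at once, which is handy for a first look.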
This dataset was created by Sahil
License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
This dataset was created by Rajni Arora
Released under Apache 2.0
This dataset was created by TuringTinkerer
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This dataset was created in 2025 by the CATReloaded team in the Data Science Circle at Mansoura University, Faculty of Engineering, Egypt.
The dataset was originally prepared as supporting material for a pandas practice notebook, which was designed as a practical task to follow Corey Schafer’s YouTube pandas course.
The goal was to create a comprehensive pandas challenge that includes almost every skill you might need when working with pandas. The idea is that you can save the code and revisit it later whenever you need a reference.
This dataset is intended for:
Anyone just starting with pandas
Learners who want a structured challenge to test and refresh their skills
People looking for a practice task they can build on, enhance, or adapt
👉 Link to Notebook: https://www.kaggle.com/code/seifhafez/pandas-exercise/edit
Some of the questions in the task are not beginner-friendly, so don’t worry if they take some time.
I plan to provide solutions/answers when I have free time to write them down.
If anyone from the community shares model answers, I’ll be very grateful. I will gladly give credit and mention those contributions so others can benefit from them too.
You are welcome to design new tasks or variations using this dataset or notebook, as long as credit is given to the CATReloaded team.
This dataset was created by Purvansh Singh
License: CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
If you need a practice dataset to improve your skills, use this dataset.
I created this dataset for my notebook "Getting started with Dictionary and Pandas" to help people improve their dictionary and pandas skills: https://www.kaggle.com/brendan45774/getting-started-with-dictionary-and-pandas
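Since this dataset accompanies a notebook on dictionaries and pandas, here is a small sketch of the round trip between a Python dictionary and a DataFrame. The names and scores are hypothetical. Each dictionary key becomes a column, and `DataFrame.to_dict()` converts the table back, with the `orient` parameter controlling the layout.

```python
import pandas as pd

# Hypothetical data: keys become column names, list values become column values.
students = {
    "name": ["Ana", "Ben", "Chen"],
    "score": [88, 92, 79],
}

df = pd.DataFrame(students)

# Back to a dictionary in the same "column -> list of values" layout.
as_lists = df.to_dict(orient="list")

# "records" orientation gives one dictionary per row instead.
as_records = df.to_dict(orient="records")

print(df)
print(as_records)
```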
License: CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset is for beginners practicing Pandas.
It contains some errors that need to be fixed, so you can use it to practice cleaning NaN values with basic Pandas techniques.
I got this dataset from W3Schools.
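To illustrate the basic NaN-cleaning techniques mentioned above, the sketch below uses a tiny hypothetical DataFrame (the column names and values are invented, not taken from the dataset). It shows three common options: dropping rows with missing values, filling them with a constant, and filling them with a column statistic. Which option is right depends on the data.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values (NaN).
df = pd.DataFrame({
    "Duration": [60, 45, np.nan, 60],
    "Calories": [409.1, np.nan, 340.0, 282.4],
})

# Count missing values per column.
missing_per_column = df.isna().sum()

# Option 1: drop every row that contains at least one NaN (returns a new DataFrame).
dropped = df.dropna()

# Option 2: replace every NaN with a fixed value.
filled_constant = df.fillna(0)

# Option 3: replace NaN in one column with that column's mean (NaN is skipped when computing the mean).
filled_mean = df.copy()
filled_mean["Calories"] = filled_mean["Calories"].fillna(filled_mean["Calories"].mean())

print(missing_per_column)
print(dropped)
print(filled_mean)
```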
This dataset was created by Aman Sharma
License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
This dataset was created by Akarsh Tiwari
Released under Apache 2.0
License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
This dataset was created by nishant nandkeolyar
Released under Apache 2.0
License: MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
This dataset was created by AyanStark
Released under MIT
License: CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset consists of three one-dimensional datasets, each containing a series of values. One-dimensional data is simple to work with, which makes it an ideal starting point for those new to data analysis and manipulation. With Pandas Series, you can apply a wide range of operations, such as summary statistics, filtering, and element-wise arithmetic, to extract insights from these datasets.
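Because each of the three datasets is one-dimensional, it maps naturally onto a pandas `Series`. Here is a minimal sketch with a made-up series (the values, the day labels, and the name `temperature_c` are hypothetical) showing a few typical operations: summary statistics, boolean filtering, element-wise arithmetic, and finding the label of the largest value.

```python
import pandas as pd

# Hypothetical one-dimensional data. With a real single-column CSV you could use
# something like pd.read_csv(path).squeeze("columns") to get a Series.
temps = pd.Series([21.5, 23.0, 19.8, 25.1, 22.4],
                  index=["Mon", "Tue", "Wed", "Thu", "Fri"],
                  name="temperature_c")

# Summary statistics (count, mean, std, min, quartiles, max).
print(temps.describe())

# Boolean filtering: keep only values above the mean.
above_mean = temps[temps > temps.mean()]

# Element-wise arithmetic returns a new Series with the same index.
temps_f = temps * 9 / 5 + 32

# Index label of the largest value.
hottest_day = temps.idxmax()
```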
License: CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset is about meals. It contains the meal name, area, category, and tags for each meal. The DataFrame has a shape of 731 x 4 (731 rows and 4 columns), and the meals are listed in alphabetical order.
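A quick sanity check after loading data like this is to confirm its shape and ordering. The sketch below assumes columns named `Meal`, `Area`, `Category`, and `Tags` (hypothetical names; the real headers may differ) and uses a tiny invented stand-in instead of the actual file, whose path is not given here.

```python
import pandas as pd

# Tiny hypothetical stand-in for the real data.
# With the actual file you would load it with pd.read_csv(...) instead.
meals = pd.DataFrame({
    "Meal": ["Apple Tart", "Banana Bread", "Chicken Curry"],
    "Area": ["British", "American", "Indian"],
    "Category": ["Dessert", "Dessert", "Chicken"],
    "Tags": ["Baking", "Baking,Snack", None],
})

# (rows, columns); for the full dataset this should report (731, 4).
print(meals.shape)

# True if the meal names are in ascending (alphabetical) order.
is_alphabetical = meals["Meal"].is_monotonic_increasing

# Case-insensitive variant, since plain string comparison sorts "Z" before "a".
is_alphabetical_ci = meals["Meal"].str.lower().is_monotonic_increasing

# Number of meals per category.
category_counts = meals["Category"].value_counts()
```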
This dataset was created by Deependra Verma
This dataset has been created for educational purposes, specifically to help learners practice SQL-like operations using Python’s pandas library. It is ideal for beginners who want to improve their data manipulation, querying, and transformation skills in a notebook environment such as Kaggle.
The dataset simulates a simple personnel and department system. It includes two tables:
personel: Contains employee data such as names, ages, salaries, and department IDs.
departman: Contains department IDs and corresponding department names.
Throughout this project, key SQL operations have been demonstrated with their pandas equivalents. These include:
Basic commands: SELECT, INSERT, UPDATE, DELETE
Table structure operations: ALTER, DROP, TRUNCATE, COPY
Filtering and logical expressions: WHERE, AND, OR, IN, IS NULL, BETWEEN, LIKE
Aggregations and sorting: COUNT(), ORDER BY, LIMIT, DISTINCT
String functions: LOWER, TRIM, REPLACE, SPLIT, LENGTH
Joins: INNER JOIN, LEFT JOIN
Comparison operators: =, !=, <, >
The goal is to provide a hands-on, interactive environment for practicing SQL logic using real Python code. This dataset does not represent real individuals or businesses; it is entirely fictional and meant for training, teaching, and experimentation purposes only.
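As a rough illustration of a few SQL-to-pandas mappings like the ones listed above, the sketch below builds small stand-ins for the `personel` and `departman` tables. The column names (`name`, `age`, `salary`, `dept_id`, `dept_name`) and all rows are hypothetical; the dataset's real columns may be named differently, and this is not necessarily how the original notebook implements each operation.

```python
import pandas as pd

# Hypothetical stand-ins for the two tables.
personel = pd.DataFrame({
    "name": ["Ali", "Ayse", "Mehmet", "Zeynep"],
    "age": [34, 28, 45, 39],
    "salary": [52000, 61000, 48000, None],  # None becomes NaN (SQL NULL)
    "dept_id": [1, 2, 1, 3],
})
departman = pd.DataFrame({
    "dept_id": [1, 2],
    "dept_name": ["HR", "IT"],
})

# SELECT name, salary FROM personel WHERE age > 30 AND dept_id IN (1, 3)
selected = personel.loc[
    (personel["age"] > 30) & personel["dept_id"].isin([1, 3]), ["name", "salary"]
]

# ... WHERE age BETWEEN 30 AND 40  (Series.between is inclusive on both ends by default)
in_range = personel[personel["age"].between(30, 40)]

# ... WHERE salary IS NULL
no_salary = personel[personel["salary"].isna()]

# ... WHERE name LIKE 'A%'
starts_with_a = personel[personel["name"].str.startswith("A")]

# SELECT LOWER(TRIM(name)) FROM personel
lower_names = personel["name"].str.strip().str.lower()

# SELECT * FROM personel ORDER BY salary DESC LIMIT 2  (NaN sorts last by default)
top_paid = personel.sort_values("salary", ascending=False).head(2)

# SELECT COUNT(*) FROM personel;  SELECT DISTINCT dept_id FROM personel
row_count = len(personel)
distinct_depts = personel["dept_id"].unique()

# INNER JOIN vs LEFT JOIN on dept_id
inner = personel.merge(departman, on="dept_id", how="inner")
left = personel.merge(departman, on="dept_id", how="left")
```

The LEFT JOIN keeps the employee whose `dept_id` has no match and fills `dept_name` with NaN, just as SQL fills it with NULL.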
License: CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
I have taught many students to use Pandas. Many of them lacked the context to apply their newly acquired skills. This dataset will help new learners work on their Pandas skills.
This dataset contains 13 columns and 6889 rows. Each row represents a unique customer. Each customer's transaction amount and number of transactions are stored in separate columns (unpivoted). The data also contains each customer's first and last transaction dates.
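With customer-level data like this, a typical first exercise is deriving per-customer metrics. Here is a minimal sketch. The column names (`customer_id`, `txn_amount`, `txn_count`, `first_txn_date`, `last_txn_date`) and values are hypothetical, since the actual 13 column names are not listed here. It computes the average value per transaction and the number of days between each customer's first and last transactions.

```python
import pandas as pd

# Hypothetical customer-level rows (one row per customer).
customers = pd.DataFrame({
    "customer_id": ["C001", "C002", "C003"],
    "txn_amount": [1200.0, 300.0, 450.0],
    "txn_count": [4, 1, 3],
    "first_txn_date": ["2021-01-05", "2021-03-10", "2021-02-01"],
    "last_txn_date": ["2021-06-20", "2021-03-10", "2021-04-15"],
})

# Parse the date strings so that date arithmetic works.
for col in ["first_txn_date", "last_txn_date"]:
    customers[col] = pd.to_datetime(customers[col])

# Average amount per transaction for each customer.
customers["avg_txn_value"] = customers["txn_amount"] / customers["txn_count"]

# Whole days between first and last transaction.
customers["active_days"] = (
    customers["last_txn_date"] - customers["first_txn_date"]
).dt.days

# IDs of the two customers with the largest total amount.
top_customers = customers.nlargest(2, "txn_amount")["customer_id"].tolist()
```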
To be added.
I was inspired to create contextual questions that help students learn Pandas faster.
License: CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset is designed specifically for beginners and intermediate learners to practice data cleaning techniques using Python and Pandas.
It includes 500 rows of simulated employee data with intentional errors such as:
Missing values in Age and Salary
Typos in email addresses (@gamil.com)
Inconsistent city name casing (e.g., lahore, Karachi)
Extra spaces in department names (e.g., " HR ")
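Each of the intentional errors listed above has a short pandas fix. The sketch below uses a few hypothetical rows and assumes columns named `Age`, `Salary`, `Email`, `City`, and `Department` (`Age` and `Salary` are named in the description; the other three names are guesses). Filling missing values with the median is one common choice, not the only correct one.

```python
import numpy as np
import pandas as pd

# Hypothetical rows reproducing each kind of error.
df = pd.DataFrame({
    "Age": [25, np.nan, 31],
    "Salary": [50000, 62000, np.nan],
    "Email": ["sara@gamil.com", "omar@gmail.com", "ali@gamil.com"],
    "City": ["lahore", "Karachi", "LAHORE"],
    "Department": [" HR ", "IT", "HR "],
})

# Missing values: fill numeric gaps with each column's median.
for col in ["Age", "Salary"]:
    df[col] = df[col].fillna(df[col].median())

# Typo in the email domain: replace the misspelled "@gamil.com" with "@gmail.com".
df["Email"] = df["Email"].str.replace("@gamil.com", "@gmail.com", regex=False)

# Inconsistent casing: normalize city names to title case.
df["City"] = df["City"].str.title()

# Extra spaces: strip leading and trailing whitespace.
df["Department"] = df["Department"].str.strip()
```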
✅ Skills You Can Practice:
Detecting and handling missing data
String cleaning and formatting
Removing duplicates
Validating email formats
Standardizing categorical data
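Two of these skills, removing duplicates and validating email formats, can be sketched as follows. The rows and column names are hypothetical, and the regular expression is a deliberately simple pattern assumed for illustration; it is not a full RFC 5322 email validator.

```python
import pandas as pd

# Hypothetical rows: one exact duplicate and one malformed email.
df = pd.DataFrame({
    "Name": ["Sara", "Omar", "Sara", "Hina"],
    "Email": ["sara@gmail.com", "omar@gmail", "sara@gmail.com", "hina@yahoo.com"],
})

# Count fully duplicated rows, then drop them (the first occurrence is kept by default).
n_duplicates = df.duplicated().sum()
df = df.drop_duplicates().reset_index(drop=True)

# Simple pattern: local-part@domain.tld, with no spaces or extra "@".
EMAIL_PATTERN = r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$"
df["email_valid"] = df["Email"].str.match(EMAIL_PATTERN)

invalid_emails = df.loc[~df["email_valid"], "Email"].tolist()
```

Flagging invalid emails in a boolean column, rather than dropping those rows immediately, keeps the decision about how to handle them explicit.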
You can use this dataset to build your own data cleaning notebook, or use it in interviews, assessments, and tutorials.
License: MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
This dataset was created by Choi Ji Woo
Released under MIT
This dataset was created for practicing and learning Pandas.