24 datasets found

Data analysis with pandas and python
kaggle.com
zip
Updated Apr 16, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
乡TOBY乡 (2023). Data analysis with pandas and python [Dataset]. https://www.kaggle.com/datasets/toby000/data-analysis-with-pandas-and-python
Explore at:
zip(701073 bytes)Available download formats
Dataset updated
Apr 16, 2023
Authors
乡TOBY乡
Description
This dataset includes data that is provided in the Udemy course "Data Analysis with Pandas and Python" by Boris Paskhaver.
Sales_2019_Analysis
kaggle.com
zip
Updated Jan 15, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Đức Phát Trương (2023). Sales_2019_Analysis [Dataset]. https://www.kaggle.com/datasets/cphttrng/sales-2019-analysis
Explore at:
zip(2504483 bytes)Available download formats
Dataset updated
Jan 15, 2023
Authors
Đức Phát Trương
Description
Dataset

This dataset was created by Đức Phát Trương

Contents
Practice Dataset
kaggle.com
zip
Updated Sep 20, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Seif Hafez (2025). Practice Dataset [Dataset]. https://www.kaggle.com/datasets/seifhafez/practice-dataset
Explore at:
zip(13049 bytes)Available download formats
Dataset updated
Sep 20, 2025
Authors
Seif Hafez
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset was created in 2025 by the CATReloaded team in the Data Science Circle at Mansoura University, Faculty of Engineering, Egypt.

The dataset was originally prepared as the supporting material for a pandas practice notebook. That notebook was designed as a practical task after Corey Schafer’s YouTube pandas course

The goal was to create a comprehensive pandas challenge that includes almost every skill you might need when working with pandas. The idea is that you can save the code and revisit it later whenever you need a reference.

I believe this task could be useful for:

Anyone just starting with pandas

Learners who want a structured challenge to test and refresh their skills

People looking for a practice task they can build on, enhance, or adapt

📌 The full task is available in the pinned notebook here:

👉 "https://www.kaggle.com/code/seifhafez/pandas-exercise/edit">Link to Notebook

💡 Notes:

The task may contain non-beginner-friendly questions, so don’t worry if they take some time.

I plan to provide solutions/answers when I have free time to write them down.

If anyone from the community shares model answers, I’ll be very grateful. I will gladly give credit and mention those contributions so others can benefit from them too.

You are welcome to design new tasks or variations using this dataset or notebook, as long as credit is kept to the CATReloaded team.

📖 To explore more of what we do, check out the CATReloaded Roadmap on GitHub:

🧭 Data Science Roadmap

🌐 All Roadmaps

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F19471804%2F9dcd0bfb323cfa328e83bd8a2b7944a7%2F458741397_513503334603832_744753795589333817_n.jpg?generation=1758812067506227&alt=media" alt="">
Panda Code 2
kaggle.com
zip
Updated Jul 25, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jay (2020). Panda Code 2 [Dataset]. https://www.kaggle.com/jakobw/panda-code-2
Explore at:
zip(5641817960 bytes)Available download formats
Dataset updated
Jul 25, 2020
Authors
Jay
Description
Dataset

This dataset was created by Jay

Released under Data files © Original Authors

Contents
Toys Images
kaggle.com
zip
Updated Nov 6, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mohamed Hisham Abdelzaher (2025). Toys Images [Dataset]. https://www.kaggle.com/datasets/mh0386/toys-images
Explore at:
zip(53485433 bytes)Available download formats
Dataset updated
Nov 6, 2025
Authors
Mohamed Hisham Abdelzaher
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset comprises a diverse collection of images featuring two classes of toys. Images of 105 Panda and 150 Rabbit Toys. It offers versatility for researchers and developers interested in creating AI models capable of generating realistic and novel toy-related images. It includes labelled categories for ease of classification and can be a valuable resource for advancing the capabilities of generative AI in the realm of playful and imaginative content creation and classification between the Panda and Rabbit class.
Python Recipes
kaggle.com
zip
Updated Aug 1, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Olga Belitskaya (2022). Python Recipes [Dataset]. https://www.kaggle.com/datasets/olgabelitskaya/python-recipes/code
Explore at:
zip(72646 bytes)Available download formats
Dataset updated
Aug 1, 2022
Authors
Olga Belitskaya
Description
\[\color{#ff35fe}{\mathbb{Context}}\]

The main idea is to create collections with standard code recipes.

\[\color{#ff35fe}{\mathbb{Content}}\]

Files with the .py (and similar) formats.

\[\color{#ff35fe}{\mathbb{Acknowledgments}}\]

Many thanks for the user comments.

\[\color{#ff35fe}{\mathbb{Inspiration}}\]

Could this data be a time saver in data processing?

\[\color{#ff35fe}{\mathbb{Russian \; Notes \; Python \; DataAnalysis}}\]

Тема 2

2.1 Введение в профессию «Аналитик данных»

2.2 Введение в программирование на языке Python

2.3 Синтаксис языка программирования Python

2.4 Типы данных в Python Часть 1

2.5 Типы данных в Python Часть 2

2.6 Python Standard Library

2.7 Преобразование типов данных в Python

2.8 Python: ввод и вывод

2.9 Управляющая конструкция if и тернарные операторы

2.10 Управляющяя конструкция циклы

2.11 Управляющяя конструкция исключения

2.12 Строки и методы их обработки

2.13 Списки. Кортежи

2.14 Множества и словари

2.15 Сочетание последовательных типов

2.16 - 2.18 В разработке

2.19 Функции и итераторы

2.20 Функции sorted(), map(), filter(), reduce()

2.21 - 2.24 В разработке

2.25 Объектно–ориентированное программирование (ООП)

2.26 Упражнения в объектно–ориентированном программировании

2.27 В разработке

2.28 Python NumPy Часть 1

2.29 Python NumPy Часть 2

2.30 Python NumPy Часть 3

2.31 Pandas - Типы и структура данных

2.32 Pandas - Простейшие операции

2.33 Pandas - Трансформация данных

2.34 - 2.36 В разработке

2.37 Python SciPy

2.38 Python Matplotlib Часть 1 Регулирование парамеров

2.39 Python Matplotlib Часть 2 Композиция графиков

2.40 Python Matplotlib Часть 3 Графическое проектирование

2.41, 2.42 В разработке

2.43 Графика. Обзор Python и других инструментов. Часть 1

2.44 Графика. Обзор Python и других инструментов. Часть 2

Тема 3

3.1 Этапы анализа данных

3.2 Типы данных

3.3 Измерительные шкалы в аналитике

Тема 4

4.1 Исследовательский анализ данных

[4.2...
8 Nations' YouTube Data Videos Trends: 3200
kaggle.com
zip
Updated May 9, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
suat selvi (2025). 8 Nations' YouTube Data Videos Trends: 3200 [Dataset]. https://www.kaggle.com/datasets/suatselvi/8-nations-youtube-data-videos-trends-3200
Explore at:
zip(484381 bytes)Available download formats
Dataset updated
May 9, 2025
Authors
suat selvi
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Area covered
YouTube
Description
youtube data analysis course videos 8 countries

Overview

This dataset includes ~3200 YouTube videos focused on data analysis from 8 countries (~400 videos per country). Featuring data from Turkey, USA, Russia, Italy, France, Germany, Japan, and Spain, each video provides 8 key features. Ideal for global data science trend analysis!

Content

Countries: Turkey (TR), USA (US), Russia (RU), Italy (IT), France (FR), Germany (DE), Japan (JP), Spain (ES)

Features (8 columns, example):

title`: Video title

views_count: Total views

comment_count: Total comments

likes_count: Total likes -'dislike_count':Total dislikes

Additional Features (if added):

country_code: Country code (e.g., TR, US)

country_name: Full country name

like_view_ratio: Likes-to-views ratio

Size: ~3200 rows, 8+ columns

Format: CSV files

all_countries.csv: Combined dataset

Country-specific files (e.g., TR_videos.csv, US_videos.csv)

Potential Use Cases

Compare engagement (views, likes) across Turkey, USA, Russia, and other nations.

Analyze trending data science topics using tags from different countries.

Study how publish_date impacts video popularity in each region.

Visualize country-specific trends with Seaborn or Matplotlib.

Data Preparation

Cleaning: Missing values in likes and views filled with median/zero. NaN in tags set to "Unknown".

Standardization: publish_date formatted as YYYY-MM-DD.

Structure: Includes individual country files and a combined all_countries.csv.

Notes

Data collected from YouTube API (or specify your method).

Some metrics may be incomplete due to API limitations.

Feedback and suggestions are welcome!

Get Started

Load the dataset with Pandas: ```python import pandas as pd df = pd.read_csv('all_countries.csv')

Example: Top 5 videos by views

print(df.sort_values('views', ascending=False).head())
PANDA init class model1
kaggle.com
zip
Updated Jul 24, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Iafoss (2020). PANDA init class model1 [Dataset]. https://www.kaggle.com/datasets/iafoss/panda-init-class-model1
Explore at:
zip(8248473039 bytes)Available download formats
Dataset updated
Jul 24, 2020
Authors
Iafoss
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset

This dataset was created by Iafoss

Released under Attribution 4.0 International (CC BY 4.0)

Contents
panda 40c
kaggle.com
zip
Updated Jul 22, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dmitry A. Grechka (2020). panda 40c [Dataset]. https://www.kaggle.com/dgrechka/panda-40c
Explore at:
zip(83316746 bytes)Available download formats
Dataset updated
Jul 22, 2020
Authors
Dmitry A. Grechka
License
Attribution 3.0 (CC BY 3.0)https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Description
Dataset

This dataset was created by Dmitry A. Grechka

Released under Attribution 3.0 Unported (CC BY 3.0)

Contents
injection-molding-QA
kaggle.com
huggingface.co
zip
Updated Apr 2, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mustafa Keser (2024). injection-molding-QA [Dataset]. https://www.kaggle.com/datasets/mustafakeser4/injection-molding-qa/data
Explore at:
zip(2998736 bytes)Available download formats
Dataset updated
Apr 2, 2024
Authors
Mustafa Keser
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
injection-molding-QA

Description

This dataset contains questions and answers related to injection molding, focusing on topics such as 'Materials', 'Techniques', 'Machinery', 'Troubleshooting', 'Safety','Design','Maintenance','Manufacturing','Development','R&D'. The dataset is provided in CSV format with two columns: Questions and Answers.

Usage

Researchers, practitioners, and enthusiasts in the field of injection molding can utilize this dataset for tasks such as:

Natural Language Processing (NLP) tasks such as question answering, text generation, and summarization.

Training and evaluation of machine learning models for understanding and generating responses related to injection molding.

Example pandas

import pandas as pd # Load the dataset dataset = pd.read_csv('injection_molds_dataset.csv') # Display the first few rows print(dataset. Head())

Example datasets

from datasets import load_dataset # Load the dataset dataset = load_dataset("mustafakeser/injection-molding-QA") # Display dataset info print(dataset) # Accessing the first few examples print(dataset['train'][:5]) #or dataset['train'].to_pandas()

Columns

Questions: Contains questions related to injection molding.

Answers: Provides detailed answers to the corresponding questions.

Citation

If you use this dataset in your work, please consider citing it as:

@misc{injectionmold_dataset, author = {Your Name}, title = {Injection Molds Dataset}, year = {2024}, publisher = {Hugging Face}, journal = {Hugging Face Datasets}, howpublished = {\url{link to the dataset}}, }

Huggingface

https://huggingface.co/datasets/mustafakeser/injection-molding-QA mustafakeser/injection-molding-QA

Notes

This dataset curated with gemini-1.0-pro

license: apache-2.0
panda 37c
kaggle.com
zip
Updated Jul 21, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dmitry A. Grechka (2020). panda 37c [Dataset]. https://www.kaggle.com/datasets/dgrechka/panda-37c
Explore at:
zip(83370085 bytes)Available download formats
Dataset updated
Jul 21, 2020
Authors
Dmitry A. Grechka
License
Attribution 3.0 (CC BY 3.0)https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Description
Dataset

This dataset was created by Dmitry A. Grechka

Released under Attribution 3.0 Unported (CC BY 3.0)

Contents
Mlcourse.ai-2020
kaggle.com
zip
Updated Oct 14, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
anas qais (2020). Mlcourse.ai-2020 [Dataset]. https://www.kaggle.com/anasqais/mlcourseai2020
Explore at:
zip(15881 bytes)Available download formats
Dataset updated
Oct 14, 2020
Authors
anas qais
Description
Open Machine Learning Course mlcourse.ai is designed to perfectly balance theory and practice; therefore, each topic is followed by an assignment with a deadline in a week. You can also take part in several Kaggle Inclass competitions held during the course and write your own tutorials. The next session launches in September, 2019. For more info go to the mlcourse.ai main page. Outline This is the list of published articles on medium.com (English), habr.com (Russian), and jqr.com (Chinese). See Kernels of this Dataset for the same material in English. 1. Exploratory Data Analysis with Pandas uk ru, cn, Kaggle Kernel 2. Visual Data Analysis with Python uk ru, cn, Kaggle Kernels: part1, part2 3. Classification, Decision Trees and k Nearest Neighbors uk, ru, cn, Kaggle Kernel 4. Linear Classification and Regression uk, ru, cn, Kaggle Kernels: part1, part2, part3, part4, part5 5. Bagging and Random Forest uk, ru, cn, Kaggle Kernels: part1, part2, part3 6. Feature Engineering and Feature Selection uk, ru, cn, Kaggle Kernel 7. Unsupervised Learning: Principal Component Analysis and Clustering uk, ru, cn, Kaggle Kernel 8. Vowpal Wabbit: Learning with Gigabytes of Data uk, ru, cn, Kaggle Kernel 9. Time Series Analysis with Python, part 1 uk, ru, cn. Predicting future with Facebook Prophet, part 2 uk, cn Kaggle Kernels: part1, part2 10. Gradient Boosting uk, ru, cn, Kaggle Kernel Assignments Each topic is followed by an assignment. See demo versions in this Dataset. Solutions will be discussed in the upcoming run of the course. Kaggle competitions 1. Catch Me If You Can: Intruder Detection through Webpage Session Tracking. Kaggle Inclass 2. How good is your Medium article? Kaggle Inclass Rating Throughout the course we are maintaining a student rating. It takes into account credits scored in assignments and Kaggle competitions. Top students (according to the final rating) will be listed on a special Wiki page. Community Discussions between students are held in the #mlcourse_ai channel of the OpenDataScience Slack team. A registration form will be shared prior to the start of the new session Collaboration You can publish Kernels using this Dataset. But please respect others' interests: don't share solutions to assignments and well-performing solutions for Kaggle Inclass competitions. If you notice any typos/errors in course material, please open an Issue or make a pull request in the course repo. The course is free but you can support organizers by making a pledge on Patreon (monthly support) or a one-time payment on Ko-fi
parquetfile-python-25k
kaggle.com
zip
Updated Feb 25, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
JS (2024). parquetfile-python-25k [Dataset]. https://www.kaggle.com/datasets/jayshah1234/parquetfile-python-25k
Explore at:
zip(20082586 bytes)Available download formats
Dataset updated
Feb 25, 2024
Authors
JS
Description
Go to hf and search for flytech/python-codes-25k and download parquet file and upload the dataset on kaggle and call it by pandas
🏃🏻‍♂️ Long-distance running dataset
kaggle.com
zip
Updated Mar 7, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
mexwell (2024). 🏃🏻‍♂️ Long-distance running dataset [Dataset]. https://www.kaggle.com/datasets/mexwell/long-distance-running-dataset
Explore at:
zip(393989255 bytes)Available download formats
Dataset updated
Mar 7, 2024
Authors
mexwell
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
About

This dataset contains 10,703,690 records of running training during 2019 and 2020, from 36,412 athletes from around the world. The records were obtained through web scraping of a large social network for athletes on the internet.

The data with the athletes' activities are contained in dataframe objects (tabular data) and saved in the Parquet file format using the Pandas library, part of the Python ecosystem for data science. Each Pandas dataframe contains the following data (as different columns) for each athlete (as different rows), the first word identifies the name of the column in the dataframe: - datetime: date of the running activity; - athlete: a computer-generated ID for the athlete (integer); - distance: distance of running (floating-point number, in kilometers); - duration: duration of running (floating-point number, in minutes); - gender: gender (string 'M' of 'F'); - age_group: age interval (one of the strings '18 - 34', '35 - 54', or '55 +'); - country: country of origin of the athlete (string); - major: marathon(s) and year(s) the athlete ran (comma-separated list of strings).

For convenience, we created files with the athletes' activities data sampled at different frequencies: day 'd', week 'w', month 'm', and quarter 'q' (i.e., there are files with the distance and duration of running accumulated at each day, week, month, and quarter) for each year, 2019 and 2020. Accordingly, the files are named 'run_ww_yyyy_f.parquet', where 'yyyy' is '2019' or '2020' and 'f' is 'd', 'w', 'm' or 'q' (without quotes). The dataset also contains data with different government’s stringency indexes for the COVID-19 pandemic. These data are saved as text files and were obtained from https://ourworldindata.org/covid-stringency-index.

Acknowlegement

Foto von sporlab auf Unsplash
FBI Hate Crimes in USA (1991-2020)
kaggle.com
zip
Updated Dec 9, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jonathan (2021). FBI Hate Crimes in USA (1991-2020) [Dataset]. https://www.kaggle.com/jonathanrevere/fbi-hate-crimes-in-usa-19912020
Explore at:
zip(8929139 bytes)Available download formats
Dataset updated
Dec 9, 2021
Authors
Jonathan
License
https://www.usa.gov/government-works/https://www.usa.gov/government-works/
Area covered
United States
Description
Background

I recently finished the offered courses in Python and Pandas and wanted to practice sorting, creating dataframes, and grouping. I decided to use the hate crime data that is offered by the FBI. To practice, I preemptively separated the full csv file for each territory and state for ease of use by anyone that wants to access their states data right away. Also it provided good practice for coding.

Content

These datasets contain the date of the crime, what kind of crime it was, the offenders race, the victim's race, victim counts (if the victim was a minor or adult), what state and city the crime occurred in, and so on.

Also included is the methodology file so that you can see more context of the data itself and how it was collected.

Acknowledgements

I thought this would be a totally new dataset that has yet been uploaded to kaggle, but I did notice another dataset here, but that hasn't been updated in 2 years. But, I would like to thank that author since it did help me structure how to actually write this out😃 .

Further credit to the FBI for collecting this data which can be found here.

And of course thanks to kaggle for the free courses.

Inspiration

You can use this for several questions to track what years (or decade) had the highest concentration of hate crimes. Also, you can use the full csv file to organize by region for a similar question. But if you want to concentrate on your state, then that is also doable, just download the appropriate table. You can then find what areas in your state had the most hate crimes.

You can also figure out what's the most common hate crime victim over a specific timeframe.

Code I used

(Any feedback is appreciated!) ```

import relevant packages

import pandas as pd

load dataset

hate_crime = pd.read_csv(filepath)

list of states

states = ['AL','AK','AZ', 'AR','CA','CO','CT','DC','DE','FL','FS','GA','GM','HI','IA','ID', 'IL','IN','KS','KY','LA','MD','ME','MI','MN','MO','MS','MT','NB','NC','ND','NH', 'NJ','NM','NV','NY','OH','OK','OR','PA','RI','SC','SC','SD','TN','TX','UT','VA', 'VT','WA','WI','WV','WY']

create DataFrames of just other States

def create_DataFrame(State_Abbr): ''' Parameters ---------- State_Abbr : TYPE == STR USER ENTERS STATE ABBREVIATION

Returns ------- DATAFRAME OF HATE CRIMES IN THAT STATE ''' #overall this step is unnecessary because I'm not making an executable or anything if type(State_Abbr) != str or len(State_Abbr) != 2 or State_Abbr not in states: print('Please Enter the State Abbreviation for the desired state') else: #here's the useful bits ^_^ hate_df = pd.DataFrame(hate_crime.loc[hate_crime.STATE_ABBR == State_Abbr]) return hate_df

def create_csv(state_lst): ''' Parameters ---------- def create_csv[state_lst] : Input state list to create seperate csv files for each state.

Returns ------- A csv of hate crimes within individual states ''' for state in state_lst: df = create_DataFrame(state) df.to_csv('Hate Crimes in {} 1991-2020.csv'.format(state)) return
Bird Species Image Classification Dataset
kaggle.com
Updated Jun 11, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Evil Spirit05 (2025). Bird Species Image Classification Dataset [Dataset]. https://www.kaggle.com/datasets/evilspirit05/birds-species-prediction
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 11, 2025
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Evil Spirit05
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
This dataset contains high-quality images of six distinct bird species, curated for use in image classification, computer vision, and biodiversity research tasks. Each bird species included in this dataset is well-represented, making it ideal for training and evaluating deep learning models.

Label Species Name Image Count
1 American Goldfinch 143
2 Emperor Penguin 139
3 Downy Woodpecker 137
4 Flamingo 132
5 Carmine Bee-eater 131
6 Barn Owl 129

📂 Dataset Highlights: * Total Images: 811 * Classes: 6 unique bird species * Balanced Labels: Nearly equal distribution across classes * Use Cases: Image classification, model benchmarking, transfer learning, educational projects, biodiversity analysis

🧠 Potential Applications: * Training deep learning models like CNNs for bird species recognition * Fine-tuning pre-trained models using a small and balanced dataset * Educational projects in ornithology and computer vision * Biodiversity and wildlife conservation tech solutions

🛠️ Suggested Tools: * Python (Pandas, NumPy, Matplotlib) * TensorFlow / PyTorch for model development * OpenCV for image preprocessing * Streamlit for creating interactive demos
Musical Scale Classification Dataset using Chroma
kaggle.com
zip
Updated Apr 8, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Om Avashia (2025). Musical Scale Classification Dataset using Chroma [Dataset]. https://www.kaggle.com/datasets/omavashia/synthetic-scale-chromagraph-tensor-dataset
Explore at:
zip(392580911 bytes)Available download formats
Dataset updated
Apr 8, 2025
Authors
Om Avashia
License
https://cdla.io/sharing-1-0/https://cdla.io/sharing-1-0/
Description
Dataset Description

Musical Scale Dataset: 1900+ Chroma Tensors Labeled by Scale

This dataset contains 1900+ unique synthetic musical audio samples generated from melodies in each of the 24 Western scales (12 major and 12 minor). Each sample has been converted into a chroma tensor, a 12-dimensional pitch class representation commonly used in music information retrieval (MIR) and deep learning tasks.

What’s Inside

chroma_tensor: A JSON-safe formatted of a PyTorch tensor with shape [1, 12, T], where:

12 = the 12 pitch classes (C, C#, D, ... B)

T = time steps

scale_index: An integer label from 0–23 identifying the scale the sample belongs to

Use Cases

This dataset is ideal for: - Training deep learning models (CNNs, MLPs) to classify musical scales - Exploring pitch-class distributions in Western tonal music - Prototyping models for music key detection, chord prediction, or tonal analysis - Teaching or demonstrating chromagram-based ML workflows

Labels

Index Scale
0 C major
1 C# major
... ...
11 B major
12 C minor
... ...
23 B minor

Quick Load Example (PyTorch)

Chroma tensors are of shape [1, 12, T], where: - 1 is the channel dimension (for CNN input) - 12 represents the 12 pitch classes (C through B) - T is the number of time frames

import torch import pandas as pd from tqdm import tqdm df = pd.read_csv("/content/scale_dataset.csv") # Reconstruct chroma tensors X = [torch.tensor(eval(row)).reshape(1, 12, -1) for row in tqdm(df['chroma_tensor'])] y = df['scale_index'].tolist()

Alternatively, you could directly load the chroma tensors and target scale indices using the .pt file.

import torch import pandas as pd data = torch.load("chroma_tensors.pt") X_pt = data['X'] # list of [1, 12, 302] tensors y_pt = data['y'] # list of scale indices

How It Was Built

Notes generated from random melodies using music21

MIDI converted to WAV via FluidSynth

Chromagrams extracted with librosa.feature.chroma_stft

Tensors flattened and saved alongside scale index labels

File Format

Column Type Description
chroma_tensor str Flattened 1D chroma tensor [1×12×T]
scale_index int Label from 0 to 23

Notes

Data is synthetic but musically valid and well-balanced

Each of the 24 scales appears 300 times

All tensors have fixed length (T) for easy batching
Bank Fraud Dataset
kaggle.com
zip
Updated Aug 28, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Borna B (2023). Bank Fraud Dataset [Dataset]. https://www.kaggle.com/datasets/mohammadbolandraftar/my-dataset
Explore at:
zip(11896802 bytes)Available download formats
Dataset updated
Aug 28, 2023
Authors
Borna B
Description
The dataset has one training dataset, one testing (unseen) dataset, which is unlabeled, and a clickstream dataset, all interconnected through a common identifier known as "SESSION_ID." This identifier allows us to link user actions across the datasets. A session involves client online banking activities like signing in, updating passwords, viewing products, or adding items to the cart.

Majority of fraud cases add new shipping address, or change password. you can do visualization to get more insights about the nature of frauds.

I also added 2 datasets named "train/test_dataset_combined" which are the merged version of the train and test datasets based on the "SESSION_ID" column. For more information, please refer to this link: https://www.kaggle.com/code/mohammadbolandraftar/combine-datasets-in-pandas

In addition, I added the cleaned dataset after doing EDA. For more information about the EDA process, please refer to this link: https://www.kaggle.com/code/mohammadbolandraftar/a-deep-dive-into-fraud-detection-through-eda
Butterfly class names
kaggle.com
zip
Updated Jun 24, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Panda (2024). Butterfly class names [Dataset]. https://www.kaggle.com/datasets/alekstop/butterfly-class-names
Explore at:
zip(940 bytes)Available download formats
Dataset updated
Jun 24, 2024
Authors
Panda
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset

This dataset was created by Panda

Released under Apache 2.0

Contents

Label	Species Name	Image Count
1	American Goldfinch	143
2	Emperor Penguin	139
3	Downy Woodpecker	137
4	Flamingo	132
5	Carmine Bee-eater	131
6	Barn Owl	129

Index	Scale
0	C major
1	C# major
...	...
11	B major
12	C minor
...	...
23	B minor

Column	Type	Description
`chroma_tensor`	`str`	Flattened 1D chroma tensor `[1×12×T]`
`scale_index`	`int`	Label from 0 to 23

MMM Weekly Data - Geo:India

kaggle.com

zip

Updated Jul 18, 2025

Facebook

Twitter

Click to copy link

Link copied

Cite

SubhagatoAdak (2025). MMM Weekly Data - Geo:India [Dataset]. https://www.kaggle.com/datasets/subhagatoadak/mmm-weekly-data-geoindia

Explore at:

zip(2463044 bytes)Available download formats

Dataset updated

Jul 18, 2025

Authors

SubhagatoAdak

License

Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically

Area covered

India

Description

Synthetic India FMCG MMM Dataset (Weekly, 3 Years, Multi-Geo / Multi-Channel)

Subtitle: 3-Year Weekly Multi-Channel FMCG Marketing Mix Panel for India Grain: Week-ending Saturday × Geography × Brand × SKU Span: 156 weeks (2 Jul 2022 – 27 Jun 2025) Scope: 8 Indian geographies • 3 brands × 3 SKUs each (9 SKUs) • Full marketing, trade, price, distribution & macro controls • AI creative quality scores for digital banners.

This dataset is synthetic but behaviorally realistic, generated to help analysts experiment with Marketing Mix Modeling (MMM), media effectiveness, price/promo analytics, distribution effects, and hierarchical causal inference without using proprietary commercial data.

Why This Dataset?

Real MMM training data is rarely public due to confidentiality. This synthetic panel:

Mirrors common FMCG (CPG) category dynamics in India (festive spikes, monsoon effects, geo scale differences).
Includes paid media channels (TV, YouTube, Facebook, Instagram, Print, Radio).
Captures promotions & trade levers (feature, display, temporary price reduction, trade spend).
Provides distribution & availability metrics (Weighted Distribution, Numeric Distribution, TDP, NOS).
Includes pricing (MRP, Net Price under TPR).
Adds macro signals (CPI, GDP, Festival Index, Rainfall Index) aligned to India’s seasonality.
Introduces AI Content Scores (Facebook & Instagram banner creative quality) — letting you explore creative × media interaction models.
Delivered at a granular panel (Geo × Brand × SKU) suitable for pooled, hierarchical, or Bayesian MMM workflows.

Files

File	Description
`synthetic_mmm_weekly_india_SAT.csv`	Main dataset. 11,232 rows × 28 columns. Weekly (week-ending Saturday).

(If you also upload the Monday version, note it clearly and point users to which to use.)

Quick Start

import pandas as pd

df = pd.read_csv("/kaggle/input/synthetic-india-fmcg-mmm/synthetic_mmm_weekly_india_SAT.csv",
         parse_dates=["Week"])

df.info()
df.head()

Aggregate to Geo-Brand Weekly

geo_brand = (
  df.groupby(["Week","Geo","Brand"], as_index=False)
   .sum(numeric_only=True)
)

Create Modeling-Friendly Features

Example: log-transform sales value, normalize media, build price index.

import numpy as np

m = geo_brand.copy()
m["log_sales_val"] = np.log1p(m["Sales_Value"])
m["price_index"] = m["Net_Price"] / m.groupby(["Geo","Brand"])["Net_Price"].transform("mean")

Calendar Notes

Week variable = week-ending Saturday (Pandas freq W-SAT).
First week: 2022-07-02; last week: 2025-06-27 (depending on 156-week span anchor).

To derive a week-start (Sunday) date:

df["Week_Start"] = df["Week"] - pd.Timedelta(days=6)

Data Dictionary

Key Dimensions

Column	Type	Description
Week	date	Week-ending Saturday timestamp.
Geo	categorical	8 rollups: NORTH, SOUTH, EAST, WEST, CENTRAL, NORTHEAST, METRO_DELHI, METRO_MUMBAI.
Brand	categorical	BrandA / BrandB / BrandC.
SKU	categorical	Brand-level SKU IDs (3 per brand).

Commercial Outcomes

Column	Type	Notes
Sales_Units	float	Modeled weekly unit sales after macro, distribution, price, promo & media effects. Lognormal noise added.
Sales_Value	float	Sales_Units × Net_Price. Use for revenue MMM or ROI analyses.

Pricing

Column	Type	Notes
MRP	float	Baseline list price (per-unit). Drifts with CPI & brand positioning.
Net_Price	float	Effective real...

Facebook

Twitter

Click to copy link

Link copied

Cite

乡TOBY乡 (2023). Data analysis with pandas and python [Dataset]. https://www.kaggle.com/datasets/toby000/data-analysis-with-pandas-and-python

Data analysis with pandas and python

This is a dataset which is provided in Udemy course.

Explore at:

3 scholarly articles cite this dataset (View in Google Scholar)

zip(701073 bytes)Available download formats

Dataset updated

Apr 16, 2023

Authors

乡TOBY乡

Description

This dataset includes data that is provided in the Udemy course "Data Analysis with Pandas and Python" by Boris Paskhaver.

Clear search

Close search

Google apps

Main menu

Data analysis with pandas and python

Sales_2019_Analysis

Dataset

Contents

Practice Dataset

I believe this task could be useful for:

📌 The full task is available in the pinned notebook here:

💡 Notes:

📖 To explore more of what we do, check out the CATReloaded Roadmap on GitHub:

Panda Code 2

Dataset

Contents

Toys Images

Python Recipes

\[\color{#ff35fe}{\mathbb{Context}}\]

\[\color{#ff35fe}{\mathbb{Content}}\]

\[\color{#ff35fe}{\mathbb{Acknowledgments}}\]

\[\color{#ff35fe}{\mathbb{Inspiration}}\]

\[\color{#ff35fe}{\mathbb{Russian \; Notes \; Python \; DataAnalysis}}\]

Тема 3

Тема 4

8 Nations' YouTube Data Videos Trends: 3200

youtube data analysis course videos 8 countries

Overview

Content

Potential Use Cases

Data Preparation

Notes

Get Started

Example: Top 5 videos by views

PANDA init class model1

Dataset

Contents

panda 40c

Dataset

Contents

injection-molding-QA

injection-molding-QA

Description

Usage

Example pandas

Example datasets

Columns

Citation

Huggingface

Notes

This dataset curated with gemini-1.0-pro

license: apache-2.0

panda 37c

Dataset

Contents

Mlcourse.ai-2020

parquetfile-python-25k

🏃🏻‍♂️ Long-distance running dataset

About

Acknowlegement

FBI Hate Crimes in USA (1991-2020)

Background

Content

Acknowledgements

Inspiration

Code I used

import relevant packages

load dataset

list of states

create DataFrames of just other States

Bird Species Image Classification Dataset

This dataset contains high-quality images of six distinct bird species, curated for use in image classification, computer vision, and biodiversity research tasks. Each bird species included in this dataset is well-represented, making it ideal for training and evaluating deep learning models.

Musical Scale Classification Dataset using Chroma

Dataset Description

What’s Inside

Use Cases

Labels

Quick Load Example (PyTorch)

How It Was Built

File Format

Notes

Bank Fraud Dataset

Butterfly class names

Dataset