17 datasets found

Cats and Dogs
kaggle.com
zip
Updated Nov 6, 2025
Cite
Suresh Maheshwari (2025). Cats and Dogs [Dataset]. https://www.kaggle.com/datasets/sureshmaheshwari021/cats-and-dogs
Explore at:
Available download formats: zip (140359211 bytes)
Dataset updated
Nov 6, 2025
Authors
Suresh Maheshwari
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
🐾 Cats and Dogs Image Dataset

Scraped image dataset for classification

📘 Overview

This dataset contains images of cats and dogs collected using a custom web scraper from Google Images. It is designed for educational and research purposes, ideal for experimenting with image classification models, transfer learning, or deep learning architectures like CNNs.

📂 Dataset Details

Categories: 🐱 Cats, 🐶 Dogs

Total Images: ~600 (≈300 per class)

Image Type: Real photos only (filtered for quality)

File Format: JPG / PNG

Average Resolution: Around 300×300 px

🎯 Applications

You can use this dataset for:

Training and testing CNN models

Practicing transfer learning with models like ResNet, VGG16, or MobileNet

Exploring data augmentation and preprocessing techniques

Performing EDA (Exploratory Data Analysis) on image datasets

⚠️ Disclaimer

All images were obtained via public web search results and are shared strictly for educational and non-commercial use. Please verify image rights before any commercial application.

📜 License

CC0: Public Domain ✅ Free to use, modify, and share — no attribution required.
Evaluate AI Models for Breast Cancer Screening & Image Class - EDA
ai.tracebloc.io
json
Updated Dec 3, 2025
Cite
tracebloc (2025). Evaluate AI Models for Breast Cancer Screening & Image Class - EDA [Dataset]. https://ai.tracebloc.io/explore/ai-breast-cancer-screening-and-image-classification?tab=exploratory-data-analysis
Explore at:
Available download formats: json
Dataset updated
Dec 3, 2025
Dataset provided by
Tracebloc GmbH
Authors
tracebloc
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Missing Values
Measurement technique
Statistical and exploratory data analysis
Description
Benchmark and compare 3rd-party AI models for breast cancer screening image classification. Focus on sensitivity, false-positive control and enterprise-grade de
Bird Species Image Classification Dataset
kaggle.com
Updated Jun 11, 2025
Cite
Evil Spirit05 (2025). Bird Species Image Classification Dataset [Dataset]. https://www.kaggle.com/datasets/evilspirit05/birds-species-prediction
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Jun 11, 2025
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Evil Spirit05
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
This dataset contains high-quality images of six distinct bird species, curated for use in image classification, computer vision, and biodiversity research tasks. Each bird species included in this dataset is well-represented, making it ideal for training and evaluating deep learning models.

Label Species Name Image Count
1 American Goldfinch 143
2 Emperor Penguin 139
3 Downy Woodpecker 137
4 Flamingo 132
5 Carmine Bee-eater 131
6 Barn Owl 129

📂 Dataset Highlights:
* Total Images: 811
* Classes: 6 unique bird species
* Balanced Labels: Nearly equal distribution across classes
* Use Cases: Image classification, model benchmarking, transfer learning, educational projects, biodiversity analysis

🧠 Potential Applications:
* Training deep learning models like CNNs for bird species recognition
* Fine-tuning pre-trained models using a small and balanced dataset
* Educational projects in ornithology and computer vision
* Biodiversity and wildlife conservation tech solutions

🛠️ Suggested Tools:
* Python (Pandas, NumPy, Matplotlib)
* TensorFlow / PyTorch for model development
* OpenCV for image preprocessing
* Streamlit for creating interactive demos
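The "nearly equal distribution" claim can be checked directly from the per-species counts listed in the table. The following is a minimal sketch that only re-uses those six numbers; the `imbalance_ratio` name is illustrative and not part of the dataset.

```python
# Image counts per species, copied from the table in the description.
counts = {
    "American Goldfinch": 143,
    "Emperor Penguin": 139,
    "Downy Woodpecker": 137,
    "Flamingo": 132,
    "Carmine Bee-eater": 131,
    "Barn Owl": 129,
}

total = sum(counts.values())
# Ratio of the largest to the smallest class; 1.0 would be perfectly balanced.
imbalance_ratio = max(counts.values()) / min(counts.values())

for name, n in counts.items():
    print(f"{name:20s} {n:4d} {n / total:6.1%}")
print(f"total={total}, max/min ratio={imbalance_ratio:.2f}")
```

The counts sum to the stated 811 images, and the largest class is only about 11% bigger than the smallest.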
Multi-Class Images for Weather Classification
kaggle.com
zip
Updated Jan 14, 2021
Cite
Somesh Sharma (2021). Multi-Class Images for Weather Classification [Dataset]. https://www.kaggle.com/datasets/somesh24/multiclass-images-for-weather-classification/data
Explore at:
Available download formats: zip (95740914 bytes)
Dataset updated
Jan 14, 2021
Authors
Somesh Sharma
Description
Multi-class weather dataset (MWD) for image classification is a valuable dataset used in the research paper entitled “Multi-class weather recognition from still image using heterogeneous ensemble method”. The dataset provides a platform for outdoor weather analysis by extracting various features for recognizing different weather conditions.

Research Paper: https://web.cse.ohio-state.edu/~zhang.7804/Cheng_NC2016.pdf
Data from: The Often-Overlooked Power of Summary Statistics in Exploratory...
acs.figshare.com
xlsx
Updated Jun 8, 2023
Cite
Tahereh G. Avval; Behnam Moeini; Victoria Carver; Neal Fairley; Emily F. Smith; Jonas Baltrusaitis; Vincent Fernandez; Bonnie. J. Tyler; Neal Gallagher; Matthew R. Linford (2023). The Often-Overlooked Power of Summary Statistics in Exploratory Data Analysis: Comparison of Pattern Recognition Entropy (PRE) to Other Summary Statistics and Introduction of Divided Spectrum-PRE (DS-PRE) [Dataset]. http://doi.org/10.1021/acs.jcim.1c00244.s002
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.1021/acs.jcim.1c00244.s002
Dataset updated
Jun 8, 2023
Dataset provided by
ACS Publications
Authors
Tahereh G. Avval; Behnam Moeini; Victoria Carver; Neal Fairley; Emily F. Smith; Jonas Baltrusaitis; Vincent Fernandez; Bonnie. J. Tyler; Neal Gallagher; Matthew R. Linford
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Unsupervised exploratory data analysis (EDA) is often the first step in understanding complex data sets. While summary statistics are among the most efficient and convenient tools for exploring and describing sets of data, they are often overlooked in EDA. In this paper, we show multiple case studies that compare the performance, including clustering, of a series of summary statistics in EDA. The summary statistics considered here are pattern recognition entropy (PRE), the mean, standard deviation (STD), 1-norm, range, sum of squares (SSQ), and X4, which are compared with principal component analysis (PCA), multivariate curve resolution (MCR), and/or cluster analysis. PRE and the other summary statistics are direct methods for analyzing data; they are not factor-based approaches. To quantify the performance of summary statistics, we use the concept of the “critical pair,” which is employed in chromatography. The data analyzed here come from different analytical methods. Hyperspectral images, including one of a biological material, are also analyzed. In general, PRE outperforms the other summary statistics, especially in image analysis, although a suite of summary statistics is useful in exploring complex data sets. While PRE results were generally comparable to those from PCA and MCR, PRE is easier to apply. For example, there is no need to determine the number of factors that describe a data set. Finally, we introduce the concept of divided spectrum-PRE (DS-PRE) as a new EDA method. DS-PRE increases the discrimination power of PRE. We also show that DS-PRE can be used to provide the inputs for the k-nearest neighbor (kNN) algorithm. We recommend PRE and DS-PRE as rapid new tools for unsupervised EDA.
Cdd Dataset
universe.roboflow.com
zip
Updated Sep 5, 2023
Cite
hakuna matata (2023). Cdd Dataset [Dataset]. https://universe.roboflow.com/hakuna-matata/cdd-g8a6g/model/3
Explore at:
Available download formats: zip
Dataset updated
Sep 5, 2023
Dataset authored and provided by
hakuna matata
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Cucumber Disease Detection Bounding Boxes
Description
Project Documentation: Cucumber Disease Detection

Title and Introduction

Title: Cucumber Disease Detection

Introduction: The "Cucumber Disease Detection" project aims to develop a machine learning model for the automatic detection of diseases in cucumber plants. This work matters because early disease identification in agriculture can increase crop yield and cut financial losses. A dataset of cucumber plant images is used to train and test the model.

Problem Statement

Problem Definition: The project uses image analysis methods to automate the identification of diseases, including Downy Mildew, in cucumber plants. Effective disease management in agriculture depends on early identification.

Importance: Early disease diagnosis helps minimize crop losses, stop the spread of diseases, and allocate farming resources better. Agriculture is a real-world application of this concept.

Goals and Objectives: Develop a machine learning model to classify cucumber plant images into healthy and diseased categories. Achieve a high level of accuracy in disease detection. Provide a tool for farmers to detect diseases early and take appropriate action.

Data Collection and Preprocessing

Data Sources: The dataset comprises pictures of cucumber plants from various sources, including both healthy and damaged specimens.

Data Collection: Images were gathered from agricultural areas using cameras and smartphones.

Data Preprocessing: Data cleaning to remove irrelevant or corrupted images. Handling missing values, if any, in the dataset. Removing outliers that may negatively impact model training. Data augmentation techniques applied to increase dataset diversity.

Exploratory Data Analysis (EDA)

The dataset was examined using visuals such as scatter plots and histograms, looking for patterns, trends, and correlations. EDA made it easier to understand the distribution of photos of healthy and diseased plants.

Methodology

Machine Learning Algorithms:

Convolutional Neural Networks (CNNs) were chosen for image classification due to their effectiveness in handling image data. Transfer learning using pre-trained models such as ResNet or MobileNet may be considered.

Train-Test Split:

The dataset was split into training and testing sets with a suitable ratio. Cross-validation may be used to assess model performance robustly.
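A train-test split that preserves the healthy/diseased proportions can be sketched as follows. The project page does not state the ratio or the splitting code; the 80/20 ratio, the `stratified_split` helper, and the toy labels below are assumptions for illustration only.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices), splitting each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Take the same fraction from every class so proportions are preserved.
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Hypothetical labels: 10 healthy and 5 diseased images.
labels = ["healthy"] * 10 + ["diseased"] * 5
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
print(len(train_idx), len(test_idx))
```

Splitting per class keeps the minority class represented in the test set, which matters when one category has far fewer images than the other.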

Model Development

The CNN model's architecture consists of layers, units, and activation functions. Hyperparameters including learning rate, batch size, and optimizer were chosen on the basis of experimentation. To avoid overfitting, regularization methods such as dropout and L2 regularization were used.

Model Training

During training, the model was fed the prepared dataset over a number of epochs. The loss function was minimized using an optimization method. To ensure convergence, early stopping and model checkpoints were used.

Model Evaluation

Evaluation Metrics:

Accuracy, precision, recall, F1-score, and confusion matrix were used to assess model performance. Results were computed for both training and test datasets.

Performance Discussion:

The model's performance was analyzed in the context of disease detection in cucumber plants. Strengths and weaknesses of the model were identified.
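The evaluation metrics named above all derive from the four cells of a binary confusion matrix. The sketch below shows that relationship on made-up predictions; the `binary_metrics` helper and the sample labels are hypothetical and do not reproduce the project's results.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute confusion-matrix counts and the metrics derived from them."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = diseased, 0 = healthy.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In a disease-detection setting recall on the diseased class is usually the number to watch, since a missed infection (false negative) tends to cost more than a false alarm.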

Results and Discussion

Key project findings include model performance and disease detection precision, a comparison of the models employed showing the benefits and drawbacks of each, and the challenges faced throughout the project along with the methods used to solve them.

Conclusion

The conclusion recaps the project's key learnings, highlights the project's importance to early disease detection in agriculture, and suggests future enhancements and potential research directions.

References

Library: Pillow, Roboflow, YOLO, Sklearn, matplotlib
Datasets: https://data.mendeley.com/datasets/y6d3z6f8z9/1

Code Repository

https://universe.roboflow.com/hakuna-matata/cdd-g8a6g

Rafiur Rahman Rafit EWU 2018-3-60-111
Minecraft Block Texture Image Dataset
kaggle.com
zip
Updated Sep 8, 2025
Cite
Urvish Ahir (2025). Minecraft Block Texture Image Dataset [Dataset]. https://www.kaggle.com/datasets/urvishahir/minecraft-block-textures-dataset
Explore at:
Available download formats: zip (548608 bytes)
Dataset updated
Sep 8, 2025
Authors
Urvish Ahir
Description
Description :

A full set of 1,083 block texture images from the legendary game Minecraft, version 1.21.8.

Each image is a .png file representing the visual texture of a block used in the game.

A metadata CSV file (metadata.csv) is also included, containing details for all images such as:
- file_name: the image filename
- block_name: cleaned human-readable block name
- variant: side, top, bottom, or base variant
- avg_color: average RGB color of the block texture

This CSV makes it easier to use the dataset for computer vision, ML projects, clustering, or color analysis.
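As a rough illustration of working with the metadata file, the sketch below reads rows with the four documented column names and tallies textures per variant. The sample rows are invented, and the "(R, G, B)" text encoding of avg_color is an assumption; the real file may store the color differently.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; the real metadata.csv ships with the dataset.
# The "(R, G, B)" text encoding of avg_color is an assumption.
SAMPLE = '''file_name,block_name,variant,avg_color
stone.png,Stone,base,"(125, 125, 125)"
oak_log_top.png,Oak Log,top,"(150, 120, 70)"
oak_log.png,Oak Log,side,"(100, 80, 50)"
'''


def parse_rgb(text):
    """Turn a '(r, g, b)' string into a tuple of three ints."""
    r, g, b = (int(part) for part in text.strip("() ").split(","))
    return (r, g, b)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
variant_counts = Counter(row["variant"] for row in rows)
colors = {row["file_name"]: parse_rgb(row["avg_color"]) for row in rows}

print(variant_counts)
print(colors["stone.png"])
```

To run this against the actual file, replace `io.StringIO(SAMPLE)` with an opened `metadata.csv` and adjust `parse_rgb` to whatever color encoding the file turns out to use.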

Use Cases :

Computer Vision Projects : Train image classification models to recognize Minecraft blocks or similar pixelated game textures.

Generative Models (GANs, Diffusion) : Use as training data for texture synthesis, block-style image generation, or AI art based on Minecraft aesthetics.

Augmented Reality / Game Modding Tools : Utilize textures in prototyping AR Minecraft-style games or Minecraft modding tools.

Clustering & Similarity Analysis : Apply unsupervised learning (e.g., K-means, t-SNE) to group similar block textures based on visual features.

Data Preprocessing & Feature Extraction Practice : Practice converting image datasets into usable features for downstream ML tasks (e.g., flattening, embeddings).

Exploratory Data Analysis (EDA) : Analyze visual color distribution, texture density, or image metadata to understand visual design patterns in Minecraft.

🏍️Pak Bike Image Dataset
kaggle.com
zip
Updated Sep 26, 2023
Cite
Mohid Abdul Rehman (2023). 🏍️Pak Bike Image Dataset [Dataset]. https://www.kaggle.com/datasets/mohidabdulrehman/pak-bike-image-dataset/code
Explore at:
Available download formats: zip (13585563 bytes)
Dataset updated
Sep 26, 2023
Authors
Mohid Abdul Rehman
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Area covered
Pakistan
Description
Bikes are annotated in TensorFlow Object Detection format:

The dataset consists of 184 images accompanied by two CSV files of annotations in the TensorFlow Object Detection format, for use in object recognition tasks. It predominantly features images of bikes from Pakistan, making it a resource for research and applications related to Pakistani bike recognition and classification.
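Annotation CSVs in this style commonly hold one bounding box per row. The sketch below assumes the widely used column layout filename, width, height, class, xmin, ymin, xmax, ymax; the dataset page does not confirm the exact header, and the sample rows are invented.

```python
import csv
import io
from collections import defaultdict

# Assumed column layout and invented rows; check the dataset's own CSV header.
SAMPLE = """filename,width,height,class,xmin,ymin,xmax,ymax
bike_001.jpg,640,480,bike,34,120,410,400
bike_001.jpg,640,480,bike,420,150,630,390
bike_002.jpg,640,480,bike,10,20,300,460
"""


def load_boxes(csv_file):
    """Group (class, (xmin, ymin, xmax, ymax)) boxes by image filename."""
    boxes = defaultdict(list)
    for row in csv.DictReader(csv_file):
        box = tuple(int(row[key]) for key in ("xmin", "ymin", "xmax", "ymax"))
        boxes[row["filename"]].append((row["class"], box))
    return dict(boxes)


boxes = load_boxes(io.StringIO(SAMPLE))
for filename, items in boxes.items():
    print(filename, len(items), "box(es)")
```

Grouping by filename first makes it easy to see how many bikes appear per image before feeding the boxes to a detector.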
Pokémon Cards
kaggle.com
zip
Updated Jul 23, 2024
Cite
Priyam Choksi (2024). Pokémon Cards [Dataset]. https://www.kaggle.com/datasets/priyamchoksi/pokemon-cards
Explore at:
Available download formats: zip (1701514 bytes)
Dataset updated
Jul 23, 2024
Authors
Priyam Choksi
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
The Pokémon Card Dataset offers detailed information about various Pokémon cards, including images, attributes, and descriptions. This dataset is perfect for analyzing Pokémon card features, developing machine learning models, and enhancing gaming experiences.

Dataset Details:

ID: Unique identifier for each card.

Image URL: Link to a high-resolution image of the card.

Caption: Detailed description of the card, including type, rarity, and special features.

Name: The name of the Pokémon on the card.

HP (Hit Points): The health points of the Pokémon.

Set Name: The name of the card set to which the card belongs.

How to Use This Dataset:

Card Analysis: Compare and analyze Pokémon cards based on attributes, rarity, and card set.

Image Recognition: Use images and descriptions to train models for card recognition and classification.

Sentiment Analysis: Analyze the text in card captions to extract sentiments and opinions about the Pokémon.

Card Recommendation: Build recommendation systems that suggest Pokémon cards based on users’ interests and past collections.
Banana Tree Disease Detection New&Update Dataset
kaggle.com
zip
Updated Feb 6, 2025
+ more versions
Cite
Shuvo Kumar Basak-4004 (2025). Banana Tree Disease Detection New&Update Dataset [Dataset]. https://www.kaggle.com/datasets/shuvokumarbasak4004/banana-tree-disease-detection-new-and-update-dataset
Explore at:
Available download formats: zip (582633057 bytes)
Dataset updated
Feb 6, 2025
Authors
Shuvo Kumar Basak-4004
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
The dataset is a collection of images representing various conditions of bananas, specifically aimed at training machine learning models for image classification or augmentation tasks. The dataset is organized into multiple subfolders, each representing a different condition or class of bananas. These classes include:

Healthy Bananas
Bananas with Fusarium Wilt
Bananas with Natural Leaf Death
Bananas with Rhizome Root Issues

Each image in the dataset is initially stored in its respective class folder and typically contains a banana or bananas under different conditions, viewed from different angles, and possibly with varying levels of resolution or lighting.

The dataset is then processed for various machine learning tasks like classification, detection, or augmentation. Specifically, this dataset is aimed at providing a variety of augmented images to ensure a more robust training set, which is critical for improving the generalization performance of machine learning models.

Related : Shuvo, Shuvo Kumar Basak (2025), “Banana_Tree_Disease_Detection_Dataset(BTDDD)”, Mendeley Data, V2, doi: 10.17632/vp2xnb8zmb.2

I, Shuvo Kumar Basak, have created and curated the Dataset. This dataset is freely available for research, educational, and non-commercial purposes.

Free Access to the Dataset: This is available free of charge to all individuals and organizations for educational and research use. This is to support the advancement of knowledge and studies related to biodiversity, machine learning, and related fields.

Future Collaboration and Data Requests: While the dataset is provided free of charge, I encourage individuals and organizations to contact me directly if they need access to additional related data, further assistance, or if they plan on expanding their research in the future.

If you require any new data or specific related datasets, feel free to reach out to me, Shuvo Kumar Basak, for collaboration. I am happy to assist with additional data collection, cleaning, resizing, or other related services at a reasonable cost.

Paid Services - Hire for Data Collection: If you or your organization need custom data collection or wish to obtain related datasets beyond what is included in this collection, I offer a paid service to gather new data according to your specific requirements. This includes: Custom data collection for other tree species or related botanical data.

Data cleaning, resizing, and preprocessing to make the data ready for analysis.

Please contact me for a custom quote based on your specific needs. I will work with you to provide high-quality, tailored datasets to support your research, project, or business needs.

Terms and Conditions: The dataset is intended for academic, research, and non-commercial purposes only. Redistribution or commercial use of the dataset without prior written consent is not permitted. Proper attribution to Shuvo Kumar Basak as the creator of the dataset should be provided when using the dataset in publications, projects, or other works.

More Datasets:
1. https://www.kaggle.com/shuvokumarbasak4004/datasets
2. https://www.kaggle.com/shuvokumarbasak2030

Note for Researchers Using the Dataset

This dataset was created by Shuvo Kumar Basak. If you use this dataset for research or academic purposes, please cite it appropriately. If you have published research using this dataset, please share a link to your paper. Good luck.

Binary Classification Data for Apnea Detection

kaggle.com

zip

Updated Sep 15, 2024

Cite

Akmol Masud Ayon (2024). Binary Classification Data for Apnea Detection [Dataset]. https://www.kaggle.com/datasets/masud1901/binary-classification-data-for-apnea-detection/code

Explore at:

Available download formats: zip (95161400 bytes)

Dataset updated

Sep 15, 2024

Authors

Akmol Masud Ayon

License

MIT License: https://opensource.org/licenses/MIT
License information was derived automatically

Description

Binary Classification Data for Sleep Apnea Detection

📋 Dataset Overview

This dataset contains high-quality, preprocessed CWT (Continuous Wavelet Transform) scalogram images for binary classification of sleep apnea events from single-lead ECG signals. The dataset is derived from the PhysioNet Apnea-ECG Database and has been carefully filtered using multiple quality metrics to ensure optimal performance in deep learning models.

🎯 Purpose

This dataset was created to train and evaluate the DREAM (Deep Residual-Enabled Apnea Monitor) model, achieving 99.93% accuracy in sleep apnea detection. It is ideal for:
- Sleep apnea detection research
- ECG-based biosignal classification
- Medical image classification tasks
- Explainable AI in healthcare
- Benchmark comparisons with state-of-the-art models

📊 Dataset Statistics

Metric	Value
Total Images	Balanced dataset
Image Format	PNG
Image Dimensions	128 × 180 pixels
Color Channels	Grayscale (1 channel)
Classes	2 (Apnea, Non-Apnea)
Source	PhysioNet Apnea-ECG Database
Preprocessing	CWT + Multi-metric filtering
Quality Assurance	SNR, Entropy, Contrast, Skewness, Kurtosis filtering

🔬 Data Generation Pipeline

1. Signal Acquisition

Source: PhysioNet Apnea-ECG Database
70 ECG recordings (35 training, 35 test sets)
Single-lead ECG sampled at 100 Hz
Minute-by-minute annotations (apnea/normal)

2. Continuous Wavelet Transform (CWT)

Wavelet: Morlet wavelet
Scales: 1-128
Time Window: 60 seconds per segment
Output: 128×180 scalogram images
Purpose: Transform time-series ECG into time-frequency representations

3. Exploratory Data Analysis (EDA)

Comprehensive analysis using multiple quality metrics:
- Signal-to-Noise Ratio (SNR): Signal quality assessment
- Entropy: Information content measurement
- Contrast: Visual distinction evaluation
- Skewness: Distribution asymmetry
- Kurtosis: Distribution peakedness
- Intensity Ranges: Pixel value distribution
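Several of these metrics are plain statistics of the pixel-intensity distribution. The sketch below computes Shannon entropy, skewness, and excess kurtosis for a flat list of grayscale values using their textbook definitions; the dataset page does not give its exact formulas, so the `image_stats` helper is an assumption, and SNR and contrast are left out because their definitions are not stated.

```python
import math
from collections import Counter


def image_stats(pixels):
    """Textbook intensity statistics for a flat list of grayscale values."""
    n = len(pixels)
    mean = sum(pixels) / n
    std = math.sqrt(sum((p - mean) ** 2 for p in pixels) / n)

    # Shannon entropy (in bits) of the intensity histogram.
    probs = [count / n for count in Counter(pixels).values()]
    entropy = -sum(p * math.log2(p) for p in probs)

    if std == 0:
        skewness = kurtosis = 0.0  # constant image: moments are undefined
    else:
        skewness = sum(((p - mean) / std) ** 3 for p in pixels) / n
        # Excess kurtosis: 0 for a normal distribution.
        kurtosis = sum(((p - mean) / std) ** 4 for p in pixels) / n - 3.0

    return {"mean": mean, "std": std, "entropy": entropy,
            "skewness": skewness, "kurtosis": kurtosis}


# Toy "image": half black, half white pixels.
stats = image_stats([0, 0, 255, 255])
print(stats)
```

For a real scalogram, flatten the 128 × 180 array into a list (or swap in NumPy equivalents) before calling the function.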

4. Intelligent Filtering Criteria

Apnea Images (SNR < 7.5)

Captures high variability and pronounced signal fluctuations
Selects most apnea-like samples for clear class separation
Reduces false positives from borderline ambiguous samples

Scientific Justification:
- Machine Learning Theory: Clean class boundaries improve classification accuracy by eliminating "grey zone" samples
- Clinical Decision-Making: Prioritizes reducing false negatives in diagnostic contexts
- Biosignal Processing: Standard practice for medical signal quality control

Non-Apnea Images (SNR ≥ 7.5)

Maintains high-quality normal breathing patterns
Eliminates noise artifacts that could mimic apnea
Ensures clear distinction from apnea events
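The two filtering rules reduce to a single threshold test per class. The sketch below applies the stated 7.5 cut-off to a few invented (label, SNR) pairs; how SNR is computed for a scalogram is not specified on the page, so the values here are placeholders and `keep_sample` is a hypothetical helper.

```python
SNR_THRESHOLD = 7.5  # cut-off stated in the filtering criteria


def keep_sample(label, snr, threshold=SNR_THRESHOLD):
    """Return True if a sample passes the class-specific SNR rule."""
    if label == "apnea":
        return snr < threshold
    if label == "non_apnea":
        return snr >= threshold
    raise ValueError(f"unknown label: {label!r}")


# Invented (label, SNR) pairs for illustration.
samples = [("apnea", 5.2), ("apnea", 9.1), ("non_apnea", 7.5), ("non_apnea", 3.0)]
kept = [sample for sample in samples if keep_sample(*sample)]
print(kept)
```

One consequence of this rule is that, in the kept set, the two classes are separable by SNR alone, which is worth keeping in mind when interpreting accuracy figures obtained on the filtered data.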

📁 Dataset Structure

Binary_Classification_Apnea/
├── apnea/      # Apnea event images
│  ├── image_001.png
│  ├── image_002.png
│  └── ...
└── non_apnea/    # Normal breathing images
  ├── image_001.png
  ├── image_002.png
  └── ...

File Naming Convention

Images are named sequentially
Each image represents a 60-second ECG segment
Binary classification: apnea (class 1) vs. non-apnea (class 0)
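Given the two-folder layout and the class numbering above, file paths and labels can be collected with the standard library alone. This is a minimal sketch: `list_samples` is a hypothetical helper, and the demo builds a throwaway directory with empty placeholder files rather than reading the real dataset.

```python
import tempfile
from pathlib import Path

# Folder name -> class id, following the convention stated above.
CLASS_TO_LABEL = {"non_apnea": 0, "apnea": 1}


def list_samples(root):
    """Return a list of (image_path, label) pairs for every PNG under root."""
    samples = []
    for class_name, label in CLASS_TO_LABEL.items():
        for path in sorted((Path(root) / class_name).glob("*.png")):
            samples.append((path, label))
    return samples


# Demo on a temporary directory that mimics the documented structure.
with tempfile.TemporaryDirectory() as tmp:
    for class_name in CLASS_TO_LABEL:
        (Path(tmp) / class_name).mkdir()
        (Path(tmp) / class_name / "image_001.png").touch()
    demo = [(path.name, label) for path, label in list_samples(tmp)]

print(demo)
```

Pointing `list_samples` at the extracted `Binary_Classification_Apnea/` folder would yield the full list, which can then be split and loaded with an image library of choice.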

🎯 Use Cases

Primary Applications

Sleep Apnea Detection: Binary classification of apnea events
Medical Image Classification: Transfer learning for similar tasks
Explainable AI Research: Grad-CAM and visualization studies
Benchmark Comparisons: Evaluate new architectures
Clinical Decision Support: Prototype development for diagnosis

Research Areas

Deep learning for healthcare
Biosignal processing and analysis
Explainable AI in medical imaging
Non-invasive sleep disorder diagnosis
Wearable health monitoring systems

🚀 Quick Start

Load Dataset in Python

import os
import numpy as np
from PIL import Image
from sklearn.model_s...

Bipolar vs non Bipolar Handwriting
kaggle.com
zip
Updated Mar 29, 2023
Cite
Ahmadali Jamali (2023). Bipolar vs non Bipolar Handwriting [Dataset]. https://www.kaggle.com/datasets/ahmadalijamali/bipolar-vs-non-bipolar-handwriting
Explore at:
Available download formats: zip (5713624 bytes)
Dataset updated
Mar 29, 2023
Authors
Ahmadali Jamali
Description
Bipolar disorder is a mental disorder. In this study, I analysed the handwriting of people with bipolar disorder using image processing techniques. All image data were gathered from a survey. For a general definition of bipolar disorder, see this article on nature.com: https://www.nature.com/articles/s41380-021-01091-4 The handwriting is in Persian.

The paper on this dataset is available at the link below: https://doi.org/10.22060/ajmc.2024.22576.1176
NASA Mars Rover
kaggle.com
zip
Updated Oct 8, 2023
Cite
Kush Tripathi (2023). NASA Mars Rover [Dataset]. https://www.kaggle.com/datasets/kushtripathi/nasa-mars-rover-captured-images-and-its-details
Explore at:
Available download formats: zip (101585155 bytes)
Dataset updated
Oct 8, 2023
Authors
Kush Tripathi
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset Title: Exploring Mars: A Comprehensive Dataset of Rover Photos and Metadata

Description

This dataset provides an extensive collection of Mars rover images paired with in-depth metadata. Sourced from various Mars missions, this dataset is a treasure trove for anyone interested in space exploration, planetary science, or computer vision.

Components:

Photos: A curated set of high-definition images taken by different cameras onboard Mars rovers. These images capture a variety of terrains, weather conditions, and other Martian phenomena.

Details: A detailed CSV file accompanies these images, containing rich metadata like the type of camera used, the corresponding Martian sol, Earth date, and the rover responsible for each image.

Dataset Origin

The dataset was compiled from various Mars missions conducted over the years. Special care has been taken to include a diverse set of images to enable a wide range of analyses and applications.

Objective

As a learner delving into the field of Computer Vision, my objectives for this project are multi-fold:

Data Analysis: To perform exploratory data analysis (EDA) to understand the distribution of images based on attributes like camera type, date, and rover.

Color Analysis: To identify and visualize dominant colors across different sets of images. This could provide insights into Martian geology.

Texture and Pattern Recognition: To classify Martian terrains using texture and pattern recognition techniques.

Machine Learning: To potentially develop a predictive model that could classify images into predefined categories based on their features.

Research Questions

Which camera types have contributed the most to the dataset?

What can the dominant colors in the images tell us about Mars?

Can we classify Martian terrains into categories like rocky, sandy, and icy?

Is there a correlation between the type of terrain and other variables like camera type or date?

Tools and Technologies

I plan to utilize Python for this project, particularly libraries like OpenCV for image processing, Pandas for data manipulation, and Matplotlib/Seaborn for data visualization. For machine learning tasks, I will likely use scikit-learn or TensorFlow.

Learning and Development

This project serves as both a learning exercise and a stepping stone toward more complex computer vision projects. I aim to document my learning journey, challenges, and milestones in a series of Kaggle notebooks.

Collaboration and Feedback

I warmly invite the Kaggle community to offer suggestions, critiques, or even collaborate on this venture. Your insights could be invaluable in enhancing the depth and breadth of this project.
gaming_laptops_(2025)_amazon_web_scraping_data
kaggle.com
zip
Updated Jun 9, 2025
Cite
Gojo@69 (2025). gaming_laptops_(2025)_amazon_web_scraping_data [Dataset]. https://www.kaggle.com/datasets/gojo69/gaming-laptops-2025-amazon-web-scraping-data/discussion
Explore at:
Available download formats: zip (12619 bytes)
Dataset updated
Jun 9, 2025
Authors
Gojo@69
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
This dataset contains curated information about gaming laptops listed on Amazon India under a ₹1,00,000 budget, collected via web scraping in 2024. It includes laptop titles, pricing details, customer ratings, discounts, and review counts — useful for anyone looking to analyze laptop market trends, price-performance ratios, or build recommendation models.

The data has been cleaned and formatted for easy analysis, with all pricing and ratings converted to numerical format. Whether you're a gamer on a budget or a data analyst exploring tech e-commerce, this dataset is ready for EDA.

| Column | Description |
| --------------------- | ---------------------------------------- |
| Product Title | Full name of the laptop (brand + specs) |
| Product Price | Final price in INR (cleaned) |
| Original Price | Original price before discount |
| Discount Percentage | Discount percentage (if any) |
| Product Rating | Average customer rating (out of 5) |
| Number of Ratings | Total number of user ratings |
| Product Image URL | Link to the product image (optional use) |

🔍 Use Cases

📈 Price trend analysis on Indian laptops
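The relationship between the three price columns can be sanity-checked with the usual discount formula. The page does not say how Discount Percentage was derived, so the formula, the `discount_percentage` helper, and the example prices below are assumptions for illustration.

```python
def discount_percentage(original_price, final_price):
    """Percentage reduction from the original price, rounded to one decimal."""
    if original_price <= 0:
        raise ValueError("original_price must be positive")
    return round((original_price - final_price) / original_price * 100, 1)


# Invented prices in INR, below the dataset's ₹1,00,000 budget.
example = discount_percentage(original_price=89990, final_price=67490)
print(example)
```

Recomputing the column this way and comparing it with the stored values is a quick check for scraping or cleaning errors during EDA.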

⚖️ Budget vs specs comparison

🛒 E-commerce analytics for gaming laptops

🤖 Input for machine learning models (recommendation/classification)

📊 EDA or visualization dashboards

⚠️ Disclaimer

This dataset was created for educational and research purposes only. It contains publicly available data collected from Amazon.in. The author does not claim ownership of any product information. This work is not affiliated with or endorsed by Amazon.
Colombo Cafes 🍵: Ratings & Insights Dataset
kaggle.com
zip
Updated Feb 14, 2024
Cite
Kanchana1990 (2024). Colombo Cafes 🍵: Ratings & Insights Dataset [Dataset]. https://www.kaggle.com/datasets/kanchana1990/colombo-cafes-ratings-and-insights-dataset/data
Explore at:
Available download formats: zip (11961 bytes)
Dataset updated
Feb 14, 2024
Authors
Kanchana1990
License
Open Data Commons Attribution License (ODC-By) v1.0 https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
Area covered
Colombo
Description
The "Colombo Cafes 🍵: Ratings & Insights Dataset" offers a detailed exploration of the café scene in Colombo, designed to cater to a variety of data science projects. Despite its concise size, this dataset is suited to in-depth analyses, from understanding consumer preferences to predictive modeling.

Overview

This dataset provides a snapshot of Colombo's vibrant café culture, encapsulating key data points that reflect the diversity and richness of the city's coffee spots. It serves as an essential tool for those looking to delve into the dynamics of the café industry in Colombo.

Data Science Applications

The dataset's structured format and comprehensive data points make it an ideal candidate for a range of data science applications. Researchers and analysts can employ this dataset for exploratory data analysis, sentiment analysis, trend identification, and even for developing sophisticated machine learning models aimed at predicting café popularity or customer preferences.

Column Descriptors

After refinement, the dataset comprises several key columns:

- title: The unique name of each café, crucial for identifying and distinguishing between different establishments.
- totalScore: The average customer rating for each café, on a scale from 1 to 5, with missing values imputed with the dataset's mean score to ensure data consistency.
- reviewsCount: The number of reviews for each café, providing insights into customer engagement; missing values are set to zero.
- street, city, countryCode: These columns offer detailed location information for each café, enabling geographic analyses and mapping.
- website: The official website URL for the café, where available, providing a direct link for further information.
- phone: The contact number for each café, facilitating communication and inquiries.
- categoryName: The classification of the establishment (e.g., Café, Coffee Shop), useful for segmenting and analyzing the data.
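The two imputation rules in the column descriptors (mean for totalScore, zero for reviewsCount) can be sketched with pandas as follows. This is only an illustration under assumptions: the rows are invented, and only the column names come from the description above.

```python
import pandas as pd

# Hypothetical raw rows; only the column names come from the dataset description.
cafes = pd.DataFrame({
    "title": ["Cafe A", "Cafe B", "Cafe C"],
    "totalScore": [4.5, None, 3.5],
    "reviewsCount": [120, None, 8],
})

# Fill missing ratings with the mean of the observed ratings.
mean_score = cafes["totalScore"].mean()
cafes["totalScore"] = cafes["totalScore"].fillna(mean_score)

# A missing review count is treated as "no reviews".
cafes["reviewsCount"] = cafes["reviewsCount"].fillna(0).astype(int)

print(cafes)
```

Mean imputation pulls the affected cafés toward the average rating, so analyses that depend on rating spread may want to flag imputed rows separately.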

Ethically Mined Data

This dataset is compiled with strict adherence to ethical data mining practices, ensuring the privacy and confidentiality of all sourced information while maintaining high standards of data accuracy and integrity.

Acknowledgements

Gratitude is extended to platforms such as Google, whose repositories of user-generated content have been invaluable in assembling this detailed dataset. Their contribution has been instrumental in capturing the essence of Colombo's café culture.

Image Acknowledgement

The dataset is enhanced by visual elements that depict the ambiance and aesthetic of Colombo's cafés, with specific recognition given to an image accessible here, enriching the narrative and offering a visual representation of the data contained within.
Predict Term Deposit
kaggle.com
zip
Updated Nov 29, 2021
Cite
Aslan Ahmedov (2021). Predict Term Deposit [Dataset]. https://www.kaggle.com/aslanahmedov/predict-term-deposit
Explore at:
Available download formats: zip (588608 bytes)
Dataset updated
Nov 29, 2021
Authors
Aslan Ahmedov
Description
Predict Term Deposit

Introduction

A bank has multiple banking products that it sells to customers, such as savings accounts, credit cards, and investments. It wants to know which customers will purchase its credit cards. For this purpose it holds various kinds of information about the customers' demographic details, their banking behavior, etc. Once it can predict the chances that a customer will purchase a product, it wants to use the same to make pre-payment to the authors.

In this part I will demonstrate how to build a model that uses machine learning to predict which clients will subscribe to a term deposit. In the first part we will deal with the description and visualization of the analysed data, and in the second we will move on to data classification models.

Strategy

- Desire target
- Data Understanding
- Preprocessing Data
- Machine learning Model
- Prediction
- Comparing Results

Desire Target

Predict if a client will subscribe (yes/no) to a term deposit — this is defined as a classification problem.

Data

The dataset (Assignment-2_data.csv) used in this assignment contains bank customers’ data. File name: Assignment-2_Data. File format: .csv. Number of rows: 45212. Number of attributes: 17 non-empty conditional attributes and one decision attribute.

https://user-images.githubusercontent.com/91852182/143783430-eafd25b0-6d40-40b8-ac5b-1c4f67ca9e02.png
https://user-images.githubusercontent.com/91852182/143783451-3e49b817-29a6-4108-b597-ce35897dda4a.png

Exploratory Data Analysis (EDA)

Data pre-processing is a main step in machine learning: the useful information that can be derived from the data set directly affects model quality, so it is extremely important to do at least the necessary preprocessing before feeding the data into our model.

In this assignment, we are going to use Python to develop a predictive machine learning model. First, we will import some important and necessary libraries.

Below we can see that there are various numerical and categorical columns. The most important column here is y, which is the output variable (desired target): it tells us whether the client subscribed to a term deposit (binary: 'yes', 'no').

https://user-images.githubusercontent.com/91852182/143783456-78c22016-149b-4218-a4a5-765ca348f069.png

We must check whether our dataset has any missing values and whether it has any duplicated values.

https://user-images.githubusercontent.com/91852182/143783471-a8656640-ec57-4f38-8905-35ef6f3e7f30.png

We can see that 'age' has 9 missing values and 'balance' has 3. Since our dataset has around 45k rows, I will remove those rows from the dataset. Pic 1 and 2 show the data before and after.

https://user-images.githubusercontent.com/91852182/143783474-b3898011-98e3-43c8-bd06-2cfcde714694.png
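The checks described above (missing values, duplicated rows, then dropping incomplete rows) might look roughly like this in pandas. The small frame below is a made-up stand-in for the bank data, so the counts differ from the 9 and 3 reported in the text; only the column names `age`, `balance`, and `y` are taken from the description.

```python
import pandas as pd

# Tiny stand-in for the bank data; the values are invented for illustration.
df = pd.DataFrame({
    "age": [34, None, 51, 34],
    "balance": [1200.0, 300.0, None, 1200.0],
    "y": ["no", "yes", "no", "no"],
})

# Count missing values per column and fully duplicated rows.
missing_per_column = df.isna().sum()
duplicate_rows = int(df.duplicated().sum())
print(missing_per_column)
print("duplicated rows:", duplicate_rows)

# Drop the rows where 'age' or 'balance' is missing.
cleaned = df.dropna(subset=["age", "balance"])
print(len(df), "->", len(cleaned))
```

Dropping rows is reasonable here only because so few are affected relative to the size of the full dataset; with a larger share of gaps, imputation would be worth considering.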

From the above analysis we can see that only 5289 people out of 45200 have subscribed, which is roughly 12%. Our dataset is highly unbalanced, and we need to take note of that.

https://user-images.githubusercontent.com/91852182/143783534-a05020a8-611d-4da1-98cf-4fec811cb5d8.png
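To make the imbalance concrete, the short calculation below uses the two counts quoted above. It also shows the accuracy a trivial model would reach by always predicting 'no', which is the baseline any classifier should be compared against. The variable names are chosen for this example only.

```python
# Counts quoted in the text above.
subscribed = 5289
total = 45200

# Share of clients in the positive class ('yes').
positive_share = subscribed / total

# Accuracy of a model that always predicts 'no'.
majority_baseline = 1 - positive_share

print(f"positive share: {positive_share:.1%}")
print(f"always-'no' accuracy: {majority_baseline:.1%}")
```

Because this baseline is already high, accuracy alone says little on this data; precision, recall, or F1 on the 'yes' class are more informative.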

Our list of categorical variables.

https://user-images.githubusercontent.com/91852182/143783542-d40006cd-4086-4707-a683-f654a8cb2205.png

Our list of numerical variables.

https://user-images.githubusercontent.com/91852182/143783551-6b220f99-2c4d-47d0-90ab-18ede42a4ae5.png

"Age" Q-Q Plots and Box Plot.

In the above boxplot we can see some points at very young ages as well as at impossible ages. So,

https://user-images.githubusercontent.com/91852182/143783564-ad0e2a27-5df5-4e04-b5d7-6d218cabd405.png
https://user-images.githubusercontent.com/91852182/143783589-5abf0a0b-8bab-4192-98c8-d2e04f32a5c5.png

Now we don’t have issues with this feature, so we can use it.

https://user-images.githubusercontent.com/91852182/143783599-5205eddb-a0f5-446d-9f45-cc1adbfcce67.png
https://user-images.githubusercontent.com/91852182/143783601-e520d59c-3b21-4627-a9bb-cac06f415a1e.png

"Duration" Q-Q Plots and Box Plot

https://user-images.githubusercontent.com/91852182/143783634-03e5a584-a6fb-4bcb-8dc5-1f3cc50f9507.png
https://user-images.githubusercontent.com/91852182/143783640-f6e71323-abbe-49c1-9935-35ffb2d10569.png

This attribute highly affects the output target (e.g., if duration=0 then y=’no’). Yet, the duration is not known before a call is performed. Also, after the end of the call y is obviously known. Thus, this input should only be included for benchmark purposes...
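One way to act on this caveat is to keep two feature sets: one with `duration` for benchmarking and one without it for a model that could realistically be used before a call is made. The sketch below assumes a pandas frame with columns `age`, `duration`, and `y`; the rows are invented, and this is not necessarily how the author proceeds.

```python
import pandas as pd

# Invented rows; only the column names follow the description.
df = pd.DataFrame({
    "age": [34, 45, 51],
    "duration": [0, 310, 95],
    "y": ["no", "yes", "no"],
})

# Encode the target as 0/1.
target = df["y"].map({"no": 0, "yes": 1})

# Benchmark features keep 'duration'; realistic features leave it out,
# because the call length is unknown at prediction time.
benchmark_features = df.drop(columns=["y"])
realistic_features = df.drop(columns=["y", "duration"])

print(list(benchmark_features.columns))
print(list(realistic_features.columns))
```

Training on both sets and comparing the scores shows how much of the apparent performance comes from information that leaks the outcome.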
Kazakhstan Candles Import Data 2022
kaggle.com
Updated Sep 30, 2023
Cite
Sultan Sagynov (2023). Kazakhstan Candles Import Data 2022 [Dataset]. https://www.kaggle.com/datasets/sultansagynov/kazakhstan-candles-import-data-2022/code
Explore at:
Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 30, 2023
Dataset provided by
Kaggle http://kaggle.com/
Authors
Sultan Sagynov
License
Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Area covered
Kazakhstan
Description
This dataset shows data on the import of candles to Kazakhstan for 2022. Points to analyze:

- **Supplier countries (Production Country Code)**: market share in terms of volume and financing

- **Supplier companies (Code of Sender)**: price per kilogram, market share in terms of quantity and financing; type of transportation

The following points are important for understanding the candle market in Kazakhstan:

- **Recipient companies (Name of Receiver)**: market share in terms of volume and financing

Please note that company names may be written in different ways in the raw data, for example "TOO FLIP.KZ " and "FLIP.KZ ТОО". These two companies are the same but are treated as different because of the inconsistent naming, so their data should be combined. The same approach should be applied to other receivers with multiple names to get a clear picture.
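A minimal sketch of how such name variants could be merged before computing market shares is shown below. It assumes that dropping the legal-form token (in both Latin and Cyrillic spelling) and normalizing case and whitespace is enough to match variants; real data will likely need more rules. The `records` list, the weights, and the function name `normalize_company` are hypothetical.

```python
from collections import defaultdict

# Legal-form tokens to ignore. The Latin "TOO" and the Cyrillic spelling
# (written with escapes here) look alike but are different characters.
LEGAL_FORMS = {"TOO", "\u0422\u041e\u041e"}

def normalize_company(name):
    """Upper-case, collapse whitespace, and drop legal-form tokens."""
    tokens = name.upper().split()
    return " ".join(t for t in tokens if t not in LEGAL_FORMS)

# Hypothetical (receiver name, weight in kg) records.
records = [
    ("TOO FLIP.KZ ", 120.0),
    ("FLIP.KZ \u0422\u041e\u041e", 80.0),
    ("Other Co", 50.0),
]

# Sum the volume per normalized receiver name.
volume_by_receiver = defaultdict(float)
for name, kg in records:
    volume_by_receiver[normalize_company(name)] += kg

# Market share by volume.
total = sum(volume_by_receiver.values())
shares = {name: kg / total for name, kg in volume_by_receiver.items()}
print(shares)
```

After normalization both spellings collapse into a single receiver, so its share reflects the combined volume.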

- **Product classification (Product Description)**: which products are the most popular

Again, please note that the names are messy: one product may be described in several different ways, and these must be classified consistently to see an objective pattern, although this may not be an easy task.

The main task of this project is to determine the most popular type of product. We believe that data analysis and data science tools can help us answer this interesting question.

If you have any questions, please let us know in the comments, we will be happy to discuss them.