CC0 1.0 (Public Domain) https://creativecommons.org/publicdomain/zero/1.0/
🐾 Cats and Dogs Image Dataset
Scraped image dataset for classification
📘 Overview
This dataset contains images of cats and dogs collected using a custom web scraper from Google Images. It is designed for educational and research purposes, ideal for experimenting with image classification models, transfer learning, or deep learning architectures like CNNs.
📂 Dataset Details
Categories: 🐱 Cats, 🐶 Dogs
Total Images: ~600 (≈300 per class)
Image Type: Real photos only (filtered for quality)
File Format: JPG / PNG
Average Resolution: Around 300×300 px
🎯 Applications
You can use this dataset for:
Training and testing CNN models
Practicing transfer learning with models like ResNet, VGG16, or MobileNet
Exploring data augmentation and preprocessing techniques
Performing EDA (Exploratory Data Analysis) on image datasets
⚠️ Disclaimer
All images were obtained via public web search results and are shared strictly for educational and non-commercial use. Please verify image rights before any commercial application.
📜 License
CC0: Public Domain ✅ Free to use, modify, and share — no attribution required.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Benchmark and compare 3rd-party AI models for breast cancer screening image classification. Focus on sensitivity, false-positive control and enterprise-grade de
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
| Label | Species Name | Image Count |
|---|---|---|
| 1 | American Goldfinch | 143 |
| 2 | Emperor Penguin | 139 |
| 3 | Downy Woodpecker | 137 |
| 4 | Flamingo | 132 |
| 5 | Carmine Bee-eater | 131 |
| 6 | Barn Owl | 129 |
📂 Dataset Highlights:
* Total Images: 811
* Classes: 6 unique bird species
* Balanced Labels: Nearly equal distribution across classes
* Use Cases: Image classification, model benchmarking, transfer learning, educational projects, biodiversity analysis
🧠 Potential Applications:
* Training deep learning models like CNNs for bird species recognition
* Fine-tuning pre-trained models using a small and balanced dataset
* Educational projects in ornithology and computer vision
* Biodiversity and wildlife conservation tech solutions
🛠️ Suggested Tools:
* Python (Pandas, NumPy, Matplotlib)
* TensorFlow / PyTorch for model development
* OpenCV for image preprocessing
* Streamlit for creating interactive demos
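The "nearly equal distribution" claim can be checked directly from the table. The following is a minimal sketch that uses only the per-species counts listed above. The inverse-frequency class weights are a common heuristic added here for illustration and are not part of the dataset.

```python
# Image counts per species, copied from the table above.
counts = {
    "American Goldfinch": 143,
    "Emperor Penguin": 139,
    "Downy Woodpecker": 137,
    "Flamingo": 132,
    "Carmine Bee-eater": 131,
    "Barn Owl": 129,
}

total = sum(counts.values())
shares = {name: n / total for name, n in counts.items()}

# Ratio between the largest and smallest class; 1.0 means perfectly balanced.
imbalance_ratio = max(counts.values()) / min(counts.values())

# Inverse-frequency weights (illustrative heuristic, not from the dataset).
class_weights = {name: total / (len(counts) * n) for name, n in counts.items()}

for name, n in counts.items():
    print(f"{name:20s} {n:4d}  {shares[name]:.1%}  weight={class_weights[name]:.3f}")
print(f"total={total}, max/min ratio={imbalance_ratio:.2f}")
```

With a max/min ratio this close to 1, plain accuracy is usually a reasonable first metric, and class weighting is optional.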
The Multi-class Weather Dataset (MWD) for image classification is a valuable dataset used in the research paper entitled “Multi-class weather recognition from still image using heterogeneous ensemble method”. The dataset provides a platform for outdoor weather analysis by extracting various features for recognizing different weather conditions.
Research Paper: https://web.cse.ohio-state.edu/~zhang.7804/Cheng_NC2016.pdf
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Unsupervised exploratory data analysis (EDA) is often the first step in understanding complex data sets. While summary statistics are among the most efficient and convenient tools for exploring and describing sets of data, they are often overlooked in EDA. In this paper, we show multiple case studies that compare the performance, including clustering, of a series of summary statistics in EDA. The summary statistics considered here are pattern recognition entropy (PRE), the mean, standard deviation (STD), 1-norm, range, sum of squares (SSQ), and X4, which are compared with principal component analysis (PCA), multivariate curve resolution (MCR), and/or cluster analysis. PRE and the other summary statistics are direct methods for analyzing data; they are not factor-based approaches. To quantify the performance of summary statistics, we use the concept of the “critical pair,” which is employed in chromatography. The data analyzed here come from different analytical methods. Hyperspectral images, including one of a biological material, are also analyzed. In general, PRE outperforms the other summary statistics, especially in image analysis, although a suite of summary statistics is useful in exploring complex data sets. While PRE results were generally comparable to those from PCA and MCR, PRE is easier to apply. For example, there is no need to determine the number of factors that describe a data set. Finally, we introduce the concept of divided spectrum-PRE (DS-PRE) as a new EDA method. DS-PRE increases the discrimination power of PRE. We also show that DS-PRE can be used to provide the inputs for the k-nearest neighbor (kNN) algorithm. We recommend PRE and DS-PRE as rapid new tools for unsupervised EDA.
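To make the summary statistics concrete, here is a rough sketch of how they might be computed for a single spectrum. It assumes that PRE is the Shannon entropy of the spectrum after normalizing its absolute intensities to sum to 1. The abstract does not spell out that definition, so treat it as an assumption. X4 and DS-PRE are left out because the abstract does not define them.

```python
import math
from statistics import mean, pstdev


def pre(spectrum):
    """Shannon entropy (in bits) of a spectrum normalized to sum to 1.

    Assumed definition of pattern recognition entropy; see the lead-in.
    """
    total = sum(abs(x) for x in spectrum)
    if total == 0:
        return 0.0
    probs = [abs(x) / total for x in spectrum]
    return -sum(p * math.log2(p) for p in probs if p > 0)


def summary_statistics(spectrum):
    """One value per statistic, so each spectrum collapses to a few numbers."""
    return {
        "PRE": pre(spectrum),
        "mean": mean(spectrum),
        "STD": pstdev(spectrum),
        "1-norm": sum(abs(x) for x in spectrum),
        "range": max(spectrum) - min(spectrum),
        "SSQ": sum(x * x for x in spectrum),
    }


# Two toy spectra with the same total intensity but different shapes.
flat = [1.0, 1.0, 1.0, 1.0]
peaked = [0.0, 4.0, 0.0, 0.0]

for name, spectrum in (("flat", flat), ("peaked", peaked)):
    print(name, summary_statistics(spectrum))
```

The two toy spectra share the same mean and 1-norm, yet their entropies differ, which illustrates why an entropy-style statistic can separate shapes that simpler statistics cannot.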
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Project Documentation: Cucumber Disease Detection
Introduction: The "Cucumber Disease Detection" project aims to develop a machine learning model that automatically detects diseases in cucumber plants. The work matters because early disease identification in agriculture can increase crop yield and reduce financial losses. The model is trained and tested on a dataset of cucumber plant images.
Importance: Early disease diagnosis helps minimize crop losses, stop the spread of disease, and allocate farming resources more effectively. Agriculture is a real-world application of this concept.
Goals and Objectives: Develop a machine learning model to classify cucumber plant images into healthy and diseased categories. Achieve a high level of accuracy in disease detection. Provide a tool for farmers to detect diseases early and take appropriate action.
Data Collection: Using cameras and smartphones, images from agricultural areas were gathered.
Data Preprocessing: Data cleaning to remove irrelevant or corrupted images. Handling missing values, if any, in the dataset. Removing outliers that may negatively impact model training. Data augmentation techniques applied to increase dataset diversity.
Exploratory Data Analysis (EDA)
The dataset was examined using visualizations such as scatter plots and histograms, looking for patterns, trends, and correlations. EDA made it easier to understand the distribution of images of healthy and diseased plants.
Methodology
Machine Learning Algorithms:
Convolutional Neural Networks (CNNs) were chosen for image classification due to their effectiveness in handling image data. Transfer learning using pre-trained models such as ResNet or MobileNet may be considered.
Train-Test Split:
The dataset was split into training and testing sets with a suitable ratio. Cross-validation may be used to assess model performance robustly.
Model Development
The CNN model's architecture consists of layers, units, and activation functions. Hyperparameters including learning rate, batch size, and optimizer were chosen on the basis of experimentation. To avoid overfitting, regularization methods such as dropout and L2 regularization were used.
Model Training
During training, the model was fed the prepared dataset over a number of epochs. The loss function was minimized using an optimization method. Early stopping and model checkpoints were used to support convergence.
Model Evaluation
Evaluation Metrics:
Accuracy, precision, recall, F1-score, and the confusion matrix were used to assess model performance. Results were computed for both training and test datasets.
Performance Discussion:
The model's performance was analyzed in the context of disease detection in cucumber plants. Strengths and weaknesses of the model were identified.
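The evaluation metrics named above all follow from the four cells of a binary confusion matrix. The sketch below shows the arithmetic. The labels "diseased" and "healthy" and the example predictions are made up for illustration and are not results from the project.

```python
def confusion_counts(y_true, y_pred, positive="diseased"):
    """Count true/false positives and negatives for one positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1, guarding against empty denominators."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical ground truth and predictions for eight images.
y_true = ["diseased", "diseased", "diseased", "healthy",
          "healthy", "healthy", "healthy", "diseased"]
y_pred = ["diseased", "diseased", "healthy", "healthy",
          "healthy", "diseased", "healthy", "diseased"]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
metrics = binary_metrics(tp, fp, fn, tn)
print((tp, fp, fn, tn), metrics)
```

For disease detection, recall on the diseased class is often the number to watch, because a missed infection (a false negative) typically costs more than a false alarm.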
Results and Discussion
Key project findings include model performance and disease detection precision; a comparison of the models employed, showing the benefits and drawbacks of each; and the challenges faced during the project and the methods used to solve them.
Conclusion
The conclusion recaps the project's key learnings, highlights the project's importance to early disease detection in agriculture, and suggests future enhancements and potential research directions.
References
Libraries: Pillow, Roboflow, YOLO, scikit-learn, Matplotlib
Dataset: https://data.mendeley.com/datasets/y6d3z6f8z9/1
Code Repository https://universe.roboflow.com/hakuna-matata/cdd-g8a6g
Rafiur Rahman Rafit EWU 2018-3-60-111
A full set of 1,083 block texture images from the legendary game Minecraft, version 1.21.8.
Each image is a .png file representing the visual texture of a block used in the game.
A metadata CSV file (metadata.csv) is also included, containing details for all images such as:
- file_name — the image filename
- block_name — cleaned human-readable block name
- variant — side, top, bottom, or base variant
- avg_color — average RGB color of the block texture
This CSV makes it easier to use the dataset for computer vision, ML projects, clustering, or color analysis.
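As a starting point, the metadata file can be read with the standard `csv` module. The sketch below uses the four column names listed above, but the sample rows are invented, and the assumption that `avg_color` is stored as text such as `(155, 126, 78)` is a guess; adjust `parse_rgb` if the real file uses another format.

```python
import csv
import io
import re
from collections import defaultdict

# Hypothetical rows: column names come from the description, values are made up.
SAMPLE_CSV = """file_name,block_name,variant,avg_color
oak_log_top.png,Oak Log,top,"(155, 126, 78)"
oak_log.png,Oak Log,side,"(109, 85, 51)"
stone.png,Stone,base,"(126, 126, 126)"
"""


def parse_rgb(text):
    """Extract three integers from a string such as '(155, 126, 78)'."""
    values = [int(v) for v in re.findall(r"\d+", text)]
    if len(values) != 3:
        raise ValueError(f"expected three colour components, got {text!r}")
    return tuple(values)


def load_metadata(handle):
    """Read metadata rows and convert avg_color to an (r, g, b) tuple."""
    rows = []
    for row in csv.DictReader(handle):
        row["avg_color"] = parse_rgb(row["avg_color"])
        rows.append(row)
    return rows


def luma(rgb):
    """Perceived brightness using the Rec. 601 weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


rows = load_metadata(io.StringIO(SAMPLE_CSV))

# Group the available variants under each block name.
variants_by_block = defaultdict(list)
for row in rows:
    variants_by_block[row["block_name"]].append(row["variant"])

darkest = min(rows, key=lambda r: luma(r["avg_color"]))
print(dict(variants_by_block))
print("darkest texture:", darkest["file_name"])
```

To run this against the real file, replace the `io.StringIO(SAMPLE_CSV)` argument with a handle from `open("metadata.csv", newline="", encoding="utf-8")`.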
- Computer Vision Projects : Train image classification models to recognize Minecraft blocks or similar pixelated game textures.
- Generative Models (GANs, Diffusion) : Use as training data for texture synthesis, block-style image generation, or AI art based on Minecraft aesthetics.
- Augmented Reality / Game Modding Tools : Utilize textures in prototyping AR Minecraft-style games or Minecraft modding tools.
- Clustering & Similarity Analysis : Apply unsupervised learning (e.g., K-means, t-SNE) to group similar block textures based on visual features.
- Data Preprocessing & Feature Extraction Practice : Practice converting image datasets into usable features for downstream ML tasks (e.g., flattening, embeddings).
- Exploratory Data Analysis (EDA) : Analyze visual color distribution, texture density, or image metadata to understand visual design patterns in Minecraft.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Bikes are annotated in Tensorflow Object Detection format:
The dataset consists of a total of 184 images, and two CSV files for annotation purposes accompany it. These images have been meticulously annotated in the Tensorflow Object Detection format, providing valuable information for object recognition tasks. Notably, this dataset predominantly features images of bikes from Pakistan, making it a valuable resource for research and applications related to Pakistani bike recognition and classification.
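The description does not list the CSV columns. The sketch below assumes the layout commonly exported under the name "TensorFlow Object Detection CSV" (`filename,width,height,class,xmin,ymin,xmax,ymax`), so check the header of the actual files first. The sample rows and file names are invented.

```python
import csv
import io
from collections import defaultdict

# Hypothetical annotation rows in the assumed column layout.
SAMPLE_CSV = """filename,width,height,class,xmin,ymin,xmax,ymax
bike_001.jpg,640,480,bike,100,120,420,400
bike_001.jpg,640,480,bike,10,20,110,220
bike_002.jpg,800,600,bike,0,0,400,300
"""


def load_boxes(handle):
    """Group boxes by image and scale pixel coordinates to the 0..1 range."""
    boxes = defaultdict(list)
    for row in csv.DictReader(handle):
        width, height = int(row["width"]), int(row["height"])
        boxes[row["filename"]].append({
            "label": row["class"],
            "xmin": int(row["xmin"]) / width,
            "ymin": int(row["ymin"]) / height,
            "xmax": int(row["xmax"]) / width,
            "ymax": int(row["ymax"]) / height,
        })
    return dict(boxes)


boxes = load_boxes(io.StringIO(SAMPLE_CSV))
for filename, items in boxes.items():
    print(filename, len(items), "box(es)")
```

Normalized coordinates make the annotations independent of image size, which helps when images are resized before training.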
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
The Pokémon Card Dataset offers detailed information about various Pokémon cards, including images, attributes, and descriptions. This dataset is perfect for analyzing Pokémon card features, developing machine learning models, and enhancing gaming experiences.
Dataset Details:
How to Use This Dataset:
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
The dataset is a collection of images representing various conditions of bananas, specifically aimed at training machine learning models for image classification or augmentation tasks. The dataset is organized into multiple subfolders, each representing a different condition or class of bananas. These classes include:
- Healthy Bananas
- Bananas with Fusarium Wilt
- Bananas with Natural Leaf Death
- Bananas with Rhizome Root Issues
Each image in the dataset is initially stored in its respective class folder and typically contains a banana or bananas under different conditions, viewed from different angles, and possibly with varying levels of resolution or lighting.
The dataset is then processed for various machine learning tasks like classification, detection, or augmentation. Specifically, this dataset is aimed at providing a variety of augmented images to ensure a more robust training set, which is critical for improving the generalization performance of machine learning models.
Related : Shuvo, Shuvo Kumar Basak (2025), “Banana_Tree_Disease_Detection_Dataset(BTDDD)”, Mendeley Data, V2, doi: 10.17632/vp2xnb8zmb.2
I, Shuvo Kumar Basak, have created and curated the Dataset. This dataset is freely available for research, educational, and non-commercial purposes.
Free Access to the Dataset: This is available free of charge to all individuals and organizations for educational and research use. This is to support the advancement of knowledge and studies related to biodiversity, machine learning, and related fields.
Future Collaboration and Data Requests: While the dataset is provided free of charge, I encourage individuals and organizations to contact me directly if they need access to additional related data, further assistance, or if they plan on expanding their research in the future.
If you require any new data or specific related datasets, feel free to reach out to me, Shuvo Kumar Basak, for collaboration. I am happy to assist with additional data collection, cleaning, resizing, or other related services at a reasonable cost.
Paid Services - Hire for Data Collection: If you or your organization need custom data collection or wish to obtain related datasets beyond what is included in this collection, I offer a paid service to gather new data according to your specific requirements. This includes: Custom data collection for other tree species or related botanical data.
Data cleaning, resizing, and preprocessing to make the data ready for analysis.
Please contact me for a custom quote based on your specific needs. I will work with you to provide high-quality, tailored datasets to support your research, project, or business needs. Terms and Conditions: The dataset is intended for academic, research, and non-commercial purposes only. Redistribution or commercial use of the dataset without prior written consent is not permitted. Proper attribution to Shuvo Kumar Basak as the creator of the dataset should be provided when using the dataset in publications, projects, or other works.
More datasets:
1. https://www.kaggle.com/shuvokumarbasak4004/datasets
2. https://www.kaggle.com/shuvokumarbasak2030
Note for researchers using the dataset:
This dataset was created by Shuvo Kumar Basak. If you use this dataset for your research or academic purposes, please ensure to cite this dataset appropriately. If you have published your research using this dataset, please share a link to your paper. Good Luck.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains high-quality, preprocessed CWT (Continuous Wavelet Transform) scalogram images for binary classification of sleep apnea events from single-lead ECG signals. The dataset is derived from the PhysioNet Apnea-ECG Database and has been carefully filtered using multiple quality metrics to ensure optimal performance in deep learning models.
This dataset was created to train and evaluate the DREAM (Deep Residual-Enabled Apnea Monitor) model, achieving 99.93% accuracy in sleep apnea detection. It is ideal for:
- Sleep apnea detection research
- ECG-based biosignal classification
- Medical image classification tasks
- Explainable AI in healthcare
- Benchmark comparisons with state-of-the-art models
| Metric | Value |
|---|---|
| Total Images | Balanced dataset |
| Image Format | PNG |
| Image Dimensions | 128 × 180 pixels |
| Color Channels | Grayscale (1 channel) |
| Classes | 2 (Apnea, Non-Apnea) |
| Source | PhysioNet Apnea-ECG Database |
| Preprocessing | CWT + Multi-metric filtering |
| Quality Assurance | SNR, Entropy, Contrast, Skewness, Kurtosis filtering |
Comprehensive analysis using multiple quality metrics:
- Signal-to-Noise Ratio (SNR): Signal quality assessment
- Entropy: Information content measurement
- Contrast: Visual distinction evaluation
- Skewness: Distribution asymmetry
- Kurtosis: Distribution peakedness
- Intensity Ranges: Pixel value distribution
Scientific Justification:
- Machine Learning Theory: Clean class boundaries improve classification accuracy by eliminating "grey zone" samples
- Clinical Decision-Making: Prioritizes reducing false negatives in diagnostic contexts
- Biosignal Processing: Standard practice for medical signal quality control
Binary_Classification_Apnea/
├── apnea/ # Apnea event images
│ ├── image_001.png
│ ├── image_002.png
│ └── ...
└── non_apnea/ # Normal breathing images
├── image_001.png
├── image_002.png
└── ...
import os
import numpy as np
from PIL import Image
from sklearn.model_s...
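The loading snippet above is truncated in the source, so what follows is an independent sketch and not a reconstruction of it. It uses only the standard library and the folder layout shown in the tree. The label mapping (1 for apnea, 0 for non-apnea), the 80/20 split, and the seed are assumptions made for illustration.

```python
import random
from pathlib import Path

# Assumed label mapping; folder names come from the directory tree above.
CLASSES = {"non_apnea": 0, "apnea": 1}


def list_samples(root):
    """Return (path, label) pairs for every PNG under the two class folders."""
    samples = []
    for class_name, label in CLASSES.items():
        for path in sorted((Path(root) / class_name).glob("*.png")):
            samples.append((path, label))
    return samples


def stratified_split(samples, test_fraction=0.2, seed=42):
    """Split per class so both sets keep the original class proportions."""
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted({lbl for _, lbl in samples}):
        group = [s for s in samples if s[1] == label]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Example usage once the dataset is unpacked next to this script:
#   samples = list_samples("Binary_Classification_Apnea")
#   train, test = stratified_split(samples)
```

Splitting by image, as done here, can place segments from the same recording in both sets. If per-recording identifiers are available, splitting by recording gives a stricter estimate of generalization.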
Bipolar disorder is a mental disorder. In this study, I analysed the handwriting of people with bipolar disorder using image processing techniques. All image data were gathered from a survey. For a general definition of bipolar disorder, see the Nature article: https://www.nature.com/articles/s41380-021-01091-4 The handwriting is in Persian.
The paper on this dataset is available at the link below: https://doi.org/10.22060/ajmc.2024.22576.1176
CC0 1.0 (Public Domain) https://creativecommons.org/publicdomain/zero/1.0/
Dataset Title: Exploring Mars: A Comprehensive Dataset of Rover Photos and Metadata
Description
This dataset provides an extensive collection of Mars rover images paired with in-depth metadata. Sourced from various Mars missions, this dataset is a treasure trove for anyone interested in space exploration, planetary science, or computer vision.
Components:
Dataset Origin
The dataset was compiled from various Mars missions conducted over the years. Special care has been taken to include a diverse set of images to enable a wide range of analyses and applications.
Objective
As a learner delving into the field of Computer Vision, my objectives for this project are multi-fold:
Research Questions
Tools and Technologies
I plan to utilize Python for this project, particularly libraries like OpenCV for image processing, Pandas for data manipulation, and Matplotlib/Seaborn for data visualization. For machine learning tasks, I will likely use scikit-learn or TensorFlow.
Learning and Development
This project serves as both a learning exercise and a stepping stone toward more complex computer vision projects. I aim to document my learning journey, challenges, and milestones in a series of Kaggle notebooks.
Collaboration and Feedback
I warmly invite the Kaggle community to offer suggestions, critiques, or even collaborate on this venture. Your insights could be invaluable in enhancing the depth and breadth of this project.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset contains curated information about gaming laptops listed on Amazon India under a ₹1,00,000 budget, collected via web scraping in 2024. It includes laptop titles, pricing details, customer ratings, discounts, and review counts — useful for anyone looking to analyze laptop market trends, price-performance ratios, or build recommendation models.
The data has been cleaned and formatted for easy analysis, with all pricing and ratings converted to numerical format. Whether you're a gamer on a budget or a data analyst exploring tech e-commerce, this dataset is ready for EDA.
| Column                | Description                              |
| --------------------- | ---------------------------------------- |
| Product Title | Full name of the laptop (brand + specs) |
| Product Price | Final price in INR (cleaned) |
| Original Price | Original price before discount |
| Discount Percentage | Discount percentage (if any) |
| Product Rating | Average customer rating (out of 5) |
| Number of Ratings | Total number of user ratings |
| Product Image URL | Link to the product image (optional use) |
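The description says prices were converted to numerical format. The sketch below shows one way such cleaning and the discount calculation could be done; it is not the author's pipeline. The raw strings are hypothetical examples of how Indian-style prices (with lakh grouping such as `₹1,00,000`) might appear before cleaning.

```python
import re


def parse_inr(text):
    """Convert a price string such as '₹1,00,000' or 'Rs. 74,990.00' to a float."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    return float(match.group().replace(",", ""))


def discount_percentage(original_price, final_price):
    """Discount relative to the original price, rounded to one decimal place."""
    if original_price <= 0:
        raise ValueError("original price must be positive")
    return round((original_price - final_price) / original_price * 100, 1)


# Hypothetical raw values for one listing.
original = parse_inr("₹1,00,000")
final = parse_inr("Rs. 74,990.00")
print(original, final, discount_percentage(original, final))
```

Recomputing the discount from the two price columns is also a quick consistency check against the Discount Percentage column.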
🔍 Use Cases
📈 Price trend analysis on Indian laptops
⚖️ Budget vs specs comparison
🛒 E-commerce analytics for gaming laptops
🤖 Input for machine learning models (recommendation/classification)
📊 EDA or visualization dashboards
⚠️ Disclaimer
This dataset was created for educational and research purposes only. It contains publicly available data collected from Amazon.in. The author does not claim ownership of any product information. This work is not affiliated with or endorsed by Amazon.
Open Data Commons Attribution License (ODC-By) v1.0 https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
The "Colombo Cafes 🍵: Ratings & Insights Dataset" offers a detailed exploration of the café scene in Colombo, designed to cater to a variety of data science projects. This dataset is primed for in-depth analyses, from understanding consumer preferences to predictive modeling despite its concise size.
This dataset provides a snapshot of Colombo's vibrant café culture, encapsulating key data points that reflect the diversity and richness of the city's coffee spots. It serves as an essential tool for those looking to delve into the dynamics of the café industry in Colombo.
The dataset's structured format and comprehensive data points make it an ideal candidate for a range of data science applications. Researchers and analysts can employ this dataset for exploratory data analysis, sentiment analysis, trend identification, and even for developing sophisticated machine learning models aimed at predicting café popularity or customer preferences.
After refinement, the dataset comprises several key columns:
- title: The unique name of each café, crucial for identifying and distinguishing between different establishments.
- totalScore: The average customer rating for each café, on a scale from 1 to 5, with missing values imputed with the dataset's mean score to ensure data consistency.
- reviewsCount: The number of reviews for each café, providing insights into customer engagement; missing values are set to zero.
- street, city, countryCode: These columns offer detailed location information for each café, enabling geographic analyses and mapping.
- website: The official website URL for the café, where available, providing a direct link for further information.
- phone: The contact number for each café, facilitating communication and inquiries.
- categoryName: The classification of the establishment (e.g., Café, Coffee Shop), useful for segmenting and analyzing the data.
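The two imputation rules described for totalScore and reviewsCount can be sketched in a few lines. The café names and values below are invented for illustration; only the column names and the rules (mean for missing scores, zero for missing review counts) come from the description.

```python
from statistics import mean

# Hypothetical raw rows; None marks a missing value.
raw_rows = [
    {"title": "Cafe A", "totalScore": 4.5, "reviewsCount": 120},
    {"title": "Cafe B", "totalScore": None, "reviewsCount": None},
    {"title": "Cafe C", "totalScore": 3.5, "reviewsCount": 40},
]


def impute(rows):
    """Fill missing totalScore with the mean score and missing reviewsCount with 0."""
    known_scores = [r["totalScore"] for r in rows if r["totalScore"] is not None]
    fill_score = mean(known_scores) if known_scores else None
    cleaned = []
    for row in rows:
        row = dict(row)  # copy so the raw rows stay untouched
        if row["totalScore"] is None:
            row["totalScore"] = fill_score
        if row["reviewsCount"] is None:
            row["reviewsCount"] = 0
        cleaned.append(row)
    return cleaned


cleaned_rows = impute(raw_rows)
for row in cleaned_rows:
    print(row)
```

Mean imputation keeps the average rating unchanged but shrinks its variance, so an imputed café with zero reviews should not be read as an average-rated one. Filtering on `reviewsCount > 0` is a simple way to exclude imputed scores from rating analyses.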
This dataset is compiled with strict adherence to ethical data mining practices, ensuring the privacy and confidentiality of all sourced information while maintaining high standards of data accuracy and integrity.
Gratitude is extended to platforms such as Google, whose repositories of user-generated content have been invaluable in assembling this detailed dataset. Their contribution has been instrumental in capturing the essence of Colombo's café culture.
The dataset is enhanced by visual elements that depict the ambiance and aesthetic of Colombo's cafés, with specific recognition given to an image accessible here, enriching the narrative and offering a visual representation of the data contained within.
A bank sells multiple products to its customers, such as savings accounts, credit cards, and investments. It wants to know which customers will purchase its credit cards. For this purpose it holds various information about customers' demographic details, banking behavior, and so on. Once it can predict the chances that a customer will purchase a product, it wants to use the same to make pre-payment to the authors.
In this part I will demonstrate how to build a machine learning model to predict which clients will subscribe to a term deposit. The first part deals with the description and visualization of the analysed data, and the second with data classification models.
- Desired target
- Data understanding
- Preprocessing data
- Machine learning model
- Prediction
- Comparing results
Predict whether a client will subscribe (yes/no) to a term deposit. This is defined as a classification problem.
The dataset (Assignment-2_data.csv) used in this assignment contains bank customers’ data. File name: Assignment-2_Data. File format: .csv. Number of rows: 45212. Number of attributes: 17 non-empty conditional attributes and one decision attribute.
https://user-images.githubusercontent.com/91852182/143783430-eafd25b0-6d40-40b8-ac5b-1c4f67ca9e02.png
https://user-images.githubusercontent.com/91852182/143783451-3e49b817-29a6-4108-b597-ce35897dda4a.png
Data pre-processing is a key step in machine learning: the useful information that can be derived from the dataset directly affects model quality, so it is important to do at least the necessary preprocessing before feeding the data into the model.
In this assignment, we use Python to develop a predictive machine learning model. First, we import the necessary libraries.
Below we can see that there are various numerical and categorical columns. The most important column here is y, the output variable (desired target): it tells us whether the client subscribed to a term deposit (binary: ‘yes’, ‘no’).
https://user-images.githubusercontent.com/91852182/143783456-78c22016-149b-4218-a4a5-765ca348f069.png
We must check whether the dataset has any missing values and whether it has any duplicated rows.
https://user-images.githubusercontent.com/91852182/143783471-a8656640-ec57-4f38-8905-35ef6f3e7f30.png
We can see that 'age' has 9 missing values and 'balance' has 3. Because the dataset has around 45k rows, I will remove these rows from the dataset. Pic 1 and Pic 2 show the data before and after.
https://user-images.githubusercontent.com/91852182/143783474-b3898011-98e3-43c8-bd06-2cfcde714694.png
From the above analysis we can see that only 5289 people out of 45200 have subscribed, which is roughly 12%. The dataset is highly imbalanced, which we need to keep in mind.
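The practical consequence of the imbalance is easiest to see with a little arithmetic. The sketch below uses only the two counts quoted above. The `drop_missing` helper and its sample rows are hypothetical stand-ins for the row removal described earlier.

```python
# Counts quoted in the text above.
subscribed = 5289
total = 45200

positive_rate = subscribed / total
# A model that always predicts "no" is right for every non-subscriber.
baseline_accuracy = 1 - positive_rate

print(f"positive rate:      {positive_rate:.1%}")
print(f"always-'no' accuracy: {baseline_accuracy:.1%}")


def drop_missing(rows, columns):
    """Keep only rows where none of the given columns is None."""
    return [r for r in rows if all(r.get(c) is not None for c in columns)]


# Hypothetical rows illustrating the removal of missing 'age' and 'balance'.
sample = [
    {"age": 35, "balance": 1200, "y": "no"},
    {"age": None, "balance": 300, "y": "yes"},
    {"age": 52, "balance": None, "y": "no"},
]
kept = drop_missing(sample, ["age", "balance"])
print(len(kept), "of", len(sample), "rows kept")
```

Because a trivial classifier already scores about 88% accuracy, metrics such as recall, precision, or F1 on the "yes" class say more about model quality than accuracy alone.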
https://user-images.githubusercontent.com/91852182/143783534-a05020a8-611d-4da1-98cf-4fec811cb5d8.png
Our list of categorical variables.
https://user-images.githubusercontent.com/91852182/143783542-d40006cd-4086-4707-a683-f654a8cb2205.png
Our list of numerical variables.
https://user-images.githubusercontent.com/91852182/143783551-6b220f99-2c4d-47d0-90ab-18ede42a4ae5.png
In the boxplot above we can see some points at a very young age as well as at impossible ages. So,
https://user-images.githubusercontent.com/91852182/143783564-ad0e2a27-5df5-4e04-b5d7-6d218cabd405.png
https://user-images.githubusercontent.com/91852182/143783589-5abf0a0b-8bab-4192-98c8-d2e04f32a5c5.png
Now we have no issues with this feature, so we can use it.
https://user-images.githubusercontent.com/91852182/143783599-5205eddb-a0f5-446d-9f45-cc1adbfcce67.png
https://user-images.githubusercontent.com/91852182/143783601-e520d59c-3b21-4627-a9bb-cac06f415a1e.png
https://user-images.githubusercontent.com/91852182/143783634-03e5a584-a6fb-4bcb-8dc5-1f3cc50f9507.png
https://user-images.githubusercontent.com/91852182/143783640-f6e71323-abbe-49c1-9935-35ffb2d10569.png
This attribute highly affects the output target (e.g., if duration=0 then y=’no’). Yet, the duration is not known before a call is performed. Also, after the end of the call y is obviously known. Thus, this input should only be included for benchmark purposes...
Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
This dataset shows data on the import of candles to Kazakhstan for 2022. Points to analyze:
- **Supplier countries (Production Country Code):** market share in terms of volume and financing
- **Supplier companies (Code of Sender):** price per kilogram, market share in terms of quantity and financing; type of transportation
The following points are important for understanding the candle market in Kazakhstan:
- **Recipient companies (Name of Receiver):** market share in terms of volume and financing
Please note that company names may be written in different ways in the raw data, for example "TOO FLIP.KZ " and "FLIP.KZ ТОО". These are the same company but are counted as different because of the inconsistent naming, so their data should be combined. The same applies to other receivers with multiple names to get a clear picture.
- **Product classification (Product Description):** which products are the most popular
Again, please note that the names are messy: one product may be classified in several ways, and these must be reconciled to see an objective pattern, although this may not be an easy task.
The main task of this project is to determine the most popular type of product. We believe that data analysis and data science tools can help us answer this interesting question.
If you have any questions, please let us know in the comments, we will be happy to discuss them.
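One possible way to merge receiver names that differ only in the position or script of the legal-form prefix is sketched below. It handles only the "TOO" / "ТОО" case mentioned in the description; the second company name and all weights are invented, and real data will likely need more rules (other legal forms, typos, transliteration).

```python
import re
from collections import defaultdict

# Latin and Cyrillic spellings of the legal form seen in the example above.
# Extend this set as other forms turn up in the data.
LEGAL_FORMS = {"TOO", "ТОО"}


def normalize_company(name):
    """Uppercase, strip quotes and extra spaces, and drop legal-form tokens."""
    tokens = re.sub(r"[\"'«»,]", " ", name).upper().split()
    return " ".join(t for t in tokens if t not in LEGAL_FORMS)


# Hypothetical (receiver name, weight in kg) records.
records = [
    ("TOO FLIP.KZ ", 120.0),
    ("FLIP.KZ ТОО", 80.0),
    ("TOO SVECHA", 50.0),
]

totals = defaultdict(float)
for name, weight_kg in records:
    totals[normalize_company(name)] += weight_kg

print(dict(totals))
```

Keeping the original name alongside the normalized key makes it easy to review which spellings were merged before computing market shares.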