Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Tree Classification My Project is a dataset for classification tasks - it contains Trees Class annotations for 554 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Classify Project is a dataset for classification tasks - it contains Disease annotations for 675 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Trash Classification Project is a dataset for object detection tasks - it contains Trash annotations for 404 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Replication package related to the paper "Into the ML-universe: An Improved Classification and Characterization of Machine-Learning Projects" which includes the results of the various steps of our study with related plots, and the tool we built to classify our projects.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Classification Project is a dataset for classification tasks - it contains Cells annotations for 3,069 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Location Classification Dataset
Dataset Summary
This dataset contains images extracted from videos for scene classification into 4 categories:
Cafe, Gym, Library, Outdoor
Purpose
The dataset was created as part of a course project to perform location classification from an input image of the user's surroundings. The dataset represents real-world indoor and outdoor environments with varying lighting conditions, angles, and compositions.
Composition… See the full description on the dataset page: https://huggingface.co/datasets/madhavkarthi/project-1-location-classification-dataset.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset contains high-quality images of 5 different animal species: Cat, Cow, Lion, Deer, Dog — commonly used in beginner-level computer vision tasks such as image classification and model benchmarking.
Each image has been manually selected to ensure clarity and proper labeling.
The dataset is organized into three main folders:
animals_dataset/
├── train/
├── val/
└── test/
Each of these folders contains 5 subfolders (one for each class):
├── cat/
├── cow/
├── lion/
├── deer/
└── dog/
All images are in .jpg format and have been resized to a consistent shape (e.g., 224x224) for ease of use in deep learning models.
This dataset is ideal for beginner-level computer vision tasks such as image classification and model benchmarking; a minimal loading sketch follows.
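A minimal loading sketch under the folder layout above, using torchvision's ImageFolder; the root path is a placeholder and the batch size is arbitrary:
```
# Minimal sketch (assumption: the animals_dataset/ folder sits in the working directory).
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),  # images are already 224x224; resizing is defensive
    transforms.ToTensor(),
])

train_set = datasets.ImageFolder("animals_dataset/train", transform=transform)
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)

print(train_set.classes)  # ['cat', 'cow', 'deer', 'dog', 'lion'], assigned alphabetically
```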
All images in this dataset are sourced from publicly available internet sources for educational and non-commercial research purposes only. If you are the owner of any image and wish to request removal, please contact us.
This GitLab project contains the training data that was used for the metadata machine learning classification project.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
CS 4375 term project data compilation, labeled and converted to .csv
Xverum’s AI & ML Training Data provides one of the most extensive datasets available for AI and machine learning applications, featuring 800M B2B profiles with 100+ attributes. This dataset is designed to enable AI developers, data scientists, and businesses to train robust and accurate ML models. From natural language processing (NLP) to predictive analytics, our data empowers a wide range of industries and use cases with unparalleled scale, depth, and quality.
What Makes Our Data Unique?
Scale and Coverage:
- A global dataset encompassing 800M B2B profiles from a wide array of industries and geographies.
- Includes coverage across the Americas, Europe, Asia, and other key markets, ensuring worldwide representation.
Rich Attributes for Training Models:
- Over 100 fields of detailed information, including company details, job roles, geographic data, industry categories, past experiences, and behavioral insights.
- Tailored for training models in NLP, recommendation systems, and predictive algorithms.
Compliance and Quality:
- Fully GDPR and CCPA compliant, providing secure and ethically sourced data.
- Extensive data cleaning and validation processes ensure reliability and accuracy.
Annotation-Ready:
- Pre-structured and formatted datasets that are easily ingestible into AI workflows.
- Ideal for supervised learning with tagging options such as entities, sentiment, or categories.
How Is the Data Sourced?
- Publicly available information gathered through advanced, GDPR-compliant web aggregation techniques.
- Proprietary enrichment pipelines that validate, clean, and structure raw data into high-quality datasets.
This approach ensures we deliver comprehensive, up-to-date, and actionable data for machine learning training.
Primary Use Cases and Verticals
Natural Language Processing (NLP): Train models for named entity recognition (NER), text classification, sentiment analysis, and conversational AI. Ideal for chatbots, language models, and content categorization.
Predictive Analytics and Recommendation Systems: Enable personalized marketing campaigns by predicting buyer behavior. Build smarter recommendation engines for ecommerce and content platforms.
B2B Lead Generation and Market Insights: Create models that identify high-value leads using enriched company and contact information. Develop AI systems that track trends and provide strategic insights for businesses.
HR and Talent Acquisition AI: Optimize talent-matching algorithms using structured job descriptions and candidate profiles. Build AI-powered platforms for recruitment analytics.
How This Product Fits Into Xverum’s Broader Data Offering Xverum is a leading provider of structured, high-quality web datasets. While we specialize in B2B profiles and company data, we also offer complementary datasets tailored for specific verticals, including ecommerce product data, job listings, and customer reviews. The AI Training Data is a natural extension of our core capabilities, bridging the gap between structured data and machine learning workflows. By providing annotation-ready datasets, real-time API access, and customization options, we ensure our clients can seamlessly integrate our data into their AI development processes.
Why Choose Xverum?
- Experience and Expertise: A trusted name in structured web data with a proven track record.
- Flexibility: Datasets can be tailored for any AI/ML application.
- Scalability: With 800M profiles and more being added, you’ll always have access to fresh, up-to-date data.
- Compliance: We prioritize data ethics and security, ensuring all data adheres to GDPR and other legal frameworks.
Ready to supercharge your AI and ML projects? Explore Xverum’s AI Training Data to unlock the potential of 800M global B2B profiles. Whether you’re building a chatbot, predictive algorithm, or next-gen AI application, our data is here to help.
Contact us for sample datasets or to discuss your specific needs.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The classification of variable objects provides insight into a wide variety of astrophysics ranging from stellar interiors to galactic nuclei. The Zwicky Transient Facility (ZTF) provides time series observations that record the variability of more than a billion sources. The scale of these data necessitates automated approaches to make a thorough analysis. Building on previous work, this paper reports the results of the ZTF Source Classification Project (SCoPe), which trains neural network and XGBoost machine learning (ML) algorithms to perform dichotomous classification of variable ZTF sources using a manually constructed training set containing 170,632 light curves. We find that several classifiers achieve high precision and recall scores, suggesting the reliability of their predictions for 373,819,334 light curves across 210 ZTF fields. We also identify the most important features for XGB classification and compare the performance of the two ML algorithms, finding a pattern of higher precision among XGB classifiers. The resulting classification catalog is available to the public, and the software developed for SCoPe is open-source and adaptable to future time-domain surveys.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
IOPA Classification Project is a dataset for classification tasks - it contains Objects annotations for 925 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These four labeled data sets are targeted at ordinal quantification. The goal of quantification is not to predict the label of each individual instance, but the distribution of labels in unlabeled sets of data.
With the scripts provided, you can extract CSV files from the UCI machine learning repository and from OpenML. The ordinal class labels stem from a binning of a continuous regression label.
We complement this data set with the indices of data items that appear in each sample of our evaluation. Hence, you can precisely replicate our samples by drawing the specified data items. The indices stem from two evaluation protocols that are well suited for ordinal quantification. To this end, each row in the files app_val_indices.csv, app_tst_indices.csv, app-oq_val_indices.csv, and app-oq_tst_indices.csv represents one sample.
Our first protocol is the artificial prevalence protocol (APP), where all possible distributions of labels are drawn with an equal probability. The second protocol, APP-OQ, is a variant thereof, where only the smoothest 20% of all APP samples are considered. This variant is targeted at ordinal quantification tasks, where classes are ordered and a similarity of neighboring classes can be assumed.
Usage
You can extract four CSV files through the provided script extract-oq.jl, which is conveniently wrapped in a Makefile. The Project.toml and Manifest.toml specify the Julia package dependencies, similar to a requirements file in Python.
Preliminaries: You need a working Julia installation. We used Julia v1.6.5 in our experiments.
Data Extraction: In your terminal, you can call either
make
(recommended), or
julia --project="." --eval "using Pkg; Pkg.instantiate()"
julia --project="." extract-oq.jl
Outcome: The first row in each CSV file is the header. The first column, named "class_label", is the ordinal class.
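For orientation, here is a minimal Python sketch of how an extracted data CSV and the provided index files might be combined to replicate one evaluation sample; the data file name is a placeholder, and the index-file layout (one sample per row) follows the description above:
```
# Minimal sketch (assumptions: "data.csv" stands for one of the four extracted CSV
# files; each row of app_val_indices.csv lists the data items of one sample).
import pandas as pd

data = pd.read_csv("data.csv")
indices = pd.read_csv("app_val_indices.csv", header=None)

# Replicate the first validation sample by drawing the specified data items.
sample = data.iloc[indices.iloc[0].dropna().astype(int)]

# The label distribution of this sample is the quantification target.
print(sample["class_label"].value_counts(normalize=True).sort_index())
```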
Further Reading
Implementation of our experiments: https://github.com/mirkobunse/regularized-oq
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Advancing Homepage2Vec with LLM-Generated Datasets for Multilingual Website Classification
This dataset contains two subsets of labeled website data, specifically created to enhance the performance of Homepage2Vec, a multi-label model for website classification. The datasets were generated using Large Language Models (LLMs) to provide more accurate and diverse topic annotations for websites, addressing a limitation of existing Homepage2Vec training data.
Key Features:
LLM-generated annotations: Both datasets feature website topic labels generated using LLMs, a novel approach to creating high-quality training data for website classification models.
Improved multi-label classification: Fine-tuning Homepage2Vec with these datasets has been shown to improve its macro F1 score from 38% to 43% when evaluated on a human-labeled dataset, demonstrating their effectiveness in capturing a broader range of website topics.
Multilingual applicability: The datasets facilitate classification of websites in multiple languages, reflecting the inherent multilingual nature of Homepage2Vec.
Dataset Composition:
curlie-gpt3.5-10k: 10,000 websites labeled using GPT-3.5, context 2 and 1-shot
curlie-gpt4-10k: 10,000 websites labeled using GPT-4, context 2 and zero-shot
Intended Use:
Fine-tuning and advancing Homepage2Vec or similar website classification models
Research on LLM-generated datasets for text classification tasks
Exploration of multilingual website classification
Additional Information:
Project and report repository: https://github.com/CS-433/ml-project-2-mlp
Acknowledgments:
This dataset was created as part of a project at EPFL's Data Science Lab (DLab) in collaboration with Prof. Robert West and Tiziano Piccardi.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset comprises English sentences labeled with their corresponding tense categories. It is intended for use in natural language processing (NLP) and machine learning projects to classify the tense of English sentences. Each entry includes a sentence and a numerical label representing its tense.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Dataset Card for Code Comment Classification
Dataset Summary
The dataset contains class comments extracted from various large and diverse open-source projects written in three programming languages: Java, Smalltalk, and Python.
Supported Tasks and Leaderboards
Single-label text classification and Multi-label text classification
Languages
Java, Python, Smalltalk
Dataset Structure
Data Instances
{ "class" : "Absy.java", "comment":"*… See the full description on the dataset page: https://huggingface.co/datasets/poojaruhal/Code-comment-classification.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Fruit Classification and Freshness Detection Dataset
🔍 Overview
This dataset has been curated to support research and development in fruit classification and freshness detection using deep learning. It is designed for hybrid models that integrate YOLOv8 for real-time object detection with Convolutional Neural Networks (CNNs) for assessing fruit freshness. The dataset encompasses a diverse range of images captured under varying lighting conditions and angles, simulating real-world scenarios such as grocery stores, farms, and storage facilities.
The dataset comprises 8,099 high-resolution images of three commonly consumed fruits—apples, bananas, and oranges—each categorized into fresh and rotten conditions. Every image has been manually annotated in the YOLO format to aid object detection tasks and labeled for binary classification (Fresh/Rotten), enabling comprehensive model training.
📁 Dataset Structure
Total Images: 8,099
Training Set: 6,508 images (80%)
Test Set: 1,591 images (20%)
Classes (6 total):
Fresh Apples
Rotten Apples
Fresh Bananas
Rotten Bananas
Fresh Oranges
Rotten Oranges
Annotations: Provided in YOLO format using LabelImg
Image Format: JPG, resized to 300x300 pixels
Captured With: Smartphone camera under varied lighting and angles
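As a quick orientation for the detection side, the sketch below shows how a YOLOv8 model could be trained on this structure with the Ultralytics package; the dataset YAML file and hyperparameters are illustrative assumptions, not artifacts shipped with the dataset.
```
# Minimal sketch (assumptions: a fruit_data.yaml describing the train/test folders
# and the six class names has been written; epochs and model size are illustrative).
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                                  # pretrained YOLOv8 nano weights
model.train(data="fruit_data.yaml", epochs=50, imgsz=300)   # 300x300 images, as noted above
metrics = model.val()                                       # reports mAP@0.5 and mAP@0.5:0.95
```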
🧠 Applications
This dataset is ideal for:
Object Detection using YOLOv8
Freshness classification using CNN
Hybrid models combining detection and classification
Computer vision projects in smart agriculture, food safety, and automated retail systems
📊 Sample Use Case
A hybrid deep learning model utilizing this dataset achieved:
Object Detection (YOLOv8):
mAP@0.5: 98%
mAP@0.5:0.95: 87%
Freshness Classification (CNN):
Test Accuracy: 97.6%
These results underscore the dataset’s suitability for high-performance, real-time AI applications in agricultural automation and food quality assessment.
👨💻 Contributors
Prof. Shubhashree Sahoo
Dr. Sitanath Biswas
Mr. Shubham Kumar Sah
Mr. Chirag Nahata
Special thanks to Dr. Soumobroto Saha and Prof. (Dr.) Partha Sarkar for their invaluable guidance and support throughout this research endeavor.
CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains machine learning and volunteer classifications from the Gravity Spy project. It includes glitches from observing runs O1, O2, O3a and O3b that received at least one classification from a registered volunteer in the project. It also indicates glitches that are nominally retired from the project using our default set of retirement parameters, which are described below. See more details in the Gravity Spy Methods paper.
When a particular subject in a citizen science project (in this case, glitches from the LIGO datastream) is deemed to be classified sufficiently it is "retired" from the project. For the Gravity Spy project, retirement depends on a combination of both volunteer and machine learning classifications, and a number of parameterizations affect how quickly glitches get retired. For this dataset, we use a default set of retirement parameters, the most important of which are:
The choice of these and other parameterizations will affect the accuracy of the retired dataset as well as the number of glitches that are retired, and will be explored in detail in an upcoming publication (Zevin et al. in prep).
The dataset can be read in using e.g. Pandas:
```
import pandas as pd
dataset = pd.read_hdf('retired_fulldata_min2_max50_ret0p9.hdf5', key='image_db')
```
Each row in the dataframe contains information about a particular glitch in the Gravity Spy dataset.
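Because the full list of series is not reproduced here, a quick way to see what the dataframe provides is to inspect it directly, continuing the snippet above:
```
# Inspect the available series (columns) and the first few glitches.
print(dataset.columns.tolist())
print(dataset.head())
```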
Description of series in dataframe
For machine learning classifications on all glitches in O1, O2, O3a, and O3b, please see Gravity Spy Machine Learning Classifications on Zenodo
For the most recently uploaded training set used in Gravity Spy machine learning algorithms, please see Gravity Spy Training Set on Zenodo.
For detailed information on the training set used for the original Gravity Spy machine learning paper, please see Machine learning for Gravity Spy: Glitch classification and dataset on Zenodo.
AutoTrain Dataset for project: meme-classification
Dataset Description
This dataset has been automatically processed by AutoTrain for project meme-classification.
Languages
The BCP-47 code for the dataset's language is unk.
Dataset Structure
Data Instances
A sample from this dataset looks as follows: [ { "image": "<657x657 RGB PIL image>", "target": 1 }, { "image": "<1124x700 RGB PIL image>", "target": 0 }]… See the full description on the dataset page: https://huggingface.co/datasets/Hrishikesh332/autotrain-data-meme-classification.