100+ datasets found
  1. Tree Classification My Project Dataset

    • universe.roboflow.com
    zip
    Updated Jun 12, 2025
    + more versions
    Cite
    Machine learning engineer (2025). Tree Classification My Project Dataset [Dataset]. https://universe.roboflow.com/machine-learning-engineer-vwukw/tree-classification-my-project/dataset/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 12, 2025
    Dataset authored and provided by
    Machine learning engineer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Trees Class
    Description

    Tree Classification My Project

    ## Overview
    
    Tree Classification My Project is a dataset for classification tasks - it contains Trees Class annotations for 554 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  2. Classify Project Dataset

    • universe.roboflow.com
    zip
    Updated Sep 10, 2024
    Cite
    project (2024). Classify Project Dataset [Dataset]. https://universe.roboflow.com/project-3sqgw/classify-project
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 10, 2024
    Dataset authored and provided by
    project
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Disease
    Description

    Classify Project

    ## Overview
    
    Classify Project is a dataset for classification tasks - it contains Disease annotations for 675 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  3. Trash Classification Project Dataset

    • universe.roboflow.com
    zip
    Updated Oct 20, 2024
    + more versions
    Cite
    autonomous (2024). Trash Classification Project Dataset [Dataset]. https://universe.roboflow.com/autonomous-u9gdl/trash-classification-project
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 20, 2024
    Dataset authored and provided by
    autonomous
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Trash Bounding Boxes
    Description

    Trash Classification Project

    ## Overview
    
    Trash Classification Project is a dataset for object detection tasks - it contains Trash annotations for 404 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  4. Data from: Into the ML-universe: An Improved Classification and...

    • figshare.com
    zip
    Updated Mar 12, 2025
    Cite
    Vincenzo De Martino; Gilberto Recupito; Giammaria Giordano; Filomena Ferrucci; Dario Di Nucci; Fabio Palomba (2025). Into the ML-universe: An Improved Classification and Characterization of Machine-Learning Projects [Dataset]. http://doi.org/10.6084/m9.figshare.25974886.v9
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 12, 2025
    Dataset provided by
    figshare
    Authors
    Vincenzo De Martino; Gilberto Recupito; Giammaria Giordano; Filomena Ferrucci; Dario Di Nucci; Fabio Palomba
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Replication package related to the paper "Into the ML-universe: An Improved Classification and Characterization of Machine-Learning Projects" which includes the results of the various steps of our study with related plots, and the tool we built to classify our projects.

  5. Classification Project Dataset

    • universe.roboflow.com
    zip
    Updated Jul 4, 2025
    + more versions
    Cite
    FYP (2025). Classification Project Dataset [Dataset]. https://universe.roboflow.com/fyp-7ytaq/classification-project-h6wjs/dataset/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 4, 2025
    Dataset authored and provided by
    FYP
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Cells
    Description

    Classification Project

    ## Overview
    
    Classification Project is a dataset for classification tasks - it contains Cells annotations for 3,069 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  6. project-1-location-classification-dataset

    • huggingface.co
    Updated Oct 9, 2025
    Cite
    Madhav Karthikeyakannan (2025). project-1-location-classification-dataset [Dataset]. https://huggingface.co/datasets/madhavkarthi/project-1-location-classification-dataset
    Explore at:
    Dataset updated
    Oct 9, 2025
    Authors
    Madhav Karthikeyakannan
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Location Classification Dataset

      Dataset Summary
    

    This dataset contains images extracted from videos for scene classification into 4 categories:

    Cafe, Gym, Library, Outdoor

      Purpose
    

    The dataset was created as part of a course project to perform location classification from an input image of the user's surroundings. The dataset represents real-world indoor and outdoor environments with varying lighting conditions, angles, and compositions.

      Composition… See the full description on the dataset page: https://huggingface.co/datasets/madhavkarthi/project-1-location-classification-dataset.
    
  7. Animal Image Classification – 5 Species

    • kaggle.com
    Updated Jul 27, 2025
    Cite
    Arif Miah (2025). Animal Image Classification – 5 Species [Dataset]. https://www.kaggle.com/datasets/miadul/animal-image-classification-5-species
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 27, 2025
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Arif Miah
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    🐾 Animal Image Classification Dataset – 5 Classes

    📌 About the Dataset

    This dataset contains high-quality images of 5 different animal species: Cat, Cow, Lion, Deer, Dog — commonly used in beginner-level computer vision tasks such as image classification and model benchmarking.

    • Total Classes: 5
    • Images per Class: 100
    • Total Images: 500
    • Image Source: Public images downloaded from the internet (Google Images, Bing, etc.)

    Each image has been manually selected to ensure clarity and proper labeling.

    📁 Dataset Structure

    The dataset is organized into three main folders:

    animals_dataset/
    ├── train/
    ├── val/
    └── test/
    

    Each of these folders contains 5 subfolders (one for each class):

    ├── cat/
    ├── cow/
    ├── lion/
    ├── deer/
    └── dog/
    
    • Training Set: 70 images per class (70%)
    • Validation Set: 15 images per class (15%)
    • Test Set: 15 images per class (15%)

    All images are in .jpg format and have been resized to a consistent shape (e.g., 224x224) for ease of use in deep learning models.
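
As a quick sanity check on the folder layout and the 70/15/15 split described above, the following sketch counts the .jpg files per split and class. It assumes the archive has been extracted to a local folder; the function name `count_images` is illustrative and not part of the dataset.

```python
from collections import Counter
from pathlib import Path

SPLITS = ("train", "val", "test")


def count_images(root):
    """Return {split: Counter({class_name: n_jpg_files})} for a split/class layout."""
    root = Path(root)
    counts = {}
    for split in SPLITS:
        per_class = Counter()
        split_dir = root / split
        if split_dir.is_dir():
            # Each subfolder of a split is treated as one class.
            for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
                per_class[class_dir.name] = sum(1 for _ in class_dir.glob("*.jpg"))
        counts[split] = per_class
    return counts
```

If the description is accurate, calling `count_images("animals_dataset")` on an extracted copy should report 70, 15, and 15 images per class for train, val, and test respectively.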

    🔍 Use Cases

    This dataset is ideal for:

    • Beginner-level image classification projects
    • CNN model training and evaluation
    • Transfer learning with pretrained models
    • Model benchmarking on a small, clean dataset

    ⚠️ Disclaimer

    All images in this dataset are sourced from publicly available internet sources for educational and non-commercial research purposes only. If you are the owner of any image and wish to request removal, please contact us.

  8. Data from: Metadata Classification Machine Learning Data

    • osti.gov
    Updated Sep 18, 2024
    Cite
    Collier, Hannah; Enright, Eric (2024). Metadata Classification Machine Learning Data [Dataset]. https://www.osti.gov/dataexplorer/biblio/dataset/2446583
    Explore at:
    Dataset updated
    Sep 18, 2024
    Dataset provided by
    Office of Science: http://www.er.doe.gov/
    Atmospheric Radiation Measurement (ARM) Archive, Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (US); ARM Data Center, Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
    Authors
    Collier, Hannah; Enright, Eric
    Description

    This GitLab project contains the training data that was used for the metadata machine learning classification project.

  9. CS 4375 Term Project - Classification

    • kaggle.com
    Updated Dec 9, 2020
    Cite
    Bach Nguyen (2020). CS 4375 Term Project - Classification [Dataset]. https://www.kaggle.com/bachnguyentfk/cs-4375-term-project-classification/code
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 9, 2020
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Bach Nguyen
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Content

    CS 4375 term project data compilation, labeled and converted to .csv

  10. Machine Learning (ML) Data | 800M+ B2B Profiles | AI-Ready for Deep Learning...

    • datarade.ai
    .json, .csv
    Cite
    Xverum, Machine Learning (ML) Data | 800M+ B2B Profiles | AI-Ready for Deep Learning (DL), NLP & LLM Training [Dataset]. https://datarade.ai/data-products/xverum-company-data-b2b-data-belgium-netherlands-denm-xverum
    Explore at:
    Available download formats: .json, .csv
    Dataset provided by
    Xverum LLC
    Authors
    Xverum
    Area covered
    Western Sahara, Dominican Republic, Norway, Barbados, India, Sint Maarten (Dutch part), Cook Islands, United Kingdom, Oman, Jordan
    Description

    Xverum’s AI & ML Training Data provides one of the most extensive datasets available for AI and machine learning applications, featuring 800M B2B profiles with 100+ attributes. This dataset is designed to enable AI developers, data scientists, and businesses to train robust and accurate ML models. From natural language processing (NLP) to predictive analytics, our data empowers a wide range of industries and use cases with unparalleled scale, depth, and quality.

    What Makes Our Data Unique?

    Scale and Coverage:
    • A global dataset encompassing 800M B2B profiles from a wide array of industries and geographies.
    • Includes coverage across the Americas, Europe, Asia, and other key markets, ensuring worldwide representation.

    Rich Attributes for Training Models:
    • Over 100 fields of detailed information, including company details, job roles, geographic data, industry categories, past experiences, and behavioral insights.
    • Tailored for training models in NLP, recommendation systems, and predictive algorithms.

    Compliance and Quality:
    • Fully GDPR and CCPA compliant, providing secure and ethically sourced data.
    • Extensive data cleaning and validation processes ensure reliability and accuracy.

    Annotation-Ready:
    • Pre-structured and formatted datasets that are easily ingestible into AI workflows.
    • Ideal for supervised learning with tagging options such as entities, sentiment, or categories.

    How Is the Data Sourced?
    • Publicly available information gathered through advanced, GDPR-compliant web aggregation techniques.
    • Proprietary enrichment pipelines that validate, clean, and structure raw data into high-quality datasets.

    This approach ensures we deliver comprehensive, up-to-date, and actionable data for machine learning training.

    Primary Use Cases and Verticals

    Natural Language Processing (NLP): Train models for named entity recognition (NER), text classification, sentiment analysis, and conversational AI. Ideal for chatbots, language models, and content categorization.

    Predictive Analytics and Recommendation Systems: Enable personalized marketing campaigns by predicting buyer behavior. Build smarter recommendation engines for ecommerce and content platforms.

    B2B Lead Generation and Market Insights: Create models that identify high-value leads using enriched company and contact information. Develop AI systems that track trends and provide strategic insights for businesses.

    HR and Talent Acquisition AI: Optimize talent-matching algorithms using structured job descriptions and candidate profiles. Build AI-powered platforms for recruitment analytics.

    How This Product Fits Into Xverum’s Broader Data Offering

    Xverum is a leading provider of structured, high-quality web datasets. While we specialize in B2B profiles and company data, we also offer complementary datasets tailored for specific verticals, including ecommerce product data, job listings, and customer reviews. The AI Training Data is a natural extension of our core capabilities, bridging the gap between structured data and machine learning workflows. By providing annotation-ready datasets, real-time API access, and customization options, we ensure our clients can seamlessly integrate our data into their AI development processes.

    Why Choose Xverum?
    • Experience and Expertise: A trusted name in structured web data with a proven track record.
    • Flexibility: Datasets can be tailored for any AI/ML application.
    • Scalability: With 800M profiles and more being added, you’ll always have access to fresh, up-to-date data.
    • Compliance: We prioritize data ethics and security, ensuring all data adheres to GDPR and other legal frameworks.

    Ready to supercharge your AI and ML projects? Explore Xverum’s AI Training Data to unlock the potential of 800M global B2B profiles. Whether you’re building a chatbot, predictive algorithm, or next-gen AI application, our data is here to help.

    Contact us for sample datasets or to discuss your specific needs.

  11. The data for "The ZTF Source Classification Project: III. A Catalog of...

    • zenodo.org
    bin, csv, json, png +1
    Updated Nov 13, 2024
    Cite
    Brian F. Healy; Michael W. Coughlin; Ashish A. Mahabal; Theophile Jegou du Laz; Andrew Drake; Matthew J. Graham; Lynne A. Hillenbrand; Jan van Roestel; Paula Szkody; LeighAnna Zielske; Mohammed Guiga; Muhammad Yusuf Hassan; Jill L. Hughes; Guy Nir; Saagar Parikh; Sungmin Park; Palak Purohit; Umaa Rebbapragada; Draco Reed; Daniel Warshofsky; Avery Wold; Joshua S. Bloom; Frank J. Masci; Reed Riddle; Roger Smith (2024). The data for "The ZTF Source Classification Project: III. A Catalog of Variable Sources" [Dataset]. http://doi.org/10.5281/zenodo.14155156
    Explore at:
    Available download formats: png, bin, zip, json, csv
    Dataset updated
    Nov 13, 2024
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Brian F. Healy; Michael W. Coughlin; Ashish A. Mahabal; Theophile Jegou du Laz; Andrew Drake; Matthew J. Graham; Lynne A. Hillenbrand; Jan van Roestel; Paula Szkody; LeighAnna Zielske; Mohammed Guiga; Muhammad Yusuf Hassan; Jill L. Hughes; Guy Nir; Saagar Parikh; Sungmin Park; Palak Purohit; Umaa Rebbapragada; Draco Reed; Daniel Warshofsky; Avery Wold; Joshua S. Bloom; Frank J. Masci; Reed Riddle; Roger Smith
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The classification of variable objects provides insight into a wide variety of astrophysics ranging from stellar interiors to galactic nuclei. The Zwicky Transient Facility (ZTF) provides time series observations that record the variability of more than a billion sources. The scale of these data necessitates automated approaches to make a thorough analysis. Building on previous work, this paper reports the results of the ZTF Source Classification Project (SCoPe), which trains neural network and XGBoost machine learning (ML) algorithms to perform dichotomous classification of variable ZTF sources using a manually constructed training set containing 170,632 light curves. We find that several classifiers achieve high precision and recall scores, suggesting the reliability of their predictions for 373,819,334 light curves across 210 ZTF fields. We also identify the most important features for XGB classification and compare the performance of the two ML algorithms, finding a pattern of higher precision among XGB classifiers. The resulting classification catalog is available to the public, and the software developed for SCoPe is open-source and adaptable to future time-domain surveys.

  12. Iopa Classification Project Dataset

    • universe.roboflow.com
    zip
    Updated Sep 11, 2025
    Cite
    Dental model train (2025). Iopa Classification Project Dataset [Dataset]. https://universe.roboflow.com/dental-model-train/iopa-classification-project-uawxm
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 11, 2025
    Dataset authored and provided by
    Dental model train
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Objects
    Description

    IOPA Classification Project

    ## Overview
    
    IOPA Classification Project is a dataset for classification tasks - it contains Objects annotations for 925 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  13. UCI and OpenML Data Sets for Ordinal Quantification

    • zenodo.org
    • data.niaid.nih.gov
    • +1 more
    zip
    Updated Jul 25, 2023
    Cite
    Mirko Bunse; Alejandro Moreo; Fabrizio Sebastiani; Martin Senz (2023). UCI and OpenML Data Sets for Ordinal Quantification [Dataset]. http://doi.org/10.5281/zenodo.8177302
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 25, 2023
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Mirko Bunse; Alejandro Moreo; Fabrizio Sebastiani; Martin Senz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These four labeled data sets are targeted at ordinal quantification. The goal of quantification is not to predict the label of each individual instance, but the distribution of labels in unlabeled sets of data.

    With the scripts provided, you can extract CSV files from the UCI machine learning repository and from OpenML. The ordinal class labels stem from a binning of a continuous regression label.
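
To illustrate what binning a continuous regression label into ordinal classes can look like, here is a minimal sketch. The bin edges and the function name `bin_label` are hypothetical; the dataset's own extraction script (`extract-oq.jl`) defines the real binning.

```python
from bisect import bisect_right


def bin_label(value, edges):
    """Map a continuous value to an ordinal class 1..len(edges)+1 using sorted bin edges."""
    # bisect_right counts how many edges are <= value; shifting by one gives 1-based classes.
    return bisect_right(edges, value) + 1


# Hypothetical edges that split the target range into four ordered classes.
EDGES = [2.0, 4.0, 6.0]
labels = [bin_label(v, EDGES) for v in (1.0, 3.7, 4.0, 10.0)]
```

Because the classes come from ordered intervals, neighboring class labels correspond to neighboring value ranges, which is what makes the task ordinal.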

    We complement this data set with the indices of data items that appear in each sample of our evaluation. Hence, you can precisely replicate our samples by drawing the specified data items. The indices stem from two evaluation protocols that are well suited for ordinal quantification. To this end, each row in the files app_val_indices.csv, app_tst_indices.csv, app-oq_val_indices.csv, and app-oq_tst_indices.csv represents one sample.

    Our first protocol is the artificial prevalence protocol (APP), where all possible distributions of labels are drawn with an equal probability. The second protocol, APP-OQ, is a variant thereof, where only the smoothest 20% of all APP samples are considered. This variant is targeted at ordinal quantification tasks, where classes are ordered and a similarity of neighboring classes can be assumed.
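
The two protocols can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: label distributions are drawn uniformly from the unit simplex, and smoothness is measured here by the sum of squared second differences, which is an assumed measure that the description above does not specify.

```python
import random


def sample_prevalences(n_classes, rng):
    """Draw one label distribution uniformly from the unit simplex (APP-style)."""
    # Normalised i.i.d. exponential draws follow Dirichlet(1, ..., 1),
    # which is the uniform distribution over the simplex.
    draws = [rng.expovariate(1.0) for _ in range(n_classes)]
    total = sum(draws)
    return [d / total for d in draws]


def roughness(p):
    """Sum of squared second differences; smaller means smoother (assumed measure)."""
    return sum((p[i - 1] - 2 * p[i] + p[i + 1]) ** 2 for i in range(1, len(p) - 1))


def app_oq(samples, keep=0.2):
    """Keep only the smoothest fraction of the APP samples (APP-OQ-style filter)."""
    n_keep = max(1, int(len(samples) * keep))
    return sorted(samples, key=roughness)[:n_keep]


rng = random.Random(0)
app_samples = [sample_prevalences(5, rng) for _ in range(1000)]
oq_samples = app_oq(app_samples)
```

The filter keeps distributions in which neighboring classes have similar prevalence, matching the stated assumption that ordered classes resemble their neighbors.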

    Usage

    You can extract four CSV files through the provided script extract-oq.jl, which is conveniently wrapped in a Makefile. The Project.toml and Manifest.toml specify the Julia package dependencies, similar to a requirements file in Python.

    Preliminaries: You have to have a working Julia installation. We have used Julia v1.6.5 in our experiments.

    Data Extraction: In your terminal, you can call either

    make

    (recommended), or

    julia --project="." --eval "using Pkg; Pkg.instantiate()"
    julia --project="." extract-oq.jl

    Outcome: The first row in each CSV file is the header. The first column, named "class_label", is the ordinal class.
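
Given that layout, the class prevalences that quantification tries to estimate can be computed from an extracted file along these lines. The sketch uses only the standard library; the feature column `x1` and the in-memory example data are made up for illustration.

```python
import csv
import io
from collections import Counter


def class_prevalences(csv_file):
    """Return {class_label: fraction of rows} for a CSV with a 'class_label' column."""
    reader = csv.DictReader(csv_file)  # the first row is consumed as the header
    counts = Counter(row["class_label"] for row in reader)
    total = sum(counts.values())
    return {label: n / total for label, n in sorted(counts.items())}


# Tiny in-memory stand-in for one of the extracted CSV files.
example = io.StringIO("class_label,x1\n1,0.3\n2,0.1\n2,0.7\n3,0.9\n")
prevalences = class_prevalences(example)
```

For a real file, pass an open file handle, for example `open(path, newline="")`, in place of the `io.StringIO` object.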

    Further Reading

    Implementation of our experiments: https://github.com/mirkobunse/regularized-oq

  14. Curlie Enhanced with LLM Annotations: Two Datasets for Advancing...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Dec 21, 2023
    Cite
    Cizinsky, Ludek (2023). Curlie Enhanced with LLM Annotations: Two Datasets for Advancing Homepage2Vec's Multilingual Website Classification [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10413067
    Explore at:
    Dataset updated
    Dec 21, 2023
    Dataset provided by
    Nutter, Peter
    Senghaas, Mika
    Cizinsky, Ludek
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Advancing Homepage2Vec with LLM-Generated Datasets for Multilingual Website Classification

    This dataset contains two subsets of labeled website data, specifically created to enhance the performance of Homepage2Vec, a multi-label model for website classification. The datasets were generated using Large Language Models (LLMs) to provide more accurate and diverse topic annotations for websites, addressing a limitation of existing Homepage2Vec training data.

    Key Features:

    LLM-generated annotations: Both datasets feature website topic labels generated using LLMs, a novel approach to creating high-quality training data for website classification models.

    Improved multi-label classification: Fine-tuning Homepage2Vec with these datasets has been shown to improve its macro F1 score from 38% to 43% evaluated on a human-labeled dataset, demonstrating their effectiveness in capturing a broader range of website topics.

    Multilingual applicability: The datasets facilitate classification of websites in multiple languages, reflecting the inherent multilingual nature of Homepage2Vec.
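
For readers unfamiliar with the metric quoted above, macro F1 in a multi-label setting is the unweighted mean of the per-label F1 scores. The sketch below shows that general definition; the topic names and predictions are hypothetical and are not taken from the datasets or from Homepage2Vec.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-label F1, where each item carries a set of labels."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if label in t and label in p)
        fp = sum(1 for t, p in pairs if label not in t and label in p)
        fn = sum(1 for t, p in pairs if label in t and label not in p)
        denom = 2 * tp + fp + fn
        # A label that never occurs and is never predicted scores 0 here (a convention).
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical gold and predicted topic sets for three websites.
y_true = [{"Arts", "News"}, {"Sports"}, {"News"}]
y_pred = [{"Arts"}, {"Sports", "News"}, {"News"}]
score = macro_f1(y_true, y_pred, ["Arts", "News", "Sports"])
```

Because every label counts equally, rare topics influence macro F1 as much as frequent ones, which makes it a common choice for imbalanced multi-label problems.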

    Dataset Composition:

    curlie-gpt3.5-10k: 10,000 websites labeled using GPT-3.5, context 2 and 1-shot

    curlie-gpt4-10k: 10,000 websites labeled using GPT-4, context 2 and zero-shot

    Intended Use:

    Fine-tuning and advancing Homepage2Vec or similar website classification models

    Research on LLM-generated datasets for text classification tasks

    Exploration of multilingual website classification

    Additional Information:

    Project and report repository: https://github.com/CS-433/ml-project-2-mlp

    Acknowledgments:

    This dataset was created as part of a project at EPFL's Data Science Lab (DLab) in collaboration with Prof. Robert West and Tiziano Piccardi.

  15. English Tense Classification

    • kaggle.com
    Updated Jun 14, 2024
    Cite
    hafizflow (2024). English Tense Classification [Dataset]. http://doi.org/10.34740/kaggle/dsv/8693486
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 14, 2024
    Dataset provided by
    Kaggle
    Authors
    hafizflow
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset comprises English sentences labeled with their corresponding tense categories. It is intended for use in natural language processing (NLP) and machine learning projects to classify the tense of English sentences. Each entry includes a sentence and a numerical label representing its tense.

  16. Code-comment-classification

    • huggingface.co
    • opendatalab.com
    Updated Mar 28, 2023
    Cite
    Pooja Rani (2023). Code-comment-classification [Dataset]. https://huggingface.co/datasets/poojaruhal/Code-comment-classification
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 28, 2023
    Authors
    Pooja Rani
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for Code Comment Classification

      Dataset Summary
    

    The dataset contains class comments extracted from various big and diverse open-source projects of three programming languages Java, Smalltalk, and Python.

      Supported Tasks and Leaderboards
    

    Single-label text classification and Multi-label text classification

      Languages
    

    Java, Python, Smalltalk

      Dataset Structure
    
    
    
    
    
      Data Instances
    

    { "class" : "Absy.java", "comment":"*… See the full description on the dataset page: https://huggingface.co/datasets/poojaruhal/Code-comment-classification.

  17. Fruit Classification dataset

    • kaggle.com
    Updated May 15, 2025
    Cite
    JIS College of Engineering (2025). Fruit Classification dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/11818125
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 15, 2025
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    JIS College of Engineering
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Fruit Classification and Freshness Detection Dataset

    🔍 Overview

    This dataset has been meticulously curated to facilitate research and development in the domain of fruit classification and freshness detection using advanced deep learning techniques. It is designed to support the creation of hybrid models that integrate YOLOv8 for real-time object detection with Convolutional Neural Networks (CNNs) for assessing fruit freshness. The dataset encompasses a diverse range of images captured under varying lighting conditions and angles, simulating real-world scenarios such as grocery stores, farms, and storage facilities.

    The dataset comprises 8,099 high-resolution images of three commonly consumed fruits—apples, bananas, and oranges—each categorized into fresh and rotten conditions. Every image has been manually annotated in the YOLO format to aid object detection tasks and labeled for binary classification (Fresh/Rotten), enabling comprehensive model training.

    📁 Dataset Structure

    Total Images: 8,099

    Training Set: 6,508 images (80%)

    Test Set: 1,591 images (20%)

    Classes (6 total):

    Fresh Apples

    Rotten Apples

    Fresh Bananas

    Rotten Bananas

    Fresh Oranges

    Rotten Oranges

    Annotations: Provided in YOLO format using LabelImg

    Image Format: JPG, resized to 300x300 pixels

    Captured With: Smartphone camera under varied lighting and angles
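
A YOLO-format annotation line stores a class index followed by a box center, width, and height, all normalized to the image size. The sketch below converts one such line to pixel corner coordinates for the 300x300 images described above. The class index used in the example is arbitrary: the mapping from indices to the six classes is not given in this description.

```python
def yolo_to_pixels(line, img_w=300, img_h=300):
    """Convert 'class xc yc w h' (normalized) to (class_id, x_min, y_min, x_max, y_max) in pixels."""
    class_id, xc, yc, w, h = line.split()
    # Scale the normalized center and size to pixel units.
    xc, w = float(xc) * img_w, float(w) * img_w
    yc, h = float(yc) * img_h, float(h) * img_h
    return int(class_id), xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2


# A box centered in the image, half as wide and a quarter as tall as the image.
box = yolo_to_pixels("0 0.5 0.5 0.5 0.25")
# box == (0, 75.0, 112.5, 225.0, 187.5)
```

Each image typically has one `.txt` file with one such line per annotated object.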

    🧠 Applications

    This dataset is ideal for:

    Object Detection using YOLOv8

    Freshness classification using CNN

    Hybrid models combining detection and classification

    Computer vision projects in smart agriculture, food safety, and automated retail systems

    📊 Sample Use Case

    A hybrid deep learning model utilizing this dataset achieved:

    Object Detection (YOLOv8):

    mAP@0.5: 98%

    mAP@0.5:0.95: 87%

    Freshness Classification (CNN):

    Test Accuracy: 97.6%

    These results underscore the dataset’s suitability for high-performance, real-time AI applications in agricultural automation and food quality assessment.
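
The mAP figures above depend on intersection over union (IoU): under mAP@0.5 a predicted box counts as a match when its IoU with a ground-truth box is at least 0.5, while mAP@0.5:0.95 averages over thresholds from 0.5 to 0.95. The sketch below shows only the IoU test, using made-up boxes; it does not reproduce the reported results or the full mAP computation.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical ground-truth and predicted boxes, in pixels.
truth = (0, 0, 100, 100)
pred = (50, 0, 150, 100)
overlap = iou(truth, pred)        # 5000 / 15000
is_match_at_05 = overlap >= 0.5   # too little overlap to count at the 0.5 threshold
```

A prediction shifted by half a box width therefore already fails the 0.5 threshold, which gives a sense of how tight the reported localization is.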

    👨‍💻 Contributors

    Prof. Shubhashree Sahoo

    Dr. Sitanath Biswas

    Mr. Shubham Kumar Sah

    Mr. Chirag Nahata

    Special thanks to Dr. Soumobroto Saha and Prof. (Dr.) Partha Sarkar for their invaluable guidance and support throughout this research endeavor.

  18. Chicago Crime with Climate Data, 2021

    • kaggle.com
    Updated Dec 24, 2021
    Cite
    Mark Rozenberg (2021). Chicago Crime with Climate Data, 2021 [Dataset]. https://www.kaggle.com/datasets/markrozenberg/chicago-crime-with-climate-data-2021
    Dataset updated
    Dec 24, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Mark Rozenberg
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Chicago
    Description

    In this project I used machine learning and deep learning multiclass classification algorithms to predict the types of crime committed in the city of Chicago in 2021. I also added weather data as features, in the hope that it would enrich the models and improve their predictions.

    Project page on GitHub: https://github.com/Mark-Rozenberg/Crime-And-Climate

  19. Data from: Gravity Spy Volunteer Classifications of LIGO Glitches from...

    • zenodo.org
    • explore.openaire.eu
    bin
    Updated Jan 28, 2022
    Cite
    Michael Zevin; Scott Coughlin; Eve Chase; Sara Allen; Sara Bahaadini; Christopher Berry; Kevin Crowston; Mabi Harandi; Corey Jackson; Vicky Kalogera; Aggelos Katsaggelos; Carsten Osterlund; Oli Patane; Neda Rohani; Joshua Smith; Siddharth Soni; Laura Trouille (2022). Gravity Spy Volunteer Classifications of LIGO Glitches from Observing Runs O1, O2, O3a, and O3b [Dataset]. http://doi.org/10.5281/zenodo.5911227
    Available download formats: bin
    Dataset updated
    Jan 28, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Michael Zevin; Scott Coughlin; Eve Chase; Sara Allen; Sara Bahaadini; Christopher Berry; Kevin Crowston; Mabi Harandi; Corey Jackson; Vicky Kalogera; Aggelos Katsaggelos; Carsten Osterlund; Oli Patane; Neda Rohani; Joshua Smith; Siddharth Soni; Laura Trouille
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains machine learning and volunteer classifications from the Gravity Spy project. It includes glitches from observing runs O1, O2, O3a and O3b that received at least one classification from a registered volunteer in the project. It also indicates glitches that are nominally retired from the project using our default set of retirement parameters, which are described below. See more details in the Gravity Spy Methods paper.

    When a particular subject in a citizen science project (in this case, glitches from the LIGO datastream) is deemed to be classified sufficiently it is "retired" from the project. For the Gravity Spy project, retirement depends on a combination of both volunteer and machine learning classifications, and a number of parameterizations affect how quickly glitches get retired. For this dataset, we use a default set of retirement parameters, the most important of which are:

    1. A glitch must be classified by at least 2 registered volunteers
    2. Based on both the initial machine learning classification and volunteer classifications, the glitch has more than a 90% probability of residing in a particular class
    3. Each volunteer classification (weighted by that volunteer's confusion matrix) contains a weight equal to the initial machine learning score when determining the final probability

    The choice of these and other parameterizations affects both the accuracy of the retired dataset and the number of glitches that are retired; this will be explored in detail in an upcoming publication (Zevin et al. in prep).
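
The first two retirement criteria can be sketched as a simple check. This is only an illustration under stated assumptions: the function name `meets_default_retirement` and its arguments are hypothetical, the class probabilities are assumed to be the already-combined machine learning and volunteer result, and the weighting described in the third criterion is not modelled here.

```python
from typing import Mapping


def meets_default_retirement(
    n_volunteers: int,
    class_probs: Mapping[str, float],
    min_volunteers: int = 2,
    threshold: float = 0.9,
) -> bool:
    """Check criteria 1 and 2 of the default retirement parameters.

    `class_probs` is assumed to hold the combined (ML + volunteer)
    probability per glitch class; how it is computed is not shown.
    """
    # Criterion 1: at least `min_volunteers` registered volunteers.
    if n_volunteers < min_volunteers:
        return False
    # Criterion 2: some class has MORE than `threshold` probability.
    return max(class_probs.values(), default=0.0) > threshold


print(meets_default_retirement(3, {"Blip": 0.95, "Whistle": 0.05}))
print(meets_default_retirement(1, {"Blip": 0.95, "Whistle": 0.05}))
```

The defaults of 2 volunteers and 0.9 follow the criteria listed above; the comparison is strict because the text says "more than a 90% probability".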

    The dataset can be read in using e.g. Pandas:
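    ```python
    # Reading HDF5 files with pandas requires the PyTables package (`tables`).
    import pandas as pd
    # Load the table stored under the 'image_db' key into a DataFrame.
    dataset = pd.read_hdf('retired_fulldata_min2_max50_ret0p9.hdf5', key='image_db')
    ```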
    Each row in the dataframe contains information about a particular glitch in the Gravity Spy dataset.

    Description of series in dataframe

    • ['1080Lines', '1400Ripples', 'Air_Compressor', 'Blip', 'Chirp', 'Extremely_Loud', 'Helix', 'Koi_Fish', 'Light_Modulation', 'Low_Frequency_Burst', 'Low_Frequency_Lines', 'No_Glitch', 'None_of_the_Above', 'Paired_Doves', 'Power_Line', 'Repeating_Blips', 'Scattered_Light', 'Scratchy', 'Tomte', 'Violin_Mode', 'Wandering_Line', 'Whistle']
      • Machine learning scores for each glitch class in the trained model, which for a particular glitch will sum to unity
    • ['ml_confidence', 'ml_label']
      • Highest machine learning confidence score across all classes for a particular glitch, and the class associated with this score
    • ['gravityspy_id', 'id']
      • Unique identifier for each glitch on the Zooniverse platform ('gravityspy_id') and in the Gravity Spy project ('id'), which can be used to link a particular glitch to the full Gravity Spy dataset (which contains GPS times among many other descriptors)
    • ['retired']
      • Marks whether the glitch is retired using our default set of retirement parameters (1=retired, 0=not retired)
    • ['Nclassifications']
      • The total number of classifications performed by registered volunteers on this glitch
    • ['final_score', 'final_label']
      • The final score (weighted combination of machine learning and volunteer classifications) and the most probable type of glitch
    • ['tracks']
      • Array of classification weights that were added to each glitch category due to each volunteer's classification
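
To make the relationship between the per-class score columns and the `ml_label` / `ml_confidence` pair concrete, here is a small sketch that recomputes the top class from a single row. It works on a plain dictionary, which could be obtained from a dataframe row with something like `dataset.iloc[0].to_dict()`; the helper names and the example row are made up for illustration and are not part of the dataset.

```python
import math

# Glitch class columns as listed in the dataframe description.
GLITCH_CLASSES = [
    "1080Lines", "1400Ripples", "Air_Compressor", "Blip", "Chirp",
    "Extremely_Loud", "Helix", "Koi_Fish", "Light_Modulation",
    "Low_Frequency_Burst", "Low_Frequency_Lines", "No_Glitch",
    "None_of_the_Above", "Paired_Doves", "Power_Line", "Repeating_Blips",
    "Scattered_Light", "Scratchy", "Tomte", "Violin_Mode",
    "Wandering_Line", "Whistle",
]


def class_scores(row):
    """Pick out the per-class machine learning scores from one row."""
    return {name: row[name] for name in GLITCH_CLASSES if name in row}


def top_ml_class(row):
    """Return (label, score) for the highest-scoring class in the row."""
    scores = class_scores(row)
    label = max(scores, key=scores.get)
    return label, scores[label]


def scores_sum_to_one(row, tol=1e-6):
    """Check that the per-class scores sum to unity within a tolerance."""
    return math.isclose(sum(class_scores(row).values()), 1.0, abs_tol=tol)


# Made-up example row with only three of the class columns filled in.
example_row = {"Blip": 0.7, "Whistle": 0.2, "Koi_Fish": 0.1, "retired": 1}
print(top_ml_class(example_row))
print(scores_sum_to_one(example_row))
```

For a real row, the result of `top_ml_class` would be expected to agree with the stored `ml_label` and `ml_confidence` values, given how those columns are described above.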

    For machine learning classifications on all glitches in O1, O2, O3a, and O3b, please see Gravity Spy Machine Learning Classifications on Zenodo.

    For the most recently uploaded training set used in Gravity Spy machine learning algorithms, please see Gravity Spy Training Set on Zenodo.

    For detailed information on the training set used for the original Gravity Spy machine learning paper, please see Machine learning for Gravity Spy: Glitch classification and dataset on Zenodo.

  20. h

    autotrain-data-meme-classification

    • huggingface.co
    Updated Mar 24, 2023
    Cite
    Hrishikesh Yadav (2023). autotrain-data-meme-classification [Dataset]. https://huggingface.co/datasets/Hrishikesh332/autotrain-data-meme-classification
    Dataset updated
    Mar 24, 2023
    Authors
    Hrishikesh Yadav
    Description

    AutoTrain Dataset for project: meme-classification

    Dataset Description

    This dataset has been automatically processed by AutoTrain for project meme-classification.

    Languages

    The BCP-47 code for the dataset's language is unk.

    Dataset Structure

    Data Instances

    A sample from this dataset looks as follows: [ { "image": "<657x657 RGB PIL image>", "target": 1 }, { "image": "<1124x700 RGB PIL image>", "target": 0 }]… See the full description on the dataset page: https://huggingface.co/datasets/Hrishikesh332/autotrain-data-meme-classification.
