100+ datasets found

Classification
catalog.data.gov
data.nasa.gov
+1more
Updated Dec 6, 2023
Cite
Dashlink (2023). Classification [Dataset]. https://catalog.data.gov/dataset/classification
Dataset updated
Dec 6, 2023
Dataset provided by
Dashlink
Description
A supervised learning task involves constructing a mapping from an input data space (normally described by several features) to an output space. A set of training examples---examples with known output values---is used by a learning algorithm to generate a model. This model is intended to approximate the mapping between the inputs and outputs. This model can be used to generate predicted outputs for inputs that have not been seen before. Within supervised learning, one type of task is a classification learning task, in which each output consists of one or more classes to which the corresponding input belongs. For example, we may have data consisting of observations of sunspots. In a classification learning task, our goal may be to learn to classify sunspots into one of several types. Each example may correspond to one candidate sunspot with various measurements or just an image. A learning algorithm would use the supplied examples to generate a model that approximates the mapping between each supplied set of measurements and the type of sunspot. This model can then be used to classify previously unseen sunspots based on the candidate's measurements. In this chapter, we explain several basic classification algorithms.
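The description above can be made concrete with a minimal sketch of a classification learner. It uses a nearest-centroid rule, chosen here only because it is short; the source does not say which algorithms the chapter covers. The feature names (area, darkness), the class labels, and the numbers are invented for illustration.

```python
from collections import defaultdict
from math import dist


def fit_centroids(examples):
    """Learn a model: one mean feature vector (centroid) per class label."""
    grouped = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Classify an unseen input as the class with the closest centroid."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Invented training examples: (area, darkness) measurements with known types.
training = [
    ((1.0, 0.2), "small"),
    ((1.2, 0.3), "small"),
    ((4.0, 0.8), "large"),
    ((4.4, 0.9), "large"),
]

model = fit_centroids(training)           # approximate the input-to-output mapping
unseen_label = predict(model, (3.9, 0.7))  # classify a previously unseen candidate
```

The training set plays the role of the examples with known outputs, `model` is the learned approximation of the mapping, and `predict` applies it to measurements that were not in the training data.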
Food Image Classification Dataset Dataset
paperswithcode.com
Updated Jul 26, 2017
Cite
Marc Bolaños; Aina Ferrà; Petia Radeva (2017). Food Image Classification Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/food-image-classification-dataset
Dataset updated
Jul 26, 2017
Authors
Marc Bolaños; Aina Ferrà; Petia Radeva
Description
About Dataset

The dataset contains 24K unique images obtained from various Google resources. The images are meticulously curated to ensure diversity and representativeness. The dataset provides a solid foundation for developing robust and precise image classification algorithms and encourages exploration in the field of food image classification.

Unparalleled Diversity

Dive into a vast collection spanning culinary landscapes worldwide. Immerse yourself in a diverse array of cuisines, from Italian pasta to Japanese sushi. Explore a rich tapestry of food imagery, meticulously curated for accuracy and breadth.

Precision Labeling

Benefit from meticulous labeling, ensuring each image is tagged with precision. Access detailed metadata for seamless integration into your machine learning projects. Empower your algorithms with the clarity they need to excel in food recognition tasks.

Endless Applications

Fuel advancements in machine learning and computer vision with this comprehensive dataset. Revolutionize food industry automation, from inventory management to quality control. Enable innovative applications in health monitoring and dietary analysis for a healthier tomorrow.

Seamless Integration

Seamlessly integrate our dataset into your projects with user-friendly access and documentation. Enjoy high-resolution images optimized for compatibility with a range of AI frameworks. Access support and resources to maximize the potential of our dataset for your specific needs.

Conclusion

Embark on a culinary journey through the lens of artificial intelligence and unlock the potential of food image classification with our SEO-optimized dataset. Elevate your research, elevate your projects, and elevate the way we perceive and interact with food in the digital age. Dive in today and savor the possibilities!

This dataset is sourced from Kaggle.
arxiv-classification
huggingface.co
Updated Apr 13, 2017
Cite
ccdv (2017). arxiv-classification [Dataset]. https://huggingface.co/datasets/ccdv/arxiv-classification
Dataset updated
Apr 13, 2017
Authors
ccdv
Description
Arxiv Classification: a classification of arXiv papers (11 classes). This dataset is intended for long-context classification (all documents have more than 4k tokens). Copied from "Long Document Classification From Local Word Glimpses via Recurrent Attention Learning": @ARTICLE{8675939, author={He, Jun and Wang, Liqun and Liu, Liu and Feng, Jiao and Wu, Hao}, journal={IEEE Access}, title={Long Document Classification From Local Word Glimpses via Recurrent Attention Learning}, year={2019}… See the full description on the dataset page: https://huggingface.co/datasets/ccdv/arxiv-classification.
Drinking Waste Classification
kaggle.com
paperswithcode.com
Updated May 10, 2020
Cite
Arkadiy Serezhkin (2020). Drinking Waste Classification [Dataset]. https://www.kaggle.com/datasets/arkadiyhacks/drinking-waste-classification
Dataset updated
May 10, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Arkadiy Serezhkin
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
About the Dataset:

4 classes of drinking waste: Aluminium Cans, Glass bottles, PET (plastic) bottles and HDPE (plastic) Milk bottles.
rawimgs - images of 4 classes of waste
YOLO_imgs - images of 4 classes of waste with corresponding txt file (annotations for YOLO framework)
labels.txt - labels of the classes
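
As a rough illustration of how the txt annotations in YOLO_imgs might be read, the sketch below assumes they follow the common YOLO convention of one object per line, written as `class x_center y_center width height` with coordinates normalized to the image size. The source does not confirm the exact layout, and the sample line and image size are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    class_id: int
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def parse_yolo_line(line: str, img_w: int, img_h: int) -> Box:
    """Convert one normalized YOLO annotation line to a pixel-space box."""
    class_id, xc, yc, w, h = line.split()
    # Scale the normalized centre and size back to pixels.
    xc, w = float(xc) * img_w, float(w) * img_w
    yc, h = float(yc) * img_h, float(h) * img_h
    return Box(int(class_id), xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2)


# Hypothetical annotation line and image size, for illustration only.
box = parse_yolo_line("2 0.5 0.5 0.25 0.5", img_w=400, img_h=200)
```

The integer `class_id` would then be looked up in labels.txt, assuming the file lists the class names in index order.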

Story

This dataset was manually collected and labelled as part of a final-year Individual Project at University College London. Pictures were taken with a 12 MP phone camera. I created a real-time waste detection and identification system powered by the YOLO framework. Use it as you like; if you could cite me in your work, it would be much appreciated. Please reach out to me if this dataset actually helped you with your project. Arkadiy Serezhkin - arkadiyhacks@gmail.com

Acknowledgements

This dataset uses parts of the manually collected dataset of Gary Thung and Mindy Yang. I would like to thank them for collecting their dataset, as this is not a fun thing to do (from my own experience). You can find their repository here: https://github.com/garythung/trashnet.
Data from: Country Classification
hub.arcgis.com
sdiinnovation-geoplatform.hub.arcgis.com
Updated Jul 25, 2022
Cite
Esri (2022). Country Classification [Dataset]. https://hub.arcgis.com/content/65604db82ffd450da9e2c1b4c721db83
Dataset updated
Jul 25, 2022
Dataset authored and provided by
Esri (http://esri.com/)
Description
Accurate locations of people or places of interest are important for driving business and improving government services. Accurate location depends on correctly geocoding addresses. Street addresses are sometimes missing the country information, and geocoding such incomplete addresses often results in poor accuracy. Geocoding accuracy and performance increase when the country is specified. This model categorizes incomplete addresses by automatically assigning the country they belong to. This deep learning model is trained on an address dataset provided by openaddresses.io and can be used to classify addresses from 18 different countries.

Using the model

Follow the guide to use the model. Before using this model, ensure that the supported deep learning libraries are installed. For more details, check Deep Learning Libraries Installer for ArcGIS.

Fine-tuning the model

This model can be fine-tuned using the Train Text Classification Model tool available in the GeoAI toolbox in ArcGIS Pro. Follow the guide to fine-tune this model.

Input

Text on which country classification will be performed. Text should include street number or apartment number, street name, city or state.

Output

Text (classified country)

Supported countries

This model supports addresses from the following countries:
AR – Argentina
AT – Austria
AU – Australia
BE – Belgium
CA – Canada
CH – Switzerland
DE – Germany
DK – Denmark
ES – Spain
FI – Finland
FR – France
IS – Iceland
IT – Italy
KR – South Korea
LU – Luxembourg
NZ – New Zealand
SI – Slovenia
US – USA

Model architecture

This model uses the xlm-roberta architecture implemented in Hugging Face Transformers.

Accuracy metrics

The table below summarizes the precision, recall and F1-score of the model on the validation dataset.

Training data

The model has been trained on openly licensed data from openaddresses.io.

Sample results

Here are a few results from the model.
Tree Classification Dataset
universe.roboflow.com
zip
Updated Jun 18, 2024
Cite
Thanu (2024). Tree Classification Dataset [Dataset]. https://universe.roboflow.com/thanu-uelap/tree-classification-dataset
Available download formats: zip
Dataset updated
Jun 18, 2024
Dataset authored and provided by
Thanu
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Trees Bounding Boxes
Description
Tree Classification Dataset

## Overview

Tree Classification Dataset is a dataset for object detection tasks. It contains Trees annotations for 5,683 images.

## Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License

This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Urban Street: Tree Classification Dataset
datasetninja.com
Updated Sep 24, 2022
Cite
Tingting Yang; Suyin Zhou; Zhijie Huang (2022). Urban Street: Tree Classification Dataset [Dataset]. https://datasetninja.com/urban-street-tree-classification
Dataset updated
Sep 24, 2022
Dataset provided by
Dataset Ninja
Authors
Tingting Yang; Suyin Zhou; Zhijie Huang
License
https://www.gnu.org/licenses/lgpl-3.0.html
Description
The authors introduce the Tree component, intended for classification tasks, of The Tree Dataset of Urban Street. It comprises 4,804 high-resolution images distributed across 23 classes. The subset lets researchers and practitioners analyze urban street greenery in detail and is also described as a resource for instance segmentation studies. Automatic tree species identification can be used to realize autonomous street tree inventories and to help people without botanical knowledge and experience better understand the diversity and regionalization of different urban landscapes.
Human Settlements Classification (Landsat 8)
hub.arcgis.com
angola.africageoportal.com
+3more
Updated Feb 17, 2021
Cite
Esri (2021). Human Settlements Classification (Landsat 8) [Dataset]. https://hub.arcgis.com/content/f7754e9617b84356845e5f877d3c36c6
Dataset updated
Feb 17, 2021
Dataset authored and provided by
Esri (http://esri.com/)
Description
Human settlement maps are useful in understanding growth patterns, population distribution, resource management, change detection, and a variety of other applications where information related to earth surface is required. Human settlements classification is a complex exercise and is hard to capture using traditional means. Deep learning models are highly capable of learning these complex semantics and can produce superior results.

Using the model

Follow the guide to use the model. Before using this model, ensure that the supported deep learning libraries are installed. For more details, check Deep Learning Libraries Installer for ArcGIS.

Fine-tuning the model

This model can be fine-tuned using the Train Deep Learning Model tool. Follow the guide to fine-tune this model.

Input

Raster, mosaic dataset, or image service. (Preferred cell size is 30 meters.)

Note: This model is trained to work on Landsat 8 Imagery datasets which are in WGS 1984 Web Mercator (auxiliary sphere) coordinate system (WKID 3857).

Output

Classified layer containing two classes: settlement and other

Applicable geographies

This model is expected to work well in the United States.

Model architecture

This model uses the UNet model architecture implemented in ArcGIS API for Python.

Accuracy metrics

This model has an overall accuracy of 91.6 percent.

Training data

This model has been trained on an Esri proprietary human settlements classification dataset.

Sample results

Here are a few results from the model.
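
The input note above requires imagery in WGS 1984 Web Mercator (auxiliary sphere), WKID 3857. As a small sketch of what that coordinate system means, the standard spherical Web Mercator forward formulas convert longitude and latitude in degrees to metres. This is for orientation only and is not part of the ArcGIS workflow, which handles reprojection itself; the function name is an assumption.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator (WGS 84 semi-major axis)


def lonlat_to_web_mercator(lon_deg: float, lat_deg: float) -> tuple[float, float]:
    """Project WGS 84 longitude/latitude (degrees) to Web Mercator metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# The projected extent is roughly square: +/-180 degrees of longitude and
# about +/-85.05 degrees of latitude both map to about +/-20,037,508 m.
x_edge, _ = lonlat_to_web_mercator(180.0, 0.0)
_, y_edge = lonlat_to_web_mercator(0.0, 85.0511287798)
```

At the preferred 30 metre cell size, one pixel spans 30 units in these projected coordinates.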
MULTI-LABEL ASRS DATASET CLASSIFICATION USING SEMI-SUPERVISED SUBSPACE...
catalog.data.gov
data.nasa.gov
+1more
Updated Dec 6, 2023
Cite
Dashlink (2023). MULTI-LABEL ASRS DATASET CLASSIFICATION USING SEMI-SUPERVISED SUBSPACE CLUSTERING [Dataset]. https://catalog.data.gov/dataset/multi-label-asrs-dataset-classification-using-semi-supervised-subspace-clustering
Dataset updated
Dec 6, 2023
Dataset provided by
Dashlink
Description
MULTI-LABEL ASRS DATASET CLASSIFICATION USING SEMI-SUPERVISED SUBSPACE CLUSTERING

MOHAMMAD SALIM AHMED, LATIFUR KHAN, NIKUNJ OZA, AND MANDAVA RAJESWARI

Abstract. There has been a lot of research targeting text classification. Much of it focuses on a particular characteristic of text data: multi-labelity. This arises because a document may be associated with multiple classes at the same time. The consequence of such a characteristic is the low performance of traditional binary or multi-class classification techniques on multi-label text data. In this paper, we propose a text classification technique that considers this characteristic and provides very good performance. Our multi-label text classification approach is an extension of our previously formulated [3] multi-class text classification approach called SISC (Semi-supervised Impurity based Subspace Clustering). We call this new classification model SISC-ML (SISC Multi-Label). Empirical evaluation on the real-world multi-label NASA ASRS (Aviation Safety Reporting System) data set reveals that our approach outperforms state-of-the-art text classification as well as subspace clustering algorithms.
Data from: Pseudo-Label Generation for Multi-Label Text Classification
catalog.data.gov
datasets.ai
+2more
Updated Dec 6, 2023
Cite
Dashlink (2023). Pseudo-Label Generation for Multi-Label Text Classification [Dataset]. https://catalog.data.gov/dataset/pseudo-label-generation-for-multi-label-text-classification
Dataset updated
Dec 6, 2023
Dataset provided by
Dashlink
Description
With the advent and expansion of social networking, the amount of generated text data has seen a sharp increase. In order to handle such a huge volume of text data, new and improved text mining techniques are a necessity. One of the characteristics of text data that makes text mining difficult is multi-labelity. In order to build a robust and effective text classification method, which is an integral part of text mining research, we must consider this property more closely. This kind of property is not unique to text data, as it can be found in non-text (e.g., numeric) data as well. However, it is most prevalent in text data. This property also puts the text classification problem in the domain of multi-label classification (MLC), where each instance is associated with a subset of class labels instead of a single class, as in conventional classification. In this paper, we explore how the generation of pseudo labels (i.e., combinations of existing class labels) can help us perform better text classification, and under what circumstances. During the classification, the high and sparse dimensionality of text data has also been considered. Although we are proposing and evaluating a text classification technique here, our main focus is on handling the multi-labelity of text data while utilizing the correlation among the multiple labels existing in the data set. Our text classification technique is called pseudo-LSC (pseudo-Label Based Subspace Clustering). It is a subspace clustering algorithm that considers the high and sparse dimensionality as well as the correlation among different class labels during the classification process to provide better performance than existing approaches. Results on three real-world multi-label data sets provide insight into how multi-labelity is handled in our classification process and show the effectiveness of our approach.
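
To illustrate what "combinations of existing class labels" can look like in practice, here is a minimal sketch of a label-powerset style encoding, in which each distinct label combination becomes one pseudo label. This is a generic transformation shown for intuition only. It is not the pseudo-LSC algorithm, whose details are not given in this description, and the label names are invented.

```python
def build_pseudo_labels(label_sets):
    """Map each distinct label combination to a single pseudo-label id."""
    pseudo_ids = {}
    encoded = []
    for labels in label_sets:
        key = frozenset(labels)  # order of labels inside a combination is irrelevant
        if key not in pseudo_ids:
            pseudo_ids[key] = len(pseudo_ids)
        encoded.append(pseudo_ids[key])
    return pseudo_ids, encoded


# Invented multi-label annotations for four documents.
documents = [
    {"weather"},
    {"weather", "engine"},
    {"engine"},
    {"engine", "weather"},
]
pseudo_ids, encoded = build_pseudo_labels(documents)
```

After this step each document carries exactly one pseudo label, so a conventional single-label classifier could be trained on `encoded`. The trade-off is that the number of pseudo labels can grow quickly with the number of co-occurring classes.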
mosquito-species-classification-dataset
huggingface.co
Updated Oct 21, 2024
Cite
mosquito-species-classification-dataset [Dataset]. https://huggingface.co/datasets/iloncka/mosquito-species-classification-dataset
Dataset updated
Oct 21, 2024
Authors
Ilona Kovaleva
Description
iloncka/mosquito-species-classification-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
Tree Point Classification
hub.arcgis.com
cacgeoportal.com
+1more
Updated Oct 8, 2020
Cite
Esri (2020). Tree Point Classification [Dataset]. https://hub.arcgis.com/content/58d77b24469d4f30b5f68973deb65599
Dataset updated
Oct 8, 2020
Dataset authored and provided by
Esri (http://esri.com/)
Description
Classifying trees from point cloud data is useful in applications such as high-quality 3D basemap creation, urban planning, and forestry workflows. Trees have a complex geometrical structure that is hard to capture using traditional means. Deep learning models are highly capable of learning these complex structures and giving superior results.

Using the model

Follow the guide to use the model. The model can be used with the 3D Basemaps solution and ArcGIS Pro's Classify Point Cloud Using Trained Model tool. Before using this model, ensure that the supported deep learning libraries are installed. For more details, check Deep Learning Libraries Installer for ArcGIS.

Input

The model accepts unclassified point clouds with the attributes: X, Y, Z, and Number of Returns.

Note: This model is trained to work on unclassified point clouds that are in a projected coordinate system, where the units of X, Y, and Z are based on the metric system of measurement. If the dataset is in degrees or feet, it needs to be re-projected accordingly. The provided deep learning model was trained using a training dataset with the full set of points. Therefore, it is important to make the full set of points available to the neural network while predicting, allowing it to better discriminate points of 'class of interest' versus background points. It is recommended to use 'selective/target classification' and 'class preservation' functionalities during prediction to have better control over the classification.

This model was trained on airborne lidar datasets and is expected to perform best with similar datasets. Classification of terrestrial point cloud datasets may work but has not been validated. For such cases, this pre-trained model may be fine-tuned to save on cost, time and compute resources while improving accuracy. When fine-tuning this model, the target training data characteristics such as class structure, maximum number of points per block, and extra attributes should match those of the data originally used for training this model (see Training data section below).

Output

The model will classify the point cloud into the following 2 classes with their meaning as defined by the American Society for Photogrammetry and Remote Sensing (ASPRS) described below:

0   Background
5   Trees / High-vegetation

Applicable geographies

This model is expected to work well in all regions globally, with the exception of mountainous regions. However, results can vary for datasets that are statistically dissimilar to training data.

Model architecture

This model uses the PointCNN model architecture implemented in ArcGIS API for Python.

Accuracy metrics

The table below summarizes the accuracy of the predictions on the validation dataset.

Class                         Precision   Recall      F1-score
Trees / High-vegetation (5)   0.975374    0.965929    0.970628

Training data

This model is trained on a subset of UK Environment Agency's open dataset. The training data used has the following characteristics:

X, Y and Z linear unit     meter
Z range                    -19.29 m to 314.23 m
Number of Returns          1 to 5
Intensity                  1 to 4092
Point spacing              0.6 ± 0.3
Scan angle                 -23 to +23
Maximum points per block   8192
Extra attributes           Number of Returns
Class structure            [0, 5]

Sample results

Here are a few results from the model.
Garbage Classification Dataset Dataset
paperswithcode.com
Updated Mar 24, 2025
Cite
(2025). Garbage Classification Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/garbage-classification-dataset
Dataset updated
Mar 24, 2025
Description
This dataset contains a collection of 15,150 images, categorized into 12 distinct classes of common household waste. The classes include paper, cardboard, biological waste, metal, plastic, green glass, brown glass, white glass, clothing, shoes, batteries, and general trash. Each category represents a different type of material, contributing to more effective recycling and waste management strategies.

Objective

The purpose of this dataset is to aid in the development of machine learning models designed to automatically classify household waste into its appropriate categories, thus promoting more efficient recycling processes. Proper waste sorting is crucial for maximizing the amount of material that can be recycled, and this dataset is aimed at enhancing automation in this area. The classification of garbage into a broader range of categories, as opposed to the limited classes found in most available datasets (2-6 classes), allows for a more precise recycling process and could significantly improve recycling rates.

Dataset Composition and Collection Process

The dataset was primarily collected through web scraping, as simulating a real-world garbage collection scenario (such as placing a camera above a conveyor belt) was not feasible at the time of collection. The goal was to obtain images that closely resemble actual garbage. For example, images in the biological waste category include rotten fruits, vegetables, and food remnants. Similarly, categories such as glass and metal consist of images of bottles, cans, and containers typically found in household trash. While the images for some categories, like clothes or shoes, were harder to find specifically as garbage, they still represent the items that may end up in waste streams.

In an ideal setting, a conveyor system could be used to gather real-time data by capturing images of waste in a continuous flow. Such a setup would enhance the dataset by providing authentic waste images for all categories. However, until that setup is available, this dataset serves as a significant step toward automating garbage classification and improving recycling technologies.

Potential for Future Improvements

While this dataset provides a strong foundation for household waste classification, there is potential for further improvements. For example, real-time data collection using conveyor systems or garbage processing plants could provide higher accuracy and more contextual images. Additionally, future datasets could expand to include more specialized categories, such as electronic waste, hazardous materials, or specific types of plastic.

Conclusion

The Garbage Classification dataset offers a broad and diverse collection of household waste images, making it a valuable resource for researchers and developers working in environmental sustainability, machine learning, and recycling automation. By improving the accuracy of waste classification systems, we can contribute to a cleaner, more sustainable future.

This dataset is sourced from Kaggle.
Power Line Classification
morocco.africageoportal.com
angola.africageoportal.com
+1more
Updated Dec 15, 2020
Cite
Esri (2020). Power Line Classification [Dataset]. https://morocco.africageoportal.com/content/6ce6dae2d62c4037afc3a3abd19afb11
Dataset updated
Dec 15, 2020
Dataset authored and provided by
Esri (http://esri.com/)
Description
The classification of point cloud datasets to identify distribution wires is useful for identifying vegetation encroachment around power lines. Such workflows are important for preventing fires and power outages and are typically manual, recurring, and labor-intensive. This model is designed to extract distribution wires at the street level. Its predictions for high-tension transmission wires are less consistent with changes in geography as compared to street-level distribution wires. In the case of high-tension transmission wires, a lower ‘recall’ value is observed as compared to the value observed for low-lying street wires and poles.

Using the model

Follow the guide to use the model. The model can be used with ArcGIS Pro's Classify Point Cloud Using Trained Model tool. Before using this model, ensure that the supported deep learning libraries are installed. For more details, check Deep Learning Libraries Installer for ArcGIS.

Input

The model accepts unclassified point clouds with point geometry (X, Y and Z values).

Note: The model is not dependent on any additional attributes such as Intensity, Number of Returns, etc. This model is trained to work on unclassified point clouds that are in a projected coordinate system, in which the units of X, Y and Z are based on the metric system of measurement. If the dataset is in degrees or feet, it needs to be re-projected accordingly. The model was trained using a training dataset with the full set of points. Therefore, it is important to make the full set of points available to the neural network while predicting, allowing it to better discriminate points of 'class of interest' versus background points. It is recommended to use 'selective/target classification' and 'class preservation' functionalities during prediction to have better control over the classification and scenarios with false positives.

The model was trained on airborne lidar datasets and is expected to perform best with similar datasets. Classification of terrestrial point cloud datasets may work but has not been validated. For such cases, this pre-trained model may be fine-tuned to save on cost, time, and compute resources while improving accuracy. Another example where fine-tuning this model can be useful is when the object of interest is tram wires, railway wires, etc. which are geometrically similar to electricity wires. When fine-tuning this model, the target training data characteristics such as class structure, maximum number of points per block and extra attributes should match those of the data originally used for training this model (see Training data section below).

Output

The model will classify the point cloud into the following classes with their meaning as defined by the American Society for Photogrammetry and Remote Sensing (ASPRS) described below:

Classcode   Class Description
0           Background
14          Distribution Wires
15          Distribution Tower/Poles

Applicable geographies

The model is expected to work within any geography. It's seen to produce favorable results as shown here in many regions. However, results can vary for datasets that are statistically dissimilar to training data.

Model architecture

This model uses the RandLANet model architecture implemented in ArcGIS API for Python.

Accuracy metrics

The table below summarizes the accuracy of the predictions on the validation dataset.

Class                     Precision   Recall      F1-score
Background (0)            0.999679    0.999876    0.999778
Distribution Wires (14)   0.955085    0.936825    0.945867
Distribution Poles (15)   0.707983    0.553888    0.621527

Training data

This model is trained on manually classified training dataset provided to Esri by AAM group. The training data used has the following characteristics:

X, Y, and Z linear unit    meter
Z range                    -240.34 m to 731.17 m
Number of Returns          1 to 5
Intensity                  1 to 4095
Point spacing              0.2 ± 0.1
Scan angle                 -42 to +35
Maximum points per block   20000
Extra attributes           None
Class structure            [0, 14, 15]

Sample results

Here are a few results from the model.
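
The F1-scores in the accuracy table above can be related to the precision and recall columns through the usual definition, F1 = 2PR / (P + R). The sketch below recomputes F1 from the reported precision and recall and measures how far each result is from the reported F1. Because the published inputs are already rounded to six decimals, small differences in the last digit are to be expected. The dictionary layout and names are assumptions made for this example.

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the accuracy table.
reported = {
    "Background (0)": (0.999679, 0.999876, 0.999778),
    "Distribution Wires (14)": (0.955085, 0.936825, 0.945867),
    "Distribution Poles (15)": (0.707983, 0.553888, 0.621527),
}

# Absolute gap between the recomputed and the reported F1 for each class.
gaps = {
    name: abs(f1_score(p, r) - f1)
    for name, (p, r, f1) in reported.items()
}
max_gap = max(gaps.values())
```

The harmonic mean is pulled toward the smaller of its two inputs, which is why the poles class, with a recall of about 0.55, ends up with an F1 well below its precision of about 0.71.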
tabular-classification-of-prompted-llms
huggingface.co
Updated May 18, 2024
Cite
tabular-classification-of-prompted-llms [Dataset]. https://huggingface.co/datasets/imodels/tabular-classification-of-prompted-llms
Dataset updated
May 18, 2024
Authors
imodels
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
imodels/tabular-classification-of-prompted-llms dataset hosted on Hugging Face and contributed by the HF Datasets community
Italian Open Ended Classification Prompt & Response Dataset
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). Italian Open Ended Classification Prompt & Response Dataset [Dataset]. https://www.futurebeeai.com/dataset/prompt-response-dataset/italian-open-ended-classification-text-dataset
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/data-license-agreement
Dataset funded by
FutureBeeAI
Description
What’s Included
Welcome to the Italian Open Ended Classification Prompt-Response Dataset—an extensive collection of 3000 meticulously curated prompt and response pairs. This dataset is a valuable resource for training Language Models (LMs) to classify input text accurately, a crucial aspect in advancing generative AI.
Dataset Content: This open-ended classification dataset comprises a diverse set of prompts and responses where the prompt contains input text to be classified and may also contain task instruction, context, constraints, and restrictions while completion contains the best classification category as response. Both these prompts and completions are available in Italian language. As this is an open-ended dataset, there will be no options given to choose the right classification category as a part of the prompt.
These prompt and completion pairs cover a broad range of topics, including science, history, technology, geography, literature, current affairs, and more. Each prompt is accompanied by a response, providing valuable information and insights to enhance the language model training process. Both the prompt and response were manually curated by native Italian people, and references were taken from diverse sources like books, news articles, websites, and other reliable references.
This open-ended classification prompt and completion dataset contains different types of prompts, including instruction type, continuation type, and in-context learning (zero-shot, few-shot) type. The dataset also contains prompts and responses with different types of rich text, including tables, code, JSON, etc., with proper markdown.
Prompt Diversity: To ensure diversity, this open-ended classification dataset includes prompts with varying complexity levels, ranging from easy to medium and hard. Additionally, prompts are diverse in terms of length from short to medium and long, creating a comprehensive variety. The classification dataset also contains prompts with constraints and persona restrictions, which makes it even more useful for LLM training.
Response Formats: To accommodate diverse learning experiences, our dataset incorporates different types of responses depending on the prompt. These formats include single-word, short phrase, and single sentence type of response. These responses encompass text strings, numerical values, and date and time formats, enhancing the language model's ability to generate reliable, coherent, and contextually appropriate answers.
Data Format and Annotation Details: This fully labeled Italian Open Ended Classification Prompt Completion Dataset is available in JSON and CSV formats. It includes annotation details such as a unique ID, prompt, prompt type, prompt length, prompt complexity, domain, response, response type, and rich text presence.
Quality and Accuracy: Our dataset upholds the highest standards of quality and accuracy. Each prompt undergoes meticulous validation, and the corresponding responses are thoroughly verified. We prioritize inclusivity, ensuring that the dataset incorporates prompts and completions representing diverse perspectives and writing styles, maintaining an unbiased and discrimination-free stance.
The Italian version is grammatically accurate without any spelling or grammatical errors. No copyrighted, toxic, or harmful content is used during the construction of this dataset.
Continuous Updates and Customization: The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Ongoing efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to gather custom open-ended classification prompt and completion data tailored to specific needs, providing flexibility and customization options.
License: The dataset, created by FutureBeeAI, is now available for commercial use. Researchers, data scientists, and developers can leverage this fully labeled and ready-to-deploy Italian Open Ended Classification Prompt-Completion Dataset to enhance the classification abilities and accurate response generation capabilities of their generative AI models and explore new approaches to NLP tasks.
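As a rough illustration of the annotation details listed in the description above, the sketch below parses a small JSON array and counts records per complexity level. The key names (`id`, `prompt_type`, `rich_text`, and so on) are assumptions derived from the field list in the description, not the dataset's documented schema, and the two sample records are invented for the example.

```python
import json
from collections import Counter

# Hypothetical key names, derived from the annotation list in the
# description; the real JSON keys may be spelled differently.
EXPECTED_FIELDS = {
    "id", "prompt", "prompt_type", "prompt_length",
    "prompt_complexity", "domain", "response", "response_type", "rich_text",
}

# Made-up sample records for illustration only (not taken from the dataset).
SAMPLE_JSON = """
[
  {"id": "ex-001", "prompt": "Classifica il seguente testo: ...",
   "prompt_type": "instruction", "prompt_length": "short",
   "prompt_complexity": "easy", "domain": "science",
   "response": "Astronomia", "response_type": "single word",
   "rich_text": false},
  {"id": "ex-002", "prompt": "...", "prompt_type": "few-shot",
   "prompt_length": "long", "prompt_complexity": "hard",
   "domain": "history", "response": "...", "response_type": "short phrase",
   "rich_text": true}
]
"""


def load_records(text):
    """Parse a JSON array of records and reject any with missing fields."""
    records = json.loads(text)
    for record in records:
        missing = EXPECTED_FIELDS - set(record)
        if missing:
            raise ValueError(f"record {record.get('id')} lacks {sorted(missing)}")
    return records


def count_by(records, field):
    """Count how many records carry each value of the given field."""
    return Counter(record[field] for record in records)


records = load_records(SAMPLE_JSON)
complexity_counts = count_by(records, "prompt_complexity")
print(complexity_counts)
```

Validating the field set up front is one possible design choice: it surfaces a schema mismatch immediately if the actual export uses different key names.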
Bahasa Closed Ended Classification Prompt & Response Dataset
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). Bahasa Closed Ended Classification Prompt & Response Dataset [Dataset]. https://www.futurebeeai.com/dataset/prompt-response-dataset/bahasa-closed-ended-classification-text-dataset
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/data-license-agreement
Dataset funded by
FutureBeeAI
Description
What’s Included
Welcome to the Bahasa Closed Ended Classification Prompt-Response Dataset—an extensive collection of 3000 meticulously curated prompt and response pairs. This dataset is a valuable resource for training Language Models (LMs) to classify input text accurately, a crucial aspect in advancing generative AI.
Dataset Content: This closed-ended classification dataset comprises a diverse set of prompts and responses. Each prompt contains the input text to be classified and may also contain a task instruction, context, constraints, and restrictions; each completion contains the best classification category as the response. Both prompts and completions are in Bahasa. Because this is a closed-ended dataset, the prompt lists the candidate categories to choose from.
These prompt and completion pairs cover a broad range of topics, including science, history, technology, geography, literature, current affairs, and more. Each prompt is accompanied by a response, providing information to support the language model training process. Both the prompts and the responses were manually curated by native Bahasa speakers, with references drawn from diverse sources such as books, news articles, websites, and other reliable references.
This closed-ended classification prompt and completion dataset contains different types of prompts, including instruction type, continuation type, and in-context learning (zero-shot, few-shot) type. The dataset also contains prompts and responses with different types of rich text, including tables, code, JSON, etc., with proper markdown.
Prompt Diversity: To ensure diversity, this closed-ended classification dataset includes prompts with varying complexity levels, ranging from easy to medium and hard. Different types of prompts, such as multiple-choice, direct, and true/false, are included. Additionally, prompts are diverse in terms of length from short to medium and long, creating a comprehensive variety. The classification dataset also contains prompts with constraints and persona restrictions, which makes it even more useful for LLM training.
Response Formats: To accommodate diverse learning experiences, our dataset incorporates different types of responses depending on the prompt. These formats include single-word, short phrase, and single sentence type of response. These responses encompass text strings, numerical values, and date and time formats, enhancing the language model's ability to generate reliable, coherent, and contextually appropriate answers.
Data Format and Annotation Details: This fully labeled Bahasa Closed Ended Classification Prompt Completion Dataset is available in JSON and CSV formats. It includes annotation details such as a unique ID, prompt, prompt type, prompt length, prompt complexity, domain, response, response type, and rich text presence.
Quality and Accuracy: Our dataset upholds the highest standards of quality and accuracy. Each prompt undergoes meticulous validation, and the corresponding responses are thoroughly verified. We prioritize inclusivity, ensuring that the dataset incorporates prompts and completions representing diverse perspectives and writing styles, maintaining an unbiased and discrimination-free stance.
The Bahasa version is grammatically accurate without any spelling or grammatical errors. No copyrighted, toxic, or harmful content is used during the construction of this dataset.
Continuous Updates and Customization: The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Ongoing efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to gather custom closed-ended classification prompt and completion data tailored to specific needs, providing flexibility and customization options.
License: The dataset, created by FutureBeeAI, is now available for commercial use. Researchers, data scientists, and developers can leverage this fully labeled and ready-to-deploy Bahasa Closed Ended Classification Prompt-Completion Dataset to enhance the classification abilities and accurate response generation capabilities of their generative AI models and explore new approaches to NLP tasks.
Weapon Classification Dataset
universe.roboflow.com
zip
Updated Oct 28, 2022
Cite
Liteye Systems (2022). Weapon Classification Dataset [Dataset]. https://universe.roboflow.com/liteye-systems/weapon-classification
Available download formats: zip
Dataset updated
Oct 28, 2022
Dataset authored and provided by
Liteye Systems
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Variables measured
Weapons Bounding Boxes
Description
Weapon Classification

## Overview
Weapon Classification is a dataset for object detection tasks. It contains Weapons annotations for 576 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [MIT license](https://creativecommons.org/licenses/MIT).
Classification Of Fire And Smoke Dataset
universe.roboflow.com
zip
Updated Mar 23, 2022
Cite
ClassificationFire (2022). Classification Of Fire And Smoke Dataset [Dataset]. https://universe.roboflow.com/classificationfire/classification-of-fire-and-smoke
Available download formats: zip
Dataset updated
Mar 23, 2022
Dataset authored and provided by
ClassificationFire
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Variables measured
Fire And Smoke
Description
Classification Of Fire And Smoke

## Overview
Classification Of Fire And Smoke is a dataset for classification tasks. It contains Fire And Smoke annotations for 2,775 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
Data Classification Market Size & Share Analysis - Industry Research Report...
mordorintelligence.com
pdf,excel,csv,ppt
Cite
Mordor Intelligence, Data Classification Market Size & Share Analysis - Industry Research Report - Growth Trends [Dataset]. https://www.mordorintelligence.com/industry-reports/data-classification-market
Available download formats: pdf, excel, csv, ppt
Dataset authored and provided by
Mordor Intelligence
License
https://www.mordorintelligence.com/privacy-policy
Time period covered
2019 - 2030
Area covered
Global
Description
The data classification market is segmented by solution (software, services), deployment (on-premise, cloud), application (access management, governance and compliance management, email and mobile protection), industry vertical (BFSI, healthcare, government and defence, IT and telecom, energy and utilities, education), and geography.
