This dataset consists of unlabeled data representing various data points collected from different sources and domains. The dataset serves as a blank canvas for unsupervised learning experiments, allowing for the exploration of patterns, clusters, and hidden insights through various data analysis techniques. Researchers and data enthusiasts can use this dataset to develop and test unsupervised learning algorithms, identify underlying structures, and gain a deeper understanding of data without predefined labels.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
## Overview
Unlabeled is a dataset for object detection tasks - it contains Face annotations for 2,928 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [Public Domain license](https://creativecommons.org/publicdomain/zero/1.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Objects2022 Unlabeled is a dataset for object detection tasks - it contains Household Objects annotations for 727 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains data related to the experiment conducted in the paper "Towards the Systematic Testing of Virtual Reality Programs".
It contains an implementation of an approach for predicting defect proneness on unlabeled datasets: Average Clustering and Labeling (ACL).
ACL models achieve good prediction performance and are comparable to typical supervised learning models in terms of F-measure, making ACL a viable choice for defect prediction on unlabeled datasets.
This dataset also contains analyses related to code smells in C# repositories. Please check the paper for further information.
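As a rough illustration of the clustering-and-labeling idea behind ACL, the sketch below clusters unlabeled modules and flags a cluster as defect-prone when most of its mean metric values exceed the global means. The toy data, cluster count, and majority-vote threshold are all illustrative assumptions, not the paper's exact procedure.

```python
# Hypothetical sketch of an Average Clustering and Labeling (ACL) style
# approach: cluster unlabeled software modules, then mark a cluster as
# defect-prone when most of its mean metric values exceed the global means.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.random((200, 5))  # 200 modules, 5 static code metrics (toy data)

k = 2
clusters = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)

global_mean = X.mean(axis=0)
labels = np.zeros(len(X), dtype=int)
for c in range(k):
    cluster_mean = X[clusters == c].mean(axis=0)
    # Flag the cluster when a majority of its metric means exceed
    # the corresponding global means (an assumed labeling rule).
    if (cluster_mean > global_mean).sum() > X.shape[1] / 2:
        labels[clusters == c] = 1

print(labels.sum(), "modules flagged as defect-prone")
```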
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository lists all the repositories needed to load the unlabeled Sentinel-2 (S2) L2A dataset used in the article "Self-Supervised Spatio-Temporal Representation Learning of Satellite Image Time Series". The dataset is composed of patch time series acquired over France; for further details, see Section IV.A of the pre-print article, available here. Each patch comprises the 10 bands [B2, B3, B4, B5, B6, B7, B8, B8A, B11, B12] and the three masks ['CLM_R1', 'EDG_R1', 'SAT_R1']. The global dataset is split into two disjoint sets: a training dataset (9 tiles) and a validation dataset (4 tiles).
The validation dataset is available here: 10.5281/zenodo.7890452
The training dataset is composed of 9 Zenodo repositories, one per S2 tile. Here are the available repositories:
T31UEP 10.5281/zenodo.7899943
T31TGJ 10.5281/zenodo.7899237
T30TYS 10.5281/zenodo.7924193
T31TFN 10.5281/zenodo.7896621
T31TDL 10.5281/zenodo.7896082
T31TDJ 10.5281/zenodo.7895498
T30UVU 10.5281/zenodo.7892410
T30TYQ 10.5281/zenodo.7890542
T30TXT 10.5281/zenodo.7875977
| Dataset | S2 tiles | ROI size | Temporal extent |
|---|---|---|---|
| Train | T30TXT, T30TYQ, T30TYS, T30UVU, T31TDJ, T31TDL, T31TFN, T31TGJ, T31UEP | 1024×1024 | 2018-2020 |
| Val | T30TYR, T30UWU, T31TEK, T31UER | 256×256 | 2016-2019 |
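Assuming each downloaded patch time series is an array of shape (T, channels, H, W) with the 10 bands followed by the 3 masks in the order listed above (the actual file layout on Zenodo may differ; check each repository's README), a minimal handling sketch could look like:

```python
# Hedged sketch: select reflectance bands, use the CLM_R1 cloud mask, and
# compute per-band temporal means over cloud-free pixels. The channel order
# is an assumption taken from the band/mask lists above.
import numpy as np

BANDS = ["B2", "B3", "B4", "B5", "B6", "B7", "B8", "B8A", "B11", "B12"]
MASKS = ["CLM_R1", "EDG_R1", "SAT_R1"]

# Toy stand-in for a loaded patch: 12 dates, 13 channels, 64x64 pixels.
patch = np.random.rand(12, len(BANDS) + len(MASKS), 64, 64)

reflectance = patch[:, : len(BANDS)]     # (T, 10, H, W)
clouds = patch[:, len(BANDS)] > 0.5      # CLM_R1 as a boolean mask (T, H, W)
valid = ~clouds                          # keep cloud-free pixels

# Per-band temporal mean over valid pixels only.
masked = np.where(valid[:, None], reflectance, np.nan)
band_means = np.nanmean(masked, axis=(0, 2, 3))
print(band_means.shape)  # (10,)
```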
koyealagbe/nnces-unlabeled dataset hosted on Hugging Face and contributed by the HF Datasets community
oliveirabruno01/shaped-svgs-small-unlabeled-900 dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by ArbaazKhan3
Released under Apache 2.0
taylor-joren/peer-unlabeled dataset hosted on Hugging Face and contributed by the HF Datasets community
This dataset was created by ifeomaozo12
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Training and execution times (in seconds) of considered classifiers on the original collected dataset.
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/1XXDMW
Identifying important policy outputs has long been of interest to political scientists. In this work, we propose a novel approach to the classification of policies. Instead of obtaining and aggregating expert evaluations of significance for a finite set of policy outputs, we use experts to identify a small set of significant outputs and then employ positive unlabeled (PU) learning to search for other similar examples in a large unlabeled set. We further propose to automate the first step by harvesting ‘seed’ sets of significant outputs from web data. We offer an application of the new approach by classifying over 9,000 government regulations in the United Kingdom. The obtained estimates are successfully validated against human experts, by forecasting web citations, and with a construct validity test.
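The two-step flavor of positive-unlabeled (PU) learning described above can be sketched generically (this is a common textbook recipe, not the authors' exact pipeline): train positives against the unlabeled pool, treat the lowest-scoring unlabeled items as reliable negatives, and retrain.

```python
# Generic two-step PU learning sketch on toy data. The feature distributions,
# classifier choice, and 0.3 quantile threshold are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_pos = rng.normal(2.0, 1.0, size=(50, 4))    # known significant outputs
X_unl = rng.normal(0.0, 1.0, size=(500, 4))   # large unlabeled pool

# Step 1: train positives vs. unlabeled-as-negative.
X1 = np.vstack([X_pos, X_unl])
y1 = np.r_[np.ones(len(X_pos)), np.zeros(len(X_unl))]
clf1 = LogisticRegression(max_iter=1000).fit(X1, y1)

# Step 2: unlabeled examples with the lowest positive probability
# become "reliable negatives".
scores = clf1.predict_proba(X_unl)[:, 1]
reliable_neg = X_unl[scores < np.quantile(scores, 0.3)]

# Step 3: retrain on positives vs. reliable negatives, then score the pool.
X2 = np.vstack([X_pos, reliable_neg])
y2 = np.r_[np.ones(len(X_pos)), np.zeros(len(reliable_neg))]
clf2 = LogisticRegression(max_iter=1000).fit(X2, y2)
predicted_significant = clf2.predict(X_unl).sum()
print(predicted_significant, "of", len(X_unl), "unlabeled items flagged")
```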
Unlabeled Social Stories Dataset
This dataset contains high-quality social stories generated by different LLMs aimed at supporting children with special needs.
Citation
If you use this dataset, please cite:
@misc{li2025socialstories,
  title = {Unlabeled Dataset},
  author = {Wen Li},
  year = {2025},
  howpublished = {\url{https://huggingface.co/datasets/yirruli/Unlabeled_Dataset}},
  note = {Accessed: [date]}
}
PDAP/unlabeled-urls dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Classification performance of considered classifiers on the artificially balanced dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Clustering is an unsupervised machine learning technique whose goal is to group unlabeled data. Traditional clustering methods, however, only output a set of results and do not explain them. Although a number of decision-tree-based methods have been proposed in the literature to explain clustering results, most suffer from drawbacks such as too many branches and overly deep leaves, which lead to complex explanations that are difficult for users to understand. In this paper, a hypercube overlay model based on multi-objective optimization is proposed to achieve succinct explanations of clustering results. The model designs two objective functions based on the number of hypercubes and the compactness of instances, and then uses multi-objective optimization to find a set of nondominated solutions. Finally, a Utopia point is defined to determine the most suitable solution, in which each cluster is covered by as few hypercubes as possible. Based on these hypercubes, an explanation of each cluster is provided. Verification on synthetic and real datasets shows that the model provides concise and understandable explanations to users.
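As a much-simplified illustration of hypercube-style explanations, the sketch below describes each cluster with a single axis-aligned bounding box (one interval per feature), whereas the paper optimizes multiple hypercubes per cluster via multi-objective search.

```python
# Simplified hypercube explanation: one axis-aligned box per cluster,
# expressed as readable interval rules. Data and cluster count are toy choices.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

explanations = {}
for c in np.unique(clusters):
    pts = X[clusters == c]
    # One [min, max] interval per feature covers every instance in the cluster.
    explanations[c] = list(zip(pts.min(axis=0), pts.max(axis=0)))

for c, box in explanations.items():
    rules = " AND ".join(f"{lo:.2f} <= x{i} <= {hi:.2f}"
                         for i, (lo, hi) in enumerate(box))
    print(f"cluster {c}: {rules}")
```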
These datasets were used while writing the following work:
Polo, F. M., Ciochetti, I., and Bertolo, E. (2021). Predicting legal proceedings status: approaches based on sequential text data. In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Law, pages 264–265.
Please cite us if you use our datasets in your academic work:
@inproceedings{polo2021predicting,
title={Predicting legal proceedings status: approaches based on sequential text data},
author={Polo, Felipe Maia and Ciochetti, Itamar and Bertolo, Emerson},
booktitle={Proceedings of the Eighteenth International Conference on Artificial Intelligence and Law},
pages={264--265},
year={2021}
}
More details below!
Every legal proceeding in Brazil falls into one of three status classes: (i) archived, (ii) active, and (iii) suspended. A status is assigned at a specific instant in time and may be temporary or permanent. Moreover, statuses are decided by the courts to organize their workflow, which in Brazil may reach thousands of simultaneous cases per judge. Developing machine learning models to classify legal proceedings by status can assist public and private institutions in managing large portfolios of proceedings, providing gains in scale and efficiency.
In this dataset, each proceeding is made up of a sequence of short texts called “motions” written in Portuguese by the courts’ administrative staff. The motions relate to the proceedings, but not necessarily to their legal status.
Our data is composed of two datasets: a dataset of ~3*10^6 unlabeled motions and a dataset containing 6449 legal proceedings, each with an individual and a variable number of motions, but which have been labeled by lawyers. Among the labeled data, 47.14% is classified as archived (class 1), 45.23% is classified as active (class 2), and 7.63% is classified as suspended (class 3).
The datasets we use are representative samples from the first (São Paulo) and third (Rio de Janeiro) most significant state courts. State courts handle the most variable types of cases throughout Brazil and are responsible for 80% of the total amount of lawsuits. Therefore, these datasets are a good representation of a very significant portion of the use of language and expressions in Brazilian legal vocabulary.
Regarding the labeled dataset, the key "-1" denotes the most recent text, "-2" the second most recent, and so on.
We would like to thank Ana Carolina Domingues Borges, Andrews Adriani Angeli, and Nathália Caroline Juarez Delgado from Tikal Tech for helping us to obtain the datasets. This work would not be possible without their efforts.
Can you develop good machine learning classifiers for text sequences? :)
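As a hypothetical starting point for that challenge, a simple baseline could concatenate each proceeding's most recent motions (following the "-1", "-2", ... key scheme described above) and train a TF-IDF plus logistic-regression classifier. The toy motion texts and the exact record schema below are illustrative assumptions, not the dataset's actual files.

```python
# Hedged baseline sketch: flatten each proceeding's recent motions into one
# document and classify its status (1 archived, 2 active, 3 suspended).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy proceedings: dict of motion texts keyed from most recent ("-1") back.
proceedings = [
    {"-1": "arquivamento definitivo", "-2": "baixa dos autos"},
    {"-1": "audiencia designada", "-2": "citacao da parte re"},
    {"-1": "suspensao do processo", "-2": "acordo entre as partes"},
]
labels = [1, 2, 3]  # archived, active, suspended

def flatten(p, n_recent=5):
    """Join the n most recent motions into one document, "-1" first."""
    keys = sorted(p, key=lambda k: int(k))[::-1]
    return " ".join(p[k] for k in keys[:n_recent])

docs = [flatten(p) for p in proceedings]
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(docs, labels)
print(model.predict([flatten({"-1": "arquivamento dos autos"})]))
```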
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This public dataset contains labels for the unlabeled 100,000 pictures in the STL-10 dataset.
The dataset is human-labeled with AI aid through Etiqueta, the one and only gamified mobile data labeling application.
- `stl10.py` is a Python script written by Martin Tutek to download the complete STL-10 dataset.
- `labels.json` contains labels for the 100,000 previously unlabeled images in the STL-10 dataset.
- `legend.json` is a mapping of the labels used.
- `stats.ipynb` presents a few statistics regarding the 100,000 newly labeled images.
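The JSON schemas are not documented here, so the sketch below assumes `labels.json` maps image indices to integer labels and `legend.json` maps those integers to class names; adjust it to the actual files.

```python
# Hedged sketch of combining labels.json and legend.json. The schema and the
# class names used here are assumptions; tiny stand-in files are written
# locally so the example is self-contained.
import json

with open("legend.json", "w") as f:
    json.dump({"0": "airplane", "1": "bird"}, f)
with open("labels.json", "w") as f:
    json.dump({"0": 1, "1": 0, "2": 1}, f)

with open("legend.json") as f:
    legend = json.load(f)
with open("labels.json") as f:
    labels = json.load(f)

# Resolve each image index to its class name.
named = {int(idx): legend[str(lab)] for idx, lab in labels.items()}
print(named)  # {0: 'bird', 1: 'airplane', 2: 'bird'}
```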
If you use this dataset in your research please cite the following:
@techreport{yagli2025etiqueta,
author = {Semih Yagli},
title = {Etiqueta: AI-Aided, Gamified Data Labeling to Label and Segment Data},
year = {2025},
number = {TR-2025-0001},
address = {NJ, USA},
month = apr,
url = {https://www.aidatalabel.com/technical_reports/aidatalabel_tr_2025_0001.pdf},
institution = {AI Data Label},
}
@inproceedings{coates2011analysis,
title = {An analysis of single-layer networks in unsupervised feature learning},
author = {Coates, Adam and Ng, Andrew and Lee, Honglak},
booktitle = {Proceedings of the fourteenth international conference on artificial intelligence and statistics},
pages = {215--223},
year = {2011},
organization = {JMLR Workshop and Conference Proceedings}
}
Note: The dataset is imported to Kaggle from: https://github.com/semihyagli/STL10-Labeled See also: https://github.com/semihyagli/STL10_Segmentation
If you have comments or questions about Etiqueta or about this dataset, please reach out to us at contact@aidatalabel.com
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by vialactea
Released under MIT
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by kevinzb56
Released under Apache 2.0