Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is part of the unlabeled Sentinel-2 (S2) L2A dataset of patch time series acquired over France that was used to pretrain U-BARN. For further details, see section IV.A of the preprint "Self-Supervised Spatio-Temporal Representation Learning Of Satellite Image Time Series" available here. Each patch comprises the 10 bands [B2, B3, B4, B5, B6, B7, B8, B8A, B11, B12] and the three masks ['CLM_R1', 'EDG_R1', 'SAT_R1']. The global dataset is split into two disjoint subsets: training (9 tiles) and validation (4 tiles).
In this repo, only data from the S2 tile T30UVU are available. To download the full pretraining dataset, see DOI 10.5281/zenodo.7891924.
| Dataset name | S2 tiles | ROI size | Temporal extent |
| --- | --- | --- | --- |
| Train | T30TXT, T30TYQ, T30TYS, T30UVU, T31TDJ, T31TDL, T31TFN, T31TGJ, T31UEP | 1024*1024 | 2018-2020 |
| Val | T30TYR, T30UWU, T31TEK, T31UER | 256*256 | 2016-2019 |
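As a sketch of how the band/mask layout above might be consumed, the snippet below combines the three quality masks into a single validity mask before computing per-band statistics. The array shapes, and the convention that a nonzero mask value flags an invalid pixel, are assumptions not stated in the description; random data stands in for a real patch.

```python
import numpy as np

# Band order as described for each S2 L2A patch (10 bands).
BANDS = ["B2", "B3", "B4", "B5", "B6", "B7", "B8", "B8A", "B11", "B12"]
# Quality masks: cloud, edge, saturation (assumed: nonzero = invalid pixel).
MASKS = ["CLM_R1", "EDG_R1", "SAT_R1"]

def valid_pixel_mask(masks: np.ndarray) -> np.ndarray:
    """Combine the three mask planes (3, H, W) into one boolean array
    that is True only where every mask flags the pixel as clean."""
    return (masks == 0).all(axis=0)

# Toy stand-in for one patch: 10 spectral bands and 3 masks on a 64x64 crop.
rng = np.random.default_rng(0)
patch = rng.random((len(BANDS), 64, 64)).astype(np.float32)
masks = (rng.random((len(MASKS), 64, 64)) > 0.9).astype(np.uint8)

valid = valid_pixel_mask(masks)
clean_mean = patch[:, valid].mean(axis=1)  # per-band mean over clean pixels
print(valid.shape, clean_mean.shape)
```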
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Classification performance of the considered classifiers on the originally collected dataset.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Anna Jazayeri
Released under MIT
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This folder contains song recordings from typical adult zebra finches used to validate the song similarity method proposed in AVN. There is a variable number of song files from each bird. Most contain recordings from a single day of song, but for some birds who sang less frequently, recordings from multiple days were pooled, as indicated in the READ_ME.txt file in each bird's folder. These birds were all 'pupils' for the purpose of AVN validation. Their tutors are listed in Bird_list.csv, and recordings of their tutors are provided in the "Unlabeled Tutor Recordings" dataset within the AVN dataverse.
This dataset was created by Kirill Chemrov
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In credit risk assessment, unsupervised classification techniques can be introduced to reduce human resource expenses and expedite decision-making. Despite the efficacy of unsupervised learning methods in handling unlabeled datasets, their performance remains limited owing to challenges such as imbalanced data, local optima, and parameter adjustment complexities. Thus, this paper introduces a novel hybrid unsupervised classification method, named the two-stage hybrid system with spectral clustering and semi-supervised support vector machine (TSC-SVM), which effectively addresses the unsupervised imbalance problem in credit risk assessment by targeting global optimal solutions. Furthermore, a multi-view combined unsupervised method is designed to thoroughly mine data and enhance the robustness of label predictions. This method mitigates discrepancies in prediction outcomes from three distinct perspectives. The effectiveness, efficiency, and robustness of the proposed TSC-SVM model are demonstrated through various real-world applications. The proposed algorithm is anticipated to expand the customer base for financial institutions while reducing economic losses.
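The two-stage idea described above can be illustrated with a much-simplified sketch: spectral clustering first assigns pseudo-labels to the unlabeled data, and an SVM is then fit on those pseudo-labels to obtain a reusable decision boundary. This is not the authors' TSC-SVM (it omits the imbalance handling and the multi-view combination); the toy data and all parameter choices are assumptions.

```python
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Toy stand-in for unlabeled applicant data.
X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)

# Stage 1: spectral clustering produces pseudo-labels for the unlabeled set.
pseudo = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                            n_neighbors=10, random_state=0).fit_predict(X)

# Stage 2: an SVM is trained on the pseudo-labels, yielding a decision
# boundary that can score new, unseen instances.
svm = SVC(kernel="rbf", gamma="scale").fit(X, pseudo)
print(svm.score(X, pseudo))  # agreement with the stage-1 pseudo-labels
```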
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The primer and unlabeled probe sequences.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Elbadry2025
Released under MIT
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
* Mean ± SD.
# Gaussian Center ± HWHM (Half Width at Half Maximum).
GNU General Public License v3.0: https://www.gnu.org/licenses/gpl-3.0.html
This repository consists of Data/Code to reproduce the results of the thesis chapter "DCAST: Diverse Class-Aware Self-Training Mitigates Selection Bias for Fairer Learning".
The data is shared at: https://doi.org/10.6084/m9.figshare.27003601
The code is shared at: https://github.com/joanagoncalveslab/DCAST
Dataset Card for AutoTrain Evaluator
This repository contains model predictions generated by AutoTrain for the following task and dataset:
Task: Summarization
Model: 0ys/mt5-small-finetuned-amazon-en-es
Dataset: conceptual_captions
Config: unlabeled
Split: train
To run new evaluation jobs, visit Hugging Face's automatic model evaluator.
Contributions
Thanks to @DonaldDaz for evaluating this model.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Clustering is an unsupervised machine learning technique whose goal is to group unlabeled data. But traditional clustering methods only output a set of results and do not provide any explanation of those results. Although a number of decision-tree-based methods have been proposed in the literature to explain clustering results, most of them have disadvantages, such as too many branches and too-deep leaves, which lead to complex explanations that are difficult for users to understand. In this paper, a hypercube overlay model based on multi-objective optimization is proposed to achieve succinct explanations of clustering results. The model designs two objective functions based on the number of hypercubes and the compactness of instances, and then uses multi-objective optimization to find a set of nondominated solutions. Finally, a Utopia point is defined to determine the most suitable solution, in which each cluster can be covered by as few hypercubes as possible. Based on these hypercubes, an explanation of each cluster is provided. Verification on synthetic and real datasets shows that the model provides concise and understandable explanations to users.
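A minimal sketch of the hypercube idea: each axis-aligned hypercube is just a lower/upper bound per feature, so the degenerate one-cube "explanation" of a cluster is its bounding box. The paper's multi-objective search over several cubes is not reproduced here; the helper names and toy data are assumptions for illustration only.

```python
import numpy as np

def bounding_hypercube(points: np.ndarray):
    """Smallest axis-aligned box covering all points of one cluster:
    a (lower, upper) bound per feature, i.e. a single-hypercube cover."""
    return points.min(axis=0), points.max(axis=0)

def covers(box, x) -> bool:
    """True if point x lies inside the hypercube (inclusive bounds)."""
    lo, hi = box
    return bool(np.all(x >= lo) and np.all(x <= hi))

# Toy cluster in 3 features; its box reads as an interval rule per feature.
rng = np.random.default_rng(1)
cluster = rng.normal(loc=5.0, scale=0.5, size=(100, 3))
box = bounding_hypercube(cluster)
print(all(covers(box, p) for p in cluster))  # every member is inside its box
```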
This dataset was created by YasserOuzzine
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is part of the unlabeled Sentinel-2 (S2) L2A dataset of patch time series acquired over France that was used to pretrain U-BARN. For further details, see section IV.A of the preprint "Self-Supervised Spatio-Temporal Representation Learning Of Satellite Image Time Series" available here. Each patch comprises the 10 bands [B2, B3, B4, B5, B6, B7, B8, B8A, B11, B12] and the three masks ['CLM_R1', 'EDG_R1', 'SAT_R1']. The global dataset is split into two disjoint subsets: training (9 tiles) and validation (4 tiles).
In this repo, only data from the S2 tile T30TYS are available. To download the full pretraining dataset, see DOI 10.5281/zenodo.7891924.
Global unlabeled dataset description:

| Dataset name | S2 tiles | ROI size | Temporal extent |
| --- | --- | --- | --- |
| Train | T30TXT, T30TYQ, T30TYS, T30UVU, T31TDJ, T31TDL, T31TFN, T31TGJ, T31UEP | 1024*1024 | 2018-2020 |
| Val | T30TYR, T30UWU, T31TEK, T31UER | 256*256 | 2016-2019 |
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
unofficial mirror of VietMed (Vietnamese speech data in medical domain) unlabeled set
official announcement: https://arxiv.org/abs/2404.05659
official download: https://huggingface.co/datasets/leduckhai/VietMed
this repo contains the unlabeled set: 966h - 230k samples
i also gather the metadata: see info.csv
my extraction code: https://github.com/phineas-pta/fine-tune-whisper-vi/blob/main/misc/vietmed-unlabeled.py
need to do: check misspelling, restore foreign words phonetised…
See the full description on the dataset page: https://huggingface.co/datasets/doof-ferb/VietMed_unlabeled.
In the first 2 binders: slides of landscapes from Point Lobos State Reserve, Garrapata State Park, and unlabeled sites on the Central Coast.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset was created by Ching-Yuan Bai
Released under CC BY-SA 4.0
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Submillimeter-grid ECoG recorded in human and mouse. Data are provided in 2.0 s windows, unlabeled. The electrodes were arranged in a square grid with the spacing specified in each file. The arrangement of the channels on the grid is given in the "grid" variable, allowing each channel to be mapped relative to the others. Details regarding methods and use of the data are available in the linked publication.
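A sketch of how the "grid" variable might be used to place each channel spatially: indexing a channels-by-samples window with the grid array yields a (rows, cols, samples) cube. The 4x4 layout, the sampling rate, and the grid encoding (channel index at each grid position) are all assumptions, since the files themselves specify these.

```python
import numpy as np

# Hypothetical layout: 16 channels on a 4x4 grid, where 'grid' holds the
# channel index at each grid position (the files define the real mapping).
grid = np.arange(16).reshape(4, 4)
fs = 512  # assumed sampling rate, not stated in the description
window = np.random.default_rng(2).standard_normal((16, 2 * fs))  # one 2.0 s window

# Fancy indexing with the grid maps the flat channel axis onto space:
# result shape is (rows, cols, samples).
spatial = window[grid]
print(spatial.shape)
```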
SPATIALLY ADAPTIVE SEMI-SUPERVISED LEARNING WITH GAUSSIAN PROCESSES FOR HYPERSPECTRAL DATA ANALYSIS
GOO JUN AND JOYDEEP GHOSH
Abstract. A semi-supervised learning algorithm for the classification of hyperspectral data, Gaussian process expectation maximization (GP-EM), is proposed. Model parameters for each land cover class are first estimated by a supervised algorithm using Gaussian process regressions to find spatially adaptive parameters, and the estimated parameters are then used to initialize a spatially adaptive mixture-of-Gaussians model. The mixture model is updated by expectation-maximization iterations using the unlabeled data, and the spatially adaptive parameters for unlabeled instances are obtained by Gaussian process regressions with soft assignments. Two sets of hyperspectral data taken from the Botswana area by the NASA EO-1 satellite are used for experiments. Empirical evaluations show that the proposed framework performs significantly better than baseline algorithms that do not use spatial information, and the results are also better than any previously reported results by other algorithms on the same data.
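The spatially adaptive ingredient of GP-EM can be sketched in isolation: a Gaussian process regresses a class statistic on pixel coordinates, so the predicted mean varies smoothly across the scene instead of being one global constant. This is only the regression step, not the full EM loop with soft assignments; the kernel, noise level, and toy coordinates are assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Spatially adaptive class mean: regress a band statistic of the labeled
# pixels on their (x, y) coordinates.
rng = np.random.default_rng(3)
coords = rng.uniform(0, 10, size=(50, 2))            # labeled pixel locations
values = np.sin(coords[:, 0]) + 0.1 * rng.standard_normal(50)

gp = GaussianProcessRegressor(kernel=RBF(length_scale=2.0), alpha=1e-2)
gp.fit(coords, values)

# Predict the spatially varying mean at unlabeled pixel locations.
new_sites = rng.uniform(0, 10, size=(5, 2))
mu = gp.predict(new_sites)
print(mu.shape)
```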
Despite the considerable progress in automatic abdominal multi-organ segmentation from CT/MRI scans in recent years, a comprehensive evaluation of models' capabilities is hampered by the lack of a large-scale benchmark covering diverse clinical scenarios. Constrained by the high cost of collecting and labeling 3D medical data, most deep learning models to date are driven by datasets with a limited number of organs of interest or samples, which still limits the power of modern deep models and makes it difficult to provide a fully comprehensive and fair estimate of various methods. To mitigate these limitations, we present AMOS, a large-scale, diverse, clinical dataset for abdominal organ segmentation. AMOS provides 500 CT and 100 MRI scans collected from multi-center, multi-vendor, multi-modality, multi-phase, multi-disease patients, each with voxel-level annotations of 15 abdominal organs, providing challenging examples and a test bed for studying robust segmentation algorithms under diverse targets and scenarios. We further benchmark several state-of-the-art medical segmentation models to evaluate the status of existing methods on this new, challenging dataset. We have made our datasets, benchmark servers, and baselines publicly available, and hope to inspire future research. The paper can be found at https://arxiv.org/pdf/2206.08023.pdf. In addition to the labeled 600 CT and MRI scans, we expect to provide 2000 CT and 1200 MRI scans without labels to support more learning tasks (semi-supervised, unsupervised, domain adaptation, ...).
The links can be found in: labeled data (500 CT + 100 MRI), unlabeled data Part I (900 CT), unlabeled data Part II (1100 CT; currently 1000 CT, to be replenished to 1100 CT), unlabeled data Part III (1200 MRI). If you find this dataset useful for your research, please cite:

@inproceedings{NEURIPS2022_ee604e1b,
  author = {Ji, Yuanfeng and Bai, Haotian and GE, Chongjian and Yang, Jie and Zhu, Ye and Zhang, Ruimao and Li, Zhen and Zhanng, Lingyan and Ma, Wanling and Wan, Xiang and Luo, Ping},
  booktitle = {Advances in Neural Information Processing Systems},
  editor = {S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh},
  pages = {36722--36732},
  publisher = {Curran Associates, Inc.},
  title = {AMOS: A Large-Scale Abdominal Multi-Organ Benchmark for Versatile Medical Image Segmentation},
  url = {https://proceedings.neurips.cc/paper_files/paper/2022/file/ee604e1bedbd069d9fc9328b7b9584be-Paper-Datasets_and_Benchmarks.pdf},
  volume = {35},
  year = {2022}
}