Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Number of images used for the training and testing of the models with different labeling strategies.
DensePASS - a novel densely annotated dataset for panoramic segmentation under cross-domain conditions, specifically built to study Pinhole-to-Panoramic transfer and accompanied by pinhole-camera training examples obtained from Cityscapes. DensePASS covers both labelled and unlabelled 360-degree images, with the labelled data comprising 19 classes that explicitly fit the categories available in the source-domain (i.e. pinhole) data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These files contain the BLNET lightning dataset and a modified MAE neural network code. The dataset includes 100,000 unlabeled lightning pulse files and 3,000 labeled lightning pulses, each spanning 1 ms (5,000 points per file). The unlabeled data is used for pretraining the MAE, while the labeled data is used for finetuning the MAE.
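As a rough illustration of the MAE-style pretraining step described above, the following sketch randomly masks patches of a single 5,000-sample pulse; the patch size, mask ratio, and array layout are illustrative assumptions, not the settings used in the released code.

import numpy as np

def random_mask_pulse(pulse, patch_size=50, mask_ratio=0.75, rng=None):
    # Split a 1-D lightning pulse into non-overlapping patches and hide a random
    # subset of them, as done in MAE pretraining. Patch size and mask ratio are
    # illustrative assumptions, not the values used in the provided MAE code.
    rng = rng or np.random.default_rng()
    patches = pulse.reshape(-1, patch_size)          # 5,000 samples -> 100 patches of 50
    n_masked = int(mask_ratio * len(patches))
    masked_idx = rng.choice(len(patches), n_masked, replace=False)
    visible_idx = np.setdiff1d(np.arange(len(patches)), masked_idx)
    return patches[visible_idx], visible_idx, masked_idx

# Example with a synthetic pulse of 5,000 points spanning 1 ms.
pulse = np.random.randn(5000).astype(np.float32)
visible_patches, visible_idx, masked_idx = random_mask_pulse(pulse)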
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains microscopic images of multiple cell lines captured with multiple microscopes without the use of any fluorescent labeling, together with a manually annotated ground truth for subsequent use in segmentation algorithms. The dataset also includes images reconstructed according to the methods described below in order to ease further segmentation.
Our data consist of
244 labelled images of PC-3 cells (7,907 cells) and 205 labelled images of PNT1A cells (9,288 cells), designated in the paper as "QPI_Seg_PNT1A_PC3", and
1,819 unlabelled images with a mixture of 22Rv1, A2058, A2780, A8780, DU145, Fadu, G361, HOB and LNCaP cells used for pretraining, designated in the paper as "QPI_Cell_unlabelled".
See Vicar et al. XXXX 2021 DOI XXX (TBA after publishing)
Code using this dataset is available at XXXX (TBA after publishing)
Materials and methods
A set of adherent cell lines of various origins, tumorigenic potential, and morphology was used in this paper (PC-3, PNT1A, 22Rv1, DU145, LNCaP, A2058, A2780, A8780, Fadu, G361, HOB). PC-3, PNT1A, 22Rv1, DU145, LNCaP, A2780, and G361 cell lines were cultured in RPMI-1640 medium; A2058, FaDu, and HOB cell lines were cultured in DMEM-F12 medium; all media were supplemented with antibiotics (penicillin 100 U/ml and streptomycin 0.1 mg/ml) and with 10% fetal bovine serum (FBS). Prior to microscopy acquisition, the cells were maintained at 37 °C in a humidified (60%) incubator with 5% CO₂ (Sanyo, Japan). For acquisition purposes, the cells were cultivated in a Flow chamber µ-Slide I Luer Family (Ibidi, Martinsried, Germany). To maintain standard cultivation conditions during time-lapse experiments, cells were placed in the gas chamber H201 for the Mad City Labs Z100/Z500 piezo Z-stage (Okolab, Ottaviano NA, Italy). For QPI acquisition, a coherence-controlled holographic microscope (Telight Q-Phase) was used. A Nikon Plan 10×/0.3 objective was used for hologram acquisition with a CCD camera (XIMEA MR4021MC). Holographic data were numerically reconstructed with the Fourier transform method (described in Slaby, 2013), and phase unwrapping was applied to the phase image. QPI datasets used in this paper were acquired under various experimental setups and treatments. In most cases, experiments were conducted with time-lapse acquisition. The final dataset contains images acquired at least three hours apart.
Folder structure and file and filename description
labelled (QPI_Seg_PNT1A_PC3): 205 FOVs of PNT1A and 244 FOVs of PC-3 cells with segmentation labels, e.g.
00001_PC3_img.tif - 32-bit tiff image (values in pg/um2)
00001_PC3_mask.png - 8-bit mask image in which each unique grayscale value corresponds to a single cell in the FOV.
unlabelled (QPI_Cell_unlabelled): 11 varying cell lines, 1,819 FOVs in total, 32-bit tiff images (values in pg/um2)
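A minimal sketch of how one labelled image/mask pair might be loaded and split into per-cell masks, assuming the file layout described above; the tifffile and imageio packages are used here purely as an illustration.

import numpy as np
import tifffile
import imageio.v3 as iio

# Load one labelled FOV: the 32-bit QPI image (pg/um2) and its 8-bit instance mask,
# in which each non-zero grayscale value corresponds to one cell.
qpi = tifffile.imread("00001_PC3_img.tif")
mask = iio.imread("00001_PC3_mask.png")

cell_masks = {label: mask == label for label in np.unique(mask) if label != 0}
print(f"{len(cell_masks)} cells in this FOV, "
      f"dry mass density range {qpi.min():.2f}-{qpi.max():.2f} pg/um2")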
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data contains images of Capsicum annuum grown on several smallholder farms in Trinidad and Tobago, showing different levels of weed cover and different weed species. In most instances, weeds can be recognized by the naked eye. However, there are times when the weeds and the crops are of similar species and may appear almost identical. When weeds are plentiful and interwoven with crops, it becomes increasingly difficult to determine weed cover on a given piece of land. This data can be used in research on weed detection in hot peppers. When accompanied by the labelled versions, this data can be used to train machine learning models for weed detection in Capsicum annuum (hot peppers).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This item is part of the collection "AIS Trajectories from Danish Waters for Abnormal Behavior Detection"
DOI: https://doi.org/10.11583/DTU.c.6287841
Using deep learning for detection of maritime abnormal behaviour in spatio-temporal trajectories is a relatively new and promising application. Open access to the Automatic Identification System (AIS) has made large amounts of maritime trajectories publicly available. However, these trajectories are unannotated when it comes to the detection of abnormal behaviour.
The lack of annotated datasets for abnormality detection on maritime trajectories makes it difficult to evaluate and compare suggested models quantitatively. With this dataset, we attempt to provide a way for researchers to evaluate and compare performance.
We have manually labelled trajectories which showcase abnormal behaviour following a collision accident. The annotated dataset consists of 521 data points with 25 abnormal trajectories. The abnormal trajectories cover, among others: colliding vessels, vessels engaged in search-and-rescue activities, law enforcement, and commercial maritime traffic forced to deviate from its normal course.
These datasets consist of unlabelled trajectories for the purpose of training unsupervised models. For labelled datasets for evaluation, please refer to the collection (link in Related publications).
The data is saved using the pickle format for Python. Each dataset is split into two files with the naming convention:
datasetInfo_XXX
data_XXX
Files named "data_XXX" contains the extracted trajectories serialized sequentially one at a time and must be read as such. Please refer to provided utility functions for examples. Files named "datasetInfo" contains Metadata related to the dataset and indecies at which trajectories begin in "data_XXX" files.
The data are sequences of maritime trajectories defined by their; timestamp, latitude/longitude position, speed, course, and unique ship identifer MMSI. In addition, the dataset contains metadata related to creation parameters. The dataset has been limited to a specific time period, ship types, moving AIS navigational statuses, and filtered within an region of interest (ROI). Trajectories were split if exceeding an upper limit and short trajectories were discarded. All values are given as metadata in the dataset and used in the naming syntax.
Naming syntax: data_AIS_Custom_STARTDATE_ENDDATE_SHIPTYPES_MINLENGTH_MAXLENGTH_RESAMPLEPERIOD.pkl
See the datasheet for more detailed information, and refer to the provided utility functions for examples of how to read and plot the data.
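The provided utility functions are authoritative; as a hedged illustration only, the sketch below shows one way the sequentially pickled trajectories could be read, assuming each pickle.load call returns one trajectory and the datasetInfo file holds the metadata and start indices.

import pickle

def read_trajectories(data_path, info_path):
    # Read the metadata (assumed to include creation parameters and start indices).
    with open(info_path, "rb") as f:
        dataset_info = pickle.load(f)
    # Trajectories are serialized sequentially, so they are read one pickle.load at a time.
    trajectories = []
    with open(data_path, "rb") as f:
        while True:
            try:
                trajectories.append(pickle.load(f))
            except EOFError:
                break
    return dataset_info, trajectories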
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Despite the considerable progress in automatic abdominal multi-organ segmentation from CT/MRI scans in recent years, a comprehensive evaluation of the models' capabilities is hampered by the lack of a large-scale benchmark from diverse clinical scenarios. Constrained by the high cost of collecting and labeling 3D medical data, most deep learning models to date are driven by datasets with a limited number of organs of interest or samples, which limits the power of modern deep models and makes it difficult to provide a fully comprehensive and fair estimate of various methods. To mitigate these limitations, we present AMOS, a large-scale, diverse, clinical dataset for abdominal organ segmentation. AMOS provides 500 CT and 100 MRI scans collected from multi-center, multi-vendor, multi-modality, multi-phase, multi-disease patients, each with voxel-level annotations of 15 abdominal organs, providing challenging examples and a test-bed for studying robust segmentation algorithms under diverse targets and scenarios. We further benchmark several state-of-the-art medical segmentation models to evaluate the status of the existing methods on this new challenging dataset. We have made our datasets, benchmark servers, and baselines publicly available, and hope to inspire future research. The paper can be found at https://arxiv.org/pdf/2206.08023.pdf
In addition to providing the 600 labeled CT and MRI scans, we expect to provide 2,000 CT and 1,200 MRI scans without labels to support more learning tasks (semi-supervised, unsupervised, domain adaptation, ...). The link can be found in:
If you find this dataset useful for your research, please cite:
@inproceedings{NEURIPS2022_ee604e1b,
  author    = {Ji, Yuanfeng and Bai, Haotian and GE, Chongjian and Yang, Jie and Zhu, Ye and Zhang, Ruimao and Li, Zhen and Zhang, Lingyan and Ma, Wanling and Wan, Xiang and Luo, Ping},
  booktitle = {Advances in Neural Information Processing Systems},
  editor    = {S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh},
  pages     = {36722--36732},
  publisher = {Curran Associates, Inc.},
  title     = {AMOS: A Large-Scale Abdominal Multi-Organ Benchmark for Versatile Medical Image Segmentation},
  url       = {https://proceedings.neurips.cc/paper_files/paper/2022/file/ee604e1bedbd069d9fc9328b7b9584be-Paper-Datasets_and_Benchmarks.pdf},
  volume    = {35},
  year      = {2022}
}
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
15,638 global import shipment records of Label Blank with prices, volume, and current buyer-supplier relationships, based on an actual global export trade database.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The prediction of response to drugs before initiating therapy based on transcriptome data is a major challenge. However, obtaining effective drug response labels costs time and resources. Available methods often predict poorly and fail to identify robust biomarkers due to the curse of dimensionality: high dimensionality and low sample size. This necessitates the development of predictive models that effectively predict the response to drugs using limited labeled data while remaining interpretable. In this study, we report a novel Hierarchical Graph Random Neural Networks (HiRAND) framework to predict drug response from transcriptome data using few labeled samples and additional unlabeled data. HiRAND integrates information from the gene graph and the sample graph via graph convolutional networks (GCN). The innovation of our model is leveraging a data augmentation strategy to solve the dilemma of limited labeled data and using consistency regularization to optimize the prediction consistency of unlabeled data across different data augmentations. The results showed that HiRAND achieved better performance than competing methods in various prediction scenarios, including both simulation data and multiple drug response datasets. We found that the prediction ability of HiRAND for the drug vorinostat was the best across all 62 drugs. In addition, HiRAND was interpreted to identify the key genes most important to vorinostat response, highlighting critical roles for ribosomal protein-related genes in the response to histone deacetylase inhibition. HiRAND could be utilized as an efficient framework for improving drug response prediction performance using few labeled data.
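The consistency-regularization idea can be illustrated with a generic PyTorch sketch: predictions for two random augmentations of the same unlabeled samples are pushed to agree. This is a simplified stand-in, not the released HiRAND implementation; the model, augmentation, and loss weight are placeholders.

import torch.nn.functional as F

def consistency_loss(model, x_unlabeled, augment):
    # Mean-squared disagreement between predictions for two random augmentations
    # of the same unlabeled samples (generic illustration of consistency regularization).
    p1 = F.softmax(model(augment(x_unlabeled)), dim=-1)
    p2 = F.softmax(model(augment(x_unlabeled)), dim=-1)
    return F.mse_loss(p1, p2)

# Total objective: supervised loss on the few labeled samples plus a weighted
# consistency term on the unlabeled samples, e.g.
# loss = F.cross_entropy(model(x_labeled), y_labeled) + lam * consistency_loss(model, x_unlabeled, augment)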
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
LEPset is a large-scale EUS-based pancreas image dataset from the Department of Gastroenterology, Changhai Hospital, Second Military Medical University/Naval Medical University. This dataset consists of 420 patients and 3,500 images, and it has been divided into two categories (PC and NPC). We have invited experienced clinicians to annotate the category labels for all 3,500 EUS images. Moreover, our LEPset also has 8,000 EUS images without any classification annotation.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The provided mzML file can be used as a test dataset for protein identification and quantitation software. It was generated from human embryonic kidney (HEK) cells that were either unlabelled or labelled with heavy SILAC (K6R6, Unimod accession 188, PSI-MS name: "Label:13C(6)"). Apart from the different labelling, the HEK cells were kept under exactly the same conditions and harvested simultaneously. Light and heavy labelled proteins from HEK cell lysate were mixed in a certain ratio, digested with trypsin, and measured on a Thermo Fisher Q Exactive mass spectrometer. A more detailed description of the generation of the dataset will soon be accessible at PRIDE.
The provided mzML file has been converted from Thermo RAW and slightly modified via msConvert (ProteoWizard). To reduce the filesize and to speed up analysis, it has further been filtered to contain only the data measured between 2,000 sec and 3,000 sec of the original LC-MS/MS run.
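One way to inspect the retained retention-time window is sketched below with the pyteomics library; the library choice and the file name are assumptions, and any mzML reader would serve equally well.

from pyteomics import mzml

# Iterate over the spectra and print their retention times, which should fall in the
# 2,000-3,000 s window kept in the filtered file (note that the 'scan start time'
# value may be reported in minutes depending on the conversion settings).
with mzml.read("filtered_HEK_SILAC.mzML") as reader:
    for spectrum in reader:
        rt = spectrum["scanList"]["scan"][0]["scan start time"]
        print(spectrum["id"], float(rt))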
The Caltech Mouse Social Interactions (CalMS21) dataset is a multi-agent dataset from behavioral neuroscience. The dataset consists of trajectory data of social interactions, recorded from videos of freely behaving mice in a standard resident-intruder assay. The CalMS21 dataset is part of the Multi-Agent Behavior Challenge 2021.
To help accelerate behavioral studies, the CalMS21 dataset provides a benchmark to evaluate the performance of automated behavior classification methods in three settings: (1) training on large behavioral datasets all annotated by a single annotator, (2) style transfer to learn inter-annotator differences in behavior definitions, and (3) learning new behaviors of interest given limited training data. The dataset consists of 6 million frames of unlabelled tracked poses of interacting mice, as well as over 1 million frames with tracked poses and corresponding frame-level behavior annotations. The challenge is to classify behaviors accurately using both labelled and unlabelled tracking data, and to generalize to new annotators and behaviors.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
1,554 global export shipment records of Label Blank with prices, volume, and current buyer-supplier relationships, based on an actual global export trade database.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Classifiers have been developed to help diagnose dengue fever in patients presenting with febrile symptoms. However, classifier predictions often rely on the assumption that new observations come from the same distribution as training data. If the population prevalence of dengue changes, as would happen with a dengue outbreak, it is important to raise an alarm as soon as possible, so that appropriate public health measures can be taken and also so that the classifier can be re-calibrated. In this paper, we consider the problem of detecting such a change in distribution in sequentially-observed, unlabeled classification data. We focus on label shift changes to the distribution, where the class priors shift but the class conditional distributions remain unchanged. We reduce this problem to the problem of detecting a change in the one-dimensional classifier scores, leading to simple nonparametric sequential changepoint detection procedures. Our procedures leverage classifier training data to estimate the detection statistic, and converge to their parametric counterparts in the size of the training data. In simulated outbreaks with real dengue data, we show that our method outperforms other detection procedures in this label shift setting.
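The paper's nonparametric detectors are not reproduced here, but the reduction it describes, monitoring one-dimensional classifier scores for a change, can be illustrated with a generic CUSUM-style sketch; the standardization against training scores and the alarm threshold are placeholders, not the authors' estimator.

import numpy as np

def cusum_on_scores(stream_scores, train_scores, threshold=5.0):
    # Generic one-sided CUSUM on classifier scores (an illustrative stand-in, not the
    # paper's procedure): alarm when the cumulative standardized deviation from the
    # training-score mean exceeds the threshold.
    mu, sigma = np.mean(train_scores), np.std(train_scores) + 1e-12
    s = 0.0
    for t, score in enumerate(stream_scores):
        s = max(0.0, s + (score - mu) / sigma)
        if s > threshold:
            return t  # index of the first alarm
    return None  # no change detected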
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Generalized Category Discovery (GCD) is a challenging task in which, given a partially labelled dataset, models must categorize all unlabelled instances, regardless of whether they come from labelled categories or from new ones. In this paper, we challenge a remaining assumption in this task: that all images share the same domain. Specifically, we introduce a new task and method to handle GCD when the unlabelled data also contains images from different domains to the labelled set. Our proposed 'HiLo' networks extract High-level semantic and Low-level domain features, before minimizing the mutual information between the representations. Our intuition is that the clusterings based on domain information and semantic information should be independent. We further extend our method with a specialized domain augmentation tailored for the GCD task, as well as a curriculum learning approach. Finally, we construct a benchmark from corrupted fine-grained datasets as well as a large-scale evaluation on DomainNet with real-world domain shifts, reimplementing a number of GCD baselines in this setting. We demonstrate that HiLo outperforms SoTA category discovery models by a large margin on all evaluations.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
DENTEX CHALLENGE
We present the Dental Enumeration and Diagnosis on Panoramic X-rays Challenge (DENTEX), organized in conjunction with the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI) in 2023. The primary objective of this challenge is to develop algorithms that can accurately detect abnormal teeth with dental enumeration and associated diagnosis. This not only aids in accurate treatment planning but also helps practitioners carry out procedures with a low margin of error.
The challenge provides three types of hierarchically annotated data and additional unlabeled X-rays for optional pre-training. The annotation of the data is structured using the Fédération Dentaire Internationale (FDI) system. The first set of data is partially labeled because it only includes quadrant information. The second set is also partially labeled but contains additional enumeration information along with the quadrant. The third set is fully labeled because it includes all quadrant-enumeration-diagnosis information for each abnormal tooth, and all participant algorithms will be benchmarked on this third set.
DENTEX aims to provide insights into the effectiveness of AI in dental radiology analysis and its potential to improve dental practice by comparing frameworks that simultaneously point out abnormal teeth with dental enumeration and associated diagnosis on panoramic dental X-rays.
Please visit our website to join DENTEX (Dental Enumeration and Diagnosis on Panoramic X-rays Challenge), which is held at MICCAI 2023.
DATA
The DENTEX dataset comprises panoramic dental X-rays obtained from three different institutions using standard clinical conditions but varying equipment and imaging protocols, resulting in diverse image quality reflecting heterogeneous clinical practice. The dataset includes X-rays from patients aged 12 and above, randomly selected from the hospital's database to ensure patient privacy and confidentiality.
To enable effective use of the FDI system, the dataset is hierarchically organized into three types of data:
(a) 693 X-rays labeled for quadrant detection and quadrant classes only,
(b) 634 X-rays labeled for tooth detection with quadrant and tooth enumeration classes,
(c) 1005 X-rays fully labeled for abnormal tooth detection with quadrant, tooth enumeration, and diagnosis classes.
The diagnosis class includes four specific categories: caries, deep caries, periapical lesions, and impacted teeth. An additional 1571 unlabeled X-rays are provided for pre-training.
Data Split for Evaluation and Training
The DENTEX 2023 dataset comprises three types of data: (a) partially annotated quadrant data, (b) partially annotated quadrant-enumeration data, and (c) fully annotated quadrant-enumeration-diagnosis data. The first two types of data are intended for training and development purposes, while the third type is used for training and evaluations.
To comply with standard machine learning practices, the fully annotated third dataset, consisting of 1005 panoramic X-rays, is partitioned into training, validation, and testing subsets, comprising 705, 50, and 250 images, respectively. Ground truth labels are provided only for the training data, while the validation data is provided without associated ground truth, and the testing data is kept hidden from participants.
Annotation Protocol
DENTEX provides three hierarchically annotated datasets that facilitate various dental detection tasks: (1) quadrant-only for quadrant detection, (2) quadrant-enumeration for tooth detection, and (3) quadrant-enumeration-diagnosis for abnormal tooth detection. Although it may seem redundant to provide a quadrant detection dataset, it is crucial for utilizing the FDI Numbering System. The FDI system is a globally used system that assigns each quadrant of the mouth a number from 1 through 4: the top right is 1, the top left is 2, the bottom left is 3, and the bottom right is 4. Each of the eight teeth in a quadrant is then numbered 1 through 8, starting at the front middle tooth and increasing toward the back. So, for example, the back tooth on the lower right side would be 48 according to FDI notation, which means quadrant 4, tooth 8. Therefore, the quadrant segmentation dataset can significantly simplify the dental enumeration task, even though evaluations will be made only on the fully annotated third dataset.
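The quadrant/tooth convention described above maps directly to a two-digit FDI code; the small helper below illustrates the rule and is not part of the DENTEX tooling.

def fdi_code(quadrant: int, tooth: int) -> int:
    # Combine a quadrant (1-4) and a tooth position (1-8, counted from the front
    # middle tooth toward the back) into a two-digit FDI number.
    if quadrant not in range(1, 5) or tooth not in range(1, 9):
        raise ValueError("quadrant must be 1-4 and tooth 1-8")
    return quadrant * 10 + tooth

assert fdi_code(4, 8) == 48  # quadrant 4, tooth 8: the back tooth on the lower right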
Note: The datasets are fully identical to the data used for our baseline method, named HierarchicalDet. Therefore, please visit the HierarchicalDet (Diffusion-Based Hierarchical Multi-Label Object Detection to Analyze Panoramic Dental X-rays) repo for more info.
CITING US
If you use DENTEX, we would appreciate a reference to the following paper.
Hamamci, I., Er, S., Simsar, E., Yuksel, A., Gultekin, S., Ozdemir, S., Yang, K., Li, H., Pati, S., Stadlinger, B., & others (2023). DENTEX: An Abnormal Tooth Detection with Dental Enumeration and Diagnosis Benchmark for Panoramic X-rays.
Pre-print: https://arxiv.org/abs/2305.19112
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data contains the corresponding labelled images of Capsicum annuum included in the "Unlabelled Weed Detection Images for Hot Peppers" dataset on this site. This dataset contains the labels 0, 1, and 2, which can be displayed by assigning a unique pixel value (e.g. recommended: 0, 60, 255) to each occurrence of the label. These images can be utilised as ground truth labels for machine learning and data exploration. The labels represent three categories, namely weed, crop, and background. The labels were assigned by a team of trained individuals from Trinidad and Tobago using the Image Labeler app in the Computer Vision Toolbox from Matlab.
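For visualisation, the three label values can be remapped to the recommended display intensities; a minimal numpy sketch (the library choice is an assumption):

import numpy as np

# Remap label values 0, 1, 2 to the recommended display intensities 0, 60, 255 so the
# three categories (weed, crop, background) are visually distinguishable.
display_values = np.array([0, 60, 255], dtype=np.uint8)

def to_display(label_image):
    return display_values[label_image]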
Overview
The IITKGP_Fence dataset is designed for tasks related to fence-like occlusion detection, defocus blur, depth mapping, and object segmentation. The captured data varies in scene composition, background defocus, and object occlusions. The dataset comprises both labeled and unlabeled data, as well as additional video and RGB-D data. It contains ground truth occlusion masks (GT) for the corresponding images. We created the ground truth occlusion labels in a semi-automatic way with user interaction.
Key Dataset Features:
Fence Detection: Designed for detecting fences or fence-like structures that might occlude objects.
Defocus Blur: Also contains images and videos with blurred objects, likely to challenge detection and segmentation algorithms.
RGBD Data: Offers depth information alongside RGB images, which can be used for tasks like 3D reconstruction or occlusion handling.
Unlabeled and Labeled Data: Facilitates both supervised and unsupervised learning tasks. The Labeled folder provides ground truth occlusion masks, while the Unlabeled folder allows for further experimentation or self-supervised methods.
Dataset Repository
GitHub Repository: Occlusion-Removal
Paper: Deep Generative Adversarial Network for Occlusion Removal from a Single Image
Authors: Sankaraganesh Jonna, Moushumi Medhi, Rajiv Ranjan Sahay
Contact: medhi.moushumi@iitkgp.ac.in
The Igbo synchronised corpus (IgboSynCorp) is an annotated corpus of spoken Igbo created by a team of linguists and NLP experts at the University of Ibadan and Afe Babalola University, Nigeria. The project was designed to create an open-access labelled and unlabelled dataset for Natural Language Processing tasks in the Igbo language. The dataset was created to enable robust and more equitable application of machine learning tools of high social value in Igbo. The dataset consists of ELAN text and wav files of Igbo speech. There are two categories of ELAN files: Gold files (90 mins) and Non-Gold files (188 mins). The Gold files (19,722 words or 2,761 sentences) were transcribed phonetically and orthographically, translated into English, glossed, and PoS-tagged based on the universal dependency PoS tags. The Non-Gold files were only transcribed orthographically and translated into English. There are 110 recordings of spoken Igbo (.wav files), all Igbo oral narratives, amounting to 38.8075 hours or 2,328.45 minutes. The metadata is compiled in Excel sheets: IgboSynCorp Metadata I contains the demographic information about the language consultants, while IgboSynCorp Metadata II outlines the domains of speech represented in each individual wav file (oral narrative). There are two lexicon files with about 2,300 words altogether, which originated from the glossing and part-of-speech tagging. The project was funded by the Lacuna Fund (https://lacunafund.org) of the Meridian Institute, 105 Village Place, Dillon, Colorado 80435, United States of America. (2022-06-21)
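The ELAN files can be read with standard tooling; below is a brief sketch using the pympi-ling package, which is an assumption and not part of the corpus release, listing the tiers and annotations in one .eaf file.

from pympi.Elan import Eaf

# Open one ELAN file from the corpus (the filename is illustrative) and print every
# annotation per tier; for Gold files the tiers include orthographic and phonetic
# transcription, English translation, gloss, and PoS tags.
eaf = Eaf("igbosyncorp_gold_001.eaf")
for tier in eaf.get_tier_names():
    for start_ms, end_ms, value in eaf.get_annotation_data_for_tier(tier):
        print(tier, start_ms, end_ms, value)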
CLEAR is a continual image classification benchmark dataset with a natural temporal evolution of visual concepts in the real world that spans a decade (2004-2014). CLEAR is built from existing large-scale image collections (YFCC100M) through a novel and scalable low-cost approach to visio-linguistic dataset curation. The pipeline makes use of pretrained vision language models (e.g. CLIP) to interactively build labeled datasets, which are further validated with crowd-sourcing to remove errors and even inappropriate images (hidden in original YFCC100M). The major strength of CLEAR over prior CL benchmarks is the smooth temporal evolution of visual concepts with real-world imagery, including both high-quality labeled data along with abundant unlabeled samples per time period for continual semi-supervised learning.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Number of images used for the training and testing of the models with different labeling strategies.