Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Number of images used for the training and testing of the models with different labeling strategies.
DensePASS - a novel densely annotated dataset for panoramic segmentation under cross-domain conditions, specifically built to study Pinhole-to-Panoramic transfer and accompanied by pinhole-camera training examples obtained from Cityscapes. DensePASS covers both labelled and unlabelled 360-degree images, with the labelled data comprising 19 classes that explicitly fit the categories available in the source-domain (i.e. pinhole) data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These files contain the BLNET lightning dataset and a modified MAE neural network code. The dataset includes 100,000 unlabeled lightning pulse files and 3,000 labeled lightning pulses, each spanning 1 ms (5,000 points per file). The unlabeled data is used for pretraining the MAE, while the labeled data is used for finetuning the MAE.
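As a rough illustration of the MAE-style pretraining step described above, the following sketch randomly masks patches of a single 5,000-sample pulse; the patch size, mask ratio, and array layout are illustrative assumptions, not the settings used in the released code.

import numpy as np

def random_mask_pulse(pulse, patch_size=50, mask_ratio=0.75, rng=None):
    # Split a 1-D lightning pulse into non-overlapping patches and hide a random
    # subset of them, as done in MAE pretraining. Patch size and mask ratio are
    # illustrative assumptions, not the values used in the provided MAE code.
    rng = rng or np.random.default_rng()
    patches = pulse.reshape(-1, patch_size)          # 5,000 samples -> 100 patches of 50
    n_masked = int(mask_ratio * len(patches))
    masked_idx = rng.choice(len(patches), n_masked, replace=False)
    visible_idx = np.setdiff1d(np.arange(len(patches)), masked_idx)
    return patches[visible_idx], visible_idx, masked_idx

# Example with a synthetic pulse of 5,000 points spanning 1 ms.
pulse = np.random.randn(5000).astype(np.float32)
visible_patches, visible_idx, masked_idx = random_mask_pulse(pulse)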
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains microscopic images of multiple cell lines captured with multiple microscopes without the use of any fluorescent labeling, together with a manually annotated ground truth for subsequent use in segmentation algorithms. The dataset also includes images reconstructed according to the methods described below in order to ease further segmentation.
Our data consist of
244 labelled images of PC-3 cells (7,907 cells) and 205 labelled images of PNT1A cells (9,288 cells), designated in the paper as "QPI_Seg_PNT1A_PC3", and
1,819 unlabelled images with a mixture of 22Rv1, A2058, A2780, A8780, DU145, Fadu, G361, HOB and LNCaP cells used for pretraining, designated in the paper as "QPI_Cell_unlabelled".
See Vicar et al. XXXX 2021 DOI XXX (TBA after publishing)
Code using this dataset is available at XXXX (TBA after publishing)
Materials and methods
A set of adherent cell lines of various origins, tumorigenic potential, and morphology was used in this paper (PC-3, PNT1A, 22Rv1, DU145, LNCaP, A2058, A2780, A8780, Fadu, G361, HOB). PC-3, PNT1A, 22Rv1, DU145, LNCaP, A2780, and G361 cell lines were cultured in RPMI-1640 medium; A2058, FaDu, and HOB cell lines were cultured in DMEM-F12 medium; all media were supplemented with antibiotics (penicillin 100 U/ml and streptomycin 0.1 mg/ml) and with 10% fetal bovine serum (FBS). Prior to microscopy acquisition, the cells were maintained at 37 °C in a humidified (60%) incubator with 5% CO₂ (Sanyo, Japan). For acquisition purposes, the cells were cultivated in a Flow chamber µ-Slide I Luer Family (Ibidi, Martinsried, Germany). To maintain standard cultivation conditions during time-lapse experiments, cells were placed in the gas chamber H201 for the Mad City Labs Z100/Z500 piezo Z-stage (Okolab, Ottaviano NA, Italy). For QPI acquisition, a coherence-controlled holographic microscope (Telight Q-Phase) was used. A Nikon Plan 10×/0.3 objective was used for hologram acquisition with a CCD camera (XIMEA MR4021MC). Holographic data were numerically reconstructed with the Fourier transform method (described in Slaby, 2013), and phase unwrapping was applied to the phase image. QPI datasets used in this paper were acquired under various experimental setups and treatments. In most cases, experiments were conducted with time-lapse acquisition. The final dataset contains images acquired at least three hours apart.
Folder structure and file and filename description
labelled (QPI_Seg_PNT1A_PC3): 205 FOVs of PNT1A and 244 FOVs of PC-3 cells with segmentation labels, e.g.
00001_PC3_img.tif - 32-bit tiff image (values in pg/um2)
00001_PC3_mask.png - 8-bit mask image in which each unique grayscale value corresponds to a single cell in the FOV.
unlabelled (QPI_Cell_unlabelled): 11 varying cell lines, 1,819 FOVs in total, 32-bit tiff images (values in pg/um2)
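A minimal sketch of how one labelled image/mask pair might be loaded and split into per-cell masks, assuming the file layout described above; the tifffile and imageio packages are used here purely as an illustration.

import numpy as np
import tifffile
import imageio.v3 as iio

# Load one labelled FOV: the 32-bit QPI image (pg/um2) and its 8-bit instance mask,
# in which each non-zero grayscale value corresponds to one cell.
qpi = tifffile.imread("00001_PC3_img.tif")
mask = iio.imread("00001_PC3_mask.png")

cell_masks = {label: mask == label for label in np.unique(mask) if label != 0}
print(f"{len(cell_masks)} cells in this FOV, "
      f"dry mass density range {qpi.min():.2f}-{qpi.max():.2f} pg/um2")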
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data contains images of Capsicum annuum grown on several smallholder farms in Trinidad and Tobago, showing different levels of weed cover and different weed species. In most instances, weeds can be recognized by the naked eye. However, there are times when the weeds and the crops are of similar species and may appear almost identical. When weeds are plentiful and interwoven with crops, it becomes increasingly difficult to determine weed cover on a given piece of land. This data can be used in research on weed detection in hot peppers. When accompanied by the labelled versions, this data can be used to train machine learning models for weed detection in Capsicum annuum (hot peppers).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This item is part of the collection "AIS Trajectories from Danish Waters for Abnormal Behavior Detection"
DOI: https://doi.org/10.11583/DTU.c.6287841
Using deep learning for detection of maritime abnormal behaviour in spatio-temporal trajectories is a relatively new and promising application. Open access to the Automatic Identification System (AIS) has made large amounts of maritime trajectories publicly available. However, these trajectories are unannotated when it comes to the detection of abnormal behaviour.
The lack of annotated datasets for abnormality detection on maritime trajectories makes it difficult to evaluate and compare suggested models quantitatively. With this dataset, we attempt to provide a way for researchers to evaluate and compare performance.
We have manually labelled trajectories which showcase abnormal behaviour following a collision accident. The annotated dataset consists of 521 data points with 25 abnormal trajectories. The abnormal trajectories cover, among others: colliding vessels, vessels engaged in search-and-rescue activities, law enforcement, and commercial maritime traffic forced to deviate from its normal course.
These datasets consist of unlabelled trajectories for the purpose of training unsupervised models. For labelled datasets for evaluation, please refer to the collection (link in Related publications).
The data is saved using the pickle format for Python. Each dataset is split into two files with the naming convention:
datasetInfo_XXX
data_XXX
Files named "data_XXX" contains the extracted trajectories serialized sequentially one at a time and must be read as such. Please refer to provided utility functions for examples. Files named "datasetInfo" contains Metadata related to the dataset and indecies at which trajectories begin in "data_XXX" files.
The data are sequences of maritime trajectories defined by their; timestamp, latitude/longitude position, speed, course, and unique ship identifer MMSI. In addition, the dataset contains metadata related to creation parameters. The dataset has been limited to a specific time period, ship types, moving AIS navigational statuses, and filtered within an region of interest (ROI). Trajectories were split if exceeding an upper limit and short trajectories were discarded. All values are given as metadata in the dataset and used in the naming syntax.
Naming syntax: data_AIS_Custom_STARTDATE_ENDDATE_SHIPTYPES_MINLENGTH_MAXLENGTH_RESAMPLEPERIOD.pkl
See the datasheet for more detailed information, and refer to the provided utility functions for examples of how to read and plot the data.
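The provided utility functions are authoritative; as a hedged illustration only, the sketch below shows one way the sequentially pickled trajectories could be read, assuming each pickle.load call returns one trajectory and the datasetInfo file holds the metadata and start indices.

import pickle

def read_trajectories(data_path, info_path):
    # Read the metadata (assumed to include creation parameters and start indices).
    with open(info_path, "rb") as f:
        dataset_info = pickle.load(f)
    # Trajectories are serialized sequentially, so they are read one pickle.load at a time.
    trajectories = []
    with open(data_path, "rb") as f:
        while True:
            try:
                trajectories.append(pickle.load(f))
            except EOFError:
                break
    return dataset_info, trajectories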
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Despite the considerable progress in automatic abdominal multi-organ segmentation from CT/MRI scans in recent years, a comprehensive evaluation of the models' capabilities is hampered by the lack of a large-scale benchmark from diverse clinical scenarios. Constrained by the high cost of collecting and labeling 3D medical data, most deep learning models to date are driven by datasets with a limited number of organs of interest or samples, which limits the power of modern deep models and makes it difficult to provide a fully comprehensive and fair estimate of various methods. To mitigate these limitations, we present AMOS, a large-scale, diverse, clinical dataset for abdominal organ segmentation. AMOS provides 500 CT and 100 MRI scans collected from multi-center, multi-vendor, multi-modality, multi-phase, multi-disease patients, each with voxel-level annotations of 15 abdominal organs, providing challenging examples and a test-bed for studying robust segmentation algorithms under diverse targets and scenarios. We further benchmark several state-of-the-art medical segmentation models to evaluate the status of the existing methods on this new challenging dataset. We have made our datasets, benchmark servers, and baselines publicly available, and hope to inspire future research. The paper can be found at https://arxiv.org/pdf/2206.08023.pdf
In addition to providing the 600 labeled CT and MRI scans, we expect to provide 2,000 CT and 1,200 MRI scans without labels to support more learning tasks (semi-supervised, unsupervised, domain adaptation, ...). The link can be found in:
If you find this dataset useful for your research, please cite:
@inproceedings{NEURIPS2022_ee604e1b,
  author    = {Ji, Yuanfeng and Bai, Haotian and GE, Chongjian and Yang, Jie and Zhu, Ye and Zhang, Ruimao and Li, Zhen and Zhang, Lingyan and Ma, Wanling and Wan, Xiang and Luo, Ping},
  booktitle = {Advances in Neural Information Processing Systems},
  editor    = {S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh},
  pages     = {36722--36732},
  publisher = {Curran Associates, Inc.},
  title     = {AMOS: A Large-Scale Abdominal Multi-Organ Benchmark for Versatile Medical Image Segmentation},
  url       = {https://proceedings.neurips.cc/paper_files/paper/2022/file/ee604e1bedbd069d9fc9328b7b9584be-Paper-Datasets_and_Benchmarks.pdf},
  volume    = {35},
  year      = {2022}
}
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
15,638 global import shipment records of Label Blank with prices, volume, and current buyer-supplier relationships, based on an actual global export trade database.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The prediction of response to drugs before initiating therapy based on transcriptome data is a major challenge. However, obtaining effective drug response labels costs time and resources. Available methods often predict poorly and fail to identify robust biomarkers due to the curse of dimensionality: high dimensionality and low sample size. This necessitates the development of predictive models that effectively predict the response to drugs using limited labeled data while remaining interpretable. In this study, we report a novel Hierarchical Graph Random Neural Networks (HiRAND) framework to predict drug response from transcriptome data using few labeled samples and additional unlabeled data. HiRAND integrates information from the gene graph and the sample graph via graph convolutional networks (GCN). The innovation of our model is leveraging a data augmentation strategy to solve the dilemma of limited labeled data and using consistency regularization to optimize the prediction consistency of unlabeled data across different data augmentations. The results showed that HiRAND achieved better performance than competing methods in various prediction scenarios, including both simulation data and multiple drug response datasets. We found that the prediction ability of HiRAND for the drug vorinostat was the best across all 62 drugs. In addition, HiRAND was interpreted to identify the key genes most important to vorinostat response, highlighting critical roles for ribosomal protein-related genes in the response to histone deacetylase inhibition. HiRAND could be utilized as an efficient framework for improving drug response prediction performance using few labeled data.
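The consistency-regularization idea can be illustrated with a generic PyTorch sketch: predictions for two random augmentations of the same unlabeled samples are pushed to agree. This is a simplified stand-in, not the released HiRAND implementation; the model, augmentation, and loss weight are placeholders.

import torch.nn.functional as F

def consistency_loss(model, x_unlabeled, augment):
    # Mean-squared disagreement between predictions for two random augmentations
    # of the same unlabeled samples (generic illustration of consistency regularization).
    p1 = F.softmax(model(augment(x_unlabeled)), dim=-1)
    p2 = F.softmax(model(augment(x_unlabeled)), dim=-1)
    return F.mse_loss(p1, p2)

# Total objective: supervised loss on the few labeled samples plus a weighted
# consistency term on the unlabeled samples, e.g.
# loss = F.cross_entropy(model(x_labeled), y_labeled) + lam * consistency_loss(model, x_unlabeled, augment)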
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
LEPset is a large-scale EUS-based pancreas image dataset from the Department of Gastroenterology, Changhai Hospital, Second Military Medical University/Naval Medical University. This dataset consists of 420 patients and 3,500 images, and it has been divided into two categories (PC and NPC). We have invited experienced clinicians to annotate the category labels for all 3,500 EUS images. Moreover, our LEPset also has 8,000 EUS images without any classification annotation.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The provided mzML file can be used as a test dataset for protein identification and quantitation software. It was generated from human embryonic kidney (HEK) cells that were either unlabelled or labelled with heavy SILAC (K6R6, Unimod accession 188, PSI-MS name: "Label:13C(6)"). Apart from the different labelling, the HEK cells were kept under exactly the same conditions and harvested simultaneously. Light and heavy labelled proteins from HEK cell lysate were mixed in a certain ratio, digested with trypsin, and measured on a Thermo Fisher Q Exactive mass spectrometer. A more detailed description of the generation of the dataset will soon be accessible at PRIDE.
The provided mzML file has been converted from Thermo RAW and slightly modified via msConvert (ProteoWizard). To reduce the filesize and to speed up analysis, it has further been filtered to contain only the data measured between 2,000 sec and 3,000 sec of the original LC-MS/MS run.
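One way to inspect the retained retention-time window is sketched below with the pyteomics library; the library choice and the file name are assumptions, and any mzML reader would serve equally well.

from pyteomics import mzml

# Iterate over the spectra and print their retention times, which should fall in the
# 2,000-3,000 s window kept in the filtered file (note that the 'scan start time'
# value may be reported in minutes depending on the conversion settings).
with mzml.read("filtered_HEK_SILAC.mzML") as reader:
    for spectrum in reader:
        rt = spectrum["scanList"]["scan"][0]["scan start time"]
        print(spectrum["id"], float(rt))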
The Caltech Mouse Social Interactions (CalMS21) dataset is a multi-agent dataset from behavioral neuroscience. The dataset consists of trajectory data of social interactions, recorded from videos of freely behaving mice in a standard resident-intruder assay. The CalMS21 dataset is part of the Multi-Agent Behavior Challenge 2021.
To help accelerate behavioral studies, the CalMS21 dataset provides a benchmark to evaluate the performance of automated behavior classification methods in three settings: (1) training on large behavioral datasets all annotated by a single annotator, (2) style transfer to learn inter-annotator differences in behavior definitions, and (3) learning new behaviors of interest given limited training data. The dataset consists of 6 million frames of unlabelled tracked poses of interacting mice, as well as over 1 million frames with tracked poses and corresponding frame-level behavior annotations. The challenge is to classify behaviors accurately using both labelled and unlabelled tracking data, and to generalize to new annotators and behaviors.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
1,554 global export shipment records of Label Blank with prices, volume, and current buyer-supplier relationships, based on an actual global export trade database.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Classifiers have been developed to help diagnose dengue fever in patients presenting with febrile symptoms. However, classifier predictions often rely on the assumption that new observations come from the same distribution as training data. If the population prevalence of dengue changes, as would happen with a dengue outbreak, it is important to raise an alarm as soon as possible, so that appropriate public health measures can be taken and also so that the classifier can be re-calibrated. In this paper, we consider the problem of detecting such a change in distribution in sequentially-observed, unlabeled classification data. We focus on label shift changes to the distribution, where the class priors shift but the class conditional distributions remain unchanged. We reduce this problem to the problem of detecting a change in the one-dimensional classifier scores, leading to simple nonparametric sequential changepoint detection procedures. Our procedures leverage classifier training data to estimate the detection statistic, and converge to their parametric counterparts in the size of the training data. In simulated outbreaks with real dengue data, we show that our method outperforms other detection procedures in this label shift setting.
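The paper's nonparametric detectors are not reproduced here, but the reduction it describes, monitoring one-dimensional classifier scores for a change, can be illustrated with a generic CUSUM-style sketch; the standardization against training scores and the alarm threshold are placeholders, not the authors' estimator.

import numpy as np

def cusum_on_scores(stream_scores, train_scores, threshold=5.0):
    # Generic one-sided CUSUM on classifier scores (an illustrative stand-in, not the
    # paper's procedure): alarm when the cumulative standardized deviation from the
    # training-score mean exceeds the threshold.
    mu, sigma = np.mean(train_scores), np.std(train_scores) + 1e-12
    s = 0.0
    for t, score in enumerate(stream_scores):
        s = max(0.0, s + (score - mu) / sigma)
        if s > threshold:
            return t  # index of the first alarm
    return None  # no change detected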
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Generalized Category Discovery (GCD) is a challenging task in which, given a partially labelled dataset, models must categorize all unlabelled instances, regardless of whether they come from labelled categories or from new ones. In this paper, we challenge a remaining assumption in this task: that all images share the same domain. Specifically, we introduce a new task and method to handle GCD when the unlabelled data also contains images from different domains to the labelled set. Our proposed 'HiLo' networks extract High-level semantic and Low-level domain features, before minimizing the mutual information between the representations. Our intuition is that the clusterings based on domain information and semantic information should be independent. We further extend our method with a specialized domain augmentation tailored for the GCD task, as well as a curriculum learning approach. Finally, we construct a benchmark from corrupted fine-grained datasets as well as a large-scale evaluation on DomainNet with real-world domain shifts, reimplementing a number of GCD baselines in this setting. We demonstrate that HiLo outperforms SoTA category discovery models by a large margin on all evaluations.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
DENTEX CHALLENGE
We present the Dental Enumeration and Diagnosis on Panoramic X-rays Challenge (DENTEX), organized in conjunction with the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI) in 2023. The primary objective of this challenge is to develop algorithms that can accurately detect abnormal teeth with dental enumeration and associated diagnosis. This not only aids in accurate treatment planning but also helps practitioners carry out procedures with a low margin of error.
The challenge provides three types of hierarchically annotated data and additional unlabeled X-rays for optional pre-training. The annotation of the data is structured using the Fédération Dentaire Internationale (FDI) system. The first set of data is partially labeled because it only includes quadrant information. The second set is also partially labeled but contains additional enumeration information along with the quadrant. The third set is fully labeled because it includes all quadrant-enumeration-diagnosis information for each abnormal tooth, and all participant algorithms will be benchmarked on this third set.
DENTEX aims to provide insights into the effectiveness of AI in dental radiology analysis and its potential to improve dental practice by comparing frameworks that simultaneously point out abnormal teeth with dental enumeration and associated diagnosis on panoramic dental X-rays.
Please visit our website to join DENTEX (Dental Enumeration and Diagnosis on Panoramic X-rays Challenge), which is held at MICCAI 2023.
DATA
The DENTEX dataset comprises panoramic dental X-rays obtained from three different institutions using standard clinical conditions but varying equipment and imaging protocols, resulting in diverse image quality reflecting heterogeneous clinical practice. The dataset includes X-rays from patients aged 12 and above, randomly selected from the hospital's database to ensure patient privacy and confidentiality.
To enable effective use of the FDI system, the dataset is hierarchically organized into three types of data:
(a) 693 X-rays labeled for quadrant detection and quadrant classes only,
(b) 634 X-rays labeled for tooth detection with quadrant and tooth enumeration classes,
(c) 1005 X-rays fully labeled for abnormal tooth detection with quadrant, tooth enumeration, and diagnosis classes.
The diagnosis class includes four specific categories: caries, deep caries, periapical lesions, and impacted teeth. An additional 1571 unlabeled X-rays are provided for pre-training.
Data Split for Evaluation and Training
The DENTEX 2023 dataset comprises three types of data: (a) partially annotated quadrant data, (b) partially annotated quadrant-enumeration data, and (c) fully annotated quadrant-enumeration-diagnosis data. The first two types of data are intended for training and development purposes, while the third type is used for training and evaluations.
To comply with standard machine learning practices, the fully annotated third dataset, consisting of 1005 panoramic X-rays, is partitioned into training, validation, and testing subsets, comprising 705, 50, and 250 images, respectively. Ground truth labels are provided only for the training data, while the validation data is provided without associated ground truth, and the testing data is kept hidden from participants.
Annotation Protocol
DENTEX provides three hierarchically annotated datasets that facilitate various dental detection tasks: (1) quadrant-only for quadrant detection, (2) quadrant-enumeration for tooth detection, and (3) quadrant-enumeration-diagnosis for abnormal tooth detection. Although it may seem redundant to provide a quadrant detection dataset, it is crucial for utilizing the FDI Numbering System. The FDI system is a globally used system that assigns each quadrant of the mouth a number from 1 through 4: the top right is 1, the top left is 2, the bottom left is 3, and the bottom right is 4. Each of the eight teeth in a quadrant is then numbered 1 through 8, starting at the front middle tooth and increasing toward the back. So, for example, the back tooth on the lower right side would be 48 according to FDI notation, which means quadrant 4, tooth 8. Therefore, the quadrant segmentation dataset can significantly simplify the dental enumeration task, even though evaluations will be made only on the fully annotated third dataset.
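The quadrant/tooth convention described above maps directly to a two-digit FDI code; the small helper below illustrates the rule and is not part of the DENTEX tooling.

def fdi_code(quadrant: int, tooth: int) -> int:
    # Combine a quadrant (1-4) and a tooth position (1-8, counted from the front
    # middle tooth toward the back) into a two-digit FDI number.
    if quadrant not in range(1, 5) or tooth not in range(1, 9):
        raise ValueError("quadrant must be 1-4 and tooth 1-8")
    return quadrant * 10 + tooth

assert fdi_code(4, 8) == 48  # quadrant 4, tooth 8: the back tooth on the lower right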
Note: The datasets are fully identical to the data used for our baseline method, named HierarchicalDet. Therefore, please visit the HierarchicalDet (Diffusion-Based Hierarchical Multi-Label Object Detection to Analyze Panoramic Dental X-rays) repo for more info.
CITING US
If you use DENTEX, we would appreciate a reference to the following paper.
Hamamci, I., Er, S., Simsar, E., Yuksel, A., Gultekin, S., Ozdemir, S., Yang, K., Li, H., Pati, S., Stadlinger, B., & others (2023). DENTEX: An Abnormal Tooth Detection with Dental Enumeration and Diagnosis Benchmark for Panoramic X-rays.
Pre-print: https://arxiv.org/abs/2305.19112
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data contains the corresponding labelled images of Capsicum annuum included in the "Unlabelled Weed Detection Images for Hot Peppers" dataset on this site. This dataset contains the labels 0, 1, and 2, which can be displayed by assigning a unique pixel value (e.g. recommended: 0, 60, 255) to each occurrence of the label. These images can be utilised as ground truth labels for machine learning and data exploration. The labels represent three categories, namely weed, crop, and background. The labels were assigned by a team of trained individuals from Trinidad and Tobago using the Image Labeler app in the Computer Vision Toolbox from Matlab.
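For visualisation, the three label values can be remapped to the recommended display intensities; a minimal numpy sketch (the library choice is an assumption):

import numpy as np

# Remap label values 0, 1, 2 to the recommended display intensities 0, 60, 255 so the
# three categories (weed, crop, background) are visually distinguishable.
display_values = np.array([0, 60, 255], dtype=np.uint8)

def to_display(label_image):
    return display_values[label_image]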
Overview
The IITKGP_Fence dataset is designed for tasks related to fence-like occlusion detection, defocus blur, depth mapping, and object segmentation. The captured data varies in scene composition, background defocus, and object occlusions. The dataset comprises both labeled and unlabeled data, as well as additional video and RGB-D data. It contains ground truth occlusion masks (GT) for the corresponding images. We created the ground truth occlusion labels in a semi-automatic way with user interaction.
Key Dataset Features:
Fence Detection: Designed for detecting fences or fence-like structures that might occlude objects.
Defocus Blur: Also contains images and videos with blurred objects, likely to challenge detection and segmentation algorithms.
RGBD Data: Offers depth information alongside RGB images, which can be used for tasks like 3D reconstruction or occlusion handling.
Unlabeled and Labeled Data: Facilitates both supervised and unsupervised learning tasks. The Labeled folder provides ground truth occlusion masks, while the Unlabeled folder allows for further experimentation or self-supervised methods.
Dataset Repository
GitHub Repository: Occlusion-Removal
Paper: Deep Generative Adversarial Network for Occlusion Removal from a Single Image
Authors: Sankaraganesh Jonna, Moushumi Medhi, Rajiv Ranjan Sahay
Contact: medhi.moushumi@iitkgp.ac.in
The Igbo synchronised corpus (IgboSynCorp) is an annotated corpus of spoken Igbo created by a team of linguists and NLP experts at the University of Ibadan and Afe Babalola University, Nigeria. The project was designed to create an open-access labelled and unlabelled dataset for Natural Language Processing tasks in the Igbo language. The dataset was created to enable robust and more equitable application of machine learning tools of high social value in Igbo. The dataset consists of ELAN text and wav files of Igbo speech. There are two categories of ELAN files: Gold files (90 mins) and Non-Gold files (188 mins). The Gold files (19,722 words or 2,761 sentences) were transcribed phonetically and orthographically, translated into English, glossed, and PoS-tagged based on the universal dependency PoS tags. The Non-Gold files were only transcribed orthographically and translated into English. There are 110 recordings of spoken Igbo (.wav files), all Igbo oral narratives, amounting to 38.8075 hours or 2,328.45 minutes. The metadata is compiled in Excel sheets: IgboSynCorp Metadata I contains the demographic information about the language consultants, while IgboSynCorp Metadata II outlines the domains of speech represented in each individual wav file (oral narrative). There are two lexicon files with about 2,300 words altogether, which originated from the glossing and part-of-speech tagging. The project was funded by the Lacuna Fund (https://lacunafund.org) of the Meridian Institute, 105 Village Place, Dillon, Colorado 80435, United States of America. (2022-06-21)
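The ELAN files can be read with standard tooling; below is a brief sketch using the pympi-ling package, which is an assumption and not part of the corpus release, listing the tiers and annotations in one .eaf file.

from pympi.Elan import Eaf

# Open one ELAN file from the corpus (the filename is illustrative) and print every
# annotation per tier; for Gold files the tiers include orthographic and phonetic
# transcription, English translation, gloss, and PoS tags.
eaf = Eaf("igbosyncorp_gold_001.eaf")
for tier in eaf.get_tier_names():
    for start_ms, end_ms, value in eaf.get_annotation_data_for_tier(tier):
        print(tier, start_ms, end_ms, value)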
CLEAR is a continual image classification benchmark dataset with a natural temporal evolution of visual concepts in the real world that spans a decade (2004-2014). CLEAR is built from existing large-scale image collections (YFCC100M) through a novel and scalable low-cost approach to visio-linguistic dataset curation. The pipeline makes use of pretrained vision language models (e.g. CLIP) to interactively build labeled datasets, which are further validated with crowd-sourcing to remove errors and even inappropriate images (hidden in original YFCC100M). The major strength of CLEAR over prior CL benchmarks is the smooth temporal evolution of visual concepts with real-world imagery, including both high-quality labeled data along with abundant unlabeled samples per time period for continual semi-supervised learning.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Number of images used for the training and testing of the models with different labeling strategies.