34 datasets found
  1. Number of images used for the training and testing of the models with...

    • plos.figshare.com
    xls
    Updated Sep 6, 2024
    Cite
    Seung-Ah Lee; Hyun Su Kim; Ehwa Yang; Young Cheol Yoon; Ji Hyun Lee; Byung-Ok Choi; Jae-Hun Kim (2024). Number of images used for the training and testing of the models with different labeling strategies. [Dataset]. http://doi.org/10.1371/journal.pone.0310203.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Sep 6, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Seung-Ah Lee; Hyun Su Kim; Ehwa Yang; Young Cheol Yoon; Ji Hyun Lee; Byung-Ok Choi; Jae-Hun Kim
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Number of images used for the training and testing of the models with different labeling strategies.

  2. DensePASS Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Oct 22, 2021
    Cite
    Chaoxiang Ma; Jiaming Zhang; Kailun Yang; Alina Roitberg; Rainer Stiefelhagen (2021). DensePASS Dataset [Dataset]. https://paperswithcode.com/dataset/densepass
    Explore at:
    Dataset updated
    Oct 22, 2021
    Authors
    Chaoxiang Ma; Jiaming Zhang; Kailun Yang; Alina Roitberg; Rainer Stiefelhagen
    Description

    DensePASS is a novel densely annotated dataset for panoramic segmentation under cross-domain conditions, specifically built to study pinhole-to-panoramic transfer and accompanied by pinhole-camera training examples obtained from Cityscapes. DensePASS covers both labelled and unlabelled 360-degree images, with the labelled data comprising 19 classes which explicitly fit the categories available in the source-domain (i.e. pinhole) data.

  3. The dataset and MAE model code for "An Efficient Lightning Classifier Using...

    • zenodo.org
    zip
    Updated Dec 26, 2024
    Cite
    Jingyu Lu; Jingyu Lu (2024). The dataset and MAE model code for "An Efficient Lightning Classifier Using a Self-supervised Learning Neural Network" [Dataset]. http://doi.org/10.5281/zenodo.14556712
    Explore at:
    Available download formats: zip
    Dataset updated
    Dec 26, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jingyu Lu; Jingyu Lu
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These files contain the BLNET lightning dataset and a modified MAE neural network code. The dataset includes 100,000 unlabeled lightning pulse files and 3,000 labeled lightning pulses, each spanning 1 ms (5,000 points per file). The unlabeled data is used for pretraining the MAE, while the labeled data is used for finetuning the MAE.
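
    As a rough illustration of the MAE-style pretraining workflow described above, the sketch below randomly masks patches of a single 5,000-point pulse. The file format of the pulses, the patch length, and the mask ratio are not specified here, so they are assumptions; refer to the included MAE code for the actual settings.

    import numpy as np

    def mask_pulse(pulse, patch_len=100, mask_ratio=0.75, rng=None):
        """Split a 5,000-sample pulse into patches and randomly hide a fraction of them."""
        rng = rng or np.random.default_rng()
        patches = pulse.reshape(-1, patch_len)            # (50, 100) for a 5,000-point pulse
        n_masked = int(len(patches) * mask_ratio)
        masked_idx = rng.choice(len(patches), n_masked, replace=False)
        visible = np.delete(patches, masked_idx, axis=0)  # patches fed to the encoder
        return visible, patches[masked_idx], masked_idx   # masked patches are the reconstruction targets

    pulse = np.random.randn(5000).astype(np.float32)      # stand-in for one unlabeled pulse
    visible, targets, idx = mask_pulse(pulse)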

  4. Quantitative phase microscopy labelled (449 FOVs) and unlabelled (1819 FOVs)...

    • data.niaid.nih.gov
    Updated Aug 2, 2021
    Cite
    Jan Balvan (2021). Quantitative phase microscopy labelled (449 FOVs) and unlabelled (1819 FOVs) dataset of various adherent cell lines for segmentation purposes [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4771830
    Explore at:
    Dataset updated
    Aug 2, 2021
    Dataset provided by
    Jaromir Gumulec
    Jan Balvan
    Tomas Vicar
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains microscopic images of multiple cell lines, captured by multiple microscopes without the use of any fluorescent labeling, and a manually annotated ground truth for subsequent use in segmentation algorithms. The dataset also includes images reconstructed according to the methods described below in order to ease further segmentation.

    Our data consist of

    244 labelled images of PC-3 (7,907 cells), 205 labelled PNT1A (9,288 cells), in the paper designated as "QPI_Seg_PNT1A_PC3", and

    1,819 unlabelled images with a mixture of 22Rv1, A2058, A2780, A8780, DU145, Fadu, G361, HOB and LNCaP used for pretraining, in the paper designated as "QPI_Cell_unlabelled".

    See Vicar et al. XXXX 2021 DOI XXX (TBA after publishing)

    Code using this dataset is available at XXXX (TBA after publishing)

    Materials and methods

    A set of adherent cell lines of various origins, tumorigenic potential, and morphology was used in this paper (PC-3, PNT1A, 22Rv1, DU145, LNCaP, A2058, A2780, A8780, Fadu, G361, HOB). PC-3, PNT1A, 22Rv1, DU145, LNCaP, A2780, and G361 cell lines were cultured in RPMI-1640 medium, and A2058, FaDu, and HOB cell lines were cultured in DMEM-F12 medium, all supplemented with antibiotics (penicillin 100 U/ml and streptomycin 0.1 mg/ml) and with 10% fetal bovine serum (FBS). Prior to microscopy acquisition, the cells were maintained at 37 °C in a humidified (60%) incubator with 5% CO2 (Sanyo, Japan). For acquisition purposes, the cells were cultivated in the Flow chamber µ-Slide I Luer Family (Ibidi, Martinsried, Germany). To maintain standard cultivation conditions during time-lapse experiments, cells were placed in the gas chamber H201 for the Mad City Labs Z100/Z500 piezo Z-stage (Okolab, Ottaviano NA, Italy). For the acquisition of QPI, a coherence-controlled holographic microscope (Telight, Q-Phase) was used. A Nikon Plan 10×/0.3 objective was used for hologram acquisition with a CCD camera (XIMEA MR4021MC). Holographic data were numerically reconstructed with the Fourier transform method (described in Slaby, 2013), and phase unwrapping was applied to the phase image. QPI datasets used in this paper were acquired during various experimental setups and treatments. In most cases, experiments were conducted with time-lapse acquisition. The final dataset contains images acquired at least three hours apart.

    Folder structure and file and filename description

    labelled (QPI_Seg_PNT1A_PC3): 205 FOVs of PNT1A and 244 FOVs of PC-3 cells with segmentation labels, e.g.:
    00001_PC3_img.tif - 32-bit tiff image (in pg/um2 values)
    00001_PC3_mask.png - 8-bit mask image with a unique grayscale value corresponding to each single cell in the FOV

    unlabelled (QPI_Cell_unlabelled): 11 varying cell lines, 1,819 FOVs in total, 32-bit tiff images (in pg/um2 values)
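
    A minimal loading sketch for the layout above, assuming the stated file naming and that the tifffile and imageio packages are available; the paths are placeholders.

    import numpy as np
    import tifffile
    import imageio.v3 as iio

    img = tifffile.imread("labelled/00001_PC3_img.tif")   # 32-bit phase image in pg/um2
    mask = iio.imread("labelled/00001_PC3_mask.png")      # 8-bit mask, one gray level per cell

    cell_ids = np.unique(mask)
    cell_ids = cell_ids[cell_ids != 0]                    # assuming 0 marks the background
    print(f"{len(cell_ids)} cells in this FOV")
    total_mass = sum(img[mask == cid].sum() for cid in cell_ids)  # integrated dry mass over all cells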

  5. Unlabelled Weed Detection Images for Hot Peppers

    • data.amerigeoss.org
    Updated Nov 1, 2022
    Cite
    Trinidad and Tobago (2022). Unlabelled Weed Detection Images for Hot Peppers [Dataset]. https://data.amerigeoss.org/dataset/weeddetection_hotpeppers
    Explore at:
    Dataset updated
    Nov 1, 2022
    Dataset provided by
    Trinidad and Tobago
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data contains images of Capsicum annuum that have been grown on several smallholder farms in Trinidad and Tobago, showing different levels of weed cover and different weed species. In most instances, weeds can be recognized by the naked eye. However, there are times when the weeds and the crops are of similar species and may appear almost identical. When weeds are plentiful and interwoven with crops, it becomes increasingly difficult to determine weed cover on a given piece of land. This data can be used in research surrounding weed detection in hot peppers. When accompanied by the labelled versions, this data can be used to train machine learning models for weed detection in Capsicum annuum (hot peppers).

  6. Unlabelled training datasets of AIS Trajectories from Danish Waters for...

    • data.dtu.dk
    bin
    Updated Jul 10, 2023
    + more versions
    Cite
    Kristoffer Vinther Olesen; Line Katrine Harder Clemmensen; Anders Nymark Christensen (2023). Unlabelled training datasets of AIS Trajectories from Danish Waters for Abnormal Behavior Detection [Dataset]. http://doi.org/10.11583/DTU.21511842.v1
    Explore at:
    Available download formats: bin
    Dataset updated
    Jul 10, 2023
    Dataset provided by
    Technical University of Denmark
    Authors
    Kristoffer Vinther Olesen; Line Katrine Harder Clemmensen; Anders Nymark Christensen
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This item is part of the collection "AIS Trajectories from Danish Waters for Abnormal Behavior Detection"

    DOI: https://doi.org/10.11583/DTU.c.6287841

    Using deep learning for detection of maritime abnormal behaviour in spatio-temporal trajectories is a relatively new and promising application. Open access to the Automatic Identification System (AIS) has made large amounts of maritime trajectories publicly available. However, these trajectories are unannotated when it comes to the detection of abnormal behaviour.

    The lack of annotated datasets for abnormality detection on maritime trajectories makes it difficult to evaluate and compare suggested models quantitatively. With this dataset, we attempt to provide a way for researchers to evaluate and compare performance.

    We have manually labelled trajectories which showcase abnormal behaviour following a collision accident. The annotated dataset consists of 521 data points with 25 abnormal trajectories. The abnormal trajectories cover, among others: colliding vessels, vessels engaged in search-and-rescue activities, law enforcement, and commercial maritime traffic forced to deviate from the normal course.

    These datasets consist of unlabelled trajectories for the purpose of training unsupervised models. For labelled evaluation datasets, please refer to the collection (link in Related publications).

    The data are saved using the pickle format for Python. Each dataset is split into two files with the naming convention:

    datasetInfo_XXX
    data_XXX

    Files named "data_XXX" contain the extracted trajectories, serialized sequentially one at a time, and must be read as such; please refer to the provided utility functions for examples. Files named "datasetInfo_XXX" contain metadata related to the dataset and the indices at which trajectories begin in the "data_XXX" files.

    The data are sequences of maritime trajectories defined by their timestamp, latitude/longitude position, speed, course, and unique ship identifier (MMSI). In addition, the dataset contains metadata related to creation parameters. The dataset has been limited to a specific time period, ship types, and moving AIS navigational statuses, and filtered within a region of interest (ROI). Trajectories were split if exceeding an upper limit, and short trajectories were discarded. All values are given as metadata in the dataset and used in the naming syntax.

    Naming syntax: data_AIS_Custom_STARTDATE_ENDDATE_SHIPTYPES_MINLENGTH_MAXLENGTH_RESAMPLEPERIOD.pkl

    See the datasheet for more detailed information; we refer to the provided utility functions for examples of how to read and plot the data.
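
    A minimal reading sketch consistent with the description above, assuming standard Python pickles with one trajectory per pickle record written back to back; the metadata contents are not documented here, so prefer the dataset's own utility functions.

    import pickle

    def read_trajectories(data_path):
        """Yield trajectories one at a time from a data_XXX file."""
        with open(data_path, "rb") as f:
            while True:
                try:
                    yield pickle.load(f)   # each call reads the next serialized trajectory
                except EOFError:
                    break

    with open("datasetInfo_XXX", "rb") as f:
        info = pickle.load(f)              # creation parameters and trajectory start indices

    trajectories = list(read_trajectories("data_XXX"))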

  7. Data from: Amos: A large-scale abdominal multi-organ benchmark for versatile...

    • zenodo.org
    • explore.openaire.eu
    csv, zip
    Updated May 25, 2023
    + more versions
    Cite
    JI YUANFENG; JI YUANFENG (2023). Amos: A large-scale abdominal multi-organ benchmark for versatile medical image segmentation [Dataset]. http://doi.org/10.5281/zenodo.7262581
    Explore at:
    Available download formats: csv, zip
    Dataset updated
    May 25, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    JI YUANFENG; JI YUANFENG
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Despite the considerable progress in automatic abdominal multi-organ segmentation from CT/MRI scans in recent years, a comprehensive evaluation of the models' capabilities is hampered by the lack of a large-scale benchmark from diverse clinical scenarios. Constrained by the high cost of collecting and labeling 3D medical data, most of the deep learning models to date are driven by datasets with a limited number of organs of interest or samples, which still limits the power of modern deep models and makes it difficult to provide a fully comprehensive and fair estimate of various methods. To mitigate these limitations, we present AMOS, a large-scale, diverse, clinical dataset for abdominal organ segmentation. AMOS provides 500 CT and 100 MRI scans collected from multi-center, multi-vendor, multi-modality, multi-phase, multi-disease patients, each with voxel-level annotations of 15 abdominal organs, providing challenging examples and a test bed for studying robust segmentation algorithms under diverse targets and scenarios. We further benchmark several state-of-the-art medical segmentation models to evaluate the status of the existing methods on this new challenging dataset. We have made our datasets, benchmark servers, and baselines publicly available, and hope to inspire future research. The paper can be found at https://arxiv.org/pdf/2206.08023.pdf

    In addition to the 600 labeled CT and MRI scans, we expect to provide 2000 CT and 1200 MRI scans without labels to support more learning tasks (semi-supervised, unsupervised, domain adaptation, ...). The link can be found in:

    If you find this dataset useful for your research, please cite:

    @inproceedings{NEURIPS2022_ee604e1b,
     author = {Ji, Yuanfeng and Bai, Haotian and GE, Chongjian and Yang, Jie and Zhu, Ye and Zhang, Ruimao and Li, Zhen and Zhanng, Lingyan and Ma, Wanling and Wan, Xiang and Luo, Ping},
     booktitle = {Advances in Neural Information Processing Systems},
     editor = {S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh},
     pages = {36722--36732},
     publisher = {Curran Associates, Inc.},
     title = {AMOS: A Large-Scale Abdominal Multi-Organ Benchmark for Versatile Medical Image Segmentation},
     url = {https://proceedings.neurips.cc/paper_files/paper/2022/file/ee604e1bedbd069d9fc9328b7b9584be-Paper-Datasets_and_Benchmarks.pdf},
     volume = {35},
     year = {2022}
    }
    

  8. Global import data of Label Blank

    • volza.com
    csv
    Updated Dec 3, 2025
    + more versions
    Cite
    Volza FZ LLC (2025). Global import data of Label Blank [Dataset]. https://www.volza.com/p/label-blank/import/import-in-india/
    Explore at:
    Available download formats: csv
    Dataset updated
    Dec 3, 2025
    Dataset provided by
    Volza
    Authors
    Volza FZ LLC
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Count of importers, Sum of import value, 2014-01-01/2021-09-30, Count of import shipments
    Description

    15,638 global import shipment records of Label Blank, with prices, volume, and current buyer-supplier relationships, based on the actual global export trade database.

  9. Table_1_HiRAND: A novel GCN semi-supervised deep learning-based framework...

    • frontiersin.figshare.com
    txt
    Updated Jun 21, 2023
    + more versions
    Cite
    Yue Huang; Zhiwei Rong; Liuchao Zhang; Zhenyi Xu; Jianxin Ji; Jia He; Weisha Liu; Yan Hou; Kang Li (2023). Table_1_HiRAND: A novel GCN semi-supervised deep learning-based framework for classification and feature selection in drug research and development.csv [Dataset]. http://doi.org/10.3389/fonc.2023.1047556.s002
    Explore at:
    Available download formats: txt
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    Frontiers
    Authors
    Yue Huang; Zhiwei Rong; Liuchao Zhang; Zhenyi Xu; Jianxin Ji; Jia He; Weisha Liu; Yan Hou; Kang Li
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The prediction of response to drugs before initiating therapy based on transcriptome data is a major challenge. However, identifying effective drug response label data costs time and resources. Available methods often predict poorly and fail to identify robust biomarkers due to the curse of dimensionality: high dimensionality and low sample size. This necessitates the development of predictive models that effectively predict the response to drugs using limited labeled data while remaining interpretable. In this study, we report a novel Hierarchical Graph Random Neural Networks (HiRAND) framework to predict drug response using transcriptome data with few labeled samples and additional unlabeled data. HiRAND completes the information integration of the gene graph and sample graph by graph convolutional network (GCN). The innovation of our model is leveraging a data augmentation strategy to solve the dilemma of limited labeled data and using consistency regularization to optimize the prediction consistency of unlabeled data across different data augmentations. The results showed that HiRAND achieved better performance than competitive methods in various prediction scenarios, including both simulation data and multiple drug response datasets. We found that the prediction ability of HiRAND for the drug vorinostat showed the best results across all 62 drugs. In addition, HiRAND was interpreted to identify the key genes most important to vorinostat response, highlighting critical roles for ribosomal protein-related genes in the response to histone deacetylase inhibition. Our HiRAND could be utilized as an efficient framework for improving drug response prediction performance using few labeled data.
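
    The consistency-regularization idea mentioned above can be summarized with a generic sketch (this is not the HiRAND code; the model, the augmentation function, and the loss weight are placeholders):

    import torch
    import torch.nn.functional as F

    def consistency_loss(model, x_unlabeled, augment):
        """Penalize disagreement between predictions on two augmentations of the same unlabeled batch."""
        p1 = F.softmax(model(augment(x_unlabeled)), dim=-1)
        p2 = F.softmax(model(augment(x_unlabeled)), dim=-1)
        return F.mse_loss(p1, p2)

    # total loss = cross-entropy on the few labeled samples + lambda * consistency_loss on the unlabeled samples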

  10. LEPset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 15, 2023
    Cite
    Zhang, Pingping (2023). LEPset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8041284
    Explore at:
    Dataset updated
    Jun 15, 2023
    Dataset provided by
    Zhang, Pingping
    Sheng,Bin
    Li, Jiajia
    Wang,Kaixuan
    Wang,Teng
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    LEPset is a large-scale EUS-based pancreas image dataset from the Department of Gastroenterology, Changhai Hospital, Second Military Medical University/Naval Medical University. This dataset consists of 420 patients and 3,500 images, and it has been divided into two categories (PC and NPC). We have invited experienced clinicians to annotate the category labels for all 3,500 EUS images. Moreover, our LEPset also has 8,000 EUS images without any classification annotation.

    1. After downloading LEPset.zip, extract it with an appropriate unzip tool
    2. After unzipping, there will be two folders: unlabeled and labeled
    3. There are 8,000 EUS images in the unlabeled folder, and two subfolders in the labeled folder, NPC and PC, representing non-pancreatic cancer and pancreatic cancer respectively: 140 patients (1,820 images) in NPC and 280 patients (1,680 images) in PC
    4. Unlabelled images can be used for pre-training of the model, and labelled images can be used for training and validation of the supervised model (see the loading sketch below)
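
    A loading sketch under the folder layout described above; torchvision is an assumption rather than a requirement of the dataset, and the transform is illustrative.

    from pathlib import Path
    from torchvision import datasets, transforms

    tf = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])

    # labeled/NPC and labeled/PC become class indices 0 and 1
    labeled = datasets.ImageFolder("LEPset/labeled", transform=tf)

    # unlabeled/ has no class subfolders, so list the files directly for pre-training
    unlabeled_files = sorted(Path("LEPset/unlabeled").glob("*"))
    print(len(labeled), "labeled images,", len(unlabeled_files), "unlabeled images")
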
  11. Proteomics LC-MS/MS test dataset for protein quantitation via stable isotope...

    • data.niaid.nih.gov
    • explore.openaire.eu
    • +1 more
    Updated Jan 24, 2020
    Cite
    Sigloch, Florian Christoph (2020). Proteomics LC-MS/MS test dataset for protein quantitation via stable isotope labelling [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1051551
    Explore at:
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Sigloch, Florian Christoph
    Tholen, Stefan
    Schilling, Oliver
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The provided mzML file can be used as a test dataset for protein identification and quantitation software. It was generated from human embryonic kidney (HEK) cells that were either unlabelled or labelled with heavy SILAC (K6R6, unimod accession 188, PSI-MS Name: "Label:13C(6)"). Apart from different labelling, the HEK cells were kept in exactly the same conditions and harvested simultaneously. Light and heavy labelled proteins from HEK cell lysate were mixed in a certain ratio, digested with Trypsin and measured on a ThermoFisher QExactive mass spectrometer. A more detailed description on the generation of the dataset will soon be accessible at PRIDE.

    The provided mzML file has been converted from Thermo RAW and slightly modified via msConvert (ProteoWizard). To reduce the filesize and to speed up analysis, it has further been filtered to contain only the data measured between 2,000 sec and 3,000 sec of the original LC-MS/MS run.
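
    One way to inspect the file, assuming the pyteomics package is available; the filename is a placeholder.

    from pyteomics import mzml

    with mzml.read("test_dataset.mzML") as reader:
        spectrum = next(reader)                  # peek at the first spectrum
        mz = spectrum["m/z array"]               # peak m/z values
        intensity = spectrum["intensity array"]  # peak intensities
        level = spectrum["ms level"]             # 1 for MS1, 2 for MS/MS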

  12. CalMS21 Dataset

    • paperswithcode.com
    Updated Apr 5, 2021
    + more versions
    Cite
    Jennifer J. Sun; Tomomi Karigo; Dipam Chakraborty; Sharada P. Mohanty; Benjamin Wild; Quan Sun; Chen Chen; David J. Anderson; Pietro Perona; Yisong Yue; Ann Kennedy (2021). CalMS21 Dataset [Dataset]. https://paperswithcode.com/dataset/calms21
    Explore at:
    Dataset updated
    Apr 5, 2021
    Authors
    Jennifer J. Sun; Tomomi Karigo; Dipam Chakraborty; Sharada P. Mohanty; Benjamin Wild; Quan Sun; Chen Chen; David J. Anderson; Pietro Perona; Yisong Yue; Ann Kennedy
    Description

    The Caltech Mouse Social Interactions (CalMS21) dataset is a multi-agent dataset from behavioral neuroscience. The dataset consists of trajectory data of social interactions, recorded from videos of freely behaving mice in a standard resident-intruder assay. The CalMS21 dataset is part of the Multi-Agent Behavior Challenge 2021.

    To help accelerate behavioral studies, the CalMS21 dataset provides a benchmark to evaluate the performance of automated behavior classification methods in three settings: (1) for training on large behavioral datasets all annotated by a single annotator, (2) for style transfer to learn inter-annotator differences in behavior definitions, and (3) for learning of new behaviors of interest given limited training data. The dataset consists of 6 million frames of unlabelled tracked poses of interacting mice, as well as over 1 million frames with tracked poses and corresponding frame-level behavior annotations. The challenge of the dataset is to be able to classify behaviors accurately using both labelled and unlabelled tracking data, as well as being able to generalize to new annotators and behaviors.

  13. Global export data of Label Blank

    • volza.com
    csv
    Updated Jan 22, 2025
    + more versions
    Cite
    Volza.LLC (2025). Global export data of Label Blank [Dataset]. https://www.volza.com/exports-india/india-export-data-of-label+blank
    Explore at:
    Available download formats: csv
    Dataset updated
    Jan 22, 2025
    Dataset provided by
    Volza
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Count of exporters, Sum of export value, 2014-01-01/2021-09-30, Count of export shipments
    Description

    1,554 global export shipment records of Label Blank, with prices, volume, and current buyer-supplier relationships, based on the actual global export trade database.

  14. Simulation results for Scenario 2.

    • figshare.com
    • plos.figshare.com
    xls
    Updated Sep 16, 2024
    Cite
    Ciaran Evans; Max G’Sell (2024). Simulation results for Scenario 2. [Dataset]. http://doi.org/10.1371/journal.pone.0310194.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    Sep 16, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Ciaran Evans; Max G’Sell
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Classifiers have been developed to help diagnose dengue fever in patients presenting with febrile symptoms. However, classifier predictions often rely on the assumption that new observations come from the same distribution as training data. If the population prevalence of dengue changes, as would happen with a dengue outbreak, it is important to raise an alarm as soon as possible, so that appropriate public health measures can be taken and also so that the classifier can be re-calibrated. In this paper, we consider the problem of detecting such a change in distribution in sequentially-observed, unlabeled classification data. We focus on label shift changes to the distribution, where the class priors shift but the class conditional distributions remain unchanged. We reduce this problem to the problem of detecting a change in the one-dimensional classifier scores, leading to simple nonparametric sequential changepoint detection procedures. Our procedures leverage classifier training data to estimate the detection statistic, and converge to their parametric counterparts in the size of the training data. In simulated outbreaks with real dengue data, we show that our method outperforms other detection procedures in this label shift setting.
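
    As a toy illustration only (not the authors' procedure), sequential monitoring of one-dimensional classifier scores can be done with a simple one-sided CUSUM against the training-score distribution; the drift and threshold constants below are arbitrary.

    import numpy as np

    def cusum_alarm(train_scores, stream_scores, drift=0.05, threshold=5.0):
        """Return the index at which an upward shift in mean score is flagged, or None."""
        mu, sigma = np.mean(train_scores), np.std(train_scores) + 1e-12
        s = 0.0
        for t, score in enumerate(stream_scores):
            s = max(0.0, s + (score - mu) / sigma - drift)   # standardized one-sided CUSUM
            if s > threshold:
                return t
        return None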

  15. Semantic Shift Benchmark Corruption (SSB-C)

    • datahub.hku.hk
    bin
    Updated Mar 18, 2025
    Cite
    Hongjun Wang; Sagar Vaze; Kai Han (2025). Semantic Shift Benchmark Corruption (SSB-C) [Dataset]. http://doi.org/10.25442/hku.28607261.v1
    Explore at:
    Available download formats: bin
    Dataset updated
    Mar 18, 2025
    Dataset provided by
    HKU Data Repository
    Authors
    Hongjun Wang; Sagar Vaze; Kai Han
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Generalized Category Discovery (GCD) is a challenging task in which, given a partially labelled dataset, models must categorize all unlabelled instances, regardless of whether they come from labelled categories or from new ones. In this paper, we challenge a remaining assumption in this task: that all images share the same domain. Specifically, we introduce a new task and method to handle GCD when the unlabelled data also contains images from different domains to the labelled set. Our proposed `HiLo' networks extract High-level semantic and Low-level domain features, before minimizing the mutual information between the representations. Our intuition is that the clusterings based on domain information and semantic information should be independent. We further extend our method with a specialized domain augmentation tailored for the GCD task, as well as a curriculum learning approach. Finally, we construct a benchmark from corrupted fine-grained datasets as well as a large-scale evaluation on DomainNet with real-world domain shifts, reimplementing a number of GCD baselines in this setting. We demonstrate that HiLo outperforms SoTA category discovery models by a large margin on all evaluations.

  16. DENTEX CHALLENGE 2023

    • zenodo.org
    • kaggle.com
    zip
    Updated Jun 21, 2023
    Cite
    Sezgin Er; Sezgin Er (2023). DENTEX CHALLENGE 2023 [Dataset]. http://doi.org/10.5281/zenodo.7812323
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Sezgin Er; Sezgin Er
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    DENTEX CHALLENGE

    We present the Dental Enumeration and Diagnosis on Panoramic X-rays Challenge (DENTEX), organized in conjunction with the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI) in 2023. The primary objective of this challenge is to develop algorithms that can accurately detect abnormal teeth with dental enumeration and associated diagnosis. This not only aids in accurate treatment planning but also helps practitioners carry out procedures with a low margin of error.

    The challenge provides three types of hierarchically annotated data and additional unlabeled X-rays for optional pre-training. The annotation of the data is structured using the Fédération Dentaire Internationale (FDI) system. The first set of data is partially labeled because it only includes quadrant information. The second set of data is also partially labeled but contains additional enumeration information along with the quadrant. The third set is fully labeled because it includes all quadrant-enumeration-diagnosis information for each abnormal tooth, and all participant algorithms will be benchmarked on this third set.

    DENTEX aims to provide insights into the effectiveness of AI in dental radiology analysis and its potential to improve dental practice by comparing frameworks that simultaneously point out abnormal teeth with dental enumeration and associated diagnosis on panoramic dental X-rays.

    Please visit our website to join DENTEX (Dental Enumeration and Diagnosis on Panoramic X-rays Challenge), which is held at MICCAI 2023.

    DATA

    The DENTEX dataset comprises panoramic dental X-rays obtained from three different institutions using standard clinical conditions but varying equipment and imaging protocols, resulting in diverse image quality reflecting heterogeneous clinical practice. The dataset includes X-rays from patients aged 12 and above, randomly selected from the hospital's database to ensure patient privacy and confidentiality.

    To enable effective use of the FDI system, the dataset is hierarchically organized into three types of data:

    • (a) 693 X-rays labeled for quadrant detection and quadrant classes only,

    • (b) 634 X-rays labeled for tooth detection with quadrant and tooth enumeration classes,

    • (c) 1005 X-rays fully labeled for abnormal tooth detection with quadrant, tooth enumeration, and diagnosis classes.

    The diagnosis class includes four specific categories: caries, deep caries, periapical lesions, and impacted teeth. An additional 1571 unlabeled X-rays are provided for pre-training.

    Data Split for Evaluation and Training

    The DENTEX 2023 dataset comprises three types of data: (a) partially annotated quadrant data, (b) partially annotated quadrant-enumeration data, and (c) fully annotated quadrant-enumeration-diagnosis data. The first two types of data are intended for training and development purposes, while the third type is used for training and evaluations.

    To comply with standard machine learning practices, the fully annotated third dataset, consisting of 1005 panoramic X-rays, is partitioned into training, validation, and testing subsets, comprising 705, 50, and 250 images, respectively. Ground truth labels are provided only for the training data, while the validation data is provided without associated ground truth, and the testing data is kept hidden from participants.

    Annotation Protocol

    DENTEX provides three hierarchically annotated datasets that facilitate various dental detection tasks: (1) quadrant-only for quadrant detection, (2) quadrant-enumeration for tooth detection, and (3) quadrant-enumeration-diagnosis for abnormal tooth detection. Although it may seem redundant to provide a quadrant detection dataset, it is crucial for utilizing the FDI Numbering System. The FDI system is a globally used system that assigns each quadrant of the mouth a number from 1 through 4. The top right is 1, the top left is 2, the bottom left is 3, and the bottom right is 4. Each of the eight teeth in a quadrant, including the molars, is then numbered 1 through 8. The numbering starts at the front middle tooth, and the numbers rise the farther back we go. So, for example, the back tooth on the lower left side would be 38 according to FDI notation, which means quadrant 3, number 8. Therefore, the quadrant segmentation dataset can significantly simplify the dental enumeration task, even though evaluations will be made only on the fully annotated third dataset.
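
    A tiny worked example of the FDI notation described above (quadrant digit followed by tooth digit); it is purely illustrative and not part of the DENTEX tooling.

    def fdi_code(quadrant: int, tooth: int) -> int:
        """quadrant: 1 upper right, 2 upper left, 3 lower left, 4 lower right;
        tooth: 1 (front middle tooth) through 8 (back molar)."""
        assert 1 <= quadrant <= 4 and 1 <= tooth <= 8
        return quadrant * 10 + tooth

    print(fdi_code(3, 8))   # 38: the back tooth on the lower left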

    Note: The datasets are identical to the data used for our baseline method, HierarchicalDet. Therefore, please visit the HierarchicalDet (Diffusion-Based Hierarchical Multi-Label Object Detection to Analyze Panoramic Dental X-rays) repo for more info.

    CITING US

    If you use DENTEX, we would appreciate references to the following papers.

    1. Ibrahim Ethem Hamamci, Sezgin Er, Enis Simsar, Anjany Sekuboyina, Mustafa Gundogar, Bernd Stadlinger, Albert Mehl, Bjoern Menze., Diffusion-Based Hierarchical Multi-Label Object Detection to Analyze Panoramic Dental X-rays, 2023.
      Pre-print: https://arxiv.org/abs/2303.06500
    2. Hamamci, I., Er, S., Simsar, E., Yuksel, A., Gultekin, S., Ozdemir, S., Yang, K., Li, H., Pati, S., Stadlinger, B., & others (2023). DENTEX: An Abnormal Tooth Detection with Dental Enumeration and Diagnosis Benchmark for Panoramic X-rays.
      Pre-print: https://arxiv.org/abs/2305.19112

  17. Labelled Weed Detection Images for Hot Peppers

    • data.amerigeoss.org
    Updated Nov 1, 2022
    Cite
    Trinidad and Tobago (2022). Labelled Weed Detection Images for Hot Peppers [Dataset]. https://data.amerigeoss.org/dataset/weeddetection_labelled_hotpeppers
    Explore at:
    Dataset updated
    Nov 1, 2022
    Dataset provided by
    Trinidad and Tobago
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data contains the corresponding labelled images of Capsicum annuum that are included in the "Unlabelled Weed Detection Images for Hot Peppers" data set on this site. This data set contains the labels 0, 1 and 2, which can be displayed by assigning a unique pixel value (e.g. recommended: 0, 60, 255) to each occurrence of the label. These images can be utilised as ground truth labels for machine learning and data exploration. The labels represent three categories, namely weed, crop and background. The labels were assigned by a team of trained individuals from Trinidad and Tobago using the Image Labeler app in MATLAB's Computer Vision Toolbox.
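
    A small visualization sketch for the label convention described above, assuming a label image with values {0, 1, 2}; the 0/60/255 mapping is the recommendation quoted in the text, and which value corresponds to weed, crop or background is not specified here.

    import numpy as np

    def labels_to_display(label_img: np.ndarray) -> np.ndarray:
        """Map label values 0, 1, 2 to the recommended display gray levels 0, 60, 255."""
        lut = np.array([0, 60, 255], dtype=np.uint8)
        return lut[label_img]          # lookup-table indexing applies the mapping per pixel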

  18. IITKGP_Fence Dataset

    • paperswithcode.com
    Updated Sep 20, 2024
    + more versions
    Cite
    (2024). IITKGP_Fence Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/iitkgp-fence-dataset
    Explore at:
    Dataset updated
    Sep 20, 2024
    Description

    Overview

    The IITKGP_Fence dataset is designed for tasks related to fence-like occlusion detection, defocus blur, depth mapping, and object segmentation. The captured data varies in scene composition, background defocus, and object occlusions. The dataset comprises both labeled and unlabeled data, as well as additional video and RGB-D data. It contains ground truth occlusion masks (GT) for the corresponding images. We created the ground truth occlusion labels in a semi-automatic way with user interaction.

    Key Dataset Features:

    Fence Detection: Designed for detecting fences or fence-like structures that might occlude objects.
    Defocus Blur: Also contains images and videos with blurred objects, likely to challenge detection and segmentation algorithms.
    RGBD Data: Offers depth information alongside RGB images, which can be used for tasks like 3D reconstruction or occlusion handling.
    Unlabeled and Labeled Data: Facilitates both supervised and unsupervised learning tasks. The Labeled folder provides ground truth occlusion masks, while the Unlabeled folder allows for further experimentation or self-supervised methods.

    Dataset Repository

    GitHub Repository: Occlusion-Removal
    Paper: Deep Generative Adversarial Network for Occlusion Removal from a Single Image
    Authors: Sankaraganesh Jonna, Moushumi Medhi, Rajiv Ranjan Sahay

    Contact medhi.moushumi@iitkgp.ac.in

  19. Replication Data for Igbo Natural Language Processing Tasks I

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 8, 2023
    + more versions
    Cite
    Nweya, Gerald Okey; Akinola Solomon Oluwole; Onwuegbuzia, Emeka Felix; Ejinwa, Samuel Obinna; Adiboshi, Anita; Nwokwu, Daniel Success; Ihunna Peter; Osuagwu, Amarachi Akudo (2023). Replication Data for Igbo Natural Language Processing Tasks I [Dataset]. http://doi.org/10.7910/DVN/RXBNCZ
    Explore at:
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Nweya, Gerald Okey; Akinola Solomon Oluwole; Onwuegbuzia, Emeka Felix; Ejinwa, Samuel Obinna; Adiboshi, Anita; Nwokwu, Daniel Success; Ihunna Peter; Osuagwu, Amarachi Akudo
    Description

    The Igbo synchronised corpus (IgboSynCorp) is an annotated corpus of spoken Igbo created by a team of linguists and NLP experts at the University of Ibadan and Afe Babalola University, Nigeria. The project was designed to create an open-access labelled and unlabelled dataset for Natural Language Processing tasks in the Igbo language. The dataset was created to enable robust and more equitable application of machine learning tools of high social value in Igbo. The dataset consists of ELAN text and wav files of Igbo speech. There are two categories of ELAN files: Gold files (90 mins) and Non-Gold files (188 mins). The Gold files (19,722 words or 2,761 sentences) were transcribed phonetically and orthographically, translated to English, glossed, and PoS-tagged based on the universal dependency PoS tags. The Non-Gold files were only transcribed orthographically and translated to English. There are 110 recordings of spoken Igbo (.wav files) amounting to 38.8075 hours or 2,328.45 minutes. There are 110 wav files of Igbo oral narratives. The metadata is compiled in Excel sheets: IgboSynCorp Metadata I contains the demographic information about the language consultants, while IgboSynCorp Metadata II outlines the domains of speech represented in the individual wav files (oral narratives). There are two lexicon files with about 2,300 words altogether, which originated from the glossing and part-of-speech tagging. The project was funded by Lacuna Fund https://lacunafund.org of the Meridian Institute, 105 Village Place, Dillion, Colorado 80435, United States of America. (2022-06-21)

  20. CLEAR Dataset

    • paperswithcode.com
    Updated Aug 25, 2024
    + more versions
    Cite
    Zhiqiu Lin; Jia Shi; Deepak Pathak; Deva Ramanan (2024). CLEAR Dataset [Dataset]. https://paperswithcode.com/dataset/clear
    Explore at:
    Dataset updated
    Aug 25, 2024
    Authors
    Zhiqiu Lin; Jia Shi; Deepak Pathak; Deva Ramanan
    Description

    CLEAR is a continual image classification benchmark dataset with a natural temporal evolution of visual concepts in the real world that spans a decade (2004-2014). CLEAR is built from existing large-scale image collections (YFCC100M) through a novel and scalable low-cost approach to visio-linguistic dataset curation. The pipeline makes use of pretrained vision language models (e.g. CLIP) to interactively build labeled datasets, which are further validated with crowd-sourcing to remove errors and even inappropriate images (hidden in the original YFCC100M). The major strength of CLEAR over prior CL benchmarks is the smooth temporal evolution of visual concepts with real-world imagery, including both high-quality labeled data and abundant unlabeled samples per time period for continual semi-supervised learning.
