13 datasets found
  1. Data Annotation Tool Market Report

    • marketresearchforecast.com
    doc, pdf, ppt
    Updated Dec 9, 2024
    Cite
    Market Research Forecast (2024). Data Annotation Tool Market Report [Dataset]. https://www.marketresearchforecast.com/reports/data-annotation-tool-market-10075
    Available download formats: doc, ppt, pdf
    Dataset updated
    Dec 9, 2024
    Dataset authored and provided by
    Market Research Forecast
    License

    https://www.marketresearchforecast.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Data Annotation Tool Market was valued at USD 3.9 billion in 2023 and is projected to reach USD 6.64 billion by 2032, with an expected CAGR of 7.9% during the forecast period. A data annotation tool is software used to label data so that a machine learning model can learn patterns from it. These tools support multiple data types, including images, text, audio, and video. Annotation subcategories include bounding boxes and segmentation for images; entity recognition and sentiment analysis for text; transcription and sound labeling for audio; and object tracking for video. Common features vary by use case but typically include labeling interfaces, collaboration, label suggestions, and quality assurance. Applications span the automotive industry (object detection for self-driving cars), text processing (text classification), healthcare (medical imaging), and retail (recommendation systems). These tools are used to produce high-quality, accurately labeled datasets for building efficient AI systems. Key drivers for this market are: Increasing Adoption of Cloud-based Managed Services to Drive Market Growth. Potential restraints include: Adverse Health Effect May Hamper Market Growth. Notable trends are: Growing Implementation of Touch-based and Voice-based Infotainment Systems to Increase Adoption of Intelligent Cars.

  2. Data from: X-ray CT data with semantic annotations for the paper "A workflow...

    • catalog.data.gov
    • s.cnmilf.com
    • +2more
    Updated May 2, 2024
    Cite
    Agricultural Research Service (2024). X-ray CT data with semantic annotations for the paper "A workflow for segmenting soil and plant X-ray CT images with deep learning in Google’s Colaboratory" [Dataset]. https://catalog.data.gov/dataset/x-ray-ct-data-with-semantic-annotations-for-the-paper-a-workflow-for-segmenting-soil-and-p-d195a
    Dataset updated
    May 2, 2024
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Description

    Leaves from genetically unique Juglans regia plants were scanned using X-ray micro-computed tomography (microCT) on the X-ray μCT beamline (8.3.2) at the Advanced Light Source (ALS) at Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA, USA. Soil samples were collected in the fall of 2017 from the riparian oak forest located at the Russell Ranch Sustainable Agricultural Institute at the University of California, Davis. The soil was sieved through a 2 mm mesh and air dried before imaging. A single soil aggregate was scanned at 23 keV using the 10x objective lens with a pixel resolution of 650 nanometers on beamline 8.3.2 at the ALS. Additionally, a drought-stressed almond flower bud (Prunus dulcis) from a plant housed at the University of California, Davis, was scanned using a 4x lens with a pixel resolution of 1.72 µm on beamline 8.3.2 at the ALS.

    Raw tomographic image data were reconstructed using TomoPy. Reconstructions were converted to 8-bit tif or png format using ImageJ or the PIL package in Python before further processing. Images were annotated using Intel's Computer Vision Annotation Tool (CVAT) and ImageJ; both are free to use and open source. Leaf images were annotated following Théroux-Rancourt et al. (2020): hand labeling was done directly in ImageJ by drawing around each tissue, with 5 images annotated per leaf. Care was taken to cover a range of anatomical variation to help improve the generalizability of the models to other leaves. All slices were labeled by Dr. Mina Momayyezi and Fiona Duong.

    To annotate the flower bud and soil aggregate, images were imported into CVAT. The exterior border of the bud (i.e., bud scales) and flower were annotated in CVAT and exported as masks. Similarly, the exterior of the soil aggregate and particulate organic matter identified by eye were annotated in CVAT and exported as masks. To annotate air spaces in both the bud and soil aggregate, images were imported into ImageJ. A Gaussian blur was applied to the image to decrease noise, and the air space was then segmented using thresholding. After applying the threshold, the selected air space region was converted to a binary image with white representing the air space and black representing everything else. This binary image was overlaid upon the original image and the air space within the flower bud and aggregate was selected using the "free hand" tool. Air space outside the region of interest for both image sets was eliminated. The quality of the air space annotation was then visually inspected for accuracy against the underlying original image; incomplete annotations were corrected using the brush or pencil tool to paint missing air space white and incorrectly identified air space black. Once the annotation was satisfactorily corrected, the binary image of the air space was saved. Finally, the annotations of the bud and flower or aggregate and organic matter were opened in ImageJ and the associated air space mask was overlaid on top of them, forming a three-layer mask suitable for training the fully convolutional network. All labeling of the soil aggregate images was done by Dr. Devin Rippner. These images and annotations are for training deep learning models to identify different constituents in leaves, almond buds, and soil aggregates.

    Limitations: For the walnut leaves, some tissues (stomata, etc.) are not labeled, and the annotated images represent only a small portion of a full leaf. Similarly, both the almond bud and the aggregate represent just one single sample of each. The bud tissues are divided only into bud scales, flower, and air space; many other tissues remain unlabeled. For the soil aggregate, labels were made by eye with no supporting chemical information, so particulate organic matter identification may be incorrect.

    Resources in this dataset:

    Resource Title: Annotated X-ray CT images and masks of a Forest Soil Aggregate. File Name: forest_soil_images_masks_for_testing_training.zip. Resource Description: This aggregate was collected from the riparian oak forest at the Russell Ranch Sustainable Agricultural Facility. The aggregate was scanned using X-ray micro-computed tomography (microCT) on the X-ray μCT beamline (8.3.2) at the Advanced Light Source (ALS) at Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA, USA, using the 10x objective lens with a pixel resolution of 650 nanometers. For masks, the background has a value of 0,0,0; pore spaces have a value of 250,250,250; mineral solids have a value of 128,0,0; and particulate organic matter has a value of 0,128,0. These files were used for training a model to segment the forest soil aggregate and for testing the accuracy, precision, recall, and F1 score of the model.

    Resource Title: Annotated X-ray CT images and masks of an Almond bud (P. dulcis). File Name: Almond_bud_tube_D_P6_training_testing_images_and_masks.zip. Resource Description: A drought-stressed almond flower bud (Prunus dulcis) from a plant housed at the University of California, Davis, was scanned by X-ray micro-computed tomography (microCT) on the X-ray μCT beamline (8.3.2) at the Advanced Light Source (ALS) at Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA, USA, using the 4x lens with a pixel resolution of 1.72 µm. For masks, the background has a value of 0,0,0; air spaces have a value of 255,255,255; bud scales have a value of 128,0,0; and flower tissues have a value of 0,128,0. These files were used for training a model to segment the almond bud and for testing the accuracy, precision, recall, and F1 score of the model. Resource Software Recommended: Fiji (ImageJ), url: https://imagej.net/software/fiji/downloads

    Resource Title: Annotated X-ray CT images and masks of Walnut leaves (J. regia). File Name: 6_leaf_training_testing_images_and_masks_for_paper.zip. Resource Description: Stems were collected from genetically unique J. regia accessions at the USDA-ARS-NCGR in Wolfskill Experimental Orchard, Winters, California, USA to use as scion, and were grafted by Sierra Gold Nursery onto a commonly used commercial rootstock, RX1 (J. microcarpa × J. regia). We used a common rootstock to eliminate any own-root effects and to simulate conditions for a commercial walnut orchard setting, where rootstocks are commonly used. The grafted saplings were repotted and transferred to the Armstrong lathe house facility at the University of California, Davis in June 2019, and kept under natural light and temperature. Leaves from each accession and treatment were scanned using X-ray micro-computed tomography (microCT) on the X-ray μCT beamline (8.3.2) at the Advanced Light Source (ALS) at Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA, USA, using the 10x objective lens with a pixel resolution of 650 nanometers. For masks, the background has a value of 170,170,170; epidermis has a value of 85,85,85; mesophyll has a value of 0,0,0; bundle sheath extension has a value of 152,152,152; vein has a value of 220,220,220; and air has a value of 255,255,255. Resource Software Recommended: Fiji (ImageJ), url: https://imagej.net/software/fiji/downloads
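
    The resource descriptions above give the exact grayscale values used in the walnut-leaf masks. Purely as an illustration of how those masks could be consumed, the following Python sketch (not part of the dataset; the file name and the assumption of 8-bit grayscale mask images are ours) maps mask values to integer class labels for training:

    # Illustrative only: convert a walnut-leaf mask into integer class labels.
    import numpy as np
    from PIL import Image

    LEAF_CLASSES = {
        170: 0,  # background
        85: 1,   # epidermis
        0: 2,    # mesophyll
        152: 3,  # bundle sheath extension
        220: 4,  # vein
        255: 5,  # air
    }

    def mask_to_labels(mask_path):
        """Return a 2-D array of integer class ids for one mask image."""
        mask = np.array(Image.open(mask_path).convert("L"))
        labels = np.full(mask.shape, -1, dtype=np.int64)  # -1 flags unexpected values
        for value, class_id in LEAF_CLASSES.items():
            labels[mask == value] = class_id
        return labels

    # Example (hypothetical file name):
    # labels = mask_to_labels("leaf_mask_001.png")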

  3. DataSheet1_Benchmarking automated cell type annotation tools for single-cell...

    • figshare.com
    docx
    Updated Jun 21, 2023
    + more versions
    Cite
    Yuge Wang; Xingzhi Sun; Hongyu Zhao (2023). DataSheet1_Benchmarking automated cell type annotation tools for single-cell ATAC-seq data.docx [Dataset]. http://doi.org/10.3389/fgene.2022.1063233.s001
    Available download formats: docx
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    Frontiers
    Authors
    Yuge Wang; Xingzhi Sun; Hongyu Zhao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    As single-cell chromatin accessibility profiling methods advance, scATAC-seq has become ever more important in the study of candidate regulatory genomic regions and their roles underlying developmental, evolutionary, and disease processes. At the same time, cell type annotation is critical in understanding the cellular composition of complex tissues and identifying potential novel cell types. However, most existing methods that can perform automated cell type annotation are designed to transfer labels from an annotated scRNA-seq data set to another scRNA-seq data set, and it is not clear whether these methods are adaptable to annotate scATAC-seq data. Several methods have been recently proposed for label transfer from scRNA-seq data to scATAC-seq data, but there is a lack of benchmarking study on the performance of these methods. Here, we evaluated the performance of five scATAC-seq annotation methods on both their classification accuracy and scalability using publicly available single-cell datasets from mouse and human tissues including brain, lung, kidney, PBMC, and BMMC. Using the BMMC data as basis, we further investigated the performance of these methods across different data sizes, mislabeling rates, sequencing depths and the number of cell types unique to scATAC-seq. Bridge integration, which is the only method that requires additional multimodal data and does not need gene activity calculation, was overall the best method and robust to changes in data size, mislabeling rate and sequencing depth. Conos was the most time and memory efficient method but performed the worst in terms of prediction accuracy. scJoint tended to assign cells to similar cell types and performed relatively poorly for complex datasets with deep annotations but performed better for datasets only with major label annotations. The performance of scGCN and Seurat v3 was moderate, but scGCN was the most time-consuming method and had the most similar performance to random classifiers for cell types unique to scATAC-seq.

  4. Global Full-Size Tray Automatic Labeling Equipment Market Industry Best...

    • statsndata.org
    excel, pdf
    Updated Feb 2025
    Cite
    Stats N Data (2025). Global Full-Size Tray Automatic Labeling Equipment Market Industry Best Practices 2025-2032 [Dataset]. https://www.statsndata.org/report/full-size-tray-automatic-labeling-equipment-market-324724
    Available download formats: excel, pdf
    Dataset updated
    Feb 2025
    Dataset authored and provided by
    Stats N Data
    License

    https://www.statsndata.org/how-to-order

    Area covered
    Global
    Description

    The Full-Size Tray Automatic Labeling Equipment market has emerged as a pivotal segment within the packaging and labeling industry, driven by the increasing demand for efficiency and accuracy in product packaging across various sectors, including food and beverage, pharmaceuticals, and consumer goods. This advanced

  5. Personal Protective Equipment Dataset (PPED)

    • data.niaid.nih.gov
    • zenodo.org
    Updated May 17, 2022
    Cite
    Anonymous (2022). Personal Protective Equipment Dataset (PPED) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6551757
    Dataset updated
    May 17, 2022
    Dataset authored and provided by
    Anonymous
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Personal Protective Equipment Dataset (PPED)

    This dataset serves as a benchmark for PPE detection in chemical plants. We provide the dataset and experimental results.

    1. The dataset

    We produced a data set based on the actual needs and relevant regulations in chemical plants. The standard GB 39800.1-2020 formulated by the Ministry of Emergency Management of the People’s Republic of China defines the protective requirements for plants and chemical laboratories. The complete dataset is contained in the folder PPED/data.

    1.1. Image collection

    We took more than 3300 pictures, varying the following characteristics: environment, distance, lighting conditions, angle, and the number of people photographed.

    Backgrounds: There are 4 backgrounds, including office, near machines, factory and regular outdoor scenes.

    Scale: By taking pictures from different distances, the captured PPEs are classified in small, medium and large scales.

    Light: Good lighting conditions and poor lighting conditions were studied.

    Diversity: Some images contain a single person, and some contain multiple people.

    Angle: The pictures we took can be divided into front and side.

    A total of more than 3300 photos were taken in the raw data under all conditions. All images are located in the folder “PPED/data/JPEGImages”.

    1.2. Label

    We used LabelImg as the labeling tool and stored annotations in the PASCAL-VOC (XML) format. Since YOLO uses the txt format, trans_voc2yolo.py can be used to convert the PASCAL-VOC XML files to txt files (see the sketch below). Annotations are stored in the folder PPED/data/Annotations.
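
    For illustration only, a conversion along the lines of trans_voc2yolo.py (whose actual code ships with the dataset; the class names below are placeholders, not the real PPED classes) could look like this:

    # Illustrative PASCAL-VOC XML to YOLO txt conversion, not the dataset's script.
    import xml.etree.ElementTree as ET

    CLASSES = ["helmet", "goggles", "gloves"]  # placeholder class names

    def voc_to_yolo(xml_path):
        root = ET.parse(xml_path).getroot()
        w = float(root.find("size/width").text)
        h = float(root.find("size/height").text)
        lines = []
        for obj in root.findall("object"):
            cls_id = CLASSES.index(obj.find("name").text)
            box = obj.find("bndbox")
            xmin, ymin = float(box.find("xmin").text), float(box.find("ymin").text)
            xmax, ymax = float(box.find("xmax").text), float(box.find("ymax").text)
            # YOLO format: class_id x_center y_center width height, all normalized
            x_c = (xmin + xmax) / 2 / w
            y_c = (ymin + ymax) / 2 / h
            lines.append(f"{cls_id} {x_c:.6f} {y_c:.6f} {(xmax - xmin) / w:.6f} {(ymax - ymin) / h:.6f}")
        return lines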

    1.3. Dataset Features

    The pictures were taken by us under the different conditions mentioned above. The file PPED/data/feature.csv is a CSV file that records the characteristics of every image, including lighting conditions, angle, background, number of people, and scale.

    1.4. Dataset Division

    The dataset is divided into training and test sets at a 9:1 ratio.

    2. Baseline Experiments

    We provide baseline results with five models, namely Faster R-CNN (R), Faster R-CNN (M), SSD, YOLOv3-spp, and YOLOv5. All code and results are given in the folder PPED/experiment.

    2.1. Environment and Configuration:

    Intel Core i7-8700 CPU

    NVIDIA GTX1060 GPU

    16 GB of RAM

    Python: 3.8.10

    pytorch: 1.9.0

    pycocotools: pycocotools-win

    Windows 10

    2.2. Applied Models

    The source code and results of the applied models are given in the folder PPED/experiment, with sub-folders corresponding to the model names.

    2.2.1. Faster R-CNN

    Faster R-CNN

    backbone: resnet50+fpn

    We downloaded the pre-training weights from https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth.

    We modified the dataset path, training classes and training parameters including batch size.

    We run train_res50_fpn.py to start training.

    Then, the weights are trained by the training set.

    Finally, we validate the results on the test set.

    backbone: mobilenetv2

    The same training method as for resnet50+fpn was applied, but the results were not as good as with resnet50+fpn, so this backbone was discarded.

    The Faster R-CNN source code used in our experiment is given in the folder PPED/experiment/Faster R-CNN. The weights of the fully-trained Faster R-CNN (R) and Faster R-CNN (M) models are stored in the files PPED/experiment/trained_models/resNetFpn-model-19.pth and mobile-model.pth. The performance measurements of Faster R-CNN (R) and Faster R-CNN (M) are stored in the folders PPED/experiment/results/Faster RCNN(R) and Faster RCNN(M).
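
    As a hedged illustration of the general pattern described above (not the dataset's train_res50_fpn.py), adapting a COCO-pretrained torchvision Faster R-CNN with a resnet50+fpn backbone to a custom number of classes might look like this; the number of PPE classes is assumed:

    # Illustrative sketch using the PyTorch 1.9-era torchvision API listed above.
    import torch
    import torchvision
    from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

    num_classes = 1 + 6  # background + an assumed number of PPE categories

    # Build the detector without COCO weights, then load the downloaded checkpoint.
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=False)
    state_dict = torch.load("fasterrcnn_resnet50_fpn_coco-258fb6c6.pth", map_location="cpu")
    model.load_state_dict(state_dict)

    # Swap the box predictor head so the model outputs the custom classes.
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)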

    2.2.2. SSD

    backbone: resnet50

    We downloaded pre-training weights from https://download.pytorch.org/models/resnet50-19c8e357.pth.

    The same training method as Faster R-CNN is applied.

    The SSD source code used in our experiment is given in folder PPED/experiment/ssd. The weights of the fully-trained SSD model are stored in file PPED/experiment/trained_models/SSD_19.pth. The performance measurements of SSD are stored in folder PPED/experiment/results/SSD.

    2.2.3. YOLOv3-spp

    backbone: DarkNet53

    We modified the type information of the XML file to match our application.

    We run trans_voc2yolo.py to convert the XML file in VOC format to a txt file.

    The weights used are: yolov3-spp-ultralytics-608.pt.

    The YOLOv3-spp source code used in our experiment is given in folder PPED/experiment/YOLOv3-spp. The weights of the fully-trained YOLOv3-spp model are stored in file PPED/experiment/trained_models/YOLOvspp-19.pt. The performance measurements of YOLOv3-spp are stored in folder PPED/experiment/results/YOLOv3-spp.

    2.2.4. YOLOv5

    backbone: CSP_DarkNet

    We modified the type information of the XML file to match our application.

    We run trans_voc2yolo.py to convert the XML file in VOC format to a txt file.

    The weights used are: yolov5s.

    The YOLOv5 source code used in our experiment is given in folder PPED/experiment/yolov5. The weights of the fully-trained YOLOv5 model are stored in file PPED/experiment/trained_models/YOLOv5.pt. The performance measurements of YOLOv5 are stored in folder PPED/experiment/results/YOLOv5.
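
    As an illustrative sketch (not part of the dataset's code), the fully-trained YOLOv5 weights mentioned above could be loaded for inference through the ultralytics/yolov5 torch.hub entry point; the image path is hypothetical:

    import torch

    # Load the fully-trained weights via the ultralytics/yolov5 hub entry point.
    model = torch.hub.load("ultralytics/yolov5", "custom",
                           path="PPED/experiment/trained_models/YOLOv5.pt")

    results = model("PPED/data/JPEGImages/example.jpg")  # hypothetical image name
    results.print()          # per-image detection summary
    boxes = results.xyxy[0]  # tensor of (xmin, ymin, xmax, ymax, confidence, class)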

    2.3. Evaluation

    The computed evaluation metrics as well as the code needed to compute them from our dataset are provided in the folder PPED/experiment/eval.

    3. Code Sources

    Faster R-CNN (R and M)

    https://github.com/WZMIAOMIAO/deep-learning-for-image-processing/tree/master/pytorch_object_detection/faster_rcnn

    official code: https://github.com/pytorch/vision/blob/main/torchvision/models/detection/faster_rcnn.py

    SSD

    https://github.com/WZMIAOMIAO/deep-learning-for-image-processing/tree/master/pytorch_object_detection/ssd

    official code: https://github.com/pytorch/vision/blob/main/torchvision/models/detection/ssd.py

    YOLOv3-spp

    https://github.com/WZMIAOMIAO/deep-learning-for-image-processing/tree/master/pytorch_object_detection/yolov3-spp

    YOLOv5

    https://github.com/ultralytics/yolov5

  6. 2017 Census of Agriculture - Census Data Query Tool (CDQT)

    • agdatacommons.nal.usda.gov
    bin
    Updated Feb 13, 2024
    Cite
    USDA National Agricultural Statistics Service (2024). 2017 Census of Agriculture - Census Data Query Tool (CDQT) [Dataset]. https://agdatacommons.nal.usda.gov/articles/dataset/2017_Census_of_Agriculture_-_Census_Data_Query_Tool_CDQT_/24663345
    Available download formats: bin
    Dataset updated
    Feb 13, 2024
    Dataset provided by
    National Agricultural Statistics Service (http://www.nass.usda.gov/)
    United States Department of Agriculture (http://usda.gov/)
    Authors
    USDA National Agricultural Statistics Service
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The Census of Agriculture is a complete count of U.S. farms and ranches and the people who operate them. Even small plots of land - whether rural or urban - growing fruit, vegetables or some food animals count if $1,000 or more of such products were raised and sold, or normally would have been sold, during the Census year. The Census of Agriculture, taken only once every five years, looks at land use and ownership, operator characteristics, production practices, income and expenditures. For America's farmers and ranchers, the Census of Agriculture is their voice, their future, and their opportunity. The Census Data Query Tool (CDQT) is a web-based tool for accessing and downloading table-level data from the Census of Agriculture Volume 1 publication. The data found via the CDQT may also be accessed in the NASS Quick Stats database. The CDQT is unique in that it automatically displays data from the past five Census of Agriculture publications. The CDQT is presented as a "2017 centric" view of the Census of Agriculture data. All data series that are present in the 2017 dataset are available within the CDQT, and any matching data series from prior Census years will also display (back to 1997). If a data series is not included in the 2017 dataset, then data cells will remain blank in the tool. For example, one of the data series had a label change from "Operator" to "Producer"; data from prior Census years labelled "Operator" will not show up where the label has changed to "Producer" for 2017. The CDQT can be used to query Census data from 1997 through 2017. Data are searchable by Census table and are downloadable as CSV or PDF files. 2017 Census Ag Atlas Maps are also available for download. Resources in this dataset: Resource Title: 2017 Census of Agriculture - Census Data Query Tool (CDQT). File Name: Web Page, url: https://www.nass.usda.gov/Quick_Stats/CDQT/chapter/1/table/1. Using CDQT:

    Upon entering the CDQT, a data table is present. Changing the parameters at the top of the data table will retrieve different combinations of Census Chapter, Table, State, or County (when selecting Chapter 2). For the U.S., Volume 1, US/State Chapter 1 will include only U.S. data; Chapter 2 will include U.S. and State level data. For a State, Volume 1 US/State Level Data Chapter 1 will include only the State level data; Chapter 2 will include the State and county level data. Once a selection is made, press the “Update Grid” button to retrieve the new data table. Comma-separated values (CSV) download, compatible with most spreadsheet and database applications: to download a CSV file of the data as it is currently presented in the data grid, press the "CSV" button in the "Export Data" section of the toolbar. When CSV is chosen, data will be downloaded as numeric. To view the source PDF file for the data table, press the "View PDF" button in the toolbar.

  7. Bitext-retail-ecommerce-llm-chatbot-training-dataset

    • huggingface.co
    Updated Aug 6, 2024
    + more versions
    Cite
    Bitext (2024). Bitext-retail-ecommerce-llm-chatbot-training-dataset [Dataset]. https://huggingface.co/datasets/bitext/Bitext-retail-ecommerce-llm-chatbot-training-dataset
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 6, 2024
    Dataset authored and provided by
    Bitext
    License

    https://choosealicense.com/licenses/cdla-sharing-1.0/

    Description

    Bitext - Retail (eCommerce) Tagged Training Dataset for LLM-based Virtual Assistants

      Overview
    

    This hybrid synthetic dataset is designed to be used to fine-tune Large Language Models such as GPT, Mistral and OpenELM, and has been generated using our NLP/NLG technology and our automated Data Labeling (DAL) tools. The goal is to demonstrate how Verticalization/Domain Adaptation for the [Retail (eCommerce)] sector can be easily achieved using our two-step approach to… See the full description on the dataset page: https://huggingface.co/datasets/bitext/Bitext-retail-ecommerce-llm-chatbot-training-dataset.

  8. Global Medical Image Annotation Market Industry Best Practices 2025-2032

    • statsndata.org
    excel, pdf
    Updated Feb 2025
    Cite
    Stats N Data (2025). Global Medical Image Annotation Market Industry Best Practices 2025-2032 [Dataset]. https://www.statsndata.org/report/medical-image-annotation-market-313867
    Available download formats: excel, pdf
    Dataset updated
    Feb 2025
    Dataset authored and provided by
    Stats N Data
    License

    https://www.statsndata.org/how-to-order

    Area covered
    Global
    Description

    The Medical Image Annotation market is an essential segment within the healthcare and technology sectors, playing a crucial role in the development of advanced diagnostic tools and artificial intelligence applications. This market revolves around the process of labeling medical images, such as X-rays, MRIs, and CT s

  9. Replication Data for: Improving Objective Wound Assessment: "Fully-automated...

    • borealisdata.ca
    Updated Mar 14, 2022
    Cite
    Jose Ramirez Garcia Luna; Dhanesh Ramachandram; Robert DJ Fraser; Justin Allport (2022). Replication Data for: Improving Objective Wound Assessment: "Fully-automated wound tissue segmentation using Deep Learning on mobile devices" [Dataset]. http://doi.org/10.5683/SP3/8C4FDV
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 14, 2022
    Dataset provided by
    Borealis
    Authors
    Jose Ramirez Garcia Luna; Dhanesh Ramachandram; Robert DJ Fraser; Justin Allport
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Background: The composition of tissue types present within a wound is a useful indicator of its healing progression and could be helpful in guiding its treatment. Additionally, this measure is clinically used in wound healing tools (e.g. BWAT) to assess risk and recommend treatment. However, the identification of wound tissue and the estimation of their relative composition is highly subjective and variable. This results in incorrect assessments being reported, leading to downstream impacts including inappropriate dressing selection, failure to identify wounds at risk of not healing, or failure to make appropriate referrals to specialists. Objective: To measure inter- and intra-rater variability in manual tissue segmentation and quantification among a cohort of wound care clinicians. To determine if an objective assessment of tissue types (i.e., size, amount) can be achieved using a deep convolutional neural network that predicts wound tissue types. The performance of the proposed machine learning model is reported in terms of mean intersection over union (mIOU) between model predictions and the ground truth labels. Finally, to compare the performance of the model with wound tissue identification by a cohort of wound care clinicians. Methods: A dataset of 58 anonymized wound images of various types of chronic wounds from Swift Medical's Wound Database was used to conduct the inter-rater and intra-rater agreement study. The dataset was split into 3 subsets, with 50% overlap between subsets to measure intra-rater agreement. Four different tissue types (epithelial, granulation, slough and eschar) within the wound bed were independently labelled by the 5 wound clinicians using a browser-based image annotation tool. Each subset was labelled at one-week intervals. Inter-rater and intra-rater agreement was computed. Next, two separate deep convolutional neural network architectures were developed for wound segmentation and tissue segmentation and are used in sequence in the proposed workflow. These models were trained using 465,187 wound image-label pairs and 17,000 image-label pairs respectively. This is by far the largest and most diverse reported dataset of labelled wound images used for training deep learning models for wound and wound tissue segmentation. This allows our models to be robust, unbiased towards skin tones, and able to generalize well to unseen data. The deep learning model architectures were designed to be fast and nimble to allow them to run in near real-time on mobile devices. Results: We observed considerable variability when a cohort of wound clinicians was tasked to label the different tissue types within the wound using a browser-based image annotation tool. We report poor to moderate inter-rater agreement in identifying tissue types in chronic wound images. A very poor Krippendorff alpha value of 0.014 for inter-rater agreement when identifying epithelization was observed, while granulation was most consistently identified by the clinicians. The intra-rater ICC(3,1) (intra-class correlation), however, indicates that raters are relatively consistent when labelling the same image multiple times over a period of time. Our deep learning models achieved a mean intersection over union (mIOU) of 0.8644 and 0.7192 for wound and tissue segmentation respectively. A cohort of wound clinicians, by consensus, rated 91% of the tissue segmentation results to be between fair and good in terms of tissue identification and segmentation quality.
    Conclusions: Our inter-rater agreement study validates that clinicians may exhibit considerable variability when identifying and visually estimating tissue proportion within the wound bed. The proposed deep learning model provides objective tissue identification and measurements to assist clinicians in documenting the wound more accurately. Our solution works on off-the-shelf mobile devices and was trained with the largest and most diverse chronic wound dataset ever reported, leading to a robust model when deployed. The proposed solution brings us a step closer to more accurate wound documentation and may lead to improved healing outcomes when deployed at scale.
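
    For reference, the mIOU metric reported above can be computed from integer-labeled prediction and ground-truth masks as in the following generic sketch (ours, not the authors' evaluation code):

    # Generic mean intersection-over-union over the classes present in either mask.
    import numpy as np

    def mean_iou(pred, target, num_classes):
        """pred, target: 2-D arrays of class ids; returns mIOU over present classes."""
        ious = []
        for c in range(num_classes):
            pred_c, target_c = (pred == c), (target == c)
            union = np.logical_or(pred_c, target_c).sum()
            if union == 0:
                continue  # class absent from both masks; skip it
            intersection = np.logical_and(pred_c, target_c).sum()
            ious.append(intersection / union)
        return float(np.mean(ious)) if ious else 0.0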

  10. The ATLAS of Traffic Lights

    • zenodo.org
    mp4, zip
    Updated Feb 12, 2025
    Cite
    Rupert Polley; Nikolai Polley; Dominik Heid; Marc Heinrich; Sven Ochs; J. Marius Zöllner; Rupert Polley; Nikolai Polley; Dominik Heid; Marc Heinrich; Sven Ochs; J. Marius Zöllner (2025). The ATLAS of Traffic Lights [Dataset]. http://doi.org/10.5281/zenodo.14794667
    Available download formats: zip, mp4
    Dataset updated
    Feb 12, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Rupert Polley; Nikolai Polley; Dominik Heid; Marc Heinrich; Sven Ochs; J. Marius Zöllner; Rupert Polley; Nikolai Polley; Dominik Heid; Marc Heinrich; Sven Ochs; J. Marius Zöllner
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This is an older version: Please use the newest available version.


    Changelog:

    • 31. Jan. 2024: v0.1 - We released a small dataset sample. Until the full release on the 28. Feb. 2025, the annotation format may be subject to change.

    ATLAS


    ATLAS (Applied Traffic Light Annotation Set) is a new, publicly available dataset designed to improve traffic light detection for autonomous driving. Existing open-source datasets often omit certain traffic light states and lack camera configurations for near and far distances. To address this, ATLAS features over 33,000 images collected from three synchronized cameras—wide, medium, and tele—with varied fields of view in the German city of Karlsruhe. This setup captures traffic lights at diverse distances and angles, including difficult overhead views. Each of the dataset’s 72,998 bounding boxes is meticulously labeled for 25 unique pictogram-state classes, covering rare but critical states (e.g., red-yellow) and pictograms (straight-right, straight-left). Additional annotations include challenging conditions such as heavy rain. All data is anonymized using state-of-the-art tools. ATLAS provides a comprehensive, high-quality resource for robust traffic light detection, overcoming limitations of existing datasets.

    Camera         FOV [°]     Resolution     Images
    Front-Medium   61 × 39     1920 × 1200    25,158
    Front-Tele     31 × 20     1920 × 1200    5,109
    Front-Wide     106 × 92    2592 × 2048    2,777


    Directory Format:

    We provide the dataset in the following format:

    ATLAS
    ├── train
    │   ├── front_medium
    │   │   ├── images
    │   │   │   └── front_medium_1722622455-950002160.jpg
    │   │   └── labels
    │   │       └── front_medium_1722622455-950002160.txt
    │   ├── front_tele
    │   └── front_wide
    ├── test
    ├── ATLAS_classes.yaml
    ├── LICENSE
    └── README.md

    Annotation Format:

    Each line in an annotation file describes one bounding box using five fields:

    class_id x_center y_center width height

    1. class_id: An integer indicating the class of the annotated object. The file ATLAS_classes.yaml contains human-readable names corresponding to each numeric label.
    2. x_center, y_center: The normalized coordinates of the bounding box center, relative to the image dimensions (in the range [0,1]), where x_center is measured horizontally and y_center vertically.
    3. width, height: The normalized width and height of the bounding box, also expressed in the range [0,1]. These values are obtained by dividing the bounding box width and height in pixels by the overall image width and height, respectively.
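
    A minimal parsing sketch based on the format described above (ours, not part of the dataset) reads one label file and converts its normalized boxes to pixel coordinates; the example file name comes from the directory listing and the image size from the camera table:

    # Parse one ATLAS label file; class names are resolved via ATLAS_classes.yaml.
    def read_atlas_labels(label_path, img_width, img_height):
        boxes = []
        with open(label_path) as f:
            for line in f:
                class_id, x_c, y_c, w, h = line.split()
                x_c, y_c = float(x_c) * img_width, float(y_c) * img_height
                w, h = float(w) * img_width, float(h) * img_height
                # convert center/size to corner coordinates in pixels
                boxes.append((int(class_id), x_c - w / 2, y_c - h / 2, x_c + w / 2, y_c + h / 2))
        return boxes

    # Example: front_medium images are 1920 × 1200 according to the table above.
    # boxes = read_atlas_labels("front_medium_1722622455-950002160.txt", 1920, 1200)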

    Terms and Conditions

    The ATLAS Dataset by FZI Research Center for Information Technology is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

    Therefore, the Dataset is only allowed to be used for non-commercial purposes, such as teaching and research. The Licensor thus grants the End User the right to use the dataset for its own internal and non-commercial use and the purpose of scientific research only. There may be inaccuracies, although the Licensor tried and will try its best to rectify any inaccuracy once found. We invite all users to report remarks via mail at polley@fzi.de

    If the dataset is used in media, a link to the Licensor’s website is to be included. In case the End User uses the dataset within research papers, the following publication should be quoted:

    Polley et al.: The ATLAS of Traffic Lights: A Reliable Perception Framework for Autonomous Driving (under review)

  11. Data from: Protein-Labeling Reagents Selectively Activated by Copper(I)

    • figshare.com
    • acs.figshare.com
    xlsx
    Updated May 15, 2024
    Cite
    Rong Cheng; Yuki Nishikawa; Takumi Wagatsuma; Taiho Kambe; Yu-ki Tanaka; Yasumitsu Ogra; Tomonori Tamura; Itaru Hamachi (2024). Protein-Labeling Reagents Selectively Activated by Copper(I) [Dataset]. http://doi.org/10.1021/acschembio.4c00011.s002
    Available download formats: xlsx
    Dataset updated
    May 15, 2024
    Dataset provided by
    ACS Publications
    Authors
    Rong Cheng; Yuki Nishikawa; Takumi Wagatsuma; Taiho Kambe; Yu-ki Tanaka; Yasumitsu Ogra; Tomonori Tamura; Itaru Hamachi
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Copper is an essential trace element that participates in many biological processes through its unique redox cycling between cuprous (Cu+) and cupric (Cu2+) oxidation states. To elucidate the biological functions of copper, chemical biology tools that enable selective visualization and detection of copper ions and proteins in copper-rich environments are required. Herein, we describe the design of Cu+-responsive reagents based on a conditional protein labeling strategy. Upon binding Cu+, the probes generated quinone methide via oxidative bond cleavage, which allowed covalent labeling of surrounding proteins with high Cu+ selectivity. Using gel- and imaging-based analyses, the best-performing probe successfully detected changes in the concentration of labile Cu+ in living cells. Moreover, conditional proteomics analysis suggested intramitochondrial Cu+ accumulation in cells undergoing cuproptosis. Our results highlight the power of Cu+-responsive protein labeling in providing insights into the molecular mechanisms of Cu+ metabolism and homeostasis.

  12. Toronto Land Use Spatial Data - parcel-level - (2019-2021)

    • borealisdata.ca
    • search.dataone.org
    Updated Feb 23, 2023
    Cite
    Marcel Fortin (2023). Toronto Land Use Spatial Data - parcel-level - (2019-2021) [Dataset]. http://doi.org/10.5683/SP3/1VMJAG
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 23, 2023
    Dataset provided by
    Borealis
    Authors
    Marcel Fortin
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Area covered
    Toronto
    Description

    Please note that this dataset is not an official City of Toronto land use dataset. It was created for personal and academic use using City of Toronto Land Use Maps (2019) found on the City of Toronto Official Plan website at https://www.toronto.ca/city-government/planning-development/official-plan-guidelines/official-plan/official-plan-maps-copy, along with the City of Toronto parcel fabric (Property Boundaries) found at https://open.toronto.ca/dataset/property-boundaries/ and Statistics Canada Census Dissemination Blocks level boundary files (2016). The property boundaries used were dated November 11, 2021. Further detail about the City of Toronto's Official Plan, consolidation of the information presented in its online form, and considerations for its interpretation can be found at https://www.toronto.ca/city-government/planning-development/official-plan-guidelines/official-plan/

    Data Creation Documentation and Procedures

    Software Used: The spatial vector data were created using ArcGIS Pro 2.9.0 in December 2021.

    PDF File Conversions: Using Adobe Acrobat Pro DC software, the following downloaded PDF map images were converted to TIF format:
    9028-cp-official-plan-Map-14_LandUse_AODA.pdf
    9042-cp-official-plan-Map-22_LandUse_AODA.pdf
    9070-cp-official-plan-Map-20_LandUse_AODA.pdf
    908a-cp-official-plan-Map-13_LandUse_AODA.pdf
    978e-cp-official-plan-Map-17_LandUse_AODA.pdf
    97cc-cp-official-plan-Map-15_LandUse_AODA.pdf
    97d4-cp-official-plan-Map-23_LandUse_AODA.pdf
    97f2-cp-official-plan-Map-19_LandUse_AODA.pdf
    97fe-cp-official-plan-Map-18_LandUse_AODA.pdf
    9811-cp-official-plan-Map-16_LandUse_AODA.pdf
    982d-cp-official-plan-Map-21_LandUse_AODA.pdf

    Georeferencing and Reprojecting Data Files: The original projection of the PDF maps is unknown, but they were most likely published using MTM Zone 10 (EPSG 2019), as are many of the City of Toronto's datasets; they could also have been published in UTM Zone 17 (EPSG 26917). The TIF images were georeferenced in ArcGIS Pro using this projection with very good results. The images were matched against the City of Toronto's Centreline dataset. The resulting TIF files and their supporting spatial files include: TOLandUseMap13.tfwx, TOLandUseMap13.tif, TOLandUseMap13.tif.aux.xml, TOLandUseMap13.tif.ovr, TOLandUseMap14.tfwx, TOLandUseMap14.tif, TOLandUseMap14.tif.aux.xml, TOLandUseMap14.tif.ovr, TOLandUseMap15.tfwx, TOLandUseMap15.tif, TOLandUseMap15.tif.aux.xml, TOLandUseMap15.tif.ovr, TOLandUseMap16.tfwx, TOLandUseMap16.tif, TOLandUseMap16.tif.aux.xml, TOLandUseMap16.tif.ovr, TOLandUseMap17.tfwx, TOLandUseMap17.tif, TOLandUseMap17.tif.aux.xml, TOLandUseMap17.tif.ovr, TOLandUseMap18.tfwx, TOLandUseMap18.tif, TOLandUseMap18.tif.aux.xml, TOLandUseMap18.tif.ovr, TOLandUseMap19.tif, TOLandUseMap19.tif.aux.xml, TOLandUseMap19.tif.ovr, TOLandUseMap20.tfwx, TOLandUseMap20.tif, TOLandUseMap20.tif.aux.xml, TOLandUseMap20.tif.ovr, TOLandUseMap21.tfwx, TOLandUseMap21.tif, TOLandUseMap21.tif.aux.xml, TOLandUseMap21.tif.ovr, TOLandUseMap22.tfwx, TOLandUseMap22.tif, TOLandUseMap22.tif.aux.xml, TOLandUseMap22.tif.ovr, TOLandUseMap23.tfwx, TOLandUseMap23.tif, TOLandUseMap23.tif.aux.xml, TOLandUseMap23.tif.ov

    Ground control points were saved for all georeferenced images. The files are the following: map13.txt, map14.txt, map15.txt, map16.txt, map17.txt, map18.txt, map19.txt, map21.txt, map22.txt, map23.txt

    The City of Toronto's Property Boundaries shapefile, "property_bnds_gcc_wgs84.zip", was unzipped and also reprojected to EPSG 26917 (UTM Zone 17) into a new shapefile, "Property_Boundaries_UTM.shp".

    Mosaicing Images: Once georeferenced, all images were mosaiced into one image file, "LandUseMosaic20211220v01", within the project-generated geodatabase, "Landuse.gdb", and exported to TIF, "LandUseMosaic20211220.tif".

    Reclassifying Images: Because the original images were of low quality and the conversion to TIF made the image colours even more inconsistent, a method was required to reclassify the images so that different land use classes could be identified. Using deep learning objects, the images were reclassified into useful, consistent colours.

    Deep Learning Objects and Training: The resulting mosaic was prepared for reclassification using the Label Objects for Deep Learning tool in ArcGIS Pro. A training sample, "LandUseTrainingSamples20211220", was created in the geodatabase for all land use types as follows: Neighbourhoods; Institutional; Natural Areas; Core Employment Areas; Mixed Use Areas; Apartment Neighbourhoods; Parks; Roads; Utility Corridors; Other Open Spaces; General Employment Areas; Regeneration Areas; Lettering (not a land use type, but an image colour (black) used to label streets). By identifying the letters, the reclassification and vectorization results were easier to clean up of the unnecessary clutter caused by street labels.

    Reclassification: Once the training samples were created and saved, the raster was reclassified using the Image Classification Wizard tool in ArcGIS Pro, using the Support...
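
    The reprojection step above was performed in ArcGIS Pro. Purely as an open-source illustration (rasterio was not used in the original workflow; the output file name is hypothetical), reprojecting one of the georeferenced TIFs to EPSG 26917 could look like this:

    # Illustrative rasterio reprojection of one georeferenced TIF to UTM Zone 17.
    import rasterio
    from rasterio.warp import calculate_default_transform, reproject, Resampling

    dst_crs = "EPSG:26917"
    with rasterio.open("TOLandUseMap13.tif") as src:
        transform, width, height = calculate_default_transform(
            src.crs, dst_crs, src.width, src.height, *src.bounds)
        meta = src.meta.copy()
        meta.update(crs=dst_crs, transform=transform, width=width, height=height)
        # Output file name is hypothetical.
        with rasterio.open("TOLandUseMap13_utm17.tif", "w", **meta) as dst:
            for band in range(1, src.count + 1):
                reproject(source=rasterio.band(src, band),
                          destination=rasterio.band(dst, band),
                          src_transform=src.transform, src_crs=src.crs,
                          dst_transform=transform, dst_crs=dst_crs,
                          resampling=Resampling.nearest)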

  13. Data from: LFQProfiler and RNPxl: Open-Source Tools for Label-Free...

    • figshare.com
    xlsx
    Updated Jun 1, 2023
    Cite
    Johannes Veit; Timo Sachsenberg; Aleksandar Chernev; Fabian Aicheler; Henning Urlaub; Oliver Kohlbacher (2023). LFQProfiler and RNPxl: Open-Source Tools for Label-Free Quantification and Protein–RNA Cross-Linking Integrated into Proteome Discoverer [Dataset]. http://doi.org/10.1021/acs.jproteome.6b00407.s002
    Available download formats: xlsx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    ACS Publications
    Authors
    Johannes Veit; Timo Sachsenberg; Aleksandar Chernev; Fabian Aicheler; Henning Urlaub; Oliver Kohlbacher
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Modern mass spectrometry setups used in today’s proteomics studies generate vast amounts of raw data, calling for highly efficient data processing and analysis tools. Software for analyzing these data is either monolithic (easy to use, but sometimes too rigid) or workflow-driven (easy to customize, but sometimes complex). Thermo Proteome Discoverer (PD) is a powerful software for workflow-driven data analysis in proteomics which, in our eyes, achieves a good trade-off between flexibility and usability. Here, we present two open-source plugins for PD providing additional functionality: LFQProfiler for label-free quantification of peptides and proteins, and RNPxl for UV-induced peptide–RNA cross-linking data analysis. LFQProfiler interacts with existing PD nodes for peptide identification and validation and takes care of the entire quantitative part of the workflow. We show that it performs at least on par with other state-of-the-art software solutions for label-free quantification in a recently published benchmark (Ramus, C.; J. Proteomics 2016, 132, 51–62). The second workflow, RNPxl, represents the first software solution to date for identification of peptide–RNA cross-links including automatic localization of the cross-links at amino acid resolution and localization scoring. It comes with a customized integrated cross-link fragment spectrum viewer for convenient manual inspection and validation of the results.
