24 datasets found
  1. Open Source Data Labeling Tool Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated May 31, 2025
    Cite
    Data Insights Market (2025). Open Source Data Labeling Tool Report [Dataset]. https://www.datainsightsmarket.com/reports/open-source-data-labeling-tool-1421234
    Available download formats: pdf, doc, ppt
    Dataset updated
    May 31, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The open-source data labeling tool market is experiencing robust growth, driven by the increasing demand for high-quality training data in various AI applications. The market's expansion is fueled by several key factors: the rising adoption of machine learning and deep learning algorithms across industries, the need for efficient and cost-effective data annotation solutions, and a growing preference for customizable and flexible tools that can adapt to diverse data types and project requirements. While proprietary solutions exist, the open-source ecosystem offers advantages including community support, transparency, cost-effectiveness, and the ability to tailor tools to specific needs, fostering innovation and accessibility. The market is segmented by tool type (image, text, video, audio), deployment model (cloud, on-premise), and industry (automotive, healthcare, finance). We project a market size of approximately $500 million in 2025, with a compound annual growth rate (CAGR) of 25% from 2025 to 2033, reaching approximately $2.7 billion by 2033. This growth is tempered by challenges such as the complexities associated with data security, the need for skilled personnel to manage and use these tools effectively, and the inherent limitations of certain open-source solutions compared to their commercial counterparts. Despite these restraints, the open-source model's inherent flexibility and cost advantages will continue to attract a significant user base. The market's competitive landscape includes established players like Alecion and Appen, alongside numerous smaller companies and open-source communities actively contributing to the development and improvement of these tools. Geographical expansion is expected across North America, Europe, and Asia-Pacific, with the latter projected to witness significant growth due to the increasing adoption of AI and machine learning in developing economies. Future market trends point towards increased integration of automated labeling techniques within open-source tools, enhanced collaborative features to improve efficiency, and further specialization to cater to specific data types and industry-specific requirements. Continuous innovation and community contributions will remain crucial drivers of growth in this dynamic market segment.
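    As a quick sanity check on figures like these, the compound-annual-growth-rate arithmetic can be worked in both directions. The Python sketch below is purely illustrative and is not part of the report; small mismatches between a projected terminal value and the quoted one are expected because the published base value, CAGR, and end value are each rounded independently.

    # Illustrative CAGR arithmetic for the projection quoted above (values in USD billions).
    def project(start_value: float, cagr: float, years: int) -> float:
        """Terminal value after compounding start_value at cagr for years."""
        return start_value * (1.0 + cagr) ** years

    def implied_cagr(start_value: float, end_value: float, years: int) -> float:
        """Growth rate implied by a start value, an end value, and a horizon in years."""
        return (end_value / start_value) ** (1.0 / years) - 1.0

    if __name__ == "__main__":
        # ~$0.5B base in 2025 compounded at 25% over the 2025-2033 horizon (8 years).
        print(f"Projected 2033 size: ${project(0.5, 0.25, 8):.2f}B")
        # CAGR implied by the quoted ~$0.5B -> ~$2.7B endpoints over the same horizon.
        print(f"Implied CAGR: {implied_cagr(0.5, 2.7, 8):.1%}")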

  2. Open Source Data Annotation Tool Report

    • marketresearchforecast.com
    doc, pdf, ppt
    Updated Mar 21, 2025
    Cite
    Market Research Forecast (2025). Open Source Data Annotation Tool Report [Dataset]. https://www.marketresearchforecast.com/reports/open-source-data-annotation-tool-46961
    Available download formats: ppt, pdf, doc
    Dataset updated
    Mar 21, 2025
    Dataset authored and provided by
    Market Research Forecast
    License

    https://www.marketresearchforecast.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The open-source data annotation tool market is experiencing robust growth, driven by the increasing demand for high-quality training data in artificial intelligence (AI) and machine learning (ML) applications. The market's expansion is fueled by several key factors: the rising adoption of AI across various industries (including automotive, healthcare, and finance), the need for efficient and cost-effective data annotation solutions, and a growing preference for flexible, customizable tools offered by open-source platforms. While cloud-based solutions currently dominate the market due to scalability and accessibility, on-premise deployments remain significant for organizations with stringent data security requirements. The competitive landscape is dynamic, with numerous established players and emerging startups vying for market share. The market is segmented geographically, with North America and Europe currently holding the largest shares due to early adoption of AI technologies and robust research & development activities. However, the Asia-Pacific region is projected to witness significant growth in the coming years, driven by increasing investments in AI infrastructure and talent development. Challenges remain, such as the need for skilled annotators and the ongoing evolution of annotation techniques to handle increasingly complex data types. The forecast period (2025-2033) suggests continued expansion, with a projected Compound Annual Growth Rate (CAGR) – let's conservatively estimate this at 15% based on typical growth in related software sectors. This growth will be influenced by advancements in automation and semi-automated annotation tools, as well as the emergence of novel annotation paradigms. The market is expected to see further consolidation, with larger players potentially acquiring smaller, specialized companies. The increasing focus on data privacy and security will necessitate the development of more robust and compliant open-source annotation tools. Specific application segments like healthcare, with its stringent regulatory landscape, and the automotive industry, with its reliance on autonomous driving technology, will continue to be major drivers of market growth. The increasing availability of open-source datasets and pre-trained models will indirectly contribute to the market’s expansion by lowering the barrier to entry for AI development.

  3. Image Annotation Tool Market Report | Global Forecast From 2025 To 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Jan 7, 2025
    Cite
    Dataintelo (2025). Image Annotation Tool Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/image-annotation-tool-market
    Available download formats: pdf, pptx, csv
    Dataset updated
    Jan 7, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Image Annotation Tool Market Outlook



    The global image annotation tool market size is projected to grow from approximately $700 million in 2023 to an estimated $2.5 billion by 2032, exhibiting a remarkable compound annual growth rate (CAGR) of 15.2% over the forecast period. The surging demand for machine learning and artificial intelligence applications is driving this robust market expansion. Image annotation tools are crucial for training AI models to recognize and interpret images, a necessity across diverse industries.



    One of the key growth factors fueling the image annotation tool market is the rapid adoption of AI and machine learning technologies across various sectors. Organizations in healthcare, automotive, retail, and many other industries are increasingly leveraging AI to enhance operational efficiency, improve customer experiences, and drive innovation. Accurate image annotation is essential for developing sophisticated AI models, thereby boosting the demand for these tools. Additionally, the proliferation of big data analytics and the growing necessity to manage large volumes of unstructured data have amplified the need for efficient image annotation solutions.



    Another significant driver is the increasing use of autonomous systems and applications. In the automotive industry, for instance, the development of autonomous vehicles relies heavily on annotated images to train algorithms for object detection, lane discipline, and navigation. Similarly, in the healthcare sector, annotated medical images are indispensable for developing diagnostic tools and treatment planning systems powered by AI. This widespread application of image annotation tools in the development of autonomous systems is a critical factor propelling market growth.



    The rise of e-commerce and the digital retail landscape has also spurred demand for image annotation tools. Retailers are using these tools to optimize visual search features, personalize shopping experiences, and enhance inventory management through automated recognition of products and categories. Furthermore, advancements in computer vision technology have expanded the capabilities of image annotation tools, making them more accurate and efficient, which in turn encourages their adoption across various industries.



    Data Annotation Software plays a pivotal role in the image annotation tool market by providing the necessary infrastructure for labeling and categorizing images efficiently. These software solutions are designed to handle various annotation tasks, from simple bounding boxes to complex semantic segmentation, enabling organizations to generate high-quality training datasets for AI models. The continuous advancements in data annotation software, including the integration of machine learning algorithms for automated labeling, have significantly enhanced the accuracy and speed of the annotation process. As the demand for AI-driven applications grows, the reliance on robust data annotation software becomes increasingly critical, supporting the development of sophisticated models across industries.



    Regionally, North America holds the largest share of the image annotation tool market, driven by significant investments in AI and machine learning technologies and the presence of leading technology companies. Europe follows, with strong growth supported by government initiatives promoting AI research and development. The Asia Pacific region presents substantial growth opportunities due to the rapid digital transformation in emerging economies and increasing investments in technology infrastructure. Latin America and the Middle East & Africa are also expected to witness steady growth, albeit at a slower pace, due to the gradual adoption of advanced technologies.



    Component Analysis



    The image annotation tool market by component is segmented into software and services. The software segment dominates the market, encompassing a variety of tools designed for different annotation tasks, from simple image labeling to complex polygonal, semantic, or instance segmentation. The continuous evolution of software platforms, integrating advanced features such as automated annotation and machine learning algorithms, has significantly enhanced the accuracy and efficiency of image annotations. Furthermore, the availability of open-source annotation tools has lowered the entry barrier, allowing more organizations to adopt these technologies.



    Services associated with image ann

  4. Data from: X-ray CT data with semantic annotations for the paper "A workflow...

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    Updated Jun 5, 2025
    Cite
    Agricultural Research Service (2025). X-ray CT data with semantic annotations for the paper "A workflow for segmenting soil and plant X-ray CT images with deep learning in Google’s Colaboratory" [Dataset]. https://catalog.data.gov/dataset/x-ray-ct-data-with-semantic-annotations-for-the-paper-a-workflow-for-segmenting-soil-and-p-d195a
    Dataset updated
    Jun 5, 2025
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Description

    Leaves from genetically unique Juglans regia plants were scanned using X-ray micro-computed tomography (microCT) on the X-ray μCT beamline (8.3.2) at the Advanced Light Source (ALS) at Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA, USA. Soil samples were collected in Fall of 2017 from the riparian oak forest located at the Russell Ranch Sustainable Agricultural Institute at the University of California, Davis. The soil was sieved through a 2 mm mesh and air dried before imaging. A single soil aggregate was scanned at 23 keV using the 10x objective lens with a pixel resolution of 650 nanometers on beamline 8.3.2 at the ALS. Additionally, a drought-stressed almond flower bud (Prunus dulcis) from a plant housed at the University of California, Davis, was scanned using a 4x lens with a pixel resolution of 1.72 µm on beamline 8.3.2 at the ALS. Raw tomographic image data was reconstructed using TomoPy. Reconstructions were converted to 8-bit tif or png format using ImageJ or the PIL package in Python before further processing. Images were annotated using Intel's Computer Vision Annotation Tool (CVAT) and ImageJ; both CVAT and ImageJ are free to use and open source.

    Leaf images were annotated following Théroux-Rancourt et al. (2020). Specifically, hand labeling was done directly in ImageJ by drawing around each tissue, with 5 images annotated per leaf. Care was taken to cover a range of anatomical variation to help improve the generalizability of the models to other leaves. All slices were labeled by Dr. Mina Momayyezi and Fiona Duong. To annotate the flower bud and soil aggregate, images were imported into CVAT. The exterior border of the bud (i.e., bud scales) and flower were annotated in CVAT and exported as masks. Similarly, the exterior of the soil aggregate and particulate organic matter identified by eye were annotated in CVAT and exported as masks. To annotate air spaces in both the bud and soil aggregate, images were imported into ImageJ. A Gaussian blur was applied to the image to decrease noise, and then the air space was segmented using thresholding. After applying the threshold, the selected air space region was converted to a binary image, with white representing the air space and black representing everything else. This binary image was overlaid upon the original image, and the air space within the flower bud and aggregate was selected using the "free hand" tool. Air space outside of the region of interest for both image sets was eliminated. The quality of the air space annotation was then visually inspected for accuracy against the underlying original image; incomplete annotations were corrected using the brush or pencil tool to paint missing air space white and incorrectly identified air space black. Once the annotation was satisfactorily corrected, the binary image of the air space was saved. Finally, the annotations of the bud and flower or aggregate and organic matter were opened in ImageJ and the associated air space mask was overlaid on top of them, forming a three-layer mask suitable for training the fully convolutional network. All labeling of the soil aggregate images was done by Dr. Devin Rippner. These images and annotations are for training deep learning models to identify different constituents in leaves, almond buds, and soil aggregates.

    Limitations: For the walnut leaves, some tissues (stomata, etc.) are not labeled and only represent a small portion of a full leaf. Similarly, both the almond bud and the aggregate represent just one single sample of each. The bud tissues are only divided into bud scales, flower, and air space; many other tissues remain unlabeled. For the soil aggregate, annotated labels are done by eye with no actual chemical information, so particulate organic matter identification may be incorrect.

    Resources in this dataset:

    Resource Title: Annotated X-ray CT images and masks of a Forest Soil Aggregate. File Name: forest_soil_images_masks_for_testing_training.zip. Resource Description: This aggregate was collected from the riparian oak forest at the Russell Ranch Sustainable Agricultural Facility. The aggregate was scanned using X-ray micro-computed tomography (microCT) on the X-ray μCT beamline (8.3.2) at the Advanced Light Source (ALS) at Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA, USA, using the 10x objective lens with a pixel resolution of 650 nanometers. For masks, the background has a value of 0,0,0; pore spaces have a value of 250,250,250; mineral solids have a value of 128,0,0; and particulate organic matter has a value of 0,128,0. These files were used for training a model to segment the forest soil aggregate and for testing the accuracy, precision, recall, and F1 score of the model.

    Resource Title: Annotated X-ray CT images and masks of an Almond Bud (P. dulcis). File Name: Almond_bud_tube_D_P6_training_testing_images_and_masks.zip. Resource Description: A drought-stressed almond flower bud (Prunus dulcis) from a plant housed at the University of California, Davis, was scanned by X-ray micro-computed tomography (microCT) on the X-ray μCT beamline (8.3.2) at the Advanced Light Source (ALS) at Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA, USA, using the 4x lens with a pixel resolution of 1.72 µm. For masks, the background has a value of 0,0,0; air spaces have a value of 255,255,255; bud scales have a value of 128,0,0; and flower tissues have a value of 0,128,0. These files were used for training a model to segment the almond bud and for testing the accuracy, precision, recall, and F1 score of the model. Resource Software Recommended: Fiji (ImageJ), url: https://imagej.net/software/fiji/downloads

    Resource Title: Annotated X-ray CT images and masks of Walnut Leaves (J. regia). File Name: 6_leaf_training_testing_images_and_masks_for_paper.zip. Resource Description: Stems were collected from genetically unique J. regia accessions at the 117 USDA-ARS-NCGR in Wolfskill Experimental Orchard, Winters, California USA to use as scion, and were grafted by Sierra Gold Nursery onto a commonly used commercial rootstock, RX1 (J. microcarpa × J. regia). We used a common rootstock to eliminate any own-root effects and to simulate conditions for a commercial walnut orchard setting, where rootstocks are commonly used. The grafted saplings were repotted and transferred to the Armstrong lathe house facility at the University of California, Davis in June 2019, and kept under natural light and temperature. Leaves from each accession and treatment were scanned using X-ray micro-computed tomography (microCT) on the X-ray μCT beamline (8.3.2) at the Advanced Light Source (ALS) at Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA, USA, using the 10x objective lens with a pixel resolution of 650 nanometers. For masks, the background has a value of 170,170,170; epidermis 85,85,85; mesophyll 0,0,0; bundle sheath extension 152,152,152; vein 220,220,220; air 255,255,255. Resource Software Recommended: Fiji (ImageJ), url: https://imagej.net/software/fiji/downloads
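    Since each constituent class in the mask files is encoded as a distinct RGB value, a common preprocessing step is to convert the RGB masks into integer label maps before training a segmentation model. The Python sketch below is a minimal illustration using the soil-aggregate color codes listed above; the file path is hypothetical, and the bud and leaf masks would follow the same pattern with their respective color values.

    # Illustrative conversion of an RGB-coded mask into an integer label map.
    import numpy as np
    from PIL import Image

    # RGB value -> class index, per the soil-aggregate resource description above.
    SOIL_CLASSES = {
        (0, 0, 0): 0,        # background
        (250, 250, 250): 1,  # pore space
        (128, 0, 0): 2,      # mineral solids
        (0, 128, 0): 3,      # particulate organic matter
    }

    def rgb_mask_to_labels(path, color_map):
        """Return an (H, W) integer label map from an RGB mask image."""
        rgb = np.asarray(Image.open(path).convert("RGB"))
        labels = np.full(rgb.shape[:2], -1, dtype=np.int64)  # -1 marks unmapped colors
        for color, index in color_map.items():
            labels[np.all(rgb == color, axis=-1)] = index
        return labels

    # Example call (hypothetical file name inside the zip described above):
    # labels = rgb_mask_to_labels("forest_soil/masks/slice_0001.png", SOIL_CLASSES)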

  5. Data Annotation Tools Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Jun 30, 2025
    Cite
    Growth Market Reports (2025). Data Annotation Tools Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/data-annotation-tools-market-global-geographical-industry-analysis
    Available download formats: csv, pdf, pptx
    Dataset updated
    Jun 30, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Data Annotation Tools Market Outlook



    According to our latest research, the global Data Annotation Tools market size reached USD 2.1 billion in 2024. The market is set to expand at a robust CAGR of 26.7% from 2025 to 2033, projecting a remarkable value of USD 18.1 billion by 2033. The primary growth driver for this market is the escalating adoption of artificial intelligence (AI) and machine learning (ML) across various industries, which necessitates high-quality labeled data for model training and validation.




    One of the most significant growth factors propelling the data annotation tools market is the exponential rise in AI-powered applications across sectors such as healthcare, automotive, retail, and BFSI. As organizations increasingly integrate AI and ML into their core operations, the demand for accurately annotated data has surged. Data annotation tools play a crucial role in transforming raw, unstructured data into structured, labeled datasets that can be efficiently used to train sophisticated algorithms. The proliferation of deep learning and natural language processing technologies further amplifies the need for comprehensive data labeling solutions. This trend is particularly evident in industries like healthcare, where annotated medical images are vital for diagnostic algorithms, and in automotive, where labeled sensor data supports the evolution of autonomous vehicles.




    Another prominent driver is the shift toward automation and digital transformation, which has accelerated the deployment of data annotation tools. Enterprises are increasingly adopting automated and semi-automated annotation platforms to enhance productivity, reduce manual errors, and streamline the data preparation process. The emergence of cloud-based annotation solutions has also contributed to market growth by enabling remote collaboration, scalability, and integration with advanced AI development pipelines. Furthermore, the growing complexity and variety of data types, including text, audio, image, and video, necessitate versatile annotation tools capable of handling multimodal datasets, thus broadening the market's scope and applications.




    The market is also benefiting from a surge in government and private investments aimed at fostering AI innovation and digital infrastructure. Several governments across North America, Europe, and Asia Pacific have launched initiatives and funding programs to support AI research and development, including the creation of high-quality, annotated datasets. These efforts are complemented by strategic partnerships between technology vendors, research institutions, and enterprises, which are collectively advancing the capabilities of data annotation tools. As regulatory standards for data privacy and security become more stringent, there is an increasing emphasis on secure, compliant annotation solutions, further driving innovation and market demand.




    From a regional perspective, North America currently dominates the data annotation tools market, driven by the presence of major technology companies, well-established AI research ecosystems, and significant investments in digital transformation. However, Asia Pacific is emerging as the fastest-growing region, fueled by rapid industrialization, expanding IT infrastructure, and a burgeoning startup ecosystem focused on AI and data science. Europe also holds a substantial market share, supported by robust regulatory frameworks and active participation in AI research. Latin America and the Middle East & Africa are gradually catching up, with increasing adoption in sectors such as retail, automotive, and government. The global landscape is characterized by dynamic regional trends, with each market contributing uniquely to the overall growth trajectory.





    Component Analysis



    The data annotation tools market is segmented by component into software and services, each playing a pivotal role in the market's overall ecosystem. Software solutions form the backbone of the market, providing the technical infrastructure for auto

  6. Global Image Annotation Tool Market Research Report: By Application (Object...

    • wiseguyreports.com
    Updated Jul 23, 2024
    Cite
    Wiseguy Research Consultants Pvt Ltd (2024). Global Image Annotation Tool Market Research Report: By Application (Object Detection and Recognition, Image Classification, Image Segmentation, Image Generation, Image Editing and Enhancement), By End User (Automotive, Healthcare, Retail, Media and Entertainment, Education, Manufacturing), By Deployment Mode (Cloud-Based, On-Premise, Hybrid), By Access Type (Licensed Software, Software as a Service (SaaS), Open Source), By Image Type (2D Images, 3D Images, Medical Images) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2032. [Dataset]. https://www.wiseguyreports.com/cn/reports/image-annotation-tool-market
    Dataset updated
    Jul 23, 2024
    Dataset authored and provided by
    Wiseguy Research Consultants Pvt Ltd
    License

    https://www.wiseguyreports.com/pages/privacy-policy

    Time period covered
    Jan 7, 2024
    Area covered
    Global
    Description
    BASE YEAR: 2024
    HISTORICAL DATA: 2019 - 2024
    REPORT COVERAGE: Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
    MARKET SIZE 2023: 4.1 (USD Billion)
    MARKET SIZE 2024: 4.6 (USD Billion)
    MARKET SIZE 2032: 11.45 (USD Billion)
    SEGMENTS COVERED: Application, End User, Deployment Mode, Access Type, Image Type, Regional
    COUNTRIES COVERED: North America, Europe, APAC, South America, MEA
    KEY MARKET DYNAMICS: Growing AI, ML, and DL adoption; increasing demand for image analysis and object recognition; cloud-based deployment and subscription-based pricing models; emergence of semi-automated and automated annotation tools; competitive landscape with established vendors and new entrants
    MARKET FORECAST UNITS: USD Billion
    KEY COMPANIES PROFILED: Tech Mahindra, Capgemini, Whizlabs, Cognizant, Tata Consultancy Services, Larsen & Toubro Infotech, HCL Technologies, IBM, Accenture, Infosys BPM, Genpact, Wipro, Infosys, DXC Technology
    MARKET FORECAST PERIOD: 2024 - 2032
    KEY MARKET OPPORTUNITIES: 1. AI and ML advancements; 2. growing big data analytics; 3. cloud-based image annotation tools; 4. image annotation for medical imaging; 5. geospatial image annotation
    COMPOUND ANNUAL GROWTH RATE (CAGR): 12.08% (2024 - 2032)
  7. Data Annotation Tools Market - By Annotation Approach (Automated Annotation...

    • zionmarketresearch.com
    pdf
    Updated Jun 26, 2025
    Cite
    Zion Market Research (2025). Data Annotation Tools Market - By Annotation Approach (Automated Annotation and Manual Annotation), By Data Type (Text, Audio, and Image/Video), By Application (Healthcare, Automotive, IT & Telecom, BFSI, Agriculture, and Retail), and By Region: Industry Perspective, Comprehensive Analysis, and Forecast, 2024 - 2032- [Dataset]. https://www.zionmarketresearch.com/report/data-annotation-tools-market
    Available download formats: pdf
    Dataset updated
    Jun 26, 2025
    Dataset authored and provided by
    Zion Market Research
    License

    https://www.zionmarketresearch.com/privacy-policy

    Time period covered
    2022 - 2030
    Area covered
    Global
    Description

    The global Data Annotation Tools Market was valued at US$ 102.38 billion in 2023 and is projected to reach US$ 908.57 billion by 2032, growing at a CAGR of 24.4% from 2024 to 2032.

  8. ImageCLEF 2012 Image annotation and retrieval dataset (MIRFLICKR)

    • zenodo.org
    • explore.openaire.eu
    txt, zip
    Updated May 22, 2020
    Cite
    Bart Thomee; Adrian Popescu (2020). ImageCLEF 2012 Image annotation and retrieval dataset (MIRFLICKR) [Dataset]. http://doi.org/10.5281/zenodo.1246796
    Available download formats: zip, txt
    Dataset updated
    May 22, 2020
    Dataset provided by
    Zenodo
    Authors
    Bart Thomee; Adrian Popescu
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    DESCRIPTION
    For this task, we use a subset of the MIRFLICKR (http://mirflickr.liacs.nl) collection. The entire collection contains 1 million images from the social photo sharing website Flickr and was formed by downloading up to a thousand photos per day that were deemed to be the most interesting according to Flickr. All photos in this collection were released by their users under a Creative Commons license, allowing them to be freely used for research purposes. Of the entire collection, 25 thousand images were manually annotated with a limited number of concepts and many of these annotations have been further refined and expanded over the lifetime of the ImageCLEF photo annotation task. This year we used crowd sourcing to annotate all of these 25 thousand images with the concepts.

    On this page we provide you with more information about the textual features, visual features and concept features we supply with each image in the collection we use for this year's task.


    TEXTUAL FEATURES
    All images are accompanied by the following textual features:

    - Flickr user tags
    These are the tags that the users assigned to the photos they uploaded to Flickr. The 'raw' tags are the original tags, while the 'clean' tags have been collapsed to lowercase and condensed to remove spaces.

    - EXIF metadata
    If available, the EXIF metadata contains information about the camera that took the photo and the parameters used. The 'raw' exif is the original camera data, while the 'clean' exif reduces the verbosity.

    - User information and Creative Commons license information
    This contains information about the user that took the photo and the license associated with it.


    VISUAL FEATURES
    Over the previous years of the photo annotation task we noticed that often the same types of visual features are used by the participants, in particular features based on interest points and bag-of-words are popular. To assist you we have extracted several features for you that you may want to use, so you can focus on the concept detection instead. We additionally give you some pointers to easy to use toolkits that will help you extract other features or the same features but with different default settings.

    - SIFT, C-SIFT, RGB-SIFT, OPPONENT-SIFT
    We used the ISIS Color Descriptors (http://www.colordescriptors.com) toolkit to extract these descriptors. This package provides you with many different types of features based on interest points, mostly using SIFT. It furthermore assists you with building codebooks for bag-of-words. The toolkit is available for Windows, Linux and Mac OS X.

    - SURF
    We used the OpenSURF (http://www.chrisevansdev.com/computer-vision-opensurf.html) toolkit to extract this descriptor. The open source code is available in C++, C#, Java and many more languages.

    - TOP-SURF
    We used the TOP-SURF (http://press.liacs.nl/researchdownloads/topsurf) toolkit to extract this descriptor, which represents images with SURF-based bag-of-words. The website provides codebooks of several different sizes that were created using a combination of images from the MIR-FLICKR collection and from the internet. The toolkit also offers the ability to create custom codebooks from your own image collection. The code is open source, written in C++ and available for Windows, Linux and Mac OS X.

    - GIST
    We used the LabelMe (http://labelme.csail.mit.edu) toolkit to extract this descriptor. The MATLAB-based library offers a comprehensive set of tools for annotating images.

    For the interest point-based features above we used a Fast Hessian-based technique to detect the interest points in each image. This detector is built into the OpenSURF library. In comparison with the Hessian-Laplace technique built into the ColorDescriptors toolkit it detects fewer points, resulting in a considerably reduced memory footprint. We therefore also provide you with the interest point locations in each image that the Fast Hessian-based technique detected, so when you would like to recalculate some features you can use them as a starting point for the extraction. The ColorDescriptors toolkit for instance accepts these locations as a separate parameter. Please go to http://www.imageclef.org/2012/photo-flickr/descriptors for more information on the file format of the visual features and how you can extract them yourself if you want to change the default settings.


    CONCEPT FEATURES
    We have solicited the help of workers on the Amazon Mechanical Turk platform to perform the concept annotation for us. To ensure a high standard of annotation we used the CrowdFlower platform that acts as a quality control layer by removing the judgments of workers that fail to annotate properly. We reused several concepts of last year's task and for most of these we annotated the remaining photos of the MIRFLICKR-25K collection that had not yet been used before in the previous task; for some concepts we reannotated all 25,000 images to boost their quality. For the new concepts we naturally had to annotate all of the images.

    - Concepts
    For each concept we indicate in which images it is present. The 'raw' concepts contain the judgments of all annotators for each image, where a '1' means an annotator indicated the concept was present whereas a '0' means the concept was not present, while the 'clean' concepts only contain the images for which the majority of annotators indicated the concept was present. Some images in the raw data for which we reused last year's annotations only have one judgment for a concept, whereas the other images have between three and five judgments; the single judgment does not mean only one annotator looked at it, as it is the result of a majority vote amongst last year's annotators.

    - Annotations
    For each image we indicate which concepts are present, so this is the reverse version of the data above. The 'raw' annotations contain the average agreement of the annotators on the presence of each concept, while the 'clean' annotations only include those for which there was a majority agreement amongst the annotators.

    You will notice that the annotations are not perfect. Especially when the concepts are more subjective or abstract, the annotators tend to disagree more with each other. The raw versions of the concept annotations should help you get an understanding of the exact judgments given by the annotators.
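    As a concrete illustration of the raw-to-clean relationship described above, the sketch below derives majority-vote ('clean') labels from per-annotator ('raw') 0/1 judgments. The in-memory layout and concept names are assumptions for illustration only and do not mirror the actual ImageCLEF file format.

    # Illustrative majority-vote aggregation of raw concept judgments.
    from collections import defaultdict

    # raw_judgments[concept][image_id] -> list of 0/1 judgments from individual annotators
    raw_judgments = {
        "sunset": {"im_001": [1, 1, 0, 1], "im_002": [0, 0, 1]},
        "indoor": {"im_001": [0, 0, 0], "im_002": [1]},  # a single reused judgment
    }

    def clean_labels(raw):
        """Keep an image for a concept only when a strict majority of judgments are positive."""
        clean = defaultdict(set)
        for concept, per_image in raw.items():
            for image_id, votes in per_image.items():
                if sum(votes) * 2 > len(votes):
                    clean[concept].add(image_id)
        return dict(clean)

    print(clean_labels(raw_judgments))  # {'sunset': {'im_001'}, 'indoor': {'im_002'}}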

  9. XIMAGENET-12: An Explainable AI Benchmark CVPR2024

    • kaggle.com
    Updated Sep 13, 2023
    Cite
    Anomly (2023). XIMAGENET-12: An Explainable AI Benchmark CVPR2024 [Dataset]. http://doi.org/10.34740/kaggle/ds/3123294
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 13, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Anomly
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Introduction:

    [Flowchart image: https://qiangli.de/imgs/flowchart2%20(1).png]

    🌟 XimageNet-12 🌟

    An Explainable Visual Benchmark Dataset for Robustness Evaluation. A Dataset for Image Background Exploration!

    Blur Background, Segmented Background, AI-generated Background, Bias of Tools During Annotation, Color in Background, Random Background with Real Environment

    +⭐ Follow Authors for project updates.

    Website: XimageNet-12

    Here we try to understand how the image background affects computer vision ML models on tasks such as detection and classification. Building on the baseline work of Li et al. (ICLR 2022), Explainable AI: Object Recognition With Help From Background, we are now enlarging the dataset and analyzing the following topics: Blur Background / Segmented Background / AI-generated Background / Bias of Tools During Annotation / Color in Background / Dependent Factor in Background / Latent-Space Distance of Foreground / Random Background with Real Environment. Ultimately, we also define the mathematical equation of the Robustness Score. If you are interested in how we built it or would like to join this research project, please feel free to collaborate with us!

    In this paper, we propose an explainable visual dataset, XIMAGENET-12, to evaluate the robustness of visual models. XIMAGENET-12 consists of over 200K images with 15,410 manual semantic annotations. Specifically, we deliberately selected 12 categories from ImageNet, representing objects commonly encountered in practical life. To simulate real-world situations, we incorporated six diverse scenarios, such as overexposure, blurring, and color changes, etc. We further develop a quantitative criterion for robustness assessment, allowing for a nuanced understanding of how visual models perform under varying conditions, notably in relation to the background.

    Progress:

    • Blur Background -> Done! You can find the generated images in the corresponding folder.
    • Segmented Background -> Done! You can download the images and their corresponding transparent mask images.
    • Color in Background -> Done! You can now download images with the background color modified and play with the differently colored versions.
    • Random Background with Real Environment -> Done! You will also find images generated with a photographer's real image as the background, with the original background of the target object removed but a similar style retained.
    • Bias of tools during annotation -> Done! For this one you won't get new images, because it concerns the mathematical and statistical analysis of what happens when different tools and annotators are applied.
    • AI-generated Background -> Done (12/12)! A sample folder of images has been uploaded; take a look at how real it is, and guess which model we used to generate the high-resolution backgrounds :)

    What tool we used to generate those images?

    We employed a combination of tools and methodologies to generate the images in this dataset, ensuring both efficiency and quality in the annotation and synthesis processes.

    • IoG Net: Initially, we utilized the IoG Net, which played a foundational role in our image generation pipeline.
    • Polygon Faster Labeling Tool: To facilitate the annotation process, we developed a custom Polygon Faster Labeling Tool, streamlining the labeling of objects within the images.
    • AnyLabeling Open-source Project: We also experimented with the AnyLabeling open-source project, exploring its potential for our annotation needs.
    • V7 Lab Tool: Eventually, we found that the V7 Lab Tool provided the most efficient labeling speed and delivered high-quality annotations. As a result, we standardized the annotation process using this tool.
    • Data Augmentation: For the synthesis of synthetic images, we relied on a combination of deep learning frameworks, including scikit-learn and OpenCV. These tools allowed us to augment and manipulate images effectively to create a diverse range of backgrounds and variations.
    • GenAI: Our dataset includes images generated using the Stable Diffusion XL model, along with versions 1.5 and 2.0 of the Stable Diffusion model. These generative models played a pivotal role in crafting realistic and varied backgrounds.

    For a detailed breakdown of our prompt engineering and hyperparameters, we invite you to consult our upcoming paper. This publication will provide comprehensive insights into our methodologies, enabling a deeper understanding of the image generation process.

    How to use our dataset?

    this dataset has been/could be downloaded via Kaggl...

  10. Annotated fish imagery data for individual and species recognition with deep...

    • catalog.data.gov
    • data.usgs.gov
    Updated Jul 6, 2024
    Cite
    U.S. Geological Survey (2024). Annotated fish imagery data for individual and species recognition with deep learning [Dataset]. https://catalog.data.gov/dataset/annotated-fish-imagery-data-for-individual-and-species-recognition-with-deep-learning
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    We provide annotated fish imagery data for use in deep learning models (e.g., convolutional neural networks) for individual and species recognition. For individual recognition models, the dataset consists of annotated .json files of individual brook trout imagery collected at the Eastern Ecological Science Center's Experimental Stream Laboratory. For species recognition models, the dataset consists of annotated .json files for 7 freshwater fish species: lake trout, largemouth bass, smallmouth bass, brook trout, rainbow trout, walleye, and northern pike. Species imagery was compiled from Anglers Atlas and modified to remove human faces for privacy protection. We used open-source VGG image annotation software developed by Oxford University: https://www.robots.ox.ac.uk/~vgg/software/via/via-1.0.6.html.
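    A minimal sketch of how such VIA-style .json annotation files are typically read is shown below. It assumes the usual VIA 1.x export layout (top-level entries carrying filename, regions, shape_attributes, and region_attributes); the file name is hypothetical, and the field names should be verified against the actual files in this dataset.

    # Illustrative reader for polygon regions in a VIA 1.x-style JSON export.
    import json

    def load_via_polygons(json_path):
        """Yield (filename, region_attributes, xs, ys) for each polygon region."""
        with open(json_path) as f:
            via = json.load(f)
        for entry in via.values():
            regions = entry.get("regions", {})
            # VIA 1.x stores regions as a dict; VIA 2.x uses a list.
            region_iter = regions.values() if isinstance(regions, dict) else regions
            for region in region_iter:
                shape = region["shape_attributes"]
                if shape.get("name") != "polygon":
                    continue
                yield (entry["filename"], region.get("region_attributes", {}),
                       shape["all_points_x"], shape["all_points_y"])

    # Example call (hypothetical file name):
    # for fname, attrs, xs, ys in load_via_polygons("brook_trout_annotations.json"):
    #     print(fname, attrs, list(zip(xs, ys))[:3])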

  11. Sidewalk_road_yolo_dataset.7z

    • figshare.com
    7z
    Updated Mar 26, 2024
    Cite
    Manya Gaur (2024). Sidewalk_road_yolo_dataset.7z [Dataset]. http://doi.org/10.6084/m9.figshare.25481659.v1
    Available download formats: 7z
    Dataset updated
    Mar 26, 2024
    Dataset provided by
    figshare
    Authors
    Manya Gaur
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset (image) collection: A dataset of sidewalk and road images was downloaded from Roboflow, an end-to-end computer vision platform that simplifies the process of building computer vision models. The source is a semantic segmentation dataset, but only its images were used to create a new dataset, because the project requires sidewalk and road areas to be captured as small bounding boxes rather than one overall bounding box.

    Dataset annotation: For the reasons stated above, the images were annotated manually, one by one, using labelImg, an open-source tool designed for annotating objects in images. It streamlines the task of drawing bounding boxes around objects and associating them with class labels. Users can draw bounding boxes, store the annotations in formats such as PASCAL VOC, YOLO, or CreateML, and use them for a range of computer vision tasks. This Python-based tool works across platforms and is widely used by developers and researchers for annotating datasets and training models.
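    For reference, the sketch below shows how labelImg-generated YOLO label files are commonly parsed and converted back to pixel-space bounding boxes. The class list and file name are assumptions for illustration; the real class ordering should be taken from the classes file shipped with the annotations.

    # Illustrative parser for YOLO-format label files ("class cx cy w h", normalized to [0, 1]).
    from pathlib import Path

    CLASS_NAMES = ["sidewalk", "road"]  # assumed ordering; check the dataset's classes file

    def yolo_to_pixel_boxes(label_path, img_w, img_h):
        """Return a list of (class_name, x_min, y_min, x_max, y_max) in pixel units."""
        boxes = []
        for line in Path(label_path).read_text().splitlines():
            if not line.strip():
                continue
            cls, cx, cy, w, h = line.split()
            cx, cy, w, h = float(cx) * img_w, float(cy) * img_h, float(w) * img_w, float(h) * img_h
            boxes.append((CLASS_NAMES[int(cls)],
                          cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
        return boxes

    # Example call (hypothetical file name and image size):
    # print(yolo_to_pixel_boxes("labels/frame_0001.txt", img_w=1280, img_h=720))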

  12. Annotated Apple Leaf Disease Dataset - Mask RCNN

    • kaggle.com
    Updated Apr 2, 2023
    Cite
    alakananda mitra (2023). Annotated Apple Leaf Disease Dataset - Mask RCNN [Dataset]. http://doi.org/10.34740/kaggle/dsv/5295854
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 2, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    alakananda mitra
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    About: This dataset contains annotated images of apple leaves with different diseases from the PlantVillage dataset, which can be used for image segmentation-based research. The Mask RCNN annotation style is used. This dataset will aid in accurately locating the damage on the leaves, which is necessary for predicting the severity of the disease.

    Black Rot, Apple Scab, and Cedar Apple Rust are the three diseases that have been annotated. This annotated dataset can be used to find diseases on apple leaves by using image segmentation.

    The data was annotated using MakeSense.AI, an open-source image annotation application. Annotating the damage on the leaves was done with the "Rect" tool. Annotation files are saved in .xml format with the two diagonally positioned corners of the bounding boxes. Various colors are used for the various classes when labeling.

    A total of 850 images have been annotated. 680 images are kept in the train folder, and 170 images are in the valid folder. The folder structure is as follows:

    annotated_apple_leaf_disease
    |-- train
    |   |-- annots
    |   |-- images
    |-- valid
    |   |-- annots
    |   |-- images
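    The sketch below illustrates how the .xml files in the annots folders could be read, assuming a Pascal VOC-style layout (an object element with a name and a bndbox holding the two diagonal corner coordinates), which is what MakeSense.AI commonly exports. Tag names and the example path are assumptions and should be checked against an actual annotation file.

    # Illustrative reader for VOC-style bounding-box XML annotations.
    import xml.etree.ElementTree as ET

    def read_boxes(xml_path):
        """Return a list of (label, xmin, ymin, xmax, ymax) from one annotation file."""
        root = ET.parse(xml_path).getroot()
        boxes = []
        for obj in root.findall("object"):
            label = obj.findtext("name")
            bnd = obj.find("bndbox")
            boxes.append((label,
                          int(float(bnd.findtext("xmin"))), int(float(bnd.findtext("ymin"))),
                          int(float(bnd.findtext("xmax"))), int(float(bnd.findtext("ymax")))))
        return boxes

    # Example call (hypothetical file name following the folder structure above):
    # print(read_boxes("annotated_apple_leaf_disease/train/annots/apple_scab_0001.xml"))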

    If you are using this dataset or found any of this information contributing towards your research, please cite:

    1. @InProceedings{MitraMK_MaskRCNN_2022, author="Mitra, Alakananda and Mohanty, Saraju P. and Kougianos, Elias", title={{A Smart Agriculture Framework to Automatically Track the Spread of Plant Diseases Using Mask Region-Based Convolutional Neural Network}}, booktitle="Proceedings of the 5th IFIP International Internet of Things Conference (IFIP-IoT)", year="2022", pages="68--85", doi={10.1007/978-3-031-18872-5_5} }

    2. @InProceedings{MitraMK_aGROdet_2022, author="Mitra, Alakananda and Mohanty, Saraju P. and Kougianos, Elias", title={{aGROdet: A Novel Framework for Plant Disease Detection and Leaf Damage Estimation}}, booktitle="Proceedings of the 5th IFIP International Internet of Things Conference (IFIP-IoT)", year="2022", pages="3--22", doi={10.1007/978-3-031-18872-5_1} }

    3. Original PlantVillage Dataset: @article{DBLP:journals/corr/HughesS15, Author = {David P. Hughes and Marcel Salath{'{e} } }, Title = {An open access repository of images on plant health to enable the development of mobile disease diagnostics through machine learning and crowdsourcing}, journal = {CoRR}, volume = {abs/1511.08060}, year = {2015},
      url = {http://arxiv.org/abs/1511.08060}, archivePrefix = {arXiv}, eprint = {1511.08060}, timestamp = {Mon, 13 Aug 2018 16:48:21 +0200}, biburl = {https://dblp.org/rec/bib/journals/corr/HughesS15}, bibsource = {dblp computer science bibliography, https://dblp.org} }

  13. Expert and AI-generated annotations of the tissue types for the...

    • zenodo.org
    • data.niaid.nih.gov
    bin
    Updated May 20, 2025
    Cite
    Christopher Bridge; G. Thomas Brown; Hyun Jung; Curtis Lisle; David Clunie; David Milewski; Yanling Liu; Jack Collins; Corinne M. Linardic; Douglas S. Hawkins; Rajkumar Venkatramani; Andrey Fedorov; Javed Khan (2025). Expert and AI-generated annotations of the tissue types for the RMS-Mutation-Prediction microscopy images [Dataset]. http://doi.org/10.5281/zenodo.14941043
    Available download formats: bin
    Dataset updated
    May 20, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Christopher Bridge; G. Thomas Brown; Hyun Jung; Curtis Lisle; David Clunie; David Milewski; Yanling Liu; Jack Collins; Corinne M. Linardic; Douglas S. Hawkins; Rajkumar Venkatramani; Andrey Fedorov; Javed Khan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset corresponds to a collection of images and/or image-derived data available from the National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using the IDC Portal here: https://portal.imaging.datacommons.cancer.gov/explore/filters/?analysis_results_id=RMS-Mutation-Prediction-Expert-Annotations. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below.

    Collection description

    This dataset contains 2 components:

    1. Annotations of multiple regions of interest performed by an expert pathologist with eight years of experience for a subset of hematoxylin and eosin (H&E) stained images from the RMS-Mutation-Prediction image collection [1,2]. Annotations were generated manually, using the Aperio ImageScope tool, to delineate regions of alveolar rhabdomyosarcoma (ARMS), embryonal rhabdomyosarcoma (ERMS), stroma, and necrosis [3]. The resulting planar contour annotations were originally stored in ImageScope-specific XML format, and subsequently converted into Digital Imaging and Communications in Medicine (DICOM) Structured Report (SR) representation using the open source conversion tool [4].
    2. AI-generated annotations stored as probabilistic segmentations.

    WARNING: After the release of IDC v20 (v2 of this data record), it was discovered that a mistake had been made during data conversion that affected the newly-released segmentations accompanying the "RMS-Mutation-Prediction" collection. Segmentations released in v20 for this collection have the segment labels for alveolar rhabdomyosarcoma (ARMS) and embryonal rhabdomyosarcoma (ERMS) switched in the metadata relative to the correct labels. Thus segment 3 in the released files is labelled in the metadata (the SegmentSequence) as ARMS but should correctly be interpreted as ERMS, and conversely segment 4 in the released files is labelled as ERMS but should be correctly interpreted as ARMS. This mistake was fixed in the version v3 of this record (IDC data release v21).

    Many pixels from the whole slide images annotated by this dataset are not contained inside any annotation contours and are considered to belong to the background class. Other pixels are contained inside only one annotation contour and are assigned to a single class. However, cases also exist in this dataset where annotation contours overlap. In these cases, the pixels contained in multiple contours could be assigned membership in multiple classes. One example is a necrotic tissue contour overlapping an internal subregion of an area designated by a larger ARMS or ERMS annotation. The ordering of annotations in this DICOM dataset preserves the order in the original XML generated using ImageScope. These annotations were converted, in sequence, into segmentation masks and used in the training of several machine learning models. Details on the training methods and model results are presented in [1]. In the case of overlapping contours, the order in which annotations are processed may affect the generated segmentation mask if prior contours are overwritten by later contours in the sequence. It is up to the application consuming this data to decide how to interpret tissue regions annotated with multiple classes. The annotations included in this dataset are available for visualization and exploration from the National Cancer Institute Imaging Data Commons (IDC) [5] (also see IDC Portal at https://imaging.datacommons.cancer.gov) as of data release v18. Direct link to open the collection in IDC Portal: https://portal.imaging.datacommons.cancer.gov/explore/filters/?analysis_results_id=RMS-Mutation-Prediction-Expert-Annotations.
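    The order dependence described above can be illustrated with a small rasterization sketch: when polygon contours are burned into a single-label mask in sequence, later contours overwrite earlier ones. The geometry and class ids below are illustrative only and are not taken from the dataset.

    # Illustrative order-dependent rasterization of overlapping contours into a label mask.
    import numpy as np
    from PIL import Image, ImageDraw

    def rasterize_in_order(contours, class_ids, shape):
        """Burn polygons into an integer mask; later entries overwrite earlier ones."""
        mask = Image.new("L", (shape[1], shape[0]), 0)  # 0 = background
        draw = ImageDraw.Draw(mask)
        for polygon, class_id in zip(contours, class_ids):
            draw.polygon(polygon, fill=int(class_id))
        return np.asarray(mask)

    # A small "necrosis" square (class 2) drawn after a larger region (class 1)
    # replaces the class-1 label inside the overlap; reversing the order would not.
    outer = [(10, 10), (90, 10), (90, 90), (10, 90)]
    inner = [(40, 40), (60, 40), (60, 60), (40, 60)]
    mask = rasterize_in_order([outer, inner], [1, 2], shape=(100, 100))
    print(np.unique(mask, return_counts=True))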

    Files included

    A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, pan_cancer_nuclei_seg_dicom-collection_id-idc_v19-aws.s5cmd corresponds to the annotations for the images in the collection_id collection introduced in IDC data release v19. DICOM binary segmentations were introduced in IDC v20. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.

    For each of the collections, the following manifest files are provided:

    1. rms_mutation_prediction_expert_annotations-idc_v20-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets
    2. rms_mutation_prediction_expert_annotations-idc_v20-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets
    3. rms_mutation_prediction_expert_annotations-idc_v20-dcf.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)

    Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while -gcs.s5cmd reference files in Google Cloud Storage. The actual files are identical and are mirrored between AWS and GCP.

    Download instructions

    Each of the manifests include instructions in the header on how to download the included files.

    To download the files using .s5cmd manifests:

    1. install idc-index package: pip install --upgrade idc-index
    2. download the files referenced by manifests included in this dataset by passing the .s5cmd manifest file: idc download manifest.s5cmd

    To download the files using .dcf manifest, see manifest header.
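    For convenience, the two s5cmd steps above can also be wrapped in a short Python helper. This is only a sketch around the documented CLI commands; the manifest name is taken from the Files included list and should be adjusted as needed.

    # Illustrative wrapper around the documented idc-index CLI workflow.
    import subprocess
    import sys

    def download_from_manifest(manifest):
        """Install/upgrade idc-index, then fetch every file referenced by the manifest."""
        subprocess.run([sys.executable, "-m", "pip", "install", "--upgrade", "idc-index"], check=True)
        subprocess.run(["idc", "download", manifest], check=True)

    if __name__ == "__main__":
        download_from_manifest("rms_mutation_prediction_expert_annotations-idc_v20-aws.s5cmd")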

    Acknowledgments

    The Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.

    If you use the files referenced in the attached manifests, we ask you to cite this dataset, as well as the publication describing the original dataset [2] and publication acknowledging IDC [5].

    References

    [1] D. Milewski et al., "Predicting molecular subtype and survival of rhabdomyosarcoma patients using deep learning of H&E images: A report from the Children's Oncology Group," Clin. Cancer Res., vol. 29, no. 2, pp. 364–378, Jan. 2023, doi: 10.1158/1078-0432.CCR-22-1663.

    [2] Clunie, D., Khan, J., Milewski, D., Jung, H., Bowen, J., Lisle, C., Brown, T., Liu, Y., Collins, J., Linardic, C. M., Hawkins, D. S., Venkatramani, R., Clifford, W., Pot, D., Wagner, U., Farahani, K., Kim, E., & Fedorov, A. (2023). DICOM converted whole slide hematoxylin and eosin images of rhabdomyosarcoma from Children's Oncology Group trials [Data set]. Zenodo. https://doi.org/10.5281/zenodo.8225132

    [3] Agaram NP. Evolving classification of rhabdomyosarcoma. Histopathology. 2022 Jan;80(1):98-108. doi: 10.1111/his.14449. PMID: 34958505; PMCID: PMC9425116, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9425116/

    [4] Chris Bridge. (2024). ImagingDataCommons/idc-sm-annotations-conversion: v1.0.0 (v1.0.0). Zenodo. https://doi.org/10.5281/zenodo.10632182

    [5] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W. L., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National cancer institute imaging data commons: Toward transparency, reproducibility, and scalability in imaging artificial intelligence. Radiographics 43, (2023).

  14. f

    Data_Sheet_1_mEMbrain: an interactive deep learning MATLAB tool for...

    • figshare.com
    • frontiersin.figshare.com
    pdf
    Updated Jun 15, 2023
    Cite
    Elisa C. Pavarino; Emma Yang; Nagaraju Dhanyasi; Mona D. Wang; Flavie Bidel; Xiaotang Lu; Fuming Yang; Core Francisco Park; Mukesh Bangalore Renuka; Brandon Drescher; Aravinthan D. T. Samuel; Binyamin Hochner; Paul S. Katz; Mei Zhen; Jeff W. Lichtman; Yaron Meirovitch (2023). Data_Sheet_1_mEMbrain: an interactive deep learning MATLAB tool for connectomic segmentation on commodity desktops.PDF [Dataset]. http://doi.org/10.3389/fncir.2023.952921.s001
    Explore at:
    pdfAvailable download formats
    Dataset updated
    Jun 15, 2023
    Dataset provided by
    Frontiers
    Authors
    Elisa C. Pavarino; Emma Yang; Nagaraju Dhanyasi; Mona D. Wang; Flavie Bidel; Xiaotang Lu; Fuming Yang; Core Francisco Park; Mukesh Bangalore Renuka; Brandon Drescher; Aravinthan D. T. Samuel; Binyamin Hochner; Paul S. Katz; Mei Zhen; Jeff W. Lichtman; Yaron Meirovitch
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Connectomics is fundamental in propelling our understanding of the nervous system's organization, unearthing cells and wiring diagrams reconstructed from volume electron microscopy (EM) datasets. Such reconstructions, on the one hand, have benefited from ever more precise automatic segmentation methods, which leverage sophisticated deep learning architectures and advanced machine learning algorithms. On the other hand, the field of neuroscience at large, and of image processing in particular, has manifested a need for user-friendly and open source tools which enable the community to carry out advanced analyses. In line with this second vein, here we propose mEMbrain, an interactive MATLAB-based software which wraps algorithms and functions that enable labeling and segmentation of electron microscopy datasets in a user-friendly user interface compatible with Linux and Windows. Through its integration as an API to the volume annotation and segmentation tool VAST, mEMbrain encompasses functions for ground truth generation, image preprocessing, training of deep neural networks, and on-the-fly predictions for proofreading and evaluation. The final goals of our tool are to expedite manual labeling efforts and to harness MATLAB users with an array of semi-automatic approaches for instance segmentation. We tested our tool on a variety of datasets that span different species at various scales, regions of the nervous system and developmental stages. To further expedite research in connectomics, we provide an EM resource of ground truth annotation from four different animals and five datasets, amounting to around 180 h of expert annotations, yielding more than 1.2 GB of annotated EM images. In addition, we provide a set of four pre-trained networks for said datasets. All tools are available from https://lichtman.rc.fas.harvard.edu/mEMbrain/. With our software, our hope is to provide a solution for lab-based neural reconstructions which does not require coding by the user, thus paving the way to affordable connectomics.

  15. f

    Data_Sheet_2_Using Convolutional Neural Networks to Efficiently Extract...

    • frontiersin.figshare.com
    txt
    Updated Jun 5, 2023
    + more versions
    Cite
    Rachel A. Reeb; Naeem Aziz; Samuel M. Lapp; Justin Kitzes; J. Mason Heberling; Sara E. Kuebbing (2023). Data_Sheet_2_Using Convolutional Neural Networks to Efficiently Extract Immense Phenological Data From Community Science Images.CSV [Dataset]. http://doi.org/10.3389/fpls.2021.787407.s002
    Explore at:
    txtAvailable download formats
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    Frontiers
    Authors
    Rachel A. Reeb; Naeem Aziz; Samuel M. Lapp; Justin Kitzes; J. Mason Heberling; Sara E. Kuebbing
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Community science image libraries offer a massive, but largely untapped, source of observational data for phenological research. The iNaturalist platform offers a particularly rich archive, containing more than 49 million verifiable, georeferenced, open access images, encompassing seven continents and over 278,000 species. A critical limitation preventing scientists from taking full advantage of this rich data source is labor. Each image must be manually inspected and categorized by phenophase, which is both time-intensive and costly. Consequently, researchers may only be able to use a subset of the total number of images available in the database. While iNaturalist has the potential to yield enough data for high-resolution and spatially extensive studies, it requires more efficient tools for phenological data extraction. A promising solution is automation of the image annotation process using deep learning. Recent innovations in deep learning have made these open-source tools accessible to a general research audience. However, it is unknown whether deep learning tools can accurately and efficiently annotate phenophases in community science images. Here, we train a convolutional neural network (CNN) to annotate images of Alliaria petiolata into distinct phenophases from iNaturalist and compare the performance of the model with non-expert human annotators. We demonstrate that researchers can successfully employ deep learning techniques to extract phenological information from community science images. A CNN classified two-stage phenology (flowering and non-flowering) with 95.9% accuracy and classified four-stage phenology (vegetative, budding, flowering, and fruiting) with 86.4% accuracy. The overall accuracy of the CNN did not differ from humans (p = 0.383), although performance varied across phenophases. We found that a primary challenge of using deep learning for image annotation was not related to the model itself, but instead in the quality of the community science images. Up to 4% of A. petiolata images in iNaturalist were taken from an improper distance, were physically manipulated, or were digitally altered, which limited both human and machine annotators in accurately classifying phenology. Thus, we provide a list of photography guidelines that could be included in community science platforms to inform community scientists in the best practices for creating images that facilitate phenological analysis.
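
    As an illustration of the kind of workflow the abstract describes (this is not the authors' code), the sketch below fine-tunes a small convolutional network to sort images into the four phenophases named above. The directory layout, backbone, image size, and training settings are assumptions made for the example.

        # Illustrative sketch only, not the authors' pipeline: fine-tune a CNN to label
        # images as vegetative, budding, flowering, or fruiting. Assumes images are
        # sorted into per-class folders under data/train and data/val (hypothetical layout).
        import torch
        from torch import nn
        from torch.utils.data import DataLoader
        from torchvision import datasets, models, transforms

        # Four-stage phenology classes named in the abstract.
        CLASSES = ["vegetative", "budding", "flowering", "fruiting"]

        transform = transforms.Compose([
            transforms.Resize((224, 224)),
            transforms.ToTensor(),
        ])

        train_set = datasets.ImageFolder("data/train", transform=transform)  # assumed paths
        val_set = datasets.ImageFolder("data/val", transform=transform)
        train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
        val_loader = DataLoader(val_set, batch_size=32)

        device = "cuda" if torch.cuda.is_available() else "cpu"
        model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # generic backbone chosen for the sketch
        model.fc = nn.Linear(model.fc.in_features, len(CLASSES))
        model = model.to(device)

        optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
        criterion = nn.CrossEntropyLoss()

        for epoch in range(5):  # small number of epochs for illustration
            model.train()
            for images, labels in train_loader:
                images, labels = images.to(device), labels.to(device)
                optimizer.zero_grad()
                loss = criterion(model(images), labels)
                loss.backward()
                optimizer.step()

            # Simple overall validation accuracy, loosely analogous to the
            # accuracy figures reported in the abstract.
            model.eval()
            correct = total = 0
            with torch.no_grad():
                for images, labels in val_loader:
                    preds = model(images.to(device)).argmax(dim=1).cpu()
                    correct += (preds == labels).sum().item()
                    total += labels.numel()
            print(f"epoch {epoch}: val accuracy {correct / total:.3f}")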

  16. f

    Table_1_Using Convolutional Neural Networks to Efficiently Extract Immense...

    • figshare.com
    docx
    Updated Jun 2, 2023
    + more versions
    Cite
    Rachel A. Reeb; Naeem Aziz; Samuel M. Lapp; Justin Kitzes; J. Mason Heberling; Sara E. Kuebbing (2023). Table_1_Using Convolutional Neural Networks to Efficiently Extract Immense Phenological Data From Community Science Images.docx [Dataset]. http://doi.org/10.3389/fpls.2021.787407.s003
    Explore at:
    docxAvailable download formats
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Frontiers
    Authors
    Rachel A. Reeb; Naeem Aziz; Samuel M. Lapp; Justin Kitzes; J. Mason Heberling; Sara E. Kuebbing
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Community science image libraries offer a massive, but largely untapped, source of observational data for phenological research. The iNaturalist platform offers a particularly rich archive, containing more than 49 million verifiable, georeferenced, open access images, encompassing seven continents and over 278,000 species. A critical limitation preventing scientists from taking full advantage of this rich data source is labor. Each image must be manually inspected and categorized by phenophase, which is both time-intensive and costly. Consequently, researchers may only be able to use a subset of the total number of images available in the database. While iNaturalist has the potential to yield enough data for high-resolution and spatially extensive studies, it requires more efficient tools for phenological data extraction. A promising solution is automation of the image annotation process using deep learning. Recent innovations in deep learning have made these open-source tools accessible to a general research audience. However, it is unknown whether deep learning tools can accurately and efficiently annotate phenophases in community science images. Here, we train a convolutional neural network (CNN) to annotate images of Alliaria petiolata into distinct phenophases from iNaturalist and compare the performance of the model with non-expert human annotators. We demonstrate that researchers can successfully employ deep learning techniques to extract phenological information from community science images. A CNN classified two-stage phenology (flowering and non-flowering) with 95.9% accuracy and classified four-stage phenology (vegetative, budding, flowering, and fruiting) with 86.4% accuracy. The overall accuracy of the CNN did not differ from humans (p = 0.383), although performance varied across phenophases. We found that a primary challenge of using deep learning for image annotation was not related to the model itself, but instead in the quality of the community science images. Up to 4% of A. petiolata images in iNaturalist were taken from an improper distance, were physically manipulated, or were digitally altered, which limited both human and machine annotators in accurately classifying phenology. Thus, we provide a list of photography guidelines that could be included in community science platforms to inform community scientists in the best practices for creating images that facilitate phenological analysis.

  17. P

    UTFPR-SBD3 Dataset

    • paperswithcode.com
    Updated Oct 12, 2020
    Cite
    (2020). UTFPR-SBD3 Dataset [Dataset]. https://paperswithcode.com/dataset/utfpr-sbd3
    Explore at:
    Dataset updated
    Oct 12, 2020
    Description

    The semantic segmentation of clothes is a challenging task due to the wide variety of clothing styles, layers, and shapes. The UTFPR-SBD3 dataset contains 4,500 images manually annotated at the pixel level into 18 classes plus background. To ensure the high quality of the dataset, all images were manually annotated at the pixel level using JS Segment Annotator, a free web-based image annotation tool. The raw images were carefully selected to avoid, as far as possible, classes with a low number of instances.

  18. d

    Pixta AI | Imagery Data | Global | 10,000 Stock Images | Annotation and...

    • datarade.ai
    .json, .xml, .csv
    Updated Nov 14, 2022
    Cite
    Pixta AI (2022). Pixta AI | Imagery Data | Global | 10,000 Stock Images | Annotation and Labelling Services Provided | Human Face and Emotion Dataset for AI & ML [Dataset]. https://datarade.ai/data-products/human-emotions-datasets-for-ai-ml-model-pixta-ai
    Explore at:
    .json, .xml, .csvAvailable download formats
    Dataset updated
    Nov 14, 2022
    Dataset authored and provided by
    Pixta AI
    Area covered
    United Kingdom, Philippines, Czech Republic, Hong Kong, United States of America, New Zealand, Canada, Italy, India, Malaysia
    Description
    1. Overview: This dataset is a collection of 6,000+ images of mixed-race human faces with various expressions and emotions that are ready to use for optimizing the accuracy of computer vision models. All of the content is sourced from PIXTA's stock library of 100M+ Asian-featured images and videos. PIXTA is the largest platform of visual materials in the Asia Pacific region, offering fully-managed services, high-quality content and data, and powerful tools for businesses and organisations to enable their creative and machine learning projects.

    2. The dataset: This dataset contains 6,000+ images of facial emotion. Each dataset is supported by both an AI and a human review process to ensure labelling consistency and accuracy. Contact us for more custom datasets.

    3. About PIXTA: PIXTASTOCK is the largest Asian-featured stock platform, providing data, content, tools, and services since 2005. PIXTA has 15 years of experience integrating advanced AI technology to manage, curate, and process over 100M visual materials and to serve global leading brands for their creative and data demands. Visit us at https://www.pixta.ai/ or contact us via email at contact@pixta.ai.

  19. R

    E Waste Dataset

    • universe.roboflow.com
    zip
    Updated Jun 11, 2024
    + more versions
    Cite
    Electronic Waste Detection (2024). E Waste Dataset [Dataset]. https://universe.roboflow.com/electronic-waste-detection/e-waste-dataset-r0ojc
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jun 11, 2024
    Dataset authored and provided by
    Electronic Waste Detection
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Electronic Waste Bounding Boxes
    Description

    Overview

    The goal of this project was to create a structured dataset which can be used to train computer vision models to detect electronic waste devices, i.e., e-waste or Waste Electrical and Electronic Equipment (WEEE). Due to the often-subjective differences between e-waste and functioning electronic devices, a model trained on this dataset could also be used to detect electronic devices in general. However, it must be noted that for the purposes of e-waste recognition, this dataset does not differentiate between different brands or models of the same type of electronic devices, e.g. smartphones, and it also includes images of damaged equipment.

    The structure of this dataset is based on the UNU-KEYS classification (Wang et al., 2012; Forti et al., 2018). Each class in this dataset has a tag containing its corresponding UNU-KEY. This structure has the following benefits:

    1. It allows the user to easily classify e-waste devices regardless of which e-waste definition their country or organization uses, thanks to the correlation between the UNU-KEYS and other classifications such as the HS codes or the EU-6 categories defined in the WEEE directive.
    2. It helps dataset contributors focus on adding e-waste devices with higher priority compared to arbitrarily chosen devices, because electronic devices in the same UNU-KEY category have a similar function, average weight, and lifetime distribution, as well as comparable material composition, both in terms of hazardous substances and valuable materials, and related end-of-life attributes (Forti et al., 2018).
    3. It gives dataset contributors a clear goal of which electronic devices still need to be added and a clear understanding of their progress in the seemingly endless task of creating an e-waste dataset.

    This dataset contains annotated images of e-waste from every UNU-KEY category. According to Forti et al. (2018), there are a total of 54 UNU-KEY e-waste categories.

    Description of Classes

    At the time of writing, 22. Apr. 2024, the dataset has 19613 annotated images and 77 classes. The dataset has mixed bounding-box and polygon annotations. Each class of the dataset represents one type of electronic device. Different models of the same type of device belong to the same class. For example, different brands of smartphones are labelled as "Smartphone", regardless of their make or model. Many classes can belong to the same UNU-KEY category and therefore have the same tag. For example, the classes "Smartphone" and "Bar-Phone" both belong to the UNU-KEY category "0306 - Mobile Phones". The images in the dataset are anonymized, meaning that no people were annotated and images containing visible faces were removed.
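
    To make the class-to-tag relationship concrete, here is a hypothetical helper that groups detection classes by their UNU-KEY tag. Only the "Smartphone"/"Bar-Phone" to "0306 - Mobile Phones" pairing is taken from the description above; the helper itself and any further entries are placeholders.

        # Hypothetical helper illustrating the class -> UNU-KEY tag structure described above.
        # Only the mapping of "Smartphone" and "Bar-Phone" to "0306 - Mobile Phones" comes from
        # the dataset description; the remaining classes and tags would need to be filled in.
        from collections import defaultdict

        CLASS_TO_UNU_KEY = {
            "Smartphone": "0306 - Mobile Phones",
            "Bar-Phone": "0306 - Mobile Phones",
            # ... remaining classes and their UNU-KEY tags go here ...
        }

        def classes_by_unu_key(mapping: dict[str, str]) -> dict[str, list[str]]:
            """Invert the mapping so each UNU-KEY category lists its detection classes."""
            grouped: dict[str, list[str]] = defaultdict(list)
            for cls, unu_key in mapping.items():
                grouped[unu_key].append(cls)
            return dict(grouped)

        if __name__ == "__main__":
            for unu_key, classes in classes_by_unu_key(CLASS_TO_UNU_KEY).items():
                print(unu_key, "->", ", ".join(classes))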

    The dataset was almost entirely built by cloning annotated images from the following open-source Roboflow datasets: [1]-[91]. Some of the images in the dataset were acquired from the Wikimedia Commons website. Those images were chosen to have an unrestrictive license, i.e., they belong to the public domain. They were manually annotated and added to the dataset.

    Cite This Project

    This work was done as part of the PhD of Dimitar Iliev, a student at the Faculty of German Engineering and Industrial Management at the Technical University of Sofia, Bulgaria, in collaboration with the Faculty of Computer Science at Otto-von-Guericke-University Magdeburg, Germany.

    If you use this dataset in a research paper, please cite it using the following BibTeX:

    @article{iliev2024EwasteDataset,
      author  = "Iliev, Dimitar and Marinov, Marin and Ortmeier, Frank",
      title   = "A proposal for a new e-waste image dataset based on the unu-keys classification",
      journal = "XXIII-rd International Symposium on Electrical Apparatus and Technologies SIELA 2024",
      year    = 2024,
      volume  = "23",
      number  = "to appear",
      pages   = {to appear},
      note    = {under submission}
    }

    Contribution Guidelines

    Image Collection

    1. Choose a specific electronic device type to add to the dataset and find its corresponding UNU-KEY.
       * The chosen type of device should have a characteristic design which an object detection model can learn. For example, CRT monitors look distinctly different from flat panel monitors and should therefore belong to a different class, even though both are monitors. In contrast, LED monitors and LCD monitors look very similar and are therefore both labelled as Flat-Panel-Monitor in this dataset.
    2. Collect images of this type of device.
       * Take note of the license of those images and their author/s to avoid copyright infringement.
       * Do not collect images with visible faces to protect personal data and comply w

  20. d

    Data from: Bi-channel image registration and deep-learning segmentation...

    • search.dataone.org
    • zenodo.org
    • +1more
    Updated May 18, 2025
    Cite
    Peng Fei (2025). Bi-channel image registration and deep-learning segmentation (BIRDS) for efficient, versatile 3D mapping of mouse brain [Dataset]. http://doi.org/10.5061/dryad.qnk98sffp
    Explore at:
    Dataset updated
    May 18, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Peng Fei
    Time period covered
    Jan 28, 2021
    Description

    We have developed an open-source software tool called BIRDS (bi-channel image registration and deep-learning segmentation) for the mapping and analysis of 3D microscopy data, and we have applied it to the mouse brain. The BIRDS pipeline includes image pre-processing, bi-channel registration, automatic annotation, creation of a 3D digital frame, high-resolution visualization, and expandable quantitative analysis. This new bi-channel registration algorithm is adaptive to various types of whole-brain data from different microscopy platforms and shows dramatically improved registration accuracy. Additionally, as this platform combines registration with neural networks, its improved function relative to other platforms lies in the fact that the registration procedure can readily provide training data for network construction, while the trained neural network can efficiently segment incomplete/defective brain data that is otherwise difficult to register. Our software is thus optimized to enable either mi...
