38 datasets found
  1. Biased bird image dataset for the evaluation of explainability methods applied to computer vision (deep-learning-based) models

    • data.4tu.nl
    zip
    Updated Feb 13, 2022
    Cite
    Agathe Balayn (2022). Biased bird image dataset for the evaluation of explainability methods applied to computer vision (deep-learning-based) models [Dataset]. http://doi.org/10.4121/7337b632-c7f5-4b6e-9d0e-820658e3cd4b.v1
    Available download formats: zip
    Dataset updated
    Feb 13, 2022
    Dataset provided by
    4TU.ResearchData
    Authors
    Agathe Balayn
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Datasets of images, their ground truth, their saliency map for one model, and the manual annotations of the saliency maps with semantic concepts of various granularities, representing 10 species of birds. The images were selected in order to inject class-specific biases in the dataset.
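
    Each image in this dataset is paired with a saliency map for one model. As a rough illustration of how such a map can be produced (the authors' exact model and attribution method are not stated in this listing, so treat this as a hedged sketch with assumed file names), a minimal gradient-saliency example in PyTorch:

    import torch
    from PIL import Image
    from torchvision import models, transforms

    # A pretrained backbone stands in for the dataset's (unspecified) bird classifier.
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
    preprocess = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ])

    img = Image.open("bird.jpg").convert("RGB")        # hypothetical input image
    x = preprocess(img).unsqueeze(0).requires_grad_(True)

    logits = model(x)
    logits[0, logits.argmax()].backward()              # gradient of the top-scoring class

    # Pixel-wise saliency: max absolute gradient over the colour channels.
    saliency = x.grad.abs().max(dim=1)[0].squeeze(0)   # shape (224, 224)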

  2. Social B(eye)as over Time (SBT) Dataset

    • dataone.org
    Updated Nov 9, 2023
    Cite
    Pınar Barlas; Maximilian Krahn; Styliani Kleanthous; Kyriakos Kyriakou; Jahna Otterbacher (2023). Social B(eye)as over Time (SBT) Dataset [Dataset]. http://doi.org/10.7910/DVN/CFHZS3
    Dataset updated
    Nov 9, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Pınar Barlas; Maximilian Krahn; Styliani Kleanthous; Kyriakos Kyriakou; Jahna Otterbacher
    Description

    Many eyes have scrutinized the social behaviors of computer vision services, given their popularity with researchers and developers. When analyzing images depicting people, their descriptions often reflect social inequalities and stereotypes, yet the proprietary nature of these services mean that it is difficult to anticipate or explain their behaviors. Mechanisms providing oversight of these processes can enable more responsible use, allowing stakeholders to audit their behaviors and track potential changes over time. Previously, in 2019, we audited image tagging algorithms for social bias when processing images of people. In this work, we i) present data from an audit of the same services three years later, with ii) additional outputs for input images depicting other racial/ethnic groups and iii) a toolkit enabling several fully-automated analyses on the algorithms' behaviors across time.
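
    The accompanying toolkit automates comparisons of tagger behaviour across audit rounds. A minimal sketch of one such analysis, comparing per-group tag rates between the two audits, is shown below; the file and column names are assumptions for illustration, not the actual schema of the SBT files:

    import pandas as pd

    # Hypothetical flat files with one row per (image, tag) pair returned by an API.
    tags_old = pd.read_csv("tags_2019.csv")   # assumed columns: image_id, group, tag
    tags_new = pd.read_csv("tags_2022.csv")

    def tag_rates(df):
        # Share of images in each demographic group that received each tag.
        n_images = df.groupby("group")["image_id"].nunique()
        counts = df.groupby(["group", "tag"])["image_id"].nunique().unstack(fill_value=0)
        return counts.div(n_images, axis=0)

    # Change in tag rate between the two audit rounds, per group and tag.
    drift = tag_rates(tags_new).sub(tag_rates(tags_old), fill_value=0)
    print(drift.abs().max().sort_values(ascending=False).head(10))  # tags that shifted most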

  3. ObjectNET [7 of 10]

    • kaggle.com
    Updated Jul 15, 2022
    Cite
    Darien Schettler (2022). ObjectNET [7 of 10] [Dataset]. https://www.kaggle.com/datasets/dschettler8845/objectnet-7-of-10/discussion?sort=undefined
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jul 15, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Darien Schettler
    Description

    NOTE: BY USING THIS DATASET YOU ACKNOWLEDGE THAT YOU HAVE READ THE LICENSE AND WILL ABIDE BY THE TERMS THEREWITHIN

    THE LICENSE

    ObjectNet is free to use for both research and commercial
    applications. The authors own the source images and allow their use
    under a license derived from Creative Commons Attribution 4.0 with
    two additional clauses:
    
    1. ObjectNet may never be used to tune the parameters of any
      model. This includes, but is not limited to, computing statistics
      on ObjectNet and including those statistics into a model,
      fine-tuning on ObjectNet, performing gradient updates on any
      parameters based on these images.
    
    2. Any individual images from ObjectNet may only be posted to the web
      including their 1 pixel red border.
    
    If you post this archive in a public location, please leave the password
    intact as "objectnetisatestset".
    
    [Other General License Information Conforms to Attribution 4.0 International]
    


    IMPORTANT NOTE: this dataset is for validation/testing only.
    • You cannot use it to train models in any way; doing so violates the license agreement.
    • If you post images from this dataset anywhere, you must add the 1-pixel red border; posting images without the border violates the license agreement.
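
    For reference, adding the required border is straightforward with Pillow; a minimal sketch (file names are hypothetical):

    from PIL import Image, ImageOps

    img = Image.open("objectnet_example.png").convert("RGB")      # hypothetical ObjectNet image
    bordered = ImageOps.expand(img, border=1, fill=(255, 0, 0))   # 1-pixel red frame on all sides
    bordered.save("objectnet_example_bordered.png")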



    This is Part 7 of 10 (see the original paper and the ObjectNet website for background).





    Description From ObjectNET Homepage



    What is ObjectNet?

    • A new kind of vision dataset borrowing the idea of controls from other areas of science.
    • No training set, only a test set! Put your vision system through its paces.
    • Collected to intentionally show objects from new viewpoints on new backgrounds.
    • 50,000 image test set, same as ImageNet, with controls for rotation, background, and viewpoint.
    • 313 object classes, 113 of which overlap with ImageNet classes
    • Large performance drop, what you can expect from vision systems in the real world!
    • Robust to fine-tuning and a very difficult transfer learning problem


    Controls For Biases Increase Variation


    [Figure: ObjectNet controls table, https://objectnet.dev/images/objectnet_controls_table.png]



    Easy For Humans, Hard For Machines

    • Ready to help develop the next generation of object recognition algorithms that have robustness, bias, and safety in mind.
    • Controls can remove bias from other machine-learning datasets, not just vision.


    [Figure: ObjectNet results, https://objectnet.dev/images/objectnet_results.png]



    Full Description

    ObjectNet is a large real-world test set for object recognition with controls, where object backgrounds, rotations, and imaging viewpoints are randomized.

    Most scientific experiments have controls: confounds are removed from the data to ensure that subjects cannot perform a task by exploiting trivial correlations. Historically, large machine learning and computer vision datasets have lacked such controls. This has resulted in models that must be fine-tuned for each new dataset and that perform better on benchmarks than in real-world applications. When tested on ObjectNet, object detectors show a 40-45% drop in performance relative to their performance on other benchmarks, due to the controls for biases. The controls also make ObjectNet robust to fine-tuning, which yields only small performance increases.

    We develop a highly automated platform that enables gathering datasets with controls by crowdsourcing image capturing and annotation. ObjectNet is the same size as the ImageNet test set (50,000 images), and by design does not come paired with a training set in order to encourage generaliz...

  4. Geographically Diverse Dataset

    • figshare.com
    zip
    Updated Oct 19, 2024
    Cite
    Abhishek Mandal; Susan Leavy; Suzanne Little (2024). Geographically Diverse Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.27263091.v1
    Available download formats: zip
    Dataset updated
    Oct 19, 2024
    Dataset provided by
    figshare
    Authors
    Abhishek Mandal; Susan Leavy; Suzanne Little
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Geographically diverse dataset introduced in the paper Dataset Diversity: Measuring and Mitigating Geographical Bias in Image Search and Retrieval. https://doi.org/10.1145/3475731.3484956

  5. XImageNet-12 Dataset

    • paperswithcode.com
    Updated Oct 11, 2023
    Cite
    Qiang Li; Dan Zhang; Shengzhao Lei; Xun Zhao; Porawit Kamnoedboon; Weiwei Li; Junhao Dong; Shuyan Li (2023). XImageNet-12 Dataset [Dataset]. https://paperswithcode.com/dataset/ximagenet-12
    Dataset updated
    Oct 11, 2023
    Authors
    Qiang Li; Dan Zhang; Shengzhao Lei; Xun Zhao; Porawit Kamnoedboon; Weiwei Li; Junhao Dong; Shuyan Li
    Description

    The dataset is enlarged to study how image backgrounds affect computer vision ML models, covering the following topics: blurred backgrounds, segmented backgrounds, AI-generated backgrounds, annotation-tool bias, background color, dependent factors in the background, latent-space distance of the foreground, and random backgrounds from real environments.

    We introduce XIMAGENET-12, an explainable benchmark dataset with over 200K images and 15,600 manual semantic annotations. It covers 12 categories from ImageNet, representing objects commonly encountered in practical life, and simulates six diverse scenarios, including overexposure, blurring, and color changes.

    Our research builds upon "Noise or Signal: The Role of Image Backgrounds in Object Recognition" (Xiao et al., ICLR 2022) and "Explainable AI: Object Recognition With Help From Background" (Qiang et al., ICLR Workshop 2022), which reinforced the notion that models trained solely on backgrounds can achieve substantial accuracy. One noteworthy discovery highlighted in those studies is that more accurate models tend to rely less on backgrounds.
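
    As a hedged sketch of how such background variants can be used to probe background reliance (the folder layout and file names below are assumptions, not the released structure), one can check how often a classifier's prediction survives a background swap:

    import torch
    from PIL import Image
    from torchvision import models, transforms

    tf = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ])
    model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()

    def predict(path):
        x = tf(Image.open(path).convert("RGB")).unsqueeze(0)
        with torch.no_grad():
            return model(x).argmax(1).item()

    image_ids = ["img_0001", "img_0002"]                       # hypothetical ids
    scenarios = ["blur_background", "segmented_background"]    # hypothetical folder names
    for s in scenarios:
        same = sum(
            predict(f"ximagenet12/original/{i}.jpg") == predict(f"ximagenet12/{s}/{i}.jpg")
            for i in image_ids
        )
        print(f"{s}: prediction unchanged on {same}/{len(image_ids)} images")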

  6. Emotion Bias Dataset (EBD)

    • search.dataone.org
    Updated Nov 12, 2023
    Cite
    Kyriakou, Kyriakos; Kleanthous, Styliani; Otterbacher, Jahna; Papadopoulos, George (2023). Emotion Bias Dataset (EBD) [Dataset]. http://doi.org/10.7910/DVN/8MW0RA
    Dataset updated
    Nov 12, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Kyriakou, Kyriakos; Kleanthous, Styliani; Otterbacher, Jahna; Papadopoulos, George
    Description

    Vision-based cognitive services (CogS) have become crucial in a wide range of applications, from real-time security and social networks to smartphone applications. Many services focus on analyzing people images. When it comes to facial analysis, these services can be misleading or even inaccurate, raising ethical concerns such as the amplification of social stereotypes. We analyzed popular Image Tagging CogS that infer emotion from a person's face, considering whether they perpetuate racial and gender stereotypes concerning emotion. By comparing both CogS- and human-generated descriptions on a set of controlled images, we highlight the need for transparency and fairness in CogS. In particular, we document evidence that CogS may actually be more likely than crowdworkers to perpetuate the stereotype of the "angry black man" and often attribute "emotions of hostility" to Black individuals. This dataset consists of the raw data collected for this work, both from Emotion Analysis Services (EAS) and from crowdsourcing (crowdworkers from the Appen platform, formerly known as Figure Eight, targeting participants in the US and India). We used the Chicago Face Database (CFD) as our primary dataset for testing the behavior of the target EAS.

  7. Computer Vision Development Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Mar 10, 2025
    Cite
    Archive Market Research (2025). Computer Vision Development Report [Dataset]. https://www.archivemarketresearch.com/reports/computer-vision-development-55505
    Available download formats: doc, pdf, ppt
    Dataset updated
    Mar 10, 2025
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Computer Vision Development market is experiencing robust growth, driven by increasing adoption across diverse sectors. While the exact market size for 2025 is not provided, considering a plausible CAGR of 20% (a reasonable estimate given the rapid technological advancements in AI and its applications) and assuming a 2024 market size of $15 billion (a conservative estimate based on industry reports), the market size in 2025 could be estimated at approximately $18 billion. This substantial growth is projected to continue throughout the forecast period (2025-2033), with the market potentially exceeding $50 billion by 2033, driven by factors such as the increasing availability of large datasets for training algorithms, advancements in deep learning techniques, and the rising demand for automation across industries. Key application segments fueling this expansion include mobile internet, security, finance, retail, and healthcare, with unmanned systems and education also showing significant potential. The market is largely shaped by the competition between leading companies like Face++, SenseTime, YITU, CloudWalk, and Deepblue, each striving for market share through innovative product development and strategic partnerships. The market segmentation reveals a dynamic landscape with various approaches to computer vision development, including SDKs, APIs, and other custom solutions. This diversity caters to different needs and technical capabilities across a broad range of applications. While geographical distribution varies, North America and Asia-Pacific are expected to remain dominant regions due to their mature technological infrastructure and robust investments in AI research and development. Restraints to market growth may include data privacy concerns, ethical considerations surrounding AI bias, and the high cost of developing and deploying sophisticated computer vision systems. However, ongoing technological advancements and increasing governmental support for AI initiatives are expected to mitigate these challenges and propel continued expansion in the computer vision development market.

  8. Social B(eye)as Dataset

    • explore.openaire.eu
    Updated Jan 1, 2019
    Cite
    Pinar Barlas; Kyriakos Kyriakou; Styliani Kleanthous; Jahna Otterbacher (2019). Social B(eye)as Dataset [Dataset]. http://doi.org/10.7910/dvn/apzkss
    Dataset updated
    Jan 1, 2019
    Authors
    Pinar Barlas; Kyriakos Kyriakou; Styliani Kleanthous; Jahna Otterbacher
    Description

    Image analysis algorithms have become an indispensable tool in our information ecosystem, facilitating new forms of visual communication and information sharing. At the same time, they enable large-scale socio-technical research which would otherwise be difficult to carry out. However, their outputs may exhibit social bias, especially when analyzing people images. Since most algorithms are proprietary and opaque, we propose a method of auditing their outputs for social biases. To be able to compare how algorithms interpret a controlled set of people images, we collected descriptions across six image tagging APIs. In order to compare these results to human behavior, we also collected descriptions on the same images from crowdworkers in two anglophone regions. While the APIs do not output explicitly offensive descriptions, as humans do, future work should consider if and how they reinforce social inequalities in implicit ways. Beyond computer vision auditing, the dataset of human- and machine-produced tags, and the typology of tags, can be used to explore a range of research questions related to both algorithmic and human behaviors.

  9. Data and codes for "Disentangling Multi-view Representations Beyond Inductive Bias"

    • researchdata.smu.edu.sg
    zip
    Updated Oct 6, 2023
    Cite
    Guanzhou KE; Yang YU; Guoqing CHAO; Xiaoli WANG; Chenyang XU; Shengfeng He (2023). Data and codes for "Disentangling Multi-view Representations Beyond Inductive Bias" [Dataset]. http://doi.org/10.25440/smu.24249238.v1
    Available download formats: zip
    Dataset updated
    Oct 6, 2023
    Dataset provided by
    SMU Research Data Repository (RDR)
    Authors
    Guanzhou KE; Yang YU; Guoqing CHAO; Xiaoli WANG; Chenyang XU; Shengfeng He
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This record contains the data and codes for the paper: Guanzhou Ke, Yang Yu, Guoqing Chao, Xiaoli Wang, Chenyang Xu, and Shengfeng He. 2023. "Disentangling Multi-view Representations Beyond Inductive Bias." In Proceedings of the 31st ACM International Conference on Multimedia (MM '23), October 29–November 3, 2023, Ottawa, ON, Canada. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3581783.3611794

    dmrib-weights is the file of pre-trained weights. DMRIB-main is a copy of the project's GitHub repository at https://github.com/Guanzhou-Ke/DMRIB, the official repository for "Disentangling Multi-view Representations Beyond Inductive Bias" (DMRIB). Status: accepted at ACM MM 2023.

    Training steps (illustrated on the EdgeMnist dataset). Before training, set CUDA_VISIBLE_DEVICES, because faiss will otherwise use all GPUs; this can cause errors if you use tensor.to() to select a specific device.

    • Set the environment: export CUDA_VISIBLE_DEVICES=0
    • Train the pretext model by running the pretext training script src/train_pretext.py. A SimCLR-style self-supervised model is trained to mine neighbor information; pretext configs are kept in configs/pretext. Run: python train_pretext.py -f ./configs/pretext/pretext_EdgeMnist.yaml
    • Train the self-label clustering model, using the pretext model, via src/train_scan.py: python train_scan.py -f ./configs/scan/scan_EdgeMnist.yaml. Then fine-tune the clustering model with src/train_selflabel.py: python train_selflabel.py -f ./configs/scan/selflabel_EdgeMnist.yaml
    • Train the view-specific encoder and the disentangling stage. Set the self-label clustering model as the consistent encoder and train the second stage via src/train_dmrib.py: python train_dmrib.py -f ./configs/dmrib/dmrib_EdgeMnist.yaml

    Validation. The pre-trained weights are in the dmrib-weights file. Put the pretrained models into folders of the form {config.train.log_dir}/{results}/{config.dataset.name}/eid-{config.experiment_id}/dmrib/final_model.pth. For example, to validate on the EdgeMnist dataset, the default folder is ./experiments/results/EdgeMnist/eid-0/dmrib; put the pretrained model edge-mnist.pth into this folder and rename it to final_model.pth. If you do not want to use the default setting, modify line 58 of validate.py. Then run: python validate.py -f ./configs/dmrib/dmrib_EdgeMnist.yaml

    Credit: Van Gansbeke, Wouter, et al. "SCAN: Learning to Classify Images without Labels." Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part X. Cham: Springer International Publishing, 2020.

    Citation: Guanzhou Ke, Yang Yu, Guoqing Chao, Xiaoli Wang, Chenyang Xu, and Shengfeng He. 2023. Disentangling Multi-view Representations Beyond Inductive Bias. In Proceedings of the 31st ACM International Conference on Multimedia (MM '23), October 29–November 3, 2023, Ottawa, ON, Canada. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3581783.3611794

  10. Multi-Source Dental X-Ray Dataset Using Image-to-Image Transformation

    • data.mendeley.com
    Updated Apr 2, 2025
    Cite
    Al Rafi Aurnob (2025). Multi-Source Dental X-Ray Dataset Using Image-to-Image Transformation [Dataset]. http://doi.org/10.17632/cgwnxmdp3b.1
    Dataset updated
    Apr 2, 2025
    Authors
    Al Rafi Aurnob
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Teeth View X-ray Image Dataset is a collection of dental X-ray images gathered from different dental clinics, designed for machine learning tasks such as object detection. The dataset is organized into one main folder, the object detection dataset, which contains 1,674 augmented images with corresponding labels in JSON format. By applying an image-to-image diffusion model, we generated additional X-ray images of teeth scans to make the Teeth View X-ray Image Dataset deeper, more diverse, and more useful for model tuning. The synthetic images cover varied tooth anatomy, density, and acquisition conditions, which improves the model's robustness to unusual dental conditions. The increased size of the dataset also counterbalances class imbalance by providing more examples of under-represented classes, reducing model bias. In addition, diffusion models reproduce noise patterns effectively, making training more robust; improve image clarity through denoising; and enable precise segmentation-mask predictions for efficient boundary detection.

    Variables:
    • BDC-BDR Teeth: 208
    • Caries Teeth: 342
    • Fractured Teeth: 52
    • Healthy Teeth: 893
    • Impacted Teeth: 348
    • Inflection Teeth: 92

    Folder structure:

    Teeth View X-ray Image dataset
    |--- Object Detection
         |---- image
         |---- Labels
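
    The image-to-image diffusion augmentation described above can be sketched with the diffusers library; the model id, prompt, and strength below are illustrative assumptions, not the pipeline the authors actually used:

    import torch
    from PIL import Image
    from diffusers import StableDiffusionImg2ImgPipeline

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    source = Image.open("panoramic_xray.png").convert("RGB").resize((512, 512))  # hypothetical scan
    augmented = pipe(
        prompt="dental panoramic X-ray, radiograph",
        image=source,
        strength=0.3,        # low strength keeps the anatomy close to the source scan
        guidance_scale=6.0,
    ).images[0]
    augmented.save("panoramic_xray_synthetic.png")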

  11. Evaluating reproducibility of AI algorithms in digital pathology with DAPPER

    • plos.figshare.com
    pdf
    Updated Jun 1, 2023
    Cite
    Andrea Bizzego; Nicole Bussola; Marco Chierici; Valerio Maggio; Margherita Francescatto; Luca Cima; Marco Cristoforetti; Giuseppe Jurman; Cesare Furlanello (2023). Evaluating reproducibility of AI algorithms in digital pathology with DAPPER [Dataset]. http://doi.org/10.1371/journal.pcbi.1006269
    Available download formats: pdf
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Andrea Bizzego; Nicole Bussola; Marco Chierici; Valerio Maggio; Margherita Francescatto; Luca Cima; Marco Cristoforetti; Giuseppe Jurman; Cesare Furlanello
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Artificial Intelligence is exponentially increasing its impact on healthcare. As deep learning is mastering computer vision tasks, its application to digital pathology is natural, with the promise of aiding in routine reporting and standardizing results across trials. Deep learning features inferred from digital pathology scans can improve validity and robustness of current clinico-pathological features, up to identifying novel histological patterns, e.g., from tumor infiltrating lymphocytes. In this study, we examine the issue of evaluating accuracy of predictive models from deep learning features in digital pathology, as a hallmark of reproducibility. We introduce the DAPPER framework for validation based on a rigorous Data Analysis Plan derived from the FDA's MAQC project, designed to analyze causes of variability in predictive biomarkers. We apply the framework on models that identify tissue of origin on 787 Whole Slide Images from the Genotype-Tissue Expression (GTEx) project. We test three different deep learning architectures (VGG, ResNet, Inception) as feature extractors and three classifiers (a fully connected multilayer, Support Vector Machine and Random Forests) and work with four datasets (5, 10, 20 or 30 classes), for a total of 53,000 tiles at 512 × 512 resolution. We analyze accuracy and feature stability of the machine learning classifiers, also demonstrating the need for diagnostic tests (e.g., random labels) to identify selection bias and risks for reproducibility. Further, we use the deep features from the VGG model from GTEx on the KIMIA24 dataset for identification of slide of origin (24 classes) to train a classifier on 1,060 annotated tiles and validate it on 265 unseen ones. The DAPPER software, including its deep learning pipeline and the Histological Imaging—Newsy Tiles (HINT) benchmark dataset derived from GTEx, is released as a basis for standardization and validation initiatives in AI for digital pathology.
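
    The feature-extractor-plus-classifier recipe evaluated by DAPPER (for example, VGG features feeding an SVM head) can be sketched as follows; this is not the DAPPER code itself, and the tile folder is a hypothetical placeholder:

    import torch
    from sklearn.svm import SVC
    from torchvision import datasets, models, transforms

    vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).eval()
    tf = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ])
    tiles = datasets.ImageFolder("tiles/train", transform=tf)    # hypothetical tile folder
    loader = torch.utils.data.DataLoader(tiles, batch_size=16)

    feats, labels = [], []
    with torch.no_grad():
        for x, y in loader:
            f = vgg.features(x)                      # frozen convolutional features
            f = torch.flatten(vgg.avgpool(f), 1)     # pooled, flattened descriptors
            feats.append(f)
            labels.append(y)

    X = torch.cat(feats).numpy()
    y = torch.cat(labels).numpy()
    clf = SVC(kernel="linear").fit(X, y)             # one of the classifier heads studied
    print("training accuracy:", clf.score(X, y))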

  12. Visual Question Answering Technology Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Mar 14, 2025
    Cite
    Archive Market Research (2025). Visual Question Answering Technology Report [Dataset]. https://www.archivemarketresearch.com/reports/visual-question-answering-technology-58031
    Available download formats: ppt, pdf, doc
    Dataset updated
    Mar 14, 2025
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Visual Question Answering (VQA) technology market is experiencing robust growth, driven by increasing adoption across diverse industries and advancements in artificial intelligence (AI). While precise market size figures for 2025 aren't provided, considering similar AI segments showing compound annual growth rates (CAGR) between 20% and 30%, and given the substantial investment and progress in VQA, a reasonable estimate for the 2025 market size would be around $1.5 billion. Projecting a conservative CAGR of 25% for the forecast period (2025-2033), the market is poised to reach approximately $12 billion by 2033. This significant expansion is fueled by several key drivers: the rising demand for automated image analysis in various sectors (software, computer, and electronics industries), improvements in deep learning algorithms enhancing VQA accuracy and efficiency, and the increasing availability of large, labeled image datasets for model training. The segmentation into image identification and classification, along with diverse application areas, further contributes to the market's dynamism. However, challenges remain, including the need for robust data annotation processes, the computational cost of training complex models, and concerns surrounding data privacy and bias in algorithms. The market's geographical distribution is expected to be fairly diverse, with North America and Asia Pacific leading the charge initially, followed by Europe and other regions. Key players like Toshiba, Amazon Science, and Cognex are actively contributing to technological advancements and market penetration. The continued development of more accurate and efficient VQA systems, alongside the expansion into new application domains like healthcare, autonomous vehicles, and robotics, will play a crucial role in shaping the future of this rapidly evolving market. The current focus on improving the robustness of VQA systems against adversarial attacks and addressing ethical concerns around bias and fairness will further define the market landscape in the coming years.

  13. Image Recognition Market Report

    • marketreportanalytics.com
    doc, pdf, ppt
    Updated Mar 18, 2025
    Cite
    Market Report Analytics (2025). Image Recognition Market Report [Dataset]. https://www.marketreportanalytics.com/reports/image-recognition-market-10656
    Available download formats: doc, pdf, ppt
    Dataset updated
    Mar 18, 2025
    Dataset authored and provided by
    Market Report Analytics
    License

    https://www.marketreportanalytics.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The image recognition market is experiencing robust growth, projected to reach a value of $52.77 billion in 2025, expanding at a Compound Annual Growth Rate (CAGR) of 25.49%. This significant expansion is driven by several key factors. The increasing adoption of artificial intelligence (AI) and machine learning (ML) across diverse sectors fuels demand for sophisticated image recognition technologies. The rise of e-commerce and the need for efficient product categorization and search functionalities are major contributors. Furthermore, advancements in computer vision algorithms and the availability of powerful, cost-effective processing capabilities are enabling wider deployment across industries. The media and entertainment sector leverages image recognition for content analysis, personalization, and copyright protection. Retail and e-commerce utilize it for visual search, inventory management, and personalized recommendations. The BFSI sector employs it for fraud detection and enhanced security measures, while IT and telecom leverage it for network optimization and customer service improvements. Looking ahead, several trends will shape the market's trajectory. The increasing integration of image recognition into Internet of Things (IoT) devices will lead to a proliferation of applications in smart homes, smart cities, and industrial automation. The development of more accurate and efficient algorithms, particularly for handling complex images and diverse lighting conditions, will further expand the market's potential. However, challenges remain, including concerns around data privacy and security, the need for robust data annotation for model training, and the potential for algorithmic bias. Overcoming these hurdles will be crucial for continued market expansion. The competitive landscape is dynamic, with established technology companies and specialized startups vying for market share. Strategic partnerships, acquisitions, and continuous innovation in algorithm development and hardware capabilities will be key to success in this rapidly evolving market.

  14. Artificial Intelligence (AI) Training Dataset Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Jun 30, 2025
    Cite
    Growth Market Reports (2025). Artificial Intelligence (AI) Training Dataset Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/artificial-intelligence-training-dataset-market-global-industry-analysis
    Available download formats: pptx, csv, pdf
    Dataset updated
    Jun 30, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Artificial Intelligence (AI) Training Dataset Market Outlook



    According to our latest research, the global Artificial Intelligence (AI) Training Dataset market size reached USD 3.15 billion in 2024, reflecting robust industry momentum. The market is expanding at a notable CAGR of 20.8% and is forecasted to attain USD 20.92 billion by 2033. This impressive growth is primarily attributed to the surging demand for high-quality, annotated datasets to fuel machine learning and deep learning models across diverse industry verticals. The proliferation of AI-driven applications, coupled with rapid advancements in data labeling technologies, is further accelerating the adoption and expansion of the AI training dataset market globally.




    One of the most significant growth factors propelling the AI training dataset market is the exponential rise in data-driven AI applications across industries such as healthcare, automotive, retail, and finance. As organizations increasingly rely on AI-powered solutions for automation, predictive analytics, and personalized customer experiences, the need for large, diverse, and accurately labeled datasets has become critical. Enhanced data annotation techniques, including manual, semi-automated, and fully automated methods, are enabling organizations to generate high-quality datasets at scale, which is essential for training sophisticated AI models. The integration of AI in edge devices, smart sensors, and IoT platforms is further amplifying the demand for specialized datasets tailored for unique use cases, thereby fueling market growth.




    Another key driver is the ongoing innovation in machine learning and deep learning algorithms, which require vast and varied training data to achieve optimal performance. The increasing complexity of AI models, especially in areas such as computer vision, natural language processing, and autonomous systems, necessitates the availability of comprehensive datasets that accurately represent real-world scenarios. Companies are investing heavily in data collection, annotation, and curation services to ensure their AI solutions can generalize effectively and deliver reliable outcomes. Additionally, the rise of synthetic data generation and data augmentation techniques is helping address challenges related to data scarcity, privacy, and bias, further supporting the expansion of the AI training dataset market.




    The market is also benefiting from the growing emphasis on ethical AI and regulatory compliance, particularly in data-sensitive sectors like healthcare, finance, and government. Organizations are prioritizing the use of high-quality, unbiased, and diverse datasets to mitigate algorithmic bias and ensure transparency in AI decision-making processes. This focus on responsible AI development is driving demand for curated datasets that adhere to strict quality and privacy standards. Moreover, the emergence of data marketplaces and collaborative data-sharing initiatives is making it easier for organizations to access and exchange valuable training data, fostering innovation and accelerating AI adoption across multiple domains.




    From a regional perspective, North America currently dominates the AI training dataset market, accounting for the largest revenue share in 2024, driven by significant investments in AI research, a mature technology ecosystem, and the presence of leading AI companies and data annotation service providers. Europe and Asia Pacific are also witnessing rapid growth, with increasing government support for AI initiatives, expanding digital infrastructure, and a rising number of AI startups. While North America sets the pace in terms of technological innovation, Asia Pacific is expected to exhibit the highest CAGR during the forecast period, fueled by the digital transformation of emerging economies and the proliferation of AI applications across various industry sectors.





    Data Type Analysis



    The AI training dataset market is segmented by data type into Text, Image/Video, Audio, and Others, each playing a crucial role in powering different AI applications. Text da

  15. US Deep Learning Market Analysis, Size, and Forecast 2025-2029

    • technavio.com
    Updated Mar 24, 2017
    Cite
    Technavio (2017). US Deep Learning Market Analysis, Size, and Forecast 2025-2029 [Dataset]. https://www.technavio.com/report/us-deep-learning-market-industry-analysis
    Dataset updated
    Mar 24, 2017
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2021 - 2025
    Area covered
    United States
    Description


    US Deep Learning Market Size 2025-2029

    The deep learning market size in US is forecast to increase by USD 5.02 billion at a CAGR of 30.1% between 2024 and 2029.

    The deep learning market is experiencing robust growth, driven by the increasing adoption of artificial intelligence (AI) in various industries for advanced solutioning. This trend is fueled by the availability of vast amounts of data, which is a key requirement for deep learning algorithms to function effectively. Industry-specific solutions are gaining traction, as businesses seek to leverage deep learning for specific use cases such as image and speech recognition, fraud detection, and predictive maintenance. Alongside, intuitive data visualization tools are simplifying complex neural network outputs, helping stakeholders understand and validate insights. 
    
    
    However, challenges remain, including the need for powerful computing resources, data privacy concerns, and the high cost of implementing and maintaining deep learning systems. Despite these hurdles, the market's potential for innovation and disruption is immense, making it an exciting space for businesses to explore further. Semi-supervised learning, data labeling, and data cleaning facilitate efficient training of deep learning models. Cloud analytics is another significant trend, as companies seek to leverage cloud computing for cost savings and scalability. 
    

    What will be the Size of the market During the Forecast Period?

    Request Free Sample

    Deep learning, a subset of machine learning, continues to shape industries by enabling advanced applications such as image and speech recognition, text generation, and pattern recognition. Reinforcement learning, a type of deep learning, gains traction, with deep reinforcement learning leading the charge. Anomaly detection, a crucial application of unsupervised learning, safeguards systems against security vulnerabilities. Ethical implications and fairness considerations are increasingly important in deep learning, with emphasis on explainable AI and model interpretability. Graph neural networks and attention mechanisms enhance data preprocessing for sequential data modeling and object detection. Time series forecasting and dataset creation further expand deep learning's reach, while privacy preservation and bias mitigation ensure responsible use.

    In summary, deep learning's market dynamics reflect a constant pursuit of innovation, efficiency, and ethical considerations. The Deep Learning Market in the US is flourishing as organizations embrace intelligent systems powered by supervised learning and emerging self-supervised learning techniques. These methods refine predictive capabilities and reduce reliance on labeled data, boosting scalability. BFSI firms utilize AI image recognition for various applications, including personalizing customer communication, maintaining a competitive edge, and automating repetitive tasks to boost productivity. Sophisticated feature extraction algorithms now enable models to isolate patterns with high precision, particularly in applications such as image classification for healthcare, security, and retail.

    How is this market segmented and which is the largest segment?

    The market research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

    Application
    
      Image recognition
      Voice recognition
      Video surveillance and diagnostics
      Data mining
    
    
    Type
    
      Software
      Services
      Hardware
    
    
    End-user
    
      Security
      Automotive
      Healthcare
      Retail and commerce
      Others
    
    
    Geography
    
      North America
    
        US
    

    By Application Insights

    The Image recognition segment is estimated to witness significant growth during the forecast period. In the realm of artificial intelligence (AI) and machine learning, image recognition, a subset of computer vision, is gaining significant traction. This technology utilizes neural networks, deep learning models, and various machine learning algorithms to decipher visual data from images and videos. Image recognition is instrumental in numerous applications, including visual search, product recommendations, and inventory management. Consumers can take photographs of products to discover similar items, enhancing the online shopping experience. In the automotive sector, image recognition is indispensable for advanced driver assistance systems (ADAS) and autonomous vehicles, enabling the identification of pedestrians, other vehicles, road signs, and lane markings.

    Furthermore, image recognition plays a pivotal role in augmented reality (AR) and virtual reality (VR) applications, where it tracks physical objects and overlays digital content onto real-world scenarios. The model training process involves the backpropagation algorithm, which calculates

  16. Visual Question Answering Technology Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Mar 15, 2025
    Cite
    Archive Market Research (2025). Visual Question Answering Technology Report [Dataset]. https://www.archivemarketresearch.com/reports/visual-question-answering-technology-58313
    Available download formats: doc, ppt, pdf
    Dataset updated
    Mar 15, 2025
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Visual Question Answering (VQA) technology market is experiencing robust growth, driven by increasing demand for advanced image analysis and AI-powered solutions across diverse industries. The market, estimated at $2 billion in 2025, is projected to expand at a Compound Annual Growth Rate (CAGR) of 25% from 2025 to 2033. This significant growth is fueled by several key factors. The proliferation of big data and the advancements in deep learning algorithms are enabling more accurate and efficient VQA systems. Furthermore, the rising adoption of VQA in sectors such as healthcare (for medical image analysis), retail (for enhanced customer experience), and autonomous vehicles (for scene understanding) is significantly boosting market expansion. The increasing availability of powerful cloud computing resources further facilitates the development and deployment of complex VQA models. While challenges such as data bias and the need for robust annotation techniques remain, the overall market outlook for VQA technology is extremely positive. Segmentation analysis reveals strong growth across various application areas. The software industry currently leads in VQA adoption, followed by the computer and electronics industries. Within the technology itself, image classification and image identification are the dominant segments, indicating a strong focus on practical applications. Geographically, North America and Europe currently hold the largest market shares, but the Asia-Pacific region is expected to witness substantial growth in the coming years, driven by increasing investments in AI and technological advancements in countries like China and India. Key players like Toshiba Corporation, Amazon Science, and Cognex are actively contributing to market growth through continuous innovation and strategic partnerships. The competitive landscape is dynamic, with both established tech giants and emerging startups vying for market share. The long-term outlook suggests that VQA technology will continue to be a critical component of various emerging technologies and will play a pivotal role in shaping the future of artificial intelligence.

  17. Distribution of images over classes in the Children Trajectory dataset.

    • plos.figshare.com
    xls
    Updated Jun 21, 2023
    Cite
    Shady Elbassuoni; Hala Ghattas; Jalila El Ati; Yorgo Zoughby; Aline Semaan; Christelle Akl; Tarek Trabelsi; Reem Talhouk; Houda Ben Gharbia; Zoulfikar Shmayssani; Aya Mourad (2023). Distribution of images over classes in the Children Trajectory dataset. [Dataset]. http://doi.org/10.1371/journal.pdig.0000211.t002
    Available download formats: xls
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    PLOS Digital Health
    Authors
    Shady Elbassuoni; Hala Ghattas; Jalila El Ati; Yorgo Zoughby; Aline Semaan; Christelle Akl; Tarek Trabelsi; Reem Talhouk; Houda Ben Gharbia; Zoulfikar Shmayssani; Aya Mourad
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Distribution of images over classes in the Children Trajectory dataset.

  18. Performance of DAPPER framework for VGG backend network, and classifier heads (FCH, SVM, RF) on KIMIA24 dataset

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Andrea Bizzego; Nicole Bussola; Marco Chierici; Valerio Maggio; Margherita Francescatto; Luca Cima; Marco Cristoforetti; Giuseppe Jurman; Cesare Furlanello (2023). Performance of DAPPER framework for VGG backend network, and classifier heads (FCH, SVM, RF) on KIMIA24 dataset. [Dataset]. http://doi.org/10.1371/journal.pcbi.1006269.t006
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Andrea Bizzego; Nicole Bussola; Marco Chierici; Valerio Maggio; Margherita Francescatto; Luca Cima; Marco Cristoforetti; Giuseppe Jurman; Cesare Furlanello
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The average cross validation MCC (K24-MCCt), and ACC (K24-ACCt) with 95% CI, as well as MCC (K24-MCCv), and ACC (K24-ACCv) on external validation set are reported.

  19. Investigating human repeatability of a computer vision based task to identify meristems on a potato plant (Solanum tuberosum)

    • search.dataone.org
    • datadryad.org
    • +1 more
    Updated May 16, 2025
    Cite
    Edwin W. Harris; Georgina A. Wager; Matt Butler; Joseph M. Mhango; James M. Monaghan; Richard Green (2025). Investigating human repeatability of a computer vision based task to identify meristems on a potato plant (Solanum tuberosum) [Dataset]. http://doi.org/10.5061/dryad.2rbnzs7pz
    Dataset updated
    May 16, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Edwin W. Harris; Georgina A. Wager; Matt Butler; Joseph M. Mhango; James M. Monaghan; Richard Green
    Time period covered
    Jan 8, 2022
    Description

    Labelled training data in artificial intelligence (AI) is used to teach so-called 'supervised learning models'. However, such data may contain error or bias, which can impact model prediction accuracy. Thus, obtaining accurate training data is of high importance. In applications of AI, such as in classification and detection problems, raw training data is not always made available in published research. Likewise, the process of obtaining labelled data is not always documented well enough to enable reproducibility. This training data set captures a repeatability exercise in AI training data collection for a task that is difficult for humans to perform, delineating a bounding box in a two-dimensional image of a growing apical meristem in potato plants.
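
    One common way to quantify repeatability for a bounding-box task like this is intersection-over-union (IoU) between two annotators' boxes; a minimal sketch (the box format is an assumption, not necessarily how this dataset stores its labels):

    def iou(box_a, box_b):
        """Boxes as (x_min, y_min, x_max, y_max) in pixel coordinates."""
        ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        if inter == 0:
            return 0.0
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        return inter / (area_a + area_b - inter)

    # Example: two people labelling the same meristem on one image.
    print(iou((120, 80, 180, 140), (125, 85, 190, 150)))  # about 0.63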

  20. Multimodal AI Model Market Analysis, Size, and Forecast 2025-2029: North...

    • technavio.com
    Updated Jul 5, 2025
    Cite
    Technavio (2025). Multimodal AI Model Market Analysis, Size, and Forecast 2025-2029: North America (US and Canada), Europe (France, Germany, Italy, and UK), APAC (China, India, Japan, and South Korea), and Rest of World (ROW) [Dataset]. https://www.technavio.com/report/multimodal-ai-model-market-industry-analysis
    Dataset updated
    Jul 5, 2025
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2021 - 2025
    Area covered
    Canada, Germany, United States, Global
    Description


    Multimodal AI Model Market Size 2025-2029

    The multimodal AI model market size is forecast to increase by USD 4.23 billion at a CAGR of 34.8% between 2024 and 2029.

    The market is experiencing significant growth due to the surging demand for enhanced contextual understanding and automation. Companies are increasingly investing in multimodal artificial intelligence models to support human-machine interaction through various modes such as speech, text, and visual data. This shift toward natively multimodal and real-time interactive systems is transforming industries, from customer service and healthcare to education and entertainment. However, the market faces challenges that require strategic navigation. Prohibitive computational costs and resource scarcity pose significant obstacles to widespread adoption.
    The complexity of managing multiple data integration modalities and ensuring seamless integration adds to the challenges. To capitalize on the market opportunities and navigate these challenges effectively, companies must focus on optimizing computational resources and developing efficient multimodal AI models. By addressing these challenges, organizations can unlock the full potential of multimodal AI models to enhance user experiences, streamline operations, and drive innovation.
    

    What will be the Size of the Multimodal AI Model Market during the forecast period?

    Explore in-depth regional segment analysis with market size data - historical 2019-2023 and forecasts 2025-2029 - in the full report.
    Request Free Sample

    In the dynamic market, human-in-the-loop systems play a crucial role in error analysis, ensuring the accuracy of models through feedback loops and debugging techniques. API integration methods connect application programming interfaces to continuous integration and deployment processes, enhancing model performance and efficiency. Dataset bias detection and computational complexity are critical factors in model development, requiring careful performance benchmarking and user interface design. Ethical considerations and security protocols are integral to model deployment strategies, with model versioning and model performance metrics essential for tracking progress.

    Cloud computing platforms facilitate model training and deployment, while hardware acceleration and runtime optimization optimize computational resources. Deep learning frameworks employ feature extraction techniques, and software development kits streamline development processes. Model monitoring and evaluation metrics provide valuable insights into model behavior, with a data preprocessing pipeline and data annotation tools ensuring high-quality training data. Memory management and model deployment strategies further optimize model performance, making the market a vibrant and evolving landscape for US businesses.

    How is this Multimodal AI Model Industry segmented?

    The multimodal AI model industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

    End-user
    
      Finance and BFSI
      Healthcare
      Media and entertainment
      Automotive and transportation
      Education
    
    
    Deployment
    
      Cloud-based
      On premises
    
    
    Business Segment
    
      Large enterprises
      SMEs
    
    
    Technology
    
      Image
      Text
      Video and audio
      Speech and voice
    
    
    Geography
    
      North America
    
        US
        Canada
    
    
      Europe
    
        France
        Germany
        Italy
        UK
    
    
      APAC
    
        China
        India
        Japan
        South Korea
    
    
      Rest of World (ROW)
    

    By End-user Insights

    The Finance and BFSI segment is estimated to witness significant growth during the forecast period. In the financial services industry, multimodal AI models are becoming essential tools for navigating the intricate terrain of risk management, regulation, and evolving customer expectations. The sector's reliance on a multitude of data sources, including quantitative market data, textual news reports, legal documents, audio from customer interactions, and satellite imagery, makes it an optimal domain for multimodal technology application. One of the most significant applications of multimodal AI models is in the realm of risk management and fraud detection. These advanced systems can analyze transactions beyond their numerical value, considering the context of a customer's historical behavior, the text of a contemporaneous support chat, and location data from their device.

    By employing techniques such as embedding vectors, bias mitigation methods, and zero-shot learning in computer vision, neural network architecture, and generative adversarial networks, these models can achieve high levels of accuracy and robustness in real-time. Attention mechanisms and transformer networks enable contextual understanding, while large language models and few-shot learning
