100+ datasets found
  1. Opinion on mitigating AI data bias in healthcare worldwide 2024

    • statista.com
    Updated Jul 18, 2025
    Cite
    Statista (2025). Opinion on mitigating AI data bias in healthcare worldwide 2024 [Dataset]. https://www.statista.com/statistics/1559311/ways-to-mitigate-ai-bias-in-healthcare-worldwide/
    Explore at:
    Dataset updated
    Jul 18, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Dec 2023 - Mar 2024
    Area covered
    Worldwide
    Description

    According to a global survey of healthcare leaders carried out in 2024, almost half of respondents believed that making AI more transparent and interpretable would mitigate the risk of data bias in AI applications for healthcare. Furthermore, ** percent of healthcare leaders thought there should be continuous training and education in AI.

  2. Bias in Advertising Data

    • kaggle.com
    zip
    Updated Apr 6, 2024
    Cite
    Bahraleloom Mahjoub Alsadeg Abdalrahem (2024). Bias in Advertising Data [Dataset]. https://www.kaggle.com/datasets/bahraleloom/bias-in-advertising-data
    Explore at:
    Available download formats: zip (18491738 bytes)
    Dataset updated
    Apr 6, 2024
    Authors
    Bahraleloom Mahjoub Alsadeg Abdalrahem
    License

    CDLA Permissive 1.0: https://cdla.io/permissive-1-0/

    Description

    To demonstrate the discovery, measurement, and mitigation of bias in advertising, we provide a dataset that contains synthetically generated data for users who were shown a certain advertisement (ad). Each instance of the dataset is specific to a user and has feature attributes such as gender, age, income, political/religious affiliation, parental status, home ownership, area (rural/urban), and education status. In addition to the features, we also provide information on whether users actually clicked on or were predicted to click on the ad. Clicking on the ad is known as conversion, and the three outcome variables included are: (1) the predicted probability of conversion, (2) the predicted conversion (binary 0/1), obtained by thresholding the predicted probability, and (3) the true conversion (binary 0/1), indicating whether the user actually clicked on the ad.
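    As a quick illustration of how the predicted and true conversion labels support bias measurement, the sketch below computes group-wise selection rates and a disparate-impact ratio by gender. It is a minimal sketch only: the file name and column names (gender, pred_conversion, true_conversion) are assumptions, so check the actual Kaggle files before running.

```python
import pandas as pd

# Hypothetical file and column names; verify against the actual Kaggle files.
df = pd.read_csv("bias_in_advertising.csv")

# Group-wise selection rates for the model's predicted conversions.
pred_rates = df.groupby("gender")["pred_conversion"].mean()
print("Predicted conversion rate by gender:\n", pred_rates)

# Disparate impact ratio: lowest group rate divided by highest group rate.
# Values well below 1.0 suggest the model favours one group over another.
print("Disparate impact (predicted):", round(pred_rates.min() / pred_rates.max(), 3))

# The same ratio on true conversions shows whether the disparity is already
# present in the data or is amplified by the model.
true_rates = df.groupby("gender")["true_conversion"].mean()
print("Disparate impact (true):", round(true_rates.min() / true_rates.max(), 3))
```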

  3. bias-shades

    • huggingface.co
    Updated Feb 22, 2023
    Cite
    BigScience Catalogue Data (2023). bias-shades [Dataset]. https://huggingface.co/datasets/bigscience-catalogue-data/bias-shades
    Explore at:
    Dataset updated
    Feb 22, 2023
    Dataset authored and provided by
    BigScience Catalogue Data
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This is a preliminary version of the bias SHADES dataset for evaluating language models (LMs) for social biases.

  4. data bias corr

    • kaggle.com
    zip
    Updated Mar 11, 2022
    Cite
    tyur muthia (2022). data bias corr [Dataset]. https://www.kaggle.com/datasets/tyurmuthia/data-bias-corr
    Explore at:
    Available download formats: zip (4119 bytes)
    Dataset updated
    Mar 11, 2022
    Authors
    tyur muthia
    Description

    Dataset

    This dataset was created by tyur muthia

    Contents

  5. Data_Sheet_1_Gender Bias in Artificial Intelligence: Severity Prediction at...

    • frontiersin.figshare.com
    docx
    Updated May 30, 2023
    Cite
    Heewon Chung; Chul Park; Wu Seong Kang; Jinseok Lee (2023). Data_Sheet_1_Gender Bias in Artificial Intelligence: Severity Prediction at an Early Stage of COVID-19.docx [Dataset]. http://doi.org/10.3389/fphys.2021.778720.s001
    Explore at:
    Available download formats: docx
    Dataset updated
    May 30, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Heewon Chung; Chul Park; Wu Seong Kang; Jinseok Lee
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Artificial intelligence (AI) technologies have been applied in various medical domains to predict patient outcomes with high accuracy. As AI becomes more widely adopted, the problem of model bias is increasingly apparent. In this study, we investigate the model bias that can occur when training a model using datasets for only one particular gender and aim to present new insights into the bias issue. For the investigation, we considered an AI model that predicts severity at an early stage based on the medical records of coronavirus disease (COVID-19) patients. For 5,601 confirmed COVID-19 patients, we used 37 medical records, namely, basic patient information, physical index, initial examination findings, clinical findings, comorbidity diseases, and general blood test results at an early stage. To investigate the gender-based AI model bias, we trained and evaluated two separate models—one that was trained using only the male group, and the other using only the female group. When the model trained by the male-group data was applied to the female testing data, the overall accuracy decreased—sensitivity from 0.93 to 0.86, specificity from 0.92 to 0.86, accuracy from 0.92 to 0.86, balanced accuracy from 0.93 to 0.86, and area under the curve (AUC) from 0.97 to 0.94. Similarly, when the model trained by the female-group data was applied to the male testing data, once again, the overall accuracy decreased—sensitivity from 0.97 to 0.90, specificity from 0.96 to 0.91, accuracy from 0.96 to 0.91, balanced accuracy from 0.96 to 0.90, and AUC from 0.97 to 0.95. Furthermore, when we evaluated each gender-dependent model with the test data from the same gender used for training, the resultant accuracy was also lower than that from the unbiased model.
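    The cross-group evaluation protocol described above can be reproduced in outline with standard tooling. The sketch below is an assumption-laden illustration rather than the authors' pipeline: it assumes a CSV with a sex column, a binary severe outcome, and numeric early-stage features, and it substitutes a plain logistic regression for the study's model.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Illustrative schema: a 'sex' column, a binary 'severe' outcome, and the
# remaining columns as numeric early-stage medical features.
df = pd.read_csv("covid_severity.csv")
features = [c for c in df.columns if c not in ("sex", "severe")]

def split_group(frame, group):
    sub = frame[frame["sex"] == group]
    return train_test_split(sub[features], sub["severe"],
                            test_size=0.2, random_state=0, stratify=sub["severe"])

X_tr_m, X_te_m, y_tr_m, y_te_m = split_group(df, "male")
X_tr_f, X_te_f, y_tr_f, y_te_f = split_group(df, "female")

# Train on one group only, mirroring the study design.
model_m = LogisticRegression(max_iter=1000).fit(X_tr_m, y_tr_m)

def report(model, X_te, y_te, label):
    pred = model.predict(X_te)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    sens = recall_score(y_te, pred)                # sensitivity (recall on positives)
    spec = recall_score(y_te, pred, pos_label=0)   # specificity (recall on negatives)
    print(f"{label}: AUC={auc:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")

report(model_m, X_te_m, y_te_m, "male-trained -> male test")
report(model_m, X_te_f, y_te_f, "male-trained -> female test")  # expect the drop reported above
```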

  6. Replication Data for: Cognitive Bias Heterogeneity

    • dataverse.tdl.org
    Updated Aug 15, 2025
    Cite
    Molly McNamara; Molly McNamara (2025). Replication Data for: Cognitive Bias Heterogeneity [Dataset]. http://doi.org/10.18738/T8/754FZT
    Explore at:
    Available download formats: text/x-r-notebook (12370), text/x-r-notebook (15773), application/x-rlang-transport (20685), text/x-r-notebook (20656)
    Dataset updated
    Aug 15, 2025
    Dataset provided by
    Texas Data Repository
    Authors
    Molly McNamara; Molly McNamara
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This data and code can be used to replicate the main analysis for "Who Exhibits Cognitive Biases? Mapping Heterogeneity in Attention, Interpretation, and Rumination in Depression." Of note: to protect participants against re-identification, consistent with best practices, we have removed the zip code variable and binned age. The analysis code may need to be adjusted slightly to account for this, and the results may vary slightly from the ones in the manuscript as a result.

  7. Data bias

    • kaggle.com
    zip
    Updated Mar 11, 2022
    Cite
    tyur muthia (2022). Data bias [Dataset]. https://www.kaggle.com/datasets/tyurmuthia/data-bias
    Explore at:
    Available download formats: zip (654062 bytes)
    Dataset updated
    Mar 11, 2022
    Authors
    tyur muthia
    Description

    Dataset

    This dataset was created by tyur muthia

    Contents

  8. Data and Code for: Confidence, Self-Selection and Bias in the Aggregate

    • openicpsr.org
    delimited
    Updated Mar 2, 2023
    Cite
    Benjamin Enke; Thomas Graeber; Ryan Oprea (2023). Data and Code for: Confidence, Self-Selection and Bias in the Aggregate [Dataset]. http://doi.org/10.3886/E185741V1
    Explore at:
    Available download formats: delimited
    Dataset updated
    Mar 2, 2023
    Dataset provided by
    American Economic Association (http://www.aeaweb.org/)
    Authors
    Benjamin Enke; Thomas Graeber; Ryan Oprea
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The influence of behavioral biases on aggregate outcomes depends in part on self-selection: whether rational people opt more strongly into aggregate interactions than biased individuals. In betting market, auction and committee experiments, we document that some errors are strongly reduced through self-selection, while others are not affected at all or even amplified. A large part of this variation is explained by differences in the relationship between confidence and performance. In some tasks, they are positively correlated, such that self-selection attenuates errors. In other tasks, rational and biased people are equally confident, such that self-selection has no effects on aggregate quantities.

  9. Replication data for: Selection Bias in Comparative Research: The Case of...

    • dataverse.harvard.edu
    Updated Mar 8, 2010
    Cite
    Simon Hug (2010). Replication data for: Selection Bias in Comparative Research: The Case of Incomplete Data Sets [Dataset]. http://doi.org/10.7910/DVN/QO28VG
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Mar 8, 2010
    Dataset provided by
    Harvard Dataverse
    Authors
    Simon Hug
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Selection bias is an important but often neglected problem in comparative research. While comparative case studies pay some attention to this problem, this is less the case in broader cross-national studies, where this problem may appear through the way the data used are generated. The article discusses three examples: studies of the success of newly formed political parties, research on protest events, and recent work on ethnic conflict. In all cases the data at hand are likely to be afflicted by selection bias. Failing to take into consideration this problem leads to serious biases in the estimation of simple relationships. Empirical examples illustrate a possible solution (a variation of a Tobit model) to the problems in these cases. The article also discusses results of Monte Carlo simulations, illustrating under what conditions the proposed estimation procedures lead to improved results.
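    The proposed fix is described as a variation of a Tobit model. As a rough illustration of that family of estimators (not the article's exact specification), the sketch below fits a basic Tobit model by maximum likelihood on simulated left-censored data, where censoring stands in for events that never enter the dataset.

```python
import numpy as np
from scipy import optimize, stats

def tobit_negloglik(params, X, y, lower=0.0):
    """Negative log-likelihood of a Tobit model with left-censoring at `lower`."""
    beta, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)
    xb = X @ beta
    censored = y <= lower
    ll_unc = stats.norm.logpdf(y[~censored], loc=xb[~censored], scale=sigma)  # observed values
    ll_cen = stats.norm.logcdf(lower, loc=xb[censored], scale=sigma)          # mass below the cutoff
    return -(ll_unc.sum() + ll_cen.sum())

# Simulated example: the latent outcome is only observed above zero, mimicking
# a data-generation process that never records the "missing" cases.
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.0])
y = np.clip(X @ beta_true + rng.normal(size=n), 0.0, None)

res = optimize.minimize(tobit_negloglik, x0=np.zeros(X.shape[1] + 1), args=(X, y), method="BFGS")
print("beta estimates:", res.x[:-1], "sigma:", np.exp(res.x[-1]))
```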

  10. news-bias-full-data

    • huggingface.co
    Updated Oct 25, 2023
    Cite
    News Media Biases (2023). news-bias-full-data [Dataset]. https://huggingface.co/datasets/newsmediabias/news-bias-full-data
    Explore at:
    Dataset updated
    Oct 25, 2023
    Dataset authored and provided by
    News Media Biases
    Description

    **Please access the latest version of the data here: https://huggingface.co/datasets/shainar/BEAD **

    Email shaina.raza@torontomu.ca regarding usage of the data.

      Please cite us if you use it
    

    @article{raza2024beads,
      title={BEADs: Bias Evaluation Across Domains},
      author={Raza, Shaina and Rahman, Mizanur and Zhang, Michael R},
      journal={arXiv preprint arXiv:2406.04220},
      year={2024}
    }

      license: cc-by-nc-4.0
    

    Language: en. Pretty name: Navigating News… See the full description on the dataset page: https://huggingface.co/datasets/newsmediabias/news-bias-full-data.

  11. Data Sheet 1_Biases in AI: acknowledging and addressing the inevitable...

    • frontiersin.figshare.com
    pdf
    Updated Aug 20, 2025
    Cite
    Bjørn Hofmann (2025). Data Sheet 1_Biases in AI: acknowledging and addressing the inevitable ethical issues.pdf [Dataset]. http://doi.org/10.3389/fdgth.2025.1614105.s001
    Explore at:
    Available download formats: pdf
    Dataset updated
    Aug 20, 2025
    Dataset provided by
    Frontiers
    Authors
    Bjørn Hofmann
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Biases in artificial intelligence (AI) systems pose a range of ethical issues. The myriads of biases in AI systems are briefly reviewed and divided in three main categories: input bias, system bias, and application bias. These biases pose a series of basic ethical challenges: injustice, bad output/outcome, loss of autonomy, transformation of basic concepts and values, and erosion of accountability. A review of the many ways to identify, measure, and mitigate these biases reveals commendable efforts to avoid or reduce bias; however, it also highlights the persistence of unresolved biases. Residual and undetected biases present epistemic challenges with substantial ethical implications. The article further investigates whether the general principles, checklists, guidelines, frameworks, or regulations of AI ethics could address the identified ethical issues with bias. Unfortunately, the depth and diversity of these challenges often exceed the capabilities of existing approaches. Consequently, the article suggests that we must acknowledge and accept some residual ethical issues related to biases in AI systems. By utilizing insights from ethics and moral psychology, we can better navigate this landscape. To maximize the benefits and minimize the harms of biases in AI, it is imperative to identify and mitigate existing biases and remain transparent about the consequences of those we cannot eliminate. This necessitates close collaboration between scientists and ethicists.

  12. Data from: Towards Identifying and Reducing the Bias of Disease Information...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Jun 9, 2016
    Cite
    Zhang, Hong-Yan; Sui, Daniel Z.; Wang, Jin-Feng; Huang, Ji-Xia; Xu, Cheng-Dong; Hu, Mao-Gui; Huang, Da-Cang (2016). Towards Identifying and Reducing the Bias of Disease Information Extracted from Search Engine Data [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001587385
    Explore at:
    Dataset updated
    Jun 9, 2016
    Authors
    Zhang, Hong-Yan; Sui, Daniel Z.; Wang, Jin-Feng; Huang, Ji-Xia; Xu, Cheng-Dong; Hu, Mao-Gui; Huang, Da-Cang
    Description

    The estimation of disease prevalence in online search engine data (e.g., Google Flu Trends (GFT)) has received a considerable amount of scholarly and public attention in recent years. While the utility of search engine data for disease surveillance has been demonstrated, the scientific community still seeks ways to identify and reduce biases that are embedded in search engine data. The primary goal of this study is to explore new ways of improving the accuracy of disease prevalence estimations by combining traditional disease data with search engine data. A novel method, Biased Sentinel Hospital-based Area Disease Estimation (B-SHADE), is introduced to reduce search engine data bias from a geographical perspective. To monitor search trends on Hand, Foot and Mouth Disease (HFMD) in Guangdong Province, China, we tested our approach by selecting 11 keywords from the Baidu index platform, a Chinese big-data analytics platform similar to GFT. The correlation between the number of real cases and the composite index was 0.8. After decomposing the composite index at the city level, we found that only 10 cities presented a correlation of close to 0.8 or higher. These cities were found to be more stable with respect to search volume, and they were selected as sample cities in order to estimate the search volume of the entire province. After the estimation, the correlation improved from 0.8 to 0.864. After fitting the revised search volume with historical cases, the mean absolute error was 11.19% lower than it was when the original search volume and historical cases were combined. To our knowledge, this is the first study to reduce search engine data bias levels through the use of rigorous spatial sampling strategies.
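    The city-selection step described above (keeping only cities whose search volume tracks reported cases, then scaling up to the province) can be sketched as follows. This is a simplified illustration under assumed file and column names, not the full B-SHADE estimator, which weights sentinel sites rather than using a single share-based rescaling.

```python
import pandas as pd

# Hypothetical weekly HFMD case counts and search-index volumes, one column per city.
cases = pd.read_csv("hfmd_cases_by_city.csv", index_col="week")
search = pd.read_csv("search_index_by_city.csv", index_col="week")

# Correlate each city's search volume with its reported cases.
corr = pd.Series({city: cases[city].corr(search[city]) for city in cases.columns})
print(corr.sort_values(ascending=False))

# Keep only "stable" sentinel cities whose correlation is high (threshold is illustrative).
sentinels = corr[corr >= 0.8].index
print("Sentinel cities:", list(sentinels))

# Crude province-level estimate: scale the sentinels' combined search volume by the
# historical share of provincial cases that those cities account for.
share = cases[sentinels].sum().sum() / cases.sum().sum()
province_estimate = search[sentinels].sum(axis=1) / share
print(province_estimate.head())
```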

  13. Data from: Approach-induced biases in human information sampling

    • data.niaid.nih.gov
    • zenodo.org
    • +1 more
    zip
    Updated Jan 5, 2017
    Cite
    Laurence T. Hunt; Robb B. Rutledge; W. M. Nishantha Malalasekera; Steven W. Kennerley; Raymond J. Dolan (2017). Approach-induced biases in human information sampling [Dataset]. http://doi.org/10.5061/dryad.nb41c
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 5, 2017
    Dataset provided by
    University College London
    Authors
    Laurence T. Hunt; Robb B. Rutledge; W. M. Nishantha Malalasekera; Steven W. Kennerley; Raymond J. Dolan
    License

    CC0 1.0: https://spdx.org/licenses/CC0-1.0.html

    Description

    Information sampling is often biased towards seeking evidence that confirms one’s prior beliefs. Despite such biases being a pervasive feature of human behavior, their underlying causes remain unclear. Many accounts of these biases appeal to limitations of human hypothesis testing and cognition, de facto evoking notions of bounded rationality, but neglect more basic aspects of behavioral control. Here, we investigated a potential role for Pavlovian approach in biasing which information humans will choose to sample. We collected a large novel dataset from 32,445 human subjects who made over 3 million decisions while playing a gambling task designed to measure the latent causes and extent of information-sampling biases. We identified three novel approach-related biases, formalized by comparing subject behavior to a dynamic programming model of optimal information gathering. These biases reflected the amount of information sampled (“positive evidence approach”), the selection of which information to sample (“sampling the favorite”), and the interaction between information sampling and subsequent choices (“rejecting unsampled options”). The prevalence of all three biases was related to a Pavlovian approach-avoid parameter quantified within an entirely independent economic decision task. Our large dataset also revealed that individual differences in the amount of information gathered are a stable trait across multiple gameplays and can be related to demographic measures, including age and educational attainment. As well as revealing limitations in cognitive processing, our findings suggest information sampling biases reflect the expression of primitive, yet potentially ecologically adaptive, behavioral repertoires. One such behavior is sampling from options that will eventually be chosen, even when other sources of information are more pertinent for guiding future action.

  14. Data from: Prolific observer bias in the life sciences: why we need blind...

    • figshare.mq.edu.au
    • datasetcatalog.nlm.nih.gov
    • +4 more
    bin
    Updated Jun 14, 2023
    + more versions
    Cite
    Luke Holman; Megan L. Head; Robert Lanfear; Michael D. Jennions (2023). Data from: Prolific observer bias in the life sciences: why we need blind data recording [Dataset]. http://doi.org/10.5061/dryad.hn40n
    Explore at:
    Available download formats: bin
    Dataset updated
    Jun 14, 2023
    Dataset provided by
    Macquarie University
    Authors
    Luke Holman; Megan L. Head; Robert Lanfear; Michael D. Jennions
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Observer bias and other “experimenter effects” occur when researchers’ expectations influence study outcome. These biases are strongest when researchers expect a particular result, are measuring subjective variables, and have an incentive to produce data that confirm predictions. To minimize bias, it is good practice to work “blind,” meaning that experimenters are unaware of the identity or treatment group of their subjects while conducting research. Here, using text mining and a literature review, we find evidence that blind protocols are uncommon in the life sciences and that nonblind studies tend to report higher effect sizes and more significant p-values. We discuss methods to minimize bias and urge researchers, editors, and peer reviewers to keep blind protocols in mind.

    Usage Notes
    - Evolution literature review data
    - Exact p value dataset
    - journal_categories
    - p values data 24 Sept
    - Proportion of significant p values per paper
    - R script to filter and classify the p value data
    - Quiz answers - guessing effect size from abstracts: the answers provided by the 9 evolutionary biologists to a quiz we designed, which aimed to test whether trained specialists are able to infer the relative size/direction of effect size from a paper's title and abstract.
    - readme: description of the contents of all the other files in this Dryad submission.
    - R script to statistically analyse the p value data: R script detailing the statistical analyses we performed on the p value datasets.
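    One simple reuse of these files is to recompute the headline comparison: do non-blind studies report a higher proportion of significant p-values? A minimal sketch, assuming the extracted p-values have been exported to a single CSV with paper_id, blind, and p_value columns (the actual Dryad files are organised differently):

```python
import pandas as pd

# Assumed flat export: one row per extracted p-value, with the source paper and
# whether that paper reported a blind protocol.
pvals = pd.read_csv("p_values.csv")   # columns: paper_id, blind, p_value

# Proportion of significant p-values per paper, averaged within blind / non-blind papers.
per_paper = (pvals.assign(significant=pvals["p_value"] < 0.05)
                  .groupby(["blind", "paper_id"])["significant"].mean())
print(per_paper.groupby("blind").mean())
```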

  15. News Bias Data

    • kaggle.com
    zip
    Updated Apr 8, 2025
    Cite
    Nitish Kumar Thakur (2025). News Bias Data [Dataset]. https://www.kaggle.com/datasets/nitishxthakur/news-bias-data/data
    Explore at:
    Available download formats: zip (367303570 bytes)
    Dataset updated
    Apr 8, 2025
    Authors
    Nitish Kumar Thakur
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    The prevalence of bias in the news media has become a critical issue, affecting public perception on a range of important topics such as political views, health, insurance, resource distributions, religion, race, age, gender, occupation, and climate change. The media has a moral responsibility to ensure accurate information dissemination and to increase awareness about important issues and the potential risks associated with them. This highlights the need for a solution that can help mitigate against the spread of false or misleading information and restore public trust in the media.

    Data description: This is a dataset for news media bias covering different dimensions of bias: political, hate speech, toxicity, sexism, ageism, gender identity, gender discrimination, race/ethnicity, climate change, occupation, and spirituality, which makes it a unique contribution. The dataset used for this project does not contain any personally identifiable information (PII).

    Data Format: The format of the data is:

    ID: Numeric unique identifier.
    Text: Main content.
    Dimension: Categorical descriptor of the text.
    Biased_Words: List of words considered biased.
    Aspect: Specific topic within the text.
    Label: Neutral, Slightly Biased, or Highly Biased.

    Annotation Scheme: Annotation is based on active learning: Manual Labeling --> Semi-Supervised Learning --> Human Verification (an iterative process).

    Bias Label: Indicate the presence/absence of bias (e.g., no bias, mild, strong).
    Words/Phrases Level Biases: Identify specific biased words/phrases.
    Subjective Bias (Aspect): Capture biases related to content aspects.
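    To give a sense of how the fields above are used in practice, the sketch below loads a hypothetical CSV export of the dataset and summarises the label distribution per dimension. Column names follow the Data Format section; the file name and the on-disk representation of Biased_Words are assumptions.

```python
import ast
import pandas as pd

# Hypothetical CSV export; column names follow the "Data Format" section above.
df = pd.read_csv("news_bias_data.csv")

# Label distribution overall and per bias dimension.
print(df["Label"].value_counts(normalize=True))
print(df.groupby("Dimension")["Label"].value_counts(normalize=True).unstack(fill_value=0))

# Biased_Words may be stored as a list-like string; parse it before token-level analysis.
df["Biased_Words"] = df["Biased_Words"].apply(
    lambda s: ast.literal_eval(s) if isinstance(s, str) and s.startswith("[") else [])
print(df["Biased_Words"].str.len().describe())
```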

  16. Replication Data for: Assessing Political Bias and Value Misalignment in...

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Jun 5, 2024
    Cite
    Fabio Y. S. Motoki; Valdemar Pinho Neto; Victor Rangel (2024). Replication Data for: Assessing Political Bias and Value Misalignment in Generative Artificial Intelligence [Dataset]. http://doi.org/10.7910/DVN/VZRKWP
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Jun 5, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Fabio Y. S. Motoki; Valdemar Pinho Neto; Victor Rangel
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Our analysis reveals a concerning misalignment of values between ChatGPT and the average American. We also show that ChatGPT displays political leanings when generating text and images, but the degree and direction of skew depend on the theme. Notably, ChatGPT repeatedly refused to generate content representing certain mainstream perspectives, citing concerns over misinformation and bias. As generative AI systems like ChatGPT become ubiquitous, such misalignment with societal norms poses risks of distorting public discourse. Without proper safeguards, these systems threaten to exacerbate societal divides and depart from principles that underpin free societies.

  17. Data from: Diversity matters: Robustness of bias measurements in Wikidata

    • data.niaid.nih.gov
    Updated May 1, 2023
    Cite
    Paramita das; Sai Keerthana Karnam; Anirban Panda; Bhanu Prakash Reddy Guda; Soumya Sarkar; Animesh Mukherjee (2023). Diversity matters: Robustness of bias measurements in Wikidata [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7881057
    Explore at:
    Dataset updated
    May 1, 2023
    Dataset provided by
    Indian Institute of Technology Kharagpur
    Microsoft Research
    Carnegie Mellon University
    Authors
    Paramita das; Sai Keerthana Karnam; Anirban Panda; Bhanu Prakash Reddy Guda; Soumya Sarkar; Animesh Mukherjee
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    With the widespread use of knowledge graphs (KG) in various automated AI systems and applications, it is very important to ensure that information retrieval algorithms leveraging them are free from societal biases. Previous works have depicted biases that persist in KGs, as well as employed several metrics for measuring the biases. However, such studies lack the systematic exploration of the sensitivity of the bias measurements, through varying sources of data, or the embedding algorithms used. To address this research gap, in this work, we present a holistic analysis of bias measurement on the knowledge graph. First, we attempt to reveal data biases that surface in Wikidata for thirteen different demographics selected from seven continents. Next, we attempt to unfold the variance in the detection of biases by two different knowledge graph embedding algorithms - TransE and ComplEx. We conduct our extensive experiments on a large number of occupations sampled from the thirteen demographics with respect to the sensitive attribute, i.e., gender. Our results show that the inherent data bias that persists in KG can be altered by specific algorithm bias as incorporated by KG embedding learning algorithms. Further, we show that the choice of the state-of-the-art KG embedding algorithm has a strong impact on the ranking of biased occupations irrespective of gender. We observe that the similarity of the biased occupations across demographics is minimal which reflects the socio-cultural differences around the globe. We believe that this full-scale audit of the bias measurement pipeline will raise awareness among the community while deriving insights related to design choices of data and algorithms both and refrain from the popular dogma of ``one-size-fits-all''.
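    To make the embedding-based measurement concrete, here is a minimal TransE-style probe: it compares how plausible a knowledge-graph link between an occupation and a male versus a female entity looks under trained embeddings. The embedding files, entity and relation names, and the simple score difference are illustrative assumptions; the paper's actual pipeline and metrics (including the ComplEx comparison) are more involved.

```python
import numpy as np

# Assumed pre-trained embeddings saved as .npz archives keyed by entity/relation name;
# the file and key names are illustrative only.
ent = np.load("entity_embeddings.npz")    # e.g. ent["male"], ent["female"], ent["nurse"]
rel = np.load("relation_embeddings.npz")  # e.g. rel["has_occupation"]

def transe_score(h, r, t):
    """TransE plausibility: higher (less negative) means the triple fits better."""
    return -np.linalg.norm(h + r - t)

def gender_gap(occupation):
    """Score difference for (male, has_occupation, occ) vs (female, has_occupation, occ)."""
    s_male = transe_score(ent["male"], rel["has_occupation"], ent[occupation])
    s_female = transe_score(ent["female"], rel["has_occupation"], ent[occupation])
    return s_male - s_female   # > 0: skewed towards men; < 0: skewed towards women

for occ in ["nurse", "engineer", "teacher"]:
    print(occ, gender_gap(occ))
```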

  18. Bias Detection Platform Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Oct 6, 2025
    Cite
    Growth Market Reports (2025). Bias Detection Platform Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/bias-detection-platform-market
    Explore at:
    Available download formats: pptx, pdf, csv
    Dataset updated
    Oct 6, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Bias Detection Platform Market Outlook



    According to our latest research, the global Bias Detection Platform market size reached USD 1.42 billion in 2024, reflecting a surge in demand for advanced, ethical, and transparent decision-making tools across industries. The market is expected to grow at a CAGR of 17.8% during the forecast period, reaching a projected value of USD 6.13 billion by 2033. This robust growth is primarily driven by the increasing adoption of artificial intelligence (AI) and machine learning (ML) technologies, which has highlighted the urgent need for solutions that can identify and mitigate bias in automated systems and data-driven processes. As organizations worldwide strive for fairness, compliance, and inclusivity, bias detection platforms are becoming a cornerstone of responsible digital transformation.




    One of the key growth factors for the Bias Detection Platform market is the rapid integration of AI and ML algorithms into critical business operations. As enterprises leverage these technologies to automate decision-making in areas such as recruitment, financial services, and healthcare, the risk of unintentional bias in algorithms has become a significant concern. Regulatory bodies and industry watchdogs are increasingly mandating transparency and accountability in automated systems, prompting organizations to invest in bias detection platforms to ensure compliance and mitigate reputational risks. Furthermore, the proliferation of big data analytics has amplified the need for robust tools that can scrutinize massive datasets for hidden biases, ensuring that business insights and actions are both accurate and equitable.




    Another major driver fueling market growth is the heightened focus on diversity, equity, and inclusion (DEI) initiatives across both public and private sectors. Organizations are under mounting pressure from stakeholders, including customers, investors, and employees, to demonstrate their commitment to fair and unbiased practices. Bias detection platforms are being deployed to audit hiring processes, marketing campaigns, lending decisions, and other critical workflows, helping organizations identify and rectify discriminatory patterns. The increasing availability of advanced software and services that can seamlessly integrate with existing IT infrastructure is further accelerating adoption, making bias detection accessible to enterprises of all sizes.




    The evolution of regulatory frameworks and ethical standards around AI and data usage is also acting as a catalyst for market expansion. Governments and international bodies are introducing stringent guidelines to govern the ethical use of AI, with a particular emphasis on eliminating bias and ensuring fairness. This regulatory momentum is compelling organizations to adopt proactive measures, including the implementation of bias detection platforms, to avoid legal liabilities and maintain public trust. Additionally, the growing awareness of the social and economic consequences of biased systems is encouraging a broader range of industries to prioritize bias detection as a core component of their risk management and governance strategies.




    From a regional perspective, North America continues to dominate the Bias Detection Platform market, accounting for the largest share of global revenue in 2024. This leadership is attributed to the region’s early adoption of AI technologies, strong regulatory oversight, and a high concentration of technology-driven enterprises. Europe follows closely, benefiting from progressive data protection laws and a robust emphasis on ethical AI. Meanwhile, the Asia Pacific region is emerging as a high-growth market, driven by rapid digitalization, expanding IT infrastructure, and increasing awareness of bias-related challenges in diverse sectors. Latin America and the Middle East & Africa are also witnessing steady growth, supported by rising investments in digital transformation and regulatory advancements.





    Component Analysis



    The Bias Detection Platform market is

  19. Data from: Wide range screening of algorithmic bias in word embedding models...

    • datasetcatalog.nlm.nih.gov
    • data.niaid.nih.gov
    • +1 more
    Updated Apr 7, 2020
    + more versions
    Cite
    Rozado, David (2020). Wide range screening of algorithmic bias in word embedding models using large sentiment lexicons reveals underreported bias types [Dataset]. http://doi.org/10.5061/dryad.rbnzs7h7w
    Explore at:
    Dataset updated
    Apr 7, 2020
    Authors
    Rozado, David
    Description

    Concerns about gender bias in word embedding models have captured substantial attention in the algorithmic bias research literature. Other bias types however have received lesser amounts of scrutiny. This work describes a large-scale analysis of sentiment associations in popular word embedding models along the lines of gender and ethnicity but also along the less frequently studied dimensions of socioeconomic status, age, physical appearance, sexual orientation, religious sentiment and political leanings. Consistent with previous scholarly literature, this work has found systemic bias against given names popular among African-Americans in most embedding models examined. Gender bias in embedding models however appears to be multifaceted and often reversed in polarity to what has been regularly reported. Interestingly, using the common operationalization of the term bias in the fairness literature, novel types of so far unreported bias types in word embedding models have also been identified. Specifically, the popular embedding models analyzed here display negative biases against middle and working-class socioeconomic status, male children, senior citizens, plain physical appearance and intellectual phenomena such as Islamic religious faith, non-religiosity and conservative political orientation. Reasons for the paradoxical underreporting of these bias types in the relevant literature are probably manifold but widely held blind spots when searching for algorithmic bias and a lack of widespread technical jargon to unambiguously describe a variety of algorithmic associations could conceivably be playing a role. The causal origins for the multiplicity of loaded associations attached to distinct demographic groups within embedding models are often unclear but the heterogeneity of said associations and their potential multifactorial roots raises doubts about the validity of grouping them all under the umbrella term bias. Richer and more fine-grained terminology as well as a more comprehensive exploration of the bias landscape could help the fairness epistemic community to characterize and neutralize algorithmic discrimination more efficiently.
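    The sentiment-association screening described above can be approximated with a simple cosine-similarity probe over any pretrained word embeddings. The sketch below is a toy version: the word lists are tiny placeholders and the embedding dictionary is randomly generated so the snippet runs standalone, whereas the study uses large sentiment lexicons, curated name lists, and real embedding models.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def mean_association(targets, attributes, emb):
    """Average cosine similarity between a target word set and an attribute word set."""
    return float(np.mean([cosine(emb[t], emb[a]) for t in targets for a in attributes]))

# Tiny placeholder word lists; the study uses large sentiment lexicons and name lists.
pleasant   = ["joy", "love", "peace", "wonderful"]
unpleasant = ["agony", "terrible", "awful", "failure"]
group_a = ["emily", "matthew"]    # placeholder name sets for two demographic groups
group_b = ["lakisha", "jamal"]

def sentiment_bias(group, emb):
    """Positive values: the group sits closer to pleasant than to unpleasant words."""
    return mean_association(group, pleasant, emb) - mean_association(group, unpleasant, emb)

# Stand-in random embeddings so the snippet runs as-is; substitute vectors from a real
# pretrained model (word2vec, GloVe, fastText, ...) for a meaningful measurement.
rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50) for w in pleasant + unpleasant + group_a + group_b}

print("group_a bias:", sentiment_bias(group_a, emb))
print("group_b bias:", sentiment_bias(group_b, emb))
print("gap:", sentiment_bias(group_a, emb) - sentiment_bias(group_b, emb))
```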

  20. Dutch-Government-Data-for-Bias-detection

    • huggingface.co
    Cite
    Milena, Dutch-Government-Data-for-Bias-detection [Dataset]. https://huggingface.co/datasets/milenamileentje/Dutch-Government-Data-for-Bias-detection
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Authors
    Milena
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Area covered
    Politics of the Netherlands, Netherlands
    Description

    milenamileentje/Dutch-Government-Data-for-Bias-detection dataset hosted on Hugging Face and contributed by the HF Datasets community
