According to a 2024 global survey of healthcare leaders, almost half of respondents believed that making AI more transparent and interpretable would mitigate the risk of data bias in AI applications for healthcare. Furthermore, ** percent of healthcare leaders thought there should be continuous training and education in AI.
Community Data License Agreement Permissive 1.0 (CDLA-Permissive-1.0): https://cdla.io/permissive-1-0/
To demonstrate discovery, measurement, and mitigation of bias in advertising, we provide a dataset that contains synthetic generated data for users who were shown a certain advertisement (ad). Each instance of the dataset is specific to a user and has feature attributes such as gender, age, income, political/religious affiliation, parental status, home ownership, area (rural/urban), and education status. In addition to the features we also provide information on whether users actually clicked on or were predicted to click on the ad. Clicking on the ad is known as conversion, and the three outcome variables included are: (1) The predicted probability of conversion, (2) Predicted conversion (binary 0/1) which is obtained by thresholding the predicted probability, (3) True conversion (binary 0/1) that indicates whether the user actually clicked on the ad.
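As a quick illustration of how such a dataset supports bias measurement, the following hedged sketch computes two simple group-level checks on the outcome variables described above; the column and file names are assumptions for illustration, not the dataset's documented schema.

```python
# Minimal sketch (not the dataset's official tooling): simple group-fairness
# checks on the described outcomes. Column names such as "gender",
# "pred_conversion", and "true_conversion" are assumptions, not confirmed fields.
import pandas as pd

df = pd.read_csv("ad_conversion_synthetic.csv")  # hypothetical file name

# Rate of predicted conversions per gender group (demographic parity check).
pred_rates = df.groupby("gender")["pred_conversion"].mean()
print("Predicted conversion rate by gender:\n", pred_rates)

# True-positive rate per group (equal-opportunity check): among users who
# actually converted, how often did the model predict conversion?
converted = df[df["true_conversion"] == 1]
tpr = converted.groupby("gender")["pred_conversion"].mean()
print("True-positive rate by gender:\n", tpr)

# A large gap between groups on either metric is one signal of model bias.
print("Demographic parity gap:", pred_rates.max() - pred_rates.min())
```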
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This is a preliminary version of the bias SHADES dataset for evaluating LMs for social biases.
This dataset was created by tyur muthia
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Artificial intelligence (AI) technologies have been applied in various medical domains to predict patient outcomes with high accuracy. As AI becomes more widely adopted, the problem of model bias is increasingly apparent. In this study, we investigate the model bias that can occur when training a model using data from only one gender and aim to present new insights into the bias issue. For the investigation, we considered an AI model that predicts severity at an early stage based on the medical records of coronavirus disease (COVID-19) patients. For 5,601 confirmed COVID-19 patients, we used 37 medical-record variables, namely basic patient information, physical indices, initial examination findings, clinical findings, comorbidities, and general blood test results at an early stage. To investigate the gender-based AI model bias, we trained and evaluated two separate models—one trained using only the male group, and the other using only the female group. When the model trained on the male-group data was applied to the female testing data, the overall accuracy decreased—sensitivity from 0.93 to 0.86, specificity from 0.92 to 0.86, accuracy from 0.92 to 0.86, balanced accuracy from 0.93 to 0.86, and area under the curve (AUC) from 0.97 to 0.94. Similarly, when the model trained on the female-group data was applied to the male testing data, the overall accuracy once again decreased—sensitivity from 0.97 to 0.90, specificity from 0.96 to 0.91, accuracy from 0.96 to 0.91, balanced accuracy from 0.96 to 0.90, and AUC from 0.97 to 0.95. Furthermore, when we evaluated each gender-dependent model with test data from the same gender used for training, the resulting accuracy was also lower than that of the unbiased model.
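The cross-gender evaluation protocol described above can be outlined as follows; this is an illustrative sketch only, with placeholder column names and a generic classifier standing in for the authors' model and feature set.

```python
# Illustrative sketch of the cross-gender evaluation protocol described above.
# Column names ("sex", "severe") and the classifier are placeholders; the
# original study used 37 early-stage clinical variables and its own model.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score, roc_auc_score

df = pd.read_csv("covid_records.csv")  # hypothetical file
features = [c for c in df.columns if c not in ("sex", "severe")]

male, female = df[df["sex"] == "M"], df[df["sex"] == "F"]

def train_eval(train_df, test_df):
    """Train on one gender group, evaluate on another."""
    clf = RandomForestClassifier(random_state=0)
    clf.fit(train_df[features], train_df["severe"])
    prob = clf.predict_proba(test_df[features])[:, 1]
    pred = (prob >= 0.5).astype(int)
    return {
        "balanced_accuracy": balanced_accuracy_score(test_df["severe"], pred),
        "auc": roc_auc_score(test_df["severe"], prob),
    }

# Cross-gender performance drops relative to same-gender baselines (which, in
# the study, were computed on held-out splits rather than in-sample).
print("male -> female:", train_eval(male, female))
print("female -> male:", train_eval(female, male))
```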
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This data and code can be used to replicate the main analysis for "Who Exhibits Cognitive Biases? Mapping Heterogeneity in Attention, Interpretation, and Rumination in Depression." Note that, to protect participants from re-identification consistent with best practices, we have removed the zip code variable and binned age. The analysis code may need to be adjusted slightly to account for this, and the results may vary slightly from those in the manuscript as a result.
This dataset was created by tyur muthia
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The influence of behavioral biases on aggregate outcomes depends in part on self-selection: whether rational people opt more strongly into aggregate interactions than biased individuals. In betting market, auction and committee experiments, we document that some errors are strongly reduced through self-selection, while others are not affected at all or even amplified. A large part of this variation is explained by differences in the relationship between confidence and performance. In some tasks, they are positively correlated, such that self-selection attenuates errors. In other tasks, rational and biased people are equally confident, such that self-selection has no effects on aggregate quantities.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Selection bias is an important but often neglected problem in comparative research. While comparative case studies pay some attention to this problem, this is less the case in broader cross-national studies, where this problem may appear through the way the data used are generated. The article discusses three examples: studies of the success of newly formed political parties, research on protest events, and recent work on ethnic conflict. In all cases the data at hand are likely to be afflicted by selection bias. Failing to take into consideration this problem leads to serious biases in the estimation of simple relationships. Empirical examples illustrate a possible solution (a variation of a Tobit model) to the problems in these cases. The article also discusses results of Monte Carlo simulations, illustrating under what conditions the proposed estimation procedures lead to improved results.
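For readers unfamiliar with the approach mentioned above, the following is a minimal sketch of a standard left-censored Tobit model fitted by maximum likelihood; it illustrates the general idea of modeling censored outcomes, not the article's specific variation, and the simulated data are purely illustrative.

```python
# Minimal sketch of a standard Tobit (left-censored at 0) model fit by maximum
# likelihood with scipy. The article proposes a variation of this idea; treat
# this as an illustration of the general approach, not its exact model.
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def tobit_negloglik(params, X, y):
    beta, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)                 # keep sigma positive
    xb = X @ beta
    censored = y <= 0
    ll = np.where(
        censored,
        norm.logcdf(-xb / sigma),                        # censored observations
        norm.logpdf((y - xb) / sigma) - np.log(sigma),   # uncensored observations
    )
    return -ll.sum()

# Simulated example: latent outcome observed only when positive.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
y_latent = X @ np.array([-0.5, 1.0]) + rng.normal(size=500)
y = np.maximum(y_latent, 0.0)

start = np.zeros(X.shape[1] + 1)
fit = minimize(tobit_negloglik, start, args=(X, y), method="BFGS")
print("estimated beta:", fit.x[:-1], "sigma:", np.exp(fit.x[-1]))
```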
**Please access the latest version of the data here: https://huggingface.co/datasets/shainar/BEAD**
Contact shaina.raza@torontomu.ca regarding usage of the data.
Please cite us if you use the dataset:
@article{raza2024beads,
  title={BEADs: Bias Evaluation Across Domains},
  author={Raza, Shaina and Rahman, Mizanur and Zhang, Michael R},
  journal={arXiv preprint arXiv:2406.04220},
  year={2024}
}
License: cc-by-nc-4.0. Language: en. Pretty name: Navigating News… See the full description on the dataset page: https://huggingface.co/datasets/newsmediabias/news-bias-full-data.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Biases in artificial intelligence (AI) systems pose a range of ethical issues. The myriad biases in AI systems are briefly reviewed and divided into three main categories: input bias, system bias, and application bias. These biases pose a series of basic ethical challenges: injustice, bad output/outcome, loss of autonomy, transformation of basic concepts and values, and erosion of accountability. A review of the many ways to identify, measure, and mitigate these biases reveals commendable efforts to avoid or reduce bias; however, it also highlights the persistence of unresolved biases. Residual and undetected biases present epistemic challenges with substantial ethical implications. The article further investigates whether the general principles, checklists, guidelines, frameworks, or regulations of AI ethics could address the identified ethical issues with bias. Unfortunately, the depth and diversity of these challenges often exceed the capabilities of existing approaches. Consequently, the article suggests that we must acknowledge and accept some residual ethical issues related to biases in AI systems. By utilizing insights from ethics and moral psychology, we can better navigate this landscape. To maximize the benefits and minimize the harms of biases in AI, it is imperative to identify and mitigate existing biases and remain transparent about the consequences of those we cannot eliminate. This necessitates close collaboration between scientists and ethicists.
The estimation of disease prevalence from online search engine data (e.g., Google Flu Trends (GFT)) has received a considerable amount of scholarly and public attention in recent years. While the utility of search engine data for disease surveillance has been demonstrated, the scientific community still seeks ways to identify and reduce the biases embedded in search engine data. The primary goal of this study is to explore new ways of improving the accuracy of disease prevalence estimations by combining traditional disease data with search engine data. A novel method, Biased Sentinel Hospital-based Area Disease Estimation (B-SHADE), is introduced to reduce search engine data bias from a geographical perspective. To monitor search trends on Hand, Foot and Mouth Disease (HFMD) in Guangdong Province, China, we tested our approach by selecting 11 keywords from the Baidu index platform, a Chinese big-data analytics service similar to GFT. The correlation between the number of real cases and the composite index was 0.8. After decomposing the composite index at the city level, we found that only 10 cities presented a correlation of close to 0.8 or higher. These cities were found to be more stable with respect to search volume, and they were selected as sample cities in order to estimate the search volume of the entire province. After the estimation, the correlation improved from 0.8 to 0.864. After fitting the revised search volume with historical cases, the mean absolute error was 11.19% lower than it was when the original search volume and historical cases were combined. To our knowledge, this is the first study to reduce search engine data bias levels through the use of rigorous spatial sampling strategies.
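The geographic filtering step described above, keeping only cities whose search index tracks reported cases and rebuilding a provincial index from those sentinel cities, can be sketched roughly as follows. The data layout, file names, and the 0.8 threshold are illustrative assumptions; this is not the full B-SHADE estimator.

```python
# Simplified sketch of the geographic filtering step described above: keep only
# cities whose weekly search index correlates strongly with reported HFMD
# cases, then rebuild a provincial index from those "sentinel" cities.
import pandas as pd

# Hypothetical inputs: rows = weeks, columns = cities.
city_search = pd.read_csv("baidu_index_by_city.csv", index_col="week")
cases = pd.read_csv("hfmd_cases_province.csv", index_col="week")["cases"]

# Correlate each city's search series with the provincial case series.
corr = city_search.corrwith(cases)
sentinel_cities = corr[corr >= 0.8].index
print("Sentinel cities:", list(sentinel_cities))

# Provincial index re-estimated from the stable sentinel cities only.
revised_index = city_search[sentinel_cities].sum(axis=1)
print("Correlation of revised index with cases:", revised_index.corr(cases))
```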
CC0 1.0 Universal (CC0-1.0): https://spdx.org/licenses/CC0-1.0.html
Information sampling is often biased towards seeking evidence that confirms one’s prior beliefs. Despite such biases being a pervasive feature of human behavior, their underlying causes remain unclear. Many accounts of these biases appeal to limitations of human hypothesis testing and cognition, de facto evoking notions of bounded rationality, but neglect more basic aspects of behavioral control. Here, we investigated a potential role for Pavlovian approach in biasing which information humans will choose to sample. We collected a large novel dataset from 32,445 human subjects, making over 3 million decisions, who played a gambling task designed to measure the latent causes and extent of information-sampling biases. We identified three novel approach-related biases, formalized by comparing subject behavior to a dynamic programming model of optimal information gathering. These biases reflected the amount of information sampled (“positive evidence approach”), the selection of which information to sample (“sampling the favorite”), and the interaction between information sampling and subsequent choices (“rejecting unsampled options”). The prevalence of all three biases was related to a Pavlovian approach-avoid parameter quantified within an entirely independent economic decision task. Our large dataset also revealed that individual differences in the amount of information gathered are a stable trait across multiple gameplays and can be related to demographic measures, including age and educational attainment. As well as revealing limitations in cognitive processing, our findings suggest information sampling biases reflect the expression of primitive, yet potentially ecologically adaptive, behavioral repertoires. One such behavior is sampling from options that will eventually be chosen, even when other sources of information are more pertinent for guiding future action.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Observer bias and other “experimenter effects” occur when researchers’ expectations influence study outcome. These biases are strongest when researchers expect a particular result, are measuring subjective variables, and have an incentive to produce data that confirm predictions. To minimize bias, it is good practice to work “blind,” meaning that experimenters are unaware of the identity or treatment group of their subjects while conducting research. Here, using text mining and a literature review, we find evidence that blind protocols are uncommon in the life sciences and that nonblind studies tend to report higher effect sizes and more significant p-values. We discuss methods to minimize bias and urge researchers, editors, and peer reviewers to keep blind protocols in mind.
Usage Notes
Evolution literature review data
Exact p value dataset
journal_categories
p values data 24 Sept
Proportion of significant p values per paper
R script to filter and classify the p value data
Quiz answers - guessing effect size from abstracts: the answers provided by the 9 evolutionary biologists to a quiz we designed, which aimed to test whether trained specialists are able to infer the relative size/direction of effect size from a paper's title and abstract.
readme: description of the contents of all the other files in this Dryad submission.
R script to statistically analyse the p value data: R script detailing the statistical analyses we performed on the p value datasets.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The prevalence of bias in the news media has become a critical issue, affecting public perception on a range of important topics such as political views, health, insurance, resource distribution, religion, race, age, gender, occupation, and climate change. The media has a moral responsibility to ensure accurate information dissemination and to increase awareness about important issues and the potential risks associated with them. This highlights the need for a solution that can help mitigate the spread of false or misleading information and restore public trust in the media.
Data description: This is a dataset for news media bias covering different dimensions of bias: political, hate speech, toxicity, sexism, ageism, gender identity, gender discrimination, race/ethnicity, climate change, occupation, and spirituality, which makes it a unique contribution. The dataset used for this project does not contain any personally identifiable information (PII).
Data Format:
ID: numeric unique identifier.
Text: main content.
Dimension: categorical descriptor of the text.
Biased_Words: list of words considered biased.
Aspect: specific topic within the text.
Label: Neutral, Slightly Biased, or Highly Biased.
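A minimal sketch of one record in this format follows, assuming the field names above map directly onto a Python structure; the example values are invented for illustration.

```python
# Minimal sketch of one record in the described format; field names follow the
# listing above, values are made up for illustration.
from dataclasses import dataclass
from typing import List

@dataclass
class NewsBiasRecord:
    ID: int                  # numeric unique identifier
    Text: str                # main content
    Dimension: str           # categorical descriptor (e.g., "political")
    Biased_Words: List[str]  # words considered biased
    Aspect: str              # specific topic within the text
    Label: str               # "Neutral", "Slightly Biased", or "Highly Biased"

example = NewsBiasRecord(
    ID=1,
    Text="Example sentence with potentially loaded wording.",
    Dimension="political",
    Biased_Words=["loaded"],
    Aspect="elections",
    Label="Slightly Biased",
)
```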
Annotation Scheme: The annotation scheme is based on active learning, an iterative process of Manual Labeling --> Semi-Supervised Learning --> Human Verification.
Bias Label: indicates the presence/absence of bias (e.g., no bias, mild, strong).
Words/Phrases Level Biases: identifies specific biased words/phrases.
Subjective Bias (Aspect): captures biases related to content aspects.
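A hedged sketch of one round of the iterative scheme described above (manual labels, semi-supervised pseudo-labels, then human verification) is shown below; the classifier, features, and confidence threshold are placeholders, not the authors' actual pipeline.

```python
# One round of a self-training / active-learning style loop: train on manually
# labeled texts, pseudo-label confident predictions on unlabeled texts, and
# route the rest (plus a sample of pseudo-labels) to human verification.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def active_learning_round(labeled_texts, labels, unlabeled_texts, conf=0.9):
    vec = TfidfVectorizer().fit(labeled_texts + unlabeled_texts)
    clf = LogisticRegression(max_iter=1000)
    clf.fit(vec.transform(labeled_texts), labels)

    proba = clf.predict_proba(vec.transform(unlabeled_texts))
    confident = np.max(proba, axis=1) >= conf
    pseudo_labels = clf.classes_[np.argmax(proba, axis=1)]

    # High-confidence predictions become pseudo-labels for the next iteration;
    # low-confidence texts would go back to human annotators.
    return [(text, label) for text, label, keep in
            zip(unlabeled_texts, pseudo_labels, confident) if keep]
```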
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Our analysis reveals a concerning misalignment of values between ChatGPT and the average American. We also show that ChatGPT displays political leanings when generating text and images, but the degree and direction of skew depend on the theme. Notably, ChatGPT repeatedly refused to generate content representing certain mainstream perspectives, citing concerns over misinformation and bias. As generative AI systems like ChatGPT become ubiquitous, such misalignment with societal norms poses risks of distorting public discourse. Without proper safeguards, these systems threaten to exacerbate societal divides and depart from principles that underpin free societies.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
With the widespread use of knowledge graphs (KGs) in various automated AI systems and applications, it is very important to ensure that information retrieval algorithms leveraging them are free from societal biases. Previous works have depicted biases that persist in KGs and employed several metrics for measuring those biases. However, such studies lack a systematic exploration of the sensitivity of the bias measurements to varying sources of data or the embedding algorithms used. To address this research gap, in this work, we present a holistic analysis of bias measurement on knowledge graphs. First, we attempt to reveal data biases that surface in Wikidata for thirteen different demographics selected from seven continents. Next, we attempt to unfold the variance in the detection of biases by two different knowledge graph embedding algorithms - TransE and ComplEx. We conduct our extensive experiments on a large number of occupations sampled from the thirteen demographics with respect to the sensitive attribute, i.e., gender. Our results show that the inherent data bias that persists in KGs can be altered by the specific algorithmic bias incorporated by KG embedding learning algorithms. Further, we show that the choice of the state-of-the-art KG embedding algorithm has a strong impact on the ranking of biased occupations irrespective of gender. We observe that the similarity of the biased occupations across demographics is minimal, which reflects the socio-cultural differences around the globe. We believe that this full-scale audit of the bias measurement pipeline will raise awareness in the community, yield insights into the design choices of both data and algorithms, and challenge the popular dogma of "one-size-fits-all".
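One simple way an occupation-level gender bias score could be computed from learned entity embeddings is sketched below; the scoring form and entity identifiers are illustrative and do not reproduce the paper's exact metrics, and the embeddings are assumed to come from an already trained KG embedding model such as TransE or ComplEx.

```python
# Illustrative sketch (not the paper's exact metric): scoring how strongly an
# occupation's entity embedding leans toward "male" vs. "female" entities.
# `emb` is assumed to be a dict of entity identifier -> embedding vector
# produced by a trained KG embedding model (e.g., TransE or ComplEx).
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def gender_bias_score(emb, occupation, male_ent="Q6581097", female_ent="Q6581072"):
    """Positive = closer to the male entity, negative = closer to the female one.
    The QIDs above are the Wikidata items for 'male' and 'female'."""
    occ = emb[occupation]
    return cosine(occ, emb[male_ent]) - cosine(occ, emb[female_ent])

# Ranking occupations by such a score (as the paper does with its own measures)
# can then be compared across embedding algorithms and demographics:
# ranked = sorted(occupations, key=lambda o: gender_bias_score(emb, o))
```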
According to our latest research, the global Bias Detection Platform market size reached USD 1.42 billion in 2024, reflecting a surge in demand for advanced, ethical, and transparent decision-making tools across industries. The market is expected to grow at a CAGR of 17.8% during the forecast period, reaching a projected value of USD 6.13 billion by 2033. This robust growth is primarily driven by the increasing adoption of artificial intelligence (AI) and machine learning (ML) technologies, which has highlighted the urgent need for solutions that can identify and mitigate bias in automated systems and data-driven processes. As organizations worldwide strive for fairness, compliance, and inclusivity, bias detection platforms are becoming a cornerstone of responsible digital transformation.
One of the key growth factors for the Bias Detection Platform market is the rapid integration of AI and ML algorithms into critical business operations. As enterprises leverage these technologies to automate decision-making in areas such as recruitment, financial services, and healthcare, the risk of unintentional bias in algorithms has become a significant concern. Regulatory bodies and industry watchdogs are increasingly mandating transparency and accountability in automated systems, prompting organizations to invest in bias detection platforms to ensure compliance and mitigate reputational risks. Furthermore, the proliferation of big data analytics has amplified the need for robust tools that can scrutinize massive datasets for hidden biases, ensuring that business insights and actions are both accurate and equitable.
Another major driver fueling market growth is the heightened focus on diversity, equity, and inclusion (DEI) initiatives across both public and private sectors. Organizations are under mounting pressure from stakeholders, including customers, investors, and employees, to demonstrate their commitment to fair and unbiased practices. Bias detection platforms are being deployed to audit hiring processes, marketing campaigns, lending decisions, and other critical workflows, helping organizations identify and rectify discriminatory patterns. The increasing availability of advanced software and services that can seamlessly integrate with existing IT infrastructure is further accelerating adoption, making bias detection accessible to enterprises of all sizes.
The evolution of regulatory frameworks and ethical standards around AI and data usage is also acting as a catalyst for market expansion. Governments and international bodies are introducing stringent guidelines to govern the ethical use of AI, with a particular emphasis on eliminating bias and ensuring fairness. This regulatory momentum is compelling organizations to adopt proactive measures, including the implementation of bias detection platforms, to avoid legal liabilities and maintain public trust. Additionally, the growing awareness of the social and economic consequences of biased systems is encouraging a broader range of industries to prioritize bias detection as a core component of their risk management and governance strategies.
From a regional perspective, North America continues to dominate the Bias Detection Platform market, accounting for the largest share of global revenue in 2024. This leadership is attributed to the region’s early adoption of AI technologies, strong regulatory oversight, and a high concentration of technology-driven enterprises. Europe follows closely, benefiting from progressive data protection laws and a robust emphasis on ethical AI. Meanwhile, the Asia Pacific region is emerging as a high-growth market, driven by rapid digitalization, expanding IT infrastructure, and increasing awareness of bias-related challenges in diverse sectors. Latin America and the Middle East & Africa are also witnessing steady growth, supported by rising investments in digital transformation and regulatory advancements.
The Bias Detection Platform market is
Concerns about gender bias in word embedding models have captured substantial attention in the algorithmic bias research literature. Other bias types, however, have received less scrutiny. This work describes a large-scale analysis of sentiment associations in popular word embedding models along the lines of gender and ethnicity, but also along the less frequently studied dimensions of socioeconomic status, age, physical appearance, sexual orientation, religious sentiment, and political leanings. Consistent with previous scholarly literature, this work has found systemic bias against given names popular among African-Americans in most embedding models examined. Gender bias in embedding models, however, appears to be multifaceted and often reversed in polarity to what has been regularly reported. Interestingly, using the common operationalization of the term bias in the fairness literature, novel, so far unreported types of bias in word embedding models have also been identified. Specifically, the popular embedding models analyzed here display negative biases against middle and working-class socioeconomic status, male children, senior citizens, plain physical appearance, and intellectual phenomena such as Islamic religious faith, non-religiosity, and conservative political orientation. Reasons for the paradoxical underreporting of these bias types in the relevant literature are probably manifold, but widely held blind spots when searching for algorithmic bias and a lack of widespread technical jargon to unambiguously describe a variety of algorithmic associations could conceivably be playing a role. The causal origins of the multiplicity of loaded associations attached to distinct demographic groups within embedding models are often unclear, but the heterogeneity of said associations and their potential multifactorial roots raise doubts about the validity of grouping them all under the umbrella term bias. Richer and more fine-grained terminology, as well as a more comprehensive exploration of the bias landscape, could help the fairness epistemic community characterize and neutralize algorithmic discrimination more efficiently.
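The kind of sentiment-association measurement described above can be sketched in a WEAT-style form as follows; the word lists and embedding source are placeholders rather than the study's exact setup, and the function names are purely illustrative.

```python
# Minimal WEAT-style sketch of sentiment-association measurement: compare how
# close a set of target words (e.g., names or group terms) sits to "pleasant"
# vs. "unpleasant" attribute words. `emb` is assumed to be a dict mapping
# words to pre-trained embedding vectors.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def association(word_vec, attr_a, attr_b, emb):
    """Mean cosine similarity to attribute set A minus attribute set B."""
    sim_a = np.mean([cosine(word_vec, emb[w]) for w in attr_a if w in emb])
    sim_b = np.mean([cosine(word_vec, emb[w]) for w in attr_b if w in emb])
    return sim_a - sim_b

def group_sentiment_bias(targets, pleasant, unpleasant, emb):
    """Average pleasant-minus-unpleasant association over a target word set.
    Comparing this value between two demographic word sets gives a simple
    WEAT-like estimate of differential sentiment association."""
    return np.mean([association(emb[t], pleasant, unpleasant, emb)
                    for t in targets if t in emb])
```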
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The milenamileentje/Dutch-Government-Data-for-Bias-detection dataset is hosted on Hugging Face and was contributed by the HF Datasets community.