100+ datasets found

Bias in Advertising Data
kaggle.com
zip
Updated Apr 6, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Bahraleloom Mahjoub Alsadeg Abdalrahem (2024). Bias in Advertising Data [Dataset]. https://www.kaggle.com/datasets/bahraleloom/bias-in-advertising-data
Explore at:
zip(18491738 bytes)Available download formats
Dataset updated
Apr 6, 2024
Authors
Bahraleloom Mahjoub Alsadeg Abdalrahem
License
https://cdla.io/permissive-1-0/https://cdla.io/permissive-1-0/
Description
To demonstrate discovery, measurement, and mitigation of bias in advertising, we provide a dataset that contains synthetic generated data for users who were shown a certain advertisement (ad). Each instance of the dataset is specific to a user and has feature attributes such as gender, age, income, political/religious affiliation, parental status, home ownership, area (rural/urban), and education status. In addition to the features we also provide information on whether users actually clicked on or were predicted to click on the ad. Clicking on the ad is known as conversion, and the three outcome variables included are: (1) The predicted probability of conversion, (2) Predicted conversion (binary 0/1) which is obtained by thresholding the predicted probability, (3) True conversion (binary 0/1) that indicates whether the user actually clicked on the ad.
h
news-bias-full-data
huggingface.co
Updated Oct 25, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
News Media Biases (2023). news-bias-full-data [Dataset]. https://huggingface.co/datasets/newsmediabias/news-bias-full-data
Explore at:
Dataset updated
Oct 25, 2023
Dataset authored and provided by
News Media Biases
Description
**Please access the latest verison of data that is here https://huggingface.co/datasets/shainar/BEAD **

email at shaina.raza@torontomu.ca for usage of data

Please cite us if you use it

@article{raza2024beads, title={BEADs: Bias Evaluation Across Domains}, author={Raza, Shaina and Rahman, Mizanur and Zhang, Michael R}, journal={arXiv preprint arXiv:2406.04220}, year={2024} }

license: cc-by-nc-4.0

language: - en pretty_name: Navigating News… See the full description on the dataset page: https://huggingface.co/datasets/newsmediabias/news-bias-full-data.
Data and Code for: Confidence, Self-Selection and Bias in the Aggregate
openicpsr.org
delimited
Updated Mar 2, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Benjamin Enke; Thomas Graeber; Ryan Oprea (2023). Data and Code for: Confidence, Self-Selection and Bias in the Aggregate [Dataset]. http://doi.org/10.3886/E185741V1
Explore at:
delimitedAvailable download formats
Unique identifier
https://doi.org/10.3886/E185741V1
Dataset updated
Mar 2, 2023
Dataset provided by
American Economic Associationhttp://www.aeaweb.org/
Authors
Benjamin Enke; Thomas Graeber; Ryan Oprea
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The influence of behavioral biases on aggregate outcomes depends in part on self-selection: whether rational people opt more strongly into aggregate interactions than biased individuals. In betting market, auction and committee experiments, we document that some errors are strongly reduced through self-selection, while others are not affected at all or even amplified. A large part of this variation is explained by differences in the relationship between confidence and performance. In some tasks, they are positively correlated, such that self-selection attenuates errors. In other tasks, rational and biased people are equally confident, such that self-selection has no effects on aggregate quantities.
Opinion on mitigating AI data bias in healthcare worldwide 2024
statista.com
Updated Jul 18, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). Opinion on mitigating AI data bias in healthcare worldwide 2024 [Dataset]. https://www.statista.com/statistics/1559311/ways-to-mitigate-ai-bias-in-healthcare-worldwide/
Explore at:
Dataset updated
Jul 18, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
Dec 2023 - Mar 2024
Area covered
Worldwide
Description
According to a survey of healthcare leaders carried out globally in 2024, almost half of respondents believed that by making AI more transparent and interpretable, this would mitigate the risk of data bias in AI applications for healthcare. Furthermore, ** percent of healthcare leaders thought there should be continuous training and education in AI.
T
Replication Data for: Cognitive Bias Heterogeneity
dataverse.tdl.org
Updated Aug 15, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Molly McNamara; Molly McNamara (2025). Replication Data for: Cognitive Bias Heterogeneity [Dataset]. http://doi.org/10.18738/T8/754FZT
Explore at:
text/x-r-notebook(12370), text/x-r-notebook(15773), application/x-rlang-transport(20685), text/x-r-notebook(20656)Available download formats
Unique identifier
https://doi.org/10.18738/T8/754FZT
Dataset updated
Aug 15, 2025
Dataset provided by
Texas Data Repository
Authors
Molly McNamara; Molly McNamara
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This data and code can be used to replicate the main analysis for "Who Exhibits Cognitive Biases? Mapping Heterogeneity in Attention, Interpretation, and Rumination in Depression." Of note- to protect this dataset from deidentification consistent with best practices, we have removed the zip code variable and binned age. The analysis code may need to be adjusted slightly to account for this, and the results may very slightly from the ones in the manuscript as a result.
Z
Data from: Qbias – A Dataset on Media Bias in Search Queries and Query...
data.niaid.nih.gov
Updated Mar 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Haak, Fabian; Schaer, Philipp (2023). Qbias – A Dataset on Media Bias in Search Queries and Query Suggestions [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7682914
Explore at:
Dataset updated
Mar 1, 2023
Dataset provided by
Technische Hochschule Köln
Authors
Haak, Fabian; Schaer, Philipp
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We present Qbias, two novel datasets that promote the investigation of bias in online news search as described in

Fabian Haak and Philipp Schaer. 2023. 𝑄𝑏𝑖𝑎𝑠 - A Dataset on Media Bias in Search Queries and Query Suggestions. In Proceedings of ACM Web Science Conference (WebSci’23). ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3578503.3583628.

Dataset 1: AllSides Balanced News Dataset (allsides_balanced_news_headlines-texts.csv)

The dataset contains 21,747 news articles collected from AllSides balanced news headline roundups in November 2022 as presented in our publication. The AllSides balanced news feature three expert-selected U.S. news articles from sources of different political views (left, right, center), often featuring spin bias, and slant other forms of non-neutral reporting on political news. All articles are tagged with a bias label by four expert annotators based on the expressed political partisanship, left, right, or neutral. The AllSides balanced news aims to offer multiple political perspectives on important news stories, educate users on biases, and provide multiple viewpoints. Collected data further includes headlines, dates, news texts, topic tags (e.g., "Republican party", "coronavirus", "federal jobs"), and the publishing news outlet. We also include AllSides' neutral description of the topic of the articles. Overall, the dataset contains 10,273 articles tagged as left, 7,222 as right, and 4,252 as center.

To provide easier access to the most recent and complete version of the dataset for future research, we provide a scraping tool and a regularly updated version of the dataset at https://github.com/irgroup/Qbias. The repository also contains regularly updated more recent versions of the dataset with additional tags (such as the URL to the article). We chose to publish the version used for fine-tuning the models on Zenodo to enable the reproduction of the results of our study.

Dataset 2: Search Query Suggestions (suggestions.csv)

The second dataset we provide consists of 671,669 search query suggestions for root queries based on tags of the AllSides biased news dataset. We collected search query suggestions from Google and Bing for the 1,431 topic tags, that have been used for tagging AllSides news at least five times, approximately half of the total number of topics. The topic tags include names, a wide range of political terms, agendas, and topics (e.g., "communism", "libertarian party", "same-sex marriage"), cultural and religious terms (e.g., "Ramadan", "pope Francis"), locations and other news-relevant terms. On average, the dataset contains 469 search queries for each topic. In total, 318,185 suggestions have been retrieved from Google and 353,484 from Bing.

The file contains a "root_term" column based on the AllSides topic tags. The "query_input" column contains the search term submitted to the search engine ("search_engine"). "query_suggestion" and "rank" represents the search query suggestions at the respective positions returned by the search engines at the given time of search "datetime". We scraped our data from a US server saved in "location".

We retrieved ten search query suggestions provided by the Google and Bing search autocomplete systems for the input of each of these root queries, without performing a search. Furthermore, we extended the root queries by the letters a to z (e.g., "democrats" (root term) >> "democrats a" (query input) >> "democrats and recession" (query suggestion)) to simulate a user's input during information search and generate a total of up to 270 query suggestions per topic and search engine. The dataset we provide contains columns for root term, query input, and query suggestion for each suggested query. The location from which the search is performed is the location of the Google servers running Colab, in our case Iowa in the United States of America, which is added to the dataset.

AllSides Scraper

At https://github.com/irgroup/Qbias, we provide a scraping tool, that allows for the automatic retrieval of all available articles at the AllSides balanced news headlines.

We want to provide an easy means of retrieving the news and all corresponding information. For many tasks it is relevant to have the most recent documents available. Thus, we provide this Python-based scraper, that scrapes all available AllSides news articles and gathers available information. By providing the scraper we facilitate access to a recent version of the dataset for other researchers.
News Bias Data
kaggle.com
zip
Updated Apr 8, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nitish Kumar Thakur (2025). News Bias Data [Dataset]. https://www.kaggle.com/datasets/nitishxthakur/news-bias-data/data
Explore at:
zip(367303570 bytes)Available download formats
Dataset updated
Apr 8, 2025
Authors
Nitish Kumar Thakur
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
The prevalence of bias in the news media has become a critical issue, affecting public perception on a range of important topics such as political views, health, insurance, resource distributions, religion, race, age, gender, occupation, and climate change. The media has a moral responsibility to ensure accurate information dissemination and to increase awareness about important issues and the potential risks associated with them. This highlights the need for a solution that can help mitigate against the spread of false or misleading information and restore public trust in the media.

Data description: This is a dataset for news media bias covering different dimensions of the biases: political, hate speech, political, toxicity, sexism, ageism, gender identity, gender discrimination, race/ethnicity, climate change, occupation, spirituality, which makes it a unique contribution. The dataset used for this project does not contain any personally identifiable information (PII).

Data Format: The format of data is:

ID: Numeric unique identifier. Text: Main content. Dimension: Categorical descriptor of the text. Biased_Words: List of words considered biased. Aspect: Specific topic within the text. Label: Neutral, Slightly Biased , Highly Biased

Annotation Scheme: The annotation scheme is based on Active learning, which is Manual Labeling --> Semi-Supervised Learning --> Human Verifications (iterative process)

Bias Label: Indicate the presence/absence of bias (e.g., no bias, mild, strong). Words/Phrases Level Biases: Identify specific biased words/phrases. Subjective Bias (Aspect): Capture biases related to content aspects.
H
Replication data for: Selection Bias in Comparative Research: The Case of...
dataverse.harvard.edu
Updated Mar 8, 2010
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Simon Hug (2010). Replication data for: Selection Bias in Comparative Research: The Case of Incomplete Data Sets [Dataset]. http://doi.org/10.7910/DVN/QO28VG
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.7910/DVN/QO28VG
Dataset updated
Mar 8, 2010
Dataset provided by
Harvard Dataverse
Authors
Simon Hug
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Selection bias is an important but often neglected problem in comparative research. While comparative case studies pay some attention to this problem, this is less the case in broader cross-national studies, where this problem may appear through the way the data used are generated. The article discusses three examples: studies of the success of newly formed political parties, research on protest events, and recent work on ethnic conflict. In all cases the data at hand are likely to be afflicted by selection bias. Failing to take into consideration this problem leads to serious biases in the estimation of simple relationships. Empirical examples illustrate a possible solution (a variation of a Tobit model) to the problems in these cases. The article also discusses results of Monte Carlo simulations, illustrating under what conditions the proposed estimation procedures lead to improved results.
Marketing Bias data
kaggle.com
zip
Updated Oct 29, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ahmad (2023). Marketing Bias data [Dataset]. https://www.kaggle.com/datasets/pypiahmad/marketing-bias-data
Explore at:
zip(50328 bytes)Available download formats
Dataset updated
Oct 29, 2023
Authors
Ahmad
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
The Marketing Bias dataset encapsulates the interactions between users and products on ModCloth and Amazon Electronics, emphasizing on the potential marketing bias inherent in product recommendations. This bias is explored through attributes related to product marketing and user/item interactions.

Basic Statistics:
- ModCloth: - Reviews: 99,893 - Items: 1,020 - Users: 44,783 - Bias Type: Body Shape

Amazon Electronics:

Reviews: 1,292,954

Items: 9,560

Users: 1,157,633

Bias Type: Gender

Metadata: - Ratings - Product Images - User Identities - Item Sizes, User Genders

Example (ModCloth): The data example provided showcases a snippet from ModCloth data with columns like item_id, user_id, rating, timestamp, size, fit, user_attr, model_attr, and others.

Download Links: Visit the project page for download links.

Citation: If you utilize this dataset, please cite the following:

Title: Addressing Marketing Bias in Product Recommendations Authors: Mengting Wan, Jianmo Ni, Rishabh Misra, Julian McAuley Published In: WSDM, 2020 PDF Link

Dataset Files: - df_electronics.csv - df_modcloth.csv

The dataset is structured to provide a comprehensive overview of user-item interactions and attributes that may contribute to marketing bias, making it a valuable resource for anyone investigating marketing strategies and recommendation systems.
Data_Sheet_1_Gender Bias in Artificial Intelligence: Severity Prediction at...
frontiersin.figshare.com
docx
Updated May 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Heewon Chung; Chul Park; Wu Seong Kang; Jinseok Lee (2023). Data_Sheet_1_Gender Bias in Artificial Intelligence: Severity Prediction at an Early Stage of COVID-19.docx [Dataset]. http://doi.org/10.3389/fphys.2021.778720.s001
Explore at:
docxAvailable download formats
Unique identifier
https://doi.org/10.3389/fphys.2021.778720.s001
Dataset updated
May 30, 2023
Dataset provided by
Frontiers Mediahttp://www.frontiersin.org/
Authors
Heewon Chung; Chul Park; Wu Seong Kang; Jinseok Lee
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Artificial intelligence (AI) technologies have been applied in various medical domains to predict patient outcomes with high accuracy. As AI becomes more widely adopted, the problem of model bias is increasingly apparent. In this study, we investigate the model bias that can occur when training a model using datasets for only one particular gender and aim to present new insights into the bias issue. For the investigation, we considered an AI model that predicts severity at an early stage based on the medical records of coronavirus disease (COVID-19) patients. For 5,601 confirmed COVID-19 patients, we used 37 medical records, namely, basic patient information, physical index, initial examination findings, clinical findings, comorbidity diseases, and general blood test results at an early stage. To investigate the gender-based AI model bias, we trained and evaluated two separate models—one that was trained using only the male group, and the other using only the female group. When the model trained by the male-group data was applied to the female testing data, the overall accuracy decreased—sensitivity from 0.93 to 0.86, specificity from 0.92 to 0.86, accuracy from 0.92 to 0.86, balanced accuracy from 0.93 to 0.86, and area under the curve (AUC) from 0.97 to 0.94. Similarly, when the model trained by the female-group data was applied to the male testing data, once again, the overall accuracy decreased—sensitivity from 0.97 to 0.90, specificity from 0.96 to 0.91, accuracy from 0.96 to 0.91, balanced accuracy from 0.96 to 0.90, and AUC from 0.97 to 0.95. Furthermore, when we evaluated each gender-dependent model with the test data from the same gender used for training, the resultant accuracy was also lower than that from the unbiased model.
Navigating News Narratives: A Media Bias Analysis Dataset
figshare.com
txt
Updated Dec 8, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Shaina Raza (2023). Navigating News Narratives: A Media Bias Analysis Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.24422122.v4
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.24422122.v4
Dataset updated
Dec 8, 2023
Dataset provided by
Figsharehttp://figshare.com/
Authors
Shaina Raza
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The prevalence of bias in the news media has become a critical issue, affecting public perception on a range of important topics such as political views, health, insurance, resource distributions, religion, race, age, gender, occupation, and climate change. The media has a moral responsibility to ensure accurate information dissemination and to increase awareness about important issues and the potential risks associated with them. This highlights the need for a solution that can help mitigate against the spread of false or misleading information and restore public trust in the media.Data description: This is a dataset for news media bias covering different dimensions of the biases: political, hate speech, political, toxicity, sexism, ageism, gender identity, gender discrimination, race/ethnicity, climate change, occupation, spirituality, which makes it a unique contribution. The dataset used for this project does not contain any personally identifiable information (PII).The data structure is tabulated as follows:Text: The main content.Dimension: Descriptive category of the text.Biased_Words: A compilation of words regarded as biased.Aspect: Specific sub-topic within the main content.Label: Indicates the presence (True) or absence (False) of bias. The label is ternary - highly biased, slightly biased and neutralToxicity: Indicates the presence (True) or absence (False) of bias.Identity_mention: Mention of any identity based on words match.Annotation SchemeThe labels and annotations in the dataset are generated through a system of Active Learning, cycling through:Manual LabelingSemi-Supervised LearningHuman VerificationThe scheme comprises:Bias Label: Specifies the degree of bias (e.g., no bias, mild, or strong).Words/Phrases Level Biases: Pinpoints specific biased terms or phrases.Subjective Bias (Aspect): Highlights biases pertinent to content dimensions.Due to the nuances of semantic match algorithms, certain labels such as 'identity' and 'aspect' may appear distinctively different.List of datasets used : We curated different news categories like Climate crisis news summaries , occupational, spiritual/faith/ general using RSS to capture different dimensions of the news media biases. The annotation is performed using active learning to label the sentence (either neural/ slightly biased/ highly biased) and to pick biased words from the news.We also utilize publicly available data from the following links. Our Attribution to others.MBIC (media bias): Spinde, Timo, Lada Rudnitckaia, Kanishka Sinha, Felix Hamborg, Bela Gipp, and Karsten Donnay. "MBIC--A Media Bias Annotation Dataset Including Annotator Characteristics." arXiv preprint arXiv:2105.11910 (2021). https://zenodo.org/records/4474336Hyperpartisan news: Kiesel, Johannes, Maria Mestre, Rishabh Shukla, Emmanuel Vincent, Payam Adineh, David Corney, Benno Stein, and Martin Potthast. "Semeval-2019 task 4: Hyperpartisan news detection." In Proceedings of the 13th International Workshop on Semantic Evaluation, pp. 829-839. 2019. https://huggingface.co/datasets/hyperpartisan_news_detectionToxic comment classification: Adams, C.J., Jeffrey Sorensen, Julia Elliott, Lucas Dixon, Mark McDonald, Nithum, and Will Cukierski. 2017. "Toxic Comment Classification Challenge." Kaggle. https://kaggle.com/competitions/jigsaw-toxic-comment-classification-challenge.Jigsaw Unintended Bias: Adams, C.J., Daniel Borkan, Inversion, Jeffrey Sorensen, Lucas Dixon, Lucy Vasserman, and Nithum. 2019. "Jigsaw Unintended Bias in Toxicity Classification." Kaggle. https://kaggle.com/competitions/jigsaw-unintended-bias-in-toxicity-classification.Age Bias : Díaz, Mark, Isaac Johnson, Amanda Lazar, Anne Marie Piper, and Darren Gergle. "Addressing age-related bias in sentiment analysis." In Proceedings of the 2018 chi conference on human factors in computing systems, pp. 1-14. 2018. Age Bias Training and Testing Data - Age Bias and Sentiment Analysis Dataverse (harvard.edu)Multi-dimensional news Ukraine: Färber, Michael, Victoria Burkard, Adam Jatowt, and Sora Lim. "A multidimensional dataset based on crowdsourcing for analyzing and detecting news bias." In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 3007-3014. 2020. https://zenodo.org/records/3885351#.ZF0KoxHMLtVSocial biases: Sap, Maarten, Saadia Gabriel, Lianhui Qin, Dan Jurafsky, Noah A. Smith, and Yejin Choi. "Social bias frames: Reasoning about social and power implications of language." arXiv preprint arXiv:1911.03891 (2019). https://maartensap.com/social-bias-frames/Goal of this dataset :We want to offer open and free access to dataset, ensuring a wide reach to researchers and AI practitioners across the world. The dataset should be user-friendly to use and uploading and accessing data should be straightforward, to facilitate usage.If you use this dataset, please cite us.Navigating News Narratives: A Media Bias Analysis Dataset © 2023 by Shaina Raza, Vector Institute is licensed under CC BY-NC 4.0
h
bias-shades
huggingface.co
Updated Feb 22, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
BigScience Catalogue Data (2023). bias-shades [Dataset]. https://huggingface.co/datasets/bigscience-catalogue-data/bias-shades
Explore at:
Dataset updated
Feb 22, 2023
Dataset authored and provided by
BigScience Catalogue Data
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
This is a preliminary version of the bias SHADES dataset for evaluating LMs for social biases.
o
Data from: Deconstructing Bias in Social Preferences Reveals Groupy and Not...
openicpsr.org
stata
Updated Aug 5, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Rachel Kranton; Matthew Pease; Seth Sanders; Scott Heutell (2020). Deconstructing Bias in Social Preferences Reveals Groupy and Not Groupy Behavior [Dataset]. http://doi.org/10.3886/E120555V1
Explore at:
stataAvailable download formats
Unique identifier
https://doi.org/10.3886/E120555V1
Dataset updated
Aug 5, 2020
Dataset provided by
Cornell University
UPMC
Duke University
Authors
Rachel Kranton; Matthew Pease; Seth Sanders; Scott Heutell
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
2010 - 2020
Area covered
NC, Durham
Description
Group divisions are a continual feature of human history, with biases toward people’s own groups shown in both experimental and natural settings. Using a novel within-subject design, this work deconstructs group biases to find significant and robust individual differences; some individuals consistently respond to group divisions, while others do not. We examined individual behavior in two treatments in which subjects make pairwise decisions that determine own and others’ income. In a political treatment, which divided subjects into groups based on their political leanings, political party members showed more ingroup bias than Independents who professed the same political opinions. But this greater bias was also present in a minimal group treatment, showing that stronger group identification was not the driver of higher favoritism in the political setting. Analyzing individual choices across the experiment, we categorize participants as “groupy” or “not groupy,” such that groupy participants have social preferences that change for ingroup and outgroup recipients, while not-groupy participants’ preferences do not change across group context. Demonstrating further that the group identity of the recipient mattered less to their choices, strongly not-groupy subjects made allocation decisions faster. We conclude that observed ingroup biases build on a foundation of heterogeneity in individual groupiness.
Z
Data from: Results and further resources concerning our pre-studies...
data.niaid.nih.gov
Updated Sep 20, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Hamborg, Felix; Spinde, Timo; Heinser, Kim; Donnay, Karsten; Gipp, Bela (2021). Results and further resources concerning our pre-studies concerning revealing biases in news articles [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5517400
Explore at:
Dataset updated
Sep 20, 2021
Dataset provided by
University of Konstanz
University of Zurich
University of Wuppertal
Authors
Hamborg, Felix; Spinde, Timo; Heinser, Kim; Donnay, Karsten; Gipp, Bela
Description
Slanted news coverage strongly affects public opinion. This is especially true for coverage on politics and related issues, where studies have shown that bias in the news may strongly influence elections and other collective decisions. Due to its viable importance, news coverage has long been studied in the social sciences, resulting in comprehensive models to describe it and effective yet costly methods to analyze it, such as content analysis. We present an in-progress system for news recommendation that is the first to automate the manual procedure of content analysis to reveal person-targeting biases in news articles reporting on policy issues. In a large-scale user study, we find very promising results regarding this interdisciplinary research direction. Our recommender detects and reveals substantial frames that are actually present in individual news articles. In contrast, prior work rather only facilitates the visibility of biases, e.g., by distinguishing left- and right-wing outlets. Further, our study shows that recommending news articles that differently frame an event significantly improves respondents' awareness of bias.
f
Sex Bias in COVID-19 Data - Supplementary Table 1
datasetcatalog.nlm.nih.gov
zivahub.uct.ac.za
Updated Sep 28, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Rosser, Lizzy; Wedderburn, Lucy; Webb, Kate; Ciurtin, Coziana; Radziszewska, Ania; Deakin, Claire; Peckham, Hannah; Raine, Charles; De Gruijter, Nina (2020). Sex Bias in COVID-19 Data - Supplementary Table 1 [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000588710
Explore at:
Dataset updated
Sep 28, 2020
Authors
Rosser, Lizzy; Wedderburn, Lucy; Webb, Kate; Ciurtin, Coziana; Radziszewska, Ania; Deakin, Claire; Peckham, Hannah; Raine, Charles; De Gruijter, Nina
Description
An online search of government websites and published literature was performed for regional data reports on COVID-19 cases that included sex as a variable from 1st January 2020 up until 1st June 2020 (Search terms: COVID-19/case/sex/country/data/death/ICU/ITU). In order to ensure unbiased representation from as many regions as possible, a cross check was done using the list of countries reporting data on ‘Worldometer’, and an attempt was made to include as many regions reporting sex data as possible. Reports were translated using Google translate if they were not in English.Data selection, extraction and synthesisReports were included if they contained sex as a variable in data describing case number, intensive treatment unit (ITU) admission, or mortality. Data were entered directly by individual researchers into an online structured data extraction table. For some sources, counts of male confirmed cases or male deaths were not provided, but percentages of male cases or male deaths were provided instead. To include these sources and avoid biases that might be introduced by their exclusion, we calculated counts of male confirmed cases and male deaths from the reported percentages with rounding to the nearest integer. We acknowledge that this approach assumes that the reported percentages are reflective of the true percentages. For some sources, data included confirmed cases and deaths of unknown sex. For these sources, the reported totals were used where the proportion of unknown sex was small. This approach was preferred to excluding cases of unknown sex in order to avoid bias. The estimates represent the proportion of known male infections and odds ratios for mortality associated with known male sex, and will differ slightly from what the true values would be if the sex had been reported for all cases. Data were available at the level of country or regional summary data representing distinct individuals for each report, but not at the level of covariates for all individuals within a study. Consequently, covariates such as lifestyle, comorbidities, testing method and case type (hospital vs. community) could not be controlled for.
s
Data from: Correction for bias in meta-analysis of little-replicated studies...
eprints.soton.ac.uk
data.niaid.nih.gov
+3more
Updated Jan 1, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Doncaster, C. Patrick; Spake, Rebecca (2018). Data from: Correction for bias in meta-analysis of little-replicated studies [Dataset]. http://doi.org/10.5061/dryad.5f4g6
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.5f4g6
Dataset updated
Jan 1, 2018
Dataset provided by
DRYAD
Authors
Doncaster, C. Patrick; Spake, Rebecca
Description
Data S1R script for simulations. Simulations of fixed- and random-effects meta-analysis using alternative estimators: one-sample mean, two-sample Hedges' g, and two-sample lnR, for comparison of performance by inverse-variance weighting and inverse-adjusted-variance weighting.Doncaster&Spake_Data_S1.txtData S2R script for calculating mean-adjusted error variance. Finds the mean-adjusted study variance for all the primary studies contributing to a meta-analysis, for a one-sample mean, or two-sample log response ratio, or or two-sample Hedges' g.Doncaster&Spake_Data_S2.txt,1. Meta-analyses conventionally weight study estimates on the inverse of their error variance, in order to maximize precision. Unbiased variability in the estimates of these study-level error variances increases with the inverse of study-level replication. Here we demonstrate how this variability accumulates asymmetrically across studies in precision-weighted meta-analysis, to cause undervaluation of the meta-level effect size or its error variance (the meta-effect and meta-variance). 2. Small samples, typical of the ecological literature, induce big sampling errors in variance estimation, which substantially bias precision-weighted meta-analysis. Simulations revealed that biases differed little between random- and fixed-effects tests. Meta-estimation of a one-sample mean from 20 studies, with sample sizes of 3 to 20 observations, undervalued the meta-variance by ~20%. Meta-analysis of two-sample designs from 20 studies, with sample sizes of 3 to 10 observations, undervalued the meta-variance by 15-20% for the log response ratio (lnR); it undervalued the meta-effect by ~10% for the standardised mean difference (SMD). 3. For all estimators, biases were eliminated or reduced by a simple adjustment to the weighting on study precision. The study-specific component of error variance prone to sampling error and not parametrically attributable to study-specific replication was replaced by its cross-study mean, on the assumption of random sampling from the same population variance for all studies, and sufficient studies for averaging. Weighting each study by the inverse of this mean-adjusted error variance universally improved accuracy in estimation of both the meta-effect and its significance, regardless of number of studies. For comparison, weighting only on sample size gave the same improvement in accuracy, but could not sensibly estimate significance. 4. For the one-sample mean and two-sample lnR, adjusted weighting also improved estimation of between-study variance by DerSimonian-Laird and REML methods. For random-effects meta-analysis of SMD from little-replicated studies, the most accurate meta-estimates obtained from adjusted weights following conventionally-weighted estimation of between-study variance. 5. We recommend adoption of weighting by inverse adjusted-variance for meta-analyses of well- and little-replicated studies, because it improves accuracy and significance of meta-estimates, and it can extend the scope of the meta-analysis to include some studies without variance estimates.
Data bias
kaggle.com
zip
Updated Mar 11, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
tyur muthia (2022). Data bias [Dataset]. https://www.kaggle.com/datasets/tyurmuthia/data-bias
Explore at:
zip(654062 bytes)Available download formats
Dataset updated
Mar 11, 2022
Authors
tyur muthia
Description
Dataset

This dataset was created by tyur muthia

Contents
m
Data from: Prolific observer bias in the life sciences: why we need blind...
figshare.mq.edu.au
datasetcatalog.nlm.nih.gov
+4more
bin
Updated Jun 14, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Luke Holman; Megan L. Head; Robert Lanfear; Michael D. Jennions (2023). Data from: Prolific observer bias in the life sciences: why we need blind data recording [Dataset]. http://doi.org/10.5061/dryad.hn40n
Explore at:
binAvailable download formats
Unique identifier
https://doi.org/10.5061/dryad.hn40n
Dataset updated
Jun 14, 2023
Dataset provided by
Macquarie University
Authors
Luke Holman; Megan L. Head; Robert Lanfear; Michael D. Jennions
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Observer bias and other “experimenter effects” occur when researchers’ expectations influence study outcome. These biases are strongest when researchers expect a particular result, are measuring subjective variables, and have an incentive to produce data that confirm predictions. To minimize bias, it is good practice to work “blind,” meaning that experimenters are unaware of the identity or treatment group of their subjects while conducting research. Here, using text mining and a literature review, we find evidence that blind protocols are uncommon in the life sciences and that nonblind studies tend to report higher effect sizes and more significant p-values. We discuss methods to minimize bias and urge researchers, editors, and peer reviewers to keep blind protocols in mind.

Usage Notes Evolution literature review dataExact p value datasetjournal_categoriesp values data 24 SeptProportion of significant p values per paperR script to filter and classify the p value dataQuiz answers - guessing effect size from abstractsThe answers provided by the 9 evolutionary biologists to quiz we designed, which aimed to test whether trained specialists are able to infer the relative size/direction of effect size from a paper's title and abstract.readmeDescription of the contents of all the other files in this Dryad submission.R script to statistically analyse the p value dataR script detailing the statistical analyses we performed on the p value datasets.
f
Data Sheet 1_Biases in AI: acknowledging and addressing the inevitable...
frontiersin.figshare.com
pdf
Updated Aug 20, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Bjørn Hofmann (2025). Data Sheet 1_Biases in AI: acknowledging and addressing the inevitable ethical issues.pdf [Dataset]. http://doi.org/10.3389/fdgth.2025.1614105.s001
Explore at:
pdfAvailable download formats
Unique identifier
https://doi.org/10.3389/fdgth.2025.1614105.s001
Dataset updated
Aug 20, 2025
Dataset provided by
Frontiers
Authors
Bjørn Hofmann
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Biases in artificial intelligence (AI) systems pose a range of ethical issues. The myriads of biases in AI systems are briefly reviewed and divided in three main categories: input bias, system bias, and application bias. These biases pose a series of basic ethical challenges: injustice, bad output/outcome, loss of autonomy, transformation of basic concepts and values, and erosion of accountability. A review of the many ways to identify, measure, and mitigate these biases reveals commendable efforts to avoid or reduce bias; however, it also highlights the persistence of unresolved biases. Residual and undetected biases present epistemic challenges with substantial ethical implications. The article further investigates whether the general principles, checklists, guidelines, frameworks, or regulations of AI ethics could address the identified ethical issues with bias. Unfortunately, the depth and diversity of these challenges often exceed the capabilities of existing approaches. Consequently, the article suggests that we must acknowledge and accept some residual ethical issues related to biases in AI systems. By utilizing insights from ethics and moral psychology, we can better navigate this landscape. To maximize the benefits and minimize the harms of biases in AI, it is imperative to identify and mitigate existing biases and remain transparent about the consequences of those we cannot eliminate. This necessitates close collaboration between scientists and ethicists.
Data from: Decisions reduce sensitivity to subsequent information
zenodo.org
datadryad.org
Updated May 28, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Zohar Z. Bronfman; Noam Brezis; Rani Moran; Konstantinos Tsetsos; Tobias Donner; Marius Usher; Zohar Z. Bronfman; Noam Brezis; Rani Moran; Konstantinos Tsetsos; Tobias Donner; Marius Usher (2022). Data from: Decisions reduce sensitivity to subsequent information [Dataset]. http://doi.org/10.5061/dryad.40f6v
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.40f6v
Dataset updated
May 28, 2022
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Zohar Z. Bronfman; Noam Brezis; Rani Moran; Konstantinos Tsetsos; Tobias Donner; Marius Usher; Zohar Z. Bronfman; Noam Brezis; Rani Moran; Konstantinos Tsetsos; Tobias Donner; Marius Usher
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Behavioural studies over half a century indicate that making categorical choices alters beliefs about the state of the world. People seem biased to confirm previous choices, and to suppress contradicting information. These choice-dependent biases imply a fundamental bound of human rationality. However, it remains unclear whether these effects extend to lower level decisions, and only little is known about the computational mechanisms underlying them. Building on the framework of sequential-sampling models of decision-making, we developed novel psychophysical protocols that enable us to dissect quantitatively how choices affect the way decision-makers accumulate additional noisy evidence. We find robust choice-induced biases in the accumulation of abstract numerical (experiment 1) and low-level perceptual (experiment 2) evidence. These biases deteriorate estimations of the mean value of the numerical sequence (experiment 1) and reduce the likelihood to revise decisions (experiment 2). Computational modelling reveals that choices trigger a reduction of sensitivity to subsequent evidence via multiplicative gain modulation, rather than shifting the decision variable towards the chosen alternative in an additive fashion. Our results thus show that categorical choices alter the evidence accumulation mechanism itself, rather than just its outcome, rendering the decision-maker less sensitive to new information.

Facebook

Twitter

Click to copy link

Link copied

Cite

Bahraleloom Mahjoub Alsadeg Abdalrahem (2024). Bias in Advertising Data [Dataset]. https://www.kaggle.com/datasets/bahraleloom/bias-in-advertising-data

Bias in Advertising Data

Explore at:

zip(18491738 bytes)Available download formats

Dataset updated

Apr 6, 2024

Authors

Bahraleloom Mahjoub Alsadeg Abdalrahem

License

https://cdla.io/permissive-1-0/https://cdla.io/permissive-1-0/

Description

To demonstrate discovery, measurement, and mitigation of bias in advertising, we provide a dataset that contains synthetic generated data for users who were shown a certain advertisement (ad). Each instance of the dataset is specific to a user and has feature attributes such as gender, age, income, political/religious affiliation, parental status, home ownership, area (rural/urban), and education status. In addition to the features we also provide information on whether users actually clicked on or were predicted to click on the ad. Clicking on the ad is known as conversion, and the three outcome variables included are: (1) The predicted probability of conversion, (2) Predicted conversion (binary 0/1) which is obtained by thresholding the predicted probability, (3) True conversion (binary 0/1) that indicates whether the user actually clicked on the ad.

Clear search

Close search

Google apps

Main menu

Bias in Advertising Data

news-bias-full-data

Data and Code for: Confidence, Self-Selection and Bias in the Aggregate

Opinion on mitigating AI data bias in healthcare worldwide 2024

Replication Data for: Cognitive Bias Heterogeneity

Data from: Qbias – A Dataset on Media Bias in Search Queries and Query...

News Bias Data

Replication data for: Selection Bias in Comparative Research: The Case of...

Marketing Bias data

Data_Sheet_1_Gender Bias in Artificial Intelligence: Severity Prediction at...

Navigating News Narratives: A Media Bias Analysis Dataset

bias-shades

Data from: Deconstructing Bias in Social Preferences Reveals Groupy and Not...

Data from: Results and further resources concerning our pre-studies...

Sex Bias in COVID-19 Data - Supplementary Table 1

Data from: Correction for bias in meta-analysis of little-replicated studies...

Data bias

Dataset

Contents

Data from: Prolific observer bias in the life sciences: why we need blind...

Data Sheet 1_Biases in AI: acknowledging and addressing the inevitable...

Data from: Decisions reduce sensitivity to subsequent information

Bias in Advertising Data