100+ datasets found
  1. Bias in Advertising Data

    • kaggle.com
    zip
    Updated Apr 6, 2024
    Cite
    Bahraleloom Mahjoub Alsadeg Abdalrahem (2024). Bias in Advertising Data [Dataset]. https://www.kaggle.com/datasets/bahraleloom/bias-in-advertising-data
    Explore at:
    zip (18,491,738 bytes)
    Dataset updated
    Apr 6, 2024
    Authors
    Bahraleloom Mahjoub Alsadeg Abdalrahem
    License

    https://cdla.io/permissive-1-0/

    Description

    To demonstrate discovery, measurement, and mitigation of bias in advertising, we provide a dataset that contains synthetically generated data for users who were shown a certain advertisement (ad). Each instance of the dataset is specific to a user and has feature attributes such as gender, age, income, political/religious affiliation, parental status, home ownership, area (rural/urban), and education status. In addition to the features, we also provide information on whether users actually clicked on or were predicted to click on the ad. Clicking on the ad is known as conversion, and the three outcome variables included are: (1) the predicted probability of conversion, (2) predicted conversion (binary 0/1), obtained by thresholding the predicted probability, and (3) true conversion (binary 0/1), indicating whether the user actually clicked on the ad.
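
    A minimal sketch of how these three outcome variables could be inspected with pandas; the file and column names below are assumptions, since the exact schema is documented on the Kaggle page.

    import pandas as pd

    # Hypothetical file and column names; check the Kaggle page for the real schema.
    df = pd.read_csv("bias_in_advertising.csv")

    # Reproduce the thresholding step that turns outcome (1) into outcome (2).
    threshold = 0.5  # assumed cut-off; the dataset authors may use a different one
    df["pred_conversion_check"] = (df["pred_prob"] >= threshold).astype(int)

    # Simple disparity check: predicted vs. true conversion rate per gender group.
    rates = df.groupby("gender")[["pred_conversion", "true_conversion"]].mean()
    print(rates)
    print("Largest gap in predicted conversion rate:",
          rates["pred_conversion"].max() - rates["pred_conversion"].min())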

  2. news-bias-full-data

    • huggingface.co
    Updated Oct 25, 2023
    Cite
    News Media Biases (2023). news-bias-full-data [Dataset]. https://huggingface.co/datasets/newsmediabias/news-bias-full-data
    Explore at:
    Dataset updated
    Oct 25, 2023
    Dataset authored and provided by
    News Media Biases
    Description

    **Please access the latest version of the data here: https://huggingface.co/datasets/shainar/BEAD**

    Email shaina.raza@torontomu.ca regarding usage of the data.

      Please cite us if you use it
    

    @article{raza2024beads,
      title={BEADs: Bias Evaluation Across Domains},
      author={Raza, Shaina and Rahman, Mizanur and Zhang, Michael R},
      journal={arXiv preprint arXiv:2406.04220},
      year={2024}
    }

      license: cc-by-nc-4.0
    

    language: en; pretty_name: Navigating News… See the full description on the dataset page: https://huggingface.co/datasets/newsmediabias/news-bias-full-data.

  3. Data and Code for: Confidence, Self-Selection and Bias in the Aggregate

    • openicpsr.org
    delimited
    Updated Mar 2, 2023
    Cite
    Benjamin Enke; Thomas Graeber; Ryan Oprea (2023). Data and Code for: Confidence, Self-Selection and Bias in the Aggregate [Dataset]. http://doi.org/10.3886/E185741V1
    Explore at:
    delimited
    Dataset updated
    Mar 2, 2023
    Dataset provided by
    American Economic Association (http://www.aeaweb.org/)
    Authors
    Benjamin Enke; Thomas Graeber; Ryan Oprea
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The influence of behavioral biases on aggregate outcomes depends in part on self-selection: whether rational people opt more strongly into aggregate interactions than biased individuals. In betting-market, auction, and committee experiments, we document that some errors are strongly reduced through self-selection, while others are not affected at all or are even amplified. A large part of this variation is explained by differences in the relationship between confidence and performance. In some tasks, they are positively correlated, such that self-selection attenuates errors. In other tasks, rational and biased people are equally confident, such that self-selection has no effect on aggregate quantities.

  4. Opinion on mitigating AI data bias in healthcare worldwide 2024

    • statista.com
    Updated Jul 18, 2025
    Cite
    Statista (2025). Opinion on mitigating AI data bias in healthcare worldwide 2024 [Dataset]. https://www.statista.com/statistics/1559311/ways-to-mitigate-ai-bias-in-healthcare-worldwide/
    Explore at:
    Dataset updated
    Jul 18, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Dec 2023 - Mar 2024
    Area covered
    Worldwide
    Description

    According to a survey of healthcare leaders carried out globally in 2024, almost half of respondents believed that making AI more transparent and interpretable would mitigate the risk of data bias in AI applications for healthcare. Furthermore, ** percent of healthcare leaders thought there should be continuous training and education in AI.

  5. Replication Data for: Cognitive Bias Heterogeneity

    • dataverse.tdl.org
    Updated Aug 15, 2025
    Cite
    Molly McNamara (2025). Replication Data for: Cognitive Bias Heterogeneity [Dataset]. http://doi.org/10.18738/T8/754FZT
    Explore at:
    text/x-r-notebook (12370), text/x-r-notebook (15773), application/x-rlang-transport (20685), text/x-r-notebook (20656)
    Dataset updated
    Aug 15, 2025
    Dataset provided by
    Texas Data Repository
    Authors
    Molly McNamara
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This data and code can be used to replicate the main analysis for "Who Exhibits Cognitive Biases? Mapping Heterogeneity in Attention, Interpretation, and Rumination in Depression." Of note: to de-identify this dataset consistent with best practices, we have removed the zip code variable and binned age. The analysis code may need to be adjusted slightly to account for this, and the results may vary slightly from those in the manuscript as a result.

  6. Data from: Qbias – A Dataset on Media Bias in Search Queries and Query...

    • data.niaid.nih.gov
    Updated Mar 1, 2023
    Cite
    Haak, Fabian; Schaer, Philipp (2023). Qbias – A Dataset on Media Bias in Search Queries and Query Suggestions [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7682914
    Explore at:
    Dataset updated
    Mar 1, 2023
    Dataset provided by
    Technische Hochschule Köln
    Authors
    Haak, Fabian; Schaer, Philipp
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We present Qbias, two novel datasets that promote the investigation of bias in online news search as described in

    Fabian Haak and Philipp Schaer. 2023. Qbias – A Dataset on Media Bias in Search Queries and Query Suggestions. In Proceedings of ACM Web Science Conference (WebSci'23). ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3578503.3583628.

    Dataset 1: AllSides Balanced News Dataset (allsides_balanced_news_headlines-texts.csv)

    The dataset contains 21,747 news articles collected from AllSides balanced news headline roundups in November 2022, as presented in our publication. The AllSides balanced news roundups feature three expert-selected U.S. news articles from sources of different political views (left, right, center), often featuring spin, slant, and other forms of non-neutral reporting on political news. All articles are tagged with a bias label (left, right, or neutral) by four expert annotators based on the expressed political partisanship. The AllSides balanced news aims to offer multiple political perspectives on important news stories, educate users on biases, and provide multiple viewpoints. Collected data further include headlines, dates, news texts, topic tags (e.g., "Republican party", "coronavirus", "federal jobs"), and the publishing news outlet. We also include AllSides' neutral description of the topic of the articles. Overall, the dataset contains 10,273 articles tagged as left, 7,222 as right, and 4,252 as center.

    To provide easier access to the most recent and complete version of the dataset for future research, we provide a scraping tool and a regularly updated version of the dataset at https://github.com/irgroup/Qbias. The repository also contains regularly updated more recent versions of the dataset with additional tags (such as the URL to the article). We chose to publish the version used for fine-tuning the models on Zenodo to enable the reproduction of the results of our study.
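
    As a rough illustration, the balanced-news CSV can be loaded and its label distribution checked against the counts above; the column names used below ("bias_rating", "tags") are assumptions, not documented identifiers.

    import pandas as pd

    news = pd.read_csv("allsides_balanced_news_headlines-texts.csv")

    # Expect roughly 10,273 left, 7,222 right, and 4,252 center articles.
    print(news["bias_rating"].value_counts())

    # Articles carrying a given topic tag, e.g. "coronavirus" (tag column name assumed).
    print((news["tags"].str.contains("coronavirus", case=False, na=False)).sum())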

    Dataset 2: Search Query Suggestions (suggestions.csv)

    The second dataset we provide consists of 671,669 search query suggestions for root queries based on tags of the AllSides balanced news dataset. We collected search query suggestions from Google and Bing for the 1,431 topic tags that have been used for tagging AllSides news at least five times, approximately half of the total number of topics. The topic tags include names, a wide range of political terms, agendas, and topics (e.g., "communism", "libertarian party", "same-sex marriage"), cultural and religious terms (e.g., "Ramadan", "pope Francis"), locations, and other news-relevant terms. On average, the dataset contains 469 search queries for each topic. In total, 318,185 suggestions have been retrieved from Google and 353,484 from Bing.

    The file contains a "root_term" column based on the AllSides topic tags. The "query_input" column contains the search term submitted to the search engine ("search_engine"). "query_suggestion" and "rank" represent the search query suggestions and their positions as returned by the search engines at the given time of search ("datetime"). The location of the US server from which we scraped the data is saved in "location".

    We retrieved ten search query suggestions provided by the Google and Bing search autocomplete systems for the input of each of these root queries, without performing a search. Furthermore, we extended the root queries by the letters a to z (e.g., "democrats" (root term) >> "democrats a" (query input) >> "democrats and recession" (query suggestion)) to simulate a user's input during information search and generate a total of up to 270 query suggestions per topic and search engine. The dataset we provide contains columns for root term, query input, and query suggestion for each suggested query. The location from which the search is performed is the location of the Google servers running Colab, in our case Iowa in the United States of America, which is added to the dataset.
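
    A short sketch of exploring suggestions.csv with the columns named above (root_term, query_input, query_suggestion, rank, search_engine, datetime, location); exact dtypes and parsing details are assumptions.

    import pandas as pd

    sug = pd.read_csv("suggestions.csv")

    # Suggestions per engine (318,185 from Google and 353,484 from Bing per the description).
    print(sug["search_engine"].value_counts())

    # Average number of suggestions per root term (~469 according to the description).
    print(sug.groupby("root_term")["query_suggestion"].count().mean())

    # Top-ranked suggestions for one root query, e.g. "democrats".
    mask = (sug["root_term"] == "democrats") & (sug["rank"] == 1)
    print(sug.loc[mask, ["query_input", "query_suggestion", "search_engine"]].head())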

    AllSides Scraper

    At https://github.com/irgroup/Qbias, we provide a scraping tool that allows automatic retrieval of all available articles from the AllSides balanced news headlines.

    We want to provide an easy means of retrieving the news and all corresponding information. For many tasks it is relevant to have the most recent documents available. Thus, we provide this Python-based scraper, which collects all available AllSides news articles and the associated information. By providing the scraper, we facilitate access to a recent version of the dataset for other researchers.

  7. News Bias Data

    • kaggle.com
    zip
    Updated Apr 8, 2025
    Cite
    Nitish Kumar Thakur (2025). News Bias Data [Dataset]. https://www.kaggle.com/datasets/nitishxthakur/news-bias-data/data
    Explore at:
    zip (367,303,570 bytes)
    Dataset updated
    Apr 8, 2025
    Authors
    Nitish Kumar Thakur
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    The prevalence of bias in the news media has become a critical issue, affecting public perception on a range of important topics such as political views, health, insurance, resource distributions, religion, race, age, gender, occupation, and climate change. The media has a moral responsibility to ensure accurate information dissemination and to increase awareness about important issues and the potential risks associated with them. This highlights the need for a solution that can help mitigate against the spread of false or misleading information and restore public trust in the media.

    Data description: This is a dataset for news media bias covering different dimensions of bias: political, hate speech, toxicity, sexism, ageism, gender identity, gender discrimination, race/ethnicity, climate change, occupation, and spirituality, which makes it a unique contribution. The dataset used for this project does not contain any personally identifiable information (PII).

    Data Format: The format of data is:

    - ID: Numeric unique identifier.
    - Text: Main content.
    - Dimension: Categorical descriptor of the text.
    - Biased_Words: List of words considered biased.
    - Aspect: Specific topic within the text.
    - Label: Neutral, Slightly Biased, or Highly Biased.

    Annotation Scheme: The annotation scheme is based on active learning: Manual Labeling --> Semi-Supervised Learning --> Human Verification (an iterative process).

    - Bias Label: Indicate the presence/absence of bias (e.g., no bias, mild, strong).
    - Words/Phrases Level Biases: Identify specific biased words/phrases.
    - Subjective Bias (Aspect): Capture biases related to content aspects.
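
    A minimal sketch of reading the dataset according to the format above; the file name is hypothetical, and the assumption that Biased_Words is stored as a string-encoded list may not match the actual file.

    import ast
    import pandas as pd

    bias_df = pd.read_csv("news_bias_data.csv")  # hypothetical file name

    # Distribution over the three labels: Neutral / Slightly Biased / Highly Biased.
    print(bias_df["Label"].value_counts())

    # If Biased_Words is stored as a string representation of a list, parse it first.
    bias_df["Biased_Words"] = bias_df["Biased_Words"].apply(
        lambda v: ast.literal_eval(v) if isinstance(v, str) and v.startswith("[") else v)

    # Highly biased rows within one dimension, e.g. climate change.
    highly = bias_df[(bias_df["Label"] == "Highly Biased")
                     & (bias_df["Dimension"].str.contains("climate", case=False, na=False))]
    print(highly[["ID", "Aspect", "Biased_Words"]].head())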

  8. Replication data for: Selection Bias in Comparative Research: The Case of...

    • dataverse.harvard.edu
    Updated Mar 8, 2010
    Cite
    Simon Hug (2010). Replication data for: Selection Bias in Comparative Research: The Case of Incomplete Data Sets [Dataset]. http://doi.org/10.7910/DVN/QO28VG
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 8, 2010
    Dataset provided by
    Harvard Dataverse
    Authors
    Simon Hug
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Selection bias is an important but often neglected problem in comparative research. While comparative case studies pay some attention to this problem, it receives less attention in broader cross-national studies, where it may arise through the way the data used are generated. The article discusses three examples: studies of the success of newly formed political parties, research on protest events, and recent work on ethnic conflict. In all cases the data at hand are likely to be afflicted by selection bias. Failing to take this problem into consideration leads to serious biases in the estimation of simple relationships. Empirical examples illustrate a possible solution (a variation of a Tobit model) to the problems in these cases. The article also discusses results of Monte Carlo simulations, illustrating under what conditions the proposed estimation procedures lead to improved results.
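
    The core problem can be illustrated with a toy Monte Carlo (this is not the authors' code or their Tobit-style estimator): when only cases above a threshold enter the data, a naive estimate of a simple relationship is biased relative to the full population.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5000
    x = rng.normal(size=n)
    y = 1.0 + 0.5 * x + rng.normal(size=n)      # true slope = 0.5

    full_slope = np.polyfit(x, y, 1)[0]

    # Selection: only "successful" cases (e.g. parties that formed, protests that
    # were reported) enter the dataset.
    observed = y > 1.0
    sel_slope = np.polyfit(x[observed], y[observed], 1)[0]

    print(f"slope in the full population: {full_slope:.3f}")   # close to 0.5
    print(f"slope under selection:        {sel_slope:.3f}")    # attenuated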

  9. Marketing Bias data

    • kaggle.com
    zip
    Updated Oct 29, 2023
    Cite
    Ahmad (2023). Marketing Bias data [Dataset]. https://www.kaggle.com/datasets/pypiahmad/marketing-bias-data
    Explore at:
    zip (50,328 bytes)
    Dataset updated
    Oct 29, 2023
    Authors
    Ahmad
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    The Marketing Bias dataset encapsulates the interactions between users and products on ModCloth and Amazon Electronics, emphasizing the potential marketing bias inherent in product recommendations. This bias is explored through attributes related to product marketing and user/item interactions.

    Basic Statistics:

    • ModCloth:
      • Reviews: 99,893
      • Items: 1,020
      • Users: 44,783
      • Bias Type: Body Shape

    • Amazon Electronics:
      • Reviews: 1,292,954
      • Items: 9,560
      • Users: 1,157,633
      • Bias Type: Gender

    Metadata: ratings, product images, user identities, item sizes, user genders

    Example (ModCloth): The data example provided showcases a snippet from ModCloth data with columns like item_id, user_id, rating, timestamp, size, fit, user_attr, model_attr, and others.

    Download Links: Visit the project page for download links.

    Citation: If you utilize this dataset, please cite the following:

    Title: Addressing Marketing Bias in Product Recommendations
    Authors: Mengting Wan, Jianmo Ni, Rishabh Misra, Julian McAuley
    Published in: WSDM, 2020

    Dataset Files:
    - df_electronics.csv
    - df_modcloth.csv

    The dataset is structured to provide a comprehensive overview of user-item interactions and attributes that may contribute to marketing bias, making it a valuable resource for anyone investigating marketing strategies and recommendation systems.
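
    As a sketch of how the ModCloth file might be used to probe the bias the dataset targets, the snippet below groups ratings by user attribute and model attribute; the column names follow the example row above, but the exact encodings are assumptions.

    import pandas as pd

    modcloth = pd.read_csv("df_modcloth.csv")

    # Mean rating for each combination of user attribute and product (model) attribute.
    print(modcloth.pivot_table(values="rating", index="user_attr",
                               columns="model_attr", aggfunc="mean"))

    # How interactions are distributed across the segments.
    print(modcloth.groupby(["user_attr", "model_attr"]).size())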

  10. Data_Sheet_1_Gender Bias in Artificial Intelligence: Severity Prediction at...

    • frontiersin.figshare.com
    docx
    Updated May 30, 2023
    Cite
    Heewon Chung; Chul Park; Wu Seong Kang; Jinseok Lee (2023). Data_Sheet_1_Gender Bias in Artificial Intelligence: Severity Prediction at an Early Stage of COVID-19.docx [Dataset]. http://doi.org/10.3389/fphys.2021.778720.s001
    Explore at:
    docx
    Dataset updated
    May 30, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Heewon Chung; Chul Park; Wu Seong Kang; Jinseok Lee
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Artificial intelligence (AI) technologies have been applied in various medical domains to predict patient outcomes with high accuracy. As AI becomes more widely adopted, the problem of model bias is increasingly apparent. In this study, we investigate the model bias that can occur when training a model using datasets for only one particular gender and aim to present new insights into the bias issue. For the investigation, we considered an AI model that predicts severity at an early stage based on the medical records of coronavirus disease (COVID-19) patients. For 5,601 confirmed COVID-19 patients, we used 37 medical records, namely, basic patient information, physical index, initial examination findings, clinical findings, comorbidity diseases, and general blood test results at an early stage. To investigate the gender-based AI model bias, we trained and evaluated two separate models—one that was trained using only the male group, and the other using only the female group. When the model trained by the male-group data was applied to the female testing data, the overall accuracy decreased—sensitivity from 0.93 to 0.86, specificity from 0.92 to 0.86, accuracy from 0.92 to 0.86, balanced accuracy from 0.93 to 0.86, and area under the curve (AUC) from 0.97 to 0.94. Similarly, when the model trained by the female-group data was applied to the male testing data, once again, the overall accuracy decreased—sensitivity from 0.97 to 0.90, specificity from 0.96 to 0.91, accuracy from 0.96 to 0.91, balanced accuracy from 0.96 to 0.90, and AUC from 0.97 to 0.95. Furthermore, when we evaluated each gender-dependent model with the test data from the same gender used for training, the resultant accuracy was also lower than that from the unbiased model.
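
    A schematic of the single-gender training setup described above (not the authors' pipeline): fit a classifier on one gender group and evaluate it on held-out data from both groups. The file name, the "gender" and "severe" columns, and the choice of logistic regression are all assumptions; the feature columns are assumed numeric.

    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import balanced_accuracy_score, roc_auc_score
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("covid_early_records.csv")            # hypothetical file
    features = [c for c in df.columns if c not in ("gender", "severe")]

    male, female = df[df["gender"] == "M"], df[df["gender"] == "F"]
    male_train, male_test = train_test_split(male, test_size=0.2, random_state=0)

    # Train only on the male group, mirroring the study's gender-dependent model.
    model = LogisticRegression(max_iter=1000).fit(male_train[features], male_train["severe"])

    for name, group in [("male test", male_test), ("female test", female)]:
        prob = model.predict_proba(group[features])[:, 1]
        pred = (prob >= 0.5).astype(int)
        print(name,
              "AUC:", round(roc_auc_score(group["severe"], prob), 3),
              "balanced accuracy:", round(balanced_accuracy_score(group["severe"], pred), 3))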

  11. Navigating News Narratives: A Media Bias Analysis Dataset

    • figshare.com
    txt
    Updated Dec 8, 2023
    Cite
    Shaina Raza (2023). Navigating News Narratives: A Media Bias Analysis Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.24422122.v4
    Explore at:
    txt
    Dataset updated
    Dec 8, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Shaina Raza
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The prevalence of bias in the news media has become a critical issue, affecting public perception on a range of important topics such as political views, health, insurance, resource distributions, religion, race, age, gender, occupation, and climate change. The media has a moral responsibility to ensure accurate information dissemination and to increase awareness about important issues and the potential risks associated with them. This highlights the need for a solution that can help mitigate against the spread of false or misleading information and restore public trust in the media.

    Data description: This is a dataset for news media bias covering different dimensions of bias: political, hate speech, toxicity, sexism, ageism, gender identity, gender discrimination, race/ethnicity, climate change, occupation, and spirituality, which makes it a unique contribution. The dataset used for this project does not contain any personally identifiable information (PII).

    The data structure is tabulated as follows:
    - Text: the main content.
    - Dimension: descriptive category of the text.
    - Biased_Words: a compilation of words regarded as biased.
    - Aspect: specific sub-topic within the main content.
    - Label: indicates the degree of bias; the label is ternary - highly biased, slightly biased, or neutral.
    - Toxicity: indicates the presence (True) or absence (False) of toxicity.
    - Identity_mention: mention of any identity based on word match.

    Annotation Scheme: The labels and annotations in the dataset are generated through a system of active learning, cycling through Manual Labeling, Semi-Supervised Learning, and Human Verification. The scheme comprises:
    - Bias Label: specifies the degree of bias (e.g., no bias, mild, or strong).
    - Words/Phrases Level Biases: pinpoints specific biased terms or phrases.
    - Subjective Bias (Aspect): highlights biases pertinent to content dimensions.
    Due to the nuances of semantic match algorithms, certain labels such as 'identity' and 'aspect' may appear distinctively different.

    List of datasets used: We curated different news categories, such as climate crisis news summaries, occupational, and spiritual/faith/general news, using RSS feeds to capture different dimensions of news media bias. The annotation is performed using active learning to label each sentence (neutral / slightly biased / highly biased) and to pick biased words from the news.

    We also utilize publicly available data from the following sources (our attribution to others):
    - MBIC (media bias): Spinde, Timo, Lada Rudnitckaia, Kanishka Sinha, Felix Hamborg, Bela Gipp, and Karsten Donnay. "MBIC - A Media Bias Annotation Dataset Including Annotator Characteristics." arXiv preprint arXiv:2105.11910 (2021). https://zenodo.org/records/4474336
    - Hyperpartisan news: Kiesel, Johannes, Maria Mestre, Rishabh Shukla, Emmanuel Vincent, Payam Adineh, David Corney, Benno Stein, and Martin Potthast. "SemEval-2019 Task 4: Hyperpartisan News Detection." In Proceedings of the 13th International Workshop on Semantic Evaluation, pp. 829-839. 2019. https://huggingface.co/datasets/hyperpartisan_news_detection
    - Toxic comment classification: Adams, C.J., Jeffrey Sorensen, Julia Elliott, Lucas Dixon, Mark McDonald, Nithum, and Will Cukierski. 2017. "Toxic Comment Classification Challenge." Kaggle. https://kaggle.com/competitions/jigsaw-toxic-comment-classification-challenge
    - Jigsaw Unintended Bias: Adams, C.J., Daniel Borkan, Inversion, Jeffrey Sorensen, Lucas Dixon, Lucy Vasserman, and Nithum. 2019. "Jigsaw Unintended Bias in Toxicity Classification." Kaggle. https://kaggle.com/competitions/jigsaw-unintended-bias-in-toxicity-classification
    - Age Bias: Díaz, Mark, Isaac Johnson, Amanda Lazar, Anne Marie Piper, and Darren Gergle. "Addressing Age-Related Bias in Sentiment Analysis." In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, pp. 1-14. 2018. Age Bias Training and Testing Data - Age Bias and Sentiment Analysis Dataverse (harvard.edu)
    - Multi-dimensional news (Ukraine): Färber, Michael, Victoria Burkard, Adam Jatowt, and Sora Lim. "A Multidimensional Dataset Based on Crowdsourcing for Analyzing and Detecting News Bias." In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 3007-3014. 2020. https://zenodo.org/records/3885351#.ZF0KoxHMLtV
    - Social biases: Sap, Maarten, Saadia Gabriel, Lianhui Qin, Dan Jurafsky, Noah A. Smith, and Yejin Choi. "Social Bias Frames: Reasoning about Social and Power Implications of Language." arXiv preprint arXiv:1911.03891 (2019). https://maartensap.com/social-bias-frames/

    Goal of this dataset: We want to offer open and free access to the dataset, ensuring a wide reach to researchers and AI practitioners across the world. The dataset should be user-friendly, and uploading and accessing data should be straightforward, to facilitate usage.

    If you use this dataset, please cite us. Navigating News Narratives: A Media Bias Analysis Dataset © 2023 by Shaina Raza, Vector Institute is licensed under CC BY-NC 4.0.

  12. bias-shades

    • huggingface.co
    Updated Feb 22, 2023
    Cite
    BigScience Catalogue Data (2023). bias-shades [Dataset]. https://huggingface.co/datasets/bigscience-catalogue-data/bias-shades
    Explore at:
    Dataset updated
    Feb 22, 2023
    Dataset authored and provided by
    BigScience Catalogue Data
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This is a preliminary version of the bias SHADES dataset for evaluating LMs for social biases.

  13. Data from: Deconstructing Bias in Social Preferences Reveals Groupy and Not...

    • openicpsr.org
    stata
    Updated Aug 5, 2020
    Cite
    Rachel Kranton; Matthew Pease; Seth Sanders; Scott Heutell (2020). Deconstructing Bias in Social Preferences Reveals Groupy and Not Groupy Behavior [Dataset]. http://doi.org/10.3886/E120555V1
    Explore at:
    stata
    Dataset updated
    Aug 5, 2020
    Dataset provided by
    Cornell University
    UPMC
    Duke University
    Authors
    Rachel Kranton; Matthew Pease; Seth Sanders; Scott Heutell
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2010 - 2020
    Area covered
    NC, Durham
    Description

    Group divisions are a continual feature of human history, with biases toward people’s own groups shown in both experimental and natural settings. Using a novel within-subject design, this work deconstructs group biases to find significant and robust individual differences; some individuals consistently respond to group divisions, while others do not. We examined individual behavior in two treatments in which subjects make pairwise decisions that determine own and others’ income. In a political treatment, which divided subjects into groups based on their political leanings, political party members showed more ingroup bias than Independents who professed the same political opinions. But this greater bias was also present in a minimal group treatment, showing that stronger group identification was not the driver of higher favoritism in the political setting. Analyzing individual choices across the experiment, we categorize participants as “groupy” or “not groupy,” such that groupy participants have social preferences that change for ingroup and outgroup recipients, while not-groupy participants’ preferences do not change across group context. Demonstrating further that the group identity of the recipient mattered less to their choices, strongly not-groupy subjects made allocation decisions faster. We conclude that observed ingroup biases build on a foundation of heterogeneity in individual groupiness.

  14. Data from: Results and further resources concerning our pre-studies...

    • data.niaid.nih.gov
    Updated Sep 20, 2021
    Cite
    Hamborg, Felix; Spinde, Timo; Heinser, Kim; Donnay, Karsten; Gipp, Bela (2021). Results and further resources concerning our pre-studies concerning revealing biases in news articles [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5517400
    Explore at:
    Dataset updated
    Sep 20, 2021
    Dataset provided by
    University of Konstanz
    University of Zurich
    University of Wuppertal
    Authors
    Hamborg, Felix; Spinde, Timo; Heinser, Kim; Donnay, Karsten; Gipp, Bela
    Description

    Slanted news coverage strongly affects public opinion. This is especially true for coverage on politics and related issues, where studies have shown that bias in the news may strongly influence elections and other collective decisions. Due to its vital importance, news coverage has long been studied in the social sciences, resulting in comprehensive models to describe it and effective yet costly methods to analyze it, such as content analysis. We present an in-progress system for news recommendation that is the first to automate the manual procedure of content analysis to reveal person-targeting biases in news articles reporting on policy issues. In a large-scale user study, we find very promising results regarding this interdisciplinary research direction. Our recommender detects and reveals substantial frames that are actually present in individual news articles. In contrast, prior work rather only facilitates the visibility of biases, e.g., by distinguishing left- and right-wing outlets. Further, our study shows that recommending news articles that differently frame an event significantly improves respondents' awareness of bias.

  15. Sex Bias in COVID-19 Data - Supplementary Table 1

    • datasetcatalog.nlm.nih.gov
    • zivahub.uct.ac.za
    Updated Sep 28, 2020
    Cite
    Rosser, Lizzy; Wedderburn, Lucy; Webb, Kate; Ciurtin, Coziana; Radziszewska, Ania; Deakin, Claire; Peckham, Hannah; Raine, Charles; De Gruijter, Nina (2020). Sex Bias in COVID-19 Data - Supplementary Table 1 [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000588710
    Explore at:
    Dataset updated
    Sep 28, 2020
    Authors
    Rosser, Lizzy; Wedderburn, Lucy; Webb, Kate; Ciurtin, Coziana; Radziszewska, Ania; Deakin, Claire; Peckham, Hannah; Raine, Charles; De Gruijter, Nina
    Description

    An online search of government websites and published literature was performed for regional data reports on COVID-19 cases that included sex as a variable from 1st January 2020 up until 1st June 2020 (search terms: COVID-19/case/sex/country/data/death/ICU/ITU). In order to ensure unbiased representation from as many regions as possible, a cross check was done using the list of countries reporting data on 'Worldometer', and an attempt was made to include as many regions reporting sex data as possible. Reports were translated using Google Translate if they were not in English.

    Data selection, extraction and synthesis: Reports were included if they contained sex as a variable in data describing case number, intensive treatment unit (ITU) admission, or mortality. Data were entered directly by individual researchers into an online structured data extraction table. For some sources, counts of male confirmed cases or male deaths were not provided, but percentages of male cases or male deaths were provided instead. To include these sources and avoid biases that might be introduced by their exclusion, we calculated counts of male confirmed cases and male deaths from the reported percentages with rounding to the nearest integer. We acknowledge that this approach assumes that the reported percentages are reflective of the true percentages. For some sources, data included confirmed cases and deaths of unknown sex. For these sources, the reported totals were used where the proportion of unknown sex was small. This approach was preferred to excluding cases of unknown sex in order to avoid bias. The estimates represent the proportion of known male infections and odds ratios for mortality associated with known male sex, and will differ slightly from what the true values would be if the sex had been reported for all cases. Data were available at the level of country or regional summary data representing distinct individuals for each report, but not at the level of covariates for all individuals within a study. Consequently, covariates such as lifestyle, comorbidities, testing method and case type (hospital vs. community) could not be controlled for.
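
    Two of the calculations described above, written out as a sketch with made-up numbers: deriving male case counts from reported percentages, and an odds ratio for mortality associated with male sex.

    def male_count_from_percentage(total_cases: int, male_pct: float) -> int:
        """Male case count from a reported percentage, rounded to the nearest integer."""
        return round(total_cases * male_pct / 100)

    def mortality_odds_ratio(male_deaths, male_cases, female_deaths, female_cases):
        """Odds ratio of death associated with male sex."""
        male_odds = male_deaths / (male_cases - male_deaths)
        female_odds = female_deaths / (female_cases - female_deaths)
        return male_odds / female_odds

    print(male_count_from_percentage(12_345, 48.7))                        # 6012
    print(mortality_odds_ratio(male_deaths=300, male_cases=6000,
                               female_deaths=200, female_cases=6300))      # ~1.6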

  16. Data from: Correction for bias in meta-analysis of little-replicated studies...

    • eprints.soton.ac.uk
    • data.niaid.nih.gov
    • +3more
    Updated Jan 1, 2018
    Cite
    Doncaster, C. Patrick; Spake, Rebecca (2018). Data from: Correction for bias in meta-analysis of little-replicated studies [Dataset]. http://doi.org/10.5061/dryad.5f4g6
    Explore at:
    Dataset updated
    Jan 1, 2018
    Dataset provided by
    DRYAD
    Authors
    Doncaster, C. Patrick; Spake, Rebecca
    Description

    Data S1: R script for simulations (Doncaster&Spake_Data_S1.txt). Simulations of fixed- and random-effects meta-analysis using alternative estimators: one-sample mean, two-sample Hedges' g, and two-sample lnR, for comparison of performance by inverse-variance weighting and inverse-adjusted-variance weighting.

    Data S2: R script for calculating mean-adjusted error variance (Doncaster&Spake_Data_S2.txt). Finds the mean-adjusted study variance for all the primary studies contributing to a meta-analysis, for a one-sample mean, two-sample log response ratio, or two-sample Hedges' g.

    1. Meta-analyses conventionally weight study estimates on the inverse of their error variance, in order to maximize precision. Unbiased variability in the estimates of these study-level error variances increases with the inverse of study-level replication. Here we demonstrate how this variability accumulates asymmetrically across studies in precision-weighted meta-analysis, to cause undervaluation of the meta-level effect size or its error variance (the meta-effect and meta-variance).

    2. Small samples, typical of the ecological literature, induce big sampling errors in variance estimation, which substantially bias precision-weighted meta-analysis. Simulations revealed that biases differed little between random- and fixed-effects tests. Meta-estimation of a one-sample mean from 20 studies, with sample sizes of 3 to 20 observations, undervalued the meta-variance by ~20%. Meta-analysis of two-sample designs from 20 studies, with sample sizes of 3 to 10 observations, undervalued the meta-variance by 15-20% for the log response ratio (lnR); it undervalued the meta-effect by ~10% for the standardised mean difference (SMD).

    3. For all estimators, biases were eliminated or reduced by a simple adjustment to the weighting on study precision. The study-specific component of error variance prone to sampling error and not parametrically attributable to study-specific replication was replaced by its cross-study mean, on the assumption of random sampling from the same population variance for all studies, and sufficient studies for averaging. Weighting each study by the inverse of this mean-adjusted error variance universally improved accuracy in estimation of both the meta-effect and its significance, regardless of the number of studies. For comparison, weighting only on sample size gave the same improvement in accuracy, but could not sensibly estimate significance.

    4. For the one-sample mean and two-sample lnR, adjusted weighting also improved estimation of between-study variance by DerSimonian-Laird and REML methods. For random-effects meta-analysis of SMD from little-replicated studies, the most accurate meta-estimates were obtained from adjusted weights following conventionally weighted estimation of between-study variance.

    5. We recommend adoption of weighting by inverse adjusted variance for meta-analyses of well- and little-replicated studies, because it improves accuracy and significance of meta-estimates, and it can extend the scope of the meta-analysis to include some studies without variance estimates.
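
    A numerical sketch (a paraphrase of the idea, not the authors' R scripts) of the weighting comparison for a one-sample mean: conventional inverse-variance weights versus weights in which each study's variance estimate is replaced by its cross-study mean.

    import numpy as np

    means = np.array([0.42, 0.55, 0.31, 0.60, 0.47])   # study effect estimates
    s2 = np.array([1.10, 0.35, 2.40, 0.80, 1.60])      # study variance estimates
    n = np.array([4, 6, 3, 5, 8])                       # study sample sizes

    # Conventional precision weighting: w_i = 1 / (s_i^2 / n_i).
    w_conv = n / s2
    meta_conv = np.sum(w_conv * means) / np.sum(w_conv)

    # Mean-adjusted weighting: replace s_i^2 by mean(s^2), so w_i = n_i / mean(s^2),
    # which for the point estimate reduces to weighting on sample size alone.
    w_adj = n / s2.mean()
    meta_adj = np.sum(w_adj * means) / np.sum(w_adj)

    print(f"conventionally weighted meta-effect: {meta_conv:.3f}")
    print(f"mean-adjusted meta-effect:           {meta_adj:.3f}")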

  17. Data bias

    • kaggle.com
    zip
    Updated Mar 11, 2022
    Cite
    tyur muthia (2022). Data bias [Dataset]. https://www.kaggle.com/datasets/tyurmuthia/data-bias
    Explore at:
    zip (654,062 bytes)
    Dataset updated
    Mar 11, 2022
    Authors
    tyur muthia
    Description

    Dataset

    This dataset was created by tyur muthia


  18. Data from: Prolific observer bias in the life sciences: why we need blind...

    • figshare.mq.edu.au
    • datasetcatalog.nlm.nih.gov
    • +4more
    bin
    Updated Jun 14, 2023
    + more versions
    Cite
    Luke Holman; Megan L. Head; Robert Lanfear; Michael D. Jennions (2023). Data from: Prolific observer bias in the life sciences: why we need blind data recording [Dataset]. http://doi.org/10.5061/dryad.hn40n
    Explore at:
    bin
    Dataset updated
    Jun 14, 2023
    Dataset provided by
    Macquarie University
    Authors
    Luke Holman; Megan L. Head; Robert Lanfear; Michael D. Jennions
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Observer bias and other “experimenter effects” occur when researchers’ expectations influence study outcome. These biases are strongest when researchers expect a particular result, are measuring subjective variables, and have an incentive to produce data that confirm predictions. To minimize bias, it is good practice to work “blind,” meaning that experimenters are unaware of the identity or treatment group of their subjects while conducting research. Here, using text mining and a literature review, we find evidence that blind protocols are uncommon in the life sciences and that nonblind studies tend to report higher effect sizes and more significant p-values. We discuss methods to minimize bias and urge researchers, editors, and peer reviewers to keep blind protocols in mind.

    Usage Notes

    - Evolution literature review data
    - Exact p value dataset
    - journal_categories
    - p values data 24 Sept
    - Proportion of significant p values per paper
    - R script to filter and classify the p value data
    - Quiz answers - guessing effect size from abstracts: the answers provided by the 9 evolutionary biologists to a quiz we designed, which aimed to test whether trained specialists are able to infer the relative size/direction of effect size from a paper's title and abstract.
    - readme: description of the contents of all the other files in this Dryad submission.
    - R script to statistically analyse the p value data: R script detailing the statistical analyses we performed on the p value datasets.

  19. Data Sheet 1_Biases in AI: acknowledging and addressing the inevitable...

    • frontiersin.figshare.com
    pdf
    Updated Aug 20, 2025
    Cite
    Bjørn Hofmann (2025). Data Sheet 1_Biases in AI: acknowledging and addressing the inevitable ethical issues.pdf [Dataset]. http://doi.org/10.3389/fdgth.2025.1614105.s001
    Explore at:
    pdf
    Dataset updated
    Aug 20, 2025
    Dataset provided by
    Frontiers
    Authors
    Bjørn Hofmann
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Biases in artificial intelligence (AI) systems pose a range of ethical issues. The myriad biases in AI systems are briefly reviewed and divided into three main categories: input bias, system bias, and application bias. These biases pose a series of basic ethical challenges: injustice, bad output/outcome, loss of autonomy, transformation of basic concepts and values, and erosion of accountability. A review of the many ways to identify, measure, and mitigate these biases reveals commendable efforts to avoid or reduce bias; however, it also highlights the persistence of unresolved biases. Residual and undetected biases present epistemic challenges with substantial ethical implications. The article further investigates whether the general principles, checklists, guidelines, frameworks, or regulations of AI ethics could address the identified ethical issues with bias. Unfortunately, the depth and diversity of these challenges often exceed the capabilities of existing approaches. Consequently, the article suggests that we must acknowledge and accept some residual ethical issues related to biases in AI systems. By utilizing insights from ethics and moral psychology, we can better navigate this landscape. To maximize the benefits and minimize the harms of biases in AI, it is imperative to identify and mitigate existing biases and remain transparent about the consequences of those we cannot eliminate. This necessitates close collaboration between scientists and ethicists.

  20. Data from: Decisions reduce sensitivity to subsequent information

    • zenodo.org
    • datadryad.org
    Updated May 28, 2022
    Cite
    Zohar Z. Bronfman; Noam Brezis; Rani Moran; Konstantinos Tsetsos; Tobias Donner; Marius Usher (2022). Data from: Decisions reduce sensitivity to subsequent information [Dataset]. http://doi.org/10.5061/dryad.40f6v
    Explore at:
    Dataset updated
    May 28, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Zohar Z. Bronfman; Noam Brezis; Rani Moran; Konstantinos Tsetsos; Tobias Donner; Marius Usher
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Behavioural studies over half a century indicate that making categorical choices alters beliefs about the state of the world. People seem biased to confirm previous choices, and to suppress contradicting information. These choice-dependent biases imply a fundamental bound of human rationality. However, it remains unclear whether these effects extend to lower level decisions, and only little is known about the computational mechanisms underlying them. Building on the framework of sequential-sampling models of decision-making, we developed novel psychophysical protocols that enable us to dissect quantitatively how choices affect the way decision-makers accumulate additional noisy evidence. We find robust choice-induced biases in the accumulation of abstract numerical (experiment 1) and low-level perceptual (experiment 2) evidence. These biases deteriorate estimations of the mean value of the numerical sequence (experiment 1) and reduce the likelihood to revise decisions (experiment 2). Computational modelling reveals that choices trigger a reduction of sensitivity to subsequent evidence via multiplicative gain modulation, rather than shifting the decision variable towards the chosen alternative in an additive fashion. Our results thus show that categorical choices alter the evidence accumulation mechanism itself, rather than just its outcome, rendering the decision-maker less sensitive to new information.
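
    A toy accumulator (not the authors' model) contrasting the two mechanisms discussed above: an additive shift of the decision variable toward the chosen alternative versus a multiplicative gain reduction applied only to evidence arriving after the choice.

    import numpy as np

    rng = np.random.default_rng(1)
    pre = rng.normal(0.2, 1.0, size=50)     # evidence samples before the choice
    post = rng.normal(0.2, 1.0, size=50)    # evidence samples after the choice

    choice = 1 if pre.sum() > 0 else -1     # interim categorical choice

    # Additive account: shift the decision variable toward the chosen option.
    dv_additive = pre.sum() + 0.5 * choice + post.sum()

    # Gain-modulation account: down-weight post-choice evidence multiplicatively.
    gain = 0.6                               # reduced sensitivity after the choice
    dv_gain = pre.sum() + gain * post.sum()

    print("choice:", choice,
          "| additive DV:", round(dv_additive, 2),
          "| gain-modulated DV:", round(dv_gain, 2))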
