100+ datasets found
  1. Navigating News Narratives: A Media Bias Analysis Dataset

    • figshare.com
    txt
    Updated Dec 8, 2023
    + more versions
    Cite
    Shaina Raza (2023). Navigating News Narratives: A Media Bias Analysis Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.24422122.v4
    Explore at:
    Available download formats: txt
    Dataset updated
    Dec 8, 2023
    Dataset provided by
    figshare
    Authors
    Shaina Raza
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The prevalence of bias in the news media has become a critical issue, affecting public perception of a range of important topics such as political views, health, insurance, resource distribution, religion, race, age, gender, occupation, and climate change. The media has a moral responsibility to ensure accurate information dissemination and to raise awareness about important issues and the potential risks associated with them. This highlights the need for a solution that can help mitigate the spread of false or misleading information and restore public trust in the media.

    Data description: This is a dataset for news media bias covering multiple dimensions of bias: political, hate speech, toxicity, sexism, ageism, gender identity, gender discrimination, race/ethnicity, climate change, occupation, and spirituality, which makes it a unique contribution. The dataset does not contain any personally identifiable information (PII). The data structure is tabulated as follows:

    • Text: The main content.
    • Dimension: Descriptive category of the text.
    • Biased_Words: A compilation of words regarded as biased.
    • Aspect: Specific sub-topic within the main content.
    • Label: The degree of bias. The label is ternary: highly biased, slightly biased, or neutral.
    • Toxicity: Indicates the presence (True) or absence (False) of toxicity.
    • Identity_mention: Mention of any identity based on word match.

    Annotation scheme: The labels and annotations in the dataset are generated through a system of Active Learning, cycling through manual labeling, semi-supervised learning, and human verification. The scheme comprises:

    • Bias Label: Specifies the degree of bias (e.g., no bias, mild, or strong).
    • Words/Phrases Level Biases: Pinpoints specific biased terms or phrases.
    • Subjective Bias (Aspect): Highlights biases pertinent to content dimensions.

    Due to the nuances of semantic match algorithms, certain labels such as 'identity' and 'aspect' may appear distinctly different.

    List of datasets used: We curated different news categories (climate crisis news summaries, occupational, spiritual/faith, general) using RSS feeds to capture different dimensions of news media bias. Annotation is performed using active learning to label each sentence (neutral, slightly biased, or highly biased) and to pick biased words from the news. We also utilize publicly available data from the following sources, with attribution:

    • MBIC (media bias): Spinde, Timo, Lada Rudnitckaia, Kanishka Sinha, Felix Hamborg, Bela Gipp, and Karsten Donnay. "MBIC -- A Media Bias Annotation Dataset Including Annotator Characteristics." arXiv preprint arXiv:2105.11910 (2021). https://zenodo.org/records/4474336
    • Hyperpartisan news: Kiesel, Johannes, Maria Mestre, Rishabh Shukla, Emmanuel Vincent, Payam Adineh, David Corney, Benno Stein, and Martin Potthast. "SemEval-2019 Task 4: Hyperpartisan News Detection." In Proceedings of the 13th International Workshop on Semantic Evaluation, pp. 829-839. 2019. https://huggingface.co/datasets/hyperpartisan_news_detection
    • Toxic comment classification: Adams, C.J., Jeffrey Sorensen, Julia Elliott, Lucas Dixon, Mark McDonald, Nithum, and Will Cukierski. 2017. "Toxic Comment Classification Challenge." Kaggle. https://kaggle.com/competitions/jigsaw-toxic-comment-classification-challenge
    • Jigsaw Unintended Bias: Adams, C.J., Daniel Borkan, Inversion, Jeffrey Sorensen, Lucas Dixon, Lucy Vasserman, and Nithum. 2019. "Jigsaw Unintended Bias in Toxicity Classification." Kaggle. https://kaggle.com/competitions/jigsaw-unintended-bias-in-toxicity-classification
    • Age bias: Díaz, Mark, Isaac Johnson, Amanda Lazar, Anne Marie Piper, and Darren Gergle. "Addressing Age-Related Bias in Sentiment Analysis." In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, pp. 1-14. 2018. Age Bias Training and Testing Data, Age Bias and Sentiment Analysis Dataverse (harvard.edu)
    • Multi-dimensional news (Ukraine): Färber, Michael, Victoria Burkard, Adam Jatowt, and Sora Lim. "A Multidimensional Dataset Based on Crowdsourcing for Analyzing and Detecting News Bias." In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 3007-3014. 2020. https://zenodo.org/records/3885351#.ZF0KoxHMLtV
    • Social biases: Sap, Maarten, Saadia Gabriel, Lianhui Qin, Dan Jurafsky, Noah A. Smith, and Yejin Choi. "Social Bias Frames: Reasoning about Social and Power Implications of Language." arXiv preprint arXiv:1911.03891 (2019). https://maartensap.com/social-bias-frames/

    Goal of this dataset: We want to offer open and free access to this dataset, ensuring a wide reach to researchers and AI practitioners across the world. The dataset should be user-friendly, and uploading and accessing the data should be straightforward, to facilitate usage. If you use this dataset, please cite us. Navigating News Narratives: A Media Bias Analysis Dataset © 2023 by Shaina Raza, Vector Institute is licensed under CC BY-NC 4.0
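    A minimal usage sketch (not part of the dataset description): loading the tabular release with pandas and inspecting the fields listed above. The column names follow the data structure described; the file name and the tab separator are assumptions about the figshare download.

    python
    import pandas as pd

    # Hypothetical local path and separator; adjust to the actual figshare download.
    df = pd.read_csv("news_bias.txt", sep="\t")
    print(df.columns.tolist())   # expect Text, Dimension, Biased_Words, Aspect, Label, Toxicity, Identity_mention

    # Distribution of the ternary bias label
    print(df["Label"].value_counts())

    # Rows assigned to a specific dimension, e.g. climate change
    climate = df[df["Dimension"].str.contains("climate", case=False, na=False)]
    print(len(climate), "climate-related rows")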

  2. Opinion on mitigating AI data bias in healthcare worldwide 2024

    • statista.com
    Updated Mar 20, 2025
    Cite
    Statista (2025). Opinion on mitigating AI data bias in healthcare worldwide 2024 [Dataset]. https://www.statista.com/statistics/1559311/ways-to-mitigate-ai-bias-in-healthcare-worldwide/
    Explore at:
    Dataset updated
    Mar 20, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Dec 2023 - Mar 2024
    Area covered
    Worldwide
    Description

    According to a global survey of healthcare leaders carried out in 2024, almost half of respondents believed that making AI more transparent and interpretable would mitigate the risk of data bias in AI applications for healthcare. Furthermore, 46 percent of healthcare leaders thought there should be continuous training and education in AI.

  3. Data and Code for: Confidence, Self-Selection and Bias in the Aggregate

    • openicpsr.org
    delimited
    Updated Mar 2, 2023
    Cite
    Benjamin Enke; Thomas Graeber; Ryan Oprea (2023). Data and Code for: Confidence, Self-Selection and Bias in the Aggregate [Dataset]. http://doi.org/10.3886/E185741V1
    Explore at:
    Available download formats: delimited
    Dataset updated
    Mar 2, 2023
    Dataset provided by
    American Economic Association
    Authors
    Benjamin Enke; Thomas Graeber; Ryan Oprea
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The influence of behavioral biases on aggregate outcomes depends in part on self-selection: whether rational people opt into aggregate interactions more strongly than biased individuals. In betting-market, auction, and committee experiments, we document that some errors are strongly reduced through self-selection, while others are not affected at all or are even amplified. A large part of this variation is explained by differences in the relationship between confidence and performance. In some tasks, the two are positively correlated, such that self-selection attenuates errors. In other tasks, rational and biased people are equally confident, such that self-selection has no effect on aggregate quantities.

  4. Data_Sheet_1_Data and model bias in artificial intelligence for healthcare...

    • frontiersin.figshare.com
    zip
    Updated Jun 3, 2023
    Cite
    Vithya Yogarajan; Gillian Dobbie; Sharon Leitch; Te Taka Keegan; Joshua Bensemann; Michael Witbrock; Varsha Asrani; David Reith (2023). Data_Sheet_1_Data and model bias in artificial intelligence for healthcare applications in New Zealand.zip [Dataset]. http://doi.org/10.3389/fcomp.2022.1070493.s001
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Frontiers
    Authors
    Vithya Yogarajan; Gillian Dobbie; Sharon Leitch; Te Taka Keegan; Joshua Bensemann; Michael Witbrock; Varsha Asrani; David Reith
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    New Zealand
    Description

    Introduction: Developments in Artificial Intelligence (AI) are adopted widely in healthcare. However, the introduction and use of AI may come with biases and disparities, resulting in concerns about healthcare access and outcomes for underrepresented indigenous populations. In New Zealand, Māori experience significant inequities in health compared to the non-Indigenous population. This research explores equity concepts and fairness measures concerning AI for healthcare in New Zealand.

    Methods: This research considers data and model bias in NZ-based electronic health records (EHRs). Two very distinct NZ datasets are used, one obtained from a hospital and another from multiple GP practices; both datasets were collected by clinicians. To ensure research equality and fair inclusion of Māori, we combine expertise in Artificial Intelligence (AI), the New Zealand clinical context, and te ao Māori. The mitigation of inequity needs to be addressed in data collection, model development, and model deployment. In this paper, we analyze data and algorithmic bias concerning data collection and model development, training, and testing using health data collected by experts. We use fairness measures such as disparate impact scores, equal opportunity, and equalized odds to analyze tabular data. Furthermore, token frequencies, statistical significance testing, and fairness measures for word embeddings, such as the WEAT and WEFE frameworks, are used to analyze bias in free-form medical text. The AI model predictions are also explained using SHAP and LIME.

    Results: This research analyzed fairness metrics for NZ EHRs while considering data and algorithmic bias. We show evidence of bias due to changes made in algorithmic design. Furthermore, we observe unintentional bias due to the underlying pre-trained models used to represent text data. This research addresses some vital issues while opening up the need and opportunity for future research.

    Discussion: This research takes early steps toward developing a model of socially responsible and fair AI for New Zealand's population. We provide an overview of reproducible concepts that can be adopted for any NZ population data. Furthermore, we discuss the gaps and future research avenues that will enable more focused development of fairness measures suited to the New Zealand population's needs and social structure. One of the primary focuses of this research was ensuring fair inclusion. As such, we combine expertise in AI, clinical knowledge, and the representation of indigenous populations. This inclusion of experts will be vital moving forward, providing a stepping stone toward the integration of AI for better outcomes in healthcare.
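    The disparate impact score mentioned above is straightforward to reproduce on any tabular prediction output. A minimal sketch, assuming a pandas DataFrame with hypothetical column names (group, prediction) rather than the authors' actual EHR fields:

    python
    import pandas as pd

    def disparate_impact(df, group_col, pred_col, privileged, unprivileged):
        # Ratio of positive-prediction rates: P(pred=1 | unprivileged) / P(pred=1 | privileged).
        # Values near 1.0 indicate parity; the common "80% rule" flags values below 0.8.
        rate_unpriv = df.loc[df[group_col] == unprivileged, pred_col].mean()
        rate_priv = df.loc[df[group_col] == privileged, pred_col].mean()
        return rate_unpriv / rate_priv

    # Toy example with hypothetical data
    df = pd.DataFrame({
        "group": ["A", "A", "A", "B", "B", "B"],
        "prediction": [1, 1, 0, 1, 0, 0],
    })
    print(disparate_impact(df, "group", "prediction", privileged="A", unprivileged="B"))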

  5. Data - Biases in the metabarcoding of plant pathogens - Dataset - DataStore

    • datastore.landcareresearch.co.nz
    Updated Dec 13, 2018
    + more versions
    Cite
    (2018). Data - Biases in the metabarcoding of plant pathogens - Dataset - DataStore [Dataset]. https://datastore.landcareresearch.co.nz/dataset/biases-in-the-metabarcoding-of-plant-pathogens
    Explore at:
    Dataset updated
    Dec 13, 2018
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    We investigated and analysed the causes of differences between next-generation sequencing metabarcoding approaches and traditional DNA cloning in the detection and quantification of recognized species of rust fungi from environmental samples. The data support the article: Makiola A, Dickie IA, Holdaway RJ, Wood JR, Orwin KH, Lee CK, Glare TR. 2018. Biases in the metabarcoding of plant pathogens using rust fungi as a model system. MicrobiologyOpen. The resources (data files) are the raw sequence data supporting this manuscript. Leaf samples from 30 sites were collected and analysed using Illumina MiSeq (folder ‘Illumina’), Ion Torrent PGM (file ‘IonTorrent.fastq’), and cloning followed by Sanger sequencing (file ‘CloningSanger.fna’). The ‘barcodes.csv’ file contains the barcode names and the corresponding sites.
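    A minimal sketch (not part of the dataset record) for reading the sequence files named above with Biopython; the file names come from the description, and local paths are assumed:

    python
    from Bio import SeqIO  # Biopython

    # Count reads and mean read length in the Ion Torrent run described above.
    lengths = [len(rec.seq) for rec in SeqIO.parse("IonTorrent.fastq", "fastq")]
    print(f"{len(lengths)} reads, mean length {sum(lengths) / len(lengths):.1f} bp")

    # The cloning/Sanger sequences are distributed in FASTA format.
    sanger = list(SeqIO.parse("CloningSanger.fna", "fasta"))
    print(f"{len(sanger)} Sanger sequences")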

  6. debiased_dataset

    • huggingface.co
    Updated Sep 18, 2023
    Cite
    News Media Biases (2023). debiased_dataset [Dataset]. http://doi.org/10.57967/hf/1050
    Explore at:
    Dataset updated
    Sep 18, 2023
    Dataset authored and provided by
    News Media Biases
    License

    CreativeML OpenRAIL-M license: https://choosealicense.com/licenses/creativeml-openrail-m/

    Description

    Dataset Description

    About the Dataset: This dataset contains text data that has been processed to identify biased statements based on dimensions and aspects. Each entry has been processed using the GPT-4 language model and manually verified by 5 human annotators for quality assurance. Purpose: The dataset aims to help train and evaluate machine learning models in detecting, classifying, and correcting biases in text content, making it essential for NLP research related to fairness… See the full description on the dataset page: https://huggingface.co/datasets/newsmediabias/debiased_dataset.
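    A minimal loading sketch, assuming only the Hugging Face path given above; the split and column names should be inspected rather than assumed:

    python
    from datasets import load_dataset

    ds = load_dataset("newsmediabias/debiased_dataset")
    print(ds)                                   # available splits and features
    first_split = list(ds.keys())[0]
    print(ds[first_split][0])                   # first record of the first split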

  7. NewsMediaBias-Plus Dataset

    • zenodo.org
    • huggingface.co
    bin, zip
    Updated Nov 29, 2024
    Cite
    Shaina Raza; Shaina Raza (2024). NewsMediaBias-Plus Dataset [Dataset]. http://doi.org/10.5281/zenodo.13961155
    Explore at:
    Available download formats: bin, zip
    Dataset updated
    Nov 29, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Shaina Raza; Shaina Raza
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    NewsMediaBias-Plus Dataset

    Overview

    The NewsMediaBias-Plus dataset is designed for the analysis of media bias and disinformation by combining textual and visual data from news articles. It aims to support research in detecting, categorizing, and understanding biased reporting in media outlets.

    Dataset Description

    NewsMediaBias-Plus pairs news articles with relevant images and annotations indicating perceived biases and the reliability of the content. It adds a multimodal dimension for bias detection in news media.

    Contents

    • unique_id: Unique identifier for each news item. Each unique_id matches an image for the same article.
    • outlet: The publisher of the article.
    • headline: The headline of the article.
    • article_text: The full content of the news article.
    • image_description: Description of the paired image.
    • image: The file path of the associated image.
    • date_published: The date the article was published.
    • source_url: The original URL of the article.
    • canonical_link: The canonical URL of the article.
    • new_categories: Categories assigned to the article.
    • news_categories_confidence_scores: Confidence scores for each category.

    Annotation Labels

    • text_label: Indicates the likelihood of the article being disinformation:

      • Likely: Likely to be disinformation.
      • Unlikely: Unlikely to be disinformation.
    • multimodal_label: Indicates the likelihood of disinformation from the combination of the text snippet and image content:

      • Likely: Likely to be disinformation.
      • Unlikely: Unlikely to be disinformation.

    Getting Started

    Prerequisites

    • Python 3.6+
    • Pandas
    • Hugging Face Datasets
    • Hugging Face Hub

    Installation

    Load the dataset into Python:

    python
    from datasets import load_dataset

    ds = load_dataset("vector-institute/newsmediabias-plus")
    print(ds)                # View structure and splits
    print(ds['train'][0])    # Access the first record of the train split
    print(ds['train'][:5])   # Access the first five records

    Load a Few Records

    python
    from datasets import load_dataset

    # Load the dataset in streaming mode
    streamed_dataset = load_dataset("vector-institute/newsmediabias-plus", streaming=True)

    # Get an iterable dataset
    dataset_iterable = streamed_dataset['train'].take(5)

    # Print the records
    for record in dataset_iterable:
        print(record)
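    Once loaded, the annotation fields listed under Contents can be filtered directly. A small sketch (not part of the official instructions), assuming the non-streaming load shown above and the documented text_label column:

    python
    from datasets import load_dataset

    ds = load_dataset("vector-institute/newsmediabias-plus")

    # Keep only articles whose text-level annotation marks them as likely disinformation.
    # Column names (text_label, outlet, headline) are those documented in the Contents section.
    likely = ds['train'].filter(lambda row: row['text_label'] == 'Likely')
    print(len(likely), "articles labelled Likely")
    print(likely[0]['outlet'], '-', likely[0]['headline'])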

    Contributions

    Contributions are welcome! You can:

    • Add Data: Contribute more data points.
    • Refine Annotations: Improve annotation accuracy.
    • Share Usage Examples: Help others use the dataset effectively.

    To contribute, fork the repository and create a pull request with your changes.

    License

    This dataset is released under a non-commercial license. See the LICENSE file for more details.

    Citation

    Please cite the dataset using this BibTeX entry:

    bibtex
    @misc{vector_institute_2024_newsmediabias_plus,
      title  = {NewsMediaBias-Plus: A Multimodal Dataset for Analyzing Media Bias},
      author = {Vector Institute Research Team},
      year   = {2024},
      url    = {https://huggingface.co/datasets/vector-institute/newsmediabias-plus}
    }

    Contact

    For questions or support, contact Shaina Raza at: shaina.raza@vectorinstitute.ai

    Disclaimer and User Guidance

    Disclaimer: The labels Likely and Unlikely are based on LLM annotations and expert assessments, intended for informational use only. They should not be considered final judgments.

    Guidance: This dataset is for research purposes. Cross-reference findings with other reliable sources before drawing conclusions. The dataset aims to encourage critical thinking, not provide definitive classifications.

  8. fdata-02-00013_Social Data: Biases, Methodological Pitfalls, and Ethical...

    • frontiersin.figshare.com
    pdf
    Updated Jun 2, 2023
    + more versions
    Cite
    Alexandra Olteanu; Carlos Castillo; Fernando Diaz; Emre Kıcıman (2023). fdata-02-00013_Social Data: Biases, Methodological Pitfalls, and Ethical Boundaries.pdf [Dataset]. http://doi.org/10.3389/fdata.2019.00013.s001
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Frontiers
    Authors
    Alexandra Olteanu; Carlos Castillo; Fernando Diaz; Emre Kıcıman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Social data in digital form—including user-generated content, expressed or implicit relations between people, and behavioral traces—are at the core of popular applications and platforms, driving the research agenda of many researchers. The promises of social data are many, including understanding “what the world thinks” about a social issue, brand, celebrity, or other entity, as well as enabling better decision-making in a variety of fields including public policy, healthcare, and economics. Many academics and practitioners have warned against the naïve usage of social data. There are biases and inaccuracies occurring at the source of the data, but also introduced during processing. There are methodological limitations and pitfalls, as well as ethical boundaries and unexpected consequences that are often overlooked. This paper recognizes that the rigor with which these issues are addressed by different researchers varies across a wide range. We identify a variety of menaces in the practices around social data use, and organize them in a framework that helps to identify them. “For your own sanity, you have to remember that not all problems can be solved. Not all problems can be solved, but all problems can be illuminated.” –Ursula Franklin

  9. Data from: Racial Bias in AI-Generated Images

    • datacatalogue.cessda.eu
    • openicpsr.org
    • +1more
    Updated Sep 3, 2024
    + more versions
    Cite
    Y. Yang (2024). Racial Bias in AI-Generated Images [Dataset]. http://doi.org/10.17026/SS/O9M6VR
    Explore at:
    Dataset updated
    Sep 3, 2024
    Dataset provided by
    Radboud University
    Authors
    Y. Yang
    Time period covered
    Jul 16, 2023 - Jul 23, 2023
    Description

    This file is supplementary material for the manuscript Racial Bias in AI-Generated Images, which has been submitted to a peer-reviewed journal. The dataset/paper examined the image-to-image generation accuracy (i.e., whether the original race and gender of a person’s image were replicated in the new AI-generated image) of a Chinese AI-powered image generator. We examined the image-to-image generation models transforming the racial and gender categories of the original photos of White, Black, and East Asian people (N = 1260) in three different racial photo contexts: a single person, two people of the same race, and two people of different races. The dataset contains the original images (e.g., WW1), AI-generated images (e.g., AM1_1, AM1_2, AM1_3), and SPSS files (Yang 230801 Racial bias in Meitu_Accuracy Paper.sav).

  10. Data from: Cognitive Abilities and Behavioral Biases [Dataset]

    • heidata.uni-heidelberg.de
    Updated May 2, 2019
    Cite
    Jörg Oechssler; Andreas Roider; Patrick W. Schmitz; Jörg Oechssler; Andreas Roider; Patrick W. Schmitz (2019). Cognitive Abilities and Behavioral Biases [Dataset] [Dataset]. http://doi.org/10.11588/DATA/FC6TFM
    Explore at:
    Available download formats: tsv(47564), pdf(36907), tsv(34052), pdf(55843), application/x-spss-syntax(5061), text/x-tex(3187)
    Dataset updated
    May 2, 2019
    Dataset provided by
    heiDATA
    Authors
    Jörg Oechssler; Andreas Roider; Patrick W. Schmitz; Jörg Oechssler; Andreas Roider; Patrick W. Schmitz
    License

    Custom dataset license: https://heidata.uni-heidelberg.de/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.11588/DATA/FC6TFM

    Description

    We use a simple, three-item test for cognitive abilities to investigate whether established behavioral biases that play a prominent role in behavioral economics and finance are related to cognitive abilities. We find that higher test scores on the cognitive reflection test of Frederick [Frederick, S., 2005. Cognitive reflection and decision-making. Journal of Economic Perspectives 19, 25–42] are indeed correlated with lower incidences of the conjunction fallacy and of conservatism in updating probabilities. Test scores are also significantly related to subjects’ time and risk preferences. Test scores have no influence on the amount of anchoring, although there is evidence of anchoring among all subjects. Even if incidences of most biases are lower for people with higher cognitive abilities, they still remain substantial.

  11. Data from: Diversity matters: Robustness of bias measurements in Wikidata

    • data.niaid.nih.gov
    Updated May 1, 2023
    Cite
    Sai Keerthana Karnam (2023). Diversity matters: Robustness of bias measurements in Wikidata [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7881057
    Explore at:
    Dataset updated
    May 1, 2023
    Dataset provided by
    Sai Keerthana Karnam
    Soumya Sarkar
    Paramita das
    Anirban Panda
    Animesh Mukherjee
    Bhanu Prakash Reddy Guda
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    With the widespread use of knowledge graphs (KG) in various automated AI systems and applications, it is very important to ensure that information retrieval algorithms leveraging them are free from societal biases. Previous works have described biases that persist in KGs and have employed several metrics for measuring those biases. However, such studies lack a systematic exploration of how sensitive the bias measurements are to the source of the data or the embedding algorithm used. To address this research gap, in this work we present a holistic analysis of bias measurement on knowledge graphs. First, we reveal data biases that surface in Wikidata for thirteen different demographics selected from seven continents. Next, we examine the variance in bias detection across two different knowledge graph embedding algorithms, TransE and ComplEx. We conduct extensive experiments on a large number of occupations sampled from the thirteen demographics with respect to the sensitive attribute, i.e., gender. Our results show that the inherent data bias that persists in a KG can be altered by the specific algorithmic bias introduced by KG embedding learning algorithms. Further, we show that the choice of state-of-the-art KG embedding algorithm has a strong impact on the ranking of biased occupations irrespective of gender. We observe that the similarity of the biased occupations across demographics is minimal, which reflects socio-cultural differences around the globe. We believe that this full-scale audit of the bias measurement pipeline will raise awareness in the community, provide insights related to the design choices of both data and algorithms, and help move away from the popular dogma of "one-size-fits-all".
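    The two embedding algorithms named above (TransE and ComplEx) can be trained with an off-the-shelf library such as PyKEEN. The sketch below only illustrates that model choice on a toy benchmark shipped with the library; it is not the authors' Wikidata pipeline.

    python
    from pykeen.pipeline import pipeline

    # Train both embedding models on a small built-in knowledge graph and compare
    # a ranking metric. The authors' experiments instead use triples extracted
    # from Wikidata for thirteen demographics.
    for model_name in ["TransE", "ComplEx"]:
        result = pipeline(
            dataset="Nations",                    # toy KG bundled with PyKEEN
            model=model_name,
            training_kwargs=dict(num_epochs=50),
            random_seed=42,
        )
        print(model_name, result.metric_results.get_metric("hits@10"))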

  12. fdata-02-00013_Social Data: Biases, Methodological Pitfalls, and Ethical...

    • frontiersin.figshare.com
    bin
    Updated Jun 4, 2023
    + more versions
    Cite
    Alexandra Olteanu; Carlos Castillo; Fernando Diaz; Emre Kıcıman (2023). fdata-02-00013_Social Data: Biases, Methodological Pitfalls, and Ethical Boundaries.xml [Dataset]. http://doi.org/10.3389/fdata.2019.00013.s002
    Explore at:
    Available download formats: bin
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    Frontiers
    Authors
    Alexandra Olteanu; Carlos Castillo; Fernando Diaz; Emre Kıcıman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Social data in digital form—including user-generated content, expressed or implicit relations between people, and behavioral traces—are at the core of popular applications and platforms, driving the research agenda of many researchers. The promises of social data are many, including understanding “what the world thinks” about a social issue, brand, celebrity, or other entity, as well as enabling better decision-making in a variety of fields including public policy, healthcare, and economics. Many academics and practitioners have warned against the naïve usage of social data. There are biases and inaccuracies occurring at the source of the data, but also introduced during processing. There are methodological limitations and pitfalls, as well as ethical boundaries and unexpected consequences that are often overlooked. This paper recognizes that the rigor with which these issues are addressed by different researchers varies across a wide range. We identify a variety of menaces in the practices around social data use, and organize them in a framework that helps to identify them. “For your own sanity, you have to remember that not all problems can be solved. Not all problems can be solved, but all problems can be illuminated.” –Ursula Franklin

  13. Data from: Ignoring species availability biases occupancy estimates in...

    • datadryad.org
    • data.usgs.gov
    • +4more
    zip
    Updated Mar 18, 2022
    Cite
    Graziella DiRenzo; David Miller; Evan Grant (2022). Ignoring species availability biases occupancy estimates in single-scale occupancy models [Dataset]. http://doi.org/10.5061/dryad.fxpnvx0rv
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 18, 2022
    Dataset provided by
    Dryad
    Authors
    Graziella DiRenzo; David Miller; Evan Grant
    Time period covered
    2021
    Description
    1. Most applications of single-scale occupancy models do not differentiate between availability and detectability, even though species availability is rarely equal to one. Species availability can be estimated using multi-scale occupancy models, and the availability process includes elements of species movement, behavior, and phenology. However, for the practical application of multi-scale occupancy models, it can be unclear what a robust sampling design looks like and what the statistical properties of the multi-scale and single-scale occupancy models are when availability is less than one.

    2. Using simulations, we explore the following common questions asked by ecologists during the design phase of a field study: (Q1) what is a robust sampling design for the multi-scale occupancy model when there are a priori expectations of parameter estimates?, (Q2) what is a robust sampling design when we have no expectations of parameter estimates?, and (Q3) can a single-scale occupancy model wit...
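    A toy simulation of the hierarchical structure that the multi-scale occupancy model targets (site occupancy psi, per-visit availability theta, per-survey detection p) may help make these design questions concrete; all values below are illustrative, not taken from the paper.

    python
    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical design: 100 sites, 3 within-site visits, 4 detection surveys per visit.
    n_sites, n_visits, n_surveys = 100, 3, 4
    psi, theta, p = 0.6, 0.5, 0.3   # occupancy, availability, detection (illustrative)

    z = rng.binomial(1, psi, n_sites)                              # latent site occupancy
    a = rng.binomial(1, theta, (n_sites, n_visits)) * z[:, None]   # availability per visit
    y = rng.binomial(n_surveys, p, (n_sites, n_visits)) * a        # detections per visit

    # A naive estimate that ignores availability (sites with at least one detection)
    # underestimates true occupancy when theta < 1.
    naive_occupancy = (y.sum(axis=1) > 0).mean()
    print(f"true psi = {psi}, naive detection-based estimate = {naive_occupancy:.2f}")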

  14. Replication Data for: Publication Biases in Replication Studies

    • dataverse.harvard.edu
    Updated Sep 28, 2022
    Cite
    Adam J. Berinsky; James N. Druckman; Teppei Yamamoto (2022). Replication Data for: Publication Biases in Replication Studies [Dataset]. http://doi.org/10.7910/DVN/BJMZNR
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 28, 2022
    Dataset provided by
    Harvard Dataverse
    Authors
    Adam J. Berinsky; James N. Druckman; Teppei Yamamoto
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    One of the strongest findings across the sciences is that publication bias occurs. Of particular note is a “file drawer bias” where statistically significant results are privileged over non-significant results. Recognition of this bias, along with increased calls for “open science,” has led to an emphasis on replication studies. Yet, few have explored publication bias and its consequences in replication studies. We offer a model of the publication process involving an initial study and a replication. We use the model to describe three types of publication biases: 1) file drawer bias, 2) a “repeat study” bias against the publication of replication studies, and 3) a “gotcha bias” where replication results that run contrary to a prior study are more likely to be published. We estimate the model’s parameters with a vignette experiment conducted with political science professors teaching at Ph.D.-granting institutions in the United States. We find evidence of all three types of bias, although those explicitly involving replication studies are notably smaller. This bodes well for the replication movement. That said, the aggregation of all of the biases increases the number of false positives in a literature. We conclude by discussing a path for future work on publication biases.

  15. political-bias

    • huggingface.co
    Updated May 20, 2024
    Cite
    Christopher Jones (2024). political-bias [Dataset]. https://huggingface.co/datasets/cajcodes/political-bias
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 20, 2024
    Authors
    Christopher Jones
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Political Bias Dataset

      Overview
    

    The Political Bias dataset contains 658 synthetic statements, each annotated with a bias rating ranging from 0 to 4. These ratings represent a spectrum from highly conservative (0) to highly liberal (4). The dataset was generated using GPT-4, aiming to facilitate research and development in bias detection and reduction in textual data. Special emphasis was placed on distinguishing between moderate biases on both sides, as this has proven to… See the full description on the dataset page: https://huggingface.co/datasets/cajcodes/political-bias.
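    A minimal loading sketch using the Hugging Face path given above; the split name and column names are assumptions to verify against the dataset card:

    python
    from datasets import load_dataset

    ds = load_dataset("cajcodes/political-bias")
    print(ds)                       # inspect available splits and features

    # "train" split assumed; the 0-4 bias rating described above should appear
    # as one of the features printed here.
    print(ds["train"][0])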

  16. Data_Sheet_1_Gender Bias in Artificial Intelligence: Severity Prediction at...

    • frontiersin.figshare.com
    docx
    Updated May 30, 2023
    Cite
    Heewon Chung; Chul Park; Wu Seong Kang; Jinseok Lee (2023). Data_Sheet_1_Gender Bias in Artificial Intelligence: Severity Prediction at an Early Stage of COVID-19.docx [Dataset]. http://doi.org/10.3389/fphys.2021.778720.s001
    Explore at:
    Available download formats: docx
    Dataset updated
    May 30, 2023
    Dataset provided by
    Frontiers
    Authors
    Heewon Chung; Chul Park; Wu Seong Kang; Jinseok Lee
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Artificial intelligence (AI) technologies have been applied in various medical domains to predict patient outcomes with high accuracy. As AI becomes more widely adopted, the problem of model bias is increasingly apparent. In this study, we investigate the model bias that can occur when training a model using datasets for only one particular gender and aim to present new insights into the bias issue. For the investigation, we considered an AI model that predicts severity at an early stage based on the medical records of coronavirus disease (COVID-19) patients. For 5,601 confirmed COVID-19 patients, we used 37 medical records, namely, basic patient information, physical index, initial examination findings, clinical findings, comorbidity diseases, and general blood test results at an early stage. To investigate the gender-based AI model bias, we trained and evaluated two separate models—one that was trained using only the male group, and the other using only the female group. When the model trained by the male-group data was applied to the female testing data, the overall accuracy decreased—sensitivity from 0.93 to 0.86, specificity from 0.92 to 0.86, accuracy from 0.92 to 0.86, balanced accuracy from 0.93 to 0.86, and area under the curve (AUC) from 0.97 to 0.94. Similarly, when the model trained by the female-group data was applied to the male testing data, once again, the overall accuracy decreased—sensitivity from 0.97 to 0.90, specificity from 0.96 to 0.91, accuracy from 0.96 to 0.91, balanced accuracy from 0.96 to 0.90, and AUC from 0.97 to 0.95. Furthermore, when we evaluated each gender-dependent model with the test data from the same gender used for training, the resultant accuracy was also lower than that from the unbiased model.
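    The evaluation protocol described (train on one gender only, then test on the other) can be sketched with scikit-learn on synthetic data; this illustrates the protocol only, not the authors' model or clinical features:

    python
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)

    # Synthetic stand-in for the early-stage medical-record features described above;
    # the real study uses 37 clinical variables from 5,601 COVID-19 patients.
    def make_group(n, shift):
        X = rng.normal(shift, 1.0, size=(n, 5))
        y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1, n) > shift).astype(int)
        return X, y

    X_m, y_m = make_group(1000, 0.0)   # stand-in "male" cohort
    X_f, y_f = make_group(1000, 0.5)   # stand-in "female" cohort with shifted features

    # Train on one gender only, then evaluate on both, mirroring the paper's protocol.
    model_m = LogisticRegression().fit(X_m[:700], y_m[:700])
    auc_same = roc_auc_score(y_m[700:], model_m.predict_proba(X_m[700:])[:, 1])
    auc_cross = roc_auc_score(y_f, model_m.predict_proba(X_f)[:, 1])
    print(f"male-trained model: AUC on male test {auc_same:.3f}, on female test {auc_cross:.3f}")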

  17. Article Bias Prediction Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Oct 10, 2020
    Cite
    Ramy Baly; Giovanni Da San Martino; James Glass; Preslav Nakov (2020). Article Bias Prediction Dataset [Dataset]. https://paperswithcode.com/dataset/article-bias-prediction
    Explore at:
    Dataset updated
    Oct 10, 2020
    Authors
    Ramy Baly; Giovanni Da San Martino; James Glass; Preslav Nakov
    Description

    Article-Bias-Prediction Dataset

    The articles crawled from www.allsides.com are available in the ./data folder, along with the different evaluation splits.

    The dataset consists of a total of 37,554 articles. Each article is stored as a JSON object in the ./data/jsons directory and contains the following fields:
    1. ID: an alphanumeric identifier.
    2. topic: the topic discussed in the article.
    3. source: the name of the article's source (example: New York Times).
    4. source_url: the URL of the source's homepage (example: www.nytimes.com).
    5. url: the link to the actual article.
    6. date: the publication date of the article.
    7. authors: a comma-separated list of the article's authors.
    8. title: the article's title.
    9. content_original: the original body of the article, as returned by the newspaper3k Python library.
    10. content: the processed and tokenized content, which is used as input to the different models.
    11. bias_text: the label of the political bias annotation of the article (left, center, or right).
    12. bias: the numeric encoding of the political bias of the article (0, 1, or 2).

    The ./data/splits directory contains the two types of splits discussed in the paper: random and media-based. For each type, we provide the train, validation, and test files that contain the articles' IDs belonging to each set, along with their numeric bias labels.
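    A minimal sketch for loading the per-article JSON files described above; the field names come from the listing, and the local path to the repository's ./data folder is an assumption:

    python
    import json
    from pathlib import Path

    # Load every per-article JSON file from the directory layout described above.
    articles = {}
    for path in Path("./data/jsons").glob("*.json"):
        with open(path, encoding="utf-8") as f:
            article = json.load(f)
        articles[article["ID"]] = article

    print(len(articles), "articles loaded")
    sample = next(iter(articles.values()))
    print(sample["bias_text"], "-", sample["title"])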

    Code: Under maintenance. To be available soon.

    Citation

    bibtex
    @inproceedings{baly2020we,
      author      = {Baly, Ramy and Da San Martino, Giovanni and Glass, James and Nakov, Preslav},
      title       = {We Can Detect Your Bias: Predicting the Political Ideology of News Articles},
      booktitle   = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
      series      = {EMNLP~'20},
      NOmonth     = {November},
      year        = {2020},
      pages       = {4982--4991},
      NOpublisher = {Association for Computational Linguistics}
    }

  18. Can subjective expectations data be used in choice models? evidence on...

    • journaldata.zbw.eu
    • jda-test.zbw.eu
    stata do, txt
    Updated Dec 7, 2022
    Cite
    Basit Zafar; Basit Zafar (2022). Can subjective expectations data be used in choice models? evidence on cognitive biases (replication data) [Dataset]. http://doi.org/10.15456/jae.2022320.0722117185
    Explore at:
    Available download formats: txt(94924), stata do(36089), txt(296483), txt(1496)
    Dataset updated
    Dec 7, 2022
    Dataset provided by
    ZBW - Leibniz Informationszentrum Wirtschaft
    Authors
    Basit Zafar; Basit Zafar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A pervasive concern with the use of subjective data in choice models is that they are biased and endogenous. This paper examines the extent to which cognitive biases plague subjective data, and specifically addresses the questions of: (1) whether cognitive dissonance affects the reporting of beliefs; and (2) whether individuals exert sufficient mental effort when probed about their subjective beliefs. For this purpose, I collect a unique panel dataset of Northwestern University undergraduates which contains their subjective expectations about major-specific outcomes for their chosen major as well as for other alternatives in their choice set. I do not find evidence of cognitive biases systematically affecting the reporting of beliefs. By analyzing patterns of belief updating, I can rule out cognitive dissonance being of serious concern in the current setting. There does not seem to be any systematic (non-classical) measurement error in the reporting of beliefs: I do not find systematic patterns in mental recall of previous responses, or in the extent of rounding in the reported beliefs for the various majors. Comparison of subjective beliefs with objective measures suggests that students have well-formed expectations. Overall, the results paint a favorable picture for the use of subjective expectations data in choice models.

  19. Data from: Approach-induced biases in human information sampling

    • datadryad.org
    • data.niaid.nih.gov
    zip
    Updated Jan 5, 2017
    Cite
    Laurence T. Hunt; Robb B. Rutledge; W. M. Nishantha Malalasekera; Steven W. Kennerley; Raymond J. Dolan (2017). Approach-induced biases in human information sampling [Dataset]. http://doi.org/10.5061/dryad.nb41c
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 5, 2017
    Dataset provided by
    Dryad
    Authors
    Laurence T. Hunt; Robb B. Rutledge; W. M. Nishantha Malalasekera; Steven W. Kennerley; Raymond J. Dolan
    Time period covered
    2017
    Description

    Raw data, scripts and model code: Archive containing MATLAB analysis .m code, code for the computational model, and raw .mat data. Please read the associated readme file. (Hunt_infotask.zip)

  20. Dutch-Government-Data-for-Bias-detection

    • huggingface.co
    Cite
    Milena, Dutch-Government-Data-for-Bias-detection [Dataset]. https://huggingface.co/datasets/milenamileentje/Dutch-Government-Data-for-Bias-detection
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Authors
    Milena
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Area covered
    Netherlands, Politics of the Netherlands
    Description

    The milenamileentje/Dutch-Government-Data-for-Bias-detection dataset is hosted on Hugging Face and was contributed by the HF Datasets community.
