100+ datasets found
  1. Navigating News Narratives: A Media Bias Analysis Dataset

    • figshare.com
    txt
    Updated Dec 8, 2023
    + more versions
    Cite
    Shaina Raza (2023). Navigating News Narratives: A Media Bias Analysis Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.24422122.v4
    Explore at:
    Available download formats: txt
    Dataset updated
    Dec 8, 2023
    Dataset provided by
    figshare
    Authors
    Shaina Raza
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The prevalence of bias in the news media has become a critical issue, affecting public perception of a range of important topics such as political views, health, insurance, resource distribution, religion, race, age, gender, occupation, and climate change. The media has a moral responsibility to ensure accurate information dissemination and to raise awareness about important issues and the potential risks associated with them. This highlights the need for a solution that can help mitigate the spread of false or misleading information and restore public trust in the media.

    Data description: This is a dataset for news media bias covering multiple dimensions of bias: political, hate speech, toxicity, sexism, ageism, gender identity, gender discrimination, race/ethnicity, climate change, occupation, and spirituality, which makes it a unique contribution. The dataset does not contain any personally identifiable information (PII). The data structure is tabulated as follows:

    • Text: The main content.
    • Dimension: Descriptive category of the text.
    • Biased_Words: A compilation of words regarded as biased.
    • Aspect: Specific sub-topic within the main content.
    • Label: The degree of bias. The label is ternary: highly biased, slightly biased, or neutral.
    • Toxicity: Indicates the presence (True) or absence (False) of toxicity.
    • Identity_mention: Mention of any identity based on word match.

    Annotation scheme: The labels and annotations in the dataset are generated through a system of Active Learning, cycling through manual labeling, semi-supervised learning, and human verification. The scheme comprises:

    • Bias Label: Specifies the degree of bias (e.g., no bias, mild, or strong).
    • Words/Phrases Level Biases: Pinpoints specific biased terms or phrases.
    • Subjective Bias (Aspect): Highlights biases pertinent to content dimensions.

    Due to the nuances of semantic match algorithms, certain labels such as 'identity' and 'aspect' may appear distinctly different.

    List of datasets used: We curated different news categories (climate crisis news summaries, occupational, spiritual/faith, general) using RSS feeds to capture different dimensions of news media bias. Annotation is performed using active learning to label each sentence (neutral, slightly biased, or highly biased) and to pick biased words from the news. We also utilize publicly available data from the following sources, with attribution:

    • MBIC (media bias): Spinde, Timo, Lada Rudnitckaia, Kanishka Sinha, Felix Hamborg, Bela Gipp, and Karsten Donnay. "MBIC -- A Media Bias Annotation Dataset Including Annotator Characteristics." arXiv preprint arXiv:2105.11910 (2021). https://zenodo.org/records/4474336
    • Hyperpartisan news: Kiesel, Johannes, Maria Mestre, Rishabh Shukla, Emmanuel Vincent, Payam Adineh, David Corney, Benno Stein, and Martin Potthast. "SemEval-2019 Task 4: Hyperpartisan News Detection." In Proceedings of the 13th International Workshop on Semantic Evaluation, pp. 829-839. 2019. https://huggingface.co/datasets/hyperpartisan_news_detection
    • Toxic comment classification: Adams, C.J., Jeffrey Sorensen, Julia Elliott, Lucas Dixon, Mark McDonald, Nithum, and Will Cukierski. 2017. "Toxic Comment Classification Challenge." Kaggle. https://kaggle.com/competitions/jigsaw-toxic-comment-classification-challenge
    • Jigsaw Unintended Bias: Adams, C.J., Daniel Borkan, Inversion, Jeffrey Sorensen, Lucas Dixon, Lucy Vasserman, and Nithum. 2019. "Jigsaw Unintended Bias in Toxicity Classification." Kaggle. https://kaggle.com/competitions/jigsaw-unintended-bias-in-toxicity-classification
    • Age bias: Díaz, Mark, Isaac Johnson, Amanda Lazar, Anne Marie Piper, and Darren Gergle. "Addressing Age-Related Bias in Sentiment Analysis." In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, pp. 1-14. 2018. Age Bias Training and Testing Data, Age Bias and Sentiment Analysis Dataverse (harvard.edu)
    • Multi-dimensional news (Ukraine): Färber, Michael, Victoria Burkard, Adam Jatowt, and Sora Lim. "A Multidimensional Dataset Based on Crowdsourcing for Analyzing and Detecting News Bias." In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 3007-3014. 2020. https://zenodo.org/records/3885351#.ZF0KoxHMLtV
    • Social biases: Sap, Maarten, Saadia Gabriel, Lianhui Qin, Dan Jurafsky, Noah A. Smith, and Yejin Choi. "Social Bias Frames: Reasoning about Social and Power Implications of Language." arXiv preprint arXiv:1911.03891 (2019). https://maartensap.com/social-bias-frames/

    Goal of this dataset: We want to offer open and free access to this dataset, ensuring a wide reach to researchers and AI practitioners across the world. The dataset should be user-friendly, and uploading and accessing the data should be straightforward, to facilitate usage. If you use this dataset, please cite us. Navigating News Narratives: A Media Bias Analysis Dataset © 2023 by Shaina Raza, Vector Institute is licensed under CC BY-NC 4.0
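    A minimal usage sketch (not part of the dataset description): loading the tabular release with pandas and inspecting the fields listed above. The column names follow the data structure described; the file name and the tab separator are assumptions about the figshare download.

    python
    import pandas as pd

    # Hypothetical local path and separator; adjust to the actual figshare download.
    df = pd.read_csv("news_bias.txt", sep="\t")
    print(df.columns.tolist())   # expect Text, Dimension, Biased_Words, Aspect, Label, Toxicity, Identity_mention

    # Distribution of the ternary bias label
    print(df["Label"].value_counts())

    # Rows assigned to a specific dimension, e.g. climate change
    climate = df[df["Dimension"].str.contains("climate", case=False, na=False)]
    print(len(climate), "climate-related rows")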

  2. Opinion on mitigating AI data bias in healthcare worldwide 2024

    • statista.com
    Updated Mar 20, 2025
    Cite
    Statista (2025). Opinion on mitigating AI data bias in healthcare worldwide 2024 [Dataset]. https://www.statista.com/statistics/1559311/ways-to-mitigate-ai-bias-in-healthcare-worldwide/
    Explore at:
    Dataset updated
    Mar 20, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Dec 2023 - Mar 2024
    Area covered
    Worldwide
    Description

    According to a global survey of healthcare leaders carried out in 2024, almost half of respondents believed that making AI more transparent and interpretable would mitigate the risk of data bias in AI applications for healthcare. Furthermore, 46 percent of healthcare leaders thought there should be continuous training and education in AI.

  3. Data and Code for: Confidence, Self-Selection and Bias in the Aggregate

    • openicpsr.org
    delimited
    Updated Mar 2, 2023
    Cite
    Benjamin Enke; Thomas Graeber; Ryan Oprea (2023). Data and Code for: Confidence, Self-Selection and Bias in the Aggregate [Dataset]. http://doi.org/10.3886/E185741V1
    Explore at:
    Available download formats: delimited
    Dataset updated
    Mar 2, 2023
    Dataset provided by
    American Economic Association
    Authors
    Benjamin Enke; Thomas Graeber; Ryan Oprea
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The influence of behavioral biases on aggregate outcomes depends in part on self-selection: whether rational people opt into aggregate interactions more strongly than biased individuals. In betting-market, auction, and committee experiments, we document that some errors are strongly reduced through self-selection, while others are not affected at all or are even amplified. A large part of this variation is explained by differences in the relationship between confidence and performance. In some tasks, the two are positively correlated, such that self-selection attenuates errors. In other tasks, rational and biased people are equally confident, such that self-selection has no effect on aggregate quantities.

  4. Data_Sheet_1_Data and model bias in artificial intelligence for healthcare...

    • frontiersin.figshare.com
    zip
    Updated Jun 3, 2023
    Cite
    Vithya Yogarajan; Gillian Dobbie; Sharon Leitch; Te Taka Keegan; Joshua Bensemann; Michael Witbrock; Varsha Asrani; David Reith (2023). Data_Sheet_1_Data and model bias in artificial intelligence for healthcare applications in New Zealand.zip [Dataset]. http://doi.org/10.3389/fcomp.2022.1070493.s001
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Frontiers
    Authors
    Vithya Yogarajan; Gillian Dobbie; Sharon Leitch; Te Taka Keegan; Joshua Bensemann; Michael Witbrock; Varsha Asrani; David Reith
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    New Zealand
    Description

    Introduction: Developments in Artificial Intelligence (AI) are adopted widely in healthcare. However, the introduction and use of AI may come with biases and disparities, resulting in concerns about healthcare access and outcomes for underrepresented indigenous populations. In New Zealand, Māori experience significant inequities in health compared to the non-Indigenous population. This research explores equity concepts and fairness measures concerning AI for healthcare in New Zealand.

    Methods: This research considers data and model bias in NZ-based electronic health records (EHRs). Two very distinct NZ datasets are used, one obtained from a hospital and another from multiple GP practices; both datasets were collected by clinicians. To ensure research equality and fair inclusion of Māori, we combine expertise in Artificial Intelligence (AI), the New Zealand clinical context, and te ao Māori. The mitigation of inequity needs to be addressed in data collection, model development, and model deployment. In this paper, we analyze data and algorithmic bias concerning data collection and model development, training, and testing using health data collected by experts. We use fairness measures such as disparate impact scores, equal opportunity, and equalized odds to analyze tabular data. Furthermore, token frequencies, statistical significance testing, and fairness measures for word embeddings, such as the WEAT and WEFE frameworks, are used to analyze bias in free-form medical text. The AI model predictions are also explained using SHAP and LIME.

    Results: This research analyzed fairness metrics for NZ EHRs while considering data and algorithmic bias. We show evidence of bias due to changes made in algorithmic design. Furthermore, we observe unintentional bias due to the underlying pre-trained models used to represent text data. This research addresses some vital issues while opening up the need and opportunity for future research.

    Discussion: This research takes early steps toward developing a model of socially responsible and fair AI for New Zealand's population. We provide an overview of reproducible concepts that can be adopted for any NZ population data. Furthermore, we discuss the gaps and future research avenues that will enable more focused development of fairness measures suited to the New Zealand population's needs and social structure. One of the primary focuses of this research was ensuring fair inclusion. As such, we combine expertise in AI, clinical knowledge, and the representation of indigenous populations. This inclusion of experts will be vital moving forward, providing a stepping stone toward the integration of AI for better outcomes in healthcare.
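    The disparate impact score mentioned above is straightforward to reproduce on any tabular prediction output. A minimal sketch, assuming a pandas DataFrame with hypothetical column names (group, prediction) rather than the authors' actual EHR fields:

    python
    import pandas as pd

    def disparate_impact(df, group_col, pred_col, privileged, unprivileged):
        # Ratio of positive-prediction rates: P(pred=1 | unprivileged) / P(pred=1 | privileged).
        # Values near 1.0 indicate parity; the common "80% rule" flags values below 0.8.
        rate_unpriv = df.loc[df[group_col] == unprivileged, pred_col].mean()
        rate_priv = df.loc[df[group_col] == privileged, pred_col].mean()
        return rate_unpriv / rate_priv

    # Toy example with hypothetical data
    df = pd.DataFrame({
        "group": ["A", "A", "A", "B", "B", "B"],
        "prediction": [1, 1, 0, 1, 0, 0],
    })
    print(disparate_impact(df, "group", "prediction", privileged="A", unprivileged="B"))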

  5. Data - Biases in the metabarcoding of plant pathogens - Dataset - DataStore

    • datastore.landcareresearch.co.nz
    Updated Dec 13, 2018
    + more versions
    Cite
    (2018). Data - Biases in the metabarcoding of plant pathogens - Dataset - DataStore [Dataset]. https://datastore.landcareresearch.co.nz/dataset/biases-in-the-metabarcoding-of-plant-pathogens
    Explore at:
    Dataset updated
    Dec 13, 2018
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    We investigated and analysed the causes of differences between next-generation sequencing metabarcoding approaches and traditional DNA cloning in the detection and quantification of recognized species of rust fungi from environmental samples. The data support the article: Makiola A, Dickie IA, Holdaway RJ, Wood JR, Orwin KH, Lee CK, Glare TR. 2018. Biases in the metabarcoding of plant pathogens using rust fungi as a model system. MicrobiologyOpen. The resources (data files) are the raw sequence data supporting this manuscript. Leaf samples from 30 sites were collected and analysed using Illumina MiSeq (folder ‘Illumina’), Ion Torrent PGM (file ‘IonTorrent.fastq’), and cloning followed by Sanger sequencing (file ‘CloningSanger.fna’). The ‘barcodes.csv’ file contains the barcode names and the corresponding sites.
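    A minimal sketch (not part of the dataset record) for reading the sequence files named above with Biopython; the file names come from the description, and local paths are assumed:

    python
    from Bio import SeqIO  # Biopython

    # Count reads and mean read length in the Ion Torrent run described above.
    lengths = [len(rec.seq) for rec in SeqIO.parse("IonTorrent.fastq", "fastq")]
    print(f"{len(lengths)} reads, mean length {sum(lengths) / len(lengths):.1f} bp")

    # The cloning/Sanger sequences are distributed in FASTA format.
    sanger = list(SeqIO.parse("CloningSanger.fna", "fasta"))
    print(f"{len(sanger)} Sanger sequences")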

  6. debiased_dataset

    • huggingface.co
    Updated Sep 18, 2023
    Cite
    News Media Biases (2023). debiased_dataset [Dataset]. http://doi.org/10.57967/hf/1050
    Explore at:
    Dataset updated
    Sep 18, 2023
    Dataset authored and provided by
    News Media Biases
    License

    CreativeML OpenRAIL-M license: https://choosealicense.com/licenses/creativeml-openrail-m/

    Description

    Dataset Description

    About the Dataset: This dataset contains text data that has been processed to identify biased statements based on dimensions and aspects. Each entry has been processed using the GPT-4 language model and manually verified by 5 human annotators for quality assurance. Purpose: The dataset aims to help train and evaluate machine learning models in detecting, classifying, and correcting biases in text content, making it essential for NLP research related to fairness… See the full description on the dataset page: https://huggingface.co/datasets/newsmediabias/debiased_dataset.
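    A minimal loading sketch, assuming only the Hugging Face path given above; the split and column names should be inspected rather than assumed:

    python
    from datasets import load_dataset

    ds = load_dataset("newsmediabias/debiased_dataset")
    print(ds)                                   # available splits and features
    first_split = list(ds.keys())[0]
    print(ds[first_split][0])                   # first record of the first split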

  7. NewsMediaBias-Plus Dataset

    • zenodo.org
    • huggingface.co
    bin, zip
    Updated Nov 29, 2024
    Cite
    Shaina Raza; Shaina Raza (2024). NewsMediaBias-Plus Dataset [Dataset]. http://doi.org/10.5281/zenodo.13961155
    Explore at:
    Available download formats: bin, zip
    Dataset updated
    Nov 29, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Shaina Raza; Shaina Raza
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    NewsMediaBias-Plus Dataset

    Overview

    The NewsMediaBias-Plus dataset is designed for the analysis of media bias and disinformation by combining textual and visual data from news articles. It aims to support research in detecting, categorizing, and understanding biased reporting in media outlets.

    Dataset Description

    NewsMediaBias-Plus pairs news articles with relevant images and annotations indicating perceived biases and the reliability of the content. It adds a multimodal dimension for bias detection in news media.

    Contents

    • unique_id: Unique identifier for each news item. Each unique_id matches an image for the same article.
    • outlet: The publisher of the article.
    • headline: The headline of the article.
    • article_text: The full content of the news article.
    • image_description: Description of the paired image.
    • image: The file path of the associated image.
    • date_published: The date the article was published.
    • source_url: The original URL of the article.
    • canonical_link: The canonical URL of the article.
    • new_categories: Categories assigned to the article.
    • news_categories_confidence_scores: Confidence scores for each category.

    Annotation Labels

    • text_label: Indicates the likelihood of the article being disinformation:

      • Likely: Likely to be disinformation.
      • Unlikely: Unlikely to be disinformation.
    • multimodal_label: Indicates the likelihood of disinformation from the combination of the text snippet and image content:

      • Likely: Likely to be disinformation.
      • Unlikely: Unlikely to be disinformation.

    Getting Started

    Prerequisites

    • Python 3.6+
    • Pandas
    • Hugging Face Datasets
    • Hugging Face Hub

    Installation

    Load the dataset into Python:

    python
    from datasets import load_dataset

    ds = load_dataset("vector-institute/newsmediabias-plus")
    print(ds)                # View structure and splits
    print(ds['train'][0])    # Access the first record of the train split
    print(ds['train'][:5])   # Access the first five records

    Load a Few Records

    python
    from datasets import load_dataset

    # Load the dataset in streaming mode
    streamed_dataset = load_dataset("vector-institute/newsmediabias-plus", streaming=True)

    # Get an iterable dataset
    dataset_iterable = streamed_dataset['train'].take(5)

    # Print the records
    for record in dataset_iterable:
        print(record)
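    Once loaded, the annotation fields listed under Contents can be filtered directly. A small sketch (not part of the official instructions), assuming the non-streaming load shown above and the documented text_label column:

    python
    from datasets import load_dataset

    ds = load_dataset("vector-institute/newsmediabias-plus")

    # Keep only articles whose text-level annotation marks them as likely disinformation.
    # Column names (text_label, outlet, headline) are those documented in the Contents section.
    likely = ds['train'].filter(lambda row: row['text_label'] == 'Likely')
    print(len(likely), "articles labelled Likely")
    print(likely[0]['outlet'], '-', likely[0]['headline'])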

    Contributions

    Contributions are welcome! You can:

    • Add Data: Contribute more data points.
    • Refine Annotations: Improve annotation accuracy.
    • Share Usage Examples: Help others use the dataset effectively.

    To contribute, fork the repository and create a pull request with your changes.

    License

    This dataset is released under a non-commercial license. See the LICENSE file for more details.

    Citation

    Please cite the dataset using this BibTeX entry:

    bibtex
    @misc{vector_institute_2024_newsmediabias_plus,
      title  = {NewsMediaBias-Plus: A Multimodal Dataset for Analyzing Media Bias},
      author = {Vector Institute Research Team},
      year   = {2024},
      url    = {https://huggingface.co/datasets/vector-institute/newsmediabias-plus}
    }

    Contact

    For questions or support, contact Shaina Raza at: shaina.raza@vectorinstitute.ai

    Disclaimer and User Guidance

    Disclaimer: The labels Likely and Unlikely are based on LLM annotations and expert assessments, intended for informational use only. They should not be considered final judgments.

    Guidance: This dataset is for research purposes. Cross-reference findings with other reliable sources before drawing conclusions. The dataset aims to encourage critical thinking, not provide definitive classifications.

  8. fdata-02-00013_Social Data: Biases, Methodological Pitfalls, and Ethical...

    • frontiersin.figshare.com
    pdf
    Updated Jun 2, 2023
    + more versions
    Cite
    Alexandra Olteanu; Carlos Castillo; Fernando Diaz; Emre Kıcıman (2023). fdata-02-00013_Social Data: Biases, Methodological Pitfalls, and Ethical Boundaries.pdf [Dataset]. http://doi.org/10.3389/fdata.2019.00013.s001
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Frontiers
    Authors
    Alexandra Olteanu; Carlos Castillo; Fernando Diaz; Emre Kıcıman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Social data in digital form—including user-generated content, expressed or implicit relations between people, and behavioral traces—are at the core of popular applications and platforms, driving the research agenda of many researchers. The promises of social data are many, including understanding “what the world thinks” about a social issue, brand, celebrity, or other entity, as well as enabling better decision-making in a variety of fields including public policy, healthcare, and economics. Many academics and practitioners have warned against the naïve usage of social data. There are biases and inaccuracies occurring at the source of the data, but also introduced during processing. There are methodological limitations and pitfalls, as well as ethical boundaries and unexpected consequences that are often overlooked. This paper recognizes that the rigor with which these issues are addressed by different researchers varies across a wide range. We identify a variety of menaces in the practices around social data use, and organize them in a framework that helps to identify them. “For your own sanity, you have to remember that not all problems can be solved. Not all problems can be solved, but all problems can be illuminated.” –Ursula Franklin

  9. Data from: Racial Bias in AI-Generated Images

    • datacatalogue.cessda.eu
    • openicpsr.org
    • +1more
    Updated Sep 3, 2024
    + more versions
    Cite
    Y. Yang (2024). Racial Bias in AI-Generated Images [Dataset]. http://doi.org/10.17026/SS/O9M6VR
    Explore at:
    Dataset updated
    Sep 3, 2024
    Dataset provided by
    Radboud University
    Authors
    Y. Yang
    Time period covered
    Jul 16, 2023 - Jul 23, 2023
    Description

    This file is supplementary material for the manuscript Racial Bias in AI-Generated Images, which has been submitted to a peer-reviewed journal. The dataset/paper examined the image-to-image generation accuracy (i.e., whether the original race and gender of a person’s image were replicated in the new AI-generated image) of a Chinese AI-powered image generator. We examined the image-to-image generation models transforming the racial and gender categories of the original photos of White, Black, and East Asian people (N = 1260) in three different racial photo contexts: a single person, two people of the same race, and two people of different races. The dataset contains the original images (e.g., WW1), AI-generated images (e.g., AM1_1, AM1_2, AM1_3), and SPSS files (Yang 230801 Racial bias in Meitu_Accuracy Paper.sav).

  10. Data from: Cognitive Abilities and Behavioral Biases [Dataset]

    • heidata.uni-heidelberg.de
    Updated May 2, 2019
    Cite
    Jörg Oechssler; Andreas Roider; Patrick W. Schmitz; Jörg Oechssler; Andreas Roider; Patrick W. Schmitz (2019). Cognitive Abilities and Behavioral Biases [Dataset] [Dataset]. http://doi.org/10.11588/DATA/FC6TFM
    Explore at:
    Available download formats: tsv(47564), pdf(36907), tsv(34052), pdf(55843), application/x-spss-syntax(5061), text/x-tex(3187)
    Dataset updated
    May 2, 2019
    Dataset provided by
    heiDATA
    Authors
    Jörg Oechssler; Andreas Roider; Patrick W. Schmitz; Jörg Oechssler; Andreas Roider; Patrick W. Schmitz
    License

    Custom dataset license: https://heidata.uni-heidelberg.de/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.11588/DATA/FC6TFM

    Description

    We use a simple, three-item test for cognitive abilities to investigate whether established behavioral biases that play a prominent role in behavioral economics and finance are related to cognitive abilities. We find that higher test scores on the cognitive reflection test of Frederick [Frederick, S., 2005. Cognitive reflection and decision-making. Journal of Economic Perspectives 19, 25–42] are indeed correlated with lower incidences of the conjunction fallacy and of conservatism in updating probabilities. Test scores are also significantly related to subjects’ time and risk preferences. Test scores have no influence on the amount of anchoring, although there is evidence of anchoring among all subjects. Even if incidences of most biases are lower for people with higher cognitive abilities, they still remain substantial.

  11. Data from: Diversity matters: Robustness of bias measurements in Wikidata

    • data.niaid.nih.gov
    Updated May 1, 2023
    Cite
    Sai Keerthana Karnam (2023). Diversity matters: Robustness of bias measurements in Wikidata [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7881057
    Explore at:
    Dataset updated
    May 1, 2023
    Dataset provided by
    Sai Keerthana Karnam
    Soumya Sarkar
    Paramita das
    Anirban Panda
    Animesh Mukherjee
    Bhanu Prakash Reddy Guda
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    With the widespread use of knowledge graphs (KG) in various automated AI systems and applications, it is very important to ensure that information retrieval algorithms leveraging them are free from societal biases. Previous works have described biases that persist in KGs and have employed several metrics for measuring those biases. However, such studies lack a systematic exploration of how sensitive the bias measurements are to the source of the data or the embedding algorithm used. To address this research gap, in this work we present a holistic analysis of bias measurement on knowledge graphs. First, we reveal data biases that surface in Wikidata for thirteen different demographics selected from seven continents. Next, we examine the variance in bias detection across two different knowledge graph embedding algorithms, TransE and ComplEx. We conduct extensive experiments on a large number of occupations sampled from the thirteen demographics with respect to the sensitive attribute, i.e., gender. Our results show that the inherent data bias that persists in a KG can be altered by the specific algorithmic bias introduced by KG embedding learning algorithms. Further, we show that the choice of state-of-the-art KG embedding algorithm has a strong impact on the ranking of biased occupations irrespective of gender. We observe that the similarity of the biased occupations across demographics is minimal, which reflects socio-cultural differences around the globe. We believe that this full-scale audit of the bias measurement pipeline will raise awareness in the community, provide insights related to the design choices of both data and algorithms, and help move away from the popular dogma of "one-size-fits-all".
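    The two embedding algorithms named above (TransE and ComplEx) can be trained with an off-the-shelf library such as PyKEEN. The sketch below only illustrates that model choice on a toy benchmark shipped with the library; it is not the authors' Wikidata pipeline.

    python
    from pykeen.pipeline import pipeline

    # Train both embedding models on a small built-in knowledge graph and compare
    # a ranking metric. The authors' experiments instead use triples extracted
    # from Wikidata for thirteen demographics.
    for model_name in ["TransE", "ComplEx"]:
        result = pipeline(
            dataset="Nations",                    # toy KG bundled with PyKEEN
            model=model_name,
            training_kwargs=dict(num_epochs=50),
            random_seed=42,
        )
        print(model_name, result.metric_results.get_metric("hits@10"))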

  12. fdata-02-00013_Social Data: Biases, Methodological Pitfalls, and Ethical...

    • frontiersin.figshare.com
    bin
    Updated Jun 4, 2023
    + more versions
    Cite
    Alexandra Olteanu; Carlos Castillo; Fernando Diaz; Emre Kıcıman (2023). fdata-02-00013_Social Data: Biases, Methodological Pitfalls, and Ethical Boundaries.xml [Dataset]. http://doi.org/10.3389/fdata.2019.00013.s002
    Explore at:
    Available download formats: bin
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    Frontiers
    Authors
    Alexandra Olteanu; Carlos Castillo; Fernando Diaz; Emre Kıcıman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Social data in digital form—including user-generated content, expressed or implicit relations between people, and behavioral traces—are at the core of popular applications and platforms, driving the research agenda of many researchers. The promises of social data are many, including understanding “what the world thinks” about a social issue, brand, celebrity, or other entity, as well as enabling better decision-making in a variety of fields including public policy, healthcare, and economics. Many academics and practitioners have warned against the naïve usage of social data. There are biases and inaccuracies occurring at the source of the data, but also introduced during processing. There are methodological limitations and pitfalls, as well as ethical boundaries and unexpected consequences that are often overlooked. This paper recognizes that the rigor with which these issues are addressed by different researchers varies across a wide range. We identify a variety of menaces in the practices around social data use, and organize them in a framework that helps to identify them. “For your own sanity, you have to remember that not all problems can be solved. Not all problems can be solved, but all problems can be illuminated.” –Ursula Franklin

  13. Data from: Ignoring species availability biases occupancy estimates in...

    • datadryad.org
    • data.usgs.gov
    • +4more
    zip
    Updated Mar 18, 2022
    Cite
    Graziella DiRenzo; David Miller; Evan Grant (2022). Ignoring species availability biases occupancy estimates in single-scale occupancy models [Dataset]. http://doi.org/10.5061/dryad.fxpnvx0rv
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 18, 2022
    Dataset provided by
    Dryad
    Authors
    Graziella DiRenzo; David Miller; Evan Grant
    Time period covered
    2021
    Description
    1. Most applications of single-scale occupancy models do not differentiate between availability and detectability, even though species availability is rarely equal to one. Species availability can be estimated using multi-scale occupancy models, and the availability process includes elements of species movement, behavior, and phenology. However, for the practical application of multi-scale occupancy models, it can be unclear what a robust sampling design looks like and what the statistical properties of the multi-scale and single-scale occupancy models are when availability is less than one.

    2. Using simulations, we explore the following common questions asked by ecologists during the design phase of a field study: (Q1) what is a robust sampling design for the multi-scale occupancy model when there are a priori expectations of parameter estimates?, (Q2) what is a robust sampling design when we have no expectations of parameter estimates?, and (Q3) can a single-scale occupancy model wit...
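    A toy simulation of the hierarchical structure that the multi-scale occupancy model targets (site occupancy psi, per-visit availability theta, per-survey detection p) may help make these design questions concrete; all values below are illustrative, not taken from the paper.

    python
    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical design: 100 sites, 3 within-site visits, 4 detection surveys per visit.
    n_sites, n_visits, n_surveys = 100, 3, 4
    psi, theta, p = 0.6, 0.5, 0.3   # occupancy, availability, detection (illustrative)

    z = rng.binomial(1, psi, n_sites)                              # latent site occupancy
    a = rng.binomial(1, theta, (n_sites, n_visits)) * z[:, None]   # availability per visit
    y = rng.binomial(n_surveys, p, (n_sites, n_visits)) * a        # detections per visit

    # A naive estimate that ignores availability (sites with at least one detection)
    # underestimates true occupancy when theta < 1.
    naive_occupancy = (y.sum(axis=1) > 0).mean()
    print(f"true psi = {psi}, naive detection-based estimate = {naive_occupancy:.2f}")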

  14. Replication Data for: Publication Biases in Replication Studies

    • dataverse.harvard.edu
    Updated Sep 28, 2022
    Cite
    Adam J. Berinsky; James N. Druckman; Teppei Yamamoto (2022). Replication Data for: Publication Biases in Replication Studies [Dataset]. http://doi.org/10.7910/DVN/BJMZNR
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 28, 2022
    Dataset provided by
    Harvard Dataverse
    Authors
    Adam J. Berinsky; James N. Druckman; Teppei Yamamoto
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    One of the strongest findings across the sciences is that publication bias occurs. Of particular note is a “file drawer bias” where statistically significant results are privileged over non-significant results. Recognition of this bias, along with increased calls for “open science,” has led to an emphasis on replication studies. Yet, few have explored publication bias and its consequences in replication studies. We offer a model of the publication process involving an initial study and a replication. We use the model to describe three types of publication biases: 1) file drawer bias, 2) a “repeat study” bias against the publication of replication studies, and 3) a “gotcha bias” where replication results that run contrary to a prior study are more likely to be published. We estimate the model’s parameters with a vignette experiment conducted with political science professors teaching at Ph.D.-granting institutions in the United States. We find evidence of all three types of bias, although those explicitly involving replication studies are notably smaller. This bodes well for the replication movement. That said, the aggregation of all of the biases increases the number of false positives in a literature. We conclude by discussing a path for future work on publication biases.

  15. political-bias

    • huggingface.co
    Updated May 20, 2024
    Cite
    Christopher Jones (2024). political-bias [Dataset]. https://huggingface.co/datasets/cajcodes/political-bias
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 20, 2024
    Authors
    Christopher Jones
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Political Bias Dataset

      Overview
    

    The Political Bias dataset contains 658 synthetic statements, each annotated with a bias rating ranging from 0 to 4. These ratings represent a spectrum from highly conservative (0) to highly liberal (4). The dataset was generated using GPT-4, aiming to facilitate research and development in bias detection and reduction in textual data. Special emphasis was placed on distinguishing between moderate biases on both sides, as this has proven to… See the full description on the dataset page: https://huggingface.co/datasets/cajcodes/political-bias.
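    A minimal loading sketch using the Hugging Face path given above; the split name and column names are assumptions to verify against the dataset card:

    python
    from datasets import load_dataset

    ds = load_dataset("cajcodes/political-bias")
    print(ds)                       # inspect available splits and features

    # "train" split assumed; the 0-4 bias rating described above should appear
    # as one of the features printed here.
    print(ds["train"][0])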

  16. Data_Sheet_1_Gender Bias in Artificial Intelligence: Severity Prediction at...

    • frontiersin.figshare.com
    docx
    Updated May 30, 2023
    Cite
    Heewon Chung; Chul Park; Wu Seong Kang; Jinseok Lee (2023). Data_Sheet_1_Gender Bias in Artificial Intelligence: Severity Prediction at an Early Stage of COVID-19.docx [Dataset]. http://doi.org/10.3389/fphys.2021.778720.s001
    Explore at:
    Available download formats: docx
    Dataset updated
    May 30, 2023
    Dataset provided by
    Frontiers
    Authors
    Heewon Chung; Chul Park; Wu Seong Kang; Jinseok Lee
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Artificial intelligence (AI) technologies have been applied in various medical domains to predict patient outcomes with high accuracy. As AI becomes more widely adopted, the problem of model bias is increasingly apparent. In this study, we investigate the model bias that can occur when training a model using datasets for only one particular gender and aim to present new insights into the bias issue. For the investigation, we considered an AI model that predicts severity at an early stage based on the medical records of coronavirus disease (COVID-19) patients. For 5,601 confirmed COVID-19 patients, we used 37 medical records, namely, basic patient information, physical index, initial examination findings, clinical findings, comorbidity diseases, and general blood test results at an early stage. To investigate the gender-based AI model bias, we trained and evaluated two separate models—one that was trained using only the male group, and the other using only the female group. When the model trained by the male-group data was applied to the female testing data, the overall accuracy decreased—sensitivity from 0.93 to 0.86, specificity from 0.92 to 0.86, accuracy from 0.92 to 0.86, balanced accuracy from 0.93 to 0.86, and area under the curve (AUC) from 0.97 to 0.94. Similarly, when the model trained by the female-group data was applied to the male testing data, once again, the overall accuracy decreased—sensitivity from 0.97 to 0.90, specificity from 0.96 to 0.91, accuracy from 0.96 to 0.91, balanced accuracy from 0.96 to 0.90, and AUC from 0.97 to 0.95. Furthermore, when we evaluated each gender-dependent model with the test data from the same gender used for training, the resultant accuracy was also lower than that from the unbiased model.
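    The evaluation protocol described (train on one gender only, then test on the other) can be sketched with scikit-learn on synthetic data; this illustrates the protocol only, not the authors' model or clinical features:

    python
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)

    # Synthetic stand-in for the early-stage medical-record features described above;
    # the real study uses 37 clinical variables from 5,601 COVID-19 patients.
    def make_group(n, shift):
        X = rng.normal(shift, 1.0, size=(n, 5))
        y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1, n) > shift).astype(int)
        return X, y

    X_m, y_m = make_group(1000, 0.0)   # stand-in "male" cohort
    X_f, y_f = make_group(1000, 0.5)   # stand-in "female" cohort with shifted features

    # Train on one gender only, then evaluate on both, mirroring the paper's protocol.
    model_m = LogisticRegression().fit(X_m[:700], y_m[:700])
    auc_same = roc_auc_score(y_m[700:], model_m.predict_proba(X_m[700:])[:, 1])
    auc_cross = roc_auc_score(y_f, model_m.predict_proba(X_f)[:, 1])
    print(f"male-trained model: AUC on male test {auc_same:.3f}, on female test {auc_cross:.3f}")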

  17. Article Bias Prediction Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Oct 10, 2020
    Cite
    Ramy Baly; Giovanni Da San Martino; James Glass; Preslav Nakov (2020). Article Bias Prediction Dataset [Dataset]. https://paperswithcode.com/dataset/article-bias-prediction
    Explore at:
    Dataset updated
    Oct 10, 2020
    Authors
    Ramy Baly; Giovanni Da San Martino; James Glass; Preslav Nakov
    Description

    Article-Bias-Prediction Dataset

    The articles crawled from www.allsides.com are available in the ./data folder, along with the different evaluation splits.

    The dataset consists of a total of 37,554 articles. Each article is stored as a JSON object in the ./data/jsons directory and contains the following fields:
    1. ID: an alphanumeric identifier.
    2. topic: the topic discussed in the article.
    3. source: the name of the article's source (example: New York Times).
    4. source_url: the URL of the source's homepage (example: www.nytimes.com).
    5. url: the link to the actual article.
    6. date: the publication date of the article.
    7. authors: a comma-separated list of the article's authors.
    8. title: the article's title.
    9. content_original: the original body of the article, as returned by the newspaper3k Python library.
    10. content: the processed and tokenized content, which is used as input to the different models.
    11. bias_text: the label of the political bias annotation of the article (left, center, or right).
    12. bias: the numeric encoding of the political bias of the article (0, 1, or 2).

    The ./data/splits directory contains the two types of splits discussed in the paper: random and media-based. For each type, we provide the train, validation, and test files that contain the articles' IDs belonging to each set, along with their numeric bias labels.
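    A minimal sketch for loading the per-article JSON files described above; the field names come from the listing, and the local path to the repository's ./data folder is an assumption:

    python
    import json
    from pathlib import Path

    # Load every per-article JSON file from the directory layout described above.
    articles = {}
    for path in Path("./data/jsons").glob("*.json"):
        with open(path, encoding="utf-8") as f:
            article = json.load(f)
        articles[article["ID"]] = article

    print(len(articles), "articles loaded")
    sample = next(iter(articles.values()))
    print(sample["bias_text"], "-", sample["title"])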

    Code: Under maintenance. To be available soon.

    Citation

    bibtex
    @inproceedings{baly2020we,
      author      = {Baly, Ramy and Da San Martino, Giovanni and Glass, James and Nakov, Preslav},
      title       = {We Can Detect Your Bias: Predicting the Political Ideology of News Articles},
      booktitle   = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
      series      = {EMNLP~'20},
      NOmonth     = {November},
      year        = {2020},
      pages       = {4982--4991},
      NOpublisher = {Association for Computational Linguistics}
    }

  18. Can subjective expectations data be used in choice models? evidence on...

    • journaldata.zbw.eu
    • jda-test.zbw.eu
    stata do, txt
    Updated Dec 7, 2022
    Cite
    Basit Zafar; Basit Zafar (2022). Can subjective expectations data be used in choice models? evidence on cognitive biases (replication data) [Dataset]. http://doi.org/10.15456/jae.2022320.0722117185
    Explore at:
    Available download formats: txt(94924), stata do(36089), txt(296483), txt(1496)
    Dataset updated
    Dec 7, 2022
    Dataset provided by
    ZBW - Leibniz Informationszentrum Wirtschaft
    Authors
    Basit Zafar; Basit Zafar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A pervasive concern with the use of subjective data in choice models is that they are biased and endogenous. This paper examines the extent to which cognitive biases plague subjective data, and specifically addresses the questions of: (1) whether cognitive dissonance affects the reporting of beliefs; and (2) whether individuals exert sufficient mental effort when probed about their subjective beliefs. For this purpose, I collect a unique panel dataset of Northwestern University undergraduates which contains their subjective expectations about major-specific outcomes for their chosen major as well as for other alternatives in their choice set. I do not find evidence of cognitive biases systematically affecting the reporting of beliefs. By analyzing patterns of belief updating, I can rule out cognitive dissonance being of serious concern in the current setting. There does not seem to be any systematic (non-classical) measurement error in the reporting of beliefs: I do not find systematic patterns in mental recall of previous responses, or in the extent of rounding in the reported beliefs for the various majors. Comparison of subjective beliefs with objective measures suggests that students have well-formed expectations. Overall, the results paint a favorable picture for the use of subjective expectations data in choice models.

  19. Data from: Approach-induced biases in human information sampling

    • datadryad.org
    • data.niaid.nih.gov
    zip
    Updated Jan 5, 2017
    Cite
    Laurence T. Hunt; Robb B. Rutledge; W. M. Nishantha Malalasekera; Steven W. Kennerley; Raymond J. Dolan (2017). Approach-induced biases in human information sampling [Dataset]. http://doi.org/10.5061/dryad.nb41c
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 5, 2017
    Dataset provided by
    Dryad
    Authors
    Laurence T. Hunt; Robb B. Rutledge; W. M. Nishantha Malalasekera; Steven W. Kennerley; Raymond J. Dolan
    Time period covered
    2017
    Description

    Raw data, scripts and model code: Archive containing MATLAB analysis .m code, code for the computational model, and raw .mat data. Please read the associated readme file. (Hunt_infotask.zip)

  20. Dutch-Government-Data-for-Bias-detection

    • huggingface.co
    Cite
    Milena, Dutch-Government-Data-for-Bias-detection [Dataset]. https://huggingface.co/datasets/milenamileentje/Dutch-Government-Data-for-Bias-detection
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Authors
    Milena
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Area covered
    Netherlands, Politics of the Netherlands
    Description

    The milenamileentje/Dutch-Government-Data-for-Bias-detection dataset is hosted on Hugging Face and was contributed by the HF Datasets community.
