10 datasets found
  1. AllSides : Ratings of bias in electronic media

    • kaggle.com
    zip
    Updated Sep 23, 2021
    Cite
    Supratim Haldar (2021). AllSides : Ratings of bias in electronic media [Dataset]. https://www.kaggle.com/datasets/supratimhaldar/allsides-ratings-of-bias-in-electronic-media
    Explore at:
    Available download formats: zip (32548 bytes)
    Dataset updated
    Sep 23, 2021
    Authors
    Supratim Haldar
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Context

    The media is the fourth pillar of democracy, so it must carry out its duty faithfully. While the majority does so, news articles are very often contaminated by the personal perspectives of the journalists who write them or by the beliefs of the people running the media houses. As per Wikipedia's definition, media bias is the bias or perceived bias of journalists and news producers within the mass media in the selection of events and stories that are reported and how they are covered.

    Content

    https://www.allsides.com does a wonderful job of analyzing the bias of well-known media houses and showing how the same news story is presented from completely different perspectives by different media publications. Based on this analysis, each media publication is assigned a "bias" direction (left, right, or neutral). The general public can vote to express whether they agree with this analysis. The details are captured at https://www.allsides.com/media-bias/media-bias-ratings and are constantly updated based on new votes. The content of this dataset was scraped from this page and the pages that follow it.

    Acknowledgements

    https://www.allsides.com is the owner of this data and holds all rights to it. Many thanks to them for their effort!

    Inspiration

    A deeper analysis can reveal which side most media houses lean towards (see the sketch below). The analysis can be extended further by comparing news articles on the same event from different publications and, as a final step, by building a classifier that estimates the bias of any article on the internet just by reading it. This might help in the fight against fake news as well.

    AllSides would love to see any work that draws insightful information from this data. Please feel free to share your work with AllSides (https://www.allsides.com/contact).
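
    A minimal sketch of such a tally, assuming the Kaggle zip has been extracted; the file name "allsides.csv" and the "rating" column are assumptions, so check the actual CSV header first:

    ```python
    # Tally how many media houses received each bias rating.
    # File and column names are assumptions, not the documented schema.
    import pandas as pd

    df = pd.read_csv("allsides.csv")    # hypothetical file from the Kaggle zip
    print(df["rating"].value_counts())  # e.g. counts of left/center/right ratings
    ```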

    Licenses and Attribution

    AllSides Media Bias Ratings by AllSides.com are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. You may use this data for research or noncommercial purposes provided you include this attribution.

    For commercial use, or to request this data as a CSV or JSON file, go to www.allsides.com/contact.

  2. CRITEO FAIRNESS IN JOB ADS DATASET

    • kaggle.com
    zip
    Updated Jul 1, 2024
    Cite
    Md. Abdur Rahman (2024). CRITEO FAIRNESS IN JOB ADS DATASET [Dataset]. https://www.kaggle.com/datasets/borhanitrash/fairness-in-job-ads-dataset
    Explore at:
    Available download formats: zip (201430692 bytes)
    Dataset updated
    Jul 1, 2024
    Authors
    Md. Abdur Rahman
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Summary

    This dataset is released by Criteo to foster research and innovation on fairness in advertising and AI systems in general. See also Criteo's pledge for Fairness in Advertising.

    The dataset is intended for learning click-prediction models and evaluating how much their predictions are biased between different gender groups.

    Data description

    The dataset contains pseudonymized user context and publisher features collected from a job-targeting campaign run for 5 months by Criteo, an ad-tech company. Each line represents a product that was shown to a user. Each user has an impression session in which they can see several products at the same time. Each product can be clicked or not clicked by the user. The dataset consists of 1,072,226 rows and 55 columns.

    • features
      • user_id is a unique identifier assigned to each user. This identifier has been anonymized and does not contain any information related to the real users.
      • product_id is a unique identifier assigned to each product, i.e. job offer.
      • impression_id is a unique identifier assigned to each impression, i.e. online session that can have several products at the same time.
      • cat0 to cat5 are anonymized categorical user features.
      • cat6 to cat12 are anonymized categorical product features.
      • num13 to num47 are anonymized numerical user features.
    • labels
      • protected_attribute is a binary feature that describes the user gender proxy, i.e. female is 0, male is 1. A detailed description of its meaning can be found below.
      • senior is a binary feature that describes the seniority of the job position, i.e. an assistant role is 0, a managerial role is 1. This feature was created during the data processing step from the product title feature: if the product title contains words describing a managerial role (e.g. 'president', 'ceo', and others), it is assigned 1, otherwise 0.
      • rank is a numerical feature that corresponds to the positional rank of the product on the display for a given impression_id. The position on the display usually biases clicks: a lower rank means a higher position of the product on the display.
      • displayrandom is a binary feature that equals 1 if the display position on the banner of the products associated with the same impression_id was randomized. The click-rank metric should be computed on displayrandom = 1 to avoid positional bias (see the sketch after this list).
      • click is a binary feature that equals 1 if the product product_id in the impression impression_id was clicked by the user user_id.
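
    A minimal sketch of that recommendation (the file name is an assumption; the column names follow the list above): restrict to randomized displays, then compare click rates between the two gender-proxy groups.

    ```python
    # Click-rate gap between gender-proxy groups, on randomized displays only
    # to avoid positional bias, as recommended above.
    import pandas as pd

    df = pd.read_csv("criteo_fairness_jobs.csv")    # hypothetical file from the zip
    randomized = df[df["displayrandom"] == 1]
    click_rates = randomized.groupby("protected_attribute")["click"].mean()
    print(click_rates)                           # mean click rate per group
    print(abs(click_rates[1] - click_rates[0]))  # absolute gap between groups
    ```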

    Data statistics

    Average value per dimension:

    • click: 0.077
    • protected attribute: 0.500
    • senior: 0.704

    License

    The data is released under the CC BY-NC-SA 4.0 license. You are free to share and adapt this data provided that you respect the Attribution, NonCommercial, and ShareAlike conditions. Please read the full license carefully before using the data.

    Protected attribute

    As Criteo does not have access to user demographics, we report a proxy of gender as the protected attribute. This proxy is reported as binary for simplicity, yet we acknowledge that gender is not necessarily binary.

    The value of the proxy is computed as the majority of the gender attributes of products seen in the user's timeline. Products having a gender attribute are typically fashion and clothing items. We acknowledge that this proxy does not necessarily represent how users relate to a given gender, yet we believe it to be a realistic approximation for research purposes.

    We encourage research in Fairness defined with respect to other attributes as well.

    Limitations and interpretations

    We remark that the proposed gender proxy does not give a definition of gender. Since we do not have access to the sensitive information, this is the best solution we have identified at this stage to identify bias on pseudonymized data, and we encourage any discussion of better approximations. This proxy is reported as binary for simplicity, yet we acknowledge that gender is not necessarily binary. Although our research focuses on gender, this should not diminish the importance of investigating other types of algorithmic discrimination. While this dataset provides an important application of fairness-aware algorithms in a high-risk domain, there are several fundamental limitations that cannot be addressed easily through data collection or curation processes. These limitations in...

  3. Data from: A large-scale assessment of ant diversity across the Brazilian...

    • search.dataone.org
    • zenodo.org
    • +1 more
    Updated May 12, 2025
    + more versions
    Cite
    Joudellys Andrade-Silva; Fabricio Baccaro; Lívia Prado; Benoit Guenard; Dan Warren; Jamie Kass; Evan Economo; Rogerio Silva (2025). A large-scale assessment of ant diversity across the Brazilian Amazon Basin: integrating geographic, ecological, and morphological drivers of sampling bias [Dataset]. http://doi.org/10.5061/dryad.ht76hdrj8
    Explore at:
    Dataset updated
    May 12, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Joudellys Andrade-Silva; Fabricio Baccaro; Lívia Prado; Benoit Guenard; Dan Warren; Jamie Kass; Evan Economo; Rogerio Silva
    Time period covered
    May 12, 2022
    Area covered
    Brazil
    Description

    Tropical ecosystems are often biodiversity hotspots, and invertebrates represent the main underrepresented component of diversity in large-scale analyses. This problem is partly related to the scarcity of data widely available to conduct these studies and the lack of systematic organization of knowledge about invertebrates’ distributions in biodiversity hotspots. Here, we introduce and analyze a comprehensive data compilation of Amazonian ant diversity. Using records from 1817 to 2020 from both published and unpublished sources, we describe the diversity and distribution of ant species in the Brazilian Amazon Basin. Further, using high-definition images and data from taxonomic publications, we build a comprehensive database of morphological traits for the ant species that occur in the region. In total, we recorded 1,067 nominal species in the Brazilian Amazon Basin, with sampling locations strongly biased by access routes, urban centers, research institutions, and major infrastructure p...

  4. Data from: A virtual multi-label approach to imbalanced data classification

    • tandf.figshare.com
    text/x-tex
    Updated Feb 28, 2024
    Cite
    Elizabeth P. Chou; Shan-Ping Yang (2024). A virtual multi-label approach to imbalanced data classification [Dataset]. http://doi.org/10.6084/m9.figshare.19390561.v1
    Explore at:
    Available download formats: text/x-tex
    Dataset updated
    Feb 28, 2024
    Dataset provided by
    Taylor & Francis (https://taylorandfrancis.com/)
    Authors
    Elizabeth P. Chou; Shan-Ping Yang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    One of the most challenging issues in machine learning is imbalanced data analysis. Usually, in this type of research, correctly predicting minority labels is more critical than correctly predicting majority labels. However, traditional machine learning techniques easily lead to learning bias: traditional classifiers tend to place all subjects in the majority group, resulting in biased predictions. Machine learning studies typically approach this from one of two perspectives: a data-based perspective or a model-based perspective. Oversampling and undersampling are examples of data-based approaches, while adding costs, penalties, or weights to the optimization of the algorithm is typical of a model-based approach. Some ensemble methods have been studied recently. These methods can cause various problems, such as overfitting, the omission of some information, and long computation times, and they do not apply to all kinds of datasets. To address this problem, the virtual labels (ViLa) approach for the majority label is proposed to solve the imbalance problem. A new multiclass classification approach with the equal K-means clustering method is demonstrated in the study. The proposed method is compared with commonly used imbalance-problem methods, such as sampling methods (oversampling, undersampling, and SMOTE) and classifier methods (SVM and one-class SVM). The results show that the proposed method performs better as the degree of data imbalance increases, gradually outperforming the other methods.
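
    For context, a minimal sketch of one of the baseline resampling methods mentioned above (SMOTE, here via imbalanced-learn); this is not the proposed ViLa method, which the abstract does not specify in code.

    ```python
    # Oversample a synthetic 95%/5% imbalanced dataset with SMOTE.
    from collections import Counter

    from imblearn.over_sampling import SMOTE
    from sklearn.datasets import make_classification

    # synthetic imbalanced data as a stand-in for the paper's benchmarks
    X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=0)
    print("before:", Counter(y))
    X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
    print("after:", Counter(y_res))  # minority class oversampled to parity
    ```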

  5. Utrecht Fairness Recruitment dataset

    • kaggle.com
    zip
    Updated Mar 11, 2025
    Cite
    ICT Institute (2025). Utrecht Fairness Recruitment dataset [Dataset]. https://www.kaggle.com/datasets/ictinstitute/utrecht-fairness-recruitment-dataset
    Explore at:
    Available download formats: zip (47198 bytes)
    Dataset updated
    Mar 11, 2025
    Authors
    ICT Institute
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Area covered
    Utrecht
    Description

    This dataset is a purely synthetic dataset created to help educators and researchers understand fairness definitions. It is a convenient way to illustrate the differences between definitions such as fairness through unawareness, group fairness, statistical parity, predictive parity, equalised odds, or treatment equality. The dataset contains multiple sensitive features: age, gender, and lives-near-by. These can be combined to define many different sensitive groups. The dataset contains the decisions of five example decision methods that can be evaluated. When using this dataset, you do not need to train your own methods. Instead, you can focus on evaluating the existing models.
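
    As an illustration, a minimal sketch of checking one of the definitions named above, statistical parity; the file name and the "gender" and "decision" column names are assumptions about the synthetic CSV's schema.

    ```python
    # Statistical parity: compare positive-decision rates across a sensitive
    # group. Assumes a binary 0/1 "decision" column; names are assumptions.
    import pandas as pd

    df = pd.read_csv("recruitmentdataset.csv")       # hypothetical file name
    rates = df.groupby("gender")["decision"].mean()  # per-group selection rate
    print(rates)
    print("statistical-parity gap:", rates.max() - rates.min())
    ```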

    This dataset is described and analysed in the following paper. Please cite this paper when using this dataset:

    Burda, P. and Van Otterloo, S. 2024. Fairness definitions explained and illustrated with examples. Computers and Society Research Journal, 2025 (2). https://doi.org/10.54822/PASR6281

  6. Data from: STRUCTURE is more robust than other clustering methods in...

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Jun 19, 2019
    Cite
    Marc Stift; Filip Kolar; Patrick G. Meirmans (2019). STRUCTURE is more robust than other clustering methods in simulated mixed-ploidy populations [Dataset]. http://doi.org/10.5061/dryad.6g635f6
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 19, 2019
    Authors
    Marc Stift; Filip Kolar; Patrick G. Meirmans
    License

    CC0 1.0: https://spdx.org/licenses/CC0-1.0.html

    Description

    Analyses of population genetic structure have become a standard approach in population genetics. In polyploid complexes, clustering analyses can elucidate the origin of polyploid populations and patterns of admixture between different cytotypes. However, combining diploid and polyploid data can theoretically lead to biased inference with (artefactual) clustering by ploidy. We used simulated mixed-ploidy (diploid-autotetraploid) data to systematically compare the performance of k-means clustering and the model-based clustering methods implemented in STRUCTURE, ADMIXTURE, FASTSTRUCTURE and INSTRUCT under different scenarios of differentiation and with different marker types. Under scenarios of strong population differentiation, the tested applications performed equally well. However, when population differentiation was weak, STRUCTURE was the only method that allowed unbiased inference with markers with limited genotypic information (co-dominant markers with unknown dosage or dominant markers). Still, since STRUCTURE was comparatively slow, the much faster but less powerful FASTSTRUCTURE provides a reasonable alternative for large datasets. Finally, although bias makes k-means clustering unsuitable for markers with incomplete genotype information, given large numbers of loci (>1000) with known dosage, k-means clustering was superior to FASTSTRUCTURE in terms of power and speed. We conclude that STRUCTURE is the most robust method for the analysis of genetic structure in mixed-ploidy populations, although alternative methods should be considered under some specific conditions.
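
    For illustration, a minimal sketch of the k-means alternative discussed above, run on an individuals-by-loci matrix of known allele dosages; the data here are synthetic stand-ins, not the study's simulated mixed-ploidy genotypes.

    ```python
    # k-means clustering on a dosage matrix (rows: individuals, cols: loci).
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    # two diverged stand-in populations, 1000 loci, dosages 0-4 (tetraploid scale)
    pop_a = rng.integers(0, 3, size=(50, 1000))
    pop_b = rng.integers(2, 5, size=(50, 1000))
    X = np.vstack([pop_a, pop_b])

    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    print(labels)  # cluster assignment per individual
    ```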

  7. GPT Detectors - Did a GPT write this?

    • kaggle.com
    zip
    Updated Sep 20, 2023
    Cite
    Sujay Kapadnis (2023). GPT Detectors - Did a GPT write this? [Dataset]. https://www.kaggle.com/datasets/sujaykapadnis/gpt-detectors
    Explore at:
    Available download formats: zip (64023 bytes)
    Dataset updated
    Sep 20, 2023
    Authors
    Sujay Kapadnis
    Description

    detectors is an R data package containing predictions from various GPT detectors. The data is based on the paper:

    GPT Detectors Are Biased Against Non-Native English Writers. Weixin Liang, Mert Yuksekgonul, Yining Mao, Eric Wu, James Zou. CellPress Patterns.

    The study authors carried out a series of experiments passing a number of essays to different GPT detection models. Juxtaposing detector predictions for papers written by native and non-native English writers, the authors argue that GPT detectors disproportionately classify real writing from non-native English writers as AI-generated.

    Data credit: https://github.com/simonpcouch/detectors/

    Data Dictionary

    detectors.csv

    Each variable, its class, and its description:

    • kind (character): Whether the essay was written by a "Human" or "AI".
    • .pred_AI (double): The class probability from the GPT detector that the inputted text was written by AI.
    • .pred_class (character): The uncalibrated class prediction, encoded as if_else(.pred_AI > .5, "AI", "Human").
    • detector (character): The name of the detector used to generate the predictions.
    • native (character): For essays written by humans, whether the essay was written by a native English writer or not. These categorizations are coarse; values of "Yes" may actually be written by people who do not write in English natively. NA indicates that the text was not written by a human.
    • name (character): A label for the experiment that the predictions were generated from.
    • model (character): For essays that were written by AI, the name of the model that generated the essay.
    • document_id (double): A unique identifier for the supplied essay. Some essays were supplied to multiple detectors. Note that some essays are AI-revised derivatives of others.
    • prompt (character): For essays that were written by AI, a descriptor for the form of "prompt engineering" passed to the model.
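
    A minimal sketch of the paper's headline comparison using the columns documented above: among human-written essays, how often each detector flags the text as AI, split by native-English status.

    ```python
    # Rate at which each detector misclassifies human essays as "AI",
    # broken down by native-English status (columns documented above).
    import pandas as pd

    df = pd.read_csv("detectors.csv")
    humans = df[df["kind"] == "Human"].copy()
    humans["flagged_ai"] = humans[".pred_class"] == "AI"
    print(humans.groupby(["detector", "native"])["flagged_ai"].mean())
    ```
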
  8. List of features and their definition.

    • plos.figshare.com
    xlsx
    Updated Sep 6, 2024
    Cite
    Efrat Cohen-Davidi; Isana Veksler-Lublinsky (2024). List of features and their definition. [Dataset]. http://doi.org/10.1371/journal.pcbi.1012385.s002
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Sep 6, 2024
    Dataset provided by
    PLOS Computational Biology
    Authors
    Efrat Cohen-Davidi; Isana Veksler-Lublinsky
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    MicroRNAs (miRNAs) are small non-coding RNAs that regulate gene expression post-transcriptionally. In animals, this regulation is achieved via base-pairing with partially complementary sequences, mainly in the 3’ UTR region of messenger RNAs (mRNAs). Computational approaches that predict miRNA target interactions (MTIs) facilitate the process of narrowing down potential targets for experimental validation. The availability of new datasets of high-throughput, direct MTIs has led to the development of machine learning (ML) based methods for MTI prediction. To train an ML algorithm, it is beneficial to provide entries from all class labels (i.e., positive and negative). Currently, no high-throughput assays exist for capturing negative examples. Therefore, current ML approaches must rely on either artificially generated or inferred negative examples deduced from experimentally identified positive miRNA-target datasets. Moreover, the lack of uniform standards for generating such data leads to biased results and hampers comparisons between studies. In this comprehensive study, we collected methods for generating negative data for animal miRNA–target interactions and investigated their impact on the classification of true human MTIs. Our study relies on training ML models on a fixed positive dataset in combination with different negative datasets and evaluating their intra- and cross-dataset performance. As a result, we were able to examine each method independently and evaluate ML models’ sensitivity to the methodologies utilized in negative data generation. To achieve a deep understanding of the performance results, we analyzed unique features that distinguish between datasets. In addition, we examined whether one-class classification models that utilize solely positive interactions for training are suitable for the task of MTI classification. We demonstrate the importance of negative data in MTI classification, analyze specific methodological characteristics that differentiate negative datasets, and highlight the challenge of ML models generalizing interaction rules from training to testing sets derived from different approaches. This study provides valuable insights into the computational prediction of MTIs that can be further used to establish standards in the field.
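
    For context, a minimal sketch of the one-class setting discussed above: train on positive interactions only and score unseen examples. The features are random stand-ins, not the study's miRNA-target features.

    ```python
    # One-class SVM trained on positives only; unseen examples are scored
    # as positive-like (+1) or outlier (-1).
    import numpy as np
    from sklearn.svm import OneClassSVM

    rng = np.random.default_rng(0)
    X_pos = rng.normal(0.0, 1.0, size=(500, 10))  # stand-in positive MTIs
    X_new = rng.normal(1.5, 1.0, size=(100, 10))  # stand-in unlabeled examples

    clf = OneClassSVM(gamma="auto").fit(X_pos)
    print(clf.predict(X_new))  # +1 = resembles positives, -1 = outlier
    ```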

  9. Data Sheet 1_autoMEA: machine learning-based burst detection for...

    • frontiersin.figshare.com
    pdf
    Updated Dec 5, 2024
    Cite
    Vinicius Hernandes; Anouk M. Heuvelmans; Valentina Gualtieri; Dimphna H. Meijer; Geeske M. van Woerden; Eliska Greplova (2024). Data Sheet 1_autoMEA: machine learning-based burst detection for multi-electrode array datasets.pdf [Dataset]. http://doi.org/10.3389/fnins.2024.1446578.s001
    Explore at:
    Available download formats: pdf
    Dataset updated
    Dec 5, 2024
    Dataset provided by
    Frontiers
    Authors
    Vinicius Hernandes; Anouk M. Heuvelmans; Valentina Gualtieri; Dimphna H. Meijer; Geeske M. van Woerden; Eliska Greplova
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Neuronal activity in the highly organized networks of the central nervous system is the vital basis for various functional processes, such as perception, motor control, and cognition. Understanding interneuronal connectivity and how activity is regulated in neuronal circuits is crucial for interpreting how the brain works. Multi-electrode arrays (MEAs) are particularly useful for studying the dynamics of neuronal network activity and their development, as they allow for real-time, high-throughput measurements of neural activity. At present, the key challenge in the utilization of MEA data is the sheer complexity of the measured datasets. Available software offers semi-automated analysis for a fixed set of parameters that allow for the definition of spikes, bursts, and network bursts. However, this analysis remains time-consuming, user-biased, and limited by pre-defined parameters. Here, we present autoMEA, software for machine learning-based automated burst detection in MEA datasets. We exemplify autoMEA's efficacy on the neuronal network activity of primary hippocampal neurons from wild-type mice monitored using 24-well multi-well MEA plates. To validate and benchmark the software, we showcase its application using wild-type neuronal networks and two different neuronal networks modeling neurodevelopmental disorders to assess network phenotype detection. Network characteristics typically reported in the literature, such as synchronicity and rhythmicity, were detected by autoMEA as accurately as by manual analysis. Additionally, autoMEA could detect reverberations, a more complex burst dynamic present in hippocampal cultures. Furthermore, autoMEA burst detection was sufficiently sensitive to detect changes in the synchronicity and rhythmicity of networks modeling neurodevelopmental disorders, as well as changes in their network burst dynamics. Thus, we show that autoMEA reliably analyses neural networks measured with the multi-well MEA setup with precision and accuracy comparable to that of a human expert.
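
    For context, a minimal sketch of the kind of fixed-parameter burst detection autoMEA aims to improve on (not autoMEA itself): a max-inter-spike-interval detector whose thresholds are exactly the user-tuned parameters the description criticizes.

    ```python
    # Classic parameter-based burst detection: group spikes whose gaps stay
    # below MAX_ISI, keep runs with at least MIN_SPIKES spikes.
    import numpy as np

    spike_times = np.sort(np.random.default_rng(0).uniform(0, 10, 200))  # seconds
    MAX_ISI = 0.1    # max gap between spikes within a burst (illustrative)
    MIN_SPIKES = 5   # min spikes for a run to count as a burst (illustrative)

    bursts, run = [], [spike_times[0]]
    for prev, t in zip(spike_times, spike_times[1:]):
        if t - prev <= MAX_ISI:
            run.append(t)
        else:
            if len(run) >= MIN_SPIKES:
                bursts.append((run[0], run[-1]))
            run = [t]
    if len(run) >= MIN_SPIKES:
        bursts.append((run[0], run[-1]))
    print(f"detected {len(bursts)} bursts")
    ```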

  10. Data_Sheet_1_Protection from prior natural infection vs. vaccination against...

    • frontiersin.figshare.com
    pdf
    Updated Jun 12, 2024
    Cite
    Susanne Weber; Pontus Hedberg; Pontus Naucler; Martin Wolkewitz (2024). Data_Sheet_1_Protection from prior natural infection vs. vaccination against SARS-CoV-2—a statistical note to avoid biased interpretation.pdf [Dataset]. http://doi.org/10.3389/fmed.2024.1376275.s001
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 12, 2024
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Susanne Weber; Pontus Hedberg; Pontus Naucler; Martin Wolkewitz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction

    The fight against SARS-CoV-2 has been a major task worldwide since the virus was first identified in December 2019. An imperative preventive measure is the availability of efficacious vaccines, while there is also significant interest in the protective effect of a previous SARS-CoV-2 infection on a subsequent infection (the natural protection rate).

    Methods

    In order to compare protection rates after infection and vaccination, researchers consider different effect measures such as 1 minus the hazard ratio, 1 minus the odds ratio, or 1 minus the risk ratio. These measures differ in a setting with competing risks (see the worked example below). Nevertheless, as there is no unique definition, these metrics are frequently used in studies examining protection rates. Comparing protection rates via vaccination and natural infection poses several challenges. For instance, many publications adopt the epidemiological definition that a reinfection after a SARS-CoV-2 infection is only possible after 90 days, whereas there is no such constraint after vaccination. Furthermore, death is more prominent as a competing event during the first 90 days after infection than after vaccination. In this work we discuss the statistical issues that arise when comparing protection rates from vaccination with those from infection. We explore different aspects of effect measures and provide insights drawn from different analyses, distinguishing between the first and the second 90 days post-infection or post-vaccination.

    Results

    In this study, we have access to real-world data on almost two million people from Stockholm County, Sweden. For the main analysis, data on over 52,000 people are considered. The infected group is younger, includes more men, and is less morbid than the vaccinated group. After the first 90 days, these differences increased. Analysis of the second 90 days shows differences between analysis approaches and between age groups. There are age-related differences in mortality. Considering the outcome SARS-CoV-2 infection, the effect of vaccination versus infection varies by age, showing a disadvantage for the vaccinated in the younger population, while no significant difference was found in the elderly.

    Discussion

    To compare the effects of immunization through infection or vaccination, we emphasize the need for several investigations. It is crucial to examine two observation periods: the first and second 90-day intervals following infection or vaccination. Additionally, methods to address imbalances are essential and need to be used. This approach supports fair comparisons, allows for more comprehensive conclusions, and helps prevent biased interpretations.
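
    A minimal worked example of two of the effect measures named above, on hypothetical counts (not the Stockholm data); 1 minus the hazard ratio would additionally require time-to-event data (e.g. from a Cox model) and is omitted here.

    ```python
    # Two "protection rate" variants from hypothetical 2x2 counts; note that
    # 1 - RR and 1 - OR do not coincide, illustrating why the choice of
    # effect measure matters.
    events_vax, n_vax = 30, 1000   # infections in the vaccinated group
    events_inf, n_inf = 15, 1000   # reinfections in the previously infected group

    risk_vax, risk_inf = events_vax / n_vax, events_inf / n_inf
    rr = risk_inf / risk_vax                # risk ratio
    odds = lambda r: r / (1 - r)
    or_ = odds(risk_inf) / odds(risk_vax)   # odds ratio
    print("1 - RR:", round(1 - rr, 3), "| 1 - OR:", round(1 - or_, 3))
    ```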
