Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
Developments in Artificial Intelligence (AI) are being adopted widely in healthcare. However, the introduction and use of AI may come with biases and disparities, raising concerns about healthcare access and outcomes for underrepresented Indigenous populations. In New Zealand, Māori experience significant inequities in health compared to the non-Indigenous population. This research explores equity concepts and fairness measures concerning AI for healthcare in New Zealand.
Methods
This research considers data and model bias in NZ-based electronic health records (EHRs). Two very distinct NZ datasets are used, one obtained from a single hospital and another from multiple GP practices; both datasets were collected by clinicians. To ensure research equality and fair inclusion of Māori, we combine expertise in Artificial Intelligence (AI), the New Zealand clinical context, and te ao Māori. Mitigation of inequity needs to be addressed in data collection, model development, and model deployment. In this paper, we analyze data and algorithmic bias in data collection and in model development, training, and testing, using health data collected by experts. We use fairness measures such as disparate impact scores, equal opportunity, and equalized odds to analyze tabular data. Furthermore, token frequencies, statistical significance testing, and fairness measures for word embeddings, such as the WEAT and WEFE frameworks, are used to analyze bias in free-form medical text. The AI model predictions are also explained using SHAP and LIME.
Results
This research analyzed fairness metrics for NZ EHRs while considering data and algorithmic bias. We show evidence of bias arising from choices made in algorithmic design. Furthermore, we observe unintentional bias due to the underlying pre-trained models used to represent text data. This research addresses several vital issues while opening up the need and opportunity for future research.
Discussion
This research takes early steps toward developing a model of socially responsible and fair AI for New Zealand's population. We provide an overview of reproducible concepts that can be applied to any NZ population data. Furthermore, we discuss the gaps and future research avenues that will enable more focused development of fairness measures suitable for the New Zealand population's needs and social structure. One of the primary focuses of this research was ensuring fair inclusion. As such, we combine expertise in AI, clinical knowledge, and the representation of Indigenous populations. This inclusion of experts will be vital moving forward, providing a stepping stone toward the integration of AI for better outcomes in healthcare.
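As a hedged illustration of the tabular fairness measures named above (disparate impact, equal opportunity, equalized odds), the sketch below computes them from binary predictions and a binary protected attribute. The column names (is_maori, y_true, y_pred) and the toy data are hypothetical placeholders, not the study's actual variables or results.

```python
import numpy as np
import pandas as pd

def fairness_report(df, group_col, y_true_col, y_pred_col):
    """Disparate impact, equal opportunity and equalized-odds gaps for a
    binary protected attribute (1 = protected group, 0 = reference)."""
    g1 = df[df[group_col] == 1]
    g0 = df[df[group_col] == 0]

    # Disparate impact: ratio of positive prediction rates between groups.
    di = g1[y_pred_col].mean() / g0[y_pred_col].mean()

    # True positive rate per group -> equal opportunity gap.
    def tpr(g):
        pos = g[g[y_true_col] == 1]
        return pos[y_pred_col].mean() if len(pos) else np.nan

    # False positive rate per group -> second component of equalized odds.
    def fpr(g):
        neg = g[g[y_true_col] == 0]
        return neg[y_pred_col].mean() if len(neg) else np.nan

    return {
        "disparate_impact": di,
        "equal_opportunity_gap": tpr(g1) - tpr(g0),
        "equalized_odds_gap": max(abs(tpr(g1) - tpr(g0)),
                                  abs(fpr(g1) - fpr(g0))),
    }

# Hypothetical example frame; real EHR columns will differ.
df = pd.DataFrame({
    "is_maori": [1, 1, 0, 0, 1, 0, 1, 0],
    "y_true":   [1, 0, 1, 0, 1, 1, 0, 0],
    "y_pred":   [1, 0, 1, 1, 0, 1, 0, 0],
})
print(fairness_report(df, "is_maori", "y_true", "y_pred"))
```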
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We present Qbias, two novel datasets that promote the investigation of bias in online news search, as described in:
Fabian Haak and Philipp Schaer. 2023. Qbias - A Dataset on Media Bias in Search Queries and Query Suggestions. In Proceedings of the ACM Web Science Conference (WebSci'23). ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3578503.3583628.
Dataset 1: AllSides Balanced News Dataset (allsides_balanced_news_headlines-texts.csv)
The dataset contains 21,747 news articles collected from AllSides balanced news headline roundups in November 2022, as presented in our publication. The AllSides balanced news roundups feature three expert-selected U.S. news articles from sources of different political views (left, right, center), often featuring spin bias, slant, and other forms of non-neutral reporting on political news. All articles are tagged with a bias label (left, right, or neutral) by four expert annotators based on the expressed political partisanship. The AllSides balanced news roundups aim to offer multiple political perspectives on important news stories, educate users on biases, and provide multiple viewpoints. The collected data further include headlines, dates, news texts, topic tags (e.g., "Republican party", "coronavirus", "federal jobs"), and the publishing news outlet. We also include AllSides' neutral description of the topic of each article. Overall, the dataset contains 10,273 articles tagged as left, 7,222 as right, and 4,252 as center.
To provide easier access to the most recent and complete version of the dataset for future research, we provide a scraping tool and a regularly updated version of the dataset at https://github.com/irgroup/Qbias. The repository also contains regularly updated, more recent versions of the dataset with additional tags (such as the URL of the article). We chose to publish the version used for fine-tuning the models on Zenodo to enable reproduction of the results of our study.
Dataset 2: Search Query Suggestions (suggestions.csv)
The second dataset we provide consists of 671,669 search query suggestions for root queries based on the tags of the AllSides balanced news dataset. We collected search query suggestions from Google and Bing for the 1,431 topic tags that had been used for tagging AllSides news at least five times, approximately half of the total number of topics. The topic tags include names, a wide range of political terms, agendas, and topics (e.g., "communism", "libertarian party", "same-sex marriage"), cultural and religious terms (e.g., "Ramadan", "pope Francis"), locations, and other news-relevant terms. On average, the dataset contains 469 search queries per topic. In total, 318,185 suggestions were retrieved from Google and 353,484 from Bing.
The file contains a "root_term" column based on the AllSides topic tags. The "query_input" column contains the search term submitted to the search engine ("search_engine"). "query_suggestion" and "rank" represent the search query suggestions and their positions as returned by the search engines at the given time of search ("datetime"). The data were scraped from a US server; the server location is recorded in "location".
We retrieved ten search query suggestions provided by the Google and Bing autocomplete systems for the input of each of these root queries, without performing a search. Furthermore, we extended the root queries with the letters a to z (e.g., "democrats" (root term) >> "democrats a" (query input) >> "democrats and recession" (query suggestion)) to simulate a user's input during an information search and generate a total of up to 270 query suggestions per topic and search engine. The dataset we provide contains columns for the root term, query input, and query suggestion for each suggested query. The location from which the search is performed is the location of the Google servers running Colab, in our case Iowa in the United States of America, which is recorded in the dataset.
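A minimal sketch of how the suggestions file can be read with pandas, assuming the column names listed above ("root_term", "query_input", "query_suggestion", "rank", "search_engine"). The aggregation shown is purely illustrative and not part of the published pipeline.

```python
import pandas as pd

# Load the query-suggestion dataset (columns as described above).
suggestions = pd.read_csv("suggestions.csv")

# Number of distinct suggestions per root term and search engine.
counts = (suggestions
          .groupby(["root_term", "search_engine"])["query_suggestion"]
          .nunique()
          .unstack(fill_value=0))
print(counts.head())

# Example: suggestions collected for one root term (name taken from the
# example above; adjust to a term actually present in the data).
dem = suggestions[suggestions["root_term"] == "democrats"]
print(dem[["query_input", "rank", "query_suggestion"]]
      .sort_values("rank")
      .head(10))
```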
AllSides Scraper
At https://github.com/irgroup/Qbias, we provide a scraping tool that allows for the automatic retrieval of all available articles from the AllSides balanced news headlines.
We want to provide an easy means of retrieving the news and all corresponding information. For many tasks it is relevant to have the most recent documents available. We therefore provide this Python-based scraper, which scrapes all available AllSides news articles and gathers the available information. By providing the scraper, we facilitate access to a recent version of the dataset for other researchers.
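The actual scraper lives in the GitHub repository linked above. The snippet below is only a rough, assumption-laden sketch of the general approach (fetch a headline-roundup listing page and extract links to individual roundups); the URL and the CSS selector are guesses, not the tool's real configuration.

```python
import requests
from bs4 import BeautifulSoup

# Assumed listing URL; the real scraper's entry points may differ.
LISTING_URL = "https://www.allsides.com/headline-roundups"

resp = requests.get(LISTING_URL,
                    headers={"User-Agent": "research-scraper"},
                    timeout=30)
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")

# Collect links to individual roundup pages (selector is a placeholder).
links = [a["href"] for a in soup.select("a[href*='headline-roundups/']")]
print(f"Found {len(links)} candidate roundup links")
```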
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data collection for reviews 1-350 is included. For reviews with an unclear risk of bias, separate tables are included where more than one author was involved in determining the overall consensus on risk of bias (low, high, unclear). For reviews that required emailing Cochrane authors, the email responses are also included and assigned a risk of bias. For email responses with common themes, a table of bias assessments for those common themes was created. Finally, a summarized results table is included.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
About the NUDA Dataset
Media bias is a multifaceted problem, leading to one-sided views and impacting decision-making. One way to address bias in news articles is to automatically detect and indicate it through machine-learning methods. However, such detection is limited by the difficulty of obtaining reliable training data. To facilitate the data-gathering process, we introduce NewsUnravel, a news-reading web application leveraging an initially tested feedback mechanism to collect reader feedback on machine-generated bias highlights within news articles. Our approach improves dataset quality, significantly increasing inter-annotator agreement by 26.31% and improving classifier performance by 2.49%. As the first human-in-the-loop application for media bias, NewsUnravel shows that a user-centric approach to media bias data collection can return reliable data while being scalable and evaluated as easy to use. NewsUnravel demonstrates that feedback mechanisms are a promising strategy to reduce data collection expenses, fluidly adapt to changes in language, and enhance evaluators' diversity.
General
This dataset was created through user feedback on automatically generated bias highlights on news articles on the website NewsUnravel made by ANON. Its goal is to improve the detection of linguistic media bias for analysis and to indicate it to the public. Support came from ANON. None of the funders played any role in the dataset creation process or publication-related decisions.
The dataset consists of text, namely sentences with binary bias labels (processed: biased or not biased), as well as metadata about the articles. It includes all feedback that was given. The individual (unprocessed) ratings used to create the labels, together with the corresponding user IDs, are included.
For training, this dataset was combined with the BABE dataset. All data is completely anonymous. Some sentences might be offensive or triggering, as they were taken from biased or more extreme news sources. The dataset does not identify sub-populations, nor can it be considered sensitive to them, nor is it possible to identify individuals.
Description of the Data Files
This repository contains the datasets for the anonymous NewsUnravel submission. The tables contain the following data:
NUDAdataset.csv: the NUDA dataset with 310 new sentences with bias labels
Statistics.png: contains all Umami statistics for NewsUnravel's usage data
Feedback.csv: holds the participant ID of a single feedback with the sentence ID (contentId), the bias rating, and the provided reasons
Content.csv: holds the participant ID of a rating with the sentence ID (contentId) of the rated sentence, the bias rating, and the reason, if given
Article.csv: holds the article ID, title, source, article metadata, article topic, and bias amount in %
Participant.csv: holds the participant IDs and data processing consent
Collection Process
Data was collected through interactions with the Feedback Mechanism on NewsUnravel. A news article was displayed with automatically generated bias highlights. Each highlight could be selected, and readers were able to agree or disagree with the automatic label. Through a majority vote, labels were generated from those feedback interactions. Spammers were excluded through a spam detection approach.
Readers came to our website voluntarily through posts on LinkedIn and social media as well as posts on university boards. The data collection period lasted for one week, from March 4th to March 11th (2023). The landing page informed them about the goal and the data processing. After being informed, they could proceed to the article overview.
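As a hedged sketch of how the majority-vote labels described above could be reproduced from the released Feedback.csv: the column names "participantID" and "contentId" follow the file description, while the rating column name ("rating") and its value "biased" are assumptions that may differ in the actual file, and this is not the exact code used by the authors.

```python
import pandas as pd

# Column names follow the file description; the rating column name and
# its values are assumptions and may differ in the released file.
feedback = pd.read_csv("Feedback.csv")

# Majority vote per sentence: label 1 ("biased") if more than half of the
# (spam-filtered) ratings marked the highlight as biased.
votes = (feedback
         .groupby("contentId")["rating"]
         .agg(lambda r: int((r == "biased").mean() > 0.5))
         .rename("majority_label"))
print(votes.value_counts())
```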
So far, the dataset has been used on top of BABE to train a linguistic bias classifier, adopting hyperparameter configurations from BABE with a pre-trained model from Hugging Face. The dataset will be open source. On acceptance, a link with all details and contact information will be provided. No third parties are involved.
The dataset will not be maintained as it captures the first test of NewsUnravel at a specific point in time. However, new datasets will arise from further iterations. Those will be linked in the repository. Please cite the NewsUnravel paper if you use the dataset and contact us if you're interested in more information or joining the project.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset supports the systematic literature review conducted in the study "Peer Review Under Scrutiny: Systematic Evidence of Bias in Research Funding". The data comprise a curated collection of empirical studies that investigated the existence of biases in peer review processes within research funding agencies worldwide.
The dataset includes detailed categorizations based on the types of biases investigated, methodologies employed, data sources, and the confirmation status of each bias identified in the selected studies. The file was structured to facilitate further analyses, replications, and methodological reviews in the field of research evaluation and science policy studies.
Data were collected through systematic searches in Scopus and Web of Science databases, followed by rigorous screening and classification procedures. The dataset may be particularly useful for researchers, policymakers, and evaluators interested in improving transparency and equity in research funding mechanisms.
http://guides.library.uq.edu.au/deposit_your_data/terms_and_conditions
This dataset contains anonymised experiment data downloaded from a survey instrument. The experiment is designed to assess framing bias; data were collected via an online survey. The survey was designed in three parts: information and consent, demographic questions, and case study vignettes. Demographic questions were identical in both forms of the survey. For the vignettes, respondents were randomised to one of two frames, with frame one presented from a medical assessment perspective (ACAT) and frame two presented from a service provider perspective. There are four vignettes detailing real-world choices in home-care packages, changing only the services and equipment suggested by either ACAT assessors or service providers (treatments).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
Publication bias jeopardizes evidence-based medicine, mainly through biased literature syntheses. Publication bias may also affect laboratory animal research, but evidence is scarce.
Objectives
To assess the opinion of laboratory animal researchers on the magnitude, drivers, consequences, and potential solutions for publication bias, and to explore the impact of the size of the animals used, the seniority of the respondent, working in a for-profit organization, and the type of research (fundamental, pre-clinical, or both) on those opinions.
Design
Internet-based survey.
Setting
All animal laboratories in The Netherlands.
Participants
Laboratory animal researchers.
Main Outcome Measure(s)
Median (interquartile range) strengths of beliefs on 5- and 10-point scales (1: totally unimportant; 5 or 10: extremely important).
Results
Overall, 454 researchers participated. They considered publication bias a problem in animal research (7 (5 to 8)) and thought that about 50% (32–70) of animal experiments are published. Employees (n = 21) of for-profit organizations estimated that 10% (5 to 50) are published. Lack of statistical significance (4 (4 to 5)), technical problems (4 (3 to 4)), supervisors (4 (3 to 5)), and peer reviewers (4 (3 to 5)) were considered important reasons for non-publication (all on 5-point scales). Respondents thought that mandatory publication of study protocols and results, or of the reasons why no results were obtained, might increase scientific progress but expected increased bureaucracy. These opinions did not depend on the size of the animal used, the seniority of the respondent, or the type of research.
Conclusions
Non-publication of “negative” results appears to be prevalent in laboratory animal research. If statistical significance is indeed a main driver of publication, the collective literature on animal experimentation will be biased. This will impede the performance of valid literature syntheses. Effective yet efficient systems should be explored to counteract selective reporting of laboratory animal research.
Political science researchers have flexibility in how to analyze data, how to report data, and whether to report on data. Review of examples of reporting flexibility from the race and sex discrimination literature illustrates how research design choices can influence estimates and inferences. This reporting flexibility—coupled with the political imbalance among political scientists—creates the potential for political bias in reported political science estimates, but this potential for political bias can be reduced or eliminated through preregistration and preacceptance, in which researchers commit to a research design before completing data collection. Removing the potential for reporting flexibility can raise the credibility of political science research.
The U.S. Geological Survey (USGS) has compiled national shoreline data for more than 20 years to document coastal change and serve the needs of research, management, and the public. Maintaining a record of historical shoreline positions is an effective method to monitor national shoreline evolution over time, enabling scientists to identify areas most susceptible to erosion or accretion. These data can help coastal managers and planners understand which areas of the coast are vulnerable to change. This data release includes one new mean high water (MHW) shoreline extracted from lidar data collected in 2017 for the entire coastal region of North Carolina which is divided into four subregions: northern North Carolina (NCnorth), central North Carolina (NCcentral), southern North Carolina (NCsouth), and western North Carolina (NCwest). Previously published historical shorelines for North Carolina (Kratzmann and others, 2017) were combined with the new lidar shoreline to calculate long-term (up to 169 years) and short-term (up to 20 years) rates of change. Files associated with the long-term and short-term rates are appended with "LT" and "ST", respectively. A proxy-datum bias reference line that accounts for the positional difference in a proxy shoreline (e.g. High Water Line (HWL) shoreline) and a datum shoreline (e.g. MHW shoreline) is also included in this release.
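As a hedged sketch of how a long-term rate of shoreline change can be estimated at a single transect, the snippet below fits a linear regression of cross-shore position against survey year (similar in spirit to the linear-regression rate used in shoreline change analyses); the years and positions are invented, not values from this release.

```python
import numpy as np

# Invented example: shoreline position (metres seaward of a baseline)
# at one transect across several survey years.
years = np.array([1849, 1933, 1980, 1997, 2017], dtype=float)
position_m = np.array([120.0, 111.0, 104.5, 101.0, 95.0])

# Linear-regression rate of change (m/yr); negative values indicate erosion.
slope, intercept = np.polyfit(years, position_m, 1)
print(f"Long-term rate of change: {slope:.3f} m/yr")
```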
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract
Purpose: This study aims to investigate the influence of cognitive ability on cognitive biases generated by the representativeness heuristic.
Design/methodology/approach: Data were collected through questionnaires measuring the cognitive ability of 1,064 Brazilian accounting students and professionals using the Cognitive Reflection Test. For the analysis, we used Student’s t-test, analysis of variance, correlations, and regressions.
Findings: Our initial findings indicate that cognitive ability only influences the incidence of the base-rate insensitivity and illusion-of-validity biases, indicating that the higher the cognitive ability, the lower the incidence of these biases in decision making. However, robustness tests extend this influence to the misconceptions-of-chance and regression-fallacy biases. Furthermore, we show that there are differences in means between gender, level of education, and geographic region for the representativeness heuristic biases.
Originality/value: This paper contributes to the literature on behavioral accounting: although investigations into this subject do exist, no study in the accounting area has covered all the cognitive biases of one heuristic in a single study.
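As a small, hedged illustration of the group comparisons mentioned above (a Student's t-test on Cognitive Reflection Test scores between two groups), assuming hypothetical columns crt_score and gender; the data and coding are invented and do not come from the study.

```python
import pandas as pd
from scipy import stats

# Hypothetical data frame; the study's variables and coding will differ.
df = pd.DataFrame({
    "crt_score": [0, 1, 2, 3, 1, 2, 0, 3, 2, 1],
    "gender":    ["F", "F", "M", "M", "F", "M", "F", "M", "M", "F"],
})

female = df.loc[df.gender == "F", "crt_score"]
male = df.loc[df.gender == "M", "crt_score"]

# Welch's t-test (does not assume equal variances).
t, p = stats.ttest_ind(female, male, equal_var=False)
print(f"t = {t:.2f}, p = {p:.3f}")
```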
https://opensource.org/licenses/GPL-3.0
The dataset contains data gathered from a Qualtrics panel sample concerning a previous retail shopping experience. The data were gathered to help test and validate methods for dealing with nonresponse bias in the following paper: France, S. L., Adams, F. G., & Landers, V. M. (2024). Worst Case Resistance Testing: A Nonresponse Bias Solution for Today’s Survey Research Realities. Forthcoming in Survey Research Methods. The rationale for using this sample was to gather response data from a well-known and validated set of instruments. The questionnaire adapts instruments developed in a prior paper. As noted in the paper, “as Szymanski & Henard (2001) have over 3,000 citations, and pose relatively simple questions, they were judged as liable to provide stable results, and unlikely to represent confounding factors due to their complexity.” Szymanski, D. M., & Henard, D. H. (2001). Customer satisfaction: A meta-analysis of the empirical evidence. Journal of the Academy of Marketing Science, 29(1), 16-35.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
About
Recent research shows that visualizing linguistic media bias mitigates its negative effects. However, reliable automatic detection methods to generate such visualizations require costly, knowledge-intensive training data. To facilitate data collection for media bias datasets, we present News Ninja, a game employing data-collecting game mechanics to generate a crowdsourced dataset. Before annotating sentences, players are educated on media bias via a tutorial. Our findings show that datasets gathered with crowdsourced workers trained on News Ninja can reach significantly higher inter-annotator agreements than expert and crowdsourced datasets. As News Ninja encourages continuous play, it allows datasets to adapt to the reception and contextualization of news over time, presenting a promising strategy to reduce data collection expenses, educate players, and promote long-term bias mitigation.
General
This dataset was created through player annotations in the News Ninja Game made by ANON. Its goal is to improve the detection of linguistic media bias. Support came from ANON. None of the funders played any role in the dataset creation process or publication-related decisions.
The dataset includes sentences with binary bias labels (processed: biased or not biased) as well as the annotations of the individual players used for the majority vote. It includes all game-collected data. All data is completely anonymous. The dataset does not identify sub-populations, nor can it be considered sensitive to them, nor is it possible to identify individuals.
Some sentences might be offensive or triggering as they were taken from biased or more extreme news sources. The dataset contains topics such as violence, abortion, and hate against specific races, genders, religions, or sexual orientations.
Description of the Data Files
This repository contains the datasets for the anonymous News Ninja submission. The tables contain the following data:
ExportNewsNinja.csv: Contains 370 BABE sentences and 150 new sentences with their text (sentence), words labeled as biased (words), BABE ground truth (ground_Truth), and the sentence bias label from the player annotations (majority_vote). The first 370 sentences are re-annotated BABE sentences, and the following 150 sentences are new sentences.
AnalysisNewsNinja.xlsx: Contains 370 BABE sentences and 150 new sentences. The first 370 sentences are re-annotated BABE sentences, and the following 150 sentences are new sentences. The table includes the full sentence (Sentence), the sentence bias label from player annotations (isBiased Game), the new expert label (isBiased Expert), whether the game label and expert label match (Game VS Expert), whether differing labels are false negatives or false positives (false negative, false positive), the ground truth label from BABE (isBiasedBABE), whether the expert and BABE labels match (Expert VS BABE), and whether the game label and BABE label match (Game VS BABE). It also includes the analysis of the agreement between the three rater categories (Game, Expert, BABE); a sketch of such an agreement computation is given after this list.
demographics.csv: Contains demographic information of News Ninja players, including gender, age, education, English proficiency, political orientation, news consumption, and consumed outlets.
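To illustrate the kind of agreement analysis described for AnalysisNewsNinja.xlsx, the hedged sketch below computes pairwise Cohen's kappa between the game, expert, and BABE labels. The column names are taken from the description above and may need adjusting to the spreadsheet's actual headers; this is not the authors' analysis code.

```python
import pandas as pd
from itertools import combinations
from sklearn.metrics import cohen_kappa_score

# Column names follow the file description; adjust if the sheet differs.
df = pd.read_excel("AnalysisNewsNinja.xlsx")
raters = ["isBiased Game", "isBiased Expert", "isBiasedBABE"]

# Pairwise chance-corrected agreement between the three rater categories.
for a, b in combinations(raters, 2):
    kappa = cohen_kappa_score(df[a], df[b])
    print(f"{a} vs {b}: kappa = {kappa:.3f}")
```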
Collection Process
Data was collected through interactions with the NewsNinja game. All participants went through a tutorial before annotating 2x10 BABE sentences and 2x10 new sentences. For this first test, players were recruited using Prolific. The game was hosted on a custom-built responsive website. The collection period ran from 20.02.2023 to 28.02.2023. Before starting the game, players were informed about the goal and the data processing. After consenting, they could proceed to the tutorial.
The dataset will be open source. A link with all details and contact information will be provided upon acceptance. No third parties are involved.
The dataset will not be maintained as it captures the first test of NewsNinja at a specific point in time. However, new datasets will arise from further iterations. Those will be linked in the repository. Please cite the NewsNinja paper if you use the dataset and contact us if you're interested in more information or joining the project.
https://spdx.org/licenses/CC0-1.0.html
Information sampling is often biased towards seeking evidence that confirms one’s prior beliefs. Despite such biases being a pervasive feature of human behavior, their underlying causes remain unclear. Many accounts of these biases appeal to limitations of human hypothesis testing and cognition, de facto evoking notions of bounded rationality, but neglect more basic aspects of behavioral control. Here, we investigated a potential role for Pavlovian approach in biasing which information humans will choose to sample. We collected a large novel dataset from 32,445 human subjects, making over 3 million decisions, who played a gambling task designed to measure the latent causes and extent of information-sampling biases. We identified three novel approach-related biases, formalized by comparing subject behavior to a dynamic programming model of optimal information gathering. These biases reflected the amount of information sampled (“positive evidence approach”), the selection of which information to sample (“sampling the favorite”), and the interaction between information sampling and subsequent choices (“rejecting unsampled options”). The prevalence of all three biases was related to a Pavlovian approach-avoid parameter quantified within an entirely independent economic decision task. Our large dataset also revealed that individual differences in the amount of information gathered are a stable trait across multiple gameplays and can be related to demographic measures, including age and educational attainment. As well as revealing limitations in cognitive processing, our findings suggest information sampling biases reflect the expression of primitive, yet potentially ecologically adaptive, behavioral repertoires. One such behavior is sampling from options that will eventually be chosen, even when other sources of information are more pertinent for guiding future action.
This data collection consists of pilot data measuring task equivalence for measures of attention and interpretation bias. Congruent Mandarin and English versions of the emotional Stroop and attention probe tasks (both measuring attention bias), and of the similarity ratings and scrambled sentence tasks (both measuring interpretation bias), were developed using back-translation and decentering procedures. The tasks were then completed by 47 bilingual Mandarin-English speakers. Presented are data detailing personal characteristics, task scores, and bias scores.
The way in which we process information in the world around us has a significant effect on our health and well-being. For example, some people are more prone than others to notice potential dangers, to remember bad things from the past, and to assume the worst when the meaning of an event or comment is uncertain. These tendencies are called negative cognitive biases and can lead to low mood and poor quality of life. They also make people vulnerable to mental illnesses.
In contrast, those with positive cognitive biases tend to function well and remain healthy. To date, most of this work has been conducted on white, Western populations, and we do not know whether similar cognitive biases exist in Eastern cultures. This project will examine cognitive biases in Eastern (Hong Kong nationals) and Western (UK nationals) people to see whether there are any differences between the two. It will also examine what happens to cognitive biases when someone migrates to a different culture. This will tell us whether influences from the society and culture around us have any effect on our cognitive biases. Finally, the project will consider how much our own cognitive biases are inherited from our parents.
Together these results will tell us whether the known good and bad effects of cognitive biases apply to non-Western cultural groups as well, and how much cognitive biases are decided by our genes or our environment.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We performed a meta-research study by searching PubMed for comparative effectiveness observational studies evaluating therapeutic interventions using routinely collected data, published in high-impact-factor journals from 01/06/2018 to 30/06/2020. We assessed the reporting of study design (i.e., eligibility, treatment assignment, and the start of follow-up). Risk of selection bias and immortal time bias was determined by assessing whether the time of eligibility, treatment assignment, and the start of follow-up were synchronised to mimic randomisation, following the target trial emulation framework.
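A minimal, assumption-heavy sketch of the synchronisation check described above: given dates of eligibility, treatment assignment, and start of follow-up (hypothetical column names and values), flag records where the three time points are not aligned, which signals potential selection or immortal time bias under the target trial emulation framework. The real assessment was done at the study level from reported design descriptions, not from patient-level data as shown here.

```python
import pandas as pd

# Hypothetical illustration; column names and dates are invented.
df = pd.DataFrame({
    "eligibility_date": pd.to_datetime(["2019-01-01", "2019-03-01"]),
    "assignment_date":  pd.to_datetime(["2019-01-01", "2019-04-15"]),
    "followup_start":   pd.to_datetime(["2019-01-01", "2019-03-01"]),
})

# Synchronised design: all three time points coincide.
synchronised = (df["eligibility_date"] == df["assignment_date"]) & \
               (df["assignment_date"] == df["followup_start"])
df["at_risk_of_immortal_time_bias"] = ~synchronised
print(df)
```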
https://www.statsndata.org/how-to-order
The AI Bias Audit Services market has emerged as a critical sector in response to the increasing reliance on artificial intelligence (AI) across various industries. As organizations integrate AI systems into their operations, concerns about bias, stemming from data quality, algorithmic fairness, and ethical considerations, have grown.
These data are part of NACJD's Fast Track Release and are distributed as they were received from the data depositor. The files have been zipped by NACJD for release, but not checked or processed except for the removal of direct identifiers. Users should refer to the accompanying readme file for a brief description of the files available with this collection and consult the investigator(s) if further information is needed. This study investigates experiences surrounding hate and bias crimes and incidents and reasons and factors affecting reporting and under-reporting among youth and adults in LGBT, immigrant, Hispanic, Black, and Muslim communities in New Jersey and Los Angeles County, California. The collection includes 1 SPSS data file (QB_FinalDataset-Revised.sav (n=1,326; 513 variables)). The collection also contains 24 qualitative data files of transcripts from focus groups and interviews with key informants, which are not included in this release.
Quantitative studies of fossil data have proven critical to a number of major macroevolutionary and macroecological discoveries, such as the ‘Big 5’ mass extinctions of the Phanerozoic. The development and easy accessibility of major meta-data sources such as the Paleobiology Database and Geobiodiversity Database have also spurred the widespread application of these data to testing ecological hypotheses at finer spatiotemporal and phylogenetic scales. However, issues of preservational/taphonomic biases, sampling/collecting biases, taxonomic issues, and analytical choice can impact the degree of interpretative resolution possible, and even obscure biological ‘signal’ from error/bias-introduced ‘noise’. The degree to which these factors can impact analytical interpretations is not well-documented in comparison to the scale of use of these data sources. Here, we review the many forms of systematic error that can creep into a paleoecological study, from the stage of data collection to the i...
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This data set contains the raw data used for the analysis of the research article “Religion, social desirability bias and financial inclusion: Evidence from a list experiment on Islamic (micro-)finance”. In particular, it comprises information regarding the attitudes of the Muslim poor towards the use of non-Islamic financial products and services, based on survey data and list experiment data of 2,145 borrowers of an Islamic microfinance institution (Akhuwat) from the city of Multan, Punjab Province, Pakistan, collected in 2017. Data collection took place in Urdu.
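As a hedged sketch of the standard list-experiment (item-count) estimator that this design relies on, the snippet below estimates the prevalence of the sensitive attitude as the difference in mean item counts between the treatment list (which includes the sensitive item) and the control list. The column names and values are illustrative, not the study's actual variables.

```python
import pandas as pd

# Illustrative columns: "treatment" (1 = list includes the sensitive item)
# and "item_count" (number of list items the respondent endorses).
df = pd.DataFrame({
    "treatment":  [1, 1, 1, 0, 0, 0],
    "item_count": [3, 2, 4, 2, 2, 3],
})

# Difference-in-means estimator of the sensitive item's prevalence.
prevalence = (df.loc[df.treatment == 1, "item_count"].mean()
              - df.loc[df.treatment == 0, "item_count"].mean())
print(f"Estimated prevalence of the sensitive attitude: {prevalence:.2f}")
```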