Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction: Developments in Artificial Intelligence (AI) are being adopted widely in healthcare. However, the introduction and use of AI may come with biases and disparities, resulting in concerns about healthcare access and outcomes for underrepresented Indigenous populations. In New Zealand, Māori experience significant inequities in health compared to the non-Indigenous population. This research explores equity concepts and fairness measures concerning AI for healthcare in New Zealand.

Methods: This research considers data and model bias in NZ-based electronic health records (EHRs). Two very distinct NZ datasets are used in this research, one obtained from a single hospital and another from multiple GP practices, with both datasets collected by clinicians. To ensure research equality and the fair inclusion of Māori, we combine expertise in Artificial Intelligence (AI), the New Zealand clinical context, and te ao Māori. The mitigation of inequity needs to be addressed in data collection, model development, and model deployment. In this paper, we analyze data and algorithmic bias concerning data collection and model development, training, and testing using health data collected by experts. We use fairness measures such as disparate impact scores, equal opportunity, and equalized odds to analyze tabular data. Furthermore, token frequencies, statistical significance testing, and fairness measures for word embeddings, such as the WEAT and WEFE frameworks, are used to analyze bias in free-form medical text. The AI model predictions are also explained using SHAP and LIME.

Results: This research analyzed fairness metrics for NZ EHRs while considering data and algorithmic bias. We show evidence of bias due to changes made in algorithmic design. Furthermore, we observe unintentional bias due to the underlying pre-trained models used to represent text data. This research addresses some vital issues while opening up the need and opportunity for future research.

Discussion: This research takes early steps toward developing a model of socially responsible and fair AI for New Zealand's population. We provide an overview of reproducible concepts that can be adopted for any NZ population data. Furthermore, we discuss the gaps and future research avenues that will enable more focused development of fairness measures suitable for the New Zealand population's needs and social structure. One of the primary focuses of this research was ensuring fair inclusion. As such, we combine expertise in AI, clinical knowledge, and the representation of Indigenous populations. This inclusion of experts will be vital moving forward, providing a stepping stone toward the integration of AI for better outcomes in healthcare.
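As a rough, hypothetical illustration of the tabular fairness measures named above (disparate impact and an equalized-odds gap), the sketch below computes them with plain pandas on a toy table; the column names and values are illustrative only and are not taken from the NZ datasets.

```python
# A rough sketch of disparate impact and an equalized-odds gap on toy data.
# Column names ("ethnicity", "y_true", "y_pred") are hypothetical.
import pandas as pd

def disparate_impact(df, group_col, pred_col, privileged, unprivileged):
    """P(pred = 1 | unprivileged group) / P(pred = 1 | privileged group)."""
    p_unpriv = df.loc[df[group_col] == unprivileged, pred_col].mean()
    p_priv = df.loc[df[group_col] == privileged, pred_col].mean()
    return p_unpriv / p_priv

def equalized_odds_gap(df, group_col, true_col, pred_col, group_a, group_b):
    """Largest absolute difference in TPR or FPR between two groups."""
    def rates(group):
        sub = df[df[group_col] == group]
        tpr = sub.loc[sub[true_col] == 1, pred_col].mean()
        fpr = sub.loc[sub[true_col] == 0, pred_col].mean()
        return tpr, fpr
    (tpr_a, fpr_a), (tpr_b, fpr_b) = rates(group_a), rates(group_b)
    return max(abs(tpr_a - tpr_b), abs(fpr_a - fpr_b))

# Toy table standing in for a tabular EHR extract.
toy = pd.DataFrame({
    "ethnicity": ["A", "A", "B", "B", "A", "B"],
    "y_true":    [1, 0, 1, 0, 1, 1],
    "y_pred":    [1, 0, 0, 0, 1, 1],
})
print(disparate_impact(toy, "ethnicity", "y_pred", privileged="A", unprivileged="B"))
print(equalized_odds_gap(toy, "ethnicity", "y_true", "y_pred", "A", "B"))
```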
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We present Qbias, two novel datasets that promote the investigation of bias in online news search, as described in:
Fabian Haak and Philipp Schaer. 2023. Qbias - A Dataset on Media Bias in Search Queries and Query Suggestions. In Proceedings of ACM Web Science Conference (WebSci'23). ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3578503.3583628.
Dataset 1: AllSides Balanced News Dataset (allsides_balanced_news_headlines-texts.csv)
The dataset contains 21,747 news articles collected from AllSides balanced news headline roundups in November 2022, as presented in our publication. The AllSides balanced news feature offers three expert-selected U.S. news articles from sources of different political views (left, right, center), often featuring spin bias, slant, and other forms of non-neutral reporting on political news. All articles are tagged with a bias label (left, right, or neutral) by four expert annotators based on the expressed political partisanship. The AllSides balanced news feature aims to offer multiple political perspectives on important news stories, educate users on biases, and provide multiple viewpoints. The collected data further include headlines, dates, news texts, topic tags (e.g., "Republican party", "coronavirus", "federal jobs"), and the publishing news outlet. We also include AllSides' neutral description of the topic of the articles. Overall, the dataset contains 10,273 articles tagged as left, 7,222 as right, and 4,252 as center.
To provide easier access to the most recent and complete version of the dataset for future research, we provide a scraping tool and a regularly updated version of the dataset at https://github.com/irgroup/Qbias; the updated versions also include additional tags (such as the URL of the article). We chose to publish the version used for fine-tuning the models on Zenodo to enable the reproduction of the results of our study.
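As a quick, hypothetical sanity check against the counts reported above, the snippet below loads the published CSV and inspects the label distribution; the name of the label column ("bias_rating") is a guess and should be checked against the actual header.

```python
# Hypothetical sanity check on the AllSides balanced news CSV.
import pandas as pd

df = pd.read_csv("allsides_balanced_news_headlines-texts.csv")
print(df.shape)                 # expect roughly 21,747 rows
print(df.columns.tolist())      # headlines, dates, texts, topic tags, outlet, ...
print(df["bias_rating"].value_counts())  # expect ~10,273 left, 7,222 right, 4,252 center
```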
Dataset 2: Search Query Suggestions (suggestions.csv)
The second dataset we provide consists of 671,669 search query suggestions for root queries based on tags of the AllSides balanced news dataset. We collected search query suggestions from Google and Bing for the 1,431 topic tags that have been used for tagging AllSides news at least five times, approximately half of the total number of topics. The topic tags include names, a wide range of political terms, agendas, and topics (e.g., "communism", "libertarian party", "same-sex marriage"), cultural and religious terms (e.g., "Ramadan", "pope Francis"), locations, and other news-relevant terms. On average, the dataset contains 469 search queries for each topic. In total, 318,185 suggestions were retrieved from Google and 353,484 from Bing.
The file contains a "root_term" column based on the AllSides topic tags. The "query_input" column contains the search term submitted to the search engine ("search_engine"). "query_suggestion" and "rank" represent the search query suggestions and their respective positions as returned by the search engines at the given time of search ("datetime"). We scraped our data from a US server; the server location is saved in "location".
We retrieved ten search query suggestions provided by the Google and Bing search autocomplete systems for the input of each of these root queries, without performing a search. Furthermore, we extended the root queries by the letters a to z (e.g., "democrats" (root term) >> "democrats a" (query input) >> "democrats and recession" (query suggestion)) to simulate a user's input during information search and generate a total of up to 270 query suggestions per topic and search engine. The dataset we provide contains columns for root term, query input, and query suggestion for each suggested query. The location from which the search is performed is the location of the Google servers running Colab, in our case Iowa in the United States of America, which is added to the dataset.
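The following is a minimal sketch, assuming the column names described above, of how suggestions.csv can be explored with pandas; the reported per-engine totals and per-topic average serve as a consistency check.

```python
# Minimal exploration of suggestions.csv, assuming the documented columns.
import pandas as pd

sugg = pd.read_csv("suggestions.csv")

# Suggestions per engine (reported: 318,185 from Google, 353,484 from Bing).
print(sugg["search_engine"].value_counts())

# Average number of suggestions per root term (reported: about 469).
print(sugg.groupby("root_term").size().mean())

# Suggestions for one root term, ordered by query input and rank.
example = sugg[sugg["root_term"] == "democrats"]
print(example.sort_values(["query_input", "rank"])[["query_input", "query_suggestion"]].head(10))
```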
AllSides Scraper
At https://github.com/irgroup/Qbias, we provide a scraping tool that allows for the automatic retrieval of all available articles from the AllSides balanced news headline roundups.
We want to provide an easy means of retrieving the news and all corresponding information. For many tasks, it is relevant to have the most recent documents available. Thus, we provide this Python-based scraper, which scrapes all available AllSides news articles and gathers the available information. By providing the scraper, we facilitate access to a recent version of the dataset for other researchers.
This data collection consists of pilot data measuring task equivalence for measures of attention and interpretation bias. Congruent Mandarin and English versions of the emotional Stroop and attention probe tasks (both measuring attention bias) and of the similarity ratings and scrambled sentence tasks (both measuring interpretation bias) were developed using back-translation and decentering procedures. Tasks were then completed by 47 bilingual Mandarin-English speakers. Presented are data detailing personal characteristics, task scores and bias scores.

The way in which we process information in the world around us has a significant effect on our health and well-being. For example, some people are more prone than others to notice potential dangers, to remember bad things from the past, and to assume the worst when the meaning of an event or comment is uncertain. These tendencies are called negative cognitive biases and can lead to low mood and poor quality of life. They also make people vulnerable to mental illnesses. In contrast, those with positive cognitive biases tend to function well and remain healthy. To date, most of this work has been conducted on white, Western populations, and we do not know whether similar cognitive biases exist in Eastern cultures. This project will examine cognitive biases in Eastern (Hong Kong nationals) and Western (UK nationals) people to see whether there are any differences between the two. It will also examine what happens to cognitive biases when someone migrates to a different culture. This will tell us whether influences from the society and culture around us have any effect on our cognitive biases. Finally, the project will consider how much our own cognitive biases are inherited from our parents. Together these results will tell us whether the known good and bad effects of cognitive biases apply to non-Western cultural groups as well, and how much cognitive biases are decided by our genes or our environment.

Participants: Fluent bilingual Mandarin and English speakers, aged 16-65, with no current major physical illness or psychological disorder, who were not receiving psychological therapy or medication for psychological conditions.

Sampling procedure: Participants were recruited using circular emails, which are sent to all university staff and students, as well as through flyers around campuses. Relevant societies and language schools in central London were also contacted.

Data collection: Participants completed four cognitive bias tasks (emotional Stroop, attention probe, similarity ratings task and scrambled sentence task) in both English and Mandarin. Order of language presentation and task presentation were counterbalanced.
https://spdx.org/licenses/CC0-1.0.html
Information sampling is often biased towards seeking evidence that confirms one’s prior beliefs. Despite such biases being a pervasive feature of human behavior, their underlying causes remain unclear. Many accounts of these biases appeal to limitations of human hypothesis testing and cognition, de facto evoking notions of bounded rationality, but neglect more basic aspects of behavioral control. Here, we investigated a potential role for Pavlovian approach in biasing which information humans will choose to sample. We collected a large novel dataset from 32,445 human subjects, making over 3 million decisions, who played a gambling task designed to measure the latent causes and extent of information-sampling biases. We identified three novel approach-related biases, formalized by comparing subject behavior to a dynamic programming model of optimal information gathering. These biases reflected the amount of information sampled (“positive evidence approach”), the selection of which information to sample (“sampling the favorite”), and the interaction between information sampling and subsequent choices (“rejecting unsampled options”). The prevalence of all three biases was related to a Pavlovian approach-avoid parameter quantified within an entirely independent economic decision task. Our large dataset also revealed that individual differences in the amount of information gathered are a stable trait across multiple gameplays and can be related to demographic measures, including age and educational attainment. As well as revealing limitations in cognitive processing, our findings suggest information sampling biases reflect the expression of primitive, yet potentially ecologically adaptive, behavioral repertoires. One such behavior is sampling from options that will eventually be chosen, even when other sources of information are more pertinent for guiding future action.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This document provides a clear and practical guide to understanding missing data mechanisms, including Missing Completely At Random (MCAR), Missing At Random (MAR), and Missing Not At Random (MNAR). Through real-world scenarios and examples, it explains how different types of missingness impact data analysis and decision-making. It also outlines common strategies for handling missing data, including deletion techniques and imputation methods such as mean imputation, regression, and stochastic modeling. Designed for researchers, analysts, and students working with real-world datasets, this guide helps ensure statistical validity, reduce bias, and improve the overall quality of analysis in fields like public health, behavioral science, social research, and machine learning.
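As a brief illustration of the strategies listed above, the sketch below applies listwise deletion, mean imputation, and a simplified stochastic imputation to a toy DataFrame; it is an example of the general ideas under stated assumptions, not code taken from this guide.

```python
# Toy illustration of deletion and imputation strategies for missing data.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age":    [23, 31, np.nan, 45, 52, np.nan],
    "income": [41000, np.nan, 38000, 52000, np.nan, 47000],
})

# 1. Listwise deletion: drop any row with a missing value (unbiased only under MCAR).
complete_cases = df.dropna()

# 2. Mean imputation: preserves the column mean but shrinks variance.
mean_imputed = df.fillna(df.mean(numeric_only=True))

# 3. Simplified stochastic imputation: mean plus random noise scaled by the observed
#    standard deviation (a full version would use regression-based predictions).
rng = np.random.default_rng(0)
stochastic = df.copy()
for col in stochastic.columns:
    missing = stochastic[col].isna()
    noise = rng.normal(0.0, stochastic[col].std(), missing.sum())
    stochastic.loc[missing, col] = stochastic[col].mean() + noise

print(complete_cases, mean_imputed, stochastic, sep="\n\n")
```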
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Longitudinal or panel surveys offer unique benefits for social science research, but they typically suffer from attrition, which reduces sample size and can result in biased inferences. Previous research tends to focus on the demographic predictors of attrition, conceptualizing attrition propensity as a stable, individual-level characteristic: some individuals (e.g., young, poor, residentially mobile) are more likely to drop out of a study than others. We argue that panel attrition reflects both the characteristics of the individual respondent and her survey experience, a factor shaped by the design and implementation features of the study. In this paper, we examine and compare the predictors of panel attrition in the 2008-2009 American National Election Study, an online panel, and the 2006-2010 General Social Survey, a face-to-face panel. In both cases, survey experience variables are predictive of panel attrition above and beyond the standard demographic predictors, but the particular measures of relevance differ across the two surveys. The findings inform statistical corrections for panel attrition bias and provide study design insights for future panel data collections.
http://guides.library.uq.edu.au/deposit_your_data/terms_and_conditions
This dataset contains anonymised experiment data downloaded from a survey instrument. The experiment is designed to assess framing bias, and the mechanism of data collection was an online survey. The survey was designed in three parts: information and consent, demographic questions, and case study vignettes. Demographic questions were identical in both forms of the survey. For the vignettes, respondents were randomised to one of two frames, with frame one presented from a medical assessment perspective (ACAT) and frame two presented from a service provider perspective. There are four vignettes detailing real-world choices in home-care packages, changing only the services and equipment suggested by either ACAT assessors or service providers (treatments).
This data collection consists of behavioural task data for measures of attention and interpretation bias, specifically: emotional Stroop and attention probe (both measuring attention bias), and similarity ratings task and scrambled sentence task (both measuring interpretation bias). Data on the following 6 participant groups are included in the dataset: native UK (n=36), native HK (n=39), UK migrants to HK (short term = 31, long term = 28), and HK migrants to UK (short term = 37, long term = 31). Also included are personal characteristics and questionnaire measures.

The way in which we process information in the world around us has a significant effect on our health and well-being. For example, some people are more prone than others to notice potential dangers, to remember bad things from the past, and to assume the worst when the meaning of an event or comment is uncertain. These tendencies are called negative cognitive biases and can lead to low mood and poor quality of life. They also make people vulnerable to mental illnesses. In contrast, those with positive cognitive biases tend to function well and remain healthy. To date, most of this work has been conducted on white, Western populations, and we do not know whether similar cognitive biases exist in Eastern cultures. This project will examine cognitive biases in Eastern (Hong Kong nationals) and Western (UK nationals) people to see whether there are any differences between the two. It will also examine what happens to cognitive biases when someone migrates to a different culture. This will tell us whether influences from the society and culture around us have any effect on our cognitive biases. Finally, the project will consider how much our own cognitive biases are inherited from our parents. Together these results will tell us whether the known good and bad effects of cognitive biases apply to non-Western cultural groups as well, and how much cognitive biases are decided by our genes or our environment.

Participants: Local Hong Kong and UK natives, and short-term and long-term migrants in each country, aged 16-65, with no current major physical illness or psychological disorder, who were not receiving psychological therapy or medication for psychological conditions.

Sampling procedure: Participants were recruited using circular emails, public flyers and other advertisements in local venues, universities and clubs.

Data collection: Participants completed four previously developed and validated cognitive bias tasks (emotional Stroop, attention probe, similarity ratings task and scrambled sentence task) in their native language. They also provided socio-demographic information and completed questionnaires.
Political science researchers have flexibility in how to analyze data, how to report data, and whether to report on data. Review of examples of reporting flexibility from the race and sex discrimination literature illustrates how research design choices can influence estimates and inferences. This reporting flexibility—coupled with the political imbalance among political scientists—creates the potential for political bias in reported political science estimates, but this potential for political bias can be reduced or eliminated through preregistration and preacceptance, in which researchers commit to a research design before completing data collection. Removing the potential for reporting flexibility can raise the credibility of political science research.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context: Publication bias jeopardizes evidence-based medicine, mainly through biased literature syntheses. Publication bias may also affect laboratory animal research, but evidence is scarce.

Objectives: To assess the opinion of laboratory animal researchers on the magnitude, drivers, consequences, and potential solutions for publication bias, and to explore the impact of the size of the animals used, seniority of the respondent, working in a for-profit organization, and type of research (fundamental, pre-clinical, or both) on those opinions.

Design: Internet-based survey.

Setting: All animal laboratories in The Netherlands.

Participants: Laboratory animal researchers.

Main Outcome Measure(s): Median (interquartile range) strengths of beliefs on 5- and 10-point scales (1: totally unimportant to 5 or 10: extremely important).

Results: Overall, 454 researchers participated. They considered publication bias a problem in animal research (7 (5 to 8)) and thought that about 50% (32–70) of animal experiments are published. Employees (n = 21) of for-profit organizations estimated that 10% (5 to 50) are published. Lack of statistical significance (4 (4 to 5)), technical problems (4 (3 to 4)), supervisors (4 (3 to 5)), and peer reviewers (4 (3 to 5)) were considered important reasons for non-publication (all on 5-point scales). Respondents thought that mandatory publication of study protocols and results, or of the reasons why no results were obtained, may increase scientific progress but expected increased bureaucracy. These opinions did not depend on the size of the animal used, seniority of the respondent, or type of research.

Conclusions: Non-publication of “negative” results appears to be prevalent in laboratory animal research. If statistical significance is indeed a main driver of publication, the collective literature on animal experimentation will be biased. This will impede the performance of valid literature syntheses. Effective yet efficient systems should be explored to counteract selective reporting of laboratory animal research.
https://creativecommons.org/publicdomain/zero/1.0/
The CrowS-Pairs dataset is a collection of 1,508 sentence pairs that cover nine types of biases: race/color, gender/gender identity, sexual orientation, religion, age, nationality, disability, physical appearance, and socioeconomic status. The first sentence in each pair can demonstrate or violate a stereotype; the second sentence is a minimal edit of the first, in which the only words that change are those that identify the group. Each example has the following information:
Columns: sent_more, sent_less, stereo_antistereo, bias_type, annotations, anon_writer, anon_annotators, prompt, source
This dataset can be used to measure social biases in MLMs by training models on it and evaluating their performance. Potential uses include the following (a simplified scoring sketch follows this list):
- Measuring the ability of MLMs to identify and avoid social biases;
- Developing new methods for reducing social biases in MLMs; and
- Investigating the impact of social biases on downstream tasks such as reading comprehension or question answering
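As one possible starting point, the sketch below compares masked-language-model pseudo-log-likelihoods for the two sentences of a pair using Hugging Face transformers. It is a simplified illustration, not the exact metric used by the CrowS-Pairs authors, and the model name and toy sentence pair are illustrative choices rather than content from the dataset.

```python
# Simplified sketch: compare masked-LM pseudo-log-likelihoods of a sentence pair.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def pseudo_log_likelihood(sentence: str) -> float:
    """Sum the log-probability of each token when it is masked in turn."""
    input_ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for i in range(1, input_ids.size(0) - 1):  # skip [CLS] and [SEP]
        masked = input_ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits
        log_probs = torch.log_softmax(logits[0, i], dim=-1)
        total += log_probs[input_ids[i]].item()
    return total

pair = ("He was a doctor.", "She was a doctor.")  # toy pair, not from the dataset
print([round(pseudo_log_likelihood(s), 2) for s in pair])
```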
If you use this dataset in your research, please credit the original authors.
License
License: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. No Copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.
File: crows_pairs_anonymized.csv

| Column name | Description |
|:------------------|:-----------------------------------------------------------------------------------------------------------------------------------|
| sent_more | The first sentence in the pair, which can demonstrate or violate a stereotype. (String) |
| sent_less | The second sentence in the pair, which is a minimal edit of the first sentence. The only words that change between them are those that identify the group. (String) |
| stereo_antistereo | Whether the first sentence demonstrates or violates a stereotype. (String) |
| bias_type | The type of bias represented in the sentence pair. (String) |
| annotations | The annotations made by the crowdworkers on the sentence pair. (String) |
| anon_writer | The anonymous writer of the sentence pair. (String) |
| anon_annotators | The anonymous annotators of the sentence pair. (String) |
File: prompts.csv | Column name | Descripti...
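A minimal loading sketch, assuming the column names documented above for crows_pairs_anonymized.csv:

```python
# Load the main CrowS-Pairs file and summarize it by bias category.
import pandas as pd

crows = pd.read_csv("crows_pairs_anonymized.csv")
print(len(crows))                         # expect 1,508 pairs
print(crows["bias_type"].value_counts())  # pairs per bias category
print(crows["stereo_antistereo"].value_counts())
print(crows[["sent_more", "sent_less"]].head(3))
```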
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This contains model code and data from the paper titled
"Perceived and observed biases within scientific communities: a case study in movement ecology"
By: Shaw AK, Fouda L, Mezzini S, Kim D, Chatterjee N, Wolfson D, Abrahms B, Attias N, Beardsworth CE, Beltran R, Binning SA, Blincow KM, Chan Y-C, Fronhofer EA, Hegemann A, Hurme ER, Iannarilli F, Kellner JB, McCoy KD, Rafiq K, Saastamoinen M, Sequeira AMM, Serota MW, Sumasgutner P, Tao Y, Torstenson M, Yanco SW, Beck KB, Bertram MG, Beumer LT, Bradarić M, Clermont J, Ellis-Soto D, Faltusová M, Fieberg J, Hall RJ, Kölzsch A, Lai S, Lee-Cruz L, Loretto M-C, Loveridge A, Michelangeli M, Mueller T, Riotte-Lambert L, Sapir N, Scacco M, Teitelbaum CS, Cagnacci F
Published in: Proceedings of the Royal Society B
Abstract:
Who conducts biological research, where, and how results are disseminated varies among geographies and identities. Identifying and documenting these forms of bias by research communities is a critical step towards addressing them. We documented perceived and observed biases in movement ecology, a rapidly expanding sub-discipline of biology, which is strongly underpinned by fieldwork and technology use. We surveyed attendees before an international conference to assess a baseline within-discipline perceived bias (uninformed perceived bias). We analysed geographic patterns in Movement Ecology articles, finding discrepancies between the country of the authors’ affiliation and the study site location, related to national economics. We analysed the race-gender identities of USA biology researchers (the closest group to our sub-discipline with data available), finding that they differed from national demographics. Finally, we discussed the quantitatively observed bias at the conference, to assess within-discipline perceived bias informed with observational data (informed perceived bias). Although the survey indicated that most conference participants were bias-aware, conversations only covered a subset of biases. We discuss potential causes of bias (parachute science, fieldwork accessibility), solutions, and the need to evaluate the effectiveness of mitigatory action. Undertaking data-driven analysis of bias within sub-disciplines can help identify specific barriers and move towards the inclusion of a greater diversity of participants in the scientific process.
The text data used in the current study were collected through semi-structured interviews: the questions addressed what the participant's role in a decision was and what he did or did not do; what the participant reckoned the outcomes were; the decision-making process; and questions intended to bring out elements of behavioral biases, both cognitive and emotional, in the decision-making process. Data analysis was supported by content analysis. The interviews were fully transcribed, and the text files were imported into the software NVivo® v. 11. The software was used for organizing, managing, and coding data, as well as for generating maps for grouping the results for interpretation. Therefore, the file attached contains the text data, nodes, categorization, and maps used by the authors.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
About the Dataset
Media bias is a multifaceted problem, leading to one-sided views and impacting decision-making. A way to address bias in news articles is to automatically detect and indicate it through machine-learning methods. However, such detection is limited due to the difficulty of obtaining reliable training data. To facilitate the data-gathering process, we introduce NewsUnravel, a news-reading web application leveraging an initially tested feedback mechanism to collect reader feedback on machine-generated bias highlights within news articles. Our approach augments dataset quality by significantly increasing inter-annotator agreement by 26.31% and improving classifier performance by 2.49%. As the first human-in-the-loop application for media bias, NewsUnravel shows that a user-centric approach to media bias data collection can return reliable data while being scalable and evaluated as easy to use. NewsUnravel demonstrates that feedback mechanisms are a promising strategy to reduce data collection expenses, fluidly adapt to changes in language, and enhance evaluators' diversity.
Description of the data files
This repository contains the datasets for the anonymous NewsUnravel submission. The tables contain the following data (a minimal loading sketch follows the list):
NUDAdataset.csv: the NUDA dataset with 310 new sentences with bias labels
Statistics.png: contains all Umami statistics for NewsUnravel's usage data
Feedback.csv: holds the participant ID of a single feedback entry with the sentence ID (contentId), the bias rating, and the provided reasons
Content.csv: holds the participant ID of a rating with the sentence ID (contentId) of the rated sentence, the bias rating, and the reason, if given
Article.csv: holds the article ID, title, source, article metadata, article topic, and bias amount in %
Participant.csv: holds the participant IDs and data processing consent
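Assuming the file and column names listed above, a minimal loading sketch might look as follows; the commented-out join is hypothetical, since the exact key columns are not documented here.

```python
# Load the NewsUnravel data files described above.
import pandas as pd

nuda = pd.read_csv("NUDAdataset.csv")          # 310 sentences with bias labels
content = pd.read_csv("Content.csv")           # per-sentence ratings
feedback = pd.read_csv("Feedback.csv")         # feedback on bias highlights
articles = pd.read_csv("Article.csv")          # article-level metadata
participants = pd.read_csv("Participant.csv")  # participant IDs and consent

print(nuda.shape)
print(content.columns.tolist())

# Hypothetical join of ratings to articles; check the actual headers for the
# correct key columns before merging.
# merged = content.merge(articles, on="articleId", how="left")
```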
https://opensource.org/licenses/GPL-3.0
The dataset contains data gathered from a Qualtrics panel sample involving a previous retail shopping experience. The dataset was gathered to help test and validate methods for dealing with nonresponse bias in the following paper: France, S. L., Adams, F. G., Landers, V. M. (2024). Worst Case Resistance Testing: A Nonresponse Bias Solution for Today’s Survey Research Realities, forthcoming in Survey Research Methods. The rationale for the use of this sample was to gather response data from a well-known and validated set of instruments. The questionnaire adapts instruments previously developed in an earlier paper. As noted in the paper, “as Szymanski & Henard (2001) have over 3,000 citations, and pose relatively simple questions, they were judged as liable to provide stable results, and unlikely to represent confounding factors due to their complexity.” Szymanski, D. M., & Henard, D. H. (2001). Customer satisfaction: A meta-analysis of the empirical evidence. Journal of the Academy of Marketing Science, 29(1), 16-35.
The U.S. Geological Survey (USGS) has compiled national shoreline data for more than 20 years to document coastal change and serve the needs of research, management, and the public. Maintaining a record of historical shoreline positions is an effective method to monitor national shoreline evolution over time, enabling scientists to identify areas most susceptible to erosion or accretion. These data can help coastal managers and planners understand which areas of the coast are vulnerable to change. This data release includes one new mean high water (MHW) shoreline extracted from lidar data collected in 2017 for the entire coastal region of North Carolina, which is divided into four subregions: northern North Carolina (NCnorth), central North Carolina (NCcentral), southern North Carolina (NCsouth), and western North Carolina (NCwest). Previously published historical shorelines for North Carolina (Kratzmann and others, 2017) were combined with the new lidar shoreline to calculate long-term (up to 169 years) and short-term (up to 20 years) rates of change. Files associated with the long-term and short-term rates are appended with "LT" and "ST", respectively. A proxy-datum bias reference line that accounts for the positional difference between a proxy shoreline (e.g. High Water Line (HWL) shoreline) and a datum shoreline (e.g. MHW shoreline) is also included in this release.
This research utilizes cognitive neuroscience and information systems research to predict user engagement and decision-making in digital platforms. By applying Natural Language Processing (NLP) techniques and cognitive bias theories, we investigate user interactions with synonyms in digital content. Our approach incorporates four cognitive biases - representativeness, ease-of-use (processing fluency), affect-biased attention, and distribution/availability (R.E.A.D) - into a comprehensive model. The model's predictive capacity was evaluated using a large user survey, revealing that synonyms representative of core concepts, easy to process, emotionally resonant, and readily available, fostered increased user engagement. Importantly, our research provides a novel perspective on human-computer interaction, digital habits, and decision-making processes. Findings underscore the potential of cognitive biases as powerful predictors of user engagement, emphasizing their role in effective digital content design across education, marketing, and beyond.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset supports the systematic literature review conducted in the study "Peer Review Under Scrutiny: Systematic Evidence of Bias in Research Funding". The data comprise a curated collection of empirical studies that investigated the existence of biases in peer review processes within research funding agencies worldwide.
The dataset includes detailed categorizations based on the types of biases investigated, methodologies employed, data sources, and the confirmation status of each bias identified in the selected studies. The file was structured to facilitate further analyses, replications, and methodological reviews in the field of research evaluation and science policy studies.
Data were collected through systematic searches in Scopus and Web of Science databases, followed by rigorous screening and classification procedures. The dataset may be particularly useful for researchers, policymakers, and evaluators interested in improving transparency and equity in research funding mechanisms.
These data are part of NACJD's Fast Track Release and are distributed as they were received from the data depositor. The files have been zipped by NACJD for release, but not checked or processed except for the removal of direct identifiers. Users should refer to the accompanying readme file for a brief description of the files available with this collection and consult the investigator(s) if further information is needed. This study investigates experiences surrounding hate and bias crimes and incidents, as well as the reasons and factors affecting reporting and under-reporting, among youth and adults in LGBT, immigrant, Hispanic, Black, and Muslim communities in New Jersey and Los Angeles County, California. The collection includes 1 SPSS data file (QB_FinalDataset-Revised.sav (n=1,326; 513 variables)). The collection also contains 24 qualitative data files of transcripts from focus groups and interviews with key informants, which are not included in this release.
https://www.statsndata.org/how-to-order
The AI Bias Audit Services market has emerged as a critical sector in response to the increasing reliance on artificial intelligence (AI) across various industries. As organizations integrate AI systems into their operations, concerns about bias-stemming from data quality, algorithmic fairness, and ethical considera