100+ datasets found

Data from: Qbias – A Dataset on Media Bias in Search Queries and Query...
data.niaid.nih.gov
zenodo.org
Updated Mar 1, 2023
Cite
Haak, Fabian (2023). Qbias – A Dataset on Media Bias in Search Queries and Query Suggestions [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7682914
Dataset updated
Mar 1, 2023
Dataset provided by
Haak, Fabian
Schaer, Philipp
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We present Qbias, two novel datasets that promote the investigation of bias in online news search as described in

Fabian Haak and Philipp Schaer. 2023. Qbias - A Dataset on Media Bias in Search Queries and Query Suggestions. In Proceedings of ACM Web Science Conference (WebSci’23). ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3578503.3583628.

Dataset 1: AllSides Balanced News Dataset (allsides_balanced_news_headlines-texts.csv)

The dataset contains 21,747 news articles collected from AllSides balanced news headline roundups in November 2022 as presented in our publication. The AllSides balanced news features three expert-selected U.S. news articles from sources of different political views (left, right, center), often featuring spin bias, slant, and other forms of non-neutral reporting on political news. All articles are tagged with a bias label by four expert annotators based on the expressed political partisanship: left, right, or neutral. The AllSides balanced news aims to offer multiple political perspectives on important news stories, educate users on biases, and provide multiple viewpoints. Collected data further includes headlines, dates, news texts, topic tags (e.g., "Republican party", "coronavirus", "federal jobs"), and the publishing news outlet. We also include AllSides' neutral description of the topic of the articles. Overall, the dataset contains 10,273 articles tagged as left, 7,222 as right, and 4,252 as center.

To provide easier access to the most recent and complete version of the dataset for future research, we provide a scraping tool and a regularly updated version of the dataset at https://github.com/irgroup/Qbias. The repository also contains regularly updated more recent versions of the dataset with additional tags (such as the URL to the article). We chose to publish the version used for fine-tuning the models on Zenodo to enable the reproduction of the results of our study.

Dataset 2: Search Query Suggestions (suggestions.csv)

The second dataset we provide consists of 671,669 search query suggestions for root queries based on tags of the AllSides biased news dataset. We collected search query suggestions from Google and Bing for the 1,431 topic tags that have been used for tagging AllSides news at least five times, approximately half of the total number of topics. The topic tags include names, a wide range of political terms, agendas, and topics (e.g., "communism", "libertarian party", "same-sex marriage"), cultural and religious terms (e.g., "Ramadan", "pope Francis"), locations, and other news-relevant terms. On average, the dataset contains 469 search queries for each topic. In total, 318,185 suggestions have been retrieved from Google and 353,484 from Bing.

The file contains a "root_term" column based on the AllSides topic tags. The "query_input" column contains the search term submitted to the search engine ("search_engine"). "query_suggestion" and "rank" represent the search query suggestions and their respective positions as returned by the search engines at the given time of search ("datetime"). We scraped our data from a US server; its location is saved in "location".
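
As a minimal sketch of how these columns might be read, assuming the file is a comma-separated CSV with a header row that uses exactly the column names listed above. The sample rows, the value formats, and the helper name `suggestions_by_engine` are invented for illustration and are not taken from the dataset:

```python
import csv
import io
from collections import defaultdict

# Invented sample rows using the column names described above; the real
# suggestions.csv may differ in column order, delimiter, or value formats.
SAMPLE = """root_term,query_input,query_suggestion,rank,search_engine,datetime,location
democrats,democrats a,democrats and recession,1,google,2022-11-01 12:00:00,US
democrats,democrats a,democrats abroad,2,google,2022-11-01 12:00:00,US
democrats,democrats a,democrats and republicans,1,bing,2022-11-01 12:00:00,US
"""

def suggestions_by_engine(csv_file):
    """Group (rank, suggestion) pairs by search engine and query input."""
    grouped = defaultdict(list)
    for row in csv.DictReader(csv_file):
        key = (row["search_engine"], row["query_input"])
        grouped[key].append((int(row["rank"]), row["query_suggestion"]))
    # Sort each group by rank so suggestions appear in their returned order.
    return {key: sorted(pairs) for key, pairs in grouped.items()}

grouped = suggestions_by_engine(io.StringIO(SAMPLE))
for (engine, query_input), pairs in sorted(grouped.items()):
    print(engine, repr(query_input), [s for _, s in pairs])
```

To run this against the real file, `io.StringIO(SAMPLE)` could be swapped for an opened file handle, provided the header assumption holds.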

We retrieved ten search query suggestions provided by the Google and Bing search autocomplete systems for the input of each of these root queries, without performing a search. Furthermore, we extended the root queries by the letters a to z (e.g., "democrats" (root term) >> "democrats a" (query input) >> "democrats and recession" (query suggestion)) to simulate a user's input during information search and generate a total of up to 270 query suggestions per topic and search engine. The dataset we provide contains columns for root term, query input, and query suggestion for each suggested query. The location from which the search is performed is the location of the Google servers running Colab, in our case Iowa in the United States of America, which is added to the dataset.
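
The query expansion described above can be sketched as follows. This is an illustration only, not the authors' collection script; the function name `build_query_inputs` is hypothetical, and it assumes a single space separates the root term from the appended letter, as in the "democrats a" example:

```python
import string

def build_query_inputs(root_term):
    """Return the root term followed by the root term extended with a to z."""
    return [root_term] + [f"{root_term} {letter}" for letter in string.ascii_lowercase]

inputs = build_query_inputs("democrats")
# 27 inputs (the root term plus 26 letters) times up to ten suggestions each
# gives the upper bound of 270 suggestions per topic and search engine.
max_suggestions = len(inputs) * 10
print(inputs[:3], max_suggestions)
```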

AllSides Scraper

At https://github.com/irgroup/Qbias, we provide a scraping tool that allows for the automatic retrieval of all available articles from the AllSides balanced news headlines.

We want to provide an easy means of retrieving the news and all corresponding information. For many tasks, it is relevant to have the most recent documents available. Thus, we provide this Python-based scraper, which scrapes all available AllSides news articles and gathers available information. By providing the scraper, we facilitate access to a recent version of the dataset for other researchers.
Unveiling Engagement and Platform Algorithmic Biases in Social Media Data...
osf.io
Updated Feb 16, 2025
Cite
Wei Zhong (2025). Unveiling Engagement and Platform Algorithmic Biases in Social Media Data Collection and Analysis: An Experimental Study [Dataset]. https://osf.io/ys7a8
Dataset updated
Feb 16, 2025
Dataset provided by
Center for Open Science: https://cos.io/
Authors
Wei Zhong
Description
No description was included in this Dataset collected from the OSF
NewsUnravel Dataset
zenodo.org
csv
Updated Sep 14, 2023
+ more versions
Cite
anonymous; anonymous (2023). NewsUnravel Dataset [Dataset]. http://doi.org/10.5281/zenodo.8344882
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.8344882
Dataset updated
Sep 14, 2023
Dataset provided by
Zenodo: http://zenodo.org/
Authors
anonymous; anonymous
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
About the Dataset
Media bias is a multifaceted problem, leading to one-sided views and impacting decision-making. A way to address bias in news articles is to automatically detect and indicate it through machine-learning methods. However, such detection is limited due to the difficulty of obtaining reliable training data. To facilitate the data-gathering process, we introduce NewsUnravel, a news-reading web application leveraging an initially tested feedback mechanism to collect reader feedback on machine-generated bias highlights within news articles. Our approach augments dataset quality by significantly increasing inter-annotator agreement by 26.31% and improving classifier performance by 2.49%. As the first human-in-the-loop application for media bias, NewsUnravel shows that a user-centric approach to media bias data collection can return reliable data while being scalable and evaluated as easy to use. NewsUnravel demonstrates that feedback mechanisms are a promising strategy to reduce data collection expenses, fluidly adapt to changes in language, and enhance evaluators' diversity.

Description of the data files
This repository contains the datasets for the anonymous NewsUnravel submission. The tables contain the following data:

NUDAdataset.csv: the NUDA dataset with 310 new sentences with bias labels
Statistics.png: contains all Umami statistics for NewsUnravel's usage data
Feedback.csv: holds the participantID of a single feedback with the sentence ID (contentId), the bias rating, and provided reasons
Content.csv: holds the participant ID of a rating with the sentence ID (contentId) of the rated sentence, the bias rating, and the reason, if given
Article.csv: holds the article ID, title, source, article meta data, article topic, and bias amount in %
Participant.csv: holds the participant IDs and data processing consent
Risk of bias in observational studies using routinely collected data of...
data.niaid.nih.gov
zenodo.org
Updated Oct 1, 2021
Cite
Nguyen Thu Van (2021). Risk of bias in observational studies using routinely collected data of comparative effectiveness research: a meta-research study [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5543468
Dataset updated
Oct 1, 2021
Dataset authored and provided by
Nguyen Thu Van
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We performed a meta-research study by searching PubMed for comparative effectiveness observational studies evaluating therapeutic interventions using routinely collected data published in high impact factor journals from 01/06/2018 to 30/06/2020. We assessed the reporting of study design (i.e., eligibility, treatment assignment, and the start of follow-up). Risk of selection bias and immortal time bias was determined by assessing if the time of eligibility, treatment assignment and the start of follow-up were synchronised to mimic the randomisation following the target trial emulation framework.
Replication Data for: Reducing Political Bias in Political Science Estimates...
search.dataone.org
dataverse.harvard.edu
Updated Nov 21, 2023
Cite
Zigerell, Lawrence (2023). Replication Data for: Reducing Political Bias in Political Science Estimates [Dataset]. http://doi.org/10.7910/DVN/PZLCJM
Unique identifier
https://doi.org/10.7910/DVN/PZLCJM
Dataset updated
Nov 21, 2023
Dataset provided by
Harvard Dataverse
Authors
Zigerell, Lawrence
Description
Political science researchers have flexibility in how to analyze data, how to report data, and whether to report on data. Review of examples of reporting flexibility from the race and sex discrimination literature illustrates how research design choices can influence estimates and inferences. This reporting flexibility—coupled with the political imbalance among political scientists—creates the potential for political bias in reported political science estimates, but this potential for political bias can be reduced or eliminated through preregistration and preacceptance, in which researchers commit to a research design before completing data collection. Removing the potential for reporting flexibility can raise the credibility of political science research.
Biased cognition in East Asian and Western Cultures: Behavioural data...
datacatalogue.cessda.eu
Updated Mar 25, 2025
Cite
Yiend, J (2025). Biased cognition in East Asian and Western Cultures: Behavioural data 2016-2018 [Dataset]. http://doi.org/10.5255/UKDA-SN-853644
Unique identifier
https://doi.org/10.5255/UKDA-SN-853644
Dataset updated
Mar 25, 2025
Dataset provided by
King
Authors
Yiend, J
Time period covered
Nov 30, 2013 - Aug 29, 2017
Area covered
Hong Kong, United Kingdom
Variables measured
Individual
Measurement technique
Participants: Local Hong Kong and UK natives; short term and long term migrants in each country, aged 16-65 with no current major physical illness or psychological disorder, who were not receiving psychological therapy or medication for psychological conditions. Sampling procedure: Participants were recruited using circular emails, public flyers and other advertisements in local venues, universities and clubs. Data collection: Participants completed four previously developed and validated cognitive bias tasks (emotional Stroop, attention probe, similarity ratings task and scrambled sentence task) in their native language. They also completed socio-demographic information and questionnaires.
Description
This data collection consists of behavioural task data for measures of attention and interpretation bias, specifically: emotional Stroop, attention probe (both measuring attention bias) and similarity ratings task and scrambled sentence task (both measuring interpretation bias). Data on the following 6 participant groups are included in the dataset: native UK (n=36), native HK (n=39), UK migrants to HK (short term = 31, long term = 28) and HK migrants to UK (short term = 37, long term = 31). Also included are personal characteristics and questionnaire measures.
The way in which we process information in the world around us has a significant effect on our health and well-being. For example, some people are more prone than others to notice potential dangers, to remember bad things from the past, and to assume the worst when the meaning of an event or comment is uncertain. These tendencies are called negative cognitive biases and can lead to low mood and poor quality of life. They also make people vulnerable to mental illnesses. In contrast, those with positive cognitive biases tend to function well and remain healthy. To date, most of this work has been conducted on white, Western populations and we do not know whether similar cognitive biases exist in Eastern cultures. This project will examine cognitive biases in Eastern (Hong Kong nationals) and Western (UK nationals) people to see whether there are any differences between the two. It will also examine what happens to cognitive biases when someone migrates to a different culture. This will tell us whether influences from the society and culture around us have any effect on our cognitive biases. Finally, the project will consider how much our own cognitive biases are inherited from our parents. Together these results will tell us whether the known good and bad effects of cognitive biases apply to non-Western cultural groups as well, and how much cognitive biases are decided by our genes or our environment.
Data from: Questioning Bias: Validating a Bias Crime Assessment Tool in...
catalog.data.gov
icpsr.umich.edu
+1more
Updated Mar 12, 2025
+ more versions
Cite
National Institute of Justice (2025). Questioning Bias: Validating a Bias Crime Assessment Tool in California and New Jersey, 2016-2017 [Dataset]. https://catalog.data.gov/dataset/questioning-bias-validating-a-bias-crime-assessment-tool-in-california-and-new-jersey-2016-a062f
Dataset updated
Mar 12, 2025
Dataset provided by
National Institute of Justice
Area covered
New Jersey, California
Description
These data are part of NACJD's Fast Track Release and are distributed as they were received from the data depositor. The files have been zipped by NACJD for release, but not checked or processed except for the removal of direct identifiers. Users should refer to the accompanying readme file for a brief description of the files available with this collection and consult the investigator(s) if further information is needed. This study investigates experiences surrounding hate and bias crimes and incidents and reasons and factors affecting reporting and under-reporting among youth and adults in LGBT, immigrant, Hispanic, Black, and Muslim communities in New Jersey and Los Angeles County, California. The collection includes 1 SPSS data file (QB_FinalDataset-Revised.sav (n=1,326; 513 variables)). The collection also contains 24 qualitative data files of transcripts from focus groups and interviews with key informants, which are not included in this release.
Bias feature containing proxy-datum bias information to be used in the...
s.cnmilf.com
Updated Jul 6, 2024
+ more versions
Cite
U.S. Geological Survey (2024). Bias feature containing proxy-datum bias information to be used in the Digital Shoreline Analysis System for the southern coast of North Carolina from Cape Lookout to Cape Fear (NCsouth) [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/bias-feature-containing-proxy-datum-bias-information-to-be-used-in-the-digital-shoreline-a-7bc9c
Dataset updated
Jul 6, 2024
Dataset provided by
U.S. Geological Survey
Area covered
Cape Lookout
Description
The U.S. Geological Survey (USGS) has compiled national shoreline data for more than 20 years to document coastal change and serve the needs of research, management, and the public. Maintaining a record of historical shoreline positions is an effective method to monitor national shoreline evolution over time, enabling scientists to identify areas most susceptible to erosion or accretion. These data can help coastal managers and planners understand which areas of the coast are vulnerable to change. This data release includes one new mean high water (MHW) shoreline extracted from lidar data collected in 2017 for the entire coastal region of North Carolina which is divided into four subregions: northern North Carolina (NCnorth), central North Carolina (NCcentral), southern North Carolina (NCsouth), and western North Carolina (NCwest). Previously published historical shorelines for North Carolina (Kratzmann and others, 2017) were combined with the new lidar shoreline to calculate long-term (up to 169 years) and short-term (up to 20 years) rates of change. Files associated with the long-term and short-term rates are appended with "LT" and "ST", respectively. A proxy-datum bias reference line that accounts for the positional difference in a proxy shoreline (e.g. High Water Line (HWL) shoreline) and a datum shoreline (e.g. MHW shoreline) is also included in this release.
Biases and mitigation strategies in Classical and Digital Epidemiology.
plos.figshare.com
xls
Updated Jan 13, 2025
Cite
Sara Mesquita; Lília Perfeito; Daniela Paolotti; Joana Gonçalves-Sá (2025). Biases and mitigation strategies in Classical and Digital Epidemiology. [Dataset]. http://doi.org/10.1371/journal.pdig.0000670.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pdig.0000670.t001
Dataset updated
Jan 13, 2025
Dataset provided by
PLOS Digital Health
Authors
Sara Mesquita; Lília Perfeito; Daniela Paolotti; Joana Gonçalves-Sá
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Biases and mitigation strategies in Classical and Digital Epidemiology.
Data from: Mapping species richness using opportunistic samples: a case...
datadryad.org
data.niaid.nih.gov
+1more
zip
Updated Dec 6, 2019
Cite
Thomas Neyens; Peter Diggle; Christel Faes; Natalie Beenaerts; Tom Artois; Emanuele Giorgi (2019). Mapping species richness using opportunistic samples: a case study on ground-floor bryophyte species richness in the Belgian province of Limburg [Dataset]. http://doi.org/10.5061/dryad.brv15dv5r
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.brv15dv5r
Dataset updated
Dec 6, 2019
Dataset provided by
Dryad
Authors
Thomas Neyens; Peter Diggle; Christel Faes; Natalie Beenaerts; Tom Artois; Emanuele Giorgi
Time period covered
2019
Area covered
Belgium
Description
In species richness studies, citizen-science surveys where participants make individual decisions regarding sampling strategies provide a cost-effective approach to collect a large amount of data. However, it is unclear to what extent the bias inherent to opportunistically collected samples may invalidate our inferences. Here, we compare spatial predictions of forest ground-floor bryophyte species richness in Limburg (Belgium), based on crowd- and expert-sourced data, where the latter are collected by adhering to a rigorous geographical randomisation and data collection protocol. We develop a log-Gaussian Cox process model to analyse the opportunistic sampling process of the crowd-sourced data and assess its sampling bias. We then fit two geostatistical Poisson models to both data-sets and compare the parameter estimates and species richness predictions. We find that the citizens had a higher propensity for locations that were close to their homes and environmentally more valuable. The ...
Across space and time: a review of sampling and analytical biases in fossil...
datadryad.org
zip
+ more versions
Cite
Karma Nanglu; Thomas Cullen, Across space and time: a review of sampling and analytical biases in fossil data across macroecological scales [Dataset]. http://doi.org/10.5061/dryad.6djh9w143
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.6djh9w143
Dataset provided by
Dryad
Authors
Karma Nanglu; Thomas Cullen
Time period covered
2022
Description
Quantitative studies of fossil data have proven critical to a number of major macroevolutionary and macroecological discoveries, such as the ‘Big 5’ mass extinctions of the Phanerozoic. The development and easy accessibility of major meta-data sources such as the Paleobiology Database and Geobiodiversity Database have also spurred the widespread application of these data to testing ecological hypotheses at finer spatiotemporal and phylogenetic scales. However, issues of preservational/taphonomic biases, sampling/collecting biases, taxonomic issues, and analytical choice can impact the degree of interpretative resolution possible, and even obscure biological ‘signal’ from error/bias-introduced ‘noise’. The degree to which these factors can impact analytical interpretations is not well-documented in comparison to the scale of use of these data sources. Here, we review the many forms of systematic error that can creep into a paleoecological study, from the stage of data collection to the i...
Bias estimation for seven precipitation datasets for the eastern MENA region...
catalog.data.gov
data.usgs.gov
+1more
Updated Jul 6, 2024
Cite
U.S. Geological Survey (2024). Bias estimation for seven precipitation datasets for the eastern MENA region [Dataset]. https://catalog.data.gov/dataset/bias-estimation-for-seven-precipitation-datasets-for-the-eastern-mena-region
Dataset updated
Jul 6, 2024
Dataset provided by
United States Geological Survey: http://www.usgs.gov/
Area covered
Middle East and North Africa
Description
Information on the spatio-temporal distribution of rainfall is critical for addressing water-related disasters, especially in the Middle East and North Africa's (MENA) arid to semi-arid regions. However, the availability of reliable rainfall datasets for most river basins is limited. In this study, we utilized observations from satellite-based rainfall data, in situ rain gauge observations, and rainfall climatology to determine the most suitable precipitation dataset in the MENA region. This dataset includes the supporting data and graphics for the analysis. The collection includes a spreadsheet containing all the data for the tables and charts, as well as the text file for the in situ data collected and used for the analysis.

Data from: Mitigating Biases in Collective Decision-Making: Enhancing...

zenodo.org
data.niaid.nih.gov

csv

Updated Mar 17, 2024

Cite

Axel Abels; Axel Abels (2024). Mitigating Biases in Collective Decision-Making: Enhancing Performance in the Face of Fake News [Dataset]. http://doi.org/10.5281/zenodo.10794209

Available download formats: csv

Unique identifier

https://doi.org/10.5281/zenodo.10794209

Dataset updated

Mar 17, 2024

Dataset provided by

Zenodo: http://zenodo.org/

Authors

Axel Abels; Axel Abels

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Data supporting "Mitigating Biases in Collective Decision-Making: Enhancing Performance in the Face of Fake News".

If you use this dataset in your own research, please cite this paper:

```
@misc{abels2024mitigating,
title={Mitigating Biases in Collective Decision-Making: Enhancing Performance in the Face of Fake News},
author={Axel Abels and Elias Fernandez Domingos and Ann Nowé and Tom Lenaerts},
year={2024},
eprint={2403.08829},
archivePrefix={arXiv},
primaryClass={cs.HC}
}
```

column name	description
treatment	identifier for the set of headlines presented to the participant
trial	trial/round in which the headline was presented
arm	which "arm" the headline was presented as (0=left, 1=middle, 2=right)
advice	the participant's response (0=very unlikely, 0.25=unlikely, 0.5=undecided, 0.75=likely, 1=very likely)
genuine	whether the headline was genuine (1) or altered (0)
headline	the headline as shown to the participant
original	the headline before a possible alteration
expert_id	participant's identifier
sentiment	whether the headline reported a negative (-1) or positive (1) outcome
expert:ethnicity	the participant's ethnicity
expert:sex	the participant's sex
expert:age	the participant's age
outcome:white, outcome:black, outcome:young, outcome:old, outcome:male, outcome:female	whether the headline reported a negative (-1) or positive (1) or neutral (0) outcome for the specified group
trial_time	how long the participant took to respond to the trial/round
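
One way the "advice" and "genuine" columns could be combined into a per-response correctness score is sketched below. The scoring rule is an assumption made for illustration (responses above 0.5 count as "judged genuine", responses below 0.5 as "judged altered", and 0.5 is skipped as undecided); it is not necessarily the measure used in the paper. The sample rows and the function names `is_correct` and `accuracy` are invented:

```python
def is_correct(advice, genuine):
    """Return True/False for a decided response, or None if undecided (0.5)."""
    if advice == 0.5:
        return None
    judged_genuine = advice > 0.5
    return judged_genuine == (genuine == 1)

def accuracy(rows):
    """Share of decided responses that match the 'genuine' flag."""
    outcomes = [is_correct(r["advice"], r["genuine"]) for r in rows]
    decided = [o for o in outcomes if o is not None]
    return sum(decided) / len(decided) if decided else None

# Invented rows that follow the column descriptions above.
rows = [
    {"expert_id": "p1", "advice": 0.75, "genuine": 1},  # judged genuine, is genuine
    {"expert_id": "p1", "advice": 1.0, "genuine": 0},   # judged genuine, is altered
    {"expert_id": "p2", "advice": 0.5, "genuine": 1},   # undecided, skipped
    {"expert_id": "p2", "advice": 0.0, "genuine": 0},   # judged altered, is altered
]
print(accuracy(rows))
```

Skipping undecided responses is a design choice; counting them as errors or as half-correct would give different figures.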

abstract
Individual and social biases undermine the effectiveness of human advisers by inducing judgment errors which can disadvantage protected groups. In this paper, we study the influence these biases can have in the pervasive problem of fake news by evaluating human participants' capacity to identify false headlines. By focusing on headlines involving sensitive characteristics, we gather a comprehensive dataset to explore how human responses are shaped by their biases. Our analysis reveals recurring individual biases and their permeation into collective decisions. We show that demographic factors, headline categories, and the manner in which information is presented significantly influence errors in human judgment. We then use our collected data as a benchmark problem on which we evaluate the efficacy of adaptive aggregation algorithms. In addition to their improved accuracy, our results highlight the interactions between the emergence of collective intelligence and the mitigation of participant biases.

Data from: Approach-induced biases in human information sampling
data.niaid.nih.gov
zenodo.org
+1more
Updated Jul 19, 2024
Cite
Dolan, Raymond J. (2024). Data from: Approach-induced biases in human information sampling [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_4946669
Dataset updated
Jul 19, 2024
Dataset provided by
Dolan, Raymond J.
Malalasekera, W. M. Nishantha
Hunt, Laurence T.
Kennerley, Steven W.
Rutledge, Robb B.
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Information sampling is often biased towards seeking evidence that confirms one's prior beliefs. Despite such biases being a pervasive feature of human behavior, their underlying causes remain unclear. Many accounts of these biases appeal to limitations of human hypothesis testing and cognition, de facto evoking notions of bounded rationality, but neglect more basic aspects of behavioral control. Here, we investigated a potential role for Pavlovian approach in biasing which information humans will choose to sample. We collected a large novel dataset from 32,445 human subjects, making over 3 million decisions, who played a gambling task designed to measure the latent causes and extent of information-sampling biases. We identified three novel approach-related biases, formalized by comparing subject behavior to a dynamic programming model of optimal information gathering. These biases reflected the amount of information sampled ("positive evidence approach"), the selection of which information to sample ("sampling the favorite"), and the interaction between information sampling and subsequent choices ("rejecting unsampled options"). The prevalence of all three biases was related to a Pavlovian approach-avoid parameter quantified within an entirely independent economic decision task. Our large dataset also revealed that individual differences in the amount of information gathered are a stable trait across multiple gameplays and can be related to demographic measures, including age and educational attainment. As well as revealing limitations in cognitive processing, our findings suggest information sampling biases reflect the expression of primitive, yet potentially ecologically adaptive, behavioral repertoires. One such behavior is sampling from options that will eventually be chosen, even when other sources of information are more pertinent for guiding future action.
Data from: Racial Bias in AI-Generated Images
ssh.datastations.nl
openicpsr.org
Updated Aug 1, 2024
+ more versions
Cite
Y. Yang; Y. Yang (2024). Racial Bias in AI-Generated Images [Dataset]. http://doi.org/10.17026/SS/7MQV4M
Available download formats: text/x-fixed-field(28980), application/x-spss-sav(67438), application/x-spss-syntax(1998)
Unique identifier
https://doi.org/10.17026/SS/7MQV4M
Dataset updated
Aug 1, 2024
Dataset provided by
Data Archiving and Networked Services
Authors
Y. Yang; Y. Yang
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This file is supplementary material for the manuscript Racial Bias in AI-Generated Images, which has been submitted to a peer-reviewed journal. This dataset/paper examined the image-to-image generation accuracy (i.e., the original race and gender of a person’s image were replicated in the new AI-generated image) of a Chinese AI-powered image generator. We examined the image-to-image generation models transforming the racial and gender categories of the original photos of White, Black and East Asian people (N =1260) in three different racial photo contexts: a single person, two people of the same race, and two people of different races.
Data from: Psychological and Behavioral Effects of Bias- and...
s.cnmilf.com
datasets.ai
+2more
Updated Mar 12, 2025
+ more versions
Cite
National Institute of Justice (2025). Psychological and Behavioral Effects of Bias- and Non-Bias-Motivated Assault in Boston, Massachusetts, 1992-1997 [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/psychological-and-behavioral-effects-of-bias-and-non-bias-motivated-assault-in-boston-1992-6add9
Dataset updated
Mar 12, 2025
Dataset provided by
National Institute of Justice
Area covered
Massachusetts, Boston
Description
This study sought to inform various issues related to the extent of victims' adverse psychological and behavioral reactions to aggravated assault differentiated by the offenders' bias or non-bias motives. The goals of the research included (1) identifying the individual and situational factors related to bias- and non-bias-motivated aggravated assault, (2) determining the comparative severity and duration of psychological after-effects attributed to the victimization experience, and (3) measuring the comparative extent of behavioral avoidance strategies of victims. Data were collected on all 560 cases from the Boston Police Department's Community Disorders Unit from 1992 to 1997 that involved a victim of a bias-motivated aggravated assault. In addition, data were collected on a 10-percent stratified random sample of victims of non-bias assaults within the city of Boston from 1993 to 1997, resulting in another 544 cases. For each of the cases, information was collected from each police incident report. Additionally, the researchers attempted to contact each victim in the sample to participate in a survey about their victimization experiences. The victim questionnaires included questions in five general categories: (1) incident information, (2) police response, (3) prosecutor response, (4) personal impact of the crime, and (5) respondent's personal characteristics. Criminal history variables were also collected regarding the number and type of adult and juvenile arrest charges against offenders and victims, as well as dispositions and arraignment dates.
Data from: XBT and CTD pairs dataset Version 1
data.csiro.au
data.gov.au
Updated Oct 16, 2014
+ more versions
Cite
Rebecca Cowley; Tim Boyer; Shoichi Kizu; Kimio Hanawa; Gustavo Goni; Esmee Van Wijk; Steve Rintoul; Mark Rosenberg (2014). XBT and CTD pairs dataset Version 1 [Dataset]. http://doi.org/10.4225/08/52AE99A4663B1
Explore at:
Unique identifier
https://doi.org/10.4225/08/52AE99A4663B1
Dataset updated
Oct 16, 2014
Dataset provided by
CSIRO (http://www.csiro.au/)
Authors
Rebecca Cowley; Tim Boyer; Shoichi Kizu; Kimio Hanawa; Gustavo Goni; Esmee Van Wijk; Steve Rintoul; Mark Rosenberg
License
Attribution 3.0 (CC BY 3.0), https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Time period covered
Jan 1, 1967 - Dec 31, 2011
Dataset funded by
Bundesamt für Seeschifffahrt und Hydrographie (BSH)
National Oceanic and Atmospheric Administration (http://www.noaa.gov/)
University of Miami
CSIRO (http://www.csiro.au/)
Tohoku University
ACE/CRC
Description
The XBT/CTD pairs dataset (Version 1) is the dataset used to calculate the historical XBT fall rate and temperature corrections presented in Cowley, R., Wijffels, S., Cheng, L., Boyer, T., and Kizu, S. (2013). Biases in Expendable Bathythermograph Data: A New View Based on Historical Side-by-Side Comparisons. Journal of Atmospheric and Oceanic Technology, 30, 1195–1225, doi:10.1175/JTECH-D-12-00127.1.
http://journals.ametsoc.org/doi/abs/10.1175/JTECH-D-12-00127.1

4,115 pairs from 114 datasets were used to derive the fall rate and temperature corrections. Each dataset contains the scientifically quality controlled version and (where available) the originator's data. The XBT/CTD pairs are identified in the document 'XBT_CTDpairs_metadata_V1.csv'. Note that future versions of the XBT/CTD pairs database may supersede this version. Please check more recent versions for updates to individual datasets. Lineage: Data is sourced from the World Ocean Database, NOAA, CSIRO Marine and Atmospheric Research, Bundesamt für Seeschifffahrt und Hydrographie (BSH), Hamburg, Germany, and the Australian Antarctic Division. Original and raw data files are included where available. Quality controlled datasets follow the procedure of Bailey, R., Gronell, A., Phillips, H., Tanner, E., and Meyers, G. (1994). Quality control cookbook for XBT data, Version 1.1. CSIRO Marine Laboratories Reports, 221. Quality controlled data is in the 'MQNC' format used at CSIRO Marine and Atmospheric Research. The MQNC format is described in the document 'XBT_CTDpairs_descriptionV1.pdf'.
Data from: Sampling biases shape our view of the natural world
data.niaid.nih.gov
zenodo.org
+1 more
Updated Jun 4, 2022
Cite
Qiao, Huijie (2022). Sampling biases shape our view of the natural world [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4950129
Explore at:
Dataset updated
Jun 4, 2022
Dataset provided by
Qiao, Huijie
Hughes, Alice
Zhu, Chaodong
Ma, Keping
Orr, Michael
Costello, Mark
Yang, Qinmin
Provoost, Pieter
Waller, John
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Spatial patterns of biodiversity are inextricably linked to their collection methods, yet no synthesis of bias patterns or their consequences exists. As such, views of organismal distribution and the ecosystems they make up may be incorrect, undermining countless ecological and evolutionary studies. Using 742 million records of 374,900 species, we explore the global patterns and impacts of biases related to taxonomy, accessibility, ecotype, and data type across terrestrial and marine systems. Pervasive sampling and observation biases exist across animals, with only 6.74% of the globe sampled, and disproportionately poor tropical sampling. High elevations and deep seas are particularly unknown. Over 50% of records in most groups account for under 2% of species, and citizen science only exacerbates biases. Additional data will be needed to overcome many of these biases, but we must increasingly value data publication to bridge this gap and better represent species' distributions from more distant and inaccessible areas, and provide the necessary basis for conservation and management.
ISW3.0 - Women's experiences working in sport and exercise science academia
figshare.com
Updated Jul 4, 2024
Cite
Emma Cowley (2024). ISW3.0 - Women's experiences working in sport and exercise science academia [Dataset]. http://doi.org/10.6084/m9.figshare.22647076.v1
Explore at:
Unique identifier
https://doi.org/10.6084/m9.figshare.22647076.v1
Dataset updated
Jul 4, 2024
Dataset provided by
Figshare (http://figshare.com/)
Authors
Emma Cowley
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is survey data from a mixed methods study exploring women's experiences working in sport and exercise science academia, and is part of the Invisible Sportswomen research collection. This dataset contains the quantitative portion of the survey; the qualitative data has been removed to protect participant anonymity.
Data from: Accounting for imperfect detection in data from museums and...
zenodo.org
data.niaid.nih.gov
+1 more
bin, csv
Updated Jun 4, 2022
Cite
Kelley D. Erickson; Adam B. Smith (2022). Accounting for imperfect detection in data from museums and herbaria when modeling species distributions: Combining and contrasting data-level versus model-level bias correction [Dataset]. http://doi.org/10.5061/dryad.51c59zw8b
Explore at:
Available download formats: bin, csv
Unique identifier
https://doi.org/10.5061/dryad.51c59zw8b
Dataset updated
Jun 4, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Kelley D. Erickson; Adam B. Smith
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The digitization of museum collections as well as an explosion in citizen science initiatives has resulted in a wealth of data that can be useful for understanding the global distribution of biodiversity, provided that the well-documented biases inherent in unstructured opportunistic data are accounted for. While traditionally used to model imperfect detection using structured data from systematic surveys of wildlife, occupancy models provide a framework for modelling the imperfect collection process that results in digital specimen data. In this study, we explore methods for adapting occupancy models for use with biased opportunistic occurrence data from museum specimens and citizen science platforms using 7 species of Anacardiaceae in Florida as a case study. We explored two methods of incorporating information about collection effort to inform our uncertainty around species presence: (1) filtering the data to exclude collectors unlikely to collect the focal species and (2) incorporating collection covariates (collection type, time of collection, and history of previous detections) into a model of collection probability. We found that the best models incorporated both the background data filtration step as well as collector covariates. Month, method of collection and whether a collector had previously collected the focal species were important predictors of collection probability. Efforts to standardize meta-data associated with data collection will improve efforts for modeling the spatial distribution of a variety of species.
Cite

Haak, Fabian (2023). Qbias – A Dataset on Media Bias in Search Queries and Query Suggestions [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7682914

Data from: Qbias – A Dataset on Media Bias in Search Queries and Query Suggestions

Explore at:

Dataset updated

Mar 1, 2023

Dataset provided by

Haak, Fabian
Schaer, Philipp

License

Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

We present Qbias, two novel datasets that promote the investigation of bias in online news search as described in

Fabian Haak and Philipp Schaer. 2023. Qbias - A Dataset on Media Bias in Search Queries and Query Suggestions. In Proceedings of ACM Web Science Conference (WebSci'23). ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3578503.3583628.

Dataset 1: AllSides Balanced News Dataset (allsides_balanced_news_headlines-texts.csv)

The dataset contains 21,747 news articles collected from AllSides balanced news headline roundups in November 2022 as presented in our publication. Each AllSides balanced news roundup features three expert-selected U.S. news articles from sources of different political views (left, right, center), often featuring spin bias, slant, and other forms of non-neutral reporting on political news. All articles are tagged with a bias label by four expert annotators based on the expressed political partisanship: left, right, or neutral. The AllSides balanced news aims to offer multiple political perspectives on important news stories, educate users on biases, and provide multiple viewpoints. Collected data further includes headlines, dates, news texts, topic tags (e.g., "Republican party", "coronavirus", "federal jobs"), and the publishing news outlet. We also include AllSides' neutral description of the topic of the articles. Overall, the dataset contains 10,273 articles tagged as left, 7,222 as right, and 4,252 as center.

To provide easier access to the most recent and complete version of the dataset for future research, we provide a scraping tool and a regularly updated version of the dataset at https://github.com/irgroup/Qbias. The repository also contains regularly updated more recent versions of the dataset with additional tags (such as the URL to the article). We chose to publish the version used for fine-tuning the models on Zenodo to enable the reproduction of the results of our study.

Dataset 2: Search Query Suggestions (suggestions.csv)

The second dataset we provide consists of 671,669 search query suggestions for root queries based on tags of the AllSides biased news dataset. We collected search query suggestions from Google and Bing for the 1,431 topic tags that have been used for tagging AllSides news at least five times, approximately half of the total number of topics. The topic tags include names, a wide range of political terms, agendas, and topics (e.g., "communism", "libertarian party", "same-sex marriage"), cultural and religious terms (e.g., "Ramadan", "pope Francis"), locations, and other news-relevant terms. On average, the dataset contains 469 search queries for each topic. In total, 318,185 suggestions have been retrieved from Google and 353,484 from Bing.
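
The figures in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below only uses the numbers quoted above; the variable names are illustrative and not part of the dataset.

```python
# Counts quoted in the dataset description.
google_suggestions = 318_185
bing_suggestions = 353_484
topic_tags = 1_431

# The two engine totals should add up to the stated dataset size.
total_suggestions = google_suggestions + bing_suggestions

# Average number of suggestions per topic tag, across both engines.
average_per_topic = total_suggestions / topic_tags

print(total_suggestions)
print(round(average_per_topic))
```

The sum matches the stated 671,669 suggestions, and the rounded average matches the stated 469 search queries per topic.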

The file contains a "root_term" column based on the AllSides topic tags. The "query_input" column contains the search term submitted to the search engine ("search_engine"). The "query_suggestion" and "rank" columns give the search query suggestions and their respective positions as returned by the search engines at the time of search ("datetime"). We scraped our data from a US server, whose location is saved in "location".
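
As a rough illustration of how these columns fit together, the sketch below reads a few rows with Python's standard csv module and lists the suggestions per engine and query input in rank order. The column names come from the description above; the sample rows, the column order, the lowercase engine names, and the datetime format are assumptions made up for this example and may differ from the actual suggestions.csv.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows: column names follow the description,
# values and column order are invented for illustration only.
SAMPLE = """root_term,query_input,query_suggestion,rank,search_engine,datetime,location
democrats,democrats a,democrats and recession,1,google,2022-11-01 12:00:00,Iowa
democrats,democrats a,democrats abroad,2,google,2022-11-01 12:00:00,Iowa
democrats,democrats a,democrats and republicans,1,bing,2022-11-01 12:00:05,Iowa
"""

def suggestions_by_engine(handle):
    """Group suggestions by (search_engine, query_input), ordered by rank."""
    grouped = defaultdict(list)
    for row in csv.DictReader(handle):
        key = (row["search_engine"], row["query_input"])
        grouped[key].append((int(row["rank"]), row["query_suggestion"]))
    # Sort each group by rank so the top suggestion comes first.
    return {key: [text for _, text in sorted(items)] for key, items in grouped.items()}

grouped = suggestions_by_engine(io.StringIO(SAMPLE))
print(grouped[("google", "democrats a")])
```

To apply this to the real file, the `io.StringIO(SAMPLE)` handle could be replaced with an opened suggestions.csv, after checking the actual header names.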

We retrieved ten search query suggestions provided by the Google and Bing search autocomplete systems for the input of each of these root queries, without performing a search. Furthermore, we extended the root queries by the letters a to z (e.g., "democrats" (root term) >> "democrats a" (query input) >> "democrats and recession" (query suggestion)) to simulate a user's input during information search and generate a total of up to 270 query suggestions per topic and search engine. The dataset we provide contains columns for root term, query input, and query suggestion for each suggested query. The location from which the search is performed is the location of the Google servers running Colab, in our case Iowa in the United States of America, which is added to the dataset.
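
The query expansion described above can be sketched as follows. This is a minimal illustration of the a to z extension and the resulting upper bound of 270 suggestions, not the authors' scraping code; the function name and constant are hypothetical.

```python
import string

# Ten suggestions are retrieved per query input, per the description.
SUGGESTIONS_PER_INPUT = 10

def expand_root_query(root_term):
    """Return the root query followed by one variant per letter a to z."""
    variants = [f"{root_term} {letter}" for letter in string.ascii_lowercase]
    return [root_term] + variants

inputs = expand_root_query("democrats")

# 1 root query + 26 letter variants = 27 inputs, each yielding up to 10 suggestions.
max_suggestions = len(inputs) * SUGGESTIONS_PER_INPUT

print(inputs[:3])
print(len(inputs), max_suggestions)
```

The bound is "up to" 270 because an autocomplete system may return fewer than ten suggestions for a given input.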

AllSides Scraper

At https://github.com/irgroup/Qbias, we provide a scraping tool that allows for the automatic retrieval of all available articles from the AllSides balanced news headlines.

We want to provide an easy means of retrieving the news and all corresponding information. For many tasks it is relevant to have the most recent documents available. Thus, we provide this Python-based scraper, which retrieves all available AllSides news articles and gathers the available information. By providing the scraper, we facilitate access to a recent version of the dataset for other researchers.
