https://spdx.org/licenses/CC0-1.0.html
New methods for species distribution models (SDMs) utilise presence‐absence (PA) data to correct the sampling bias of presence‐only (PO) data in a spatial point process setting. These have been shown to improve species estimates when both data sets are large and dense. However, is a PA data set that is smaller and patchier than hitherto examined able to do the same? Furthermore, when both data sets are relatively small, is there enough information contained within them to produce a useful estimate of species’ distributions? These attributes are common in many applications.
A stochastic simulation was conducted to assess the ability of a pooled data SDM to estimate the distribution of species from increasingly sparser and patchier data sets. The simulated data sets were varied by changing the number of presence‐absence sample locations, the degree of patchiness of these locations, the number of PO observations, and the level of sampling bias within the PO observations. The performance of the pooled data SDM was compared to a PA SDM and a PO SDM to assess the strengths and limitations of each SDM.
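A minimal sketch of how simulated data sets of this kind could be generated (this is not the authors' code; the one-dimensional landscape, all parameter values, and the clustering scheme are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

# One-dimensional landscape of 1000 cells with an environmental covariate
# and an "accessibility" covariate driving sampling bias (all invented)
n_cells = 1000
env = rng.normal(size=n_cells)
access = rng.uniform(size=n_cells)

# True log-linear intensity of the species point process
beta0, beta1 = -3.0, 1.2
intensity = np.exp(beta0 + beta1 * env)

# Presence-only observations: the point process thinned by a bias term
bias_strength = 2.0
po_counts = rng.poisson(intensity * np.exp(bias_strength * (access - 1)))

# Sparse, patchy presence-absence sites: locations clustered around a few centres
n_sites, n_clusters = 50, 5
centres = rng.integers(0, n_cells, n_clusters)
sites = np.clip(rng.choice(centres, n_sites) + rng.integers(-30, 31, n_sites),
                0, n_cells - 1)
pa = rng.binomial(1, 1 - np.exp(-intensity[sites]))  # cloglog link to intensity

print(po_counts.sum(), pa.mean())
```

Varying `n_sites`, the cluster spread, `bias_strength`, and the expected number of PO points corresponds to the four experimental factors described above.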
The pooled data SDM successfully removed the sampling bias from the PO observations even when the presence‐absence data was sparse and patchy, and the PO observations formed the majority of the data. The pooled data SDM was, in general, more accurate and more precise than either the PA SDM or the PO SDM. All SDMs were more precise for the species responses than they were for the covariate coefficients.
The emerging SDM methodology that pools PO and PA data will facilitate more certainty around species’ distribution estimates, which in turn will allow more relevant and concise management and policy decisions to be enacted. This work shows that it is possible to achieve this result even in relatively data‐poor regions.
Under New York State’s Hate Crime Law (Penal Law Article 485), a person commits a hate crime when one of a specified set of offenses is committed targeting a victim because of a perception or belief about their race, color, national origin, ancestry, gender, religion, religious practice, age, disability, or sexual orientation, or when such an act is committed as a result of that type of perception or belief. These types of crimes can target an individual, a group of individuals, or public or private property. DCJS submits hate crime incident data to the FBI’s Uniform Crime Reporting (UCR) Program. Information collected includes number of victims, number of offenders, type of bias motivation, and type of victim.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Concerns about gender bias in word embedding models have captured substantial attention in the algorithmic bias research literature. Other bias types, however, have received far less scrutiny. This work describes a large-scale analysis of sentiment associations in popular word embedding models along the lines of gender and ethnicity but also along the less frequently studied dimensions of socioeconomic status, age, physical appearance, sexual orientation, religious sentiment and political leanings. Consistent with previous scholarly literature, this work has found systemic bias against given names popular among African-Americans in most embedding models examined. Gender bias in embedding models, however, appears to be multifaceted and often reversed in polarity relative to what has been regularly reported. Interestingly, using the common operationalization of the term bias in the fairness literature, novel, so far unreported types of bias in word embedding models have also been identified. Specifically, the popular embedding models analyzed here display negative biases against middle and working-class socioeconomic status, male children, senior citizens, plain physical appearance and intellectual phenomena such as Islamic religious faith, non-religiosity and conservative political orientation. The reasons for the paradoxical underreporting of these bias types in the relevant literature are probably manifold, but widely held blind spots when searching for algorithmic bias, and a lack of widespread technical jargon to unambiguously describe the variety of algorithmic associations, could conceivably play a role. The causal origins of the multiplicity of loaded associations attached to distinct demographic groups within embedding models are often unclear, but the heterogeneity of said associations and their potential multifactorial roots raise doubts about the validity of grouping them all under the umbrella term bias.
Richer and more fine-grained terminology as well as a more comprehensive exploration of the bias landscape could help the fairness epistemic community to characterize and neutralize algorithmic discrimination more efficiently.
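A sentiment association of the kind analysed in such studies can be operationalized, in the style of WEAT, as the difference in mean cosine similarity between a target word and sets of pleasant versus unpleasant attribute words. A self-contained sketch with toy vectors (the 4-dimensional vectors and word lists are invented purely for illustration, standing in for a trained embedding model):

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def sentiment_association(word_vec, pleasant, unpleasant):
    """Mean cosine similarity to pleasant minus unpleasant attribute vectors
    (a WEAT-style association score)."""
    return (np.mean([cosine(word_vec, p) for p in pleasant])
            - np.mean([cosine(word_vec, u) for u in unpleasant]))

# Toy 4-d vectors standing in for a trained embedding (illustrative only)
emb = {
    "good":   np.array([1.0, 0.1, 0.0, 0.0]),
    "joy":    np.array([0.9, 0.2, 0.1, 0.0]),
    "bad":    np.array([-1.0, 0.1, 0.0, 0.0]),
    "awful":  np.array([-0.9, 0.0, 0.2, 0.0]),
    "name_a": np.array([0.8, 0.3, 0.0, 0.1]),
    "name_b": np.array([-0.7, 0.3, 0.0, 0.1]),
}
pleasant = [emb["good"], emb["joy"]]
unpleasant = [emb["bad"], emb["awful"]]

print(round(sentiment_association(emb["name_a"], pleasant, unpleasant), 3))
print(round(sentiment_association(emb["name_b"], pleasant, unpleasant), 3))
```

A positive score indicates a net-pleasant association for the target word; a negative score indicates a net-unpleasant one.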
There has been continuous growth in the number of metrics used to analyze fairness and biases in artificial intelligence (AI) platforms since 2016. Diagnostic metrics have consistently been adopted more than benchmarks, with a peak of ** in 2019. This is quite likely because more diagnostics must be run on the data before more accurate benchmarks can be created, i.e. the diagnostics lead to benchmarks.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
One of the strongest findings across the sciences is that publication bias occurs. Of particular note is a “file drawer bias” where statistically significant results are privileged over non-significant results. Recognition of this bias, along with increased calls for “open science,” has led to an emphasis on replication studies. Yet, few have explored publication bias and its consequences in replication studies. We offer a model of the publication process involving an initial study and a replication. We use the model to describe three types of publication biases: 1) file drawer bias, 2) a “repeat study” bias against the publication of replication studies, and 3) a “gotcha bias” where replication results that run contrary to a prior study are more likely to be published. We estimate the model’s parameters with a vignette experiment conducted with political science professors teaching at Ph.D.-granting institutions in the United States. We find evidence of all three types of bias, although those explicitly involving replication studies are notably smaller. This bodes well for the replication movement. That said, the aggregation of all of the biases increases the number of false positives in a literature. We conclude by discussing a path for future work on publication biases.
Current Version:
Embargo Provenance: n/a
Dataset Title: Data for the article "Sampling Methodology Influences Habitat Suitability Modeling for Chiropteran Species"
Dataset Contributors:
Date of Issue: 2023-01-16
Publisher: Ecology and Evolution
License: Use of these data is covered by the following license: ...
https://dataful.in/terms-and-conditions
This dataset contains the yearly statistics on the victim types by bias motivation. Major categories of victim types include individuals, government, business/financial institution, religious organization, society/public and other or multiple victims. Major categories of bias motivations include Race/Ethnicity/Ancestry, Religion, Sexual Orientation, Disability, Gender and Gender Identity.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Iterated algorithmic bias arises when an algorithm interacts with human responses continuously, updating its model after receiving feedback from the human, while showing the human only selected items or options. Other types of bias are static: they have a one-time influence on an algorithm.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ABSTRACT
Background: The public typically believes psychotherapy to be more effective than pharmacotherapy for treating depression. This is not consistent with current scientific evidence, which shows that both types of treatment are about equally effective.
Objective: The study investigates whether this bias towards psychotherapy guides online information search and whether the bias can be reduced by explicitly providing expert information (in a blog entry) and by providing tag clouds that implicitly reveal experts’ evaluations.
Methods: A total of 174 participants completed a fully automated Web-based study after we invited them via mailing lists. First, participants read two blog posts by experts that either challenged or supported the bias towards psychotherapy. Subsequently, participants searched for information about depression treatment in an online environment that provided more experts’ blog posts about the effectiveness of treatments based on alleged research findings. These blogs were organized in a tag cloud; both psychotherapy tags and pharmacotherapy tags were popular. We measured tag and blog post selection, efficacy ratings of the presented treatments, and participants’ treatment recommendation after information search.
Results: Participants demonstrated a clear bias towards psychotherapy (mean 4.53, SD 1.99) compared to pharmacotherapy (mean 2.73, SD 2.41; t(173)=7.67, P<.001, d=0.81) when rating treatment efficacy prior to the experiment. Accordingly, participants exhibited biased information search and evaluation. This bias was significantly reduced, however, when participants were exposed to tag clouds with challenging popular tags. Participants facing popular tags challenging their bias (n=61) showed significantly less biased tag selection (F(2,168)=10.61, P<.001, partial η²=0.112), blog post selection (F(2,168)=6.55, P=.002, partial η²=0.072), and treatment efficacy ratings (F(2,168)=8.48, P<.001, partial η²=0.092), compared to bias-supporting tag clouds (n=56) and balanced tag clouds (n=57). Challenging explicit expert information presented in blog posts (n=93), compared to supporting expert information (n=81), decreased the bias in information search with regard to blog post selection (F(1,168)=4.32, P=.04, partial η²=0.025). No significant effects were found for treatment recommendation (all P>.33).
Conclusions: We conclude that the psychotherapy bias is most effectively attenuated—and even eliminated—when popular tags implicitly point to blog posts that challenge the widespread view. Explicit expert information (in a blog entry) was less successful in reducing biased information search and evaluation. Since tag clouds have the potential to counter biased information processing, we recommend their use.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Aim
Species distribution models (SDMs) that integrate presence-only and presence-absence data offer a promising avenue to improve information on species' geographic distributions. The use of such 'integrated SDMs' on a species range-wide extent has been constrained by the often-limited presence-absence data and by the heterogeneous sampling of the presence-only data. Here, we evaluate integrated SDMs for studying species ranges with a novel expert range map-based evaluation. We build a new understanding about how integrated SDMs address issues of estimation accuracy and data deficiency and thereby offer advantages over traditional SDMs.
Location
South and Central America.
Time period
1979-2017.
Major taxa studied
Hummingbirds.
Methods
We build integrated SDMs by linking two observation models – one for each data type – to the same underlying spatial process. We validate SDMs with two schemes: i) cross-validation with presence-absence data and ii) comparison with respect to the species' whole range as defined with IUCN range maps. We also compare models relative to the estimated response curves and compute the association between the benefit of the data integration and the number of presence records in each data set.
Results
The integrated SDM accounting for the spatially varying sampling intensity of the presence-only data was one of the top-performing models in both model validation schemes. Presence-only data alleviated overly large niche estimates, and data integration was beneficial compared to modelling solely presence-only data for species that had few presence points when predicting the species' whole range. On the community level, integrated models improved the species richness prediction.
Main conclusions
Integrated SDMs combining presence-only and presence-absence data successfully borrow strength from both data types and offer improved predictions of species' ranges. Integrated SDMs can potentially alleviate the impacts of taxonomically and geographically uneven sampling and leverage the detailed sampling information in presence-absence data.
https://dataful.in/terms-and-conditions
This dataset contains the yearly statistics on the number of offenses by offense types and by bias motivation. Major categories of offense types include crimes against persons, crimes against property and crimes against society. Each offense type is further categorized by type of crime such as murder, rape, trafficking, robbery etc. Major categories of bias motivations include Race/Ethnicity/Ancestry, Religion, Sexual Orientation, Disability, Gender and Gender Identity.
This national survey of prosecutors was undertaken to systematically gather information about the handling of bias or hate crime prosecutions in the United States. The goal was to use this information to identify needs and to enhance the ability of prosecutors to respond effectively to hate crimes by promoting effective practices. The survey aimed to address the following research questions: (1) What was the present level of bias crime prosecution in the United States? (2) What training had been provided to prosecutors to assist them in prosecuting hate- and bias-motivated crimes and what additional training would be beneficial? (3) What types of bias offenses were prosecuted in 1994-1995? (4) How were bias crime cases assigned and to what extent were bias crime cases given priority? and (5) What factors or issues inhibited a prosecutor's ability to prosecute bias crimes? In 1995, a national mail survey was sent to a stratified sample of prosecutor offices in three phases to solicit information about prosecutors' experiences with hate crimes. Questions were asked about size of jurisdiction, number of full-time staff, number of prosecutors and investigators assigned to bias crimes, and number of bias cases prosecuted. Additional questions measured training for bias-motivated crimes, such as whether staff received specialized training, whether there existed a written policy on bias crimes, how well prosecutors knew the bias statute, and whether there was a handbook on bias crime. Information elicited on case processing included the frequency with which certain criminal acts were charged and sentenced as bias crimes, the existence of a special bias unit, case tracking systems, preparation of witnesses, jury selection, and case disposition. Other topics specifically covered bias related to racial or ethnic differences, religious differences, sexual orientation, and violence against women.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset supports the systematic literature review conducted in the study "Peer Review Under Scrutiny: Systematic Evidence of Bias in Research Funding". The data comprise a curated collection of empirical studies that investigated the existence of biases in peer review processes within research funding agencies worldwide.
The dataset includes detailed categorizations based on the types of biases investigated, methodologies employed, data sources, and the confirmation status of each bias identified in the selected studies. The file was structured to facilitate further analyses, replications, and methodological reviews in the field of research evaluation and science policy studies.
Data were collected through systematic searches in Scopus and Web of Science databases, followed by rigorous screening and classification procedures. The dataset may be particularly useful for researchers, policymakers, and evaluators interested in improving transparency and equity in research funding mechanisms.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Results obtained and analysed in Lamboley & Fourcade, Spatial filtering strategies for mitigating sampling bias in species distribution models. Briefly, we used two virtual species with contrasting levels of specialisation to explore the impact of spatial filtering distances on the performance of ecological niche models. This investigation was conducted across a spectrum of modelling conditions, encompassing diverse types and degrees of bias, as well as varying sample sizes.

Results reporting the overlap between modelled and true distributions:
- Unbiased_distribution.csv: results for the models trained from unbiased, i.e. randomly sampled, datasets
- Biased_corrected_distribution.csv: results for the models trained from biased datasets, corrected with various spatial filtering distances

Results reporting the overlap between modelled and true response curves:
- Unbiased_response_curves.csv: results for the models trained from unbiased, i.e. randomly sampled, datasets
- Biased_corrected_response_curves.csv: results for the models trained from biased datasets, corrected with various spatial filtering distances
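A common implementation of the spatial filtering examined here is greedy thinning, which enforces a minimum pairwise distance between retained occurrence points. A minimal sketch (the coordinates and distance threshold are illustrative, not taken from the study):

```python
import numpy as np

def spatial_thin(coords, min_dist, seed=0):
    """Greedy spatial filtering: visit points in random order and keep each
    point only if it lies at least `min_dist` from every point already kept."""
    rng = np.random.default_rng(seed)
    kept = []
    for i in rng.permutation(len(coords)):
        if all(np.hypot(*(coords[i] - coords[j])) >= min_dist for j in kept):
            kept.append(i)
    return coords[sorted(kept)]

# A dense cluster plus a few scattered points (illustrative coordinates)
rng = np.random.default_rng(1)
pts = np.vstack([rng.normal([0.0, 0.0], 0.05, size=(50, 2)),
                 rng.uniform(-1, 1, size=(5, 2))])

thinned = spatial_thin(pts, min_dist=0.2)
print(len(pts), "->", len(thinned))
```

Re-running this with a range of `min_dist` values mirrors the spectrum of filtering distances evaluated in the study.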
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This article compares distribution functions among pairs of locations in their domains, in contrast to the typical approach of univariate comparison across individual locations. This bivariate approach is studied in the presence of sampling bias, which has been gaining attention in COVID-19 studies that over-represent more symptomatic people. In cases with either known or unknown sampling bias, we introduce Anderson–Darling-type tests based on both the univariate and bivariate formulation. A simulation study shows the superior performance of the bivariate approach over the univariate one. We illustrate the proposed methods using real data on the distribution of the number of symptoms suggestive of COVID-19.
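For the univariate case, a k-sample Anderson–Darling test is available in SciPy as `scipy.stats.anderson_ksamp`; the bivariate formulation proposed in the article is not part of SciPy, so the sketch below only illustrates the univariate building block (the Poisson rates and sample sizes are assumptions, not the article's data):

```python
import numpy as np
from scipy.stats import anderson_ksamp

rng = np.random.default_rng(0)

# Symptom-count-like samples from two "locations"; the second is shifted to
# mimic over-representation of more symptomatic respondents
loc_a = rng.poisson(2.0, size=500)
loc_b = rng.poisson(3.0, size=500)

res = anderson_ksamp([loc_a, loc_b])
print(res.statistic, res.significance_level)
```

A large statistic relative to the critical values indicates that the two locations' distributions differ.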
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Objective: To provide practical guidance for the analysis of N-of-1 trials by comparing four commonly used models.
Methods: The four models, paired t-test, mixed effects model of difference, mixed effects model, and meta-analysis of summary data, were compared using a simulation study. The assumed 3-cycle and 4-cycle N-of-1 trials were set with sample sizes of 1, 3, 5, 10, 20 and 30, respectively, under the assumption of normally distributed data. The data were generated from a variance-covariance matrix under the assumption of (i) a compound symmetry or first-order autoregressive structure, and (ii) no carryover effect or a 20% carryover effect. Type I error, power, bias (mean error) and mean square error (MSE) of the effect differences between two groups were used to evaluate the performance of the four models.
Results: The results from the 3-cycle and 4-cycle N-of-1 trials were comparable with respect to type I error, power, bias and MSE. The paired t-test yielded a type I error near the nominal level, higher power, comparable bias and small MSE, whether or not there was a carryover effect. Compared with the paired t-test, the mixed effects model produced a similar type I error and smaller bias, but lower power and larger MSE. The mixed effects model of difference and the meta-analysis of summary data yielded type I errors far from the nominal level, low power, and large bias and MSE, irrespective of the presence or absence of a carryover effect.
Conclusion: We recommend the paired t-test for normally distributed data from N-of-1 trials because of its optimal statistical performance. In the presence of carryover effects, the mixed effects model could be used as an alternative.
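As an illustration of the recommended paired t-test on N-of-1 data, a minimal sketch (the simulated means, SD, cycle count and seed are assumptions, not the study's settings):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# One simulated 4-cycle N-of-1 trial: each cycle yields one measurement under
# treatment A and one under treatment B (all values are invented)
n_cycles = 4
a = rng.normal(loc=11.0, scale=1.0, size=n_cycles)  # treatment A
b = rng.normal(loc=10.0, scale=1.0, size=n_cycles)  # treatment B

# Paired t-test on the within-cycle pairs
t, p = stats.ttest_rel(a, b)
print(f"t = {t:.2f}, p = {p:.3f}")
```

With multiple patients, the same pairing extends naturally; the mixed effects alternatives discussed above would instead model patient as a random effect.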
This dataset contains detailed information on cases where a hate or bias crime has been reported to the Bloomington Police Department. Hate crimes are criminal offenses motivated by bias against race, religion, ethnicity, sexual orientation, gender identity, or other protected characteristics. This dataset provides insights into the nature and demographics of hate crimes in Bloomington, aiding in understanding and addressing these incidents.
The dataset includes the following columns:
| Column Name | Description | API Field Name | Data Type |
| --- | --- | --- | --- |
| case_number | Case Number | case_number | Text |
| date | Date | date | Floating Timestamp |
| weekday | Day of Week | day_of_week | Text |
| victims | Total Number of Victims | victims | Number |
| victim_race | Victim Race | victim_race | Text |
| victim_gender | Victim Gender | victim_gender | Text |
| victim_type | Victim Type | victim_type | Text |
| offenders | Total Number of Offenders | offenders | Number |
| offender_race | Offender Race | offender_race | Text |
| offender_gender | Offender Gender | offender_gender | Text |
| offense | Offense / Crime | offense | Text |
| location_type | Offense / Crime Location Type | location_type | Text |
| motivation | Offense/Crime Bias Motivation | motivation | Text |
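A hypothetical sketch of working with records that follow this schema in pandas (the two rows below are invented for illustration only and are not real cases from the dataset):

```python
import io
import pandas as pd

# Two invented rows matching the column schema above (not real cases)
csv = io.StringIO(
    "case_number,date,weekday,victims,victim_race,victim_gender,victim_type,"
    "offenders,offender_race,offender_gender,offense,location_type,motivation\n"
    "EX-0001,2023-05-01,Monday,1,White,Female,Individual,"
    "1,Unknown,Male,Intimidation,Residence,Anti-Jewish\n"
    "EX-0002,2023-06-16,Friday,2,Black,Male,Individual,"
    "1,White,Male,Vandalism,Park,Anti-Black\n"
)
df = pd.read_csv(csv, parse_dates=["date"])

# Example aggregation: total victims by bias motivation
print(df.groupby("motivation")["victims"].sum())
```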
This dataset can be used for:
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Spatial patterns of biodiversity are inextricably linked to their collection methods, yet no synthesis of bias patterns or their consequences exists. As such, views of organismal distribution and the ecosystems they make up may be incorrect, undermining countless ecological and evolutionary studies. Using 742 million records of 374,900 species, we explore the global patterns and impacts of biases related to taxonomy, accessibility, ecotype, and data type across terrestrial and marine systems. Pervasive sampling and observation biases exist across animals, with only 6.74% of the globe sampled, and disproportionately poor tropical sampling. High elevations and deep seas are particularly poorly known. Over 50% of records in most groups account for under 2% of species, and citizen science only exacerbates these biases. Additional data will be needed to overcome many of these biases, but we must increasingly value data publication to bridge this gap, better represent species' distributions in more distant and inaccessible areas, and provide the necessary basis for conservation and management.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: The increased use of meta-analysis in systematic reviews of healthcare interventions has highlighted several types of bias that can arise during the completion of a randomised controlled trial. Study publication bias and outcome reporting bias have been recognised as potential threats to the validity of meta-analysis and can make the readily available evidence unreliable for decision making.
Methodology/Principal Findings: In this update, we review and summarise the evidence from cohort studies that have assessed study publication bias or outcome reporting bias in randomised controlled trials. Twenty studies were eligible, of which four were newly identified in this update. Only two followed the cohort all the way through from protocol approval to information regarding publication of outcomes. Fifteen of the studies investigated study publication bias and five investigated outcome reporting bias. Three studies found that statistically significant outcomes had higher odds of being fully reported than non-significant outcomes (range of odds ratios: 2.2 to 4.7). In comparing trial publications to protocols, we found that 40–62% of studies had at least one primary outcome that was changed, introduced, or omitted. We decided not to undertake meta-analysis due to the differences between studies.
Conclusions: This update does not change the conclusions of the review, in which 16 studies were included. Direct empirical evidence for the existence of study publication bias and outcome reporting bias is shown. There is strong evidence of an association between significant results and publication: studies that report positive or significant results are more likely to be published, and outcomes that are statistically significant have higher odds of being fully reported. Publications have been found to be inconsistent with their protocols.
Researchers need to be aware of the problems of both types of bias and efforts should be concentrated on improving the reporting of trials.
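The reported odds ratios compare the odds of full reporting between significant and non-significant outcomes. With toy counts (invented for illustration, not the review's data), the calculation looks like this:

```python
# Toy 2x2 table of outcomes (counts invented for illustration):
#                           fully reported   not fully reported
# significant outcomes            80                 20
# non-significant outcomes        50                 50
odds_significant = 80 / 20      # odds of full reporting if significant
odds_nonsignificant = 50 / 50   # odds of full reporting if non-significant
odds_ratio = odds_significant / odds_nonsignificant
print(odds_ratio)  # 4.0
```

An odds ratio of 4.0 would fall within the 2.2 to 4.7 range reported across the three studies.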
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This folder contains processed and derived data, and the script, for the manuscript 'Detecting synthetic population bias using a spatially-oriented framework and independent validation data'.
Abstract: Models of human mobility can be broadly applied to find solutions addressing diverse topics such as public health policy, transportation management, emergency management, and urban development. However, many mobility models require individual-level data that is limited in availability and accessibility. Synthetic populations are commonly used as the foundation for mobility models because they provide detailed individual-level data representing the different types and characteristics of people in a study area. Thorough evaluation of synthetic populations is required to detect data biases before the prejudices are transferred to subsequent applications. Although synthetic populations are commonly used for modeling mobility, they are conventionally validated by their sociodemographic characteristics, rather than mobility attributes. Mobility microdata provides an opportunity to independently/externally validate the mobility attributes of synthetic populations. This study demonstrates a spatially-oriented data validation framework and independent data validation to assess the mobility attributes of two synthetic populations at different spatial granularities. Validation using independent data (SafeGraph) and the validation framework replicated the spatial distribution of errors detected using source data (LODES) and total absolute error. Spatial clusters of error exposed the locations of underrepresented and overrepresented communities. This information can guide bias mitigation efforts to generate a more representative synthetic population.
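The total absolute error used in such validations is simply the summed absolute difference between synthetic and observed counts per zone. A toy sketch (the per-zone counts are invented for illustration, not taken from the study's data):

```python
import numpy as np

# Invented per-zone counts of commuters: synthetic population vs. source data
synthetic = np.array([120, 80, 40, 60])
observed = np.array([100, 90, 50, 60])

# Total absolute error: summed absolute difference across zones
tae = int(np.abs(synthetic - observed).sum())
print(tae)  # 40
```

Mapping the per-zone terms `|synthetic - observed|` rather than only summing them is what exposes the spatial clusters of error described above.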