https://spdx.org/licenses/CC0-1.0.html
New methods for species distribution models (SDMs) utilise presence‐absence (PA) data to correct the sampling bias of presence‐only (PO) data in a spatial point process setting. These have been shown to improve species estimates when both data sets are large and dense. However, is a PA data set that is smaller and patchier than hitherto examined able to do the same? Furthermore, when both data sets are relatively small, is there enough information contained within them to produce a useful estimate of species’ distributions? These attributes are common in many applications.
A stochastic simulation was conducted to assess the ability of a pooled data SDM to estimate the distribution of species from increasingly sparser and patchier data sets. The simulated data sets were varied by changing the number of presence‐absence sample locations, the degree of patchiness of these locations, the number of PO observations, and the level of sampling bias within the PO observations. The performance of the pooled data SDM was compared to a PA SDM and a PO SDM to assess the strengths and limitations of each SDM.
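A minimal sketch of how simulated data sets of this kind could be generated (this is not the authors' code; the one-dimensional landscape, all parameter values, and the clustering scheme are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

# One-dimensional landscape of 1000 cells with an environmental covariate
# and an "accessibility" covariate driving sampling bias (all invented)
n_cells = 1000
env = rng.normal(size=n_cells)
access = rng.uniform(size=n_cells)

# True log-linear intensity of the species point process
beta0, beta1 = -3.0, 1.2
intensity = np.exp(beta0 + beta1 * env)

# Presence-only observations: the point process thinned by a bias term
bias_strength = 2.0
po_counts = rng.poisson(intensity * np.exp(bias_strength * (access - 1)))

# Sparse, patchy presence-absence sites: locations clustered around a few centres
n_sites, n_clusters = 50, 5
centres = rng.integers(0, n_cells, n_clusters)
sites = np.clip(rng.choice(centres, n_sites) + rng.integers(-30, 31, n_sites),
                0, n_cells - 1)
pa = rng.binomial(1, 1 - np.exp(-intensity[sites]))  # cloglog link to intensity

print(po_counts.sum(), pa.mean())
```

Varying `n_sites`, the cluster spread, `bias_strength`, and the expected number of PO points corresponds to the four experimental factors described above.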
The pooled data SDM successfully removed the sampling bias from the PO observations even when the presence‐absence data was sparse and patchy, and the PO observations formed the majority of the data. The pooled data SDM was, in general, more accurate and more precise than either the PA SDM or the PO SDM. All SDMs were more precise for the species responses than they were for the covariate coefficients.
The emerging SDM methodology that pools PO and PA data will facilitate more certainty around species’ distribution estimates, which in turn will allow more relevant and concise management and policy decisions to be enacted. This work shows that it is possible to achieve this result even in relatively data‐poor regions.
Under New York State’s Hate Crime Law (Penal Law Article 485), a person commits a hate crime when one of a specified set of offenses is committed targeting a victim because of a perception or belief about their race, color, national origin, ancestry, gender, religion, religious practice, age, disability, or sexual orientation, or when such an act is committed as a result of that type of perception or belief. These types of crimes can target an individual, a group of individuals, or public or private property. DCJS submits hate crime incident data to the FBI’s Uniform Crime Reporting (UCR) Program. Information collected includes number of victims, number of offenders, type of bias motivation, and type of victim.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Concerns about gender bias in word embedding models have captured substantial attention in the algorithmic bias research literature. Other bias types, however, have received far less scrutiny. This work describes a large-scale analysis of sentiment associations in popular word embedding models along the lines of gender and ethnicity but also along the less frequently studied dimensions of socioeconomic status, age, physical appearance, sexual orientation, religious sentiment and political leanings. Consistent with previous scholarly literature, this work has found systemic bias against given names popular among African-Americans in most embedding models examined. Gender bias in embedding models, however, appears to be multifaceted and often reversed in polarity relative to what has been regularly reported. Interestingly, using the common operationalization of the term bias in the fairness literature, novel, so far unreported types of bias in word embedding models have also been identified. Specifically, the popular embedding models analyzed here display negative biases against middle and working-class socioeconomic status, male children, senior citizens, plain physical appearance and intellectual phenomena such as Islamic religious faith, non-religiosity and conservative political orientation. The reasons for the paradoxical underreporting of these bias types in the relevant literature are probably manifold, but widely held blind spots when searching for algorithmic bias, and a lack of widespread technical jargon to unambiguously describe the variety of algorithmic associations, could conceivably play a role. The causal origins of the multiplicity of loaded associations attached to distinct demographic groups within embedding models are often unclear, but the heterogeneity of said associations and their potential multifactorial roots raise doubts about the validity of grouping them all under the umbrella term bias.
Richer and more fine-grained terminology as well as a more comprehensive exploration of the bias landscape could help the fairness epistemic community to characterize and neutralize algorithmic discrimination more efficiently.
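A sentiment association of the kind analysed in such studies can be operationalized, in the style of WEAT, as the difference in mean cosine similarity between a target word and sets of pleasant versus unpleasant attribute words. A self-contained sketch with toy vectors (the 4-dimensional vectors and word lists are invented purely for illustration, standing in for a trained embedding model):

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def sentiment_association(word_vec, pleasant, unpleasant):
    """Mean cosine similarity to pleasant minus unpleasant attribute vectors
    (a WEAT-style association score)."""
    return (np.mean([cosine(word_vec, p) for p in pleasant])
            - np.mean([cosine(word_vec, u) for u in unpleasant]))

# Toy 4-d vectors standing in for a trained embedding (illustrative only)
emb = {
    "good":   np.array([1.0, 0.1, 0.0, 0.0]),
    "joy":    np.array([0.9, 0.2, 0.1, 0.0]),
    "bad":    np.array([-1.0, 0.1, 0.0, 0.0]),
    "awful":  np.array([-0.9, 0.0, 0.2, 0.0]),
    "name_a": np.array([0.8, 0.3, 0.0, 0.1]),
    "name_b": np.array([-0.7, 0.3, 0.0, 0.1]),
}
pleasant = [emb["good"], emb["joy"]]
unpleasant = [emb["bad"], emb["awful"]]

print(round(sentiment_association(emb["name_a"], pleasant, unpleasant), 3))
print(round(sentiment_association(emb["name_b"], pleasant, unpleasant), 3))
```

A positive score indicates a net-pleasant association for the target word; a negative score indicates a net-unpleasant one.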
There has been continuous growth in the number of metrics used to analyze fairness and biases in artificial intelligence (AI) platforms since 2016. Diagnostic metrics have consistently been adopted more than benchmarks, with a peak of ** in 2019. This is quite likely because more diagnostics must be run on the data before more accurate benchmarks can be created, i.e. the diagnostics lead to benchmarks.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
One of the strongest findings across the sciences is that publication bias occurs. Of particular note is a “file drawer bias” where statistically significant results are privileged over non-significant results. Recognition of this bias, along with increased calls for “open science,” has led to an emphasis on replication studies. Yet, few have explored publication bias and its consequences in replication studies. We offer a model of the publication process involving an initial study and a replication. We use the model to describe three types of publication biases: 1) file drawer bias, 2) a “repeat study” bias against the publication of replication studies, and 3) a “gotcha bias” where replication results that run contrary to a prior study are more likely to be published. We estimate the model’s parameters with a vignette experiment conducted with political science professors teaching at Ph.D.-granting institutions in the United States. We find evidence of all three types of bias, although those explicitly involving replication studies are notably smaller. This bodes well for the replication movement. That said, the aggregation of all of the biases increases the number of false positives in a literature. We conclude by discussing a path for future work on publication biases.
Current Version:
Embargo Provenance: n/a
Dataset Title: Data for the article "Sampling Methodology Influences Habitat Suitability Modeling for Chiropteran Species"
Dataset Contributors:
Date of Issue: 2023-01-16
Publisher: Ecology and Evolution
License: Use of these data is covered by the following license: ...
https://dataful.in/terms-and-conditions
This dataset contains the yearly statistics on the victim types by bias motivation. Major categories of victim types include individuals, government, business/financial institution, religious organization, society/public and other or multiple victims. Major categories of bias motivations include Race/Ethnicity/Ancestry, Religion, Sexual Orientation, Disability, Gender and Gender Identity.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Iterated algorithmic bias arises when an algorithm interacts with human responses continuously, updating its model after receiving feedback from the human, while showing the human only selected items or options. Other types of bias are static: they have a one-time influence on an algorithm.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ABSTRACT
Background: The public typically believes psychotherapy to be more effective than pharmacotherapy for treating depression. This is not consistent with current scientific evidence, which shows that both types of treatment are about equally effective.
Objective: The study investigates whether this bias towards psychotherapy guides online information search and whether the bias can be reduced by explicitly providing expert information (in a blog entry) and by providing tag clouds that implicitly reveal experts’ evaluations.
Methods: A total of 174 participants completed a fully automated Web-based study after we invited them via mailing lists. First, participants read two blog posts by experts that either challenged or supported the bias towards psychotherapy. Subsequently, participants searched for information about depression treatment in an online environment that provided more experts’ blog posts about the effectiveness of treatments based on alleged research findings. These blogs were organized in a tag cloud; both psychotherapy tags and pharmacotherapy tags were popular. We measured tag and blog post selection, efficacy ratings of the presented treatments, and participants’ treatment recommendation after information search.
Results: Participants demonstrated a clear bias towards psychotherapy (mean 4.53, SD 1.99) compared to pharmacotherapy (mean 2.73, SD 2.41; t(173)=7.67, P<.001, d=0.81) when rating treatment efficacy prior to the experiment. Accordingly, participants exhibited biased information search and evaluation. This bias was significantly reduced, however, when participants were exposed to tag clouds with challenging popular tags. Participants facing popular tags challenging their bias (n=61) showed significantly less biased tag selection (F(2,168)=10.61, P<.001, partial η²=0.112), blog post selection (F(2,168)=6.55, P=.002, partial η²=0.072), and treatment efficacy ratings (F(2,168)=8.48, P<.001, partial η²=0.092), compared to bias-supporting tag clouds (n=56) and balanced tag clouds (n=57). Challenging explicit expert information presented in blog posts (n=93), compared to supporting expert information (n=81), decreased the bias in information search with regard to blog post selection (F(1,168)=4.32, P=.04, partial η²=0.025). No significant effects were found for treatment recommendation (all P>.33).
Conclusions: We conclude that the psychotherapy bias is most effectively attenuated—and even eliminated—when popular tags implicitly point to blog posts that challenge the widespread view. Explicit expert information (in a blog entry) was less successful in reducing biased information search and evaluation. Since tag clouds have the potential to counter biased information processing, we recommend their use.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Aim
Species distribution models (SDMs) that integrate presence-only and presence-absence data offer a promising avenue to improve information on species' geographic distributions. The use of such 'integrated SDMs' on a species range-wide extent has been constrained by the often-limited presence-absence data and by the heterogeneous sampling of the presence-only data. Here, we evaluate integrated SDMs for studying species ranges with a novel expert range map-based evaluation. We build a new understanding about how integrated SDMs address issues of estimation accuracy and data deficiency and thereby offer advantages over traditional SDMs.
Location
South and Central America.
Time period
1979-2017.
Major taxa studied
Hummingbirds.
Methods
We build integrated SDMs by linking two observation models – one for each data type – to the same underlying spatial process. We validate SDMs with two schemes: i) cross-validation with presence-absence data and ii) comparison with respect to the species' whole range as defined with IUCN range maps. We also compare models relative to the estimated response curves and compute the association between the benefit of the data integration and the number of presence records in each data set.
Results
The integrated SDM accounting for the spatially varying sampling intensity of the presence-only data was one of the top-performing models in both model validation schemes. Presence-only data alleviated overly large niche estimates, and data integration was beneficial compared to modelling solely presence-only data for species that had few presence points when predicting the species' whole range. On the community level, integrated models improved the species richness prediction.
Main conclusions
Integrated SDMs combining presence-only and presence-absence data successfully borrow strength from both data types and offer improved predictions of species' ranges. Integrated SDMs can potentially alleviate the impacts of taxonomically and geographically uneven sampling and leverage the detailed sampling information in presence-absence data.
https://dataful.in/terms-and-conditions
This dataset contains the yearly statistics on the number of offenses by offense types and by bias motivation. Major categories of offense types include crimes against persons, crimes against property and crimes against society. Each offense type is further categorized by type of crime such as murder, rape, trafficking, robbery etc. Major categories of bias motivations include Race/Ethnicity/Ancestry, Religion, Sexual Orientation, Disability, Gender and Gender Identity.
This national survey of prosecutors was undertaken to systematically gather information about the handling of bias or hate crime prosecutions in the United States. The goal was to use this information to identify needs and to enhance the ability of prosecutors to respond effectively to hate crimes by promoting effective practices. The survey aimed to address the following research questions: (1) What was the present level of bias crime prosecution in the United States? (2) What training had been provided to prosecutors to assist them in prosecuting hate- and bias-motivated crimes and what additional training would be beneficial? (3) What types of bias offenses were prosecuted in 1994-1995? (4) How were bias crime cases assigned and to what extent were bias crime cases given priority? and (5) What factors or issues inhibited a prosecutor's ability to prosecute bias crimes? In 1995, a national mail survey was sent to a stratified sample of prosecutor offices in three phases to solicit information about prosecutors' experiences with hate crimes. Questions were asked about size of jurisdiction, number of full-time staff, number of prosecutors and investigators assigned to bias crimes, and number of bias cases prosecuted. Additional questions measured training for bias-motivated crimes, such as whether staff received specialized training, whether there existed a written policy on bias crimes, how well prosecutors knew the bias statute, and whether there was a handbook on bias crime. Information elicited on case processing included the frequency with which certain criminal acts were charged and sentenced as bias crimes, the existence of a special bias unit, case tracking systems, preparation of witnesses, jury selection, and case disposition. Other topics specifically covered bias related to racial or ethnic differences, religious differences, sexual orientation, and violence against women.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset supports the systematic literature review conducted in the study "Peer Review Under Scrutiny: Systematic Evidence of Bias in Research Funding". The data comprise a curated collection of empirical studies that investigated the existence of biases in peer review processes within research funding agencies worldwide.
The dataset includes detailed categorizations based on the types of biases investigated, methodologies employed, data sources, and the confirmation status of each bias identified in the selected studies. The file was structured to facilitate further analyses, replications, and methodological reviews in the field of research evaluation and science policy studies.
Data were collected through systematic searches in Scopus and Web of Science databases, followed by rigorous screening and classification procedures. The dataset may be particularly useful for researchers, policymakers, and evaluators interested in improving transparency and equity in research funding mechanisms.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Results obtained and analysed in Lamboley & Fourcade, Spatial filtering strategies for mitigating sampling bias in species distribution models. Briefly, we used two virtual species with contrasting levels of specialisation to explore the impact of spatial filtering distances on the performance of ecological niche models. This investigation was conducted across a spectrum of modelling conditions, encompassing diverse types and degrees of bias, as well as varying sample sizes.

Results reporting the overlap between modelled and true distributions:
- Unbiased_distribution.csv: results for the models trained from unbiased, i.e. randomly sampled, datasets
- Biased_corrected_distribution.csv: results for the models trained from biased datasets, corrected with various spatial filtering distances

Results reporting the overlap between modelled and true response curves:
- Unbiased_response_curves.csv: results for the models trained from unbiased, i.e. randomly sampled, datasets
- Biased_corrected_response_curves.csv: results for the models trained from biased datasets, corrected with various spatial filtering distances
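A common implementation of the spatial filtering examined here is greedy thinning, which enforces a minimum pairwise distance between retained occurrence points. A minimal sketch (the coordinates and distance threshold are illustrative, not taken from the study):

```python
import numpy as np

def spatial_thin(coords, min_dist, seed=0):
    """Greedy spatial filtering: visit points in random order and keep each
    point only if it lies at least `min_dist` from every point already kept."""
    rng = np.random.default_rng(seed)
    kept = []
    for i in rng.permutation(len(coords)):
        if all(np.hypot(*(coords[i] - coords[j])) >= min_dist for j in kept):
            kept.append(i)
    return coords[sorted(kept)]

# A dense cluster plus a few scattered points (illustrative coordinates)
rng = np.random.default_rng(1)
pts = np.vstack([rng.normal([0.0, 0.0], 0.05, size=(50, 2)),
                 rng.uniform(-1, 1, size=(5, 2))])

thinned = spatial_thin(pts, min_dist=0.2)
print(len(pts), "->", len(thinned))
```

Re-running this with a range of `min_dist` values mirrors the spectrum of filtering distances evaluated in the study.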
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This article compares distribution functions among pairs of locations in their domains, in contrast to the typical approach of univariate comparison across individual locations. This bivariate approach is studied in the presence of sampling bias, which has been gaining attention in COVID-19 studies that over-represent more symptomatic people. In cases with either known or unknown sampling bias, we introduce Anderson–Darling-type tests based on both the univariate and bivariate formulation. A simulation study shows the superior performance of the bivariate approach over the univariate one. We illustrate the proposed methods using real data on the distribution of the number of symptoms suggestive of COVID-19.
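For the univariate case, a k-sample Anderson–Darling test is available in SciPy as `scipy.stats.anderson_ksamp`; the bivariate formulation proposed in the article is not part of SciPy, so the sketch below only illustrates the univariate building block (the Poisson rates and sample sizes are assumptions, not the article's data):

```python
import numpy as np
from scipy.stats import anderson_ksamp

rng = np.random.default_rng(0)

# Symptom-count-like samples from two "locations"; the second is shifted to
# mimic over-representation of more symptomatic respondents
loc_a = rng.poisson(2.0, size=500)
loc_b = rng.poisson(3.0, size=500)

res = anderson_ksamp([loc_a, loc_b])
print(res.statistic, res.significance_level)
```

A large statistic relative to the critical values indicates that the two locations' distributions differ.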
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Objective: To provide practical guidance for the analysis of N-of-1 trials by comparing four commonly used models.
Methods: The four models, paired t-test, mixed effects model of difference, mixed effects model, and meta-analysis of summary data, were compared using a simulation study. The assumed 3-cycle and 4-cycle N-of-1 trials were set with sample sizes of 1, 3, 5, 10, 20 and 30, respectively, under the assumption of normally distributed data. The data were generated from a variance-covariance matrix under the assumption of (i) a compound symmetry or first-order autoregressive structure, and (ii) no carryover effect or a 20% carryover effect. Type I error, power, bias (mean error) and mean square error (MSE) of the effect differences between two groups were used to evaluate the performance of the four models.
Results: The results from the 3-cycle and 4-cycle N-of-1 trials were comparable with respect to type I error, power, bias and MSE. The paired t-test yielded a type I error near the nominal level, higher power, comparable bias and small MSE, whether or not there was a carryover effect. Compared with the paired t-test, the mixed effects model produced a similar type I error and smaller bias, but lower power and larger MSE. The mixed effects model of difference and the meta-analysis of summary data yielded type I errors far from the nominal level, low power, and large bias and MSE, irrespective of the presence or absence of a carryover effect.
Conclusion: We recommend the paired t-test for normally distributed data from N-of-1 trials because of its optimal statistical performance. In the presence of carryover effects, the mixed effects model could be used as an alternative.
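As an illustration of the recommended paired t-test on N-of-1 data, a minimal sketch (the simulated means, SD, cycle count and seed are assumptions, not the study's settings):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# One simulated 4-cycle N-of-1 trial: each cycle yields one measurement under
# treatment A and one under treatment B (all values are invented)
n_cycles = 4
a = rng.normal(loc=11.0, scale=1.0, size=n_cycles)  # treatment A
b = rng.normal(loc=10.0, scale=1.0, size=n_cycles)  # treatment B

# Paired t-test on the within-cycle pairs
t, p = stats.ttest_rel(a, b)
print(f"t = {t:.2f}, p = {p:.3f}")
```

With multiple patients, the same pairing extends naturally; the mixed effects alternatives discussed above would instead model patient as a random effect.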
This dataset contains detailed information on cases where a hate or bias crime has been reported to the Bloomington Police Department. Hate crimes are criminal offenses motivated by bias against race, religion, ethnicity, sexual orientation, gender identity, or other protected characteristics. This dataset provides insights into the nature and demographics of hate crimes in Bloomington, aiding in understanding and addressing these incidents.
The dataset includes the following columns:
| Column Name | Description | API Field Name | Data Type |
| --- | --- | --- | --- |
| case_number | Case Number | case_number | Text |
| date | Date | date | Floating Timestamp |
| weekday | Day of Week | day_of_week | Text |
| victims | Total Number of Victims | victims | Number |
| victim_race | Victim Race | victim_race | Text |
| victim_gender | Victim Gender | victim_gender | Text |
| victim_type | Victim Type | victim_type | Text |
| offenders | Total Number of Offenders | offenders | Number |
| offender_race | Offender Race | offender_race | Text |
| offender_gender | Offender Gender | offender_gender | Text |
| offense | Offense / Crime | offense | Text |
| location_type | Offense / Crime Location Type | location_type | Text |
| motivation | Offense/Crime Bias Motivation | motivation | Text |
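A hypothetical sketch of working with records that follow this schema in pandas (the two rows below are invented for illustration only and are not real cases from the dataset):

```python
import io
import pandas as pd

# Two invented rows matching the column schema above (not real cases)
csv = io.StringIO(
    "case_number,date,weekday,victims,victim_race,victim_gender,victim_type,"
    "offenders,offender_race,offender_gender,offense,location_type,motivation\n"
    "EX-0001,2023-05-01,Monday,1,White,Female,Individual,"
    "1,Unknown,Male,Intimidation,Residence,Anti-Jewish\n"
    "EX-0002,2023-06-16,Friday,2,Black,Male,Individual,"
    "1,White,Male,Vandalism,Park,Anti-Black\n"
)
df = pd.read_csv(csv, parse_dates=["date"])

# Example aggregation: total victims by bias motivation
print(df.groupby("motivation")["victims"].sum())
```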
This dataset can be used for:
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Spatial patterns of biodiversity are inextricably linked to their collection methods, yet no synthesis of bias patterns or their consequences exists. As such, views of organismal distribution and the ecosystems they make up may be incorrect, undermining countless ecological and evolutionary studies. Using 742 million records of 374,900 species, we explore the global patterns and impacts of biases related to taxonomy, accessibility, ecotype, and data type across terrestrial and marine systems. Pervasive sampling and observation biases exist across animals, with only 6.74% of the globe sampled, and disproportionately poor tropical sampling. High elevations and deep seas are particularly poorly known. Over 50% of records in most groups account for under 2% of species, and citizen science only exacerbates these biases. Additional data will be needed to overcome many of these biases, but we must increasingly value data publication to bridge this gap, better represent species' distributions in more distant and inaccessible areas, and provide the necessary basis for conservation and management.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: The increased use of meta-analysis in systematic reviews of healthcare interventions has highlighted several types of bias that can arise during the completion of a randomised controlled trial. Study publication bias and outcome reporting bias have been recognised as potential threats to the validity of meta-analysis and can make the readily available evidence unreliable for decision making.
Methodology/Principal Findings: In this update, we review and summarise the evidence from cohort studies that have assessed study publication bias or outcome reporting bias in randomised controlled trials. Twenty studies were eligible, of which four were newly identified in this update. Only two followed the cohort all the way through from protocol approval to information regarding publication of outcomes. Fifteen of the studies investigated study publication bias and five investigated outcome reporting bias. Three studies found that statistically significant outcomes had higher odds of being fully reported than non-significant outcomes (range of odds ratios: 2.2 to 4.7). In comparing trial publications to protocols, we found that 40–62% of studies had at least one primary outcome that was changed, introduced, or omitted. We decided not to undertake meta-analysis due to the differences between studies.
Conclusions: This update does not change the conclusions of the review, in which 16 studies were included. Direct empirical evidence for the existence of study publication bias and outcome reporting bias is shown. There is strong evidence of an association between significant results and publication: studies that report positive or significant results are more likely to be published, and outcomes that are statistically significant have higher odds of being fully reported. Publications have been found to be inconsistent with their protocols.
Researchers need to be aware of the problems of both types of bias and efforts should be concentrated on improving the reporting of trials.
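The reported odds ratios compare the odds of full reporting between significant and non-significant outcomes. With toy counts (invented for illustration, not the review's data), the calculation looks like this:

```python
# Toy 2x2 table of outcomes (counts invented for illustration):
#                           fully reported   not fully reported
# significant outcomes            80                 20
# non-significant outcomes        50                 50
odds_significant = 80 / 20      # odds of full reporting if significant
odds_nonsignificant = 50 / 50   # odds of full reporting if non-significant
odds_ratio = odds_significant / odds_nonsignificant
print(odds_ratio)  # 4.0
```

An odds ratio of 4.0 would fall within the 2.2 to 4.7 range reported across the three studies.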
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This folder contains processed and derived data, and the script, for the manuscript 'Detecting synthetic population bias using a spatially-oriented framework and independent validation data'.
Abstract: Models of human mobility can be broadly applied to find solutions addressing diverse topics such as public health policy, transportation management, emergency management, and urban development. However, many mobility models require individual-level data that is limited in availability and accessibility. Synthetic populations are commonly used as the foundation for mobility models because they provide detailed individual-level data representing the different types and characteristics of people in a study area. Thorough evaluation of synthetic populations is required to detect data biases before the prejudices are transferred to subsequent applications. Although synthetic populations are commonly used for modeling mobility, they are conventionally validated by their sociodemographic characteristics, rather than mobility attributes. Mobility microdata provides an opportunity to independently/externally validate the mobility attributes of synthetic populations. This study demonstrates a spatially-oriented data validation framework and independent data validation to assess the mobility attributes of two synthetic populations at different spatial granularities. Validation using independent data (SafeGraph) and the validation framework replicated the spatial distribution of errors detected using source data (LODES) and total absolute error. Spatial clusters of error exposed the locations of underrepresented and overrepresented communities. This information can guide bias mitigation efforts to generate a more representative synthetic population.
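The total absolute error used in such validations is simply the summed absolute difference between synthetic and observed counts per zone. A toy sketch (the per-zone counts are invented for illustration, not taken from the study's data):

```python
import numpy as np

# Invented per-zone counts of commuters: synthetic population vs. source data
synthetic = np.array([120, 80, 40, 60])
observed = np.array([100, 90, 50, 60])

# Total absolute error: summed absolute difference across zones
tae = int(np.abs(synthetic - observed).sum())
print(tae)  # 40
```

Mapping the per-zone terms `|synthetic - observed|` rather than only summing them is what exposes the spatial clusters of error described above.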