Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data collection for reviews 1-350 is included. For reviews with an unclear risk of bias, separate tables are included where more than one author was involved, to determine an overall consensus on the risk of bias (low, high, or unclear). For reviews that required emailing Cochrane authors, the email responses are also included and assigned a risk of bias. For email responses that shared common themes, a table assessing bias for those themes was created. Finally, a summarized results table is included.
Political science researchers have flexibility in how to analyze data, how to report data, and whether to report on data. Review of examples of reporting flexibility from the race and sex discrimination literature illustrates how research design choices can influence estimates and inferences. This reporting flexibility—coupled with the political imbalance among political scientists—creates the potential for political bias in reported political science estimates, but this potential for political bias can be reduced or eliminated through preregistration and preacceptance, in which researchers commit to a research design before completing data collection. Removing the potential for reporting flexibility can raise the credibility of political science research.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
One of the strongest findings across the sciences is that publication bias occurs. Of particular note is a “file drawer bias” where statistically significant results are privileged over non-significant results. Recognition of this bias, along with increased calls for “open science,” has led to an emphasis on replication studies. Yet, few have explored publication bias and its consequences in replication studies. We offer a model of the publication process involving an initial study and a replication. We use the model to describe three types of publication biases: 1) file drawer bias, 2) a “repeat study” bias against the publication of replication studies, and 3) a “gotcha bias” where replication results that run contrary to a prior study are more likely to be published. We estimate the model’s parameters with a vignette experiment conducted with political science professors teaching at Ph.D.-granting institutions in the United States. We find evidence of all three types of bias, although those explicitly involving replication studies are notably smaller. This bodes well for the replication movement. That said, the aggregation of all of the biases increases the number of false positives in a literature. We conclude by discussing a path for future work on publication biases.
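As a rough, hypothetical illustration of how the three biases can compound (not the paper's estimated model), the Python sketch below simulates a literature in which initial studies face file drawer bias and replications face repeat-study and gotcha biases, then reports the share of published positive claims that are false and never corrected. All probabilities are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical publication probabilities (illustrative only, not the paper's estimates)
PUB_SIG, PUB_NULL = 0.9, 0.3    # file drawer bias: significant results favored
REPEAT_PENALTY = 0.7            # repeat-study bias against replications
GOTCHA_BOOST = 1.5              # gotcha bias toward contradicting replications
ALPHA, POWER, PRIOR_TRUE = 0.05, 0.8, 0.2

def is_significant(real):
    """One study reaches p < ALPHA with prob. POWER (true effect) or ALPHA (null)."""
    return rng.random() < (POWER if real else ALPHA)

published_pos, uncorrected_false = 0, 0
for _ in range(200_000):
    real = rng.random() < PRIOR_TRUE
    if not is_significant(real) or rng.random() > PUB_SIG:
        continue                        # initial result null or left in file drawer
    published_pos += 1
    rep_sig = is_significant(real)      # a replication is attempted
    p_pub = (PUB_SIG if rep_sig else PUB_NULL) * REPEAT_PENALTY
    if not rep_sig:                     # contradicts the published positive claim
        p_pub = min(1.0, p_pub * GOTCHA_BOOST)
    corrected = (not rep_sig) and (rng.random() < p_pub)
    if not real and not corrected:
        uncorrected_false += 1

print(f"uncorrected false positives among published claims: "
      f"{uncorrected_false / published_pos:.3f}")
```

Raising REPEAT_PENALTY toward 1 (publishing replications as readily as initial studies) lowers the uncorrected false positive share, which is the sense in which small replication-specific biases bode well for the replication movement.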
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The EWEMBI dataset was compiled to support the bias correction of climate input data for the impact assessments carried out in phase 2b of the Inter-Sectoral Impact Model Intercomparison Project (ISIMIP2b; Frieler et al., 2017), which will contribute to the 2018 IPCC special report on the impacts of global warming of 1.5°C above pre-industrial levels and related global greenhouse gas emission pathways. The EWEMBI data cover the entire globe at 0.5° horizontal and daily temporal resolution from 1979 to 2013. Data sources of EWEMBI are ERA-Interim reanalysis data (ERAI; Dee et al., 2011), WATCH forcing data methodology applied to ERA-Interim reanalysis data (WFDEI; Weedon et al., 2014), eartH2Observe forcing data (E2OBS; Calton et al., 2016) and NASA/GEWEX Surface Radiation Budget data (SRB; Stackhouse Jr. et al., 2011). The SRB data were used to bias-correct E2OBS shortwave and longwave radiation (Lange, 2018). Variables included in the EWEMBI dataset are Near Surface Relative Humidity, Near Surface Specific Humidity, Precipitation, Snowfall Flux, Surface Air Pressure, Surface Downwelling Longwave Radiation, Surface Downwelling Shortwave Radiation, Near Surface Wind Speed, Near-Surface Air Temperature, Daily Maximum Near Surface Air Temperature, Daily Minimum Near Surface Air Temperature, Eastward Near-Surface Wind and Northward Near-Surface Wind. For data sources, units and short names of all variables see Frieler et al. (2017, Table 1).
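For readers unfamiliar with bias correction of radiation fields, the sketch below shows one simple approach: multiplicative rescaling of daily values so that per-cell monthly means match a reference climatology. This is a schematic assumption for illustration only; the actual EWEMBI radiation correction is documented in Lange (2018), and all arrays here are synthetic, with a coarse grid standing in for the 0.5° global grid.

```python
import numpy as np

days_per_month = np.array([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])
month_of_day = np.repeat(np.arange(12), days_per_month)     # month index, 365 days

rng = np.random.default_rng(1)
e2obs_rsds = rng.gamma(5.0, 40.0, (365, 18, 36))            # daily shortwave, W/m2
srb_clim = np.full((12, 18, 36), 200.0)                     # reference monthly means

corrected = np.empty_like(e2obs_rsds)
for m in range(12):
    sel = month_of_day == m
    model_mean = e2obs_rsds[sel].mean(axis=0)               # per-cell monthly mean
    scale = srb_clim[m] / np.maximum(model_mean, 1e-9)      # rescaling factor
    corrected[sel] = e2obs_rsds[sel] * scale                # match reference means
```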
If the publication decisions of journals are a function of the statistical significance of research findings, the published literature may suffer from “publication bias.” This paper describes a method for detecting publication bias. We point out that to achieve statistical significance, the effect size must be larger in small samples. If publications tend to be biased against statistically insignificant results, we should observe that the effect size diminishes as sample sizes increase. This proposition is tested and confirmed using the experimental literature on voter mobilization.
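The detection logic can be illustrated with a short simulation (a hedged sketch, not the paper's code): when the true effect is zero and only significant estimates survive the file drawer, the observed effect size falls mechanically as sample size grows, so a negative slope of |effect| on n among published studies signals publication bias.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = rng.integers(50, 2000, size=500)           # sample sizes of candidate studies
se = 1.0 / np.sqrt(n)                          # standard error scales as 1/sqrt(n)
effect = rng.normal(0.0, se)                   # estimated effects, true effect = 0
published = np.abs(effect / se) > 1.96         # file drawer: only p < .05 survives

# Among published studies, effect size should fall with sample size.
slope, intercept, r, p, _ = stats.linregress(n[published],
                                             np.abs(effect[published]))
print(f"slope of |effect| on n among published studies: {slope:.2e} (p = {p:.3g})")
```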
Supporting code for: Cenci, S., “Overlooked biases from misidentifications of causal structures”, The Journal of Finance and Data Science (2024).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains model code and data from the paper titled
"Perceived and observed biases within scientific communities: a case study in movement ecology"
By: Shaw AK, Fouda L, Mezzini S, Kim D, Chatterjee N, Wolfson D, Abrahms B, Attias N, Beardsworth CE, Beltran R, Binning SA, Blincow KM, Chan Y-C, Fronhofer EA, Hegemann A, Hurme ER, Iannarilli F, Kellner JB, McCoy KD, Rafiq K, Saastamoinen M, Sequeira AMM, Serota MW, Sumasgutner P, Tao Y, Torstenson M, Yanco SW, Beck KB, Bertram MG, Beumer LT, Bradarić M, Clermont J, Ellis-Soto D, Faltusová M, Fieberg J, Hall RJ, Kölzsch A, Lai S, Lee-Cruz L, Loretto M-C, Loveridge A, Michelangeli M, Mueller T, Riotte-Lambert L, Sapir N, Scacco M, Teitelbaum CS, Cagnacci F
Published in: Proceedings of the Royal Society B
Abstract:
Who conducts biological research, where, and how results are disseminated varies among geographies and identities. Identifying and documenting these forms of bias by research communities is a critical step towards addressing them. We documented perceived and observed biases in movement ecology, a rapidly expanding sub-discipline of biology that is strongly underpinned by fieldwork and technology use. We surveyed attendees before an international conference to assess a baseline within-discipline perceived bias (uninformed perceived bias). We analysed geographic patterns in Movement Ecology articles, finding discrepancies, related to national economics, between the country of the authors’ affiliation and the study site location. We analysed race-gender identities of USA biology researchers (the sub-discipline closest to ours with data available), finding that they differed from national demographics. Finally, we discussed the quantitatively observed bias at the conference to assess within-discipline perceived bias informed by observational data (informed perceived bias). Although the survey indicated that most conference participants were bias-aware, conversations covered only a subset of biases. We discuss potential causes of bias (parachute science, fieldwork accessibility), solutions, and the need to evaluate the effectiveness of mitigating actions. Undertaking data-driven analysis of bias within sub-disciplines can help identify specific barriers and move towards the inclusion of a greater diversity of participants in the scientific process.
STEM students are often unable to recognize cognitive bias in their own disciplines, and simply describing cognitive bias to students has been shown to be insufficient to improve critical thinking. However, habitual metacognitive techniques show promise for correcting cognitive biases such as confirmation bias, a maladaptive cognitive strategy that specifically threatens the objectivity of scientists. As part of a course on metacognition in science, first-year STEM students were asked to give an oral presentation about a controversial socioscientific topic (e.g., GMO crops, de-extinction, or hydrofracking). The first year the course was offered, presentations exhibited confirmation bias at a high rate, despite instructions to examine multiple viewpoints about the scientific issue. In subsequent years, an intervention was introduced, in the form of an interactive lecture/discussion/activity about confirmation bias and two specifically designed homework assignments that asked the students to reflect on evidence, search processes, and potential biases. This intervention was jointly developed by faculty members in biology and philosophy to focus on habitual metacognitive techniques. Compared to no intervention, the resulting presentations had a higher percentage of reliable sources and a lower percentage of citations that only supported their conclusion. These results indicate that after the intervention exercise, students were discriminating among sources more carefully (Mann-Whitney p=0.009) and were using more sources from the other side of the argument, including presenting more reasons that refute their own ideas (Mann-Whitney p=0.003). We find that providing classroom instruction supported by deliberate practice to counteract confirmation bias improves students’ evaluation of scientific evidence.
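As a rough illustration of the statistical comparison reported above, the sketch below runs a Mann-Whitney U test on hypothetical per-presentation percentages of reliable sources before and after the intervention; all numbers are invented, and SciPy is assumed to be available.

```python
from scipy.stats import mannwhitneyu

# Hypothetical percentage of reliable sources per presentation
before = [40, 55, 50, 45, 60, 35, 50, 42]   # pre-intervention cohort (made up)
after = [70, 65, 80, 60, 75, 68, 72, 66]    # post-intervention cohort (made up)

stat, p = mannwhitneyu(before, after, alternative="two-sided")
print(f"Mann-Whitney U = {stat}, p = {p:.4f}")
```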
Primary image: Venn diagram that illustrates the idea of confirmation bias.
Observer bias and other “experimenter effects” occur when researchers’ expectations influence study outcomes. These biases are strongest when researchers expect a particular result, are measuring subjective variables, and have an incentive to produce data that confirm predictions. To minimize bias, it is good practice to work “blind,” meaning that experimenters are unaware of the identity or treatment group of their subjects while conducting research. Here, using text mining and a literature review, we find evidence that blind protocols are uncommon in the life sciences and that nonblind studies tend to report higher effect sizes and more significant p-values. We discuss methods to minimize bias and urge researchers, editors, and peer reviewers to keep blind protocols in mind.
In the analysis of causal effects in non-experimental studies, conditioning on observable covariates is one way to try to reduce unobserved confounder bias. However, a developing literature has shown that conditioning on certain covariates may increase bias, and the mechanisms underlying this phenomenon have not been fully explored. We add to the literature on bias-increasing covariates by first introducing a way to decompose omitted variable bias into three constituent parts: bias due to an unobserved confounder, bias due to excluding observed covariates, and bias due to amplification. This leads to two important findings. First, while instruments have been the primary focus of the bias amplification literature to date, we show that the popular approach of adding group fixed effects can lead to bias amplification as well. This is an important finding because many practitioners think that fixed effects are a convenient way to account for any and all group-level confounding and are at worst harmless. Second, we introduce the concept of bias unmasking and show how it can be even more insidious than bias amplification in some cases. After introducing these new results analytically, we use constructed observational placebo studies to illustrate bias amplification and bias unmasking with real data. Finally, we propose a way to add bias decomposition information to graphical displays for sensitivity analysis to help practitioners think through the potential for bias amplification and bias unmasking in actual applications.
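The amplification mechanism is easy to reproduce in a small simulation (a sketch under textbook assumptions, not the paper's decomposition): conditioning on an instrument-like covariate removes treatment variance without removing confounding, which inflates the bias from the unobserved confounder.

```python
import numpy as np

# U is an unobserved confounder; Z affects treatment T but not outcome Y.
rng = np.random.default_rng(0)
n = 200_000
U = rng.normal(size=n)                      # unobserved confounder
Z = rng.normal(size=n)                      # instrument-like covariate
T = 2.0 * Z + 1.0 * U + rng.normal(size=n)  # treatment
Y = 0.5 * T + 1.0 * U + rng.normal(size=n)  # outcome; true effect of T is 0.5

def ols_coef(X, y):
    """First coefficient from least-squares regression of y on X (no intercept)."""
    return np.linalg.lstsq(X, y, rcond=None)[0][0]

b_naive = ols_coef(np.column_stack([T]), Y)          # bias ~ 1/6 here
b_amplified = ols_coef(np.column_stack([T, Z]), Y)   # bias ~ 1/2: amplification
print(f"true effect 0.5 | without Z: {b_naive:.3f} | with Z: {b_amplified:.3f}")
```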
This file contains all materials needed to replicate our analysis from the paper. We've also included all the raw coding data sheets for the articles analyzed in our paper.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datasets from the research project: The Impact of Geographical Bias when Judging Scientific Studies
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The interplay between an academic's gender and their scholarly output is a riveting topic at the intersection of scientometrics, data science, gender studies, and sociology. Its effects can be studied to analyze the role of gender in research productivity, tenure and promotion standards, collaboration and networks, or scientific impact, among others. The typical methodology in this field of research is based on a number of assumptions that are customarily not discussed in detail in the relevant literature but undoubtedly merit a critical examination. Perhaps the most contentious aspect is the categorization of gender: an author's gender is typically inferred from their name and further reduced to a binary feature by an algorithmic procedure. This and subsequent data processing steps introduce biases whose effects are hard to estimate. In this report we describe these problems and discuss the reception and interplay of this line of research within the field. We also outline the effect of obstacles such as the non-availability of data and code for transparent communication. Building on our research on gender effects on scientific publications, we challenge the prevailing methodology in the field and offer a critical reflection on some of its flaws and pitfalls. Our observations are meant to open up the discussion around the need for, and feasibility of, more elaborate approaches to tackling gender in conjunction with analyses of bibliographic sources.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
VERSION HISTORY: On June 26, 2018, all files were republished to incorporate additional observational data covering the years 2014 to 2016. Prior to that date, the dataset covered only the years 1979 to 2013. Data for all years prior to 2014 are identical in this and the original version of the dataset.

DATA DESCRIPTION: The EWEMBI dataset was compiled to support the bias correction of climate input data for the impact assessments carried out in phase 2b of the Inter-Sectoral Impact Model Intercomparison Project (ISIMIP2b; Frieler et al., 2017), which will contribute to the 2018 IPCC special report on the impacts of global warming of 1.5°C above pre-industrial levels and related global greenhouse gas emission pathways. The EWEMBI data cover the entire globe at 0.5° horizontal and daily temporal resolution from 1979 to 2016. Data sources of EWEMBI are ERA-Interim reanalysis data (ERAI; Dee et al., 2011), WATCH forcing data methodology applied to ERA-Interim reanalysis data (WFDEI; Weedon et al., 2014), eartH2Observe forcing data (E2OBS; Calton et al., 2016) and NASA/GEWEX Surface Radiation Budget data (SRB; Stackhouse Jr. et al., 2011). The SRB data were used to bias-correct E2OBS shortwave and longwave radiation (Lange, 2018). Variables included in the EWEMBI dataset are Near Surface Relative Humidity, Near Surface Specific Humidity, Precipitation, Snowfall Flux, Surface Air Pressure, Surface Downwelling Longwave Radiation, Surface Downwelling Shortwave Radiation, Near Surface Wind Speed, Near-Surface Air Temperature, Daily Maximum Near Surface Air Temperature, Daily Minimum Near Surface Air Temperature, Eastward Near-Surface Wind and Northward Near-Surface Wind. For data sources, units and short names of all variables see Frieler et al. (2017, Table 1).
Diversity and Distributions, 00, 1–16. https://doi.org/10.1111/ddi.13749
Access this dataset on Dryad: https://doi.org/10.5068/D1769Q
Description: R script for calculating bias in: 1) all iNaturalist plant observations and 2) iNaturalist and professional observations of the 4 study species, Hedychium gardnerianum, Lantana camara, Leucaena leucocephala, and Psidium cattleianum.
Description: R script for: 1) producing Hedychium gardnerianum, Lantana camara, Leucaena leucocephala, and Psidium cattleianum habitat suitability models and 2) calculating overlap among model series with Schoener's D (a minimal sketch of the D statistic follows after this list).
Description: Comma-delimited file containing the ...
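For reference, Schoener's D mentioned in the script description above is the standard niche-overlap statistic, D = 1 - 0.5 * Σ|p1 - p2|, computed over normalized suitability surfaces. The sketch below computes it in Python on hypothetical rasters; the repository's own implementation is in R.

```python
import numpy as np

def schoeners_d(suit1, suit2):
    """Schoener's D between two habitat suitability surfaces: each surface is
    normalized to sum to 1, then D = 1 - 0.5 * sum(|p1 - p2|).
    D ranges from 0 (no overlap) to 1 (identical surfaces)."""
    p1 = suit1 / suit1.sum()
    p2 = suit2 / suit2.sum()
    return 1.0 - 0.5 * np.abs(p1 - p2).sum()

# Hypothetical suitability rasters for two model series
rng = np.random.default_rng(0)
model_a = rng.random((100, 100))
model_b = rng.random((100, 100))
print(f"Schoener's D: {schoeners_d(model_a, model_b):.3f}")
```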
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This article examines the opportunities and challenges presented by artificial intelligence in the scientific evaluation of the social sciences, a field that faces difficulties in quantifying the impact of its output due to the complexity and qualitative nature of its subjects. Unlike the natural sciences, social sciences do not always align well with traditional metrics, such as citations or impact indices. Artificial intelligence, through advanced tools like natural language processing and machine learning, offers alternatives to enhance these evaluation processes. This study follows an exploratory methodology, grounded in a critical literature review and content analysis, aiming to identify the potential of artificial intelligence for measuring academic and social impact within the social sciences. The literature review includes analyses of academic sources and policy documents and is structured around three key areas: improvements in evaluation metrics, innovations in social impact analysis, and proposals for implementation in social sciences. The article concludes that, although artificial intelligence enables more comprehensive evaluations, its application presents ethical challenges, especially regarding algorithmic biases and system transparency. As an original contribution, the article proposes a theoretical model to integrate qualitative and quantitative methods into a more equitable and thorough evaluation adapted to the unique nature of the social sciences. It emphasizes the importance of developing AI tools designed ethically and collaboratively.
This repository contains data and replication code for the article "Diagnosing Gender Bias in Image Recognition Systems", published in the journal Socius.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset covers additional CMIP6-based and bias-adjusted atmospheric climate input data published as secondary input data for ISIMIP3b. Included are datasets from the 5 CMIP6 global climate models that are included in the ISIMIP3b protocol (GFDL-ESM4, IPSL-CM6A-LR, MPI-ESM1-2-HR, MRI-ESM2-0, UKESM1-0-LL) for experiments which are not part of the ISIMIP3b protocol (hist-nat, ssp119, ssp245, ssp460, ssp534-over). Also included are datasets from 5 additional CMIP6 global climate models (CNRM-CM6-1, CNRM-ESM2-1, CanESM5, EC-Earth3, MIROC6) for the experiments of the ISIMIP3b protocol (picontrol, historical, ssp126, ssp370, ssp585) and hist-nat in some cases. For 4 models (CESM2-WACCM, IITM-ESM, KACE-1-0-G, TaiESM1) we provide this data for a subset of variables.
Version 1.1 of this dataset adds files for the ssp534-over scenario.
Version 1.2 of this dataset adds files for huss, hurs (GFDL-ESM4, IPSL-CM6A-LR, MPI-ESM1-2-HR, MRI-ESM2-0, UKESM1-0-LL) and prsn (IPSL-CM6A-LR, MPI-ESM1-2-HR, MRI-ESM2-0, UKESM1-0-LL) for the ssp245 scenario.
Version 1.3 of this dataset adds files for the ssp119 scenario (GFDL-ESM4: all variables; IPSL-CM6A-LR and MRI-ESM2-0: hurs, huss, prsn, ps) and the ssp460 scenario (IPSL-CM6A-LR and MRI-ESM2-0: hurs, huss, prsn, ps).
Version 1.4 of this dataset adds ps files for the ssp245 scenario.
Version 1.5 of this dataset adds hurs, pr, sfcwind, tas from 4 additional CMIP6 global climate models (CESM2-WACCM, IITM-ESM, KACE-1-0-G, TaiESM1) for 6 experiments (picontrol, historical, ssp126, ssp245, ssp370, ssp585).
In the lab, young infants prefer to look at simple, high-visibility patterns: those with fewer, thicker, high-contrast edges. It is unknown whether these preferences could or do manifest at the vast scale of everyday visual experience. This dataset contains images that were captured by fitting 1- to 3-month-old infants, and adults for comparison, with lightweight head cameras. The scripts show how to extract variables related to simplicity (i.e., edge sparsity and orientation consistency) and visibility (root mean square contrast and amplitude by spatial frequency), as well as how those variables were used in analysis.
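As an illustration of two of these variables (a minimal sketch with synthetic data, not the repository's scripts), the code below computes root mean square contrast and a crude edge-sparsity proxy for a single frame; the threshold and the gradient-based edge definition are assumptions for illustration.

```python
import numpy as np

def rms_contrast(img):
    """Root mean square contrast: standard deviation of normalized pixel
    intensities. Higher values indicate a more visible, higher-contrast image."""
    return (img.astype(float) / 255.0).std()

def edge_sparsity(img, threshold=0.1):
    """Crude edge-sparsity proxy: fraction of pixels whose gradient magnitude
    falls below a threshold (more sparse = simpler image)."""
    gy, gx = np.gradient(img.astype(float) / 255.0)
    return np.mean(np.hypot(gx, gy) < threshold)

frame = np.random.default_rng(0).integers(0, 256, (480, 640))  # synthetic frame
print(f"RMS contrast: {rms_contrast(frame):.3f}, "
      f"edge sparsity: {edge_sparsity(frame):.3f}")
```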
The data repository includes raw results for the paper and supporting information, as well as post-processing scripts to generate the figures in the paper and supporting information. The repository also includes code for generating the raw results. The data for this paper are stored on servers at Statistics Denmark. For security and privacy reasons, the data cannot be made publicly available, and we as researchers do not have permission to extract or share microdata. Researchers who want access to microdata can be granted it only by Statistics Denmark. Statistics Denmark has created detailed step-by-step descriptions of how such access is granted and a more general description of the Danish system for access to microdata. These descriptions, as of December 2018, are also uploaded to the Dataverse and available from Statistics Denmark:
https://www.dst.dk/-/media/Kontorer/13-Forskning-og-Metode/Step-by-step-procedures-for-researchers-access-to-Microdata_082018.pdf?la=en
https://www.dst.dk/ext/645846915/0/forskning/Access-to-micro-data-at-Statistics-Denmark_2014--pdf
We welcome any inquiries with respect to data access.