88 datasets found

Data and script for "Detecting synthetic population bias using a...
figshare.com
zip
Updated May 15, 2024
Cite
Jessica Embury; Atsushi Nara; Sergio Rey; Ming-Hsiang Tsou; Sahar Ghanipoor Machiani (2024). Data and script for "Detecting synthetic population bias using a spatially-oriented framework and independent validation data" [Dataset]. http://doi.org/10.6084/m9.figshare.24664647.v1
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.24664647.v1
Dataset updated
May 15, 2024
Dataset provided by
figshare
Figshare: http://figshare.com/
Authors
Jessica Embury; Atsushi Nara; Sergio Rey; Ming-Hsiang Tsou; Sahar Ghanipoor Machiani
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
This folder contains processed and derived data, and the script, for the manuscript 'Detecting synthetic population bias using a spatially-oriented framework and independent validation data'. Abstract: Models of human mobility can be broadly applied to find solutions addressing diverse topics such as public health policy, transportation management, emergency management, and urban development. However, many mobility models require individual-level data that is limited in availability and accessibility. Synthetic populations are commonly used as the foundation for mobility models because they provide detailed individual-level data representing the different types and characteristics of people in a study area. Thorough evaluation of synthetic populations is required to detect data biases before those biases are transferred to subsequent applications. Although synthetic populations are commonly used for modeling mobility, they are conventionally validated by their sociodemographic characteristics rather than their mobility attributes. Mobility microdata provides an opportunity to independently (externally) validate the mobility attributes of synthetic populations. This study demonstrates a spatially-oriented data validation framework and independent data validation to assess the mobility attributes of two synthetic populations at different spatial granularities. Validation using independent data (SafeGraph) and the validation framework replicated the spatial distribution of errors detected using source data (LODES) and total absolute error. Spatial clusters of error exposed the locations of underrepresented and overrepresented communities. This information can guide bias mitigation efforts to generate a more representative synthetic population.
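
The abstract names total absolute error as a summary measure and refers to underrepresented and overrepresented communities. The sketch below shows one plain way to compute both from per-zone counts. It is not taken from the authors' script: the function names, zone IDs, and counts are hypothetical, and it assumes each dataset is a simple mapping from zone ID to a count.

```python
from typing import Dict


def signed_errors(synthetic: Dict[str, float], reference: Dict[str, float]) -> Dict[str, float]:
    """Per-zone error: positive means overrepresented, negative means underrepresented."""
    # Zones missing from one side are treated as a count of zero (an assumption).
    zones = synthetic.keys() | reference.keys()
    return {z: synthetic.get(z, 0) - reference.get(z, 0) for z in zones}


def total_absolute_error(synthetic: Dict[str, float], reference: Dict[str, float]) -> float:
    """Sum of absolute per-zone differences between the two datasets."""
    return sum(abs(e) for e in signed_errors(synthetic, reference).values())


# Hypothetical counts for three zones.
synthetic_counts = {"A": 120, "B": 80, "C": 50}
reference_counts = {"A": 100, "B": 100, "C": 50}

errors = signed_errors(synthetic_counts, reference_counts)
tae = total_absolute_error(synthetic_counts, reference_counts)
```

Here zone A is overrepresented by 20 and zone B underrepresented by 20, so the signed errors cancel while the total absolute error does not. That cancellation is why a single total can hide where the error sits, and why a per-zone view is useful alongside it.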

Data from: A global high-resolution and bias-corrected dataset of CMIP6...

zenodo.org

bin

Updated Sep 20, 2024

Cite

Qinqin Kong; Matthew Huber (2024). A global high-resolution and bias-corrected dataset of CMIP6 projected heat stress metrics [Dataset]. http://doi.org/10.5281/zenodo.13799897

Available download formats: bin

Unique identifier

https://doi.org/10.5281/zenodo.13799897

Dataset updated

Sep 20, 2024

Dataset provided by

Zenodo: http://zenodo.org/

Authors

Qinqin Kong; Matthew Huber

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Motivation

Increasing heat stress due to climate change poses significant risks to human health and can lead to widespread social and economic consequences. Evaluating these impacts requires reliable datasets of heat stress projections.

Data Record

We present a global dataset projecting future dry-bulb, wet-bulb, and wet-bulb globe temperatures under 1-4°C global warming scenarios (at 0.5°C intervals) relative to the preindustrial era, using outputs from 16 CMIP6 global climate models (GCMs) (Table 1). All variables were retrieved from the historical and SSP585 scenarios which were selected to maximize the warming signal.

The dataset was bias-corrected against ERA5 reanalysis by incorporating the GCM-simulated climate change signal onto the ERA5 baseline (1950-1976) at a 3-hourly frequency. It therefore includes a 27-year sample for each GCM under each warming target.
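
One common way to add a model's climate change signal onto an observational baseline is an additive mean shift, often called a delta approach. The sketch below illustrates that general idea only. It is an assumption that the signal is a difference of means; the procedure actually used for this dataset is described in the authors' data descriptor and may differ (for example, by treating quantiles separately). The function name and sample values are hypothetical.

```python
from statistics import mean
from typing import List


def delta_shift(era5_baseline: List[float],
                gcm_baseline: List[float],
                gcm_target: List[float]) -> List[float]:
    """Shift every baseline reanalysis value by the GCM's mean change.

    The GCM is used only for its change signal (target minus baseline),
    so a constant bias in the GCM cancels out of the result.
    """
    delta = mean(gcm_target) - mean(gcm_baseline)
    return [value + delta for value in era5_baseline]


# Hypothetical temperatures in degrees Celsius.
corrected = delta_shift(
    era5_baseline=[20.0, 22.0],
    gcm_baseline=[18.0, 20.0],   # GCM runs 2 degrees cold in the baseline period
    gcm_target=[21.0, 23.0],     # GCM warms by 3 degrees at the warming target
)
```

Because the baseline values keep their original time step, the shifted series has the same length as the baseline series, which matches the description's statement that each GCM and warming target yields a 27-year sample.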

The data is provided at a fine spatial resolution of 0.25° x 0.25° and a temporal resolution of 3 hours, and is stored in a self-describing NetCDF format. Filenames follow the pattern "VAR_bias_corrected_3hr_GCM_XC_yyyy.nc", where:

"VAR" represents the variable (Ta, Tw, WBGT for dry-bulb, wet-bulb, and wet-bulb globe temperature, respectively),
"GCM" denotes the CMIP6 GCM name,
"X" indicates the warming target compared to the preindustrial period,
"yyyy" represents the year index (0001-0027) of the 27-year sample
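
The filename pattern above can be parsed with a regular expression. This is a minimal sketch, not part of the dataset's tooling: the `HeatStressFile` type, the `parse_filename` function, and the example filename are hypothetical, and it assumes the warming target "X" is written as a plain decimal number (such as 1.5) immediately before the "C".

```python
import re
from typing import NamedTuple

# Pattern transcribed from the description: VAR_bias_corrected_3hr_GCM_XC_yyyy.nc
# Assumption: X is a decimal number such as "2" or "1.5".
_FILENAME_RE = re.compile(
    r"^(?P<var>Ta|Tw|WBGT)_bias_corrected_3hr_"
    r"(?P<gcm>.+)_(?P<target>\d+(?:\.\d+)?)C_(?P<year>\d{4})\.nc$"
)


class HeatStressFile(NamedTuple):
    variable: str            # Ta, Tw, or WBGT
    gcm: str                 # CMIP6 model name, e.g. "MIROC6"
    warming_target_c: float  # warming target relative to preindustrial
    year_index: int          # 1..27, position in the 27-year sample


def parse_filename(name: str) -> HeatStressFile:
    """Split a dataset filename into its four documented components."""
    m = _FILENAME_RE.match(name)
    if m is None:
        raise ValueError(f"not a recognised dataset filename: {name!r}")
    year = int(m.group("year"))
    if not 1 <= year <= 27:
        raise ValueError(f"year index out of range 0001-0027: {name!r}")
    return HeatStressFile(m.group("var"), m.group("gcm"),
                          float(m.group("target")), year)


# Hypothetical filename built from the documented pattern.
info = parse_filename("WBGT_bias_corrected_3hr_MPI-ESM1-2-HR_1.5C_0003.nc")
```

The GCM group is matched greedily because model names in Table 1 contain hyphens and digits; the fixed suffix (target, year index, extension) anchors the match from the right.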

Table 1 CMIP6 GCMs used for generating the dataset for Ta, Tw and WBGT.

GCM	Realization	GCM grid spacing	Ta	Tw	WBGT
ACCESS-CM2	r1i1p1f1	1.25° x 1.875°	✓	✓	✓
BCC-CSM2-MR	r1i1p1f1	1.1° x 1.125°	✓	✓	✓
CanESM5	r1i1p2f1	2.8° x 2.8°	✓	✓	✓
CMCC-CM2-SR5	r1i1p1f1	0.94° x 1.25°	✓	✓	✓
CMCC-ESM2	r1i1p1f1	0.94° x 1.25°	✓	✓	✓
CNRM-CM6-1	r1i1p1f2	1.4° x 1.4°	✓	✓
EC-Earth3	r1i1p1f1	0.7° x 0.7°	✓	✓	✓
GFDL-ESM4	r1i1p1f1	1.0° x 1.25°	✓	✓	✓
HadGEM3-GC31-LL	r1i1p1f3	1.25° x 1.875°	✓	✓	✓
HadGEM3-GC31-MM	r1i1p1f3	0.55° x 0.83°	✓	✓	✓
KACE-1-0-G	r1i1p1f1	1.25° x 1.875°	✓	✓	✓
KIOST-ESM	r1i1p1f1	1.9° x 1.9°	✓	✓	✓
MIROC-ES2L	r1i1p1f2	2.8° x 2.8°	✓	✓	✓
MIROC6	r1i1p1f1	1.4° x 1.4°	✓	✓	✓
MPI-ESM1-2-HR	r1i1p1f1	0.93° x 0.93°	✓	✓	✓
MPI-ESM1-2-LR	r1i1p1f1	1.85° x 1.875°	✓	✓	✓

Data Access

An inventory of the dataset is available in this repository. The complete dataset, approximately 57 TB in size, is freely accessible via Purdue Fortress' long-term archive through Globus at Globus Link. After clicking the link, users may be prompted to log in with a Purdue institutional Globus account. You can switch to your institutional account, or log in via a personal Globus ID, Gmail, GitHub handle, or ORCID ID. Alternatively, the dataset can be accessed by searching for the universally unique identifier (UUID): "6538f53a-1ea7-4c13-a0cf-10478190b901" in Globus.

Dataset Validation

We validate the bias-correction method and show that it significantly enhances the GCMs' accuracy in reproducing both the annual average and the full range of quantiles for all metrics within an ERA5 reference climate state. This dataset is expected to support future research on projected changes in mean and extreme heat stress and the assessment of related health and socio-economic impacts.

For a detailed introduction to the dataset and its validation, please refer to our data descriptor currently under review at Scientific Data. We will update this information upon publication.

EartH2Observe, WFDEI and ERA-Interim data Merged and Bias-corrected for...
dataservices.gfz-potsdam.de
explore.openaire.eu
Updated 2016
Cite
Stefan Lange (2016). EartH2Observe, WFDEI and ERA-Interim data Merged and Bias-corrected for ISIMIP (EWEMBI) [Dataset]. http://doi.org/10.5880/pik.2016.004
Unique identifier
https://doi.org/10.5880/pik.2016.004
Dataset updated
2016
Dataset provided by
GFZ Data Services
datacite
Authors
Stefan Lange
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The EWEMBI dataset was compiled to support the bias correction of climate input data for the impact assessments carried out in phase 2b of the Inter-Sectoral Impact Model Intercomparison Project (ISIMIP2b; Frieler et al., 2017), which will contribute to the 2018 IPCC special report on the impacts of global warming of 1.5°C above pre-industrial levels and related global greenhouse gas emission pathways. The EWEMBI data cover the entire globe at 0.5° horizontal and daily temporal resolution from 1979 to 2013. Data sources of EWEMBI are ERA-Interim reanalysis data (ERAI; Dee et al., 2011), WATCH forcing data methodology applied to ERA-Interim reanalysis data (WFDEI; Weedon et al., 2014), eartH2Observe forcing data (E2OBS; Calton et al., 2016) and NASA/GEWEX Surface Radiation Budget data (SRB; Stackhouse Jr. et al., 2011). The SRB data were used to bias-correct E2OBS shortwave and longwave radiation (Lange, 2018). Variables included in the EWEMBI dataset are Near Surface Relative Humidity, Near Surface Specific Humidity, Precipitation, Snowfall Flux, Surface Air Pressure, Surface Downwelling Longwave Radiation, Surface Downwelling Shortwave Radiation, Near Surface Wind Speed, Near-Surface Air Temperature, Daily Maximum Near Surface Air Temperature, Daily Minimum Near Surface Air Temperature, Eastward Near-Surface Wind and Northward Near-Surface Wind. For data sources, units and short names of all variables see Frieler et al. (2017, Table 1).
Validation and automation of the attention bias test for anxious states in...
researchdata.edu.au
data.csiro.au
datadownload
Updated Aug 4, 2021
Cite
Caroline Lee; Sue Belson; Ian Colditz; Jessica Monk; Susan Belson (2021). Validation and automation of the attention bias test for anxious states in sheep (AEC16/19) [Dataset]. http://doi.org/10.25919/AFGX-EP76
Available download formats: datadownload
Unique identifier
https://doi.org/10.25919/AFGX-EP76
Dataset updated
Aug 4, 2021
Dataset provided by
CSIRO: http://www.csiro.au/
Authors
Caroline Lee; Sue Belson; Ian Colditz; Jessica Monk; Susan Belson
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Aug 22, 2016 - Aug 22, 2017
Description
Assessment of emotional states is becoming an increasingly important part of animal welfare research, but emotional state is hard to measure and often requires time-consuming or complicated tests. A threat perception test has been developed as a measure of anxiety in sheep and validated using pharmacological models of anxiety. While it appears that the responses we measure in the test are directed towards the threat of a dog, a controlled study had not been conducted to confirm this. The main objective of this study was therefore to further investigate the behavioural responses of sheep in the threat perception test, to differentiate between responses to the dog and responses to the novel testing environment itself. A secondary aim of this study was to automate some of the behavioural measures taken during the test. The collection of key threat perception measures (vigilance and attention to threat) from videos is a long and labour-intensive process. Accelerometers or similar devices have been used previously on sheep, attached via halters or collars, to monitor animal movements and feeding behaviours. This study investigated the use of such devices to automate the collection of vigilance and attention to threat data, making the test faster, more practical and more accurate. Importantly, this study aimed to determine whether the attachment of data loggers to sheep would alter the behaviour of the animals during testing. Lineage: Details of the methods used to produce this data have been published and can be found at https://doi.org/10.1371/journal.pone.0190404 (see methods for Experiment 2).
Full Simulation Data, Validation Indices, and Frozen Repository for the...
zenodo.org
zip
Updated Apr 25, 2025
Cite
Arkaprabha Ganguli; Jeremy Feinstein; Ibraheem Raji; Akintomide Afolayan Akinsanola; Connor Aghili; Chunyong Jung; Jordan Branham; Thomas Wall; Whitney Huang; Veerabhadra Rao Kotamarthi (2025). Full Simulation Data, Validation Indices, and Frozen Repository for the Empirical Mode Decomposition Based Bias Correction Approach [Dataset]. http://doi.org/10.5281/zenodo.15244202
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.15244202
Dataset updated
Apr 25, 2025
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Arkaprabha Ganguli; Jeremy Feinstein; Ibraheem Raji; Akintomide Afolayan Akinsanola; Connor Aghili; Chunyong Jung; Jordan Branham; Thomas Wall; Whitney Huang; Veerabhadra Rao Kotamarthi
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This archive provides the full dataset and reproducibility materials for the EMDBC (Empirical Mode Decomposition-Based Bias Correction) method. It includes:

dataset.zip: Dataset containing the full spatial temperature time series for WRF simulations and validation indices:

ccsm_1995-2004_daily_t2.nc, ccsm_2045-2054_daily_t2.nc, and ccsm_2085-2094_daily_t2.nc: WRF-simulated near-surface air temperature data for historical and future periods.

validation_area_indices.txt: Spatial grid indices used for validation in the EMDBC evaluation.

repository.zip: A frozen snapshot of the GitHub repository at the time of publication. This includes the EMDBC source code and a minimal working example using sample time series.
Laying hen attention bias test data
researchdata.edu.au
data.csiro.au
datadownload
Updated Apr 22, 2019
Cite
Jim Lea; Sue Belson; Caroline Lee; Dana Campbell; Susan Belson; James Lea; Dana L.M. Campbell (2019). Laying hen attention bias test data [Dataset]. http://doi.org/10.25919/5CBD1CA76AA19
Available download formats: datadownload
Unique identifier
https://doi.org/10.25919/5CBD1CA76AA19
Dataset updated
Apr 22, 2019
Dataset provided by
CSIRO: http://www.csiro.au/
Authors
Jim Lea; Sue Belson; Caroline Lee; Dana Campbell; Susan Belson; James Lea; Dana L.M. Campbell
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 1, 2012 - Jan 1, 2017
Description
Data for application of an attention bias test in free-range laying hens including pharmacological validation of the test using the anxiogenic drug m-CPP. Lineage: All data were obtained by staff and students employed within the Agriculture and Food business unit at the FD McMaster Laboratory, Chiswick or through the University of New England, Armidale, NSW.
Hate Crime Incident (Open Data)
data.tempe.gov
performance.tempe.gov
Updated Jan 17, 2024
Cite
City of Tempe (2024). Hate Crime Incident (Open Data) [Dataset]. https://data.tempe.gov/datasets/tempegov::hate-crime-incident-open-data-1/about
Dataset updated
Jan 17, 2024
Dataset authored and provided by
City of Tempe
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Tempe Police Department prides itself in its continued efforts to reduce harm within the community and is providing this dataset on hate crime incidents that occur in Tempe.

The Tempe Police Department documents the type of bias that motivated a hate crime according to those categories established by the FBI. These include crimes motivated by biases based on race and ethnicity, religion, sexual orientation, disability, gender and gender identity.

The Bias Type categories provided in the data come from the Bias Motivation Categories as defined in the Federal Bureau of Investigation (FBI) National Incident-Based Reporting System (NIBRS) manual, version 2020.1 dated 4/15/2021. The FBI NIBRS manual can be found at https://www.fbi.gov/file-repository/ucr/ucr-2019-1-nibrs-user-manua-093020.pdf with the Bias Motivation Categories found on pages 78-79.

Although data is updated monthly, there is a delay by one month to allow for data validation and submission.

Information about Tempe Police Department's collection and reporting process for possible hate crimes is included in https://storymaps.arcgis.com/stories/a963e97ca3494bfc8cd66d593eebabaf.

Additional Information
Source: Data are from the Law Enforcement Records Management System (RMS)
Contact: Angelique Beltran
Contact E-Mail: angelique_beltran@tempe.gov
Data Source Type: Tabular
Preparation Method: Data from the Law Enforcement Records Management System (RMS) are entered by the Tempe Police Department into a GIS mapping system, which automatically publishes to open data.
Publish Frequency: Monthly
Publish Method: New data entries are automatically published to open data.
Data Dictionary
CV errors for the 5-Fold-CV.
plos.figshare.com
xls
Updated Jun 3, 2023
Cite
Chao-Yu Guo; Tse-Wei Liu; Yi-Hau Chen (2023). CV errors for the 5-Fold-CV. [Dataset]. http://doi.org/10.1371/journal.pone.0244094.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0244094.t002
Dataset updated
Jun 3, 2023
Dataset provided by
PLOS ONE
Authors
Chao-Yu Guo; Tse-Wei Liu; Yi-Hau Chen
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
CV errors for the 5-Fold-CV.
Data from: EartH2Observe, WFDEI and ERA-Interim data Merged and...
dataservices.gfz-potsdam.de
Updated 2019
Cite
Stefan Lange (2019). EartH2Observe, WFDEI and ERA-Interim data Merged and Bias-corrected for ISIMIP (EWEMBI) [Dataset]. http://doi.org/10.5880/pik.2019.004
Unique identifier
https://doi.org/10.5880/pik.2019.004
Dataset updated
2019
Dataset provided by
GFZ Data Services
datacite
Authors
Stefan Lange
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
VERSION HISTORY: On June 26, 2018 all files were republished due to the incorporation of additional observational data covering years 2014 to 2016. Prior to that date, the dataset only covered years 1979 to 2013. Data for all years prior to 2014 are identical in this and the original version of the dataset.

DATA DESCRIPTION: The EWEMBI dataset was compiled to support the bias correction of climate input data for the impact assessments carried out in phase 2b of the Inter-Sectoral Impact Model Intercomparison Project (ISIMIP2b; Frieler et al., 2017), which will contribute to the 2018 IPCC special report on the impacts of global warming of 1.5°C above pre-industrial levels and related global greenhouse gas emission pathways. The EWEMBI data cover the entire globe at 0.5° horizontal and daily temporal resolution from 1979 to 2013. Data sources of EWEMBI are ERA-Interim reanalysis data (ERAI; Dee et al., 2011), WATCH forcing data methodology applied to ERA-Interim reanalysis data (WFDEI; Weedon et al., 2014), eartH2Observe forcing data (E2OBS; Calton et al., 2016) and NASA/GEWEX Surface Radiation Budget data (SRB; Stackhouse Jr. et al., 2011). The SRB data were used to bias-correct E2OBS shortwave and longwave radiation (Lange, 2018). Variables included in the EWEMBI dataset are Near Surface Relative Humidity, Near Surface Specific Humidity, Precipitation, Snowfall Flux, Surface Air Pressure, Surface Downwelling Longwave Radiation, Surface Downwelling Shortwave Radiation, Near Surface Wind Speed, Near-Surface Air Temperature, Daily Maximum Near Surface Air Temperature, Daily Minimum Near Surface Air Temperature, Eastward Near-Surface Wind and Northward Near-Surface Wind. For data sources, units and short names of all variables see Frieler et al. (2017, Table 1).
Data from: Data files belonging to the paper "Dealing with clustered samples...
data.niaid.nih.gov
Updated Jul 16, 2024
Cite
van Ebbenhorst Tengbergen, Tom (2024). Data files belonging to the paper "Dealing with clustered samples for assessing map accuracy by cross-validation" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6513428
Dataset updated
Jul 16, 2024
Dataset provided by
van Ebbenhorst Tengbergen, Tom
Brus, Dick
Heuvelink, Gerard
Wadoux, Alexandre
de Bruin, Sytze
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Mapping of environmental variables often relies on map accuracy assessment through cross-validation with the data used for calibrating the underlying mapping model. When the data points are spatially clustered, conventional cross-validation leads to optimistically biased estimates of map accuracy. Several papers have promoted spatial cross-validation as a means to tackle this over-optimism. Many of these papers blame spatial autocorrelation as the cause of the bias and propagate the widespread misconception that spatial proximity of calibration points to validation points invalidates classical statistical validation of maps. In the paper related to these data, we present and evaluate alternative cross-validation approaches for assessing map accuracy from clustered sample data.
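
For orientation, the spatial cross-validation that the description refers to is usually implemented by assigning whole spatial blocks, rather than individual points, to folds. The sketch below shows only that generic fold assignment. It does not implement the alternative approaches evaluated in the paper, and the function name, block size, and coordinates are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def spatial_block_folds(points: List[Tuple[float, float]],
                        block_size: float,
                        n_folds: int) -> List[int]:
    """Return a fold index per point so that points in one grid block share a fold."""
    block_to_fold: Dict[Tuple[int, int], int] = {}
    folds: List[int] = []
    for x, y in points:
        # Identify the square grid block that contains the point.
        block = (math.floor(x / block_size), math.floor(y / block_size))
        if block not in block_to_fold:
            # Blocks are dealt out to folds in order of first appearance.
            block_to_fold[block] = len(block_to_fold) % n_folds
        folds.append(block_to_fold[block])
    return folds


# Hypothetical projected coordinates: two clusters plus one isolated point.
sample_points = [(0.1, 0.2), (0.4, 0.3), (1.2, 0.1), (1.9, 0.8), (0.5, 1.5)]
fold_ids = spatial_block_folds(sample_points, block_size=1.0, n_folds=2)
```

With random fold assignment, the two points near (0.1, 0.2) and (0.4, 0.3) could land in different folds, so a validation point would sit next to a calibration point. Blocking keeps them together, which is the behaviour the description attributes to spatial cross-validation.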

The study area is western Europe, constrained in the north at 52° latitude and at -10° and 24° longitude. The projection is IGNF:ETRS89LAEA (Lambert azimuthal equal area projection).

Files:

agb.tif = above ground biomass (AGB) map from version 3 of the 2017 CCI-Biomass product (https://catalogue.ceda.ac.uk/uuid/5f331c418e9f4935b8eb1b836f8a91b8)
AGBstack.tif = covariates used for predicting AGB
aggArea.tif = coarse grid used for simulation in the model-based methods
ocs.tif = soil organic carbon stock (OCS) map (0-30 cm) from Soilgrids (https://www.isric.org/explore/soilgrids)
OCSstack.tif = covariates used for predicting OCS
strata.xxx = 100 compact geo-strata (ESRI shape) created with the spcosa package; used for generating clustered samples
TOTmask.tif = mask of the area covered by the covariates

Details and data sources of the covariates in AGBstack.tif and OCSstack.tif:

Name	Description	Source	Note
ai	Aridity Index	https://chelsa-climate.org/downloads/	Version 2.1
bio1	Mean annual air temperature [°C]	https://chelsa-climate.org/downloads/	Version 2.1
bio5	Mean daily maximum air temperature of the warmest month [°C]	https://chelsa-climate.org/downloads/	Version 2.1
bio7	Annual range of air temperature [°C]	https://chelsa-climate.org/downloads/	Version 2.1
bio12	Annual precipitation [kg/m2]	https://chelsa-climate.org/downloads/	Version 2.1
bio15	Precipitation seasonality [kg/m2]	https://chelsa-climate.org/downloads/	Version 2.1
gdd10	Growing degree days heat sum above 10°C	https://chelsa-climate.org/downloads/	Version 2.1
clay	Clay content [g/kg] of the 0-5cm layer	https://soilgrids.org/	Only used for AGB
sand	Sand content [g/kg] of the 0-5cm layer	https://soilgrids.org/	as above
pH	Acidity (Ph(water)) of the 0-5cm layer	https://soilgrids.org/	as above
glc2017	Landcover 2017	https://land.copernicus.eu/global/products/lc, reclassified to: closed forest, open forest, natural non-forest veg., bare & sparse veg. cropland, built-up, water	Categorical variable
dem	Elevation	https://www.eea.europa.eu/data-and-maps/data/copernicus-land-monitoring-service-eu-dem	
cosasp	Cosine of slope aspect	Computed with the terra package from elevation	Computed @25m resolution; next aggregated to 0.5km
sinasp	Sine of slope aspect	Computed with the terra package from elevation	as above
slope	Slope	Computed with the terra package from elevation	as above
TPI	Topographic position index	Computed with the terra package from elevation	as above
TRI	Terrain ruggedness index	Computed with the terra package from elevation	as above
TWI	Topographic wetness index	Computed with SAGA from 500m resolution (aggregated) dem	
gedi	Forest height	https://glad.umd.edu/dataset/gedi	Zone: NAFR
xcoord	X coordinate	Using a mask created from the other covariates	
ycoord	Y coordinate	Using a mask created from the other covariates	
Dcoast	Distance from coast	Using a land mask created from the other covariates	
Data from: Validation of COI metabarcoding primers for terrestrial...
dataone.org
datadryad.org
Updated Jun 2, 2025
Cite
Vasco Elbrecht; Thomas W. A. Braukmann; Natalia V. Ivanova; Sean W. J. Prosser; Mehrdad Hajibabaei; Michael Wright; Evgeny V. Zakharov; Paul D. N. Hebert; Dirk Steinke (2025). Validation of COI metabarcoding primers for terrestrial arthropods [Dataset]. http://doi.org/10.5061/dryad.249rk92
Unique identifier
https://doi.org/10.5061/dryad.249rk92
Dataset updated
Jun 2, 2025
Dataset provided by
Dryad Digital Repository
Authors
Vasco Elbrecht; Thomas W. A. Braukmann; Natalia V. Ivanova; Sean W. J. Prosser; Mehrdad Hajibabaei; Michael Wright; Evgeny V. Zakharov; Paul D. N. Hebert; Dirk Steinke
Time period covered
Jan 1, 2019
Description
Metabarcoding can rapidly determine the species composition of bulk samples and thus aids biodiversity and ecosystem assessment. However, it is essential to use primer sets that minimize amplification bias among taxa to maximize species recovery. Despite this fact, the performance of primer sets employed for metabarcoding terrestrial arthropods has not been sufficiently evaluated. This study tests the performance of 36 primer sets on a mock community containing 374 insect species. Amplification success was assessed with gradient PCRs and the 21 most promising primer sets selected for metabarcoding. These 21 primer sets were also tested by metabarcoding a Malaise trap sample. We identified eight primer sets, mainly those including inosine and/or high degeneracy, that recovered more than 95% of the species in the mock community. Results from the Malaise trap sample were congruent with the mock community, but primer sets generating short amplicons produced potential false positives. Taxon ...
Data from: Deep Reinforcement Learning Enables Better Bias Control in...
data.niaid.nih.gov
zenodo.org
Updated Feb 16, 2024
Cite
Li, Shan (2024). Deep Reinforcement Learning Enables Better Bias Control in Benchmark for Virtual Screening [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7861684
Dataset updated
Feb 16, 2024
Dataset provided by
Wu, Song
Zhang, Liangren
Wang, Dongmei
Li, Shan
Xia, Jie
Shen, Tao
Wang, Simon, Xiang
License
http://www.apache.org/licenses/LICENSE-2.0
Description
This compressed file contains all datasets made for the validation of MUBDsyn.

datasets_int_val: 17 cases in this folder are derived from MUBD for GPCRs. MUBDreal was made by MUBD-DecoyMaker2.0 and MUBDsyn was made by MUBD-DecoyMakersyn.
datasets_ext_val_classical_VS: Five cases in this folder are derived from the shared cases of MUV and DUD-E. The active sets of MUV were taken as the input to make corresponding MUBD datasets. Files in SBVS are raw molecular docking results by smina.
datasets_ext_val_SI_classical_VS: DeepCoy and TocoDecoy were used to make the datasets corresponding to the same five cases above. The data of DeepCoy was directly retrieved from DeepCoy resources at OPIG while topology decoys of TocoDecoy_9W were made based on the scripts provided at TocoDecoy GitHub Repository. Files in SBVS are raw molecular docking results by smina.
datasets_ext_val_ML_VS: Ten cases in this folder are derived from NRLiSt-BDB. Corresponding MUBD datasets were made as described above.

All these datasets can be used for the reproduction of validation performed in the manuscript or to benchmark various virtual screening methods.
Data from: Iterated Data Sharpening
tandf.figshare.com
zip
Updated Jul 12, 2024
Cite
Hanxiao Chen; W. John Braun; Xiaoping Shi (2024). Iterated Data Sharpening [Dataset]. http://doi.org/10.6084/m9.figshare.25949771.v1
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.25949771.v1
Dataset updated
Jul 12, 2024
Dataset provided by
Taylor & Francis
Authors
Hanxiao Chen; W. John Braun; Xiaoping Shi
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data sharpening in kernel regression has been shown to be an effective method of reducing bias while having minimal effects on variance. Earlier efforts to iterate the data sharpening procedure have been less effective, due to the employment of an inappropriate sharpening transformation. In this article, an iterated data sharpening algorithm is proposed which reduces the asymptotic bias at each iteration, while having modest effects on the variance. The efficacy of the iterative approach is demonstrated theoretically and via a simulation study. Boundary effects persist and the affected region successively grows when the iteration is applied to local constant regression. By contrast, boundary bias successively decreases for each iteration step when applied to local linear regression. This study also shows that after iteration, the resulting estimates are less sensitive to bandwidth choice, and a further simulation study demonstrates that iterated data sharpening with data-driven bandwidth selection via cross-validation can lead to more accurate regression function estimation. Examples with real data are used to illustrate the scope of change made possible by using iterated data sharpening and to also identify its limitations. Supplementary materials for this article are available online.
Data from: Safari science: assessing the reliability of citizen science data...
search.dataone.org
data.niaid.nih.gov
Updated Apr 2, 2025
Cite
Cara Steger; Bilal Butt; Mevin B. Hooten (2025). Safari science: assessing the reliability of citizen science data for wildlife surveys [Dataset]. http://doi.org/10.5061/dryad.mb7qk
Unique identifier
https://doi.org/10.5061/dryad.mb7qk
Dataset updated
Apr 2, 2025
Dataset provided by
Dryad Digital Repository
Authors
Cara Steger; Bilal Butt; Mevin B. Hooten
Time period covered
Jan 1, 2018
Description
Protected areas are the cornerstone of global conservation, yet financial support for basic monitoring infrastructure is lacking in 60% of them. Citizen science holds potential to address these shortcomings in wildlife monitoring, particularly for resource-limited conservation initiatives in developing countries, if we can account for the reliability of data produced by volunteer citizen scientists (VCS).

This study tests the reliability of VCS data vs. data produced by trained ecologists, presenting a hierarchical framework for integrating diverse datasets to assess extra variability from VCS data.

Our results show that, while VCS data are likely to be overdispersed for our system, the overdispersion varies widely by species. We contend that citizen science methods, within the context of East African drylands, may be more appropriate for species with large body sizes, which are relatively rare, or those that form small herds. VCS perceptions of the charisma of a species ma...
Credit Card Application Decisions
gomask.ai
csv
Updated Jul 21, 2025
Cite
GoMask.ai (2025). Credit Card Application Decisions [Dataset]. https://gomask.ai/marketplace/datasets/credit-card-application-decisions
Explore at:
Available download formats: csv (Unknown)
Dataset updated
Jul 21, 2025
Dataset provided by
GoMask.ai
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
age, gender, occupation, risk_score, reviewed_by, address_city, applicant_id, credit_score, address_state, annual_income, and 19 more
Description
This dataset contains detailed synthetic records of credit card applications, including applicant demographics, financial profiles, application outcomes, and risk assessments. It is ideal for validating credit scoring models, detecting bias, and supporting regulatory compliance or fairness analysis in financial services. The flat schema design enables seamless integration with analytics and machine learning workflows.
Data from: A comprehensive analysis of autocorrelation and bias in home...
borealisdata.ca
search.dataone.org
+1 more
Updated May 19, 2021
+ more versions
Cite
Michael J. Noonan; Marlee A. Tucker; Christen H. Fleming; Tom S. Akre; Susan C. Alberts; Abdullahi H. Ali; Jeanne Altmann; Pamela C. Antunes; Jerrold L. Belant; Dean Beyer; Niels Blaum; Katrin Böhning-Gaese; Laury Cullen Jr.; Rogerio de Paula Cunha; Jasja Dekker; Jonathan Drescher-Lehman; Nina Farwig; Claudia Fichtel; Christina Fischer; Adam T. Ford; Jacob R. Goheen; René Janssen; Florian Jeltsch; Matthew Kauffman; Peter M. Kappeler; Flávia Koch; Scott LaPoint; A. Catherine Markham; Emilia Patricia Medici; Ronaldo G. Morato; Ran Nathan; Luiz Gustavo R. Oliveira-Santos; Kirk A. Olson; Bruce D. Patterson; Agustin Paviolo; Emiliano E. Ramalho; Sascha Rosner; Nuria Selva; Agnieszka Sergiel; Marina X. da Silva; Orr Spiegel; Peter Thompson; Wiebke Ullmann; Filip Zięba; Tomasz Zwijacz-Kozica; William F. Fagan; Thomas Mueller; Justin M. Calabrese (2021). Data from: A comprehensive analysis of autocorrelation and bias in home range estimation [Dataset]. http://doi.org/10.5683/SP2/OAJTAO
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Unique identifier
https://doi.org/10.5683/SP2/OAJTAO
Dataset updated
May 19, 2021
Dataset provided by
Borealis
Authors
Michael J. Noonan; Marlee A. Tucker; Christen H. Fleming; Tom S. Akre; Susan C. Alberts; Abdullahi H. Ali; Jeanne Altmann; Pamela C. Antunes; Jerrold L. Belant; Dean Beyer; Niels Blaum; Katrin Böhning-Gaese; Laury Cullen Jr.; Rogerio de Paula Cunha; Jasja Dekker; Jonathan Drescher-Lehman; Nina Farwig; Claudia Fichtel; Christina Fischer; Adam T. Ford; Jacob R. Goheen; René Janssen; Florian Jeltsch; Matthew Kauffman; Peter M. Kappeler; Flávia Koch; Scott LaPoint; A. Catherine Markham; Emilia Patricia Medici; Ronaldo G. Morato; Ran Nathan; Luiz Gustavo R. Oliveira-Santos; Kirk A. Olson; Bruce D. Patterson; Agustin Paviolo; Emiliano E. Ramalho; Sascha Rosner; Nuria Selva; Agnieszka Sergiel; Marina X. da Silva; Orr Spiegel; Peter Thompson; Wiebke Ullmann; Filip Zięba; Tomasz Zwijacz-Kozica; William F. Fagan; Thomas Mueller; Justin M. Calabrese
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Global
Dataset funded by
National Science Foundation
Description
Abstract: Home range estimation is routine practice in ecological research. While advances in animal tracking technology have increased our capacity to collect data to support home range analysis, these same advances have also resulted in increasingly autocorrelated data. Consequently, the question of which home range estimator to use on modern, highly autocorrelated tracking data remains open. This question is particularly relevant given that most estimators assume independently sampled data. Here, we provide a comprehensive evaluation of the effects of autocorrelation on home range estimation. We base our study on an extensive dataset of GPS locations from 369 individuals representing 27 species distributed across 5 continents. We first assemble a broad array of home range estimators, including Kernel Density Estimation (KDE) with four bandwidth optimizers (Gaussian reference function, autocorrelated-Gaussian reference function (AKDE), Silverman's rule of thumb, and least squares cross-validation), Minimum Convex Polygon, and Local Convex Hull methods. Notably, all of these estimators except AKDE assume independent and identically distributed (IID) data. We then employ half-sample cross-validation to objectively quantify estimator performance, and the recently introduced effective sample size for home range area estimation ($\hat{N}_\mathrm{area}$) to quantify the information content of each dataset. We found that AKDE 95% area estimates were larger than conventional IID-based estimates by a mean factor of 2. The median number of cross-validated locations included in the holdout sets by AKDE 95% (or 50%) estimates was 95.3% (or 50.1%), confirming the larger AKDE ranges were appropriately selective at the specified quantile. Conversely, conventional estimates exhibited negative bias that increased with decreasing $\hat{N}_\mathrm{area}$.
To contextualize our empirical results, we performed a detailed simulation study to tease apart how sampling frequency, sampling duration, and the focal animal's movement conspire to affect range estimates. Paralleling our empirical results, the simulation study demonstrated that AKDE was generally more accurate than conventional methods, particularly for small $\hat{N}_\mathrm{area}$. While 72% of the 369 empirical datasets had >1000 total observations, only 4% had an $\hat{N}_\mathrm{area}$ >1000, where 30% had an $\hat{N}_\mathrm{area}$ <30. In this frequently encountered scenario of small $\hat{N}_\mathrm{area}$, AKDE was the only estimator capable of producing an accurate home range estimate on autocorrelated data.
Usage notes
Empirical GPS tracking data: Anonymised, empirical tracking data used to estimate home range areas based on various home range estimators (Anonymised_Data.zip).
Data for "Training data composition affects performance of protein structure...
zenodo.org
data.niaid.nih.gov
application/gzip
Updated Oct 1, 2021
Cite
Alexander Derry; Kristy A. Carpenter; Russ B. Altman (2021). Data for "Training data composition affects performance of protein structure analysis algorithms" by A. Derry, K. A. Carpenter, & R. B. Altman [Dataset]. http://doi.org/10.5281/zenodo.5542201
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.5542201
Dataset updated
Oct 1, 2021
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Alexander Derry; Kristy A. Carpenter; Russ B. Altman
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description

This repository contains all data used in "Training data composition affects performance of protein structure analysis algorithms", published in the Pacific Symposium on Biocomputing 2022 by A. Derry, K. A. Carpenter, & R. B. Altman.

The data consists of the following files:

ema_zenodo_data.tar.gz: train, validation, and test splits for Estimation of Model Accuracy task, in LMDB format

design_zenodo_data.tar.gz: train, validation, and test splits for Protein Sequence Design task, in JSON format

enz_cat_res_zenodo_data.tar.gz: train, validation, and test splits for Catalytic Residue and Enzyme Prediction task, in TF record format

Details on dataset construction can be found in our paper, and dataloaders can be found in our GitHub repo.

Reference

A. Derry*, K. A. Carpenter*, & R. B. Altman, "Training data composition affects performance of protein structure analysis algorithms", 2021.

Dataset References

Datasets used were derived from the following works:

Kryshtafovych, A., Schwede, T., Topf, M., Fidelis, K., & Moult, J. (2019). Critical assessment of methods of protein structure prediction (CASP)—Round XIII. In Proteins: Structure, Function and Bioinformatics (Vol. 87, Issue 12, pp. 1011–1020). https://doi.org/10.1002/prot.25823

Ingraham, J., Garg, V. K., Barzilay, R., & Jaakkola, T. (2019). Generative Models for Graph-Based Protein Design. https://openreview.net/pdf?id=SJgxrLLKOE

Furnham, N., Holliday, G. L., de Beer, T. A. P., Jacobsen, J. O. B., Pearson, W. R., & Thornton, J. M. (2014). The Catalytic Site Atlas 2.0: cataloging catalytic sites and residues identified in enzymes. Nucleic Acids Research, 42 (Database issue), D485–D489.
Data from: Evaluating citizen vs. professional data for modelling...
datadryad.org
zip
Updated Apr 25, 2017
Cite
Courtney A. Tye; Robert A. McCleery; Robert J. Fletcher; Daniel U. Greene; Ryan S. Butryn (2017). Evaluating citizen vs. professional data for modelling distributions of a rare squirrel [Dataset]. http://doi.org/10.5061/dryad.8t475
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.8t475
Dataset updated
Apr 25, 2017
Dataset provided by
Dryad
Authors
Courtney A. Tye; Robert A. McCleery; Robert J. Fletcher; Daniel U. Greene; Ryan S. Butryn
Time period covered
Apr 25, 2016
Area covered
Florida
Description
Squirrel locations: Fox squirrel locations entered on website from public and professionals. File contains x,y coordinates and associated covariates (presencedata_final.csv).
Squirrel validation points: Fox squirrel occurrence data from camera trapping (valid_final.csv).
Data from: Data for PAN at SemEval 2019 Task 4: Hyperpartisan News Detection...
zenodo.org
bin, zip
Updated Dec 13, 2021
+ more versions
Cite
Johannes Kiesel; Maria Mestre; Rishabh Shukla; Emmanuel Vincent; David Corney; Payam Adineh; Benno Stein; Martin Potthast (2021). Data for PAN at SemEval 2019 Task 4: Hyperpartisan News Detection [Dataset]. http://doi.org/10.5281/zenodo.1489920
Explore at:
Available download formats: zip, bin
Unique identifier
https://doi.org/10.5281/zenodo.1489920
Dataset updated
Dec 13, 2021
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Johannes Kiesel; Maria Mestre; Rishabh Shukla; Emmanuel Vincent; David Corney; Payam Adineh; Benno Stein; Martin Potthast
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Training and validation data for the PAN @ SemEval 2019 Task 4: Hyperpartisan News Detection.

The data is split into multiple files. The articles are contained in the files with names starting with "articles-" (which validate against the XML schema article.xsd). The ground-truth information is contained in the files with names starting with "ground-truth-" (which validate against the XML schema ground-truth.xsd).

The first part of the data (filename contains "bypublisher") is labeled by the overall bias of the publisher as provided by BuzzFeed journalists or MediaBiasFactCheck.com. It contains a total of 750,000 articles, half of which (375,000) are hyperpartisan and half of which are not. Half of the articles that are hyperpartisan (187,500) are on the left side of the political spectrum, half are on the right side. This data is split into a training set (80%, 600,000 articles) and a validation set (20%, 150,000 articles), where no publisher that occurs in the training set also occurs in the validation set. Similarly, none of the publishers in those sets will occur in the test set.
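The publisher-disjoint constraint described above can be illustrated with a minimal Python sketch. It is not the procedure the dataset authors used, which is not described here; the function name `split_by_publisher`, its parameters, and the `(article_id, publisher)` input shape are all hypothetical and chosen only for illustration. The idea is to assign whole publishers, rather than individual articles, to one side of the split.

```python
import random
from collections import defaultdict


def split_by_publisher(articles, validation_fraction=0.2, seed=0):
    """Split (article_id, publisher) pairs so that no publisher spans both sets.

    Hypothetical helper: `articles` is a list of (article_id, publisher) tuples.
    Returns (train_ids, validation_ids).
    """
    # Group article IDs under their publisher.
    by_publisher = defaultdict(list)
    for article_id, publisher in articles:
        by_publisher[publisher].append(article_id)

    # Shuffle publishers reproducibly so the assignment is not order-dependent.
    publishers = sorted(by_publisher)
    random.Random(seed).shuffle(publishers)

    # Fill the validation set one whole publisher at a time until it reaches
    # the target size; every remaining publisher goes to the training set.
    target = validation_fraction * len(articles)
    train, validation = [], []
    for publisher in publishers:
        bucket = validation if len(validation) < target else train
        bucket.extend(by_publisher[publisher])
    return train, validation
```

Because publishers are assigned whole, the validation share only approximates the requested fraction when publishers differ in size. With the figures quoted above, an exact 80/20 split of 750,000 articles corresponds to 600,000 training and 150,000 validation articles.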

The second part of the data (filename contains "byarticle") is labeled through crowdsourcing on an article basis. The data contains only articles for which a consensus among the crowdsourcing workers existed. It contains a total of 645 articles. Of these, 238 (37%) are hyperpartisan and 407 (63%) are not. We will use a similar (but balanced!) test set. Again, none of the publishers in this set will occur in the test set.

Note that article IDs are only unique within the parts.

The collection (including labels) is licensed under a Creative Commons Attribution 4.0 International License.

Acknowledgements: Thanks to Jonathan Miller for his assistance in cleaning the data!
Validation Condition.csv
psycharchives.org
Updated Dec 14, 2021
+ more versions
Cite
(2021). Validation Condition.csv [Dataset]. https://psycharchives.org/handle/20.500.12034/4695
Explore at:
Dataset updated
Dec 14, 2021
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Data set for: P.G. Martins, A.; Köbrich, M.V.; Carstengerdes, N. & Biella, M. (submitted). All’s Well That Ends Well? Outcome Bias in Pilots During Instrument Flight Rules. Applied Cognitive Psychology. Data set for the two conditions, including the codebook: data Validation Condition



