88 datasets found
  1. Data and script for "Detecting synthetic population bias using a...

    • figshare.com
    zip
    Updated May 15, 2024
    Cite
    Jessica Embury; Atsushi Nara; Sergio Rey; Ming-Hsiang Tsou; Sahar Ghanipoor Machiani (2024). Data and script for "Detecting synthetic population bias using a spatially-oriented framework and independent validation data" [Dataset]. http://doi.org/10.6084/m9.figshare.24664647.v1
    Available download formats: zip
    Dataset updated
    May 15, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Jessica Embury; Atsushi Nara; Sergio Rey; Ming-Hsiang Tsou; Sahar Ghanipoor Machiani
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This folder contains processed and derived data and the script for the manuscript 'Detecting synthetic population bias using a spatially-oriented framework and independent validation data'.

    Abstract: Models of human mobility can be broadly applied to find solutions addressing diverse topics such as public health policy, transportation management, emergency management, and urban development. However, many mobility models require individual-level data that is limited in availability and accessibility. Synthetic populations are commonly used as the foundation for mobility models because they provide detailed individual-level data representing the different types and characteristics of people in a study area. Thorough evaluation of synthetic populations is required to detect data biases before those biases are transferred to subsequent applications. Although synthetic populations are commonly used for modeling mobility, they are conventionally validated by their sociodemographic characteristics rather than their mobility attributes. Mobility microdata provides an opportunity to independently (externally) validate the mobility attributes of synthetic populations. This study demonstrates a spatially-oriented data validation framework and independent data validation to assess the mobility attributes of two synthetic populations at different spatial granularities. Validation using independent data (SafeGraph) and the validation framework replicated the spatial distribution of errors detected using source data (LODES) and total absolute error. Spatial clusters of error exposed the locations of underrepresented and overrepresented communities. This information can guide bias mitigation efforts to generate a more representative synthetic population.

  2. Data from: A global high-resolution and bias-corrected dataset of CMIP6...

    • zenodo.org
    bin
    Updated Sep 20, 2024
    Cite
    Qinqin Kong; Matthew Huber (2024). A global high-resolution and bias-corrected dataset of CMIP6 projected heat stress metrics [Dataset]. http://doi.org/10.5281/zenodo.13799897
    Available download formats: bin
    Dataset updated
    Sep 20, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Qinqin Kong; Matthew Huber
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Motivation

    Increasing heat stress due to climate change poses significant risks to human health and can lead to widespread social and economic consequences. Evaluating these impacts requires reliable datasets of heat stress projections.

    Data Record

    We present a global dataset projecting future dry-bulb, wet-bulb, and wet-bulb globe temperatures under 1-4°C global warming scenarios (at 0.5°C intervals) relative to the preindustrial era, using outputs from 16 CMIP6 global climate models (GCMs) (Table 1). All variables were retrieved from the historical and SSP585 scenarios which were selected to maximize the warming signal.

    The dataset was bias-corrected against ERA5 reanalysis by incorporating the GCM-simulated climate change signal onto the ERA5 baseline (1950-1976) at a 3-hourly frequency. It therefore includes a 27-year sample for each GCM under each warming target.
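
As a rough illustration of adding a model-simulated change signal onto an observational baseline, the sketch below uses a simple additive mean shift. This is an assumption made for illustration only: the description does not say how the authors computed or applied the signal, and the function name and sample values are hypothetical.

```python
from statistics import mean


def delta_adjust(era5_baseline, gcm_reference, gcm_warmer):
    """Shift each baseline value by the GCM mean change (warmer minus reference).

    A simple additive delta, assumed here for illustration; the dataset's
    actual bias-correction procedure may differ.
    """
    delta = mean(gcm_warmer) - mean(gcm_reference)
    return [value + delta for value in era5_baseline]


# Hypothetical temperatures in °C, not taken from the dataset.
era5 = [20.0, 22.0, 24.0]
adjusted = delta_adjust(era5, gcm_reference=[19.0, 21.0], gcm_warmer=[21.0, 23.0])

# The ERA5 baseline spans 1950-1976 inclusive, which gives the 27-year sample.
baseline_years = list(range(1950, 1977))
```

Because the shift is applied to every baseline time step, the adjusted series keeps the length of the baseline period, which is consistent with one 27-year sample per GCM and warming target.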

    The data is provided at a fine spatial resolution of 0.25° x 0.25° and a temporal resolution of 3 hours, and is stored in a self-describing NetCDF format. Filenames follow the pattern "VAR_bias_corrected_3hr_GCM_XC_yyyy.nc", where:

    • "VAR" represents the variable (Ta, Tw, WBGT for dry-bulb, wet-bulb, and wet-bulb globe temperature, respectively),

    • "GCM" denotes the CMIP6 GCM name,

    • "X" indicates the warming target compared to the preindustrial period,

    • "yyyy" represents the year index (0001-0027) of the 27-year sample
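
The naming pattern above can be split into its parts with a regular expression. The following is a minimal sketch, assuming that the warming target is written as a plain number such as "2" or "1.5" directly before the letter C, and that GCM names contain no underscores (true for the names in Table 1). The class and function names are hypothetical.

```python
import re
from typing import NamedTuple


class HeatStressFile(NamedTuple):
    variable: str            # Ta, Tw, or WBGT
    gcm: str                 # CMIP6 model name
    warming_target_c: float  # warming relative to preindustrial, in °C
    year_index: int          # 1..27 within the 27-year sample


# Assumed layout: VAR_bias_corrected_3hr_GCM_XC_yyyy.nc
_PATTERN = re.compile(
    r"^(?P<var>Ta|Tw|WBGT)_bias_corrected_3hr_"
    r"(?P<gcm>[^_]+)_(?P<target>\d+(?:\.\d+)?)C_(?P<year>\d{4})\.nc$"
)


def parse_filename(name: str) -> HeatStressFile:
    """Split a filename into variable, GCM, warming target, and year index."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected filename: {name!r}")
    year = int(match["year"])
    if not 1 <= year <= 27:
        raise ValueError(f"year index out of range 0001-0027: {name!r}")
    return HeatStressFile(
        match["var"], match["gcm"], float(match["target"]), year
    )
```

For example, `parse_filename("Tw_bias_corrected_3hr_MIROC6_2C_0001.nc")` yields the wet-bulb temperature file for MIROC6 at a 2°C target, year index 1. If the real files format the target differently, only the `target` group of the pattern needs to change.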

    Table 1 CMIP6 GCMs used for generating the dataset for Ta, Tw and WBGT.

    GCM              Realization  GCM grid spacing  Ta  Tw  WBGT
    ACCESS-CM2       r1i1p1f1     1.25° x 1.875°
    BCC-CSM2-MR      r1i1p1f1     1.1° x 1.125°
    CanESM5          r1i1p2f1     2.8° x 2.8°
    CMCC-CM2-SR5     r1i1p1f1     0.94° x 1.25°
    CMCC-ESM2        r1i1p1f1     0.94° x 1.25°
    CNRM-CM6-1       r1i1p1f2     1.4° x 1.4°
    EC-Earth3        r1i1p1f1     0.7° x 0.7°
    GFDL-ESM4        r1i1p1f1     1.0° x 1.25°
    HadGEM3-GC31-LL  r1i1p1f3     1.25° x 1.875°
    HadGEM3-GC31-MM  r1i1p1f3     0.55° x 0.83°
    KACE-1-0-G       r1i1p1f1     1.25° x 1.875°
    KIOST-ESM        r1i1p1f1     1.9° x 1.9°
    MIROC-ES2L       r1i1p1f2     2.8° x 2.8°
    MIROC6           r1i1p1f1     1.4° x 1.4°
    MPI-ESM1-2-HR    r1i1p1f1     0.93° x 0.93°
    MPI-ESM1-2-LR    r1i1p1f1     1.85° x 1.875°

    Data Access

    An inventory of the dataset is available in this repository. The complete dataset, approximately 57 TB in size, is freely accessible via Purdue Fortress' long-term archive through Globus at Globus Link. After clicking the link, users may be prompted to log in with a Purdue institutional Globus account. You can switch to your institutional account, or log in via a personal Globus ID, Gmail, GitHub handle, or ORCID ID. Alternatively, the dataset can be accessed by searching for the universally unique identifier (UUID): "6538f53a-1ea7-4c13-a0cf-10478190b901" in Globus.

    Dataset Validation

    We validate the bias-correction method and show that it significantly enhances the GCMs' accuracy in reproducing both the annual average and the full range of quantiles for all metrics within an ERA5 reference climate state. This dataset is expected to support future research on projected changes in mean and extreme heat stress and the assessment of related health and socio-economic impacts.

    For a detailed introduction to the dataset and its validation, please refer to our data descriptor currently under review at Scientific Data. We will update this information upon publication.









  3. EartH2Observe, WFDEI and ERA-Interim data Merged and Bias-corrected for...

    • dataservices.gfz-potsdam.de
    • explore.openaire.eu
    Updated 2016
    Cite
    Stefan Lange (2016). EartH2Observe, WFDEI and ERA-Interim data Merged and Bias-corrected for ISIMIP (EWEMBI) [Dataset]. http://doi.org/10.5880/pik.2016.004
    Dataset updated
    2016
    Dataset provided by
    GFZ Data Services
    datacite
    Authors
    Stefan Lange
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The EWEMBI dataset was compiled to support the bias correction of climate input data for the impact assessments carried out in phase 2b of the Inter-Sectoral Impact Model Intercomparison Project (ISIMIP2b; Frieler et al., 2017), which will contribute to the 2018 IPCC special report on the impacts of global warming of 1.5°C above pre-industrial levels and related global greenhouse gas emission pathways. The EWEMBI data cover the entire globe at 0.5° horizontal and daily temporal resolution from 1979 to 2013. Data sources of EWEMBI are ERA-Interim reanalysis data (ERAI; Dee et al., 2011), WATCH forcing data methodology applied to ERA-Interim reanalysis data (WFDEI; Weedon et al., 2014), eartH2Observe forcing data (E2OBS; Calton et al., 2016) and NASA/GEWEX Surface Radiation Budget data (SRB; Stackhouse Jr. et al., 2011). The SRB data were used to bias-correct E2OBS shortwave and longwave radiation (Lange, 2018). Variables included in the EWEMBI dataset are Near Surface Relative Humidity, Near Surface Specific Humidity, Precipitation, Snowfall Flux, Surface Air Pressure, Surface Downwelling Longwave Radiation, Surface Downwelling Shortwave Radiation, Near Surface Wind Speed, Near-Surface Air Temperature, Daily Maximum Near Surface Air Temperature, Daily Minimum Near Surface Air Temperature, Eastward Near-Surface Wind and Northward Near-Surface Wind. For data sources, units and short names of all variables see Frieler et al. (2017, Table 1).

  4. Validation and automation of the attention bias test for anxious states in...

    • researchdata.edu.au
    • data.csiro.au
    datadownload
    Updated Aug 4, 2021
    + more versions
    Cite
    Caroline Lee; Sue Belson; Ian Colditz; Jessica Monk; Susan Belson; Jessica Monk; Ian Colditz; Caroline Lee (2021). Validation and automation of the attention bias test for anxious states in sheep (AEC16/19) [Dataset]. http://doi.org/10.25919/AFGX-EP76
    Available download formats: datadownload
    Dataset updated
    Aug 4, 2021
    Dataset provided by
    CSIRO (http://www.csiro.au/)
    Authors
    Caroline Lee; Sue Belson; Ian Colditz; Jessica Monk; Susan Belson; Jessica Monk; Ian Colditz; Caroline Lee
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Aug 22, 2016 - Aug 22, 2017
    Description

    Assessment of emotional states is becoming an increasingly important part of animal welfare research, but emotional state is hard to measure and often requires time-consuming or complicated tests. A threat perception test has been developed as a measure of anxiety in sheep and validated using pharmacological models of anxiety. While it appears that the responses we measure in the test are directed towards the threat of a dog, a controlled study had not been conducted to confirm this. The main objective of this study was therefore to further investigate the behavioural responses of sheep in the threat perception test to differentiate between responses to the dog versus responses to the novel testing environment itself. A secondary aim of this study was to automate some of the behavioural measures taken during the test. The collection of key threat perception measures (vigilance and attention to threat) from videos is a long and labour-intensive process. Accelerometers or similar devices have been used previously on sheep, attached via halters or collars, to monitor animal movements and feeding behaviours. This study investigated the use of such devices to automate the collection of vigilance and attention to threat data, making the test faster, more practical and more accurate. Importantly, this study aimed to determine whether the attachment of data loggers to sheep would alter the behaviour of the animals during testing.

    Lineage: Details of the methods used to produce this data have been published and can be found at https://doi.org/10.1371/journal.pone.0190404 (see methods for Experiment 2).

  5. Full Simulation Data, Validation Indices, and Frozen Repository for the...

    • zenodo.org
    zip
    Updated Apr 25, 2025
    Cite
    Arkaprabha Ganguli; Jeremy Feinstein; Ibraheem Raji; Akintomide Afolayan Akinsanola; Connor Aghili; Chunyong Jung; Jordan Branham; Thomas Wall; Whitney Huang; Veerabhadra Rao Kotamarthi (2025). Full Simulation Data, Validation Indices, and Frozen Repository for the Empirical Mode Decomposition Based Bias Correction Approach [Dataset]. http://doi.org/10.5281/zenodo.15244202
    Available download formats: zip
    Dataset updated
    Apr 25, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Arkaprabha Ganguli; Jeremy Feinstein; Ibraheem Raji; Akintomide Afolayan Akinsanola; Connor Aghili; Chunyong Jung; Jordan Branham; Thomas Wall; Whitney Huang; Veerabhadra Rao Kotamarthi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This archive provides the full dataset and reproducibility materials for the EMDBC (Empirical Mode Decomposition-Based Bias Correction) method. It includes:

    • dataset.zip: Dataset containing the full spatial temperature time series for WRF simulations and validation indices:
      • ccsm_1995-2004_daily_t2.nc, ccsm_2045-2054_daily_t2.nc, and ccsm_2085-2094_daily_t2.nc: WRF-simulated near-surface air temperature data for historical and future periods.
      • validation_area_indices.txt: Spatial grid indices used for validation in the EMDBC evaluation.
    • repository.zip: A frozen snapshot of the GitHub repository at the time of publication. This includes the EMDBC source code and a minimal working example using sample time series.
  6. Laying hen attention bias test data

    • researchdata.edu.au
    • data.csiro.au
    datadownload
    Updated Apr 22, 2019
    Cite
    Jim Lea; Sue Belson; Caroline Lee; Dana Campbell; Susan Belson; James Lea; Dana L.M. Campbell; Caroline Lee (2019). Laying hen attention bias test data [Dataset]. http://doi.org/10.25919/5CBD1CA76AA19
    Available download formats: datadownload
    Dataset updated
    Apr 22, 2019
    Dataset provided by
    CSIRO (http://www.csiro.au/)
    Authors
    Jim Lea; Sue Belson; Caroline Lee; Dana Campbell; Susan Belson; James Lea; Dana L.M. Campbell; Caroline Lee
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2012 - Jan 1, 2017
    Description

    Data for application of an attention bias test in free-range laying hens including pharmacological validation of the test using the anxiogenic drug m-CPP. Lineage: All data were obtained by staff and students employed within the Agriculture and Food business unit at the FD McMaster Laboratory, Chiswick or through the University of New England, Armidale, NSW.

  7. Hate Crime Incident (Open Data)

    • data.tempe.gov
    • performance.tempe.gov
    • +7more
    Updated Jan 17, 2024
    Cite
    City of Tempe (2024). Hate Crime Incident (Open Data) [Dataset]. https://data.tempe.gov/datasets/tempegov::hate-crime-incident-open-data-1/about
    Dataset updated
    Jan 17, 2024
    Dataset authored and provided by
    City of Tempe
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Tempe Police Department prides itself on its continued efforts to reduce harm within the community and is providing this dataset on hate crime incidents that occur in Tempe.

    The Tempe Police Department documents the type of bias that motivated a hate crime according to those categories established by the FBI. These include crimes motivated by biases based on race and ethnicity, religion, sexual orientation, disability, gender and gender identity.

    The Bias Type categories provided in the data come from the Bias Motivation Categories as defined in the Federal Bureau of Investigation (FBI) National Incident-Based Reporting System (NIBRS) manual, version 2020.1 dated 4/15/2021. The FBI NIBRS manual can be found at https://www.fbi.gov/file-repository/ucr/ucr-2019-1-nibrs-user-manua-093020.pdf with the Bias Motivation Categories found on pages 78-79.

    Although data is updated monthly, there is a delay of one month to allow for data validation and submission.

    Information about Tempe Police Department's collection and reporting process for possible hate crimes is included in https://storymaps.arcgis.com/stories/a963e97ca3494bfc8cd66d593eebabaf.

    Additional Information

    • Source: Data are from the Law Enforcement Records Management System (RMS)
    • Contact: Angelique Beltran
    • Contact E-Mail: angelique_beltran@tempe.gov
    • Data Source Type: Tabular
    • Preparation Method: Data from the Law Enforcement Records Management System (RMS) are entered by the Tempe Police Department into a GIS mapping system, which automatically publishes to open data.
    • Publish Frequency: Monthly
    • Publish Method: New data entries are automatically published to open data.

  8. CV errors for the 5-Fold-CV.

    • plos.figshare.com
    xls
    Updated Jun 3, 2023
    Cite
    Chao-Yu Guo; Tse-Wei Liu; Yi-Hau Chen (2023). CV errors for the 5-Fold-CV. [Dataset]. http://doi.org/10.1371/journal.pone.0244094.t002
    Available download formats: xls
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Chao-Yu Guo; Tse-Wei Liu; Yi-Hau Chen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    CV errors for the 5-Fold-CV.

  9. Data from: EartH2Observe, WFDEI and ERA-Interim data Merged and...

    • dataservices.gfz-potsdam.de
    Updated 2019
    Cite
    Stefan Lange (2019). EartH2Observe, WFDEI and ERA-Interim data Merged and Bias-corrected for ISIMIP (EWEMBI) [Dataset]. http://doi.org/10.5880/pik.2019.004
    Dataset updated
    2019
    Dataset provided by
    GFZ Data Services
    datacite
    Authors
    Stefan Lange
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    VERSION HISTORY: On June 26, 2018 all files were republished due to the incorporation of additional observational data covering years 2014 to 2016. Prior to that date, the dataset only covered years 1979 to 2013. Data for all years prior to 2014 are identical in this and the original version of the dataset.

    DATA DESCRIPTION: The EWEMBI dataset was compiled to support the bias correction of climate input data for the impact assessments carried out in phase 2b of the Inter-Sectoral Impact Model Intercomparison Project (ISIMIP2b; Frieler et al., 2017), which will contribute to the 2018 IPCC special report on the impacts of global warming of 1.5°C above pre-industrial levels and related global greenhouse gas emission pathways. The EWEMBI data cover the entire globe at 0.5° horizontal and daily temporal resolution from 1979 to 2013. Data sources of EWEMBI are ERA-Interim reanalysis data (ERAI; Dee et al., 2011), WATCH forcing data methodology applied to ERA-Interim reanalysis data (WFDEI; Weedon et al., 2014), eartH2Observe forcing data (E2OBS; Calton et al., 2016) and NASA/GEWEX Surface Radiation Budget data (SRB; Stackhouse Jr. et al., 2011). The SRB data were used to bias-correct E2OBS shortwave and longwave radiation (Lange, 2018). Variables included in the EWEMBI dataset are Near Surface Relative Humidity, Near Surface Specific Humidity, Precipitation, Snowfall Flux, Surface Air Pressure, Surface Downwelling Longwave Radiation, Surface Downwelling Shortwave Radiation, Near Surface Wind Speed, Near-Surface Air Temperature, Daily Maximum Near Surface Air Temperature, Daily Minimum Near Surface Air Temperature, Eastward Near-Surface Wind and Northward Near-Surface Wind. For data sources, units and short names of all variables see Frieler et al. (2017, Table 1).

  10. Data from: Data files belonging to the paper "Dealing with clustered samples...

    • data.niaid.nih.gov
    Updated Jul 16, 2024
    Cite
    van Ebbenhorst Tengbergen, Tom (2024). Data files belonging to the paper "Dealing with clustered samples for assessing map accuracy by cross-validation" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6513428
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    van Ebbenhorst Tengbergen, Tom
    Brus, Dick
    Heuvelink, Gerard
    Wadoux, Alexandre
    de Bruin, Sytze
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Mapping of environmental variables often relies on map accuracy assessment through cross-validation with the data used for calibrating the underlying mapping model. When the data points are spatially clustered, conventional cross-validation leads to optimistically biased estimates of map accuracy. Several papers have promoted spatial cross-validation as a means to tackle this over-optimism. Many of these papers blame spatial autocorrelation as the cause of the bias and propagate the widespread misconception that spatial proximity of calibration points to validation points invalidates classical statistical validation of maps. In the paper related to these data, we present and evaluate alternative cross-validation approaches for assessing map accuracy from clustered sample data.

    The study area is western Europe, constrained in the north at 52° latitude and at -10° and 24° longitude. The projection is IGNF:ETRS89LAEA (Lambert azimuthal equal area projection).

    Files:

    • agb.tif = above ground biomass (AGB) map from version 3 of the 2017 CCI-Biomass product (https://catalogue.ceda.ac.uk/uuid/5f331c418e9f4935b8eb1b836f8a91b8)
    • AGBstack.tif = covariates used for predicting AGB
    • aggArea.tif = coarse grid used for simulation in the model-based methods
    • ocs.tif = soil organic carbon stock (OCS) map (0-30 cm) from Soilgrids (https://www.isric.org/explore/soilgrids)
    • OCSstack.tif = covariates used for predicting OCS
    • strata.xxx = 100 compact geo-strata (ESRI shape) created with the spcosa package; used for generating clustered samples
    • TOTmask.tif = mask of the area covered by the covariates

    Details and data sources of the covariates in AGBstack.tif and OCSstack.tif:

    Name | Description | Source | Note
    ai | Aridity Index | https://chelsa-climate.org/downloads/ | Version 2.1
    bio1 | Mean annual air temperature [°C] | https://chelsa-climate.org/downloads/ | Version 2.1
    bio5 | Mean daily maximum air temperature of the warmest month [°C] | https://chelsa-climate.org/downloads/ | Version 2.1
    bio7 | Annual range of air temperature [°C] | https://chelsa-climate.org/downloads/ | Version 2.1
    bio12 | Annual precipitation [kg/m2] | https://chelsa-climate.org/downloads/ | Version 2.1
    bio15 | Precipitation seasonality [kg/m2] | https://chelsa-climate.org/downloads/ | Version 2.1
    gdd10 | Growing degree days heat sum above 10°C | https://chelsa-climate.org/downloads/ | Version 2.1
    clay | Clay content [g/kg] of the 0-5cm layer | https://soilgrids.org/ | Only used for AGB
    sand | Sand content [g/kg] of the 0-5cm layer | https://soilgrids.org/ | as above
    pH | Acidity (pH(water)) of the 0-5cm layer | https://soilgrids.org/ | as above
    glc2017 | Landcover 2017 | https://land.copernicus.eu/global/products/lc, reclassified to: closed forest, open forest, natural non-forest veg., bare & sparse veg., cropland, built-up, water | Categorical variable
    dem | Elevation | https://www.eea.europa.eu/data-and-maps/data/copernicus-land-monitoring-service-eu-dem |
    cosasp | Cosine of slope aspect | Computed with the terra package from elevation | Computed @25m resolution; next aggregated to 0.5km
    sinasp | Sine of slope aspect | Computed with the terra package from elevation | as above
    slope | Slope | Computed with the terra package from elevation | as above
    TPI | Topographic position index | Computed with the terra package from elevation | as above
    TRI | Terrain ruggedness index | Computed with the terra package from elevation | as above
    TWI | Topographic wetness index | Computed with SAGA from 500m resolution (aggregated) dem |
    gedi | Forest height | https://glad.umd.edu/dataset/gedi | Zone: NAFR
    xcoord | X coordinate | Using a mask created from the other covariates |
    ycoord | Y coordinate | Using a mask created from the other covariates |
    Dcoast | Distance from coast | Using a land mask created from the other covariates |

  11. Data from: Validation of COI metabarcoding primers for terrestrial...

    • dataone.org
    • datadryad.org
    Updated Jun 2, 2025
    Cite
    Vasco Elbrecht; Thomas W. A. Braukmann; Natalia V. Ivanova; Sean W. J. Prosser; Mehrdad Hajibabaei; Michael Wright; Evgeny V. Zakharov; Paul D. N. Hebert; Dirk Steinke (2025). Validation of COI metabarcoding primers for terrestrial arthropods [Dataset]. http://doi.org/10.5061/dryad.249rk92
    Dataset updated
    Jun 2, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Vasco Elbrecht; Thomas W. A. Braukmann; Natalia V. Ivanova; Sean W. J. Prosser; Mehrdad Hajibabaei; Michael Wright; Evgeny V. Zakharov; Paul D. N. Hebert; Dirk Steinke
    Time period covered
    Jan 1, 2019
    Description

    Metabarcoding can rapidly determine the species composition of bulk samples and thus aids biodiversity and ecosystem assessment. However, it is essential to use primer sets that minimize amplification bias among taxa to maximize species recovery. Despite this fact, the performance of primer sets employed for metabarcoding terrestrial arthropods has not been sufficiently evaluated. This study tests the performance of 36 primer sets on a mock community containing 374 insect species. Amplification success was assessed with gradient PCRs and the 21 most promising primer sets selected for metabarcoding. These 21 primer sets were also tested by metabarcoding a Malaise trap sample. We identified eight primer sets, mainly those including inosine and/or high degeneracy, that recovered more than 95% of the species in the mock community. Results from the Malaise trap sample were congruent with the mock community, but primer sets generating short amplicons produced potential false positives. Taxon ...

  12. Data from: Deep Reinforcement Learning Enables Better Bias Control in...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Feb 16, 2024
    Cite
    Li, Shan (2024). Deep Reinforcement Learning Enables Better Bias Control in Benchmark for Virtual Screening [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7861684
    Dataset updated
    Feb 16, 2024
    Dataset provided by
    Wu, Song
    Zhang, Liangren
    Wang, Dongmei
    Li, Shan
    Xia, Jie
    Shen, Tao
    Wang, Simon, Xiang
    License

    Apache License 2.0: http://www.apache.org/licenses/LICENSE-2.0

    Description

    This compressed file contains all datasets made for the validation of MUBDsyn.

    • datasets_int_val: 17 cases in this folder are derived from MUBD for GPCRs. MUBDreal was made by MUBD-DecoyMaker2.0 and MUBDsyn was made by MUBD-DecoyMakersyn.
    • datasets_ext_val_classical_VS: Five cases in this folder are derived from the shared cases of MUV and DUD-E. The active sets of MUV were taken as the input to make corresponding MUBD datasets. Files in SBVS are raw molecular docking results by smina.
    • datasets_ext_val_SI_classical_VS: DeepCoy and TocoDecoy were used to make the datasets corresponding to the same five cases above. The data of DeepCoy was directly retrieved from DeepCoy resources at OPIG while topology decoys of TocoDecoy_9W were made based on the scripts provided at TocoDecoy GitHub Repository. Files in SBVS are raw molecular docking results by smina.
    • datasets_ext_val_ML_VS: Ten cases in this folder are derived from NRLiSt-BDB. Corresponding MUBD datasets were made as described above.

    All these datasets can be used for the reproduction of validation performed in the manuscript or to benchmark various virtual screening methods.

  13. Data from: Iterated Data Sharpening

    • tandf.figshare.com
    zip
    Updated Jul 12, 2024
    Cite
    Hanxiao Chen; W. John Braun; Xiaoping Shi (2024). Iterated Data Sharpening [Dataset]. http://doi.org/10.6084/m9.figshare.25949771.v1
    Available download formats: zip
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Taylor & Francis
    Authors
    Hanxiao Chen; W. John Braun; Xiaoping Shi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data sharpening in kernel regression has been shown to be an effective method of reducing bias while having minimal effects on variance. Earlier efforts to iterate the data sharpening procedure have been less effective, due to the employment of an inappropriate sharpening transformation. In this article, an iterated data sharpening algorithm is proposed which reduces the asymptotic bias at each iteration, while having modest effects on the variance. The efficacy of the iterative approach is demonstrated theoretically and via a simulation study. Boundary effects persist and the affected region successively grows when the iteration is applied to local constant regression. By contrast, boundary bias successively decreases for each iteration step when applied to local linear regression. This study also shows that after iteration, the resulting estimates are less sensitive to bandwidth choice, and a further simulation study demonstrates that iterated data sharpening with data-driven bandwidth selection via cross-validation can lead to more accurate regression function estimation. Examples with real data are used to illustrate the scope of change made possible by using iterated data sharpening and to also identify its limitations. Supplementary materials for this article are available online.

  14. Data from: Safari science: assessing the reliability of citizen science data...

    • search.dataone.org
    • data.niaid.nih.gov
    Updated Apr 2, 2025
    Cara Steger; Bilal Butt; Mevin B. Hooten (2025). Safari science: assessing the reliability of citizen science data for wildlife surveys [Dataset]. http://doi.org/10.5061/dryad.mb7qk
    Dataset updated
    Apr 2, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Cara Steger; Bilal Butt; Mevin B. Hooten
    Time period covered
    Jan 1, 2018
    Description
    1. Protected areas are the cornerstone of global conservation, yet financial support for basic monitoring infrastructure is lacking in 60% of them. Citizen science holds potential to address these shortcomings in wildlife monitoring, particularly for resource-limited conservation initiatives in developing countries, if we can account for the reliability of data produced by volunteer citizen scientists (VCS).
    2. This study tests the reliability of VCS data vs. data produced by trained ecologists, presenting a hierarchical framework for integrating diverse datasets to assess extra variability from VCS data.
    3. Our results show that, while VCS data are likely to be overdispersed for our system, the overdispersion varies widely by species. We contend that citizen science methods, within the context of East African drylands, may be more appropriate for species with large body sizes, which are relatively rare, or those that form small herds. VCS perceptions of the charisma of a species ma...
  15. Credit Card Application Decisions

    • gomask.ai
    csv
    Updated Jul 21, 2025
    GoMask.ai (2025). Credit Card Application Decisions [Dataset]. https://gomask.ai/marketplace/datasets/credit-card-application-decisions
    Available download formats: csv (Unknown)
    Dataset updated
    Jul 21, 2025
    Dataset provided by
    GoMask.ai
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    age, gender, occupation, risk_score, reviewed_by, address_city, applicant_id, credit_score, address_state, annual_income, and 19 more
    Description

    This dataset contains detailed synthetic records of credit card applications, including applicant demographics, financial profiles, application outcomes, and risk assessments. It is ideal for validating credit scoring models, detecting bias, and supporting regulatory compliance or fairness analysis in financial services. The flat schema design enables seamless integration with analytics and machine learning workflows.

  16. Data from: A comprehensive analysis of autocorrelation and bias in home...

    • borealisdata.ca
    • search.dataone.org
    Updated May 19, 2021
    Michael J. Noonan; Marlee A. Tucker; Christen H. Fleming; Tom S. Akre; Susan C. Alberts; Abdullahi H. Ali; Jeanne Altmann; Pamela C. Antunes; Jerrold L. Belant; Dean Beyer; Niels Blaum; Katrin Böhning-Gaese; Laury Cullen Jr.; Rogerio de Paula Cunha; Jasja Dekker; Jonathan Drescher-Lehman; Nina Farwig; Claudia Fichtel; Christina Fischer; Adam T. Ford; Jacob R. Goheen; René Janssen; Florian Jeltsch; Matthew Kauffman; Peter M. Kappeler; Flávia Koch; Scott LaPoint; A. Catherine Markham; Emilia Patricia Medici; Ronaldo G. Morato; Ran Nathan; Luiz Gustavo R. Oliveira-Santos; Kirk A. Olson; Bruce D. Patterson; Agustin Paviolo; Emiliano E. Ramalho; Sascha Rosner; Nuria Selva; Agnieszka Sergiel; Marina X. da Silva; Orr Spiegel; Peter Thompson; Wiebke Ullmann; Filip Zięba; Tomasz Zwijacz-Kozica; William F. Fagan; Thomas Mueller; Justin M. Calabrese (2021). Data from: A comprehensive analysis of autocorrelation and bias in home range estimation [Dataset]. http://doi.org/10.5683/SP2/OAJTAO
    Dataset updated
    May 19, 2021
    Dataset provided by
    Borealis
    Authors
    Michael J. Noonan; Marlee A. Tucker; Christen H. Fleming; Tom S. Akre; Susan C. Alberts; Abdullahi H. Ali; Jeanne Altmann; Pamela C. Antunes; Jerrold L. Belant; Dean Beyer; Niels Blaum; Katrin Böhning-Gaese; Laury Cullen Jr.; Rogerio de Paula Cunha; Jasja Dekker; Jonathan Drescher-Lehman; Nina Farwig; Claudia Fichtel; Christina Fischer; Adam T. Ford; Jacob R. Goheen; René Janssen; Florian Jeltsch; Matthew Kauffman; Peter M. Kappeler; Flávia Koch; Scott LaPoint; A. Catherine Markham; Emilia Patricia Medici; Ronaldo G. Morato; Ran Nathan; Luiz Gustavo R. Oliveira-Santos; Kirk A. Olson; Bruce D. Patterson; Agustin Paviolo; Emiliano E. Ramalho; Sascha Rosner; Nuria Selva; Agnieszka Sergiel; Marina X. da Silva; Orr Spiegel; Peter Thompson; Wiebke Ullmann; Filip Zięba; Tomasz Zwijacz-Kozica; William F. Fagan; Thomas Mueller; Justin M. Calabrese
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Global
    Dataset funded by
    National Science Foundation
    Description

    Abstract: Home range estimation is routine practice in ecological research. While advances in animal tracking technology have increased our capacity to collect data to support home range analysis, these same advances have also resulted in increasingly autocorrelated data. Consequently, the question of which home range estimator to use on modern, highly autocorrelated tracking data remains open. This question is particularly relevant given that most estimators assume independently sampled data. Here, we provide a comprehensive evaluation of the effects of autocorrelation on home range estimation. We base our study on an extensive dataset of GPS locations from 369 individuals representing 27 species distributed across 5 continents. We first assemble a broad array of home range estimators, including Kernel Density Estimation (KDE) with four bandwidth optimizers (Gaussian reference function, autocorrelated-Gaussian reference function (AKDE), Silverman's rule of thumb, and least squares cross-validation), Minimum Convex Polygon, and Local Convex Hull methods. Notably, all of these estimators except AKDE assume independent and identically distributed (IID) data. We then employ half-sample cross-validation to objectively quantify estimator performance, and the recently introduced effective sample size for home range area estimation ($\hat{N}_\mathrm{area}$) to quantify the information content of each dataset. We found that AKDE 95% area estimates were larger than conventional IID-based estimates by a mean factor of 2. The median number of cross-validated locations included in the holdout sets by AKDE 95% (or 50%) estimates was 95.3% (or 50.1%), confirming the larger AKDE ranges were appropriately selective at the specified quantile. Conversely, conventional estimates exhibited negative bias that increased with decreasing $\hat{N}_\mathrm{area}$.
    To contextualize our empirical results, we performed a detailed simulation study to tease apart how sampling frequency, sampling duration, and the focal animal's movement conspire to affect range estimates. Paralleling our empirical results, the simulation study demonstrated that AKDE was generally more accurate than conventional methods, particularly for small $\hat{N}_\mathrm{area}$. While 72% of the 369 empirical datasets had >1000 total observations, only 4% had an $\hat{N}_\mathrm{area}$ >1000, where 30% had an $\hat{N}_\mathrm{area}$ <30. In this frequently encountered scenario of small $\hat{N}_\mathrm{area}$, AKDE was the only estimator capable of producing an accurate home range estimate on autocorrelated data.
    Usage notes: Empirical GPS tracking data. Anonymised, empirical tracking data used to estimate home range areas based on various home range estimators. File: Anonymised_Data.zip

  17. Data for "Training data composition affects performance of protein structure...

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated Oct 1, 2021
    Alexander Derry; Kristy A. Carpenter; Russ B. Altman (2021). Data for "Training data composition affects performance of protein structure analysis algorithms" by A. Derry, K. A. Carpenter, & R. B. Altman [Dataset]. http://doi.org/10.5281/zenodo.5542201
    Available download formats: application/gzip
    Dataset updated
    Oct 1, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alexander Derry; Kristy A. Carpenter; Russ B. Altman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains all data used in "Training data composition affects performance of protein structure analysis algorithms", published in the Pacific Symposium on Biocomputing 2022 by A. Derry, K. A. Carpenter, & R. B. Altman.

    The data consists of the following files:

    • ema_zenodo_data.tar.gz: train, validation, and test splits for Estimation of Model Accuracy task, in LMDB format
    • design_zenodo_data.tar.gz: train, validation, and test splits for Protein Sequence Design task, in JSON format
    • enz_cat_res_zenodo_data.tar.gz: train, validation, and test splits for Catalytic Residue and Enzyme Prediction task, in TF record format

    Details on dataset construction can be found in our paper, and dataloaders can be found in our GitHub repo.

    Reference

    A. Derry*, K. A. Carpenter*, & R. B. Altman, "Training data composition affects performance of protein structure analysis algorithms", 2021.

    Dataset References

    Datasets used were derived from the following works:

    Kryshtafovych, A., Schwede, T., Topf, M., Fidelis, K., & Moult, J. (2019). Critical assessment of methods of protein structure prediction (CASP)—Round XIII. In Proteins: Structure, Function and Bioinformatics (Vol. 87, Issue 12, pp. 1011–1020). https://doi.org/10.1002/prot.25823

    Ingraham, J., Garg, V. K., Barzilay, R., & Jaakkola, T. (2019). Generative Models for Graph-Based Protein Design. https://openreview.net/pdf?id=SJgxrLLKOE

    Furnham, N., Holliday, G. L., de Beer, T. A. P., Jacobsen, J. O. B., Pearson, W. R., & Thornton, J. M. (2014). The Catalytic Site Atlas 2.0: cataloging catalytic sites and residues identified in enzymes. Nucleic Acids Research, 42 (Database issue), D485–D489.

  18. Data from: Evaluating citizen vs. professional data for modelling...

    • datadryad.org
    zip
    Updated Apr 25, 2017
    Courtney A. Tye; Robert A. McCleery; Robert J. Fletcher; Daniel U. Greene; Ryan S. Butryn (2017). Evaluating citizen vs. professional data for modelling distributions of a rare squirrel [Dataset]. http://doi.org/10.5061/dryad.8t475
    Available download formats: zip
    Dataset updated
    Apr 25, 2017
    Dataset provided by
    Dryad
    Authors
    Courtney A. Tye; Robert A. McCleery; Robert J. Fletcher; Daniel U. Greene; Ryan S. Butryn
    Time period covered
    Apr 25, 2016
    Area covered
    Florida
    Description

    Squirrel locations: Fox squirrel locations entered on website from public and professionals. File contains x,y coordinates and associated covariates. File: presencedata_final.csv
    Squirrel validation points: Fox squirrel occurrence data from camera trapping. File: valid_final.csv

  19. Data from: Data for PAN at SemEval 2019 Task 4: Hyperpartisan News Detection...

    • zenodo.org
    bin, zip
    Updated Dec 13, 2021
    Johannes Kiesel; Maria Mestre; Rishabh Shukla; Emmanuel Vincent; David Corney; Payam Adineh; Benno Stein; Martin Potthast (2021). Data for PAN at SemEval 2019 Task 4: Hyperpartisan News Detection [Dataset]. http://doi.org/10.5281/zenodo.1489920
    Available download formats: zip, bin
    Dataset updated
    Dec 13, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Johannes Kiesel; Maria Mestre; Rishabh Shukla; Emmanuel Vincent; David Corney; Payam Adineh; Benno Stein; Martin Potthast
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Training and validation data for the PAN @ SemEval 2019 Task 4: Hyperpartisan News Detection.

    The data is split into multiple files. The articles are contained in the files with names starting with "articles-" (which validate against the XML schema article.xsd). The ground-truth information is contained in the files with names starting with "ground-truth-" (which validate against the XML schema ground-truth.xsd).

    The first part of the data (filename contains "bypublisher") is labeled by the overall bias of the publisher as provided by BuzzFeed journalists or MediaBiasFactCheck.com. It contains a total of 750,000 articles, half of which (375,000) are hyperpartisan and half of which are not. Half of the articles that are hyperpartisan (187,500) are on the left side of the political spectrum, half are on the right side. This data is split into a training set (80%, 600,000 articles) and a validation set (20%, 150,000 articles), where no publisher that occurs in the training set also occurs in the validation set. Similarly, none of the publishers in those sets will occur in the test set.
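
    A publisher-disjoint split like the one described above can be sketched as follows. This is only an illustration under stated assumptions, not the task organizers' procedure: the function name `split_by_publisher`, the `(article_id, publisher)` input shape, and the toy data are all hypothetical.

```python
import random
from collections import defaultdict


def split_by_publisher(articles, validation_fraction=0.2, seed=0):
    """Split (article_id, publisher) pairs so that no publisher spans both sets.

    Hypothetical helper: whole publishers are assigned to one side or the other.
    """
    by_publisher = defaultdict(list)
    for article_id, publisher in articles:
        by_publisher[publisher].append(article_id)

    # Shuffle publishers (not articles) so the assignment is reproducible.
    publishers = sorted(by_publisher)
    random.Random(seed).shuffle(publishers)

    target = validation_fraction * len(articles)
    train, validation = [], []
    for publisher in publishers:
        # Fill the validation set first, one whole publisher at a time.
        bucket = validation if len(validation) < target else train
        bucket.extend(by_publisher[publisher])
    return train, validation


# Hypothetical toy data: 10 publishers with 10 articles each.
articles = [(f"a{p}-{i}", f"publisher-{p}") for p in range(10) for i in range(10)]
train_ids, validation_ids = split_by_publisher(articles)
```

    Because publishers are assigned whole, the resulting split sizes only approximate the requested fraction when publishers differ in size.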

    The second part of the data (filename contains "byarticle") is labeled through crowdsourcing on an article basis. The data contains only articles for which a consensus among the crowdsourcing workers existed. It contains a total of 645 articles. Of these, 238 (37%) are hyperpartisan and 407 (63%) are not. We will use a similar (but balanced!) test set. Again, none of the publishers in this set will occur in the test set.

    Note that article IDs are only unique within the parts.


    The collection (including labels) is licensed under a Creative Commons Attribution 4.0 International License.

    Acknowledgements: Thanks to Jonathan Miller for his assistance in cleaning the data!

  20. Validation Condition.csv

    • psycharchives.org
    Updated Dec 14, 2021
    (2021). Validation Condition.csv [Dataset]. https://psycharchives.org/handle/20.500.12034/4695
    Dataset updated
    Dec 14, 2021
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Data set for: P.G. Martins, A.; Köbrich, M.V.; Carstengerdes, N. & Biella, M. (submitted). All’s Well That Ends Well? Outcome Bias in Pilots During Instrument Flight Rules. Applied Cognitive Psychology. Data set for the two conditions, including the codebook: data Validation Condition

