Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the data set used to define the representativeness heuristic in trauma triage. We performed a retrospective observational cohort study of moderately to severely injured patients who presented to non-trauma centers at UPMC from 2010 to 2014. We identified these patients using a validated algorithm to convert ICD-9 discharge codes into Abbreviated Injury Scale scores and Injury Severity Scores. We then abstracted initial encounter notes from the UPMC medical record for these patients and coded them for evidence of "representative" characteristics. We looked for differences in the presence of these characteristics, by injury subgroup, between patients who were appropriately transferred to a trauma center and those who were not. We then performed a multivariate logistic regression with random effects for hospital to identify the effect of having any representative characteristics at all on the odds of transfer while adjusting for other covariates.
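As a rough illustration of the last scoring step mentioned above, the sketch below computes an Injury Severity Score (ISS) from Abbreviated Injury Scale (AIS) severities using the conventional definition: the sum of the squares of the highest AIS severity in each of the three most severely injured body regions, with any AIS of 6 setting the ISS to 75. It does not reproduce the validated ICD-9 conversion algorithm used in the study; the function name, region labels, and example values are hypothetical.

```python
from typing import Mapping


def injury_severity_score(max_ais_by_region: Mapping[str, int]) -> int:
    """Compute ISS from the highest AIS severity recorded in each body region.

    Region names are arbitrary labels chosen for this sketch.
    """
    scores = list(max_ais_by_region.values())
    if any(s < 0 or s > 6 for s in scores):
        raise ValueError("AIS severities must be between 0 and 6")
    # By convention, any AIS 6 injury sets the ISS to its maximum of 75.
    if 6 in scores:
        return 75
    # Otherwise, sum the squares of the three most severely injured regions.
    top_three = sorted(scores, reverse=True)[:3]
    return sum(s * s for s in top_three)


# Hypothetical patient: 4^2 + 3^2 + 2^2 = 29
example = {"head_neck": 4, "chest": 3, "abdomen": 2, "extremities": 2}
print(injury_severity_score(example))
```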
The City of Bloomington contracted with National Research Center, Inc. to conduct the 2017 Bloomington Community Survey. This was the first time a scientific citywide survey had been completed covering resident opinions on service delivery satisfaction by the City of Bloomington and quality of life issues. This is the opt-in companion to that scientific survey. The statistically valid and representative survey results are available at https://bloomington.data.socrata.com/dataset/Community-Survey-2017-Survey-Data/p8uv-cjhr and at https://bloomington.data.socrata.com/stories/s/bsc2-z6t2
An additional 1,435 residents completed an opt-in survey online. The data in this collection are opt-in data and are provided in the interest of transparency. They are not recommended for analysis.
The "Quantifying representativeness in RCTs using ML fairness metrics - Data and codes" collection is used to quantify representativeness in randomized clinical trials (RCTs) and to provide insights for improving clinical trial equity and health equity. We developed RCT representativeness metrics based on Machine Learning (ML) Fairness Research. Visualizations and statistical tests based on the proposed metrics enable researchers and physicians to rapidly visualize and assess subgroup representation in RCTs. The approach enables users to determine underrepresentation, absence, or other misrepresentation of subgroups, indicating potential limitations of RCTs. The method could help support generalizability evaluation of existing RCT cohorts, enrollment target decisions for new RCTs (if eligibility criteria are included), and monitoring of RCT enrollment, ultimately contributing to more equitable public health outcomes. We apply the proposed RCT representativeness metrics to three landmark clinical trials r...
This archive contains the replication files for "National Origin Identity and Descriptive Representativeness: Understanding Preferences for Asian Candidates and Representation". The data archive includes de-identified survey responses for both Study 1 and Study 2, as CSV and DTA files. Cleaning scripts for both studies are provided to document the cleaning decisions. Analysis scripts for the main results are provided as well.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Replication data and code for all tables and figures in the main text and the online appendix for: "Politicians, the Representativeness Heuristic and Decision-Making Biases"
Attribution-NonCommercial 3.0 (CC BY-NC 3.0) https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
This file only contains the stratum sizes of the contexts analyzed. The information is freely accessible.
This is the dataset that was used for our manuscript titled "How to Improve Representativeness and Cost-effectiveness in Samples Recruited through Meta: A Comparison of Advertisement Tools." The manuscript has been accepted for publication in PLOS ONE.
Stores information about appointed representatives used for reporting purposes.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Errors are given for both the training + unseen classifier and the unseen classifier, for 100, 1,000 and 18,000 unseen building patches per scan.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Content
A dataset of counties that are representative for Germany with regard to
the average disposable income,
the divorce rate,
the respective shares of employees working in the services sector (excluding logistics, security, and cleaning) and in the MINT sectors,
the shares of age groups in the total population, with the five-year age groups of the population aged between 30 and 65 and the population aged between 65 and 75 each considered separately in the calculation of representativeness.
In addition, data from the four big cities Berlin, München (Munich), Hamburg, and Köln (Cologne) were collected and reflected in the dataset.
The dataset is based on the most recent data available at the time of the creation of the dataset, mainly deriving from 2022, as set out in detail in the readme.md file.
Method applied
The selection of the representative counties, as reflected in the dataset, was performed on the basis of official statistics with the aim of obtaining a confidence level of 95%. The selection was based on a principal component analysis of the statistical data available for Germany, supplemented by the regions with the lowest population density and the highest and lowest per capita disposable income. The representativeness of the selected counties was then checked.
In the case of Leipzig, the city and the district had to be treated together, in deviation from the official territorial division, with respect to a specific use case of the data.
This data set collection consists of data products described in Hoffman et al., 2013. Resource and logistical constraints limit the frequency and extent of environmental observations, particularly in the Arctic, necessitating the development of a systematic sampling strategy to maximize coverage and objectively represent environmental variability at desired scales. A quantitative methodology for stratifying sampling domains, informing site selection, and determining the representativeness of measurement sites and networks is described here. Multivariate spatiotemporal clustering was applied to down-scaled general circulation model results and data for the State of Alaska at 4 km² resolution to define multiple sets of ecoregions across two decadal time periods. Maps of ecoregions for the present (2000-2009) and future (2090-2099) were produced, showing how combinations of 37 characteristics are distributed and how they may shift in the future. Representative sampling locations are identified on present and future ecoregion maps. A representativeness metric was developed, and representativeness maps for eight candidate sampling locations were produced. This metric was used to characterize the environmental similarity of each site. This analysis provides model-inspired insights into optimal sampling strategies, offers a framework for up-scaling measurements, and provides a down-scaling approach for integration of models and measurements. These techniques can be applied at different spatial and temporal scales to meet the needs of individual measurement campaigns. This dataset contains one zipped file, one .txt file, and one .sh file. The Next-Generation Ecosystem Experiments: Arctic (NGEE Arctic) was a research effort to reduce uncertainty in Earth System Models by developing a predictive understanding of carbon-rich Arctic ecosystems and feedbacks to climate. NGEE Arctic was supported by the Department of Energy's Office of Biological and Environmental Research.
The NGEE Arctic project had two field research sites: 1) one located within the Arctic polygonal tundra coastal region on the Barrow Environmental Observatory (BEO) and the North Slope near Utqiagvik (Barrow), Alaska, and 2) multiple areas in the discontinuous permafrost region of the Seward Peninsula north of Nome, Alaska. Through observations, experiments, and synthesis with existing datasets, NGEE Arctic provided an enhanced knowledge base for multi-scale modeling and contributed to improved process representation at global pan-Arctic scales within the Department of Energy's Earth System Model (the Energy Exascale Earth System Model, or E3SM), and specifically within the E3SM Land Model component (ELM).
Contains data for the Representative Payee application and selection process.
This record is for the dataset “Supplementary Dataset for "Representativeness of Eddy-Covariance Flux Footprints for Areas Surrounding AmeriFlux Sites"” at https://doi.org/10.5281/zenodo.4015350
These datasets are supplementary to the paper "Representativeness of Eddy-Covariance Flux Footprints for Areas Surrounding AmeriFlux Sites" by Chu et al.
Dataset S1. Summary of site-specific footprint metrics. filename: All_site_fpt_summary.csv; readme: All_site_fpt_summary-README.csv
Dataset S2. All monthly footprint climatology weight maps. filename: monthly_footprint_climatology_weight_map.zip; the zip folder contains individual files of all monthly footprint weight maps (filename: _fpt_weight.tif); readme: README.txt
Dataset S3. All site-year footprint climatology overlapped with true-color satellite images. filename: site-year_footprint_climatology_realcolor_map.zip; the zip folder contains individual files of footprint climatologies from all site-years (filename: _shrink_footprint_climatology.png); readme: README.txt
Dataset S4. Site-specific results and representativeness index based on the land cover type analysis. filename: All_site_land_cover_dominant_summary2.csv; readme: All_site_land_cover_dominant_summary2-README.csv
Dataset S5. Site-specific results and representativeness index based on the EVI analysis. filename: All_site_Landsat_EVI_fpt_comparison2.csv; readme: All_site_Landsat_EVI_fpt_comparison2-README.csv
Dataset S6. All available site-month EVI and time-explicit representativeness. filename: All_site_Landsat_EVI_all_cutout2.csv; readme: All_site_Landsat_EVI_all_cutout2-README.csv
This dataset can be downloaded at https://doi.org/10.5281/zenodo.4015350
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Voters tend to be richer, more conservative, and more educated than non-voters. While many electoral reforms promise to increase political participation, these policy instruments may have multidimensional and differential effects that can increase or decrease the representativeness of turnout. We develop an approach that allows us to estimate these effects and assess the impact of postal voting on representational inequality in Swiss referendums using individual-level (N=79,000) and aggregate-level data from 1981 to 2009. We find that postal voting mobilizes equally across a wide range of political and sociodemographic groups but more strongly activates high earners, those with medium education levels, and less politically interested individuals. Yet, those who vote are not less politically knowledgeable and the effects on the composition of turnout remain limited. Our results inform research on the consequences of electoral reforms meant to increase political participation in large electorates.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Supplemental material of the research published in HSSCOMMS (2021) by Antoine Mazieres, Telmo Menezes and Camille Roth. More info on https://mazieres.gitlab.io/gender-movies
ABSTRACT
Gender representation in mass media has long been mainly studied by qualitatively analyzing content. This article illustrates how automated computational methods may be used in this context to scale up such empirical observations and increase their resolution and significance. We specifically apply a face and gender detection algorithm on a broad set of popular movies spanning more than three decades to carry out a large-scale appraisal of the on-screen presence of women and men. Beyond the confirmation of a strong under-representation of women, we exhibit a clear temporal trend towards a fairer representativeness. We further contrast our findings with respect to movie genre, budget, and various audience-related features such as movie gross and user ratings. We lastly propose a fine description of significant asymmetries in the mise-en-scène and mise-en-cadre of characters in relation to their gender and the spatial composition of a given frame.
DATA
facialfeatures.csv Raw inferences from the face and gender detection models.
metadata.csv Movies metadata.
human_evaluation.csv Results from the human evaluation of the detection models.
model_correction.csv FFR_corrected = a + b * FFR_uncorrected
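The linear correction listed for model_correction.csv can be applied as in the minimal sketch below. It assumes only the stated form FFR_corrected = a + b * FFR_uncorrected; the function name and the coefficient values are placeholders, and how a and b are actually stored in the file is not described here, so they should be read from the file rather than taken from this example.

```python
def correct_ffr(ffr_uncorrected: float, a: float, b: float) -> float:
    """Apply the linear correction FFR_corrected = a + b * FFR_uncorrected."""
    return a + b * ffr_uncorrected


# Placeholder coefficients for illustration only; NOT taken from model_correction.csv.
a, b = 0.02, 0.95

# Hypothetical uncorrected values, one per movie.
raw_values = [0.20, 0.35, 0.50]
corrected = [correct_ffr(x, a, b) for x in raw_values]

for raw, fixed in zip(raw_values, corrected):
    print(f"uncorrected={raw:.2f} corrected={fixed:.4f}")
```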
Contains data for the Representative Payee Accounting process.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset for single-particle data representativeness and uncertainty study.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the correlation data for species-coverage-based log representativeness measure and Trace-based Log Representativeness Approximation (TLRA) across event logs of 60 generative systems and varying log sizes and noise levels.
This data set is captured from a robot workcell that is performing activities representative of several manufacturing operations. The workcell contains two 6-degree-of-freedom robot manipulators: one robot performs material handling operations (e.g., transporting parts into and out of a specific work space) while the other robot performs a simulated precision operation (e.g., the robot touching the center of a part with a tool tip that leaves a mark on the part). This precision operation is intended to represent a precise manufacturing operation (e.g., welding, machining). The goal of this data set is to provide robot-level and process-level measurements of the workcell operating within nominal parameters. There are no known equipment or process degradations in the workcell. The material handling robot will perform pick and place operations, including moving simulated parts from an input area to in-process work fixtures. Once parts are placed in/on the work fixtures, the second robot will interact with the part in a specified precise manner. In this specific instance, the second robot has a pen mounted to its tool flange and is drawing the NIST logo on a surface of the part. When the precision operation is completed, the material handling robot will then move the completed part to an output. This suite of data includes process data and performance data, including timestamps. Timestamps are recorded at predefined state changes and events on the PLC and robot controllers, respectively. Each robot controller and the PLC have their own internal clocks and, due to hardware limitations, the timestamps recorded on each device are relative to its own internal clock. All timestamp data collected on the PLC is available for real-time calculations and is recorded. The timestamps collected on the robots are only available as recorded data for post-processing and analysis.
The timestamps collected on the PLC correspond to 14 part state changes throughout the processing of a part. Timestamps are recorded when PLC-monitored triggers are activated by internal processing (PLC trigger origin) or after the PLC receives an input from a robot controller (robot trigger origin). Records generated from PLC-originated triggers include parts entering the work cell, assignment of robot tasks, and parts leaving the work cell. PLC-originating triggers are activated by either internal algorithms or sensors which are monitored directly in the PLC Inputs/Outputs (I/O). Records generated from a robot-originated trigger include when a robot begins operating on a part, when the task operation is complete, and when the robot has physically cleared the fixture area and is ready for a new task assignment. Robot-originating triggers are activated by PLC I/O. Process data collected in the workcell are the variable pieces of process information. This includes the input location (single option in the initial configuration presented in this paper), the output location (single option in the initial configuration presented in this paper), the work fixture location, the part number counted from startup, and the part type (task number for drawing robot). Additional information on the context of the workcell operations and the captured data can be found in the attached files, which include a README.txt, along with several noted publications. Disclaimer: Certain commercial entities, equipment, or materials may be identified or referenced in this data, or its supporting materials, in order to illustrate a point or concept. Such identification or reference is not intended to imply recommendation or endorsement by NIST; nor does it imply that the entities, materials, equipment or data are necessarily the best available for the purpose. The user assumes any and all risk arising from use of this dataset.
https://spdx.org/licenses/CC0-1.0.html
Demographic rates are rarely estimated over an entire species range, limiting empirical tests of ecological patterns and theories, and raising questions about the representativeness of studies that use data from a small part of a range. The uncertainty that results from using demographic rates from just a few sites is especially pervasive in population projections, which are critical for a wide range of questions in ecology and conservation. We developed a simple simulation to quantify how this lack of geographic representativeness can affect inferences about the global mean and variance of growth rates, which has implications for the robust design of a wide range of population studies. Using a coastal songbird, saltmarsh sparrow (Ammodramus caudacutus), as a case study, we first estimated survival, fecundity, and population growth rates at 21 sites distributed across much of their breeding range. We then subsampled this large, representative dataset according to five sampling scenarios in order to simulate a variety of geographic biases in study design. We found spatial variation in demographic rates, but no large systematic patterns. Estimating the global mean and variance of growth rates using subsets of the data suggested that at least 10-15 sites were required for reasonably unbiased estimates, highlighting how relying on demographic data from just a few sites can lead to biased results when extrapolating across a species range. Sampling at the full 21 sites, however, offered diminishing returns, raising the possibility that for some species accepting some geographical bias in sampling can still allow for robust range-wide inferences. The sub-sampling approach presented here, while conceptually simple, could be used with both new and existing data to encourage efficiency in the design of long-term or large-scale ecological studies.
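The sub-sampling idea described above can be sketched in a few lines. This is a simplified illustration, not the authors' simulation: it draws sites uniformly at random rather than under the five geographically biased scenarios, and the per-site growth rates are synthetic values generated for the example, not the saltmarsh sparrow estimates. All names are hypothetical.

```python
import random
import statistics


def subsample_estimates(growth_rates, n_sites, n_draws, rng):
    """Estimate mean and variance of growth rate from repeated random subsets of sites."""
    means, variances = [], []
    for _ in range(n_draws):
        # Draw n_sites distinct sites (sampling without replacement).
        subset = rng.sample(growth_rates, n_sites)
        means.append(statistics.mean(subset))
        variances.append(statistics.variance(subset))  # requires n_sites >= 2
    return means, variances


rng = random.Random(0)
# Synthetic growth rates for 21 sites; placeholder values for illustration only.
site_growth = [rng.gauss(0.95, 0.05) for _ in range(21)]

for n in (3, 10, 15, 21):
    means, _ = subsample_estimates(site_growth, n, 500, rng)
    # The spread of the estimated mean across draws shrinks as more sites are sampled.
    spread = statistics.pstdev(means)
    print(f"{n:2d} sites: spread of estimated mean = {spread:.4f}")
```

Replacing the uniform draw with a rule that favors sites from one part of the range would be one way to mimic a geographically biased design under the same framework.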