Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Objective(s): Data sharing has enormous potential to accelerate and improve the accuracy of research, strengthen collaborations, and restore trust in the clinical research enterprise. Nevertheless, there remains reluctance to openly share raw data sets, in part due to concerns regarding research participant confidentiality and privacy. We provide an instructional video to describe a standardized de-identification framework that can be adapted and refined based on specific context and risks.
Data Description: Training video, presentation slides.
Related Resources: The data de-identification algorithm, dataset, and data dictionary that correspond with this training video are available through the Smart Triage sub-Dataverse.
NOTE for restricted files: If you are not yet a CoLab member, please complete our membership application survey to gain access to restricted files within 2 business days. Some files may remain restricted to CoLab members. These files are deemed more sensitive by the file owner and are meant to be shared on a case-by-case basis. Please contact the CoLab coordinator on this page under "collaborate with the pediatric sepsis colab."
This dataset contains de-identified data with an accompanying data dictionary and the R script for de-identification procedures.
Objective(s): To demonstrate application of a risk-based de-identification framework using the Smart Triage dataset as a clinical example.
Data Description: This dataset contains the de-identified version of the Smart Triage Jinja dataset with the accompanying data dictionary and R script for de-identification procedures.
Limitations: Utility of the de-identified dataset has only been evaluated with regard to use for the development of prediction models based on a need for hospital admission.
Abbreviations: NA
Ethics Declaration: The study was reviewed by the institutional review boards at the University of British Columbia in Canada (ID: H19-02398; H20-00484), the Makerere University School of Public Health in Uganda, and the Uganda National Council for Science and Technology.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A de-identified open dataset of 507 adults, capturing how people understand their core personal values, sense of meaning, life direction, fulfilment, behavioural alignment, and emotional wellbeing in the context of digital and AI-mediated environments.
https://spdx.org/licenses/CC0-1.0.html
Objective: To develop a clinical informatics pipeline designed to capture large-scale structured EHR data for a national patient registry.
Materials and Methods: The EHR-R-REDCap pipeline is implemented using R-statistical software to remap and import structured EHR data into the REDCap-based multi-institutional Merkel Cell Carcinoma (MCC) Patient Registry using an adaptable data dictionary.
Results: Clinical laboratory data were extracted from EPIC Clarity across several participating institutions. Labs were transformed, remapped and imported into the MCC registry using the EHR labs abstraction (eLAB) pipeline. Forty-nine clinical tests encompassing 482,450 results were imported into the registry for 1,109 enrolled MCC patients. Data-quality assessment revealed highly accurate, valid labs. Univariate modeling was performed for labs at baseline on overall survival (N=176) using this clinical informatics pipeline.
Conclusion: We demonstrate feasibility of the facile eLAB workflow. EHR data is successfully transformed, and bulk-loaded/imported into a REDCap-based national registry to execute real-world data analysis and interoperability.
Methods
eLAB Development and Source Code (R statistical software):
eLAB is written in R (version 4.0.3), and utilizes the following packages for processing: DescTools, REDCapR, reshape2, splitstackshape, readxl, survival, survminer, and tidyverse. Source code for eLAB can be downloaded directly (https://github.com/TheMillerLab/eLAB).
eLAB reformats EHR data abstracted for an identified population of patients (e.g. medical record numbers (MRN)/name list) under an Institutional Review Board (IRB)-approved protocol. The MCCPR does not host MRNs/names and eLAB converts these to MCCPR assigned record identification numbers (record_id) before import for de-identification.
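The identifier swap described above can be pictured with a short Python sketch. This is only an illustration, not eLAB's R code: the function name `assign_record_ids`, the column names `mrn` and `name`, and the crosswalk dictionary are all assumptions made for the example.

```python
def assign_record_ids(rows, mrn_to_record_id):
    """Replace MRN and name with a registry record_id (hypothetical column names).

    rows: list of dicts holding lab results keyed by 'mrn' (and optionally 'name').
    mrn_to_record_id: crosswalk kept outside the registry, MRN -> record_id.
    """
    deidentified = []
    for row in rows:
        # A KeyError here means the patient is not in the crosswalk.
        record_id = mrn_to_record_id[row["mrn"]]
        # Drop the direct identifiers before anything is imported.
        cleaned = {k: v for k, v in row.items() if k not in ("mrn", "name")}
        cleaned["record_id"] = record_id
        deidentified.append(cleaned)
    return deidentified
```

Failing loudly on an unknown MRN, rather than passing the row through, keeps an unmapped identifier from reaching the import file.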
Functions were written to remap EHR bulk lab data pulls/queries from several sources including Clarity/Crystal reports or institutional EDW including Research Patient Data Registry (RPDR) at MGB. The input, a csv/delimited file of labs for user-defined patients, may vary. Thus, users may need to adapt the initial data wrangling script based on the data input format. However, the downstream transformation, code-lab lookup tables, outcomes analysis, and LOINC remapping are standard for use with the provided REDCap Data Dictionary, DataDictionary_eLAB.csv. The available R-markdown (https://github.com/TheMillerLab/eLAB) provides suggestions and instructions on where or when upfront script modifications may be necessary to accommodate input variability.
The eLAB pipeline takes several inputs. For example, the input for use with the ‘ehr_format(dt)’ single-line command is non-tabular data assigned as R object ‘dt’ with 4 columns: 1) Patient Name (MRN), 2) Collection Date, 3) Collection Time, and 4) Lab Results wherein several lab panels are in one data frame cell. A mock dataset in this ‘untidy-format’ is provided for demonstration purposes (https://github.com/TheMillerLab/eLAB).
Bulk lab data pulls often result in subtypes of the same lab. For example, potassium labs are reported as “Potassium,” “Potassium-External,” “Potassium(POC),” “Potassium,whole-bld,” “Potassium-Level-External,” “Potassium,venous,” and “Potassium-whole-bld/plasma.” eLAB utilizes a key-value lookup table with ~300 lab subtypes for remapping labs to the Data Dictionary (DD) code. eLAB reformats/accepts only those lab units pre-defined by the registry DD. The lab lookup table is provided for direct use or may be re-configured/updated to meet end-user specifications. eLAB is designed to remap, transform, and filter/adjust value units of semi-structured/structured bulk laboratory values data pulls from the EHR to align with the pre-defined code of the DD.
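A key-value lookup of this kind can be sketched in a few lines of Python. The potassium labels below are the ones quoted in the text; the target code `"potassium"`, the column name `lab_name`, and the function `remap_labs` are assumptions for illustration, not the registry's actual field names or eLAB's R implementation.

```python
# Raw EHR lab label -> data-dictionary code (the code value is hypothetical).
LAB_LOOKUP = {
    "Potassium": "potassium",
    "Potassium-External": "potassium",
    "Potassium(POC)": "potassium",
    "Potassium,whole-bld": "potassium",
    "Potassium-Level-External": "potassium",
    "Potassium,venous": "potassium",
    "Potassium-whole-bld/plasma": "potassium",
}

def remap_labs(rows, lookup=LAB_LOOKUP):
    """Keep only rows whose label is in the lookup and attach the DD code."""
    remapped = []
    for row in rows:
        code = lookup.get(row["lab_name"].strip())
        if code is None:
            continue  # label not defined by the data dictionary: filtered out
        remapped.append({**row, "dd_code": code})
    return remapped
```

Because the lookup doubles as a filter, any lab subtype that is missing from the table is silently dropped, which is why the text notes that the table may need to be updated for a new site.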
Data Dictionary (DD)
EHR clinical laboratory data is captured in REDCap using the ‘Labs’ repeating instrument (Supplemental Figures 1-2). The DD is provided for use by researchers at REDCap-participating institutions and is optimized to accommodate the same lab-type captured more than once on the same day for the same patient. The instrument captures 35 clinical lab types. The DD serves several major purposes in the eLAB pipeline. First, it defines every lab type of interest and associated lab unit of interest with a set field/variable name. It also restricts/defines the type of data allowed for entry for each data field, such as a string or numerics. The DD is uploaded into REDCap by every participating site/collaborator and ensures each site collects and codes the data the same way. Automation pipelines, such as eLAB, are designed to remap/clean and reformat data/units utilizing key-value look-up tables that filter and select only the labs/units of interest. eLAB ensures the data pulled from the EHR contains the correct unit and format pre-configured by the DD. The use of the same DD at every participating site ensures that the data field code, format, and relationships in the database are uniform across each site to allow for the simple aggregation of the multi-site data. For example, since every site in the MCCPR uses the same DD, aggregation is efficient and different site csv files are simply combined.
Study Cohort
This study was approved by the MGB IRB. Search of the EHR was performed to identify patients diagnosed with MCC between 1975-2021 (N=1,109) for inclusion in the MCCPR. Subjects diagnosed with primary cutaneous MCC between 2016-2019 (N= 176) were included in the test cohort for exploratory studies of lab result associations with overall survival (OS) using eLAB.
Statistical Analysis
OS is defined as the time from date of MCC diagnosis to date of death. Data was censored at the date of the last follow-up visit if no death event occurred. Univariable Cox proportional hazard modeling was performed among all lab predictors. Due to the hypothesis-generating nature of the work, p-values were exploratory and Bonferroni corrections were not applied.
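The OS definition and censoring rule above translate directly into a time-to-event pair. The following Python sketch assumes one diagnosis date per patient and either a death date or a last follow-up date; the function name `overall_survival` and its arguments are illustrative and not taken from the eLAB source.

```python
from datetime import date

def overall_survival(diagnosis, death=None, last_follow_up=None):
    """Return (days, event): event is 1 for death, 0 for a censored observation."""
    if death is not None:
        return (death - diagnosis).days, 1
    if last_follow_up is None:
        raise ValueError("need either a death date or a last follow-up date")
    # No death observed: censor at the last follow-up visit.
    return (last_follow_up - diagnosis).days, 0
```

The resulting `(days, event)` pairs are the usual input format for a Cox proportional hazards fit in most survival libraries.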
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Metadata (data dictionary) and statistical analysis plan (including outcomes definitions for data dictionary) for the ACTORDS 20-year follow-up study. The DOI for the primary study publication is https://doi.org/10.1371/journal.pmed.1004618.
Data and associated documentation for participants who have consented to future re-use of their data are available to other users under the data sharing arrangements provided by the University of Auckland’s Human Health Research Services (HHRS) platform (https://research-hub.auckland.ac.nz/subhub/human-health-research-services-platform). The data dictionary and metadata are published on the University of Auckland’s data repository Figshare, which allocates a DOI and thus makes these details searchable and available indefinitely. Researchers are able to use this information and the provided contact address (dataservices@auckland.ac.nz) to request a de-identified dataset through the HHRS Data Access Committee. Data will be shared with researchers who provide a methodologically sound proposal and have appropriate ethical approval, where necessary, to achieve the research aims in the approved proposal. Data requestors are required to sign a Data Access Agreement that includes a commitment to using the data only for the specified proposal, not to attempt to identify any individual participant, a commitment to secure storage and use of the data, and to destroy or return the data after completion of the project. The HHRS platform reserves the right to charge a fee to cover the costs of making data available, if needed, for data requests that require additional work to prepare.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains de-identified clinical data of 123 patients with sepsis admitted to the intensive care unit (ICU) between 2015 and 2018. Variables include demographic data (age, sex, BMI), disease severity scores (SOFA, APACHE II, NUTRIC), body composition measurements from abdominal CT (SMA, SMI, SMD, SAT), biomarkers (PCT), and clinical outcomes such as 90-day survival status and Charlson Comorbidity Index (CCI).
Variable definitions:
- Survival_Status: 1 = survived at 90 days, 2 = deceased
- Sex: 1 = male, 2 = female
- SMD_HU: Skeletal Muscle Density in Hounsfield Units
- SMA_cm2 / SAT_cm2: area in square centimeters
This dataset supports the findings of the manuscript titled “Risk factors analysis of 90-day mortality in patients with sepsis in intensive care unit” submitted to PLOS ONE.
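The numeric codes in the variable definitions are easy to misread (2, not 0, marks a death), so decoding them before analysis can help. A minimal Python sketch follows; it assumes each record is a dictionary keyed by the variable names given above, and the function name `decode_row` is hypothetical.

```python
# Code tables copied from the variable definitions in the dataset description.
SURVIVAL_STATUS = {1: "survived at 90 days", 2: "deceased"}
SEX = {1: "male", 2: "female"}

def decode_row(row):
    """Return a copy of the record with coded fields replaced by labels."""
    decoded = dict(row)
    decoded["Survival_Status"] = SURVIVAL_STATUS[row["Survival_Status"]]
    decoded["Sex"] = SEX[row["Sex"]]
    return decoded
```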
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Trusted Research Environments (TREs) enable analysis of sensitive data under strict security assertions that protect the data with technical, organizational, and legal measures from (accidentally) being leaked outside the facility. While many TREs exist in Europe, little information is publicly available on their architecture, their building blocks, and their slight technical variations. To shine light on these problems, we give an overview of existing, publicly described TREs and a bibliography linking to the system descriptions. We further analyze their technical characteristics, especially their commonalities and variations, and provide insight into their data type characteristics and availability. Our literature study shows that 47 TREs worldwide provide access to sensitive data, of which two-thirds provide data themselves, predominantly via secure remote access. Statistical offices make available a majority of the sensitive data records included in this study.
We performed a literature study covering 47 TREs worldwide using scholarly databases (Scopus, Web of Science, IEEE Xplore, Science Direct), a computer science library (dblp.org), Google and grey literature focusing on retrieving the following source material:
The goal for this literature study is to discover existing TREs, analyze their characteristics and data availability to give an overview on available infrastructure for sensitive data research as many European initiatives have been emerging in recent months.
This dataset consists of five comma-separated values (.csv) files describing our inventory:
Additionally, a MariaDB (10.5 or higher) schema definition .sql file is needed, properly modelling the schema for databases:
The analysis was done through Jupyter Notebook which can be found in our source code repository: https://gitlab.tuwien.ac.at/martin.weise/tres/-/blob/master/analysis.ipynb
Note: This web page provides data on health facilities only. To file a complaint against a facility, please see: https://www.cdph.ca.gov/Programs/CHCQ/LCP/Pages/FileAComplaint.aspx
Skilled Nursing Facility (SNF) testing and case data for the COVID-19 response. For details on the SNF COVID-19 data, please visit this site: https://www.cdph.ca.gov/Programs/CID/DCDC/Pages/COVID-19/SNFsCOVID_19.aspx
Please note that values of less than eleven (11) are masked (shown as blank) in accordance with de-identification guidelines. This means the cumulative sum in this dataset will not match the totals from the dashboard due to data artifact from small cell size suppression.
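The effect of small-cell suppression on totals can be seen in a short Python sketch. The threshold of 11 comes from the note above; the function names and the example counts used to exercise them are hypothetical, not values from the dataset.

```python
def mask_small_cells(counts, threshold=11):
    """Replace every count below the threshold with None (shown as blank)."""
    return [c if c is not None and c >= threshold else None for c in counts]

def visible_sum(counts):
    """Sum only the unmasked cells, as a user of the published file would."""
    return sum(c for c in counts if c is not None)
```

For example, masking the hypothetical counts `[3, 15, 10, 40, 11]` leaves `[None, 15, None, 40, 11]`, so the published cells add up to 66 while the true total is 79. That gap is the mismatch with the dashboard that the note warns about.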
https://creativecommons.org/publicdomain/zero/1.0/
Digitization of healthcare data, along with algorithmic breakthroughs in AI, will have a major impact on healthcare delivery in the coming years. It is interesting to see AI applied to assist clinicians during patient treatment in a privacy-preserving way. While scientific knowledge can help guide interventions, there remains a key need to quickly cut through the space of decision policies to find effective strategies to support patients during the care process.
Offline reinforcement learning (also referred to as safe or batch reinforcement learning) is a promising sub-field of RL that provides a mechanism for solving real-world sequential decision-making problems where access to a simulator is not available. Here we assume that we learn a policy from a fixed dataset of trajectories without further interaction with the environment (the agent does not receive a reward or punishment signal from the environment). It has been shown that such an approach can leverage vast amounts of existing logged data (in the form of previous interactions with the environment) and can outperform supervised learning approaches or heuristic-based policies for solving real-world decision-making problems. Offline RL algorithms, when trained on sufficiently large and diverse offline datasets, can produce close-to-optimal policies (the ability to generalize beyond training data).
As part of my PhD research, I investigated the problem of developing a Clinical Decision Support System for Sepsis Management using Offline Deep Reinforcement Learning.
MIMIC-III ('Medical Information Mart for Intensive Care') is a large open-access anonymized single-center database which consists of comprehensive clinical data of 61,532 critical care admissions from 2001–2012 collected at a Boston teaching hospital. The dataset consists of 47 features (including demographics, vitals, and lab test results) on a cohort of sepsis patients who meet the Sepsis-3 definition criteria.
We try to answer the following question:
Given a particular patient’s characteristics and physiological information at each time step as input, can our DeepRL approach learn an optimal treatment policy that prescribes the right intervention (e.g., use of a ventilator) to the patient at each stage of the treatment process, in order to improve the final outcome (e.g., patient mortality)?
We can use popular state-of-the-art algorithms such as Deep Q-Learning (DQN), Double Deep Q-Learning (DDQN), DDQN combined with BNC, Mixed Monte Carlo (MMC), and Persistent Advantage Learning (PAL). Using these methods, we can train an RL policy to recommend an optimal treatment path for a given patient.
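To make the difference between the first two algorithms concrete, the sketch below computes the bootstrapped learning target for one logged transition under DQN and under Double DQN. It is a plain-Python illustration of the standard target formulas, not code from the linked repository; the function names, the discount factor, and the toy Q-values are assumptions.

```python
def dqn_target(reward, next_q_target, gamma=0.99, done=False):
    """DQN: bootstrap from the largest target-network Q-value at the next state."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)

def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN: the online network picks the action, the target network scores it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]
```

Separating action selection from action evaluation is what lets Double DQN reduce the overestimation that the plain `max` introduces. In the offline setting, both targets are computed only from transitions already present in the logged dataset.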
Data acquisition, standard pre-processing, and modelling details can be found in the GitHub repository: https://github.com/asjad99/MIMIC_RL_COACH
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Plasma samples collected from Stanford University's “Infection Recovery in SARS-CoV-2” (IRIS) study participants during acute SARS-CoV-2 infection, approximately 3 months post infection, and approximately 12 months post infection were analyzed using the Olink® Target 96 Inflammation panel and Olink® Target 96 Immune Response panel.
Proteomic data were obtained from the Olink biomarker platform and presented as "NPX" (i.e. Normalized Protein eXpression) for each protein assay. The dataset features de-identified patient metadata, including participant study number (i.e. IRIS_number), sex, long COVID status (0 = recovered; 1 = Long COVID), and timepoint of the plasma sample (i.e. acute infection sample, approximately 3 months post infection, or approximately 12 months post infection).
Notes:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Previous studies have assessed the incremental economic burden of treatment-resistant depression (TRD) versus non-treatment-resistant major depressive disorder (i.e., non-TRD MDD) in commercially-insured and Medicaid-insured patients, but none have focused on Medicare-insured patients.
Objective: To assess healthcare resource utilization (HRU) and costs of patients with TRD versus non-TRD MDD or without major depressive disorder (MDD; i.e., non-MDD) in a Medicare-insured population.
Methods: Adult patients were retrospectively identified from the Chronic Condition Warehouse de-identified 100% Medicare database (01/2010-12/2016). MDD was defined as ≥1 MDD diagnosis and ≥1 claim for an antidepressant. Patients initiated on a third antidepressant following two antidepressant treatment regimens of adequate dose and duration were considered to have TRD. The index date was defined as the date of the first antidepressant claim for the TRD and non-TRD MDD cohorts, and as a randomly imputed date for the non-MDD cohort. Patients with TRD were matched 1:1 to non-TRD MDD patients and randomly selected non-MDD patients based on propensity scores. Analyses were also performed for a subset of patients aged ≥65.
Results: Of 29,543 patients with MDD, 3,225 (10.9%) met the study definition of TRD; 157,611 were included in the non-MDD cohort. Matched patients with TRD and non-TRD MDD were, on average, 58.9 and 59.0 years old, respectively. The TRD cohort had higher per-patient-per-year (PPPY) HRU than the non-TRD MDD (e.g., inpatient visits: incidence rate ratio [IRR] = 1.36) and non-MDD cohorts (e.g., inpatient visits: IRR = 1.84, all P
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Introduction
As mobile robots proliferate in communities, designers must consider the impacts these systems have on the users, onlookers, and places they encounter. It becomes increasingly necessary to study situations where humans and robots coexist in common spaces, even if they are not directly interacting. This dataset presents a multidisciplinary approach to study human-robot encounters in an indoor apartment-like setting between participants and two mobile robots. Participants take questionnaires, wear sensors for physiological measures, and take part in a focus group after experiments finish. This dataset contains raw time series data from sensors and robots, and qualitative results from focus groups. The data can be used to analyze measures of human physiological response to varied encounter conditions, and to gain insights into human preferences and comfort during community encounters with mobile robots.

Dataset Contents
- A dictionary of terms found in the dataset can be found in the "Data-Dictionary.pdf".
- Synchronized XDF files from every trial with raw data from electrodermal activity (EDA), electrocardiography (ECG), photoplethysmography (PPG) and seismocardiography (SCG). These synchronized files also contain robot pose data and microphone data.
- Results from analysis of two important features found from heart rate variability (HRV) and EDA. Specifically, HRV_CMSEn and nsEDRfreq are computed for each participant over each trial. These results also include Robot Confidence, which is a classification score representing the confidence that the 80 physiological features considered originate from a subject in a robot encounter. The higher the score, the higher the confidence.
- A vectormap of the environment used during testing ("AHG_vectormap.txt") and a csv with locations of participant seating within the map ("Participant-Seating-Coordinates.csv"). Each line of the vectormap represents two endpoints of a line: x1,y1,x2,y2. The coordinates of participant seating are x,y positions and rotation about the vertical axis in radians.
- Anonymized videos captured using two static cameras placed in the environment. They are located in the living room and small room, respectively.
- Animations visualized from XDF files that show participant location, robot behaviors and additional characteristics like participant-robot line-of-sight and relative audio volume.
- Quotes associated with themes taken from focus group data. These quotes demonstrate and justify the results of the thematic analysis. Raw text from focus groups is not included for privacy concerns.
- Quantitative results from focus groups associated with factors influencing perceived safety. These results demonstrate the findings from deductive content analysis. The deductive codebook is also included.
- Results from pre-experiment and between-trial questionnaires.
- Copies of both questionnaires and the semi-structured focus group protocol.

Human Subjects
This dataset contains de-identified information for 24 total subjects over 13 experiment sessions. The population for the study is the students, faculty and staff at the University of Texas at Austin. Of the 24 participants, 18 are students and 6 are staff at the university. Ages range from 19-48 and there are 10 males and 14 females who participated. Published data has been de-identified in coordination with the university Internal Review Board. All participants signed informed consent to participate in the study and for the distribution of this data.

Access Restrictions
Transcripts from focus groups are not published due to privacy concerns. Videos including participants are de-identified with overlays on videos. All other data is labeled only by participant ID, which is not associated with any identifying characteristics.

Experiment Design
Robots
This study considers indoor encounters with two quadruped mobile robots, namely the Boston Dynamics Spot and Unitree Go1. These mobile robots are capable of everyday movement tasks like inspection, search or mapping which may be common tasks for autonomous agents in university communities. The study focuses on the perceived safety of bystanders under encounters with these relevant platforms.

Control Conditions and Experiment Session Layout
We control three variables in this study:
- Participant seating social (together in the living room) v. isolated (one in living room, other in small room)
- Robots Together v. Separate
- Robot Navigation v. Search Behavior
A visual representation of the three control variables is shown on the left in (a)-(d), including the robot behaviors and participant seating locations, shown as X's. Blue represents social seating and yellow represents isolated seating. (a) shows the single robot navigation path. (b) is the two robot navigation paths. In (c) is the single robot search path and (d) shows the two robot search paths. The order of behaviors and seating locations are randomized and then inserted into the experiment session as...
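The vectormap format described under Dataset Contents (one wall segment per line, written as x1,y1,x2,y2) can be read with a few lines of Python. This is a minimal sketch assuming plain comma-separated numeric lines with no header; the function names are illustrative, and the coordinate units are not stated in the description.

```python
import math

def parse_vectormap(text):
    """Parse lines of 'x1,y1,x2,y2' into ((x1, y1), (x2, y2)) segments."""
    segments = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        x1, y1, x2, y2 = (float(v) for v in line.split(","))
        segments.append(((x1, y1), (x2, y2)))
    return segments

def segment_length(segment):
    """Euclidean length of one segment, in the map's own units."""
    (x1, y1), (x2, y2) = segment
    return math.hypot(x2 - x1, y2 - y1)
```

The parsed segments can then be overlaid with the seating coordinates (x, y, rotation in radians) from "Participant-Seating-Coordinates.csv" to reconstruct the room layout.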
Images for segmentation of optical coherence tomography images with diabetic macular edema.
S. J. Chiu, M. J. Allingham, P. S. Mettu, S. W. Cousins, J. A. Izatt, S. Farsiu, "Kernel regression based segmentation of optical coherence tomography images with diabetic macular edema", ( BIOMEDICAL OPTICS EXPRESS), 6(4), pp. 1172-1194, April, 2015
To learn our DME classifier, we obtained training data separate from our validation data set. We used the Duke Enterprise Data Unified Content Explorer search engine to retrospectively identify patients within the Duke Eye Center Medical Retina practice with a billing code for DME (ICD-9 362.07) associated with their visit. An ophthalmologist then identified six patients imaged in clinic using the standard Spectralis (Heidelberg Engineering, Heidelberg, Germany) 61-line volume scan protocol with severe DME pathology and varying image quality. Averaging of the B-scans was determined by the photographer, and ranged from 9 to 21 raw images per averaged B-scan. The volumetric scans were Q = 61 B-scans × N = 768 A-scans with an axial resolution $r_i$ = 3.87 µm/pixel, lateral resolution ($r_j$) ranging from 11.07 – 11.59 µm/pixel, and azimuthal resolution ($r_k$) ranging from 118 – 128 µm/pixel.
To generate the target classes for classifier training, we manually segmented fluid-filled regions and semi-automatically segmented all eight retinal layer boundaries following the definitions in Fig. 3. This was done for 12 B-scans within the training data set (two from each volume). The B-scans selected consisted of six images near the fovea (B-scan 31 for all volumes) and six peripheral images (B-scans 1, 6, 11, 16, 21, and 26, one for each of the six volumes). We then used the manual segmentations to assign the true class for each pixel, with a total of eight possible classes defined in Table 1 and the classified result shown in Fig. 4(a).
We obtained our validation data set by identifying ten patients with DME that were not included in the training data set. The method for selecting these data sets is described in Section 5.1, with the difference that the images had to be of adequate quality (i.e. layer and fluid boundaries needed to be visible). The image acquisition parameters were consistent with the training data set, and lateral and azimuthal resolutions ranged from 10.94 – 11.98 µm/pixel and 118 – 128 µm/pixel, respectively. We made the entire data set available online, including the training and validation data sets and their corresponding automatic and manual segmentation results. This data set can be found at www.duke.edu/~sf59/Chiu_BOE_2014_dataset.htm.
http://people.duke.edu/~sf59/Chiu_BOE_2014_dataset.htm
S. J. Chiu, M. J. Allingham, P. S. Mettu, S. W. Cousins, J. A. Izatt, S. Farsiu, "Kernel regression based segmentation of optical coherence tomography images with diabetic macular edema", ( BIOMEDICAL OPTICS EXPRESS), 6(4), pp. 1172-1194, April, 2015
Please reference the above paper if you would like to use any part of this dataset.
“All images included in this website have been fully de-identified. Any dates associated with the imaging files do not relate to the subject or to date of image acquisition. Images are intended for use in research and educational settings. Commercialization/redistribution of the images is prohibited. In the unlikely event that you identify any remaining identifiers in the images, you are prohibited from further disclosure and should destroy all copies of the image and immediately notify the owner of this webpage at: sina.farsiu@duke.edu All use of the images should include citation and credit to this paper.”
Please contact Dr. Stephanie Chiu, who published this paper under the supervision of Prof. Sina Farsiu, if you have questions about the dataset.
Adolescent girls and young women (AGYW) have a disproportionately high incidence of HIV compared to males of the same age in Uganda. AGYW are a priority sub-group for daily oral Pre-Exposure Prophylaxis (PrEP), but their adherence has consistently remained low. Short Message Service (SMS) reminders could improve adherence to PrEP in AGYW. However, there is a paucity of literature about the acceptability of SMS reminders among AGYW using PrEP. We assessed the level of acceptability of SMS reminders as a PrEP adherence support tool and the associated factors, among AGYW in Mukono district, Central Uganda. We consecutively enrolled AGYW using PrEP in Mukono district in a cross-sectional study. A structured pre-tested questionnaire was administered to participants by three trained research assistants. Data were analyzed in STATA 17.0; continuous variables were summarized using median and interquartile range (IQR) while categorical variables were summarized using frequencies and percentages....
The data set was collected through a researcher-administered questionnaire. The main dependent variable was acceptability of SMS reminders. This was measured using the seven constructs derived from the Theoretical Framework of Acceptability (TFA)(1): affective attitude, burden, perceived effectiveness, ethicality, intervention coherence, opportunity costs, and self-efficacy. A 5-point Likert item question per construct was used, and each level of the Likert scale was given a weight ranging from one to five. The summated scores from the weights assigned to each response were computed. The summated acceptability score was then dichotomized at the 50th percentile of the possible summated scores, which range from 7 to 35 (the 50th percentile is 21). Therefore, "Acceptability of SMS reminders" was defined as a value greater than 21.
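The scoring rule above (seven Likert items weighted 1 to 5, summed, then dichotomized at greater than 21) can be expressed in a few lines of Python. The construct names follow the TFA list in the text, but the dictionary keys and the function name `acceptability` are assumptions and need not match the variable names in the dataset.

```python
TFA_CONSTRUCTS = (
    "affective_attitude", "burden", "perceived_effectiveness", "ethicality",
    "intervention_coherence", "opportunity_costs", "self_efficacy",
)

def acceptability(responses, cutoff=21):
    """Return (summated score, acceptable?) for one participant.

    responses maps each construct to its Likert weight, an integer from 1 to 5.
    """
    scores = [responses[c] for c in TFA_CONSTRUCTS]
    if any(s not in range(1, 6) for s in scores):
        raise ValueError("Likert weights must be integers from 1 to 5")
    total = sum(scores)  # possible range: 7 to 35
    return total, total > cutoff
```

A participant who answers 3 on every item scores exactly 21 and is therefore classified as not accepting, since the definition requires a value strictly greater than 21.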
The independent variables were captured as described in the data dictionary attached. Data analysis was performed in STATA versi...
The participants gave written informed consent to publish de-identified data in accordance with the Uganda National Council for Science and Technology (UNCST), a local human participant research regulator. Identifying characteristics such as numerical age and physical address were redacted.
# Acceptability of short message service reminders as the support tool for PrEP adherence among young women in Mukono district, Uganda
Dataset DOI: 10.5061/dryad.cvdncjt8h
In this dataset, we aimed to assess the acceptability of short message service (SMS) reminders among Adolescent Girls and Young Women (AGYW) prescribed Pre-Exposure Prophylaxis (PrEP). We also measured demographic and other individual factors
File: Manuscript_dataset.dta
Description: This section describes the variables included in the dataset (data dictionary)
| Variable Name | Variable type | Variable Label | Value Labels | | :------------------- | :------------ | :---------------------------------...
Objective: Case-control study designs are commonly used in retrospective analyses of Real-World Evidence (RWE). Due to the increasingly wide availability of RWE, it can be difficult to determine whether findings are robust or the result of testing multiple hypotheses. Materials and Methods: We investigate the potential effects of modifying cohort definitions in a case-control association study between depression and Type 2 Diabetes Mellitus (T2D). We used a large (>75 million individuals) de-identified administrative claims database to observe the effects of minor changes to the requirements of glucose and hemoglobin A1c tests in the control group. Results: We found that small permutations to the criteria used to define the control population result in significant shifts in both the demographic structure of the identified cohort as well as the odds ratio of association. These differences remain present when testing against age and sex-matched controls. Discussion: Analyses of RWE need to be carefully designed to avoid issues of multiple testing. Minor changes to control cohorts can lead to significantly different results and have the potential to alter even prospective studies through selection bias. Conclusion: We believe this work offers strong support for the need for robust guidelines, best practices, and regulations around the use of observational RWE for clinical or regulatory decision making.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 1. List of definitions: the Excel file “List of definitions_data protection.xlsx” includes legal definitions for the terms “personal data”, “anonymized”, “de-identified”, “pseudonymized” and “encrypted”, as provided by the participating ICN countries/regions.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These three datasets contain de-identified data on testing for pests in imports of horticultural products into Australia in a period within 2021-2023. The creator of this data page is distributing the data with the permission of the data owner (emails 14/6/2024, 25/6/2024, 1/7/2024).
Dataset anonymized_hort_aggdat_01-07-2024.csv
This dataset (anonymized_hort_aggdat_01-07-2024.csv) has one row for each line of fruit or vegetables tested. Consignments of fruits or vegetables are divided into lines (details may depend on the type of fruit or vegetable). 600 units are sampled from each line, where a unit is usually a single fruit or vegetable (rounding may occur, for example if fruit are grouped into punnets). A result is then obtained from each line ("inspection result"). If the result is not Pass, then fumigation or other actions may be taken. The columns of the data are:
| Variable Name | Values | Definition |
| ------------- | ------------- | ------------- |
| entry | ANONYMIZED_VALUE1, ANONYMIZED_VALUE2, etc | anonymised identifier of the consignment |
| volume | numeric | volume of the line |
| volume_unit | KG – kilograms | units in which volume is measured (almost always kg) |
| arrival_date | date | |
| importer_name | ANONYMIZED_VALUE_1, ANONYMIZED_VALUE2 etc | anonymised identifier of the importer |
| supplier_name | ANONYMIZED_VALUE_1, ANONYMIZED_VALUE2 etc | anonymised identifier of the supplier |
| cargo_type | | the freight type of the consignment (e.g., FCL and FCX are container types via sea and AIR is air freight) |
| port | character valued code | destination port of the consignment/entry |
| country | ANONYMIZED_VALUE_1, ANONYMIZED_VALUE2 etc | anonymised country of origin |
| finalise_type | | whether the line was released as normal, from biosecurity control, disposed of, destroyed or exported |
| document_failure | Pass, Fail | whether a failure was recorded against a line at onshore document verification. Note: A fail then followed by a pass and goods moving to inspection, will display fail. |
| inspection_result | Pass, Fail | whether a failure was recorded against a line at onshore verification inspection. Note: A fail then followed by a pass and goods being released, will display fail. Lines that qualified for the Compliance-Based Intervention Scheme (CBIS) may not have been inspected as a result. See here for more information about CBIS. |
| fumigated | Not fumigated, Fumigated | Whether line was fumigated |
| other_treatment | character | other remedial treatment applied to the line/entry (reconditioning for seeds) |
| cbis_commodity | Fresh CBIS, Other | "Fresh CBIS" means that the line qualified for the Compliance-Based Intervention Scheme (CBIS) and may not have been inspected as a result. "Other" means that the line did not qualify for CBIS. See here for more information about CBIS. |
| actionable | | Where the department's Science Services Group have determined that detected biosecurity risk material requires remedial action to mitigate biosecurity risk. Note: Seeds are only actioned if a high risk weed seed is detected or where 3 or more species of biosecurity concern are identified. |
| commodity | character | Commodity description |
| rcd_nbr | 1, 2, 3 etc | anonymised identifier of line |
Dataset anonymized_hort_pests_01-07-2024.csv
This dataset contains one row for each pest detection. Note that not all pest detections require action. It may be linked to anonymized_hort_aggdat_01-07-2024.csv using rcd_nbr as a key. The columns of the data are:
| Variable Name | Values | Definition |
| ------------- | ------------- | ------------- |
| rcd_nbr | 1, 2, 3 etc | anonymised identifier of line |
| bottle_number | numeric | identifier for a particular pest for a particular line |
| pest_type | Disease, Invertebrate, Plant, Seed, Vertebrate, Na, blank | type of potential pest |
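The link on rcd_nbr described above can be sketched with Python's standard csv module. This is only an illustration: the sample rows are invented, and only the column names (rcd_nbr, commodity, inspection_result, bottle_number, pest_type) come from the tables above.

```python
import csv
import io

# Hypothetical sample rows; only the column names come from the tables above.
AGGDAT_CSV = """rcd_nbr,commodity,inspection_result
1,Apples,Pass
2,Cherries,Fail
3,Grapes,Fail
"""

PESTS_CSV = """rcd_nbr,bottle_number,pest_type
2,101,Invertebrate
3,102,Seed
3,103,Disease
"""


def link_pests_to_lines(aggdat_file, pests_file):
    """Attach the line-level record to each pest detection via rcd_nbr."""
    # Index the line-level rows by their key for constant-time lookup.
    lines_by_id = {row["rcd_nbr"]: row for row in csv.DictReader(aggdat_file)}
    linked = []
    for pest in csv.DictReader(pests_file):
        line = lines_by_id.get(pest["rcd_nbr"])
        if line is None:
            continue  # detection without a matching line; skipped in this sketch
        linked.append({**line, **pest})
    return linked


linked_rows = link_pests_to_lines(io.StringIO(AGGDAT_CSV), io.StringIO(PESTS_CSV))
for row in linked_rows:
    print(row["rcd_nbr"], row["commodity"], row["pest_type"])
```

To run the same function on the published files, the two `io.StringIO` objects could be replaced with file handles opened with `newline=""`, assuming the CSV headers match the variable names listed above.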
Dataset anonymized_hort_seeds_incidents_01-07-2024.csv
This dataset contains one row for each seed detection. Note that not all seed detections require action. It may be linked to anonymized_hort_aggdat_01-07-2024.csv using rcd_nbr as a key and to anonymized_hort_pests_01-07-2024.csv using bottle_number as a key. The columns of the data are:
| Variable Name | Values | Definition |
| ------------- | ------------- | ------------- |
| rcd_nbr | 1, 2, 3 etc | anonymised identifier of line |
| bottle_number | numeric | identifier for a particular pest for a particular line |
| pest_type | Disease, Invertebrate, Plant, Seed, Vertebrate, Na, blank | type of potential pest (always equal to Seed in this spreadsheet) |
| comments | text field | comments |
| other_treatment | Reconditioned, or blank | other treatments applied |
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Please Note: As announced by the Minister for Immigration and Border Protection on 25 June 2017, the Department of Immigration and Border Protection (DIBP) retired the paper-based Outgoing Passenger Cards (OPC) from 1 July 2017. The information previously gathered via paper-based outgoing passenger cards is now collated from existing government data and will continue to be provided to users. Further information can be accessed here: http://www.minister.border.gov.au/peterdutton/Pages/removal-of-the-outgoing-passenger-card-jun17.aspx.
Due to the retirement of the OPC, the Australian Bureau of Statistics (ABS) undertook a review of the OAD data based on a new methodology. Further information on this revised methodology is available at: http://www.abs.gov.au/AUSSTATS/abs@.nsf/Previousproducts/3401.0Appendix2Jul%202017?opendocument&tabname=Notes&prodno=3401.0&issue=Jul%202017&num=&view=
A sampling methodology has been applied to this dataset. This method means that data will not replicate, exactly, data released by the ABS, but the differences should be negligible.
Due to ‘Return to Source’ limitations, data supplied to ABS from non-DIBP sources are also excluded.
Overseas Arrivals and Departures (OAD) data refers to the arrival and departure of Australian residents or overseas visitors, through Australian airports and sea ports, which have been recorded on incoming or outgoing passenger cards. OAD data describes the number of movements of travellers rather than the number of travellers. That is, multiple movements of individual persons during a given reference period are all counted. OAD data will differ from data derived from other sources, such as Migration Program Outcomes, Settlement Database or Visa Grant information. Travellers granted a visa in one year may not arrive until the following year, or may not travel to Australia at all. Some visas permit multiple entries to Australia, so travellers may enter Australia more than once on a visa. Settler Arrivals includes New Zealand citizens and other non-program settlers not included on the Settlement Database. The Settlement Database includes onshore processed grants not included in Settler Arrivals.
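The difference between counting movements and counting travellers can be illustrated with a toy example. The records and traveller IDs below are hypothetical and exist only for illustration; the published statistics are de-identified.

```python
# Hypothetical movement records: (traveller_id, direction).
# These IDs are invented for illustration only.
movements = [
    ("A", "departure"),
    ("A", "arrival"),
    ("B", "arrival"),
    ("A", "departure"),
]

# OAD-style figure: every movement in the reference period is counted.
movement_count = len(movements)

# A traveller-based figure would count each person once.
traveller_count = len({traveller for traveller, _ in movements})

print(movement_count)   # 4 movements
print(traveller_count)  # made by 2 distinct travellers
```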
These de-identified statistics are periodically checked for privacy and other compliance requirements. The statistics were temporarily removed in March 2024 in response to a question about privacy within the emerging technological environment. Following a thorough review and risk assessment, the Department of Home Affairs has republished the dataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Coups d'État are important events in the life of a country. They constitute an important subset of irregular transfers of political power that can have significant and enduring consequences for national well-being. There are only a limited number of datasets available to study these events (Powell and Thyne 2011, Marshall and Marshall 2019). Seeking to facilitate research on post-WWII coups by compiling a more comprehensive list and categorization of these events, the Cline Center for Advanced Social Research (previously the Cline Center for Democracy) initiated the Coup d'État Project as part of its Societal Infrastructures and Development (SID) project. More specifically, this dataset identifies the outcomes of coup events (i.e. realized or successful coups, unrealized coup attempts, or thwarted conspiracies), the type of actor(s) who initiated the coup (i.e. military, rebels, etc.), as well as the fate of the deposed leader.

This current version, Version 2.1.2, adds 6 additional coup events that occurred in 2022 and updates the coding of an attempted coup event in Kazakhstan in January 2022. Version 2.1.1 corrects a mistake in version 2.1.0, where the designation of “dissident coup” had been dropped in error for coup_id: 00201062021. Version 2.1.1 fixes this omission by marking the case as both a dissident coup and an auto-coup. Version 2.1.0 added 36 cases to the data set and removed two cases from the v2.0.0 data. This update also added actor coding for 46 coup events and added executive outcomes to 18 events from version 2.0.0. A few other changes were made to correct inconsistencies in the coup ID variable and the date of the event.

Changes from the previously released data (v2.0.0) also include:
1. Adding additional events and expanding the period covered to 1945-2022
2. Filling in missing actor information
3. Filling in missing information on the outcomes for the incumbent executive
4. Dropping events that were incorrectly coded as coup events
Items in this Dataset
1. Cline Center Coup d'État Codebook v.2.1.2 Codebook.pdf - This 16-page document provides a description of the Cline Center Coup d’État Project Dataset. The first section of this codebook provides a summary of the different versions of the data. The second section provides a succinct definition of a coup d’état used by the Coup d’État Project and an overview of the categories used to differentiate the wide array of events that meet the project's definition. It also defines coup outcomes. The third section describes the methodology used to produce the data. Revised February 2023
2. Coup Data v2.1.2.csv - This CSV (Comma Separated Values) file contains all of the coup event data from the Cline Center Coup d’État Project. It contains 29 variables and 981 observations. Revised February 2023
3. Source Document v2.1.2.pdf - This 315-page document provides the sources used for each of the coup events identified in this dataset. Please use the value in the coup_id variable to identify the sources used to identify that particular event. Revised February 2023
4. README.md - This file contains useful information for the user about the dataset. It is a text file written in markdown language. Revised February 2023
Citation Guidelines
1. To cite the codebook (or any other documentation associated with the Cline Center Coup d’État Project Dataset) please use the following citation: Peyton, Buddy, Joseph Bajjalieh, Dan Shalmon, Michael Martin, Jonathan Bonaguro, and Scott Althaus. 2023. “Cline Center Coup d’État Project Dataset Codebook”. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.1.2. February 23. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V6
2. To cite data from the Cline Center Coup d’État Project Dataset please use the following citation (filling in the correct date of access): Peyton, Buddy, Joseph Bajjalieh, Dan Shalmon, Michael Martin, Jonathan Bonaguro, and Emilio Soto. 2023. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.1.2. February 23. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V6
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The dataset was collected as part of Prac1 of the subject Typology and Data Life Cycle in the Master's Degree in Data Science at the Universitat Oberta de Catalunya (UOC).
The dataset contains 25 variables and 52478 records corresponding to books on the GoodReads Best Books Ever list (the largest list on the site).
The original code used to retrieve the dataset can be found in the GitHub repository: github.com/scostap/goodreads_bbe_dataset
The data was retrieved in two sets: the first 30000 books and then the remaining 22478. Dates were not parsed and reformatted in the second chunk, so publishDate and firstPublishDate are represented in mm/dd/yyyy format for the first 30000 records and in Month Day Year format for the rest.
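Because the two chunks store dates differently, a reader may want to normalise both date fields before analysis. The following is a minimal sketch, not the authors' code: the function name `parse_publish_date` is hypothetical, and the exact layout of the "Month Day Year" values (full English month name, optional ordinal suffix such as "14th") is an assumption that should be checked against the actual file.

```python
import re
from datetime import date, datetime

# Assumed formats: "%m/%d/%Y" for the first chunk, "%B %d %Y" for the rest.
_FORMATS = ("%m/%d/%Y", "%B %d %Y")
# Removes ordinal suffixes that directly follow a digit (e.g. "14th" -> "14").
_ORDINAL = re.compile(r"(?<=\d)(st|nd|rd|th)\b")


def parse_publish_date(raw):
    """Return a date for either assumed format, or None if neither matches."""
    if not raw:
        return None
    cleaned = _ORDINAL.sub("", raw.strip())
    for fmt in _FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue  # try the next assumed format
    return None


print(parse_publish_date("09/14/2008"))
print(parse_publish_date("September 14th 2008"))
```

Values that match neither format come back as `None`, so they can be counted and inspected rather than silently misread.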
Book cover images can optionally be downloaded from the URL in the 'coverImg' field. Python code for doing so, along with an example, can be found in the GitHub repo.
The 25 fields of the dataset are:
| Attributes | Definition | Completeness (%) |
| ------------- | ------------- | ------------- |
| bookId | Book Identifier as in goodreads.com | 100 |
| title | Book title | 100 |
| series | Series Name | 45 |
| author | Book's Author | 100 |
| rating | Global goodreads rating | 100 |
| description | Book's description | 97 |
| language | Book's language | 93 |
| isbn | Book's ISBN | 92 |
| genres | Book's genres | 91 |
| characters | Main characters | 26 |
| bookFormat | Type of binding | 97 |
| edition | Type of edition (ex. Anniversary Edition) | 9 |
| pages | Number of pages | 96 |
| publisher | Publishing house | 93 |
| publishDate | publication date | 98 |
| firstPublishDate | Publication date of first edition | 59 |
| awards | List of awards | 20 |
| numRatings | Number of total ratings | 100 |
| ratingsByStars | Number of ratings by stars | 97 |
| likedPercent | Derived field, percent of ratings over 2 stars (as in GoodReads) | 99 |
| setting | Story setting | 22 |
| coverImg | URL to cover image | 99 |
| bbeScore | Score in Best Books Ever list | 100 |
| bbeVotes | Number of votes in Best Books Ever list | 100 |
| price | Book's price (extracted from Iberlibro) | 73 |
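The likedPercent field is described as the percent of ratings over 2 stars. A rough sketch of that derivation follows. It assumes, without confirmation from the source, that ratingsByStars holds five counts ordered from 5 stars down to 1 star and that the result is rounded to a whole number; the function name `liked_percent` is hypothetical.

```python
def liked_percent(ratings_by_stars):
    """Percent of ratings with 3 or more stars, rounded to a whole number.

    Assumes five counts ordered from 5 stars down to 1 star.
    """
    total = sum(ratings_by_stars)
    if total == 0:
        return None  # no ratings, so the percentage is undefined
    liked = sum(ratings_by_stars[:3])  # 5-, 4- and 3-star counts
    return round(100 * liked / total)


# Hypothetical counts: 50 five-star, 30 four-star, 10 three-star, 6 two-star, 4 one-star.
print(liked_percent([50, 30, 10, 6, 4]))
```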