17 datasets found

D
Data De-identification and Pseudonymity Software Report
marketresearchforecast.com
doc, pdf, ppt
Updated Mar 9, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Market Research Forecast (2025). Data De-identification and Pseudonymity Software Report [Dataset]. https://www.marketresearchforecast.com/reports/data-de-identification-and-pseudonymity-software-30730
Explore at:
ppt, doc, pdfAvailable download formats
Dataset updated
Mar 9, 2025
Dataset authored and provided by
Market Research Forecast
License
https://www.marketresearchforecast.com/privacy-policyhttps://www.marketresearchforecast.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Data De-identification and Pseudonymization Software market is experiencing robust growth, projected to reach $1941.6 million in 2025 and exhibiting a Compound Annual Growth Rate (CAGR) of 7.3%. This expansion is driven by increasing regulatory compliance needs (like GDPR and CCPA), heightened concerns regarding data privacy and security breaches, and the burgeoning adoption of cloud-based solutions. The market is segmented by deployment (cloud-based and on-premises) and application (large enterprises and SMEs). Cloud-based solutions are gaining significant traction due to their scalability, cost-effectiveness, and ease of implementation, while large enterprises dominate the application segment due to their greater need for robust data protection strategies and larger budgets. Key market players include established tech giants like IBM and Informatica, alongside specialized providers such as Very Good Security and Anonomatic, indicating a dynamic competitive landscape with both established and emerging players vying for market share. Geographic expansion is also a key driver, with North America currently holding a significant market share, followed by Europe and Asia Pacific. The forecast period (2025-2033) anticipates continued growth fueled by advancements in artificial intelligence and machine learning for enhanced de-identification techniques, and the increasing demand for data anonymization across various sectors like healthcare, finance, and government. The restraining factors, while present, are not expected to significantly hinder the market’s overall growth trajectory. These limitations might include the complexity of implementing robust de-identification solutions, the potential for re-identification risks despite advanced techniques, and the ongoing evolution of privacy regulations necessitating continuous adaptation of software capabilities. However, ongoing innovation and technological advancements are anticipated to mitigate these challenges. The continuous development of more sophisticated algorithms and solutions addresses re-identification vulnerabilities, while proactive industry collaboration and regulatory guidance aim to streamline implementation processes, ultimately fostering continued market expansion. The increasing adoption of data anonymization across diverse sectors, coupled with the expanding global digital landscape and related data protection needs, suggests a positive outlook for sustained market growth throughout the forecast period.
c
Data in Support of the MIDI-B Challenge (MIDI-B-Synthetic-Validation,...
cancerimagingarchive.net
csv, dicom, n/a +1
Updated May 2, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Cancer Imaging Archive (2025). Data in Support of the MIDI-B Challenge (MIDI-B-Synthetic-Validation, MIDI-B-Curated-Validation, MIDI-B-Synthetic-Test, MIDI-B-Curated-Test) [Dataset]. http://doi.org/10.7937/cf2p-aw56
Explore at:
sqlite and zip, dicom, csv, n/aAvailable download formats
Unique identifier
https://doi.org/10.7937/cf2p-aw56
Dataset updated
May 2, 2025
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
May 2, 2025
Dataset funded by
National Cancer Institutehttp://www.cancer.gov/
Description
Abstract
These resources comprise a large and diverse collection of multi-site, multi-modality, and multi-cancer clinical DICOM images from 538 subjects infused with synthetic PHI/PII in areas encountered by TCIA curation teams. Also provided is a TCIA-curated version of the synthetic dataset, along with mapping files for mapping identifiers between the two.
This new MIDI data resource includes DICOM datasets used in the Medical Image De-Identification Benchmark (MIDI-B) challenge at MICCAI 2024. They are accompanied by ground truth answer keys and a validation script for evaluating the effectiveness of medical image de-identification workflows. The validation script systematically assesses de-identified data against an answer key outlining appropriate actions and values for proper de-identification of medical images, promoting safer and more consistent medical image sharing.
Introduction
Medical imaging research increasingly relies on large-scale data sharing. However, reliable de-identification of DICOM images still presents significant challenges due to the wide variety of DICOM header elements and pixel data where identifiable information may be embedded. To address this, we have developed an openly accessible synthetic dataset containing artificially generated protected health information (PHI) and personally identifiable information (PII).
These resources complement our earlier work (Pseudo-PHI-DICOM-data ) hosted on The Cancer Imaging Archive. As an example of its use, we also provide a version curated by The Cancer Imaging Archive (TCIA) curation team. This resource builds upon best practices emphasized by the MIDI Task Group who underscore the importance of transparency, documentation, and reproducibility in de-identification workflows, part of the themes at recent conferences (Synapse:syn53065760) and workshops (2024 MIDI-B Challenge Workshop).
This framework enables objective benchmarking of de-identification performance, promotes transparency in compliance with regulatory standards, and supports the establishment of consistent best practices for sharing clinical imaging data. We encourage the research community to use these resources to enhance and standardize their medical image de-identification workflows.
Methods
Subject Inclusion and Exclusion Criteria
The source data were selected from imaging already hosted in de-identified form on TCIA. Imaging containing faces were excluded, and no new human studies were performed for his project.
Data Acquisition
To build the synthetic dataset, image series were selected from TCIA’s curated datasets to represent a broad range of imaging modalities (CR, CT, DX, MG, MR, PT, SR, US) , manufacturers including (GE, Siemens, Varian , Confirma, Agfa, Eigen, Elekta, Hologic, KONICA MINOLTA, others) , scan parameters, and regions of the body. These were processed to inject the synthetic PHI/PII as described.
Data Analysis
Synthetic pools of PHI, like subject and scanning institution information, were generated using the Python package Faker (https://pypi.org/project/Faker/8.10.3/). These were inserted into DICOM metadata of selected imaging files using a system of inheritable rule-based templates outlining re-identification functions for data insertion and logging for answer key creation. Text was also burned-in to the pixel data of a number of images. By systematically embedding realistic synthetic PHI into image headers and pixel data, accompanied by a detailed ground-truth answer key, our framework enables users transparency, documentation, and reproducibility in de-identification practices, aligned with the HIPAA Safe Harbor method, DICOM PS3.15 Confidentiality Profiles, and TCIA best practices.
Usage Notes
This DICOM collection is split into two datasets, synthetic and curated. The synthetic dataset is the PHI/PII infused DICOM collection accompanied by a validation script and answer keys for testing, refining and benchmarking medical image de-identification pipelines. The curated dataset is a version of the synthetic dataset curated and de-identified by members of The Cancer Imaging Archive curation team. It can be used as a guide, an example of medical image curation best practices. For the purposes of the De-Identification challenge at MICCAI 2024, the synthetic and curated datasets each contain two subsets, a portion for Validation and the other for Testing.
To link a curated dataset to the original synthetic dataset and answer keys, a mapping between the unique identifiers (UIDs) and patient IDs must be provided in CSV format to the evaluation software. We include the mapping files associated with the TCIA-curated set as an example. Lastly, for both the Validation and Testing datasets, an answer key in sqlite.db format is provided. These components are for use with the Python validation script linked below (4). Combining these components, a user developing or evaluating de-identification methods can ensure they meet a specification for successfully de-identifying medical image data.
s
i2b2 Research Data Warehouse
scicrunch.org
rrid.site
Updated Oct 17, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2019). i2b2 Research Data Warehouse [Dataset]. http://identifiers.org/RRID:SCR_013276
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_013276
Dataset updated
Oct 17, 2019
Description
A data warehouse that integrates information on patients from multiple sources and consists of patient information from all the visits to Cincinnati Children''''s between 2003 and 2007. This information includes demographics (age, gender, race), diagnoses (ICD-9), procedures, medications and lab results. They have included extracts from Epic, DocSite, and the new Cerner laboratory system and will eventually load public data sources, data from the different divisions or research cores (such as images or genetic data), as well as the research databases from individual groups or investigators. This information is aggregated, cleaned and de-identified. Once this process is complete, it is presented to the user, who will then be able to query the data. The warehouse is best suited for tasks like cohort identification, hypothesis generation and retrospective data analysis. Automated software tools will facilitate some of these functions, while others will require more of a manual process. The initial software tools will be focused around cohort identification. They have developed a set of web-based tools that allow the user to query the warehouse after logging in. The only people able to see your data are those to whom you grant authorization. If the information can be provided to the general research community, they will add it to the warehouse. If it cannot, they will mark it so that only you (or others in your group with proper approval) can access it.
o
Data from: Medical data formatting to improve physician interpretation speed...
explore.openaire.eu
data.niaid.nih.gov
+1more
Updated Jun 13, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jacob Peterson (2022). Medical data formatting to improve physician interpretation speed in the military healthcare system [Dataset]. http://doi.org/10.5061/dryad.mkkwh712w
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.mkkwh712w
Dataset updated
Jun 13, 2022
Authors
Jacob Peterson
Description
Study Design: De-identified chemistry and hematology results were presented to participants using the two data formats (tabular and fishbone diagram) along with questionnaires requesting the identification of individual values and trends. Participants completed the two questionnaires in a balanced crossover experiment. After completing both questionnaires participants were asked to complete a 3-question survey rating perceived ease of use and indicating an overall preference for one of the data formats. Participants: A total of 35 participants were recruited at a daily internal medicine residency didactic session. Participants were asked to abstain if they were unfamiliar with either data format. Patient Cases: Each laboratory data format was applied to a pair of basic metabolic panels (BMP) and a pair of complete blood counts (CBC) labeled as being from sequential days (one CBC and BMP for each day). The laboratory data were identical in quantity and type of information but individual result values used for each data format differed. Procedure: Before the study, every participant was informed about the project and confirmed familiarity with both data formats. Participants were each given both questionnaires (one for each data format) and a survey with the lab data hidden by a cover sheet. Participants were informed they would have 60 seconds to answer as many questions as possible about the data set provided and then would answer a set of questions about a set of data. The questions were designed so that each questionnaire requested identical cognitive tasks in the same order. For example, question three asked to identify a trend on both questionnaires but one questionnaire asked about anemia, the other about renal dysfunction. The study materials were distributed randomly but were prepared such that 50% of participants had the questionnaire with data formatted using a table as the first questionnaire. The remaining 50% started the questionnaire with data formatted using fishbone diagrams. Participants completed the two questionnaires in the assigned order and then completed a three-question survey. Outcome Measures: Responses were graded manually with incorrect or partially correct answers both counted as erroneous interpretations. Omitted questions, which were rare, were not considered to have undergone interpretation and were counted neither towards total interpretations nor as erroneous. For each questionnaire, the number of questions answered and the number of errors committed were recorded. For the survey results, the ratings for ease of use (1-5 on a Likert scale with 5 being easy) were recorded for each data format. The data format preference of each participant was also recorded. Objective: The purpose of this project was to improve the ease and speed of physician comprehension when interpreting daily laboratory data for patients admitted within the Military Healthcare System (MHS). Materials and Methods: A JavaScript program was created to convert the laboratory data obtained via the outpatient electronic medical record (EMR) into a “fishbone diagram” format that is familiar to most physicians. Using a balanced crossover design, 35 internal medicine trainees and staff were asked to complete timed comprehension tests for laboratory data sets formatted in the outpatient EMR’s format and in fishbone diagram format. The number of responses per second and error rate per response were measured for each format. Participants were asked to rate relative ease of use for each format and indicate which format they preferred. Results: Comprehension speed increased 37% (6.28 seconds per interpretation) with the fishbone diagram format with no observed increase in errors. Using a Likert scale of 1 to 5 (1 being hard, 5 easy), participants indicated the new format was easier to use (4.14 for fishbone vs 2.14 for table) with 89% expressing a preference for the new format. Discussion: The publically available web application that converts tabular lab data to fishbone diagram format is currently used 10,000-12,000 times per month across the MHS, delivering significant benefit to the enterprise in terms of time saved and improved physician experience. Conclusions: This study supports the use of fishbone diagram formatting for laboratory data for inpatients within the MHS. Microsoft Excel or similar spreadsheet software.
Properties of healthcare teaming networks as a function of network...
plos.figshare.com
zip
Updated Jun 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Martin S. Zand; Melissa Trayhan; Samir A. Farooq; Christopher Fucile; Gourab Ghoshal; Robert J. White; Caroline M. Quill; Alexander Rosenberg; Hugo Serrano Barbosa; Kristen Bush; Hassan Chafi; Timothy Boudreau (2023). Properties of healthcare teaming networks as a function of network construction algorithms [Dataset]. http://doi.org/10.1371/journal.pone.0175876
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0175876
Dataset updated
Jun 1, 2023
Dataset provided by
PLOShttp://plos.org/
Authors
Martin S. Zand; Melissa Trayhan; Samir A. Farooq; Christopher Fucile; Gourab Ghoshal; Robert J. White; Caroline M. Quill; Alexander Rosenberg; Hugo Serrano Barbosa; Kristen Bush; Hassan Chafi; Timothy Boudreau
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Network models of healthcare systems can be used to examine how providers collaborate, communicate, refer patients to each other, and to map how patients traverse the network of providers. Most healthcare service network models have been constructed from patient claims data, using billing claims to link a patient with a specific provider in time. The data sets can be quite large (106–108 individual claims per year), making standard methods for network construction computationally challenging and thus requiring the use of alternate construction algorithms. While these alternate methods have seen increasing use in generating healthcare networks, there is little to no literature comparing the differences in the structural properties of the generated networks, which as we demonstrate, can be dramatically different. To address this issue, we compared the properties of healthcare networks constructed using different algorithms from 2013 Medicare Part B outpatient claims data. Three different algorithms were compared: binning, sliding frame, and trace-route. Unipartite networks linking either providers or healthcare organizations by shared patients were built using each method. We find that each algorithm produced networks with substantially different topological properties, as reflected by numbers of edges, network density, assortativity, clustering coefficients and other structural measures. Provider networks adhered to a power law, while organization networks were best fit by a power law with exponential cutoff. Censoring networks to exclude edges with less than 11 shared patients, a common de-identification practice for healthcare network data, markedly reduced edge numbers and network density, and greatly altered measures of vertex prominence such as the betweenness centrality. Data analysis identified patterns in the distance patients travel between network providers, and a striking set of teaming relationships between providers in the Northeast United States and Florida, likely due to seasonal residence patterns of Medicare beneficiaries. We conclude that the choice of network construction algorithm is critical for healthcare network analysis, and discuss the implications of our findings for selecting the algorithm best suited to the type of analysis to be performed.
d
Teaching undergraduates with quantitative data in the social sciences at...
search.dataone.org
data.niaid.nih.gov
+1more
Updated Jun 14, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Renata GonÃ§alves Curty; Rebecca Greer; Torin White (2024). Teaching undergraduates with quantitative data in the social sciences at University of California Santa Barbara [Dataset]. http://doi.org/10.25349/D9402J
Explore at:
Unique identifier
https://doi.org/10.25349/D9402J
Dataset updated
Jun 14, 2024
Dataset provided by
Dryad Digital Repository
Authors
Renata GonÃ§alves Curty; Rebecca Greer; Torin White
Time period covered
Apr 15, 2022
Description
The interview data was gathered for a project that investigatedÂ the practices of instructors who use quantitative data to teach undergraduate courses within the Social Sciences. The study was undertaken by employees of the University of California, Santa Barbara (UCSB) Library, who participated in this research project with 19 other colleges and universities across the U.S. under the direction of Ithaka S+R. Ithaka S+R is a New York-based research organization, which, among other goals, seeks to develop strategies, services, and products to meet evolving academic trends to support faculty and students.

The field of Social Sciences has been notoriously known for valuing the contextual component of data and increasingly entertaining more quantitative and computational approaches to research in response to the prevalence of data literacy skills needed to navigate both personal and professional contexts. Thus, this study becomes particularly timely to identify current instructorsâ€™ practi..., The project followed a qualitative and exploratory approach to understand current practices of faculty teaching with data. The study was IRB approved and was exempt by the UCSBâ€™s Office of Research in July 2020 (Protocol 1-20-0491).Â

The identification and recruitment of potential participants took into account the selection criteria pre-established by Ithaka S+R: a) instructors of courses within the Social Sciences, considering the field as broadly defined, and making the best judgment in cases the discipline intersects with other fields; b) instructors who teach undergraduate courses or courses where most of the students are at the undergraduate level; c) instructors of any rank, including adjuncts and graduate students; as long as they were listed as instructors of record of the selected courses; d) instructors who teach courses were students engage with quantitative/computational data.Â

The sampling process followed a combination of strategies to more easily identify instructo..., The data folder contains 10Â pdf files with de-identified transcriptions of the interviews and the pdf files with the recruitment email and the interview guide.Â
Infrared Thermography Temperature
kaggle.com
Updated Jan 17, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Joakim Arvidsson (2024). Infrared Thermography Temperature [Dataset]. https://www.kaggle.com/datasets/joebeachcapital/infrared-thermography-temperature/code
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jan 17, 2024
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Joakim Arvidsson
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Abstract

To evaluate the benefit of implementing standardized deployment and acquisition practices in the measurement of elevated body temperature (EBT) with infrared thermographs (IRTs), we conducted a clinical study with more than one thousand subjects. Subject oral temperatures were measured and facial thermal images captured with two evaluated IRTs. Based on the thermal images, temperatures from different locations on the face were extracted based on developed method and are listed in six CSV file as the open database. All data in these files has been de-identified. Based on these data, we published two main clinical study papers, with a focus on IRT effectiveness for EBT measurements and metrics for evaluating IRT clinical accuracy, respectively. Further analysis of this database can still be valuable.

Background

Infrared thermographs (IRTs) have been utilized for measuring elevated body temperature (EBT) during infectious disease outbreaks such as severe acute respiratory syndrome, Ebola virus disease, and coronavirus disease 2019. While IRTs hold promise as a tool for measuring EBT, the literature suggests that their diagnostic performance can be inconsistent, potentially due to wide variations in device quality and implementation methodology. Recommendations for improved measurement of EBT with IRTs have been published in an international standard document [1].

To evaluate the utility of this approach, we performed a clinical thermographic imaging study of more than 1000 subjects, acquiring facial temperature data from different locations with two IRTs that had been evaluated in our lab and reference oral temperature data with an oral thermometer. The aims of this clinical study were to (1) evaluate the performance of IRTs following consensus guidelines, (2) evaluate the effect of facial measurement location, (3) compare methods for IRT calibration, (4) identify best practices for assessing IRT clinical accuracy, and (5) compare IRT clinical accuracy to that of non-contact infrared thermometers.

Since the publication of our initial results [2, 3], we have received much interest from researchers regarding the availability of this data. Given its unique characteristics – including extensive size and rigor in acquisition (e.g., use of an in-frame blackbody to maximize accuracy) – we believe that this data may provide substantial value to both academic and industrial researchers. Furthermore, we are aware of no other publicly available IRT study data set that is currently available. It is our hope that releasing this data will benefit the development of effective thermal imaging systems to enhance public health, whether for standard clinical applications or future pandemics.

Methods

Two evaluated infrared thermographs (IRTs) [4] were used for the clinical study – one from FLIR (IRT-1) and the other from ICI (IRT-2). Originally, data was collected from 1115 subjects, but 6 subjects had incomplete records and 56 subjects had only one oral temperature reading or two oral temperature readings with difference greater than 0.5°C. Additionally, 33 subjects from IRT-1 and 43 subjects from IRT-2 had blurred images caused by motion artifacts. Therefore, data from these subjects were excluded. In total, data from 1020 subjects measured with IRT-1 and 1009 subjects measured with IRT-2 are available for analysis. About 11% of these subjects exhibited reference temperature above 37.5 °C. The demographics of study subjects can be found in our publications [2, 3]. Themethodology employed for the study execution, data collection and initial processing are also detailed in the publications.

The raw data from the clinical study are mainly thousands of thermal and visible images captured with two evaluated IRTs [4] and a webcam together with oral temperature data measured with an oral thermometer. Based on these raw data, temperatures at different facial locations for each subject were extracted through a free-form deformation approach for registration of visible and infrared facial images [5]. A total of 26 facial temperature variables were defined as explained in the Data Description section. Figure 1 illustrated some of these variables that have been used in our publications [2, 3]. In addition, we recorded the oral temperatures of each subject twice, under two operation modes (fast mode and monitor mode) of the oral thermometer each time. This database includes the average value for each mode. Since the fast mode is less accurate than the monitor mode, only data from the monitor mode were used in our publications. Furthermore, demographic information such as gender, age, ethnicity, ambient temperature, relative humidity, measuring distance, whether cosmetic was applied, and mearing time and date were recorded for each subject and listed in the database. Not every subject has a full record of al...
f
PAC De-Identified study data.
plos.figshare.com
xlsx
Updated May 9, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Deyana Altahsh; Linda Holdbrook; Eric Norrie; Adanech Sahilie; Mohammad Yasir Essar; Rabina Grewal; Olha Horbach; Fawzia Abdaly; Maria Santana; Rachel Talavlikar; Michael Aucoin; Annalee Coakley; Gabriel E. Fabreau (2025). PAC De-Identified study data. [Dataset]. http://doi.org/10.1371/journal.pone.0323746.s007
Explore at:
xlsxAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0323746.s007
Dataset updated
May 9, 2025
Dataset provided by
PLOS ONE
Authors
Deyana Altahsh; Linda Holdbrook; Eric Norrie; Adanech Sahilie; Mohammad Yasir Essar; Rabina Grewal; Olha Horbach; Fawzia Abdaly; Maria Santana; Rachel Talavlikar; Michael Aucoin; Annalee Coakley; Gabriel E. Fabreau
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
BackgroundDespite rising forced displacement globally, refugees’ health and research priorities are largely unknown. We investigated whether a diverse refugee committee could utilize participatory methods to identify health priorities and a research agenda to address them.MethodsWe conducted a qualitative study with focus groups of current and former refugees, asylum claimants and evacuees from a specialized refugee clinic over a year in Calgary, Alberta, Canada. We collected sociodemographic data using standardized instruments, then utilized a four-step nominal group technique process (idea generation, recording, discussion, and voting) to identify and rank participants’ health and research priorities. Participants ranked their top five priorities across three time periods: Pre-migration/early arrival (0–3 months), post-migration (3 months–2 years), and long-term health (>2 years). Participants created overarching priorities and corroborated findings via a member checking step.FindingsTwenty-three participants (median age 35 years) attended one or more of five focus groups. Twenty-one completed sociodemographic surveys: 16/21 (76%) were women, representing 8 countries of origin. Participants identified “more family physicians” and “improving health system navigation” (11/60 votes each) as top health and research priorities respectively across all resettlement periods. Participants also prioritized pre-departure healthcare system orientation and improved post-arrival and long-term mental health services. Twelve participants completed the member checking process, affirming the results with minor clarifications.InterpretationThis proof-of-concept study illustrates how refugees can use a rigorous consensus process without external influence to prioritize their healthcare needs, direct a health research agenda to address those needs, and co-produce research. These low-cost participatory methods should be replicated elsewhere.
n
Data from: Natural language processing and recurrent network models for...
data.niaid.nih.gov
datadryad.org
zip
Updated Feb 6, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Meijian Guan; Samuel Cho; Robin Petro; Wei Zhang; Boris Pasche; Umit Topaloglu (2019). Natural language processing and recurrent network models for identifying genomic mutation-associated cancer treatment change from patient progress notes [Dataset]. http://doi.org/10.5061/dryad.f9m8217
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5061/dryad.f9m8217
Dataset updated
Feb 6, 2019
Dataset provided by
Wake Forest University
Wake Forest Baptist Comprehensive Cancer Center, Winston Salem, North Carolina, USA
Authors
Meijian Guan; Samuel Cho; Robin Petro; Wei Zhang; Boris Pasche; Umit Topaloglu
License
https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html
Description
Objectives: Natural language processing (NLP) and machine learning approaches were used to build classifiers to identify genomic-related treatment changes in the free-text visit progress notes of cancer patients.

Methods: We obtained 5,889 de-identified progress reports (2,439 words on average) for 755 cancer patients who have undergone a clinical Next Generation Sequencing (NGS) testing in Wake Forest Baptist Comprehensive Cancer Center for our data analyses. An NLP system was implemented to process the free-text data and extract NGS-related information. Three types of recurrent neural network (RNN) namely, gated recurrent unit (GRU), long-short term memory (LSTM), and bidirectional LSTM (LSTM_Bi) were applied to classify documents to the treatment-change and no-treatment-change groups. Further, we compared the performances of RNNs to five machine learning algorithms including Naive Bayes (NB), K-nearest Neighbor (KNN), Support Vector Machine for classification (SVC), Random Forest (RF), and Logistic Regression (LR).

Results: Our results suggested that, overall, RNNs outperformed traditional machine learning algorithms, and LSTM_Bi showed the best performance among the RNNs in terms of accuracy, precision, recall, and F1 score. In addition, pre-trained word embedding can improve the accuracy of LSTM by 3.4% and reduce the training time by more than 60%.

Discussion and Conclusion: NLP and RNN-based text mining solutions have demonstrated advantages in information retrieval and document classification tasks for unstructured clinical progress notes.
c
A dataset of histopathological whole slide images for classification of...
stage.cancerimagingarchive.net
cancerimagingarchive.net
n/a, svs, xlsx
Updated Apr 26, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Cancer Imaging Archive (2023). A dataset of histopathological whole slide images for classification of Treatment effectiveness to ovarian cancer [Dataset]. http://doi.org/10.7937/TCIA.985G-EY35
Explore at:
svs, xlsx, n/aAvailable download formats
Unique identifier
https://doi.org/10.7937/TCIA.985G-EY35
Dataset updated
Apr 26, 2023
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
Apr 26, 2023
Dataset funded by
National Cancer Institutehttp://www.cancer.gov/
Description
Despite the progress made during the last two decades in the surgery and chemotherapy of ovarian cancer, more than 70% of advanced patients are with recurrent cancer and decease. Bevacizumab has been recently approved by FDA as a monotherapy for advanced ovarian cancer in combination with chemotherapy. Considering the cost, potential toxicity, and finding that only a portion of patients will benefit from these drugs, the identification of a new predictive method for the treatment of ovarian cancer remains an urgent unmet medical need. Prediction of therapeutic effects and individualization of therapeutic strategies are critical, but to the authors' best knowledge, there are no effective biomarkers that can be used to predict patient response to bevacizumab treatment for ovarian cancer. This dataset helps researchers to explore and develop methods to predict the therapeutic effect of patients with epithelial ovarian cancer to bevacizumab.
The dataset consists of de-identified 288 hematoxylin and eosin (H&E) stained whole slides with clinical information from 78 patients. The slides were collected from the tissue bank of the Tri-Service General Hospital and the National Defense Medical Center, Taipei, Taiwan. Whole Slide Images (WSIs) were acquired with a digital slide scanner (Leica AT2) with a 20x objective lens. The dimension of the ovarian cancer slides is 54342x41048 in pixels and 27.34 x 20.66mm on average. The bevacizumab treatment is effective in 162 and invalid in 126 of the dataset. Ethical approvals have been obtained from the research ethics committee of the Tri-Service General Hospital (TSGHIRB No.1-107-05-171 and No.B202005070), and the data were de-identified and used for a retrospective study without impacting patient care.
The clinicopathologic characteristics of patients were recorded by the data managers of the Gynecologic Oncology Center. Age, pre- and post-treatment serum CA-125 concentrations, histologic subtype, and recurrence, and survival status were recorded. A tumor, which is resistant to bevacizumab therapy, is defined as a measurable regrowth of the tumor or as a serum CA-125 concentration more than twice the value of the upper limit of normal during the treatment course for the bevacizumab therapy (i.e., the patient had the detectable disease or elevated CA-125 level following cytoreductive surgery combine with carboplatin/paclitaxel plus bevacizumab). A tumor, which is sensitive to bevacizumab therapy, is defined as no measurable regrowth of the tumor or as a serum CA-125 concentration under than twice the value of the upper limit of normal during the treatment course for the bevacizumab therapy.
This dataset is further described in the following publications:
Wang et al. Weakly Supervised Deep Learning for Prediction of Treatment Effectiveness on Ovarian Cancer from Histopathology Images. Computerized Medical Imaging and Graphics. https://doi.org/10.1016/j.compmedimag.2022.102093
Wang, CW., Chang, CC., Khalil, M.A. et al. Histopathological whole slide image dataset for classification of treatment effectiveness to ovarian cancer. Sci Data 9, 25 (2022). https://doi.org/10.1038/s41597-022-01127-6
Facebook: Survey on Gender Equality at Home 2020 - World
catalog.ihsn.org
Updated Nov 3, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Equal Measures 2030 (2021). Facebook: Survey on Gender Equality at Home 2020 - World [Dataset]. https://catalog.ihsn.org/catalog/9885
Explore at:
Dataset updated
Nov 3, 2021
Dataset provided by
World Bankhttp://worldbank.org/
UN Womenhttp://unwomen.org/
Facebookhttps://www.fb.com/
Ladysmith
Equal Measures 2030
Time period covered
2020
Area covered
World, World
Description
Abstract

Facebook’s Survey on Gender Equality at Home generates a global snapshot of women and men’s access to resources, their time spent on unpaid care work, and their attitudes about equality. This survey covers topics about gender dynamics and norms, unpaid caregiving, and life during the COVID-19 pandemic. Aggregated data is available publicly on Humanitarian Data Exchange (HDX). De-identified microdata is also available to eligible nonprofits and universities through Facebook’s Data for Good (DFG) program. For more information, please email dataforgood@fb.com.

Geographic coverage

This survey is fielded once a year in over 200 countries and 60 languages. The data can help researchers track trends in gender equality and progress on the Sustainable Development Goals.

Analysis unit

Public Aggregate Data on HDX: country or regional levels

De-identified Microdata through Facebook Data for Good program: Individual level

Universe

The survey was fielded to active Facebook users.

Kind of data

Sample survey data [ssd]

Sampling procedure

Respondents were sampled across seven regions: - East Asia and Pacific; Europe and Central Asia - Latin America and Caribbean - Middle East and North Africa - North America - Sub-Saharan Africa - South Asia

For the purposes of this report, responses have been aggregated up to the regional level; these regional estimates form the basis of this report and its associated products (Regional Briefs). In order to ensure respondent confidentiality, these estimates are based on responses where a sufficient number of people responded to each question and thus where confidentiality can be assured. This results in a sample of 461,748 respondents.

The sampling frame for this survey is the global database of Facebook users who were active on the platform at least once over the past 28 days, which offers a number of advantages: It allows for the design, implementation, and launch of a survey in a timely manner. Large sample sizes allow for more questions to be asked through random assignment of modules, avoiding respondent fatigue. Samples may be drawn from diverse segments of the online population. Knowledge of the overall sampling frame allowed for more rigorous probabilistic sampling techniques and non-response adjustments than is typical for online and phone surveys

Mode of data collection

Internet [int]

Research instrument

The survey includes a total of 75 questions, split across into the following sections: - Basic demographics and gender norms - Decision making and resource allocation across household members - Unpaid caregiving - Additional household demographics and COVID-19 impact - Optional questions for special groups (e.g. students, business owners, the employed, and the unemployed)

Questions were developed collaboratively by a team of economists and gender experts from the World Bank, UN Women, Equal Measures 2030, and Ladysmith. Some of the questions have been borrowed from other surveys that employ alternative modes of administration (e.g., face-to-face, telephone surveys, etc.); this allows for comparability and identification of potential gaps and biases inherent to Facebook and other online survey platforms. As such, the survey also generates methodological insights that are useful to researchers undertaking alternative modes of data collection during the COVID-19 era.

In order to avoid “survey fatigue,” wherein respondents begin to disengage from the survey content and responses become less reliable, each respondent was only asked to answer a subset of questions. Specifically, each respondent saw a maximum of 30 questions, comprising demographics (asked of all respondents) and a set of additional questions randomly and purposely allocated to them.

Response rate

Response rates to online surveys vary widely depending on a number of factors including survey length, region, strength of the relationship with invitees, incentive mechanisms, invite copy, interest of respondents in the topic and survey design.

Sampling error estimates

Any survey data is prone to several forms of error and biases that need to be considered to understand how closely the results reflect the intended population. In particular, the following components of the total survey error are noteworthy:

Sampling error is a natural characteristic of every survey based on samples and reflects the uncertainty in any survey result that is attributable to the fact that not the whole population is surveyed.

Other factors beyond sampling error that contribute to such potential differences are frame or coverage error and nonresponse error.

Data appraisal

Survey Limitations The survey only captures respondents who: (1) have access to the Internet (2) are Facebook users (3) opt to take this survey through the Facebook platform. Knowledge of the overall demographics of the online population in each region allows for calibration such that estimates are representative at this level. However, this means the results only tell us something about the online population in each region, not the overall population. As such, the survey cannot generate global estimates or meaningful comparisons across countries and regions, given the heterogeneity in internet connectivity across countries. Estimates have only been generated for respondents who gave their gender as male or female. The survey included an “other” option but very few respondents selected it, making it impossible to generate meaningful estimates for non-binary populations. It is important to note that the survey was not designed to paint a comprehensive picture of household dynamics but rather to shed light on respondents’ reported experiences and roles within households
Firemaker image collection for benchmarking forensic writer identification...
zenodo.org
explore.openaire.eu
+1more
application/gzip
Updated Jan 24, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Lambert Schomaker; Louis Vuurpijl; Lambert Schomaker; Louis Vuurpijl (2020). Firemaker image collection for benchmarking forensic writer identification using image-based pattern recognition [Dataset]. http://doi.org/10.5281/zenodo.1194612
Explore at:
application/gzipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.1194612
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Lambert Schomaker; Louis Vuurpijl; Lambert Schomaker; Louis Vuurpijl
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Disclaimer and terms of use:
============================

/*****************************************************************************\
* *
* *
* This is the Firemaker NFI-images Distribution *
* *
* This distribution contains 1000 images of scanned handwritten text, *
* scanned at resolution 300dpi grey scale, containing pages of *
* handwritten text by 250 writers, four pages per writer, from four *
* writing conditions, one condition per page. The conditions are: *
* p1: copied, natural style, p2: copied, UPPER case, p3: copied and forged, *
* i.e.,"try to write in a different style than your natural style", and p4, *
* self generated, i.e., text produced to describe a given cartoon. *
* *
* *
* *
* Copyright The International Unipen Foundation, 2000, All rights reserved *
*******************************************************************************
* *
* *
* DISCLAIMER AND COPYRIGHT NOTICE FOR ALL DATA CONTAINED ON THIS CDROM: *
* *
* *
* 1) PERMISSION IS HEREBY GRANTED TO USE THE DATA FOR RESEARCH *
* PURPOSES. IT IS NOT ALLOWED TO DISTRIBUTE THIS DATA FOR COMMERCIAL *
* PURPOSES. *
* *
* *
* 2) PROVIDER GIVES NO EXPRESS OR IMPLIED WARRANTY OF ANY KIND AND ANY *
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR PURPOSE ARE *
* DISCLAIMED. *
* *
* 3) PROVIDER SHALL NOT BE LIABLE FOR ANY DIRECT, INDIRECT, SPECIAL, *
* INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF ANY USE OF THIS *
* DATA. *
* *
* 4) THE USER SHOULD REFER TO THE FIRST PUBLIC ARTICLE ON THIS DATA SET: *
* *
* M. Bulacu, L. Schomaker & L. Vuurpijl (2003). *
* Writer identification using edge-based directional features. *
* ICDAR '03: Proceedings of the 7th International Conference on Document *
* Analysis and Recognition, pp. 937-941. *
* Piscataway: IEEE Computer, ISBN 0-7695-1960-1 *
* *
* 5) THE RECIPIENT SHOULD REFRAIN FROM PROLIFERATING THE DATA SET TO THIRD *
* PARTIES EXTERNAL TO HIS/HER LOCAL RESEARCH GROUP. PLEASE REFER INTERESTED *
* RESEARCHERS TO HTTP://UNIPEN.ORG FOR OBTAINING THEIR OWN COPY. *
\*****************************************************************************/

BibTeX entry:

@inproceedings{Firemaker,
author = {Bulacu, M. and Schomaker, L.R.B. and Vuurpijl, L.},
title = {Writer Identification Using Edge-Based Directional Features},
booktitle = {ICDAR '03: Proceedings of the 7th International
Conference on Document Analysis and Recognition},
year = {2003},
isbn = {0-7695-1960-1},
pages = {937-941},
publisher = {IEEE Computer Society},
address = {Washington, DC, USA},
}

In the project "Vergelijk", a grant obtained from the Dutch Forensic Science
Institute, two existing professional writer-identification systems have been
compared regarding usability studies and in particular recognition
performance (Schomaker & Vuurpijl, 2000). The results of this comparison
are contained in a confidential report:

L.R.B. Schomaker and L.G. Vuurpijl (2000).
Forensic writer identification: A benchmark data set
and a comparison of two systems. Technical report,
Nijmegen Institute for Cognition and Information (NICI),
University of Nijmegen, The Netherlands.

Informative and non-confidential details from this report are
given in the accompanying file: 'firemaker-dbase.pdf'

To compare both systems, a carefully designed experiment was conducted to
record handwritten samples from male and female writers in several conditions:

Condition 1: Normal constrained handwriting
==============================================

Below, the Dutch text writers had to produce in normal handwriting is given.

--- start text ----
Zij bezochten veilingen en reisden met de KLM. Voor
korte afstanden huurden ze een auto, meestal een VW
of een Ford.

Condition 2: Production of constrained block capital handwriting
================================================================

In this condition, the writers had to produce the following text
in block-capital handwriting:

--- start text ----
NADAT ZE IN NEW YORK, TOKYO, QUÉBEC, PARIJS, ZÜRICH
EN OSLO WAREN GEWEEST, VLOGEN ZE UIT DE USA TERUG
MET VLUCHT KL 658 OM 12 UUR.

Condition 3: Production of free-forged handwriting
==================================================

Below, the text writers had to produce in the free-forged handwriting
condition is given. No example of handwriting is given which they have to
mimick (forge), the condition concerns a self-conceived distorted
handwriting style.

--- start text ----
Nog dezelfde avond reden ze naar hun vrienden
Chris, Emile, Jan, Irene en Henk, nadat ze hun
vriendinnen Greta en Maria hadden opgehaald.

Condition 4: Production of unconstrained handwriting
====================================================

The final text writers had to produce is unconstrained handwriting.
The cartoon, a series of pictures concerning a 'UFO' landing had
to be described in their own words, in at least six lines of text.
See image file "space.gif".

Thruth labels and writer identifications
========================================

Each writer has a unique id, specified as:

id: {num}{set}
num: a three-digit number
set: either 01, 02, 03 or 04, identifying one of the 4 experiments

The vast majority of the writers producing sets 01, 02 and 03 mimicked the
content and layout (empty lines) of the constrained texts they had to copy
sufficiently accurately, such that the example texts are a good indication of
the contents. However, as set 04 ("describe cartoon story") contains
unconstrained self-generated handwriting, the corresponding thruth labels had
to be extracted manually. The resulting label files are contained in the
directory ./300dpi/p4-self-natural/labels/

Note: no letter, word, line or paragraph segmentation is provided with this
data set. The main text can be cropped easily. Since the orientation is
horizontal, projection techniques can be used to extract lines, using
a line-spacing parameter (~94 pixels line height) as an additional check.

Overview of directories:

300dpi/
p1-copy-normal/ Copying task, normal writing style
p2-copy-upper/ Copying task, UPPER-case
p3-copy-forged/ Copying task, instructed to mimic another script style
p4-self-natural/ Self-generated text, natural writing condition

Note: the original raw collection contained writer #155, who has been removed
from this data set, as his first condition (p1) was
Storage Area Network (San) Market Analysis North America, Europe, APAC,...
technavio.com
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Technavio, Storage Area Network (San) Market Analysis North America, Europe, APAC, South America, Middle East and Africa - US, UK, Canada, Germany, China - Size and Forecast 2024-2028 [Dataset]. https://www.technavio.com/report/storage-area-network-san-market-analysis
Explore at:
Dataset provided by
TechNavio
Authors
Technavio
Time period covered
2021 - 2025
Area covered
United States, Global
Description
Snapshot img

Storage Area Network (SAN) Market Size 2024-2028

The storage area network (san) market size is forecast to increase by USD 35.46 billion, at a CAGR of 16.83% between 2023 and 2028.

The market is experiencing significant growth, driven primarily by the increasing need for data backup and redundancy in the context of digital transformation. Businesses are increasingly adopting digital strategies, leading to an explosion of data. SAN technology offers a scalable, flexible, and high-performance solution for managing this data, making it an essential component of modern IT infrastructure. However, this market is not without challenges. Cybersecurity threats pose a significant obstacle, with SANs being a prime target due to their critical role in data management. Ensuring the security of SANs is a top priority for organizations, requiring significant investment in cybersecurity solutions and best practices. Additionally, the complexity of SANs can make implementation and management challenging, necessitating specialized expertise and resources. Companies seeking to capitalize on the opportunities presented by the SAN market must navigate these challenges effectively, investing in robust security measures and building a skilled workforce.

What will be the Size of the Storage Area Network (SAN) Market during the forecast period?

Explore in-depth regional segment analysis with market size data - historical 2018-2022 and forecasts 2024-2028 - in the full report.
Request Free SampleThe market continues to evolve, with dynamic market activities unfolding across various sectors. Hybrid cloud storage solutions are increasingly adopted, integrating SAN with cloud storage for enhanced performance and flexibility. Data management remains a key focus, with de-identification, backup, and retention strategies being continually refined. Software-defined storage and data deduplication are transforming the landscape, enabling optimization of data center infrastructure. Multi-cloud storage and performance tuning are also gaining traction, allowing businesses to manage and distribute data more efficiently. Network Attached Storage (NAS) and Object Storage are complementing SAN, offering different access methods and use cases. Data compression, compression, and archiving are essential for capacity planning and cost optimization. Security remains a top priority, with encryption, masking, and compliance measures being implemented to protect sensitive data. Disaster recovery and data governance are crucial components of a robust data management strategy. File and block level storage, as well as flash storage, offer varying benefits depending on the application. Storage analytics, auditing, and tiered storage solutions provide valuable insights for capacity planning and performance monitoring. Fibre Channel and Ethernet technologies continue to shape the market, while hyperconverged infrastructure and SAN switches offer streamlined management and consolidation. The ongoing evolution of SAN market is driven by the continuous pursuit of improved performance, cost savings, and enhanced security.

How is this Storage Area Network (SAN) Industry segmented?

The storage area network (san) industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments. ComponentHardwareSoftwareServicesTechnologyFiber channelFiber channel over ethernetInfinibandiSCSI protocolGeographyNorth AmericaUSCanadaEuropeGermanyUKAPACChinaRest of World (ROW).

By Component Insights

The hardware segment is estimated to witness significant growth during the forecast period.The market encompasses hardware, software, and services that interconnect storage devices and servers. Hardware components, a crucial part of this infrastructure, consist of fiber channels and related hardware such as hubs, switches, gateways, directors, and routers. The market's growth is driven by the escalating demand for data backup and high-speed networking. Additionally, the ongoing digital transformation worldwide is anticipated to significantly boost the hardware segment's expansion. Data backup and disaster recovery are essential functions in today's business environment, necessitating the need for efficient and reliable storage solutions. Software-defined storage, data deduplication, compression, and tiered storage are some advanced technologies enhancing data backup and recovery capabilities. Furthermore, data security, compliance, and governance are critical concerns, leading to the adoption of data encryption, masking, and access control mechanisms. Network Attached Storage (NAS) and Cloud Storage offer alternative storage architectures to SAN. Multi-cloud storage and hybrid cloud strategies are gaining traction, necessitating seamless integration and
p
MIMIC-IV
physionet.org
Updated Oct 11, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Alistair Johnson; Lucas Bulgarelli; Tom Pollard; Brian Gow; Benjamin Moody; Steven Horng; Leo Anthony Celi; Roger Mark (2024). MIMIC-IV [Dataset]. http://doi.org/10.13026/kpb9-mt58
Explore at:
Unique identifier
https://doi.org/10.13026/kpb9-mt58
Dataset updated
Oct 11, 2024
Authors
Alistair Johnson; Lucas Bulgarelli; Tom Pollard; Brian Gow; Benjamin Moody; Steven Horng; Leo Anthony Celi; Roger Mark
License
https://github.com/MIT-LCP/license-and-dua/tree/master/draftshttps://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Description
Retrospectively collected medical data has the opportunity to improve patient care through knowledge discovery and algorithm development. Broad reuse of medical data is desirable for the greatest public good, but data sharing must be done in a manner which protects patient privacy. Here we present Medical Information Mart for Intensive Care (MIMIC)-IV, a large deidentified dataset of patients admitted to the emergency department or an intensive care unit at the Beth Israel Deaconess Medical Center in Boston, MA. MIMIC-IV contains data for over 65,000 patients admitted to an ICU and over 200,000 patients admitted to the emergency department. MIMIC-IV incorporates contemporary data and adopts a modular approach to data organization, highlighting data provenance and facilitating both individual and combined use of disparate data sources. MIMIC-IV is intended to carry on the success of MIMIC-III and support a broad set of applications within healthcare.
n
Great Lakes Restoration Initiative human wellbeing survey
data.niaid.nih.gov
search.dataone.org
+1more
zip
Updated May 23, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Matthew Jurjonas (2024). Great Lakes Restoration Initiative human wellbeing survey [Dataset]. http://doi.org/10.5061/dryad.4b8gthtk3
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5061/dryad.4b8gthtk3
Dataset updated
May 23, 2024
Dataset provided by
The Nature Conservancy
Authors
Matthew Jurjonas
License
https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html
Area covered
The Great Lakes
Description
This dataset represents a comprehensive exploration of ecosystem restoration practices and their impacts on both ecological and human wellbeing indicators. Traditionally, ecosystem restoration efforts have focused on ecological benchmarks such as water and habitat quality, species abundance, and vegetation cover. However, there is an increasing recognition of the interplay between restoration and human communities, evidenced by positive socio-ecological connections like property value, natural hazard mitigation, recreation opportunities, and overall happiness. With the United Nations declaring 2021-2030 as the "Decade of Ecosystem Restoration" and a push for more socio-ecological goals in restoration, this dataset delves into the degree to which restoration practitioners consider human wellbeing. It is based on a case study of the Great Lakes Restoration Initiative (GLRI), a federally funded program that has awarded over $3.5 billion to 5,300 projects. A total of 1,574 GLRI projects were surveyed, with 437 responses received, revealing that almost half of these projects set human wellbeing goals, and more than 70% believed they achieved them. In comparison, 90% of project managers believed they met their ecological goals. This dataset highlights the documented perceptions of positive impacts on both people and nature, suggesting that restoration efforts often go beyond traditional indicators. As such, it advocates for the adoption of a socio-ecological perspective in ecosystem restoration programs to comprehensively document the full extent of restoration outcomes. The data collection process included a survey methodology, and the dataset provides insights into project design, implementation, and success measurements. The data was collected between November 2020 and March 2021, with a maximum of three contact attempts for each project. It offers a unique perspective on the relationship between ecosystem restoration and human wellbeing, emphasizing the importance of capturing the often "unseen" benefits of these projects. Methods Data collection involved a survey method for evaluating 1,574 projects related to the Great Lakes Restoration Initiative (GLRI). The definition of restoration was broad, encompassing various aspects of improvement determined by project managers. Human wellbeing was assessed using defined components. Local recipient organizations from the GLRI database were identified as representatives for the projects. After three contact attempts, we received 437 completed project surveys, resulting in a 27.9% response rate. Notably, out of the 406 recipient organizations responsible for these projects, we received survey responses from 205 unique recipient organizations, achieving a 50.5% response rate. The number of surveys completed per recipient organization ranged from 1 to 36. The initial email script requested recipient organizations to discreetly forward the survey to the relevant internal project managers. Consequently, we lack precise information on the total number of individual survey participants. We included 14 incomplete responses in our analysis and resolved 9 duplicate submissions by considering the more comprehensive survey. In addition, we encountered five refusals, with one being an outright refusal and four indicating that the project managers were no longer affiliated with the organization. The survey was conducted between November 2020 and March 2021, with a maximum of three contact attempts. Recipients were informed of the voluntary nature of participation, data confidentiality, and data de-identification. The initial email included project details and a unique survey ID#, and reminders were sent in January and March 2021. Contact information for the research team was provided in all communications.
Identifying evidence-practice gaps and strategies for improvement in...
plos.figshare.com
pdf
Updated Jun 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Melanie E. Gibson-Helm; Jodie Bailie; Veronica Matthews; Alison F. Laycock; Jacqueline A. Boyle; Ross S. Bailie (2023). Identifying evidence-practice gaps and strategies for improvement in Aboriginal and Torres Strait Islander maternal health care [Dataset]. http://doi.org/10.1371/journal.pone.0192262
Explore at:
pdfAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0192262
Dataset updated
Jun 1, 2023
Dataset provided by
PLOShttp://plos.org/
Authors
Melanie E. Gibson-Helm; Jodie Bailie; Veronica Matthews; Alison F. Laycock; Jacqueline A. Boyle; Ross S. Bailie
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Torres Strait
Description
IntroductionAdverse pregnancy outcomes are more common among Aboriginal and Torres Strait Islander populations than non-Indigenous populations in Australia. Later in life, most of the difference in life expectancy between Aboriginal and Torres Strait Islander women and non-Indigenous women is due to non-communicable diseases (NCDs). Most Aboriginal and Torres Strait Islander women attend health services regularly during pregnancy. Providing high-quality care within these appointments has an important role to play in improving the current and future health of women and babies.AimThis study engaged stakeholders in a theory-informed process to use aggregated continuous quality improvement (CQI) data to identify 1) priority evidence-practice gaps in Aboriginal and Torres Strait Islander maternal health care, 2) barriers and enablers to high-quality care, and 3) strategies to address identified priorities.MethodsThree phases of reporting and feedback were implemented using de-identified CQI data from 91 health services between 2007 and 2014 (4,402 client records). Stakeholders (n = 172) from a range of professions and organisations participated.ResultsStakeholders identified four priority areas relating to NCDs: smoking, alcohol, psychosocial wellbeing and nutrition. Barriers or enablers to high-quality care included workforce support, professional development, teamwork, woman-centred care, decision support, equipment and community engagement. Strategies to address the priorities included upskilling staff to provide best practice care in priority areas, advocating for availability of healthy food, housing and local referral options, partnering with communities on health promotion projects, systems to facilitate continuity of care and clear referral pathways.ConclusionsThis novel use of large-scale aggregate CQI data facilitated stakeholder input on priority evidence-practice gaps in maternal health care in Australia. Evidence-practice gaps relating to NCD risk factors and social determinants of health were prioritised, and stakeholders suggested both healthcare-focussed initiatives and approaches involving the community and the wider health sector. The findings can inform health service planning, advocacy, inter-agency strategies, and future research.
f
Data from: Identification and Characterization of the Sulfolobus...
figshare.com
xls
Updated May 31, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Poh Kuan Chong; Phillip C. Wright (2023). Identification and Characterization of the Sulfolobus solfataricus P2 Proteome [Dataset]. http://doi.org/10.1021/pr0501214.s001
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1021/pr0501214.s001
Dataset updated
May 31, 2023
Dataset provided by
ACS Publications
Authors
Poh Kuan Chong; Phillip C. Wright
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Via combined separation approaches, a total of 1399 proteins were identified, representing 47% of the Sulfolobus solfataricus P2 theoretical proteome. This includes 1323 proteins from the soluble fraction, 44 from the insoluble fraction and 32 from the extra-cellular or secreted fraction. We used conventional 2-dimensional gel electrophoresis (2-DE) for the soluble fraction, and shotgun proteomics for all three cell fractions (soluble, insoluble, and secreted). Two gel-based fractionation methods were explored for shotgun proteomics, namely: (i) protein separation utilizing 1-dimensional gel electrophoresis (1-DE) followed by peptide fractionation by iso-electric focusing (IEF), and (ii) protein and peptide fractionation both employing IEF. Results indicate that a 1D-IEF fractionation workflow with three replicate mass spectrometric analyses gave the best overall result for soluble protein identification. A greater than 50% increment in protein identification was achieved with three injections using LC−ESI−MS/MS. Protein and peptide fractionation efficiency; together with the filtration criteria are also discussed. Keywords: 2-DE • shotgun • LC−MS/MS • multiple injections • pre-fractionation • S. solfataricus
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

Market Research Forecast (2025). Data De-identification and Pseudonymity Software Report [Dataset]. https://www.marketresearchforecast.com/reports/data-de-identification-and-pseudonymity-software-30730

Data De-identification and Pseudonymity Software Report

Explore at:

ppt, doc, pdfAvailable download formats

Dataset updated

Mar 9, 2025

Dataset authored and provided by

Market Research Forecast

License

https://www.marketresearchforecast.com/privacy-policyhttps://www.marketresearchforecast.com/privacy-policy

Time period covered

2025 - 2033

Area covered

Global

Variables measured

Market Size

Description

The Data De-identification and Pseudonymization Software market is experiencing robust growth, projected to reach $1941.6 million in 2025 and exhibiting a Compound Annual Growth Rate (CAGR) of 7.3%. This expansion is driven by increasing regulatory compliance needs (like GDPR and CCPA), heightened concerns regarding data privacy and security breaches, and the burgeoning adoption of cloud-based solutions. The market is segmented by deployment (cloud-based and on-premises) and application (large enterprises and SMEs). Cloud-based solutions are gaining significant traction due to their scalability, cost-effectiveness, and ease of implementation, while large enterprises dominate the application segment due to their greater need for robust data protection strategies and larger budgets. Key market players include established tech giants like IBM and Informatica, alongside specialized providers such as Very Good Security and Anonomatic, indicating a dynamic competitive landscape with both established and emerging players vying for market share. Geographic expansion is also a key driver, with North America currently holding a significant market share, followed by Europe and Asia Pacific. The forecast period (2025-2033) anticipates continued growth fueled by advancements in artificial intelligence and machine learning for enhanced de-identification techniques, and the increasing demand for data anonymization across various sectors like healthcare, finance, and government. The restraining factors, while present, are not expected to significantly hinder the market’s overall growth trajectory. These limitations might include the complexity of implementing robust de-identification solutions, the potential for re-identification risks despite advanced techniques, and the ongoing evolution of privacy regulations necessitating continuous adaptation of software capabilities. However, ongoing innovation and technological advancements are anticipated to mitigate these challenges. The continuous development of more sophisticated algorithms and solutions addresses re-identification vulnerabilities, while proactive industry collaboration and regulatory guidance aim to streamline implementation processes, ultimately fostering continued market expansion. The increasing adoption of data anonymization across diverse sectors, coupled with the expanding global digital landscape and related data protection needs, suggests a positive outlook for sustained market growth throughout the forecast period.

Clear search

Close search

Google apps

Main menu

Data De-identification and Pseudonymity Software Report

Data in Support of the MIDI-B Challenge (MIDI-B-Synthetic-Validation,...

Abstract

Introduction

Methods

Subject Inclusion and Exclusion Criteria

Data Acquisition

Data Analysis

Usage Notes

i2b2 Research Data Warehouse

Data from: Medical data formatting to improve physician interpretation speed...

Properties of healthcare teaming networks as a function of network...

Teaching undergraduates with quantitative data in the social sciences at...

Infrared Thermography Temperature

PAC De-Identified study data.

Data from: Natural language processing and recurrent network models for...

A dataset of histopathological whole slide images for classification of...

Facebook: Survey on Gender Equality at Home 2020 - World

Abstract

Geographic coverage

Analysis unit

Universe

Kind of data

Sampling procedure

Mode of data collection

Research instrument

Response rate

Sampling error estimates

Data appraisal

Firemaker image collection for benchmarking forensic writer identification...

Storage Area Network (San) Market Analysis North America, Europe, APAC,...

Snapshot img

MIMIC-IV

Great Lakes Restoration Initiative human wellbeing survey

Identifying evidence-practice gaps and strategies for improvement in...

Data from: Identification and Characterization of the Sulfolobus...

Data De-identification and Pseudonymity Software Report