71 datasets found

New York State Hospital De-Identified Data Data Package
johnsnowlabs.com
csv
Updated Jan 20, 2021
Cite
John Snow Labs (2021). New York State Hospital De-Identified Data Data Package [Dataset]. https://www.johnsnowlabs.com/marketplace/new-york-state-hospital-de-identified-data-data-package/
Explore at:
Available download formats: csv
Dataset updated
Jan 20, 2021
Dataset authored and provided by
John Snow Labs
Area covered
New York
Description
This data package provides patient-level hospital discharge data with basic record details. It contains no protected health information (PHI) and has been de-identified. The data is classified by Health Service Area and county.
MultiCaRe: An open-source clinical case dataset for medical image...
data.niaid.nih.gov
explore.openaire.eu
Updated Mar 9, 2025
Cite
Nievas Offidani, Mauro (2025). MultiCaRe: An open-source clinical case dataset for medical image classification and multimodal AI applications [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10079369
Dataset updated
Mar 9, 2025
Dataset authored and provided by
Nievas Offidani, Mauro
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
The dataset contains multimodal data from over 70,000 open-access, de-identified case reports, including metadata, clinical cases, image captions, and more than 130,000 images. Images and clinical cases belong to different medical specialties, such as oncology, cardiology, surgery, and pathology. The structure of the dataset makes it easy to map images to their corresponding article metadata, clinical case, captions, and image labels. Details of the data structure can be found in the file data_dictionary.csv.

More than 90,000 patients and 280,000 medical doctors and researchers were involved in the creation of the articles included in this dataset. The citation data of each article can be found in the metadata.parquet file.

Refer to the examples showcased in this GitHub repository to understand how to optimize the use of this dataset. The license of the dataset as a whole is CC BY-NC-SA. However, its individual contents may have less restrictive license types (CC BY, CC BY-NC, CC0). For instance, among the image files, 66K are CC BY, 32K are CC BY-NC-SA, 32K are CC BY-NC, and 20 are CC0.
Data set supplementing "Determinants of Laypersons' Trust in Medical...
data.niaid.nih.gov
zenodo.org
Updated Mar 12, 2022
Cite
Rieger, Tobias (2022). Data set supplementing "Determinants of Laypersons' Trust in Medical Decision Aids: Randomized Controlled Trial" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6340520
Dataset updated
Mar 12, 2022
Dataset provided by
Schmieding, Malte
Feufel, Markus
Roesler, Eileen
Kopka, Marvin
Rieger, Tobias
Balzer, Felix
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is the de-identified data set used to conduct the analyses in the preprint submitted to JMIR Human Factors under the title "Determinants of Laypersons’ Trust in Medical Decision Aids: Randomized Controlled Trial" (https://doi.org/10.2196/35219).

This dataset contains 494 respondents' appraisals of a fictitious case vignette. Participants received support from a decision aid that always disagreed with their first appraisal; the aid either showed a mock symptom checker logo or was framed as anthropomorphic or as an AI. Their second appraisal, which took the symptom checker advice into account, was then collected.

Additionally, the data contains participants' age, gender, education, medical training, propensity to trust, eHealth Literacy, certainty in their appraisals, and trust in the decision aid.
NIST Collaborative Research Cycle Data and Metrics Archive
catalog.data.gov
data.nist.gov
Updated Apr 11, 2024
Cite
National Institute of Standards and Technology (2024). NIST Collaborative Research Cycle Data and Metrics Archive [Dataset]. https://catalog.data.gov/dataset/nist-collaborative-research-cycle-data-and-metrics-archive
Dataset updated
Apr 11, 2024
Dataset provided by
National Institute of Standards and Technology (http://www.nist.gov/)
Description
This repository contains the collected resources submitted to and created by the NIST Collaborative Research Cycle (CRC) Data and Metrics Archive. The NIST Collaborative Research Cycle (CRC) is an ongoing effort to benchmark, compare, and investigate deidentification technologies. The program asks the research community to deidentify a compact and interesting dataset called the NIST Diverse Communities Data Excerpts, demographic data from communities across the U.S. sourced from the American Community Survey. This repository contains all of the submitted deidentified data instances, each accompanied by a detailed abstract describing how the deidentified data were generated. We conduct an extensive standardized evaluation of each deidentified instance using a host of fidelity, utility, and privacy metrics, using our tool, SDNist. We've packaged the data, abstracts, and evaluation results into a human- and machine-readable archive.
Hospital Inpatient Discharges (SPARCS De-Identified): 2019
health.data.ny.gov
application/rdfxml +5
Updated Feb 15, 2024
Cite
New York State Department of Health (2024). Hospital Inpatient Discharges (SPARCS De-Identified): 2019 [Dataset]. https://health.data.ny.gov/Health/Hospital-Inpatient-Discharges-SPARCS-De-Identified/4ny4-j5zv
Explore at:
Available download formats: csv, tsv, xml, application/rssxml, json, application/rdfxml
Dataset updated
Feb 15, 2024
Dataset authored and provided by
New York State Department of Health
Description
The Statewide Planning and Research Cooperative System (SPARCS) Inpatient De-identified File contains discharge level detail on patient characteristics, diagnoses, treatments, services, and charges. This data file contains basic record level detail for the discharge. The de-identified data file does not contain data that is protected health information (PHI) under HIPAA. The health information is not individually identifiable; all data elements considered identifiable have been redacted. For example, the direct identifiers regarding a date have the day and month portion of the date removed. Note: This dataset may be downloaded from the attachments section of this page in a smaller, compressed format.
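The date redaction mentioned in this description (keeping the year while removing the day and month) can be illustrated with a minimal sketch. The record layout and the field names `discharge_date` and `discharge_year` are hypothetical and chosen only for illustration; they are not taken from the SPARCS file layout.

```python
from datetime import date


def redact_date_to_year(d: date) -> int:
    """Drop the day and month of a date, keeping only the year."""
    return d.year


# Hypothetical record; the field names are illustrative, not SPARCS columns.
record = {"discharge_date": date(2019, 7, 14), "county": "Albany"}

# Build the redacted record without carrying the full date over.
redacted = {
    "discharge_year": redact_date_to_year(record["discharge_date"]),
    "county": record["county"],
}
print(redacted)  # {'discharge_year': 2019, 'county': 'Albany'}
```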
RECAP dataset: Subject, exposure, and health endpoint (blood, lipids,...
catalog.data.gov
Updated Oct 4, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. EPA Office of Research and Development (ORD) (2021). RECAP dataset: Subject, exposure, and health endpoint (blood, lipids, cardiac, and lung) data [Dataset]. https://catalog.data.gov/dataset/recap-dataset-subject-exposure-and-health-endpoint-blood-lipids-cardiac-and-lung-data
Dataset updated
Oct 4, 2021
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Description
This dataset contains deidentified subject level data from the study titled: Responses to Exposure to Low Levels of Concentrated Ambient Particles in Healthy Young Adults (RECAP). Subject, exposure, and health endpoint data are included in the dataset. Health endpoint data includes inflammatory, heart rate variability and cardiac repolarization, lung function, blood chemistry, and lipids measures. This dataset is associated with the following publication: Wyatt, L., R. Devlin, A. Rappold, and M. Case. Low levels of fine particulate matter increase vascular damage and reduce pulmonary function in young healthy adults. Particle and Fibre Toxicology. BioMed Central Ltd, London, UK, 17(1): 58, (2020).
Envestnet | Yodlee's De-Identified Online Purchase Data | Row/Aggregate...
datarade.ai
.sql, .txt
Cite
Envestnet | Yodlee, Envestnet | Yodlee's De-Identified Online Purchase Data | Row/Aggregate Level | USA Consumer Data covering 3600+ corporations | 90M+ Accounts [Dataset]. https://datarade.ai/data-products/envestnet-yodlee-s-de-identified-online-purchase-data-row-envestnet-yodlee
Explore at:
Available download formats: .sql, .txt
Dataset provided by
Yodlee
Envestnet (http://envestnet.com/)
Authors
Envestnet | Yodlee
Area covered
United States of America
Description
Envestnet®| Yodlee®'s Online Purchase Data (Aggregate/Row) Panels consist of de-identified, near-real time (T+1) USA credit/debit/ACH transaction level data – offering a wide view of the consumer activity ecosystem. The underlying data is sourced from end users leveraging the aggregation portion of the Envestnet®| Yodlee®'s financial technology platform.

Envestnet | Yodlee Consumer Panels (Aggregate/Row) include data relating to millions of transactions, including ticket size and merchant location. The dataset includes de-identified credit/debit card and bank transactions (such as a payroll deposit, account transfer, or mortgage payment). Our coverage offers insights into areas such as consumer, TMT, energy, REITs, internet, utilities, ecommerce, MBS, CMBS, equities, credit, commodities, FX, and corporate activity. We apply rigorous data science practices to deliver key KPIs daily that are focused, relevant, and ready to put into production.

We offer free trials. Our team is available to provide support for loading, validation, sample scripts, or other services you may need to generate insights from our data.

Investors, corporate researchers, and corporates can use our data to answer some key business questions such as:
- How much are consumers spending with specific merchants/brands and how is that changing over time?
- Is the share of consumer spend at a specific merchant increasing or decreasing?
- How are consumers reacting to new products or services launched by merchants?
- For loyal customers, how is the share of spend changing over time?
- What is the company’s market share in a region for similar customers?
- Is the company’s loyal user base increasing or decreasing?
- Is the lifetime customer value increasing or decreasing?

Additional Use Cases:
- Use spending data to analyze sales/revenue broadly (sector-wide) or granularly (company-specific). Historically, our tracked consumer spend has correlated above 85% with company-reported data from thousands of firms. Users can sort and filter by many metrics and KPIs, such as sales and transaction growth rates and online or offline transactions, as well as view customer behavior within a geographic market at a state or city level.
- Reveal cohort consumer behavior to decipher long-term behavioral consumer spending shifts. Measure market share, wallet share, loyalty, consumer lifetime value, retention, demographics, and more.
- Study the effects of inflation rates via such metrics as increased total spend, ticket size, and number of transactions.
- Seek out alpha-generating signals or manage your business strategically with essential, aggregated transaction and spending data analytics.

Use Case Categories (our data supports a wide range of use cases, and we look forward to working with new ones):

1. Market Research: Company Analysis, Company Valuation, Competitive Intelligence, Competitor Analysis, Competitor Analytics, Competitor Insights, Customer Data Enrichment, Customer Data Insights, Customer Data Intelligence, Demand Forecasting, Ecommerce Intelligence, Employee Pay Strategy, Employment Analytics, Job Income Analysis, Job Market Pricing, Marketing, Marketing Data Enrichment, Marketing Intelligence, Marketing Strategy, Payment History Analytics, Price Analysis, Pricing Analytics, Retail, Retail Analytics, Retail Intelligence, Retail POS Data Analysis, and Salary Benchmarking

2. Investment Research: Financial Services, Hedge Funds, Investing, Mergers & Acquisitions (M&A), Stock Picking, Venture Capital (VC)

3. Consumer Analysis: Consumer Data Enrichment, Consumer Intelligence

4. Market Data: Analytics, B2C Data Enrichment, Bank Data Enrichment, Behavioral Analytics, Benchmarking, Customer Insights, Customer Intelligence, Data Enhancement, Data Enrichment, Data Intelligence, Data Modeling, Ecommerce Analysis, Ecommerce Data Enrichment, Economic Analysis, Financial Data Enrichment, Financial Intelligence, Local Economic Forecasting, Location-based Analytics, Market Analysis, Market Analytics, Market Intelligence, Market Potential Analysis, Market Research, Market Share Analysis, Sales, Sales Data Enrichment, Sales Enablement, Sales Insights, Sales Intelligence, Spending Analytics, Stock Market Predictions, and Trend Analysis
Smart Triage Jinja Data De-identification
search.dataone.org
borealisdata.ca
Updated Dec 28, 2023
Cite
Mawji, Alishah (2023). Smart Triage Jinja Data De-identification [Dataset]. http://doi.org/10.5683/SP3/MSTH98
Unique identifier
https://doi.org/10.5683/SP3/MSTH98
Dataset updated
Dec 28, 2023
Dataset provided by
Borealis
Authors
Mawji, Alishah
Description
This dataset contains de-identified data with an accompanying data dictionary and the R script for de-identification procedures.
Objective(s): To demonstrate application of a risk-based de-identification framework using the Smart Triage dataset as a clinical example.
Data Description: This dataset contains the de-identified version of the Smart Triage Jinja dataset with the accompanying data dictionary and R script for de-identification procedures.
Limitations: Utility of the de-identified dataset has only been evaluated with regard to use for the development of prediction models based on a need for hospital admission.
Abbreviations: NA
Ethics Declaration: The study was reviewed by the institutional review boards at the University of British Columbia in Canada (ID: H19-02398; H20-00484), the Makerere University School of Public Health in Uganda, and the Uganda National Council for Science and Technology.
MIMIC-IV
physionet.org
Updated Oct 11, 2024
Cite
Alistair Johnson; Lucas Bulgarelli; Tom Pollard; Brian Gow; Benjamin Moody; Steven Horng; Leo Anthony Celi; Roger Mark (2024). MIMIC-IV [Dataset]. http://doi.org/10.13026/kpb9-mt58
Unique identifier
https://doi.org/10.13026/kpb9-mt58
Dataset updated
Oct 11, 2024
Authors
Alistair Johnson; Lucas Bulgarelli; Tom Pollard; Brian Gow; Benjamin Moody; Steven Horng; Leo Anthony Celi; Roger Mark
License
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Description
Retrospectively collected medical data has the opportunity to improve patient care through knowledge discovery and algorithm development. Broad reuse of medical data is desirable for the greatest public good, but data sharing must be done in a manner which protects patient privacy. Here we present Medical Information Mart for Intensive Care (MIMIC)-IV, a large deidentified dataset of patients admitted to the emergency department or an intensive care unit at the Beth Israel Deaconess Medical Center in Boston, MA. MIMIC-IV contains data for over 65,000 patients admitted to an ICU and over 200,000 patients admitted to the emergency department. MIMIC-IV incorporates contemporary data and adopts a modular approach to data organization, highlighting data provenance and facilitating both individual and combined use of disparate data sources. MIMIC-IV is intended to carry on the success of MIMIC-III and support a broad set of applications within healthcare.
Intuizi's De-identified Location Data for Brazil | 6.6+mm Unique Daily...
datarade.ai
.json, .csv
Cite
Intuizi, Intuizi's De-identified Location Data for Brazil | 6.6+mm Unique Daily Devices | Aggregated Footfall Data [Dataset]. https://datarade.ai/data-products/intuizi-s-gps-location-data-for-brazil-6-6-mm-unique-daily-intuizi
Explore at:
Available download formats: .json, .csv
Dataset authored and provided by
Intuizi
Area covered
Brazil
Description
This Brazil mobility dataset, provided by Intuizi, is essential for understanding mobility patterns across specific areas in Brazil. Customers use this comprehensive mobile location data to build sophisticated mobility data models, analyze visitation to their own or competitors' premises, and investigate changes in visitation patterns over time. The Intuizi Visitation Dataset includes fully-consented mobile device data, de-identified at the source by entities legally authorized to process such data. We ensure the creation of a de-identified dataset of encrypted ID visitation and mobility data, making it a reliable source for detailed location data insights. Whether you need visit data or aggregated foot traffic data, Intuizi provides the precise solution you require.
Land Tenure (de-identified) (LGATE-457) - Datasets - data.wa.gov.au
catalogue.data.wa.gov.au
Updated Apr 6, 2023
Cite
(2023). Land Tenure (de-identified) (LGATE-457) - Datasets - data.wa.gov.au [Dataset]. https://catalogue.data.wa.gov.au/dataset/land-tenure-de-identified
Dataset updated
Apr 6, 2023
Area covered
Western Australia
Description
This dataset should be used in place of Land Tenure (LGATE-226) where data users do not require proprietor names for their business purposes. Information contained within this dataset is not subject to the same strict regulatory compliance measures as Land Tenure (LGATE-226). This dataset contains cadastral polygons, related certificate of title (vol/fol) numbers and document details affecting the last change of land ownership. This layer also contains strata property details along with full property street address information in Australian Standard (AS4590) and text string formats. The data contained within this layer is sourced from Western Australia's digital Land Registry and the authoritative property street address dataset maintained by Landgate.

NOTE: This product is for information purposes only and is not guaranteed. The information may be out of date and should not be relied upon without further verification from the original documents. Where the information is being used for legal purposes, the original documents must be searched for all legal requirements. Strict access criteria apply due to the sensitivity of the information contained in this data service; please contact BusinessSolutions@landgate.wa.gov.au for further information.

Key information and attributes: Certificate of title number, property street address, land parcel identifier (lot on survey), area, document number and type, land type (e.g. reserve, vacant, freehold, leasehold, easement), survey details, easement and interests.
Geometry type: polygon
Update cycle: daily
Coverage: whole of state
Accuracy: This service should not be used for legal purposes. For all legal requirements, please refer to imaged original documents held by Landgate.
IC3D participant dataset deidentified
dataverse.harvard.edu
Updated May 21, 2025
Cite
Ryan McBain (2025). IC3D participant dataset deidentified [Dataset]. http://doi.org/10.7910/DVN/RMUZIO
Explore at:
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.7910/DVN/RMUZIO
Dataset updated
May 21, 2025
Dataset provided by
Harvard Dataverse
Authors
Ryan McBain
License
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/RMUZIO
Description
Researchers collected this dataset during a stepped-wedge, cluster-randomized controlled trial in a network of 14 health facilities in Neno District, Malawi, from September 2021 to November 2023. The study enrolled adults who were residing in facility catchment areas, were newly diagnosed with major depressive disorder, and were actively enrolled in integrated chronic care clinics for treatment of one or more chronic conditions. As primary outcomes, researchers measured depression severity using the PHQ-9 and functional impairment using the WHO Disability Assessment Schedule 2.0 (WHODAS).
hernia
health.data.ny.gov
application/rdfxml +5
Updated Sep 9, 2019
Cite
New York State Department of Health (2019). hernia [Dataset]. https://health.data.ny.gov/Health/hernia/7ita-i9h4
Explore at:
Available download formats: json, application/rdfxml, csv, application/rssxml, tsv, xml
Dataset updated
Sep 9, 2019
Authors
New York State Department of Health
Description
The Statewide Planning and Research Cooperative System (SPARCS) Inpatient De-identified dataset contains discharge level detail on patient characteristics, diagnoses, treatments, services, and charges. This data contains basic record level detail regarding the discharge; however the data does not contain protected health information (PHI) under Health Insurance Portability and Accountability Act (HIPAA). The health information is not individually identifiable; all data elements considered identifiable have been redacted. For example, the direct identifiers regarding a date have the day and month portion of the date removed. A downloadable file with this data is available for ease of download at: https://health.data.ny.gov/Health/Hospital-Inpatient-Discharges-SPARCS-De-Identified/3m9u-ws8e. For more information check out: http://www.health.ny.gov/statistics/sparcs/ or go to the “About” tab.
Antibiotic Resistance Microbiology Dataset (ARMD): A de-identified resource...
datadryad.org
search.dataone.org
+1 more
zip
Updated Jan 23, 2025
Cite
Fateme Nateghi Haredasht; Fatemeh Amrollahi; Manoj Maddali; Nicholas Marshall; Stephen Ma; Amy Chang; Niaz Banaei; Stanley Deresinski; Steven Asch; Mary Goldstein; Jonathan Chen (2025). Antibiotic Resistance Microbiology Dataset (ARMD): A de-identified resource for studying antimicrobial resistance using electronic health records [Dataset]. http://doi.org/10.5061/dryad.jq2bvq8kp
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.jq2bvq8kp
Dataset updated
Jan 23, 2025
Dataset provided by
Dryad
Authors
Fateme Nateghi Haredasht; Fatemeh Amrollahi; Manoj Maddali; Nicholas Marshall; Stephen Ma; Amy Chang; Niaz Banaei; Stanley Deresinski; Steven Asch; Mary Goldstein; Jonathan Chen
Description
The Antibiotic Resistance Microbiology Dataset (ARMD) is a structured and de-identified resource developed using electronic health records (EHR) from Stanford Healthcare. It provides a comprehensive overview of microbiological cultures including urine, respiratory, and blood cultures. This dataset includes 283,715 unique adult patients and features detailed information on culture results, identified organisms, antibiotic susceptibility, and associated demographic and clinical data. The dataset was meticulously constructed through a multi-step process designed to enhance data quality and relevance. By enabling the study of antimicrobial resistance patterns and supporting antimicrobial stewardship efforts, ARMD offers a valuable resource for researchers and clinicians seeking to improve the management of infectious diseases and combat the growing threat of antimicrobial resistance.
De-identified Data from the PArTNER Study: A Pragmatic Clinical Trial to...
indigo.uic.edu
csv
Updated May 5, 2025
Cite
Jerry Krishnan; Sai Dheeraj Illendula; Lynn Gerald; Jun Lu (2025). De-identified Data from the PArTNER Study: A Pragmatic Clinical Trial to Improve Patient Experience During Transitions from Hospital to Home [Dataset]. http://doi.org/10.25417/uic.28889918.v1
Explore at:
Available download formats: csv
Unique identifier
https://doi.org/10.25417/uic.28889918.v1
Dataset updated
May 5, 2025
Dataset provided by
University of Illinois Chicago
Authors
Jerry Krishnan; Sai Dheeraj Illendula; Lynn Gerald; Jun Lu
License
http://rightsstatements.org/vocab/InC/1.0/
Description
The PArTNER study was a single-center pragmatic randomized clinical trial conducted at a minority-serving hospital in Chicago. It evaluated whether a Navigator intervention, delivered by community health workers and peer coaches, could improve patient experience, health outcomes, and healthcare utilization during the transition from hospital to home among adults hospitalized with heart failure, pneumonia, myocardial infarction (MI), chronic obstructive pulmonary disease (COPD), or sickle cell disease. A total of 1,029 adults, predominantly non-Hispanic Black, participated. The intervention included in-hospital visits, a home visit, and follow-up telephone coaching. The primary outcomes were changes in anxiety and informational support at 30 days post-discharge. The study found no significant overall improvements compared to usual care, although exploratory analyses suggested potential benefits for certain subgroups.

Data Description: The dataset includes de-identified information on participant demographics, clinical characteristics, social determinants of health, Patient-Reported Outcomes Measurement Information System (PROMIS) scores (e.g., anxiety, informational support), healthcare utilization outcomes (e.g., hospital readmissions, emergency department visits), and intervention engagement. Data were collected through baseline hospital assessments, telephone follow-up surveys at 30 and 60 days post-discharge, and electronic health record reviews.

Publications related to data:
LaBedz, Stephanie L., et al. "Pragmatic clinical trial to improve patient experience among adults during transitions from hospital to home: the PArTNER study." Journal of General Internal Medicine 37.16 (2022): 4103-4111.
Prieto-Centurion, Valentin, et al. "Design of the patient navigator to Reduce Readmissions (PArTNER) study: a pragmatic clinical effectiveness trial." Contemporary Clinical Trials Communications 15 (2019): 100420.
mimic-iv-clinical-database-demo-2.2
kaggle.com
Updated Apr 1, 2025
Cite
Montassar bellah (2025). mimic-iv-clinical-database-demo-2.2 [Dataset]. https://www.kaggle.com/datasets/montassarba/mimic-iv-clinical-database-demo-2-2/data
Explore at:
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Apr 1, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Montassar bellah
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Abstract
The Medical Information Mart for Intensive Care (MIMIC)-IV database is comprised of deidentified electronic health records for patients admitted to the Beth Israel Deaconess Medical Center. Access to MIMIC-IV is limited to credentialed users. Here, we have provided an openly-available demo of MIMIC-IV containing a subset of 100 patients. The dataset includes similar content to MIMIC-IV, but excludes free-text clinical notes. The demo may be useful for running workshops and for assessing whether the MIMIC-IV is appropriate for a study before making an access request.

Background
The increasing adoption of digital electronic health records has led to the existence of large datasets that could be used to carry out important research across many areas of medicine. Research progress has been limited, however, due to limitations in the way that the datasets are curated and made available for research. The MIMIC datasets allow credentialed researchers around the world unprecedented access to real world clinical data, helping to reduce the barriers to conducting important medical research. The public availability of the data allows studies to be reproduced and collaboratively improved in ways that would not otherwise be possible.

Methods
First, the set of individuals to include in the demo was chosen. Each person in MIMIC-IV is assigned a unique subject_id. As the subject_id is randomly generated, ordering by subject_id results in a random subset of individuals. We only considered individuals with an anchor_year_group value of 2011 - 2013 or 2014 - 2016 to ensure overlap with MIMIC-CXR v2.0.0. The first 100 subject_id who satisfied the anchor_year_group criteria were selected for the demo dataset.
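
The selection step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `select_demo_subjects`, the toy rows, and the ID values are assumptions; only the column names `subject_id` and `anchor_year_group` and the two year groups come from the description.

```python
def select_demo_subjects(patients, allowed_groups, n=100):
    """Return the first n subject_id values, in ascending order, whose
    anchor_year_group is one of allowed_groups."""
    eligible = sorted(
        p["subject_id"]
        for p in patients
        if p["anchor_year_group"] in allowed_groups
    )
    return eligible[:n]


# Toy rows standing in for a patients table (all values are made up).
patients = [
    {"subject_id": 10000032, "anchor_year_group": "2014 - 2016"},
    {"subject_id": 10000005, "anchor_year_group": "2008 - 2010"},
    {"subject_id": 10000017, "anchor_year_group": "2011 - 2013"},
]
allowed = {"2011 - 2013", "2014 - 2016"}

demo_ids = select_demo_subjects(patients, allowed, n=2)
print(demo_ids)  # [10000017, 10000032]
```

Because the identifiers are described as randomly generated, taking the lowest n eligible values behaves like a random draw without needing a random number generator.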

All tables from MIMIC-IV were included in the demo dataset. Tables containing patient information, such as emar or labevents, were filtered using the list of selected subject_id. Tables which do not contain patient level information were included in their entirety (e.g. d_items or d_labitems). Note that all tables which do not contain patient level information are prefixed with the characters 'd_'.
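
The table-filtering rule in this paragraph can be sketched in a few lines. This is an illustration under assumptions: tables are modeled as lists of dictionaries, `build_demo_tables` is a hypothetical helper, and the row values are made up. Only the table names and the `d_` prefix rule come from the description.

```python
def build_demo_tables(tables, demo_ids):
    """Filter patient-level tables by subject_id and keep dictionary
    tables (names starting with 'd_') in their entirety."""
    keep = set(demo_ids)
    demo = {}
    for name, rows in tables.items():
        if name.startswith("d_"):
            # Dictionary tables carry no patient-level rows: copy as-is.
            demo[name] = list(rows)
        else:
            demo[name] = [r for r in rows if r["subject_id"] in keep]
    return demo


# Toy tables; the row contents are invented for illustration.
tables = {
    "labevents": [
        {"subject_id": 1, "itemid": 7},
        {"subject_id": 2, "itemid": 7},
    ],
    "d_labitems": [{"itemid": 7, "label": "example lab"}],
}

demo = build_demo_tables(tables, demo_ids=[1])
print(len(demo["labevents"]), len(demo["d_labitems"]))  # 1 1
```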

Deidentification was performed following the same approach as the MIMIC-IV database. Protected health information (PHI) as listed in the HIPAA Safe Harbor provision was removed. Patient identifiers were replaced using a random cipher, resulting in deidentified integer identifiers for patients, hospitalizations, and ICU stays. Stringent rules were applied to structured columns based on the data type. Dates were shifted consistently using a random integer removing seasonality, day of the week, and year information. Text fields were filtered by manually curated allow and block lists, as well as context-specific regular expressions. For example, columns containing dose values were filtered to only contain numeric values. If necessary, a free-text deidentification algorithm was applied to remove PHI from free-text. Results of this algorithm were manually reviewed and verified to remove identified PHI.
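
Two of the steps above, replacing identifiers with a random cipher and shifting dates consistently, can be sketched as below. This is not the MIMIC-IV implementation. It assumes one random offset per patient, an arbitrary ID range, and an arbitrary maximum shift; all function and field names are hypothetical.

```python
import random
from datetime import date, timedelta


def make_id_cipher(subject_ids, rng):
    """Map each original ID to a distinct random integer (range is arbitrary)."""
    ids = sorted(set(subject_ids))
    new_ids = rng.sample(range(10_000_000, 20_000_000), k=len(ids))
    return dict(zip(ids, new_ids))


def make_date_offsets(subject_ids, rng, max_days=36_500):
    """Draw one random shift, in days, per patient (assumed granularity)."""
    return {
        sid: timedelta(days=rng.randint(1, max_days))
        for sid in sorted(set(subject_ids))
    }


def deidentify(events, rng):
    """Replace IDs and shift every date by its patient's offset."""
    sids = [e["subject_id"] for e in events]
    cipher = make_id_cipher(sids, rng)
    offsets = make_date_offsets(sids, rng)
    return [
        {
            "subject_id": cipher[e["subject_id"]],
            "event_date": e["event_date"] + offsets[e["subject_id"]],
        }
        for e in events
    ]


# Made-up events: two for one patient, one for another.
events = [
    {"subject_id": 1, "event_date": date(2015, 3, 1)},
    {"subject_id": 1, "event_date": date(2015, 3, 8)},
    {"subject_id": 2, "event_date": date(2016, 1, 1)},
]
shifted = deidentify(events, random.Random(0))
gap = shifted[1]["event_date"] - shifted[0]["event_date"]
print(gap.days)  # 7
```

Applying the same offset to all of a patient's dates preserves the intervals between that patient's events while hiding the true calendar dates.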

Data Description
MIMIC-IV is a relational database consisting of 26 tables. For a detailed description of the database structure, see the MIMIC-IV Clinical Database page [1] or the MIMIC-IV online documentation [2]. The demo shares an identical schema and structure to the equivalent version of MIMIC-IV.

Data files are distributed in comma separated value (CSV) format following the RFC 4180 standard [3]. The dataset is also made available on Google BigQuery. Instructions for accessing the dataset on BigQuery are provided in the online MIMIC-IV documentation, under the cloud page [2].

An additional file is included: demo_subject_id.csv. This is a list of the subject_id used to filter MIMIC-IV to the demo subset.

Usage Notes
The MIMIC-IV demo provides researchers with the opportunity to better understand MIMIC-IV data.

CSV files can be opened natively using any text editor or spreadsheet program. However, as some tables are large, it may be preferable to navigate the data via a relational database. We suggest either working with the data in Google BigQuery (see the "Files" section for access details) or creating an SQLite database using the CSV files. SQLite is a lightweight database format which stores all constituent tables in a single file, and SQLite databases interoperate well with a number of software tools.
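
As a minimal sketch of the SQLite route suggested above, the following uses only the Python standard library (`csv` and `sqlite3`). The helper name `load_csv_into_sqlite`, the table name, and the sample rows are assumptions for illustration; every column is created without a declared type, so values are stored as text.

```python
import csv
import io
import sqlite3


def load_csv_into_sqlite(conn, table, csv_file):
    """Create `table` from the CSV header row and insert every data row."""
    reader = csv.reader(csv_file)
    header = next(reader)
    cols = ", ".join(f'"{c}"' for c in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)
    conn.commit()


# In-memory stand-ins. With real files, pass open(path, newline="") and
# connect to a database file path instead of ":memory:".
sample = io.StringIO("subject_id,gender\n1,F\n2,M\n")
conn = sqlite3.connect(":memory:")
load_csv_into_sqlite(conn, "patients", sample)

count = conn.execute('SELECT COUNT(*) FROM "patients"').fetchone()[0]
print(count)  # 2
```

Calling the helper once per CSV file against the same connection places all tables in a single database file, which matches the single-file property described above.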

Code is made available for use with MIMIC-IV on the MIMIC-IV code repository [4]. Code provided includes derivation of clinical concepts, tutorials, and reproducible analyses.

Release Notes

Release notes for the demo follow the release notes for the MIMIC-IV database.

Ethics

This project was approved by the Institutional Review Boards of Beth Israel Deaconess Medical Center (Boston, MA) and the Massachusetts Institute of Technology (Cambridge, MA). Requirement for individual patient consent was waived because the pr...
All-Payer Claims Data (APD De-Identified): Prescription Drug Summary 2021 -...
healthdata.gov
application/rssxml +4
Updated May 15, 2025
Cite
(2025). All-Payer Claims Data (APD De-Identified): Prescription Drug Summary 2021 - hmyc-35xf - Archive Repository [Dataset]. https://healthdata.gov/dataset/All-Payer-Claims-Data-APD-De-Identified-Prescripti/rh6d-sx9u
Explore at:
Available download formats: json, csv, xml, application/rssxml, tsv
Dataset updated
May 15, 2025
Description
This dataset tracks the updates made on the dataset "All-Payer Claims Data (APD De-Identified): Prescription Drug Summary 2021" as a repository for previous versions of the data and metadata.
Thyweill
health.data.ny.gov
application/rdfxml +5
Updated Sep 9, 2019
Cite
New York State Department of Health (2019). Thyweill [Dataset]. https://health.data.ny.gov/Health/Thyweill/uqw2-z9tw
Explore at:
Available download formats: xml, json, csv, application/rdfxml, application/rssxml, tsv
Dataset updated
Sep 9, 2019
Authors
New York State Department of Health
Description
The Statewide Planning and Research Cooperative System (SPARCS) Inpatient De-identified dataset contains discharge-level detail on patient characteristics, diagnoses, treatments, services, and charges. This data contains basic record-level detail regarding the discharge; however, the data does not contain protected health information (PHI) under the Health Insurance Portability and Accountability Act (HIPAA). The health information is not individually identifiable; all data elements considered identifiable have been redacted. For example, the direct identifiers regarding a date have the day and month portion of the date removed. A downloadable file with this data is available at: https://health.data.ny.gov/Health/Hospital-Inpatient-Discharges-SPARCS-De-Identified/3m9u-ws8e. For more information, see http://www.health.ny.gov/statistics/sparcs/ or go to the “About” tab.
Data from: Epilepsy-iEEG-Multicenter-Dataset
openneuro.org
Updated Dec 2, 2020
Cite
Adam Li; Sara Inati; Kareem Zaghloul; Nathan Crone; William Anderson; Emily Johnson; Iahn Cajigas; Damian Brusko; Jonathan Jagid; Angel Claudio; Andres Kanner; Jennifer Hopp; Stephanie Chen; Jennifer Haagensen; Sridevi Sarma (2020). Epilepsy-iEEG-Multicenter-Dataset [Dataset]. http://doi.org/10.18112/openneuro.ds003029.v1.0.2
Explore at:
Unique identifier
https://doi.org/10.18112/openneuro.ds003029.v1.0.2
Dataset updated
Dec 2, 2020
Dataset provided by
OpenNeuro (https://openneuro.org/)
Authors
Adam Li; Sara Inati; Kareem Zaghloul; Nathan Crone; William Anderson; Emily Johnson; Iahn Cajigas; Damian Brusko; Jonathan Jagid; Angel Claudio; Andres Kanner; Jennifer Hopp; Stephanie Chen; Jennifer Haagensen; Sridevi Sarma
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Fragility Multi-Center Retrospective Study

iEEG and EEG data from 5 centers are organized in our study, with a total of 100 subjects. We publish the datasets of 4 centers here due to data sharing issues.

Acquisitions include ECoG and SEEG. Each run specifies a different snapshot of EEG data from that specific subject's session. For seizure sessions, this means that each run is an EEG snapshot around a different seizure event.

For additional clinical metadata about each subject, refer to the clinical Excel table in the publication.

Data Availability

NIH, JHH, UMMC, and UMF agreed to share. Cleveland Clinic did not, so access to its data requires an additional DUA.

All data, except for the Cleveland Clinic data, were approved by their centers to be de-identified and shared. No data in this dataset have PHI or other identifiers associated with patients. To access Cleveland Clinic data, please forward all requests to Amber Sours, SOURSA@ccf.org:

Amber Sours, MPH Research Supervisor | Epilepsy Center Cleveland Clinic | 9500 Euclid Ave. S3-399 | Cleveland, OH 44195 (216) 444-8638

You will need to sign a data use agreement (DUA).

Sourcedata

For each subject, there was a raw EDF file, which was converted into the BrainVision format with mne_bids. Each subject with SEEG implantation also has an Excel table, called electrode_layout.xlsx, which outlines where the clinicians marked each electrode anatomically. Note that there is no rigorous atlas applied, so the main points of interest are: WM, GM, VENTRICLE, CSF, and OUT, which represent white matter, gray matter, ventricle, cerebrospinal fluid and outside the brain. Channels labeled WM, VENTRICLE, CSF and OUT were removed from further analysis. These were labeled in the corresponding BIDS channels.tsv sidecar file as status=bad. The dataset uploaded to openneuro.org does not contain the sourcedata since there was an extra anonymization step that occurred when fully converting to BIDS.
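
The status=bad labels can be read back from a channels.tsv sidecar with the standard library alone. The sketch below assumes the sidecar is tab-separated with the BIDS `name` and `status` columns; the function name and the sample channel names are hypothetical.

```python
import csv
import io


def split_channels_by_status(handle):
    """Return (good, bad) lists of channel names from a channels.tsv handle."""
    good, bad = [], []
    for row in csv.DictReader(handle, delimiter="\t"):
        # A missing or empty status is treated as good (an assumption).
        status = (row.get("status") or "good").strip().lower()
        (bad if status == "bad" else good).append(row["name"])
    return good, bad


# Usage with an in-memory sidecar; real files would be opened with open().
sample = "name\ttype\tstatus\nLA1\tSEEG\tgood\nLA2\tSEEG\tbad\n"
good_channels, bad_channels = split_channels_by_status(io.StringIO(sample))
```

Dropping the names in the bad list before analysis reproduces the exclusion of white-matter, ventricle, CSF and out-of-brain contacts described above.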

Derivatives

Derivatives include:

* fragility analysis
* frequency analysis
* graph metrics analysis
* figures

These can be computed by following the paper: Neural Fragility as an EEG Marker of the Seizure Onset Zone

Events and Descriptions

Each EDF file contains event markers annotated by clinicians, which may inform you of specific clinical events occurring in time, or of when they saw seizure onset and offset (clinical and electrographic).

During a seizure event, event markers may follow this time course:

* eeg onset, or clinical onset - the onset of a seizure, marked either electrographically or by clinical behavior. Note that the clinical onset may not always be present, since some seizures manifest without clinical behavioral changes.
* Marker/Mark On - these are annotations present in some cases, where a health practitioner injects a chemical marker for use in ICTAL SPECT imaging after a seizure occurs. This is commonly done to see which portions of the brain are metabolically active.
* Marker/Mark Off - this is when the ICTAL SPECT stops imaging.
* eeg offset, or clinical offset - the offset of the seizure, as determined either electrographically or by clinical symptoms.

Other events included may help you understand the time course of each seizure. Note that ICTAL SPECT occurs in all Cleveland Clinic data. Note that seizure markers are not consistent in their description naming, so one might encode some specific regular-expression rules to consistently capture seizure onset/offset markers across all datasets. In the case of UMMC data, all onset and offset markers were provided by the clinicians on an Excel sheet instead of via the EDF file, so we added the annotations manually to each EDF file.
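
Such regular-expression rules might look like the sketch below. The patterns and label names are assumptions based only on the marker types listed above; the actual annotation strings in the dataset vary and would need to be inspected before relying on any of these rules.

```python
import re

# Ordered (label, pattern) pairs; the first pattern that matches wins.
# Marker rules come first so "Marker On" is never read as a seizure onset.
_RULES = [
    ("marker_on", re.compile(r"\bmark(er)?\s*on\b", re.IGNORECASE)),
    ("marker_off", re.compile(r"\bmark(er)?\s*off\b", re.IGNORECASE)),
    ("eeg_onset", re.compile(r"\b(eeg|electrographic)\b.*\bonset\b", re.IGNORECASE)),
    ("eeg_offset", re.compile(r"\b(eeg|electrographic)\b.*\boffset\b", re.IGNORECASE)),
    ("clinical_onset", re.compile(r"\bclin(ical)?\b.*\bonset\b", re.IGNORECASE)),
    ("clinical_offset", re.compile(r"\bclin(ical)?\b.*\boffset\b", re.IGNORECASE)),
]


def classify_marker(description):
    """Return a normalized label for an annotation string, or None if unknown."""
    for label, pattern in _RULES:
        if pattern.search(description):
            return label
    return None
```

Returning None for unrecognized strings makes it easy to list the annotations that no rule covers and extend the rule set center by center.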

Seizure Electrographic and Clinical Onset Annotations

Seizures are present in several of the datasets. Generally there is only one seizure per EDF file. When seizures are present, they are marked electrographically (and clinically if present) via standard approaches in the epilepsy clinical workflow.

Clinical onsets are simply manifestations of the seizures with clinical syndromes. Sometimes the marker may not be present.

Seizure Onset Zone Annotations

What matters most in evaluating the datasets is the clinicians' annotations of their localization hypotheses for the seizure onset zone.

These generally include:

* early onset: the earliest onset electrodes participating in the seizure that clinicians saw
* early/late spread (optional): the electrodes that showed epileptic spread activity after seizure onset. Not all seizures have spread contacts annotated.

Surgical Zone (Resection or Ablation) Annotations

For patients with a post-surgical MRI available, the segmentation process outlined above tells us which electrodes were within the surgically removed brain region.

Otherwise, clinicians give us their best estimate of which electrodes were resected/ablated, based on their surgical notes.

For surgical patients whose postoperative medical records did not explicitly indicate specific resected or ablated contacts, manual visual inspection was performed to determine the approximate contacts that were located in later resected/ablated tissue. Postoperative T1 MRI scans were compared against post-SEEG implantation CT scans or CURRY coregistrations of preoperative MRI/post-SEEG CT scans. Contacts of interest in and around the area of the reported resection were selected individually, and the corresponding slice was located on the CT scan or CURRY coregistration. After identifying landmarks of that slice (e.g. skull shape, skull features, shape of prominent brain structures like the ventricles, central sulcus, superior temporal gyrus, etc.), the location of a given contact in relation to these landmarks, and the location of the slice along the axial plane, the corresponding slice in the postoperative MRI scan was located. The resected tissue within the slice was then visually inspected and compared against the distinct landmarks identified in the CT scans. If brain tissue was not present in the corresponding location of the contact, then the contact was marked as resected/ablated. This process was repeated for each contact of interest.

References

Adam Li, Chester Huynh, Zachary Fitzgerald, Iahn Cajigas, Damian Brusko, Jonathan Jagid, Angel Claudio, Andres Kanner, Jennifer Hopp, Stephanie Chen, Jennifer Haagensen, Emily Johnson, William Anderson, Nathan Crone, Sara Inati, Kareem Zaghloul, Juan Bulacio, Jorge Gonzalez-Martinez, Sridevi V. Sarma. Neural Fragility as an EEG Marker of the Seizure Onset Zone. bioRxiv 862797; doi: https://doi.org/10.1101/862797

Appelhoff, S., Sanderson, M., Brooks, T., Vliet, M., Quentin, R., Holdgraf, C., Chaumon, M., Mikulan, E., Tavabi, K., Höchenberger, R., Welke, D., Brunner, C., Rockhill, A., Larson, E., Gramfort, A. and Jas, M. (2019). MNE-BIDS: Organizing electrophysiological data into the BIDS format and facilitating their analysis. Journal of Open Source Software 4: (1896). https://doi.org/10.21105/joss.01896

Holdgraf, C., Appelhoff, S., Bickel, S., Bouchard, K., D'Ambrosio, S., David, O., … Hermes, D. (2019). iEEG-BIDS, extending the Brain Imaging Data Structure specification to human intracranial electrophysiology. Scientific Data, 6, 102. https://doi.org/10.1038/s41597-019-0105-7

Pernet, C. R., Appelhoff, S., Gorgolewski, K. J., Flandin, G., Phillips, C., Delorme, A., Oostenveld, R. (2019). EEG-BIDS, an extension to the brain imaging data structure for electroencephalography. Scientific Data, 6, 103. https://doi.org/10.1038/s41597-019-0104-8
Optimum Patient Care Research Database (OPCRD)
healthdatagateway.org
unknown
Updated Sep 8, 2024
Cite
Optimum Patient Care (OPC) (2024). Optimum Patient Care Research Database (OPCRD) [Dataset]. http://doi.org/10.2147/POR.S395632
Explore at:
Available download formats: unknown
Unique identifier
https://doi.org/10.2147/POR.S395632
Dataset updated
Sep 8, 2024
Dataset provided by
Optimum Patient Care Limited
Authors
Optimum Patient Care (OPC)
License
https://opcrd.co.uk/our-database/data-requests/
Description
About OPCRD

Optimum Patient Care Research Database (OPCRD) is a real-world, longitudinal, research database that provides anonymised data to support scientific, medical, public health and exploratory research. OPCRD is established, funded and maintained by Optimum Patient Care Limited (OPC) – which is a not-for-profit social enterprise that has been providing quality improvement programmes and research support services to general practices across the UK since 2005.

Key Features of OPCRD

OPCRD has been purposefully designed to facilitate real-world data collection and address the growing demand for observational and pragmatic medical research, both in the UK and internationally. Data held in OPCRD is representative of routine clinical care and thus enables the study of ‘real-world’ effectiveness and health care utilisation patterns for chronic health conditions.

Unique qualities which set OPCRD apart from other research data resources:

• De-identified electronic medical records of more than 24.9 million patients
• OPCRD covers all major UK primary care clinical systems
• OPCRD covers approximately 35% of the UK population
• One of the biggest primary care research networks in the world, with over 1,175 practices
• Linked patient-reported outcomes for over 68,000 patients, including Covid-19 patient-reported data
• Linkage to secondary care data sources including Hospital Episode Statistics (HES)

Data Available in OPCRD

OPCRD has received data contributions from over 1,175 practices and currently holds de-identified, research-ready data for over 24.9 million patients or data subjects. This includes longitudinal primary care patient data and any data relevant to the management of patients in primary care, and thus covers all conditions. The data is derived from both electronic health records (EHR) data and patient-reported data from patient questionnaires delivered as part of quality improvement. OPCRD currently holds patient-reported questionnaire data for over 68,000 patients on Covid-19, asthma, COPD and rare diseases.

Approvals and Governance

OPCRD has had NHS research ethics committee (REC) approval to provide anonymised data for scientific and medical research since 2010, with its most recent approval in 2020 (NHS HRA REC ref: 20/EM/0148). OPCRD is governed by the Anonymised Data Ethics and Protocols Transparency committee (ADEPT). All research conducted using anonymised data from OPCRD must gain prior approval from ADEPT. Proceeds from OPCRD data access fees and detailed feasibility assessments are re-invested into OPC services for the continued free provision of patient quality improvement programmes for contributing practices and patients.

For more information on OPCRD please visit: https://opcrd.co.uk/


New York State Hospital De-Identified Data Data Package

New York Record Level Data;New York De-identified Inpatient Data;New York Inpatient Costs Data;Statewide Planning And Research Cooperative System Dataset


