https://www.pioneerdatahub.co.uk/data/data-request-process/
Community Acquired Pneumonia (CAP) is the leading cause of infectious death and the third leading cause of death globally. Disease severity and outcomes are highly variable, dependent on host factors (such as age, smoking history, frailty and comorbidities), microbial factors (the causative organism) and the treatments given. Clinical decision pathways are complex and, despite guidelines, there is significant national variability in both guideline adherence and patient outcomes.
For clinicians treating pneumonia in the hospital setting, care of these patients can be challenging. Key decisions include the type of antibiotics (oral or intravenous), the appropriate place of care (home, hospital or intensive care), and when it is appropriate to stop antibiotics. Decision support tools to help inform clinical management would be highly valuable to the clinical community.
This dataset is synthetic, formed from statistical modelling using real patient data, and represents a population with significant diversity in terms of patient demography, socio-economic status, CAP severity, treatments and outcomes. It can be used to develop code for deployment on real data, train data analysts and increase familiarity with this disease and its management.
PIONEER geography: The West Midlands (WM) has a population of 5.9 million & includes a diverse ethnic & socio-economic mix.
EHR. UHB is one of the largest NHS Trusts in England, providing direct acute services & specialist care across four hospital sites, with 2.2 million patient episodes per year, 2750 beds & an expanded 250 ITU bed capacity during COVID. UHB runs a fully electronic healthcare record (EHR) (PICS; Birmingham Systems), a shared primary & secondary care record (Your Care Connected) & a patient portal “My Health”. This synthetic dataset has been modelled to reflect data collected from this EHR.
Scope: A synthetic dataset which has been statistically modelled on all hospitalised patients admitted to UHB with Community Acquired Pneumonia. The dataset includes highly granular patient demographics & co-morbidities taken from ICD-10 & SNOMED-CT codes. It also includes serial, structured data pertaining to process of care, including timings, admissions, escalation of care to ITU, discharge outcomes, physiology readings (heart rate, blood pressure, AVPU score and others), blood results and drug prescribing and administration.
Available supplementary data: Matched synthetic controls; ambulance, OMOP data, real patient CAP data. Available supplementary support: Analytics, Model build, validation & refinement; A.I.; Data partner support for ETL (extract, transform & load) process, Clinical expertise, Patient & end-user access, Purchaser access, Regulatory requirements, Data-driven trials, “fast screen” services.
The National Hospital Ambulatory Medical Care Survey (NHAMCS) has been fielded annually since 1992 to collect data on the utilization and provision of ambulatory care services in hospital emergency and outpatient departments. Data collection from hospital-based ambulatory surgery centers began in 2009. Between 2010 and 2012, NHAMCS also gathered data on visits to freestanding ambulatory surgery centers. In 2018, the survey began focusing on just the ambulatory visits made to emergency departments. Each emergency department is randomly assigned to a 4-week reporting period. During this period, data for a systematic random sample of visits are recorded by Census interviewers using a computerized Patient Record Form. Data are obtained on patient characteristics such as age, sex, race, and ethnicity, and visit characteristics such as patient’s reason for visit, provider’s diagnosis, services ordered or provided, and treatments, including medication therapy. In addition, data about the facility are collected as part of a survey induction interview.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains information from a cohort of 799 patients admitted to hospital for COVID-19, characterized with sociodemographic and clinical data. Data were collected retrospectively, from November 2020 to January 2021, from the medical records of all hospital admissions that occurred from March 1st, 2020, to December 31st, 2020. Analysis of these data can contribute to defining the sociodemographic profile, clinical variables and health conditions of patients hospitalized for COVID-19. To this end, this database contains a wide range of variables, such as:
- Month of hospitalization
- Gender
- Age group
- Ethnicity
- Marital status
- Paid work
- Admission to clinical ward
- Hospitalization in the Intensive Care Unit (ICU)
- COVID-19 diagnosis
- Number of times hospitalized by COVID-19
- Hospitalization time in days
- Risk Classification Protocol
Data is presented as a single Excel XLSX file: dataset.xlsx of clinical and sociodemographic characteristics of hospital admissions by COVID-19: retrospective cohort of patients in two hospitals in Southern Brazil. Researchers interested in studying the data related to patients affected by COVID-19 can extensively explore the variables described here. Approved by the Research Ethics Committee (No. 4.323.917/2020) of the Federal University of Santa Catarina.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset comprises public health records, surveillance systems, and environmental monitoring data collected from multiple regions over several years. It contains 43,689 entries that provide a comprehensive view of public health dynamics, essential for understanding disease dissemination and guiding effective control strategies. The data has been meticulously gathered from regional health departments, hospitals, laboratories, and public health organizations, ensuring a high level of quality, consistency, and completeness. Each record has been anonymized and aggregated to protect sensitive information.
This dataset serves as a vital resource for researchers, policymakers, and health professionals aiming to analyze and predict public health trends, assess the impact of environmental factors, and improve epidemic response strategies.
Features
The dataset includes the following features:
- Age: Age of the individual in years.
- Gender: Gender of the individual (Male, Female, Other).
- Location: Geographic location (Urban, Rural, Suburban).
- Ethnicity: Ethnicity of the individual.
- Socioeconomic Status (SES): Socioeconomic status categorized as Low, Medium, or High.
- Chronic Conditions: Presence of chronic health conditions.
- Vaccination Status: Whether the individual is vaccinated (Yes, No).
- Medical History: Previous medical history (None, Past Illness, Chronic).
- Immunity Level: Estimated level of immunity (Low, Medium, High).
- Reported Symptoms: Type and severity of symptoms reported.
- Transmission Rate: Rate of disease transmission within the population.
- Daily New Cases: Number of new cases reported daily.
- Healthcare Personnel Availability: Availability of healthcare workers in the region.
- Hospital Capacity: Number of beds and resources available in healthcare facilities.
- Environmental Factors: Data related to air quality, temperature, and other environmental variables influencing health outcomes.
- Hospitalization Requirement: Predicted level of hospitalization needed based on reported symptoms and medical history.
This dataset is ideal for various analytical tasks, including predictive modeling, classification, and feature exploration, making it a valuable asset for advancing public health research.
This dataset includes aggregated weekly data on the percent of emergency department visits and the percent of hospital inpatient admissions due to influenza-like illness (ILI), COVID-19, influenza, RSV, and acute respiratory illness. The Illinois Department of Public Health (IDPH) collects data for Emergency Department visits to all 185 acute care hospitals in Illinois. The data are submitted from IDPH to the CDC’s BioSense Platform for access and analysis by health departments via the ESSENCE system. The CDC National Syndromic Surveillance Program (NSSP) utilizes diagnostic codes and clinical terms to create definitions for diagnosed COVID-19, influenza, RSV, and acute respiratory illness. For more information on diagnostic codes and clinical terms used, visit: https://www.cdc.gov/nssp/php/onboarding-resources/companion-guide-ed-data-respiratory-illness.html The data is characterized by selected demographic groups including age group and race/ethnicity. The dataset also includes percent of weekly outpatient visits due to ILI as reported by several outpatient clinics throughout Chicago that participate in CDC’s Influenza-like Illness Surveillance Network (ILINet). For more information on ESSENCE, see https://www.dph.illinois.gov/data-statistics/syndromic-surveillance For more information on ILINet, see https://www.cdc.gov/fluview/overview/index.html#cdc_generic_section_3-outpatient-illness-surveillance All data are provisional and subject to change. Information is updated as additional details are received. At any given time, this dataset reflects data currently known to CDPH. Numbers in this dataset may differ from other public sources.
Dataset Title: A Gold Standard Corpus for Activity Information (GoSCAI)
Dataset Curators: The Epidemiology & Biostatistics Section of the NIH Clinical Center Rehabilitation Medicine Department
Dataset Version: 1.0 (May 16, 2025)
Dataset Citation and DOI: NIH CC RMD Epidemiology & Biostatistics Section. (2025). A Gold Standard Corpus for Activity Information (GoSCAI) [Data set]. Zenodo. doi: 10.5281/zenodo.15528545
This data statement is for a gold standard corpus of de-identified clinical notes that have been annotated for human functioning information based on the framework of the WHO's International Classification of Functioning, Disability and Health (ICF). The corpus includes 484 notes from a single institution within the United States written in English in a clinical setting. This dataset was curated for the purpose of training natural language processing models to automatically identify, extract, and classify information on human functioning at the whole-person, or activity, level.
This dataset is curated to be a publicly available resource for the development and evaluation of methods for the automatic extraction and classification of activity-level functioning information as defined in the ICF. The goals of data curation are to 1) create a corpus of a size that can be manually deidentified and annotated, 2) maximize the density and diversity of functioning information of interest, and 3) allow public dissemination of the data.
Language Region: en-US
Prose Description: English as written by native and bilingual English speakers in a clinical setting
The language users represented in this dataset are medical and clinical professionals who work in a research hospital setting. These individuals hold professional degrees corresponding to their respective specialties. Specific demographic characteristics of the language users such as age, gender, or race/ethnicity were not collected.
The annotator group consisted of five people, 33 to 76 years old, including four females and one male. Socioeconomically, they came from the middle and upper-middle income classes. Regarding first language, three annotators had English as their first language, one had Chinese, and one had Spanish. Proficiency in English, the language of the data being annotated, was native for three of the annotators and bilingual for the other two. The annotation team included clinical rehabilitation domain experts with backgrounds in occupational therapy, physical therapy, and individuals with public health and data science expertise. Prior to annotation, all annotators were trained on the specific annotation process using established guidelines for the given domain, and annotators were required to achieve a specified proficiency level prior to annotating notes in this corpus.
The notes in the dataset were written as part of clinical care within a U.S. research hospital between May 2008 and November 2019. These notes were written by health professionals asynchronously following the patient encounter to document the interaction and support continuity of care. The intended audience of these notes was clinicians involved in the patients' care. The included notes come from nine disciplines: neuropsychology, occupational therapy, physical medicine (physiatry), physical therapy, psychiatry, recreational therapy, social work, speech language pathology, and vocational rehabilitation. The notes were curated to support research on natural language processing for functioning information between 2018 and 2024.
The final corpus was derived from a set of clinical notes extracted from the hospital electronic medical record (EMR) for the purpose of clinical research. The original data consist of character-based digital content. We work in 8-bit ASCII or Unicode encoding, and therefore part of our pre-processing includes running encoding detection and transformation from encodings such as Windows-1252 or ISO-8859 to our preferred format.
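As an illustration of the kind of encoding transformation described above, the following is a minimal sketch and not the curators' actual pipeline. It assumes a simple try-in-order strategy with Python's standard codecs; the function names and the candidate list are hypothetical.

```python
# Candidate encodings, tried in order. This ordering is an assumption made
# for illustration; it is not taken from the dataset documentation.
CANDIDATE_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def decode_note(raw: bytes) -> tuple[str, str]:
    """Return (text, encoding_used) for the first encoding that decodes cleanly."""
    for encoding in CANDIDATE_ENCODINGS:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Not reachable in practice: latin-1 (ISO-8859-1) accepts every byte value.
    raise ValueError("no candidate encoding matched")


def to_utf8(raw: bytes) -> bytes:
    """Re-encode a note of unknown encoding as UTF-8 bytes."""
    text, _ = decode_note(raw)
    return text.encode("utf-8")
```

Trying UTF-8 first is a common heuristic because text in a legacy 8-bit encoding rarely happens to be valid UTF-8, whereas the reverse check is not informative. A statistical detector could replace the ordered fallback where the source encodings are more varied.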
On the larger corpus, we applied sampling to match our curation rationale. Given the resource constraints of manual annotation, we set out to create a dataset of 500 clinical notes, which would exclude notes over 10,000 characters in length.
To promote density and diversity, we used five note characteristics as sampling criteria. The first was text length, expressed in number of characters. The second was the discipline group, which is derived from note type metadata and describes which discipline a note originated from: occupational and vocational therapy (OT/VOC), physical therapy (PT), recreation therapy (RT), speech and language pathology (SLP), social work (SW), or miscellaneous (MISC, including psychiatry, neurology and physiatry). These disciplines were selected for collecting the larger corpus because their notes are likely to include functioning information. Finally, existing information extraction tools were used to obtain annotation counts in four areas of functioning, which provided the remaining three characteristics: a note’s annotation count, annotation density (annotation count divided by text length), and domain count (number of domains with at least 1 annotation).
We used stratified sampling across the 6 discipline groups to ensure discipline diversity in the corpus. Because of low availability, 50 notes were sampled from SLP with relaxed criteria, and 90 notes each from the 5 other discipline groups with stricter criteria. Sampled SLP notes were those with the highest annotation density that had an annotation count of at least 5 and a domain count of at least 2. Other notes were sampled by highest annotation count and lowest text length, with a minimum annotation count of 15 and minimum domain count of 3.
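The sampling rules above can be sketched as follows. This is an illustrative reading of the description, not the curators' code: the `Note` fields and function name are hypothetical, and treating "highest annotation count and lowest text length" as a two-level sort key is an assumption.

```python
from dataclasses import dataclass

MAX_LENGTH = 10_000  # notes over 10,000 characters are excluded


@dataclass
class Note:
    note_id: str
    group: str             # discipline group, e.g. "PT" or "SLP"
    text_length: int       # number of characters (assumed to be > 0)
    annotation_count: int  # from the pre-existing extraction tools
    domain_count: int      # domains with at least 1 annotation

    @property
    def density(self) -> float:
        # Annotation density: annotation count divided by text length.
        return self.annotation_count / self.text_length


def sample_group(notes: list[Note], group: str, quota: int) -> list[Note]:
    """Select up to `quota` notes from one discipline group."""
    pool = [n for n in notes if n.group == group and n.text_length <= MAX_LENGTH]
    if group == "SLP":
        # Relaxed criteria: count >= 5, domains >= 2, ranked by density.
        pool = [n for n in pool if n.annotation_count >= 5 and n.domain_count >= 2]
        pool.sort(key=lambda n: -n.density)
    else:
        # Stricter criteria: count >= 15, domains >= 3, ranked by highest
        # count and then shortest text (assumed tie-breaking order).
        pool = [n for n in pool if n.annotation_count >= 15 and n.domain_count >= 3]
        pool.sort(key=lambda n: (-n.annotation_count, n.text_length))
    return pool[:quota]
```

Under the quotas described, this would be called with a quota of 50 for SLP and 90 for each of the other five groups, giving the target of 500 notes.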
The notes in the resulting sample included certain types of PHI and PII. To prepare for public dissemination, all sensitive or potentially identifying information was manually annotated in the notes and replaced with substituted content to ensure readability and enough context for machine learning without exposing any sensitive information. This de-identification effort was manually reviewed to ensure no PII or PHI exposure and to correct any resulting readability issues. Notes about pediatric patients were excluded. No attempt was made to sample multiple notes from the same patient. No metadata is provided to group notes other than by note type, discipline, or discipline group. The dataset is not organized beyond the provided metadata, but publications about models trained on this dataset should include information on the train/test splits used.
All notes were sentence-segmented and tokenized using the spaCy en_core_web_lg model with additional rules for sentence segmentation customized to the dataset. Notes are stored in an XML format readable by the GATE annotation software (https://gate.ac.uk/family/developer.html), which stores annotations separately in annotation sets.
As the clinical notes were extracted directly from the EMR in text format, the capture quality was determined to be high. The clinical notes did not have to be converted from other data formats, which means this dataset is free from noise introduced by conversion processes such as optical character recognition.
Because of the effort required to manually deidentify and annotate notes, this corpus is limited in terms of size and representation. The curation decisions skewed note selection towards specific disciplines and note types to increase the likelihood of encountering information on functioning. Some subtypes of functioning occur infrequently in the data, or not at all. The deidentification of notes was done in a manner to preserve natural language as it would occur in the notes, but some information is lost, e.g. on rare diseases.
Information on the manual annotation process is provided in the annotation guidelines for each of the four domains:
- Communication & Cognition (https://zenodo.org/records/13910167)
- Mobility (https://zenodo.org/records/11074838)
- Self-Care & Domestic Life (SCDL) (https://zenodo.org/records/11210183)
- Interpersonal Interactions & Relationships (IPIR) (https://zenodo.org/records/13774684)
Inter-annotator agreement was established on development datasets described in the annotation guidelines prior to the annotation of this gold standard corpus.
The gold standard corpus consists of 484 documents, which include 35,147 sentences in total. The distribution of annotated information is provided in the table below.
Domain | Number of Annotated Sentences | % of All Sentences | Mean Number of Annotated Sentences per Document
Communication & Cognition | 6033 | 17.2% |
The National Hospital Discharge Survey (NHDS), conducted from 1965 to 2010, was a national probability survey designed to meet the need for information on characteristics of inpatients discharged from non-Federal short-stay hospitals in the United States. From 1988 to 2007, the NHDS collected data from a sample of approximately 270,000 inpatient records acquired from a national sample of about 500 hospitals. From 2008 to 2010, the sample size was reduced to 239 hospitals. Only hospitals with an average length of stay of fewer than 30 days for all patients, general hospitals, or children’s general hospitals are included in the survey. Federal, military, and Department of Veterans Affairs hospitals, as well as hospital units of institutions (such as prison hospitals), and hospitals with fewer than six beds staffed for patient use, are excluded.
Beginning in 1988, two data collection procedures have been used in the survey. The medical abstract form and the automated data tapes contain items that relate to the personal characteristics of the patient. These items include age, sex, race, ethnicity, marital status, and expected sources of payment. Administrative items such as admission and discharge dates (which allow calculation of length of stay), as well as discharge status are also included. Medical information about patients includes diagnoses and procedures coded to the International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM).
The Drug Abuse Warning Network (DAWN) is a nationally representative public health surveillance system that has monitored drug related emergency department (ED) visits to hospitals since the early 1970s. First administered by the Drug Enforcement Administration (DEA) and the National Institute on Drug Abuse (NIDA), the responsibility for DAWN now rests with the Substance Abuse and Mental Health Services Administration's (SAMHSA) Center for Behavioral Health Statistics and Quality (CBHSQ). Over the years, the exact survey methodology has been adjusted to improve the quality, reliability, and generalizability of the information produced by DAWN. The current approach was first fully implemented in the 2004 data collection year. DAWN relies on a longitudinal probability sample of hospitals located throughout the United States. To be eligible for selection into the DAWN sample, a hospital must be a non-Federal, short-stay, general surgical and medical hospital located in the United States, with at least one 24-hour ED. DAWN cases are identified by the systematic review of ED medical records in participating hospitals. The unit of analysis is any ED visit involving recent drug use. DAWN captures both ED visits that are directly caused by drugs and those in which drugs are a contributing factor but not the direct cause of the ED visit. The reason a patient used a drug is not part of the criteria for considering a visit to be drug-related. Therefore, all types of drug-related events are included: drug misuse or abuse, accidental drug ingestion, drug-related suicide attempts, malicious drug poisonings, and adverse reactions. DAWN does not report medications that are unrelated to the visit. 
The DAWN public-use dataset provides information for all types of drugs, including illegal drugs, prescription drugs, over-the-counter medications, dietary supplements, anesthetic gases, substances that have psychoactive effects when inhaled, alcohol when used in combination with other drugs (all ages), and alcohol alone (only for patients aged 20 or younger). Public-use dataset variables describe and categorize up to 22 drugs contributing to the ED visit, including toxicology confirmation and route of administration. Administrative variables specify the type of case, case disposition, categorized episode time of day, and quarter of year. Metropolitan area is included for represented metropolitan areas. Created variables include the number of unique drugs reported and case-level indicators for alcohol, non-alcohol illicit substances, any pharmaceutical, non-medical use of pharmaceuticals, and all misuse and abuse of drugs. Demographic items include age category, sex, and race/ethnicity. Complex sample design and weighting variables are included to calculate various estimates of drug-related ED visits for the Nation as a whole, as well as for specific metropolitan areas, from the ED visits classified as DAWN cases in the selected hospitals. This study has 1 Data Set.
NOTE: This dataset has been retired and marked as historical-only.
This dataset is a companion to the COVID-19 Daily Cases and Deaths dataset (https://data.cityofchicago.org/d/naz8-j4nc). The major difference in this dataset is that the case, death, and hospitalization rates per 100,000 population are not those for the single date indicated. They are rolling averages for the seven-day period ending on that date. This rolling average is used to account for fluctuations that may occur in the data, such as fewer cases being reported on weekends, and for small numbers. The intent is to give a more representative view of the ongoing COVID-19 experience, less affected by what is essentially noise in the data.
All rates are per 100,000 population in the indicated group, or Chicago, as a whole, for “Total” columns.
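The seven-day rolling rate described above can be sketched as follows. This is a minimal illustration that assumes the rate is the mean daily count over the window, scaled to 100,000 population; the function name and the choice to return `None` for dates without a full window are hypothetical, and the publisher's exact method may differ.

```python
from typing import Optional


def rolling_rate_per_100k(
    daily_counts: list[int], population: int, window: int = 7
) -> list[Optional[float]]:
    """Rolling-average daily rate per 100,000 for the window ending on each date."""
    rates: list[Optional[float]] = []
    for i in range(len(daily_counts)):
        if i + 1 < window:
            # Fewer than `window` days of history: no rate is reported.
            rates.append(None)
            continue
        window_mean = sum(daily_counts[i + 1 - window : i + 1]) / window
        rates.append(window_mean * 100_000 / population)
    return rates
```

For example, a single day with 14 cases in a population of 200,000 contributes a rate of 1.0 per 100,000 to each of the seven windows that contain it, rather than a one-day spike of 7.0.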
Only Chicago residents are included based on the home address as provided by the medical provider.
Cases with a positive molecular (PCR) or antigen test are included in this dataset. Cases are counted based on the date the test specimen was collected. Deaths among cases are aggregated by day of death. Hospitalizations are reported by date of first hospital admission. Demographic data are based on what is reported by medical providers or collected by CDPH during follow-up investigation.
Denominators are from the U.S. Census Bureau American Community Survey 1-year estimate for 2018 and can be seen in the Citywide, 2018 row of the Chicago Population Counts dataset (https://data.cityofchicago.org/d/85cm-7uqa).
All data are provisional and subject to change. Information is updated as additional details are received and it is, in fact, very common for recent dates to be incomplete and to be updated as time goes on. At any given time, this dataset reflects cases and deaths currently known to CDPH.
Numbers in this dataset may differ from other public sources due to definitions of COVID-19-related cases and deaths, sources used, how cases and deaths are associated to a specific date, and similar factors.
Data Source: Illinois National Electronic Disease Surveillance System, Cook County Medical Examiner’s Office, U.S. Census Bureau American Community Survey
https://www.icpsr.umich.edu/web/ICPSR/studies/39216/terms
These data were collected using the National Electronic Injury Surveillance System (NEISS), the primary data system of the United States Consumer Product Safety Commission (CPSC). CPSC began operating NEISS in 1972 to monitor product-related injuries treated in United States hospital emergency departments (EDs). In June 1992, the National Center for Injury Prevention and Control (NCIPC), within the Centers for Disease Control and Prevention, established an interagency agreement with CPSC to begin collecting data on nonfatal firearm-related injuries in order to monitor the incidents and the characteristics of persons with nonfatal firearm-related injuries treated in United States hospital EDs over time. This dataset represents all nonfatal firearm-related injuries (i.e., injuries associated with powder-charged guns) and all nonfatal BB and pellet gun-related injuries reported through NEISS from YYYY. The cases consist of initial ED visits for treatment of the injuries. The NEISS-FISS is designed to provide national incidence estimates of nonfatal firearm injuries treated in U.S. hospital EDs. Data on injury-related visits are obtained from a national sample of NEISS hospitals, which were selected as a stratified probability sample of hospitals in the United States and its territories with a minimum of six beds and a 24-hour ED. The sample includes separate strata for very large, large, medium, and small hospitals, defined by the number of annual ED visits per hospital, and children's hospitals. The scope of reporting goes beyond routine reporting of injuries associated with consumer-related products in CPSC's jurisdiction to include all firearm injuries.
The data can be used to (1) measure the magnitude and distribution of nonfatal firearm injuries in the United States; (2) monitor unintentional and violence-related nonfatal firearm injuries over time; (3) identify emerging injury problems; (4) identify specific cases for follow-up investigations of particular injury-related problems; and (5) set national priorities. A fundamental principle of this expansion effort is that preliminary surveillance data will be made available in a timely manner to a number of different federal agencies with unique and overlapping public health responsibilities and concerns. The final edited data will be released annually as public use data files for use by other public health professionals and researchers. These public use data files provide NEISS-FISS data on nonfatal injuries collected from January through December each year. NEISS-FISS is providing data on over 100,000 estimated cases annually. Data obtained on each case include age, race/ethnicity, sex, principal diagnosis, primary body part affected, consumer products involved, disposition at ED discharge (i.e., hospitalized, transferred, treated and released, observation, died), locale where the injury occurred, work-relatedness, and a narrative description of the injury circumstances. Also, intent of injury (e.g., unintentional, assault, self-harm, legal intervention) is coded for each case in a manner consistent with the International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) coding rules and guidelines. Users are cautioned against using estimates with wide confidence intervals to make conclusions about point estimates. Firearm injuries have distinct geographic patterns and estimates can be imprecise or change over time when based on a small number of facilities.
NEISS has been managed and operated by the U.S. Consumer Product Safety Commission since 1972 and is used by the Commission for identifying and monitoring consumer product-related injuries and for assessing risk to all U.S. residents. These product-related injury data are used for educating consumers about hazardous products and for identifying injury-related cases used in detailed studies of specific products and associated hazard patterns. These studies set the stage for developing both voluntary and mandatory safety standards. Since the early 1980s, CPSC has assisted other federal agencies by using NEISS to collect injury-related data of special interest to them. In 1992, an interagency agreement was established between NCIPC and CPSC to (1) collect NEISS data on nonfatal firearm-related injuries for the CDC Firearm Injury Surveillance Study; (2) publish NEISS d
This is an update to the MSSA geometries and demographics to reflect the new 2020 Census tract data. The Medical Service Study Area (MSSA) polygon layer represents the best fit mapping of all new 2020 California census tract boundaries to the original 2010 census tract boundaries used in the construction of the original 2010 MSSA file. Each of the state's new 9,129 census tracts was assigned to one of the previously established medical service study areas (excluding tracts with no land area), as identified in this data layer. The MSSA Census tract data is aggregated by HCAI, to create this MSSA data layer. This represents the final re-mapping of 2020 Census tracts to the original 2010 MSSA geometries. The 2010 MSSA were based on U.S. Census 2010 data and public meetings held throughout California.
https://www.icpsr.umich.edu/web/ICPSR/studies/31921/terms
The Drug Abuse Warning Network (DAWN) is a nationally representative public health surveillance system that has monitored drug related emergency department (ED) visits to hospitals since the early 1970s. First administered by the Drug Enforcement Administration (DEA) and the National Institute on Drug Abuse (NIDA), the responsibility for DAWN now rests with the Substance Abuse and Mental Health Services Administration's (SAMHSA) Center for Behavioral Health Statistics and Quality (CBHSQ). Over the years, the exact survey methodology has been adjusted to improve the quality, reliability, and generalizability of the information produced by DAWN. The current approach was first fully implemented in the 2004 data collection year. DAWN relies on a longitudinal probability sample of hospitals located throughout the United States. To be eligible for selection into the DAWN sample, a hospital must be a non-Federal, short-stay, general surgical and medical hospital located in the United States, with at least one 24-hour ED. DAWN cases are identified by the systematic review of ED medical records in participating hospitals. The unit of analysis is any ED visit involving recent drug use. DAWN captures both ED visits that are directly caused by drugs and those in which drugs are a contributing factor but not the direct cause of the ED visit. The reason a patient used a drug is not part of the criteria for considering a visit to be drug-related. Therefore, all types of drug-related events are included: drug misuse or abuse, accidental drug ingestion, drug-related suicide attempts, malicious drug poisonings, and adverse reactions. DAWN does not report medications that are unrelated to the visit. 
The DAWN public-use dataset provides information for all types of drugs, including illegal drugs, prescription drugs, over-the-counter medications, dietary supplements, anesthetic gases, substances that have psychoactive effects when inhaled, alcohol when used in combination with other drugs (all ages), and alcohol alone (only for patients aged 20 or younger). Public-use dataset variables describe and categorize up to 22 drugs contributing to the ED visit, including toxicology confirmation and route of administration. Administrative variables specify the type of case, case disposition, categorized episode time of day, and quarter of year. Metropolitan area is included for represented metropolitan areas. Created variables include the number of unique drugs reported and case-level indicators for alcohol, non-alcohol illicit substances, any pharmaceutical, non-medical use of pharmaceuticals, and all misuse and abuse of drugs. Demographic items include age category, sex, and race/ethnicity. Complex sample design and weighting variables are included to calculate various estimates of drug-related ED visits for the Nation as a whole, as well as for specific metropolitan areas, from the ED visits classified as DAWN cases in the selected hospitals.
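As a rough illustration of how weighting variables turn sampled ED visits into population estimates, the sketch below sums case weights. The record layout and the field names `alcohol` and `weight` are hypothetical and are not the actual DAWN variable names; the values are invented.

```python
# Hypothetical sampled ED visits; each record carries a case weight, i.e. the
# number of visits in the population that the sampled visit represents.
visits = [
    {"alcohol": True, "weight": 120.5},
    {"alcohol": False, "weight": 98.0},
    {"alcohol": True, "weight": 210.25},
]


def weighted_total(records, predicate):
    """Estimate a population total by summing weights of matching records."""
    return sum(r["weight"] for r in records if predicate(r))


total_visits = weighted_total(visits, lambda r: True)
alcohol_visits = weighted_total(visits, lambda r: r["alcohol"])
alcohol_share = alcohol_visits / total_visits

print(total_visits, alcohol_visits, round(alcohol_share, 3))
```

This produces point estimates only. Standard errors for a complex sample would also need the design variables and survey-aware software, which this sketch does not attempt.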
This dataset contains the data on which the conclusions of the study "Impact of neighbourhood-level socioeconomic status, traditional coronary risk factors, and ancestry on age at myocardial infarction onset: A population-based register study" rely. We collected data registered in the Norwegian Myocardial Infarction Register for all patients admitted to Diakonhjemmet Hospital with a non-ST elevation myocardial infarction (NSTEMI) in 2014-2017 (n=840). Using the patients' registered postal codes, we identified in which city district in Oslo, Norway the patients were residing. Patients from districts other than Frogner, Vestre Aker, Ullern, Stovner, Grorud, and Alna were excluded (n=60), and the remaining patients were grouped according to whether they were residing in the western (high neighbourhood-level socioeconomic status (SES)) or north-eastern (low neighbourhood-level SES) city districts. Using the patients' registered social security numbers and the electronic medical record system at Diakonhjemmet Hospital, patients were grouped according to whether or not they had presumed Northwest-European ancestry based on their names and other information found in their medical records. Patients with undecidable ancestry (n=2) were excluded. Furthermore, patients with type 2 myocardial infarction (n=117) were excluded since we aimed to investigate the risk for coronary heart disease (CHD). Re-admissions in the period (n=55) were excluded, and we were left with 606 patients. The dataset contains patient data on city district group, presumed ancestry group, age at hospital admission with NSTEMI, history of previous acute myocardial infarction (AMI), prior diagnosis of diabetes, prior diagnosis of hypertension, cigarette smoking status, use of statins, body mass index (BMI), and serum levels of low-density lipoprotein (LDL) cholesterol. 
Raw data from the Norwegian Myocardial Infarction Register, which was used to generate variables on the patients' presumed ancestry and city-district group, is not made available as it contains personal data, but can be applied for at helsedata.no. Previous AMI was defined regardless of infarction type and ECG diagnosis, prior diagnosis of diabetes was defined as a known diagnosis of diabetes mellitus type 1 or 2, prior diagnosis of hypertension was defined as prior or ongoing treatment for hypertension, and cigarette smoking was defined as having smoked within the last month. BMI and LDL cholesterol were measured at hospital admission. Registration of all cases of AMI in Norway in the Norwegian Myocardial Infarction Register is mandatory and does not require informed consent. The Norwegian Myocardial Infarction Register is part of the National Register of Cardiovascular Diseases and is authorized under Section 11 h of the Norwegian Health Register Act. The study was approved by the Institutional Review Board of Diakonhjemmet Hospital and the data privacy representative for Diakonhjemmet Hospital, and all methods were in accordance with the ethical standards of the institution and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.
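The cohort derivation described for this dataset can be checked with a short worked calculation. The counts are those stated in the description; the variable names are illustrative only.

```python
# Counts as stated in the dataset description.
admitted = 840  # NSTEMI patients admitted 2014-2017

exclusions = [
    ("resident in other city districts", 60),
    ("undecidable ancestry", 2),
    ("type 2 myocardial infarction", 117),
    ("re-admissions in the period", 55),
]

remaining = admitted
for reason, n in exclusions:
    remaining -= n
    print(f"after excluding {reason} (n={n}): {remaining}")

# The description reports 606 patients in the final dataset.
assert remaining == 606
```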
This data collection contains detailed county and state-level ecological and descriptive data for the United States for the years 1790 to 2002. Parts 1-43 are an update to HISTORICAL, DEMOGRAPHIC, ECONOMIC, AND SOCIAL DATA: THE UNITED STATES, 1790-1970 (ICPSR 0003). Parts 1-41 contain data from the 1790-1970 censuses. They include extensive information about the social and political character of the United States, including a breakdown of population by state, race, nationality, number of families, size of the family, births, deaths, marriages, occupation, religion, and general economic condition. Parts 42 and 43 contain data from the 1840 and 1870 Censuses of Manufacturing, respectively. These files include information about the number of persons employed in various industries and the quantities of different types of manufactured products. Parts 44-50 provide county-level data from the United States Census of Agriculture for 1840 to 1900. They also include the state and national totals for the variables. The files provide data about the number, types, and prices of various agricultural products. Parts 51-57 contain data on religious bodies and church membership for 1906, 1916, 1926, 1936, and 1952, respectively. Parts 58-69 consist of data from the CITY DATA BOOKS for 1944, 1948, 1952, 1956, 1962, 1967, 1972, 1977, 1983, 1988, 1994, and 2000, respectively. These files contain information about population, climate, housing units, hotels, birth and death rates, school enrollment and education expenditures, employment in various industries, and city government finances. Parts 70-81 consist of data from the COUNTY DATA BOOKS for 1947, 1949, 1952, 1956, 1962, 1967, 1972, 1977, 1983, 1988, 1994, and 2000, respectively. These files include information about population, employment, housing, agriculture, manufacturing, retail, services, trade, banking, Social Security, local governments, school enrollment, hospitals, crime, and income. 
Parts 82-84 contain data from USA COUNTIES 1998. Due to the large number of variables from this source, the data were divided into three separate data files. Data include information on population, vital statistics, school enrollment, educational attainment, Social Security, labor force, personal income, poverty, housing, trade, farms, ancestry, commercial banks, and transfer payments. Parts 85-106 provide data from the United States Census of Agriculture for 1910 to 2002. They provide data about the amount, types, and prices of various agricultural products. Also, these datasets contain extensive information on the amount, expenses, sales, values, and production of farms and machinery. (Source: downloaded from ICPSR 7/13/10)
Please Note: This dataset is part of the historical CISER Data Archive Collection and is also available at ICPSR -- https://doi.org/10.3886/ICPSR02896.v3. We highly recommend using the ICPSR version, as they made this dataset available in multiple data formats and updated the data through 2002.
https://www.qresearch.org/information/information-for-researchers/
Hospital Episode Statistics (HES) is a database containing details of all admissions, A&E attendances and outpatient appointments at NHS hospitals in England.
Adult Critical Care (ACC) is a subset of Admitted Patient Care (APC) data. An Intensive Care Unit (ICU) or High Dependency Unit (HDU) ward in a hospital, known as a critical care unit, provides support, monitoring and treatment for critically ill patients requiring constant support and monitoring to maintain function in at least one organ, and often in multiple organs. Medical equipment is used to take the place of patients’ organs during their recovery.
Some critical care units are attached to condition-specific treatment units, such as heart, kidney, liver, breathing, circulation or nervous disorders. Others specialise in neonatal care (babies), paediatric care (children) or patients with severe injury or trauma.
Initially this data is collected during a patient's time at hospital as part of the Commissioning Data Set (CDS). This is submitted to NHS Digital for processing and is returned to healthcare providers as the Secondary Uses Service (SUS) data set and includes information relating to payment for activity undertaken. It allows hospitals to be paid for the care they deliver.
This same data can also be processed and used for non-clinical purposes, such as research and planning health services. Because these uses are not to do with direct patient care, they are called 'secondary uses'. This is the HES data set.
HES data covers all NHS Clinical Commissioning Groups (CCGs) in England, including:
private patients treated in NHS hospitals
patients resident outside of England
care delivered by treatment centres (including those in the independent sector) funded by the NHS
Each HES record contains a wide range of information about an individual patient admitted to an NHS hospital, including:
clinical information about diagnoses and operations
patient information, such as age group, gender and ethnicity
administrative information, such as dates and methods of admission and discharge
geographical information such as where patients are treated and the area where they live
We apply a strict statistical disclosure control in accordance with the NHS Digital protocol, to all published HES data. This suppresses small numbers to stop people identifying themselves and others, to ensure that patient confidentiality is maintained.
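Small-number suppression of the kind mentioned above might look like the following sketch. The threshold of 8 and the `*` marker are assumptions chosen for illustration; they are not taken from the NHS Digital protocol, which should be consulted for the actual rules.

```python
def suppress_small_counts(counts, threshold=8, marker="*"):
    """Replace non-zero counts below `threshold` with a suppression marker.

    `threshold` and `marker` are hypothetical defaults for illustration only.
    """
    return {
        key: (marker if 0 < value < threshold else value)
        for key, value in counts.items()
    }


# Hypothetical admission counts by area (invented values).
admissions_by_area = {"Area A": 152, "Area B": 3, "Area C": 0, "Area D": 8}

published = suppress_small_counts(admissions_by_area)
print(published)
```

In this sketch zero counts are left visible and only small non-zero counts are masked. A real protocol may also round the remaining values or apply secondary suppression so that masked cells cannot be recovered from totals.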
https://digital.nhs.uk/services/data-access-request-service-dars
Hospital Episode Statistics (HES) is a database containing details of all admissions, A&E attendances and outpatient appointments at NHS hospitals in England.
Initially this data is collected during a patient's time at hospital as part of the Commissioning Data Set (CDS). This is submitted to NHS Digital for processing and is returned to healthcare providers as the Secondary Uses Service (SUS) data set and includes information relating to payment for activity undertaken. It allows hospitals to be paid for the care they deliver.
This same data can also be processed and used for non-clinical purposes, such as research and planning health services. Because these uses are not to do with direct patient care, they are called 'secondary uses'. This is the HES data set.
Each HES record contains a wide range of information about an individual patient admitted to an NHS hospital, including:
clinical information about diagnoses and operations
patient information, such as age group, gender and ethnicity
administrative information, such as dates and methods of admission and discharge
geographical information such as where patients are treated and the area where they live
We apply a strict statistical disclosure control in accordance with the NHS Digital protocol, to all published HES data. This suppresses small numbers to stop people identifying themselves and others, to ensure that patient confidentiality is maintained. https://digital.nhs.uk/data-and-information/publications/statistical/hospital-accident--emergency-activity
https://digital.nhs.uk/services/data-access-request-service-dars
Hospital Episode Statistics (HES) is a database containing details of all admissions, A&E attendances and outpatient appointments at NHS hospitals in England.
Initially this data is collected during a patient's time at hospital as part of the Commissioning Data Set (CDS). This is submitted to NHS Digital for processing and is returned to healthcare providers as the Secondary Uses Service (SUS) data set and includes information relating to payment for activity undertaken. It allows hospitals to be paid for the care they deliver.
This same data can also be processed and used for non-clinical purposes, such as research and planning health services. Because these uses are not to do with direct patient care, they are called 'secondary uses'. This is the HES data set.
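HES data covers all NHS Clinical Commissioning Groups (CCGs) in England, including:
private patients treated in NHS hospitals
patients resident outside of England
care delivered by treatment centres (including those in the independent sector) funded by the NHS
Each HES record contains a wide range of information about an individual patient admitted to an NHS hospital, including:
clinical information about diagnoses and operations
patient information, such as age group, gender and ethnicity
administrative information, such as dates and methods of admission and discharge
geographical information such as where patients are treated and the area where they live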
We apply a strict statistical disclosure control in accordance with the NHS Digital protocol, to all published HES data. This suppresses small numbers to stop people identifying themselves and others, to ensure that patient confidentiality is maintained.
Timescales for dissemination can be found under 'Our Service Levels' at the following link: https://digital.nhs.uk/services/data-access-request-service-dars/data-access-request-service-dars-process
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The 1997 Kyrgyz Republic Demographic and Health Survey (KRDHS) is a nationally representative survey of 3,848 women age 15-49. Fieldwork was conducted from August to November 1997. The KRDHS was sponsored by the Ministry of Health (MOH), and was funded by the United States Agency for International Development. The Research Institute of Obstetrics and Pediatrics implemented the survey with technical assistance from the Demographic and Health Surveys (DHS) program. The purpose of the KRDHS was to provide data to the MOH on factors which determine the health status of women and children such as fertility, contraception, induced abortion, maternal care, infant mortality, nutritional status, and anemia. Some statistics presented in this report are currently available to the MOH from other sources. For example, the MOH collects and regularly publishes information on fertility, contraception, induced abortion and infant mortality. However, the survey presents information on these indices in a manner which is not currently available, i.e., by population subgroups such as those defined by age, marital duration, education, and ethnicity. Additionally, the survey provides statistics on some issues not previously available in the Kyrgyz Republic: for example, breastfeeding practices and anemia status of women and children. When considered together, existing MOH data and the KRDHS data provide a more complete picture of the health conditions in the Kyrgyz Republic than was previously available. A secondary objective of the survey was to enhance the capabilities of institutions in the Kyrgyz Republic to collect, process, and analyze population and health data.
MAIN FINDINGS
FERTILITY
Fertility Rates. Survey results indicate a total fertility rate (TFR) for all of the Kyrgyz Republic of 3.4 children per woman. Fertility levels differ for different population groups.
The TFR for women living in urban areas (2.3 children per woman) is substantially lower than for women living in rural areas (3.9). The TFR for Kyrgyz women (3.6 children per woman) is higher than for women of Russian ethnicity (1.5) but lower than for Uzbek women (4.2). Among the regions of the Kyrgyz Republic, the TFR is lowest in Bishkek City (1.7 children per woman), highest in the East Region (4.3), and intermediate in the North and South Regions (3.1 and 3.9, respectively).
Time Trends. The KRDHS data show that fertility has declined in the Kyrgyz Republic in recent years. The decline in fertility from 5-9 to 0-4 years prior to the survey increases with age, from an 8 percent decline among 20-24 year olds to a 38 percent decline among 35-39 year olds. The declining trend in fertility can be seen by comparing the completed family size of women near the end of their childbearing years with the current TFR. Completed family size among women 40-49 is 4.6 children, which is more than one child greater than the current TFR (3.4).
Birth Intervals. Overall, 30 percent of births in the Kyrgyz Republic take place within 24 months of the previous birth. The median birth interval is 31.9 months.
Age at Onset of Childbearing. The median age at which women in the Kyrgyz Republic begin childbearing has been holding steady over the past two decades at approximately 21.6 years. Most women have their first birth while in their early twenties, although about 20 percent of women give birth before age 20. Nearly half of married women in the Kyrgyz Republic (45 percent) do not want to have more children. An additional one-quarter of women (26 percent) want to delay their next birth by at least two years. These are the women who are potentially in need of some method of family planning.
FAMILY PLANNING
Ever Use. Among currently married women, 83 percent report having used a method of contraception at some time.
The women most likely to have ever used a method of contraception are those age 30-44 (among both currently married and all women).
Current Use. Overall, among currently married women, 60 percent report that they are currently using a contraceptive method. About half (49 percent) are using a modern method of contraception and another 11 percent are using a traditional method. The IUD is by far the most commonly used method; 38 percent of currently married women are using the IUD. Other modern methods of contraception account for only a small amount of use among currently married women: pills (2 percent), condoms (6 percent), and injectables and female sterilization (1 and 2 percent, respectively). Thus, the practice of family planning in the Kyrgyz Republic places high reliance on a single method, the IUD.
Source of Methods. The vast majority of women obtain their contraceptives through the public sector (97 percent): 35 percent from a government hospital, and 36 percent from a women counseling center. The source of supply of the method depends on the method being used. For example, most women using IUDs obtain them at women counseling centers (42 percent) or hospitals (39 percent). Government pharmacies supply 46 percent of pill users and 75 percent of condom users. Pill users also obtain supplies from women counseling centers or (33 percent).
Fertility Preferences. Nearly half of women in the Kyrgyz Republic (45 percent) indicated that they desire no more children. By age 25-29, 20 percent want no more children, and by age 30-34, nearly half (46 percent) want no more children. Thus, many women come to the preference to stop childbearing at relatively young ages, when they have 20 or more potential years of childbearing ahead of them. For some of these women, the most appropriate method of contraception may be a long-acting method such as female sterilization. However, there is a deficiency of use of this method in the Kyrgyz Republic.
In the interests of providing a broad range of safe and effective methods, information about and access to sterilization should be increased so that individual women can make informed decisions about using this method.
INDUCED ABORTION
Abortion Rates. From the KRDHS data, the total abortion rate (TAR), the number of abortions a woman will have in her lifetime based on the currently prevailing abortion rates, was calculated. For the Kyrgyz Republic, the TAR for the period from mid-1994 to mid-1997 is 1.6 abortions per woman. The TAR for the Kyrgyz Republic is lower than recent estimates of the TAR for other areas of the former Soviet Union such as Kazakhstan (1.8), and Yekaterinburg and Perm in Russia (2.3 and 2.8, respectively), but higher than for Uzbekistan (0.7). The TAR is higher in urban areas (2.1 abortions per woman) than in rural areas (1.3). The TAR in Bishkek City is 2.0, which is two times higher than in other regions of the Kyrgyz Republic. Additionally, the TAR is substantially lower among ethnic Kyrgyz women (1.3) than among women of Uzbek and Russian ethnicities (1.9 and 2.2, respectively).
INFANT MORTALITY
In the KRDHS, infant mortality data were collected based on the international definition of a live birth which, irrespective of the duration of pregnancy, is a birth that breathes or shows any sign of life (United Nations, 1992).
Mortality Rates. For the five-year period before the survey (i.e., approximately mid-1992 to mid-1997), infant mortality in the Kyrgyz Republic is estimated at 61 infant deaths per 1,000 births. The estimates of neonatal and postneonatal mortality are 32 and 30 per 1,000. The MOH publishes infant mortality rates annually but the definition of a live birth used by the MOH differs from that used in the survey.
As is the case in most of the republics of the former Soviet Union, a pregnancy that terminates at less than 28 weeks of gestation is considered premature and is classified as a late miscarriage even if signs of life are present at the time of delivery. Thus, some events classified as late miscarriages in the MOH system would be classified as live births and infant deaths according to the definitions used in the KRDHS. Infant mortality rates based on the MOH data for the years 1983 through 1996 show a persistent declining trend throughout the period, starting at about 40 per 1,000 in the early 1980s and declining to 26 per 1,000 in 1996. This time trend is similar to that displayed by the rates estimated from the KRDHS. Thus, the estimates from both the KRDHS and the Ministry document a substantial decline in infant mortality: 25 percent over the period from 1982-87 to 1992-97 according to the KRDHS and 28 percent over the period from 1983-87 to 1993-96 according to the MOH estimates. This is strong evidence of improvements in infant survivorship in recent years in the Kyrgyz Republic. It should be noted that the rates from the survey are much higher than the MOH rates. For example, the KRDHS estimate of 61 per 1,000 for the period 1992-97 is twice the MOH estimate of 29 per 1,000 for 1993-96. Certainly, one factor leading to this difference is the difference in the definitions of a live birth and infant death in the KRDHS survey and in the MOH protocols. A thorough assessment of the difference between the two estimates would need to take into consideration the sampling variability of the survey's estimate. However, given the magnitude of the difference, it is likely that it arises from a combination of definitional and methodological differences between the survey and MOH registration system.
MATERNAL AND CHILD HEALTH
The Kyrgyz Republic has a well-developed health system with an extensive infrastructure of facilities that provide maternal care services.
This system includes special delivery hospitals, the obstetrics and gynecology departments of general hospitals, women counseling centers, and doctor's assistant/midwife posts (FAPs). There is an extensive network of FAPs throughout the rural areas.
Delivery. Virtually all births in the Kyrgyz Republic (96 percent) are delivered at health facilities: 95 percent in delivery hospitals and another 1 percent in either general hospitals
https://discover-now.co.uk/make-an-enquiry/
Initially this data is collected during a patient's time at hospital as part of the Commissioning Data Set (CDS). This is submitted to NHS Digital for processing and is returned to healthcare providers as the Secondary Uses Service (SUS) data set and includes information relating to payment for activity undertaken. It allows hospitals to be paid for the care they deliver. This same data can also be processed and used for non-clinical purposes, such as research and planning health services. Because these uses are not to do with direct patient care, they are called 'secondary uses'. This is the SUS data set.
SUS data covers all NHS Clinical Commissioning Groups (CCGs) in England, including:
1. private patients treated in NHS hospitals
2. patients resident outside of England
3. care delivered by treatment centres (including those in the independent sector) funded by the NHS
Each SUS record contains a wide range of information about an individual patient admitted to an NHS hospital, including:
1. clinical information about diagnoses and operations
2. patient information, such as age group, gender and ethnicity
3. administrative information, such as dates and methods of admission and discharge
4. geographical information such as where patients are treated and the area where they live
NHS Digital apply a strict statistical disclosure control in accordance with the NHS Digital protocol, to all published SUS data. This suppresses small numbers to stop people identifying themselves and others, to ensure that patient confidentiality is maintained.
Who SUS is for
SUS provides data for the purpose of healthcare analysis to the NHS, government and others including:
1. national bodies and regulators, such as the Department of Health, NHS England, Public Health England, NHS Improvement and the CQC
2. local Clinical Commissioning Groups (CCGs)
3. provider organisations
4. government departments
5. researchers and commercial healthcare bodies
6. National Institute for Clinical Excellence (NICE)
7. patients, service users and carers
8. the media
The Secondary Uses Service (SUS) database is made up of many data items relating to A&E care delivered by NHS hospitals in England. Many of these items form part of the national Commissioning Data Set (CDS), and are generated by the patient administration systems within each hospital.
Uses of the statistics
The statistics are known to be used for:
1. national policy making
2. benchmarking performance against other hospital providers or CCGs
3. academic research
4. analysing service usage and planning change
5. providing advice to ministers and answering a wide range of parliamentary questions
6. national and local press articles
7. international comparison
More information can be found at https://digital.nhs.uk/data-and-information/data-tools-and-services/data-services/hospital-episode-statistics and https://digital.nhs.uk/data-and-information/publications/statistical/hospital-accident--emergency-activity
https://www.pioneerdatahub.co.uk/data/data-request-process/
Community Acquired Pneumonia (CAP) is the leading cause of infectious death and the third leading cause of death globally. Disease severity and outcomes are highly variable, dependent on host factors (such as age, smoking history, frailty and comorbidities), microbial factors (the causative organism) and what treatments are given. Clinical decision pathways are complex and despite guidelines, there is significant national variability in how guidelines are adhered to and patient outcomes.
For clinicians treating pneumonia in the hospital setting, care of these patients can be challenging. Key decisions include the type of antibiotics (oral or intravenous), the appropriate place of care (home, hospital or intensive care), and when it is appropriate to stop antibiotics. Decision support tools to help inform clinical management would be highly valuable to the clinical community.
This dataset is synthetic, formed from statistical modelling using real patient data, and represents a population with significant diversity in terms of patient demography, socio-economic status, CAP severity, treatments and outcomes. It can be used to develop code for deployment on real data, train data analysts and increase familiarity with this disease and its management.
PIONEER geography: The West Midlands (WM) has a population of 5.9 million & includes a diverse ethnic & socio-economic mix.
EHR. UHB is one of the largest NHS Trusts in England, providing direct acute services & specialist care across four hospital sites, with 2.2 million patient episodes per year, 2750 beds & an expanded 250 ITU bed capacity during COVID. UHB runs a fully electronic healthcare record (EHR) (PICS; Birmingham Systems), a shared primary & secondary care record (Your Care Connected) & a patient portal “My Health”. This synthetic dataset has been modelled to reflect data collected from this EHR.
Scope: A synthetic dataset which has been statistically modelled on all hospitalised patients admitted to UHB with Community Acquired Pneumonia. The dataset includes highly granular patient demographics & co-morbidities taken from ICD-10 & SNOMED-CT codes. It also includes serial, structured data pertaining to process of care, including timings, admissions, escalation of care to ITU, discharge outcomes, physiology readings (heart rate, blood pressure, AVPU score and others), blood results and drug prescribing and administration.
Available supplementary data: Matched synthetic controls; ambulance, OMOP data, real patient CAP data. Available supplementary support: Analytics, Model build, validation & refinement; A.I.; Data partner support for ETL (extract, transform & load) process, Clinical expertise, Patient & end-user access, Purchaser access, Regulatory requirements, Data-driven trials, “fast screen” services.