https://creativecommons.org/publicdomain/zero/1.0/
The Health Insurance Marketplace Public Use Files contain data on health and dental plans offered to individuals and small businesses through the US Health Insurance Marketplace.
To help get you started, here are some data exploration ideas:
See this forum thread for more ideas, and post there if you want to add your own ideas or answer some of the open questions!
This data was originally prepared and released by the Centers for Medicare & Medicaid Services (CMS). Please read the CMS Disclaimer-User Agreement before using this data.
Here, we've processed the data to facilitate analytics. This processed version has three components:
The original versions of the 2014, 2015, and 2016 data are available in the "raw" directory of the download and "../input/raw" on Kaggle Scripts. Search for "dictionaries" on this page to find the data dictionaries describing the individual raw files.
In the top level directory of the download ("../input" on Kaggle Scripts), there are six CSV files that contain the combined data across all years:
Additionally, there are two CSV files that facilitate joining data across years:
The "database.sqlite" file contains tables corresponding to each of the processed CSV files.
The code to create the processed version of this data is available on GitHub.
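To see which tables "database.sqlite" contains, you can query SQLite's schema table. This is a minimal sketch using Python's standard `sqlite3` module. It assumes only that the file is an ordinary SQLite database and does not assume any particular table names.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(db_path: str) -> list[str]:
    """Return the names of all tables in a SQLite database file, sorted."""
    # Open read-only through a file: URI so that a mistyped path raises an
    # error instead of silently creating a new, empty database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        # sqlite_master holds the schema; type = 'table' skips indexes and views.
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

For example, call `list_tables("../input/database.sqlite")` on Kaggle Scripts, or pass the equivalent path in a local download. Because the file is opened read-only, a wrong path raises `sqlite3.OperationalError` rather than creating an empty file.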
The dataset has two populations of synthetic patients generated by the Synthea tool. Each population has 15K patients with original medical records in CSV files. Because the total file size of each population exceeds 3 GB, the files are compressed into zip files. Synthea records cover domains similar to those in real EMRs, including patients, encounters, conditions (diagnoses), observations, medications, and procedures. The data was first used to build ML models for lung cancer risk prediction. For more information, see the paper published in Nature Scientific Reports (https://www.nature.com/articles/s41598-022-23011-4).
The Area Health Resources Files (AHRF) provide current as well as historic data for more than 6,000 variables for each of the nation's counties, as well as state and national data. They contain information on health facilities, health professions, measures of resource scarcity, health status, economic activity, health training programs, and socioeconomic and environmental characteristics. In addition, the basic file contains geographic codes and other metadata which enable it to be linked to other files.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These datasets are concordance files that link the Geographic Classification for Health (GCH) to statistical geographies and geographic units commonly used in health research and analysis in Aotearoa New Zealand (NZ).
More information about the development of the GCH is available in our Open Access publication.
Our long-term aim is the comprehensive and accurate understanding of urban-rural variation in health outcomes and healthcare utilization at both national and regional levels. This is best achieved by the widespread uptake of the GCH by health researchers and health policy makers. The GCH is straightforward to use and most users will only need the relevant concordance file.
Statistical Area 1s (SA1s, small statistical areas which are the output geography for population data) were used as the building blocks for the Geographic Classification for Health (GCH) and are the preferred small areas when undertaking the analysis of health data using the GCH. However, much health data is not available at the SA1 level, so GCH concordance files are also available for Domicile (Census Area Units, CAU), Statistical Area 2s (SA2), and Meshblock.
The following concordance files are available in Excel format:
- SA12018_to_GCH2018.csv: This concordance file applies a GCH category to each SA1 in NZ.
- SA22018_to_GCH2018.csv: This concordance file applies a GCH category to each SA2 in NZ.
- MoH_HDOM_to_GCH2018.csv: This concordance file applies a GCH category to each Domicile in NZ. Please read the additional information below if you plan to use this concordance file.
- MoH_MB_to_GCH2018.csv: This concordance file applies a GCH category to each Meshblock in NZ. Please read the additional information below if you plan to use this concordance file.
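Applying a concordance file amounts to a lookup from geographic code to GCH category. The sketch below uses Python's standard `csv` module. The column names you pass in (for example `SA1_code` and `GCH_category`) are hypothetical placeholders, so check the header row of the file you download. Codes are kept as strings so that leading zeros are not lost. Codes missing from the file are flagged rather than silently dropped.

```python
import csv
from typing import Iterable, TextIO


def load_concordance(f: TextIO, code_col: str, gch_col: str) -> dict[str, str]:
    """Build a {geographic code: GCH category} lookup from a concordance CSV."""
    # Keep codes as strings so that leading zeros survive.
    return {row[code_col].strip(): row[gch_col].strip() for row in csv.DictReader(f)}


def attach_gch(codes: Iterable[str], lookup: dict[str, str],
               missing: str = "UNMATCHED") -> list[str]:
    """Return the GCH category for each code, using `missing` for unknown codes."""
    return [lookup.get(code.strip(), missing) for code in codes]
```

For example, open `SA12018_to_GCH2018.csv` with `newline=""` and pass the file object to `load_concordance` along with the file's actual column names.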
Additional information relating to geographic units used by the Ministry of Health:
MoH_HDOM_to_GCH2018.csv This file has been designed specifically to add GCH to the Ministry of Health (MoH) datasets containing Domicile codes. Use this file if your dataset contains only Domicile codes. If your dataset also contains Meshblock codes, then use the MoH Meshblock to GCH concordance file. This file includes 2006 and 2013 domicile codes. The 2013 domiciles are still current as of 2022, and this file will still work well with data outside those years. Domicile boundaries do not align well with SA1 boundaries, and longitudinal health data usually contains some older Domiciles which have been phased out and replaced with multiple smaller Domiciles. These deprecated Domiciles may overlap multiple SA1s. Usually, all such SA1s belong to the same GCH category. Occasionally, a Domicile will overlap more than one GCH category. When this happens, we have assigned the GCH category to which the majority of people living in that Domicile belong. By necessity, this will allocate a minority of people in those Domiciles to a GCH category to which they do not belong.
MoH_MB_to_GCH2018.csv This file has been designed specifically to add GCH to Ministry of Health (MoH) datasets containing Meshblock codes. This file includes 2018, 2013, 2006, and 2001 Meshblock codes, but will still work well with data outside those years. Meshblock boundaries from census 2018 fit perfectly and completely within the Statistics New Zealand Statistical Area 1s (SA1) boundaries on which GCH is based. However, longitudinal health data usually contains some older Meshblocks which have been phased out and replaced by multiple smaller Meshblocks. These deprecated Meshblocks may overlap multiple SA1s. Usually, all such SA1s belong to the same GCH category. Occasionally, a Meshblock will overlap more than one GCH category. When this happens, we have assigned the GCH category to which the majority of people living in that Meshblock belong. By necessity, this will allocate a minority of people in those Meshblocks to a GCH category to which they do not belong.
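The majority rule described above for deprecated Domiciles and Meshblocks can be sketched as follows. This illustrates the idea and is not necessarily the exact procedure used to build the files. It assumes a hypothetical input of (unit code, GCH category, population) rows, with one row per overlap between a unit and an SA1, and picks the category with the largest total population. When there are more than two categories, this is strictly the largest share rather than an absolute majority.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def assign_by_population(overlaps: Iterable[Tuple[str, str, int]]) -> dict[str, str]:
    """Map each unit code to the GCH category containing most of its residents.

    `overlaps` holds hypothetical (unit_code, gch_category, population) rows,
    one per intersection between a unit (Domicile or Meshblock) and an SA1.
    """
    # Sum the population per unit and per GCH category.
    totals = defaultdict(lambda: defaultdict(int))
    for unit, gch, population in overlaps:
        totals[unit][gch] += population
    # max() returns the first category with the largest total, so ties go to
    # whichever category appeared first in the input.
    return {unit: max(by_gch, key=by_gch.get) for unit, by_gch in totals.items()}
```

The residents counted under the non-winning categories are the minority that, as the text notes, end up allocated to a GCH category they do not belong to.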
The U.S. Census Bureau, in collaboration with five federal agencies, launched the Household Pulse Survey to produce data on the social and economic impacts of Covid-19 on American households. The Household Pulse Survey was designed to gauge the impact of the pandemic on employment status, consumer spending, food security, housing, education disruptions, and dimensions of physical and mental wellness. The survey was designed to meet the goal of accurate and timely weekly estimates. It was conducted by an internet questionnaire, with invitations to participate sent by email and text message. The sample frame is the Census Bureau Master Address File Data. Housing units linked to one or more email addresses or cell phone numbers were randomly selected to participate, and one respondent from each housing unit was selected to respond for him or herself. Estimates are weighted to adjust for nonresponse and to match Census Bureau estimates of the population by age, gender, race and ethnicity, and educational attainment. All estimates shown meet the NCHS Data Presentation Standards for Proportions.
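Because estimates are weighted, each respondent counts in proportion to their survey weight rather than as one person. The minimal sketch below shows how a weighted proportion differs from a raw share. It assumes a yes/no item with one final weight per respondent. It does not reproduce the Census Bureau's weighting adjustments, and it does not compute standard errors, which require the survey's own variance methods.

```python
from typing import Sequence


def weighted_proportion(responses: Sequence[bool], weights: Sequence[float]) -> float:
    """Share of the weighted population answering True.

    Each respondent contributes their survey weight instead of counting as one.
    """
    if len(responses) != len(weights):
        raise ValueError("responses and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w for r, w in zip(responses, weights) if r) / total
```

For instance, responses `[True, False, True]` with weights `[1.0, 2.0, 1.0]` give a weighted proportion of 0.5, while the unweighted share is 2/3. The difference arises because the one "no" respondent represents twice as many people as each "yes" respondent.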
https://www.icpsr.umich.edu/web/ICPSR/studies/38008/terms
The PATH Study was launched in 2011 to inform the Food and Drug Administration's regulatory activities under the Family Smoking Prevention and Tobacco Control Act (TCA). The PATH Study is a collaboration between the National Institute on Drug Abuse (NIDA), National Institutes of Health (NIH), and the Center for Tobacco Products (CTP), Food and Drug Administration (FDA). The study sampled over 150,000 mailing addresses across the United States to create a national sample of people who do and do not use tobacco. 45,971 adults and youth constitute the first (baseline) wave, Wave 1, of data collected by this longitudinal cohort study. These 45,971 adults and youth along with 7,207 "shadow youth" (youth ages 9 to 11 sampled at Wave 1) make up the 53,178 participants that constitute the Wave 1 Cohort. Respondents are asked to complete an interview at each follow-up wave. Youth who turn 18 by the current wave of data collection are considered "aged-up adults" and are invited to complete the Adult Interview. Additionally, "shadow youth" are considered "aged-up youth" upon turning 12 years old, when they are asked to complete the Youth Interview after parental consent. At Wave 4, a probability sample of 14,098 adults, youth, and shadow youth ages 10 to 11 was selected from the civilian, noninstitutionalized population at the time of Wave 4. This sample was recruited from residential addresses not selected for Wave 1 in the same sampled Primary Sampling Units (PSUs) and segments using similar within-household sampling procedures. This "replenishment sample" was combined for estimation and analysis purposes with Wave 4 adult and youth respondents from the Wave 1 Cohort who were in the civilian, noninstitutionalized population at the time of Wave 4. This combined set of Wave 4 participants, 52,731 participants in total, forms the Wave 4 Cohort. 
At Wave 7, a probability sample of 14,863 adults, youth, and shadow youth ages 9 to 11 was selected from the civilian, noninstitutionalized population at the time of Wave 7. This sample was recruited from residential addresses not selected for Wave 1 or Wave 4 in the same sampled PSUs and segments using similar within-household sampling procedures. This second replenishment sample was combined for estimation and analysis purposes with Wave 7 adult and youth respondents from the Wave 4 Cohort who were at least age 15 and in the civilian, noninstitutionalized population at the time of Wave 7. This combined set of Wave 7 participants, 46,169 participants in total, forms the Wave 7 Cohort. Please refer to the Restricted-Use Files User Guide that provides further details about children designated as "shadow youth" and the formation of the Wave 1, Wave 4, and Wave 7 Cohorts. Dataset 0001 (DS0001) contains the data from the Public-Use File Master Linkage File (PUF-MLF). This file contains 93 variables and 82,139 cases. The file provides a master list of every person's unique identification number and what type of respondent they were in each wave for data that are available in the Public-Use Files and Special Collection Public-Use Files. Dataset 0002 (DS0002) contains the data from the Restricted-Use File Master Linkage File (RUF-MLF). This file contains 198 variables and 82,139 cases. The file provides a master list of every person's unique identification number and what type of respondent they were in each wave for data that are available in the Restricted-Use Files, Special Collection Restricted-Use Files, and Biomarker Restricted-Use Files.
By Health Data New York
This dataset provides comprehensive measures for evaluating the quality of medical services that Health Homes provide to Medicaid beneficiaries, including the Centers for Medicare & Medicaid Services (CMS) Core Set and Health Home State Plan Amendment (SPA) measures. These measures give insight into how well Health Homes deliver high-quality care. Our data sources include the Medicaid Data Mart, QARR Member Level Files, and the New York State Delivery System Reform Incentive Payment (DSRIP) Data Warehouse. With this dataset you can explore indicators such as rates for measures within the scope of the Core Set, along with subdomains, domains, and measure descriptions; the age categories used; the denominator of each measure; and the level of significance for each indicator. A better understanding of Health Home quality measures can support informed decisions about evidence-based health practices and promote better patient outcomes.
This dataset contains measures that evaluate the quality of care delivered by Health Homes for the Centers for Medicare & Medicaid Services (CMS). With this dataset, you can get an overview of how a health home is performing in terms of quality. You can use this data to compare different health homes and their respective service offerings.
The data used to create this dataset was collected from the Medicaid Data Mart, QARR Member Level Files, and New York State Delivery System Reform Incentive Payment (DSRIP) Data Warehouse sources.
To use this dataset effectively, start by looking at the columns provided: Measurement Year, Health Home Name, Domain, Sub Domain, Measure Description, Age Category, Denominator, Rate, Level of Significance, and Indicator. Together these columns show how a particular health home performs across various measures of healthcare quality.
When examining this data, remember that each measure depends on many variables and that results may change over time because of factors such as the population served or the financial resources available for healthcare delivery. Policy changes can also affect performance, so take these factors into account when evaluating a health home from one year to the next or when comparing health homes on a specific measure or set of indicators over time.
- Using this dataset, state governments can evaluate the effectiveness of their health home programs by comparing the performance across different domains and subdomains.
- Healthcare providers and organizations can use this data to identify areas for improvement in quality of care provided by health homes and strategies to reduce disparities between individuals receiving care from health homes.
- Researchers can use this dataset to analyze how variations in cultural context, geography, demographics, or other factors affect the delivery of quality health home services across different locations.
If you use this dataset in your research, please credit the original authors. Data Source
See the dataset description for more information.
File: health-home-quality-measures-beginning-2013-1.csv

| Column name | Description |
|:--------------------------|:----------------------------------------------------|
| Measurement Year | The year in which the data was collected. (Integer) |
| Health Home Name | The name of the health home. (String) |
| Domain | The domain of the measure. (String) |
| Sub Domain | The sub domain of the measure. (String) |
| Measure Description | A description of the measure. (String) |
| Age Category | The age category of the patient. (String) |
| Denominator | The denominator of the measure. (Integer) |
| Rate | The rate of the measure. (Float) |
| Level of Significance | The level of significance of the measure. (String) |
| Indicator | The indicator of the measure. (String) |
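Because the file is a plain CSV with the columns listed in the table, comparing health homes on a single measure needs only the standard library. The sketch below rests on several assumptions. It uses the column names from the table and treats a blank `Rate` cell as missing, although whether such cells occur is not stated. It sorts by highest rate first, which means "best first" only for measures where a higher rate is better.

```python
import csv
from typing import List, TextIO, Tuple


def rank_homes(f: TextIO, measure: str, year: int) -> List[Tuple[str, str, float]]:
    """Return (Health Home Name, Age Category, Rate) rows for one measure and
    year, sorted from highest to lowest Rate."""
    ranked = []
    for row in csv.DictReader(f):
        if row["Measure Description"] != measure:
            continue
        if int(row["Measurement Year"]) != year:
            continue
        rate = row["Rate"].strip()
        if not rate:  # assumption: a blank Rate means no reported value
            continue
        ranked.append((row["Health Home Name"], row["Age Category"], float(rate)))
    return sorted(ranked, key=lambda item: item[2], reverse=True)
```

Open `health-home-quality-measures-beginning-2013-1.csv` with `newline=""` and pass the exact text of a `Measure Description` value. Age Category is kept in the output because rates for different age groups are not directly comparable.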
...
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
This product presents comparable time-series data for a range of health indicators from a number of sources including the Canadian Community Health Survey, Vital Statistics, and Canadian Cancer Registry.
https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de456828
Abstract (en): The purpose of the Health Interview Survey is to obtain information about the amount and distribution of illness, its effects in terms of disability and chronic impairments, and the kinds of health services people receive. There are five types of records in this core survey, each in a separate data file. The variables in the Household File (Part 1) include type of living quarters, size of family, number of families in the household, presence of a telephone, number of unrelated individuals, and region. The Person File (Part 2) includes information on sex, age, race, marital status, Hispanic origin, education, veteran status, family income, family size, major activities, health status, activity limits, employment status, and industry and occupation. These variables are found in the Condition, Doctor Visit, and Hospital Episode Files as well. The Person File also supplies data on height, weight, bed days, doctor visits, hospital stays, years at residence, and region variables. The Condition File (Part 3) contains information for each reported health condition, with specifics on injury and accident reports. The Hospital Episode File (Part 4) provides information on medical conditions, hospital episodes, type of service, type of hospital ownership, date of admission and discharge, number of nights in hospital, and operations performed. The Doctor Visit File (Part 5) documents doctor visits within the time period and identifies acute or chronic conditions. A sixth file has been added, along with the five core files. The Health Insurance File (Part 6) documents basic demographic information along with medical coverage and health insurance plans, as well as differentiates between hospital, doctor visit, and surgical insurance coverage. Civilian, noninstitutionalized population of the United States. A multistage probability sample was used in selecting housing units. 
- 2010-09-30: Frequencies and variable labels that were previously incorrect have been corrected.
- 2010-09-09: A technical error has been found and resolved in the processing procedure, in which defined file sets did not match subsequent data sets.
- 2010-09-02: SAS, SPSS, and Stata setup files have been added. Some corresponding documentation has been updated and pre-existing data files have been replaced. A sixth dataset has been added in place of the National Health Survey Procedure Documentation, which can now be found with all other corresponding and added documentation.
- 2006-01-18: File CB8337.PDF was removed from any previous datasets and flagged as a study-level file, so that it will accompany all downloads.

Mode of data collection: face-to-face interview. These data files contain weights that must be used in any analysis. Per agreement with NCHS, ICPSR distributes the data files and text of the technical documentation for this collection as prepared by NCHS.
The Medical Expenditure Panel Survey (MEPS) Household Component (HC) collects data from a sample of families and individuals in selected communities across the United States, drawn from a nationally representative subsample of households that participated in the prior year's National Health Interview Survey (conducted by the National Center for Health Statistics). During the household interviews, MEPS collects detailed information for each person in the household on the following: demographic characteristics, health conditions, health status, use of medical services, charges and source of payments, access to care, satisfaction with care, health insurance coverage, income, and employment. The panel design of the survey, which features several rounds of interviewing, makes it possible to determine how changes in respondents' health status, income, employment, eligibility for public and private insurance coverage, use of services, and payment for care are related. Public Use Files for Household data are available on the MEPS website.
The Provider of Services File (POS) - Internet Quality Improvement and Evaluation System (iQIES) - Home Health Agency (HHA), Ambulatory Surgical Center (ASC), and Hospice Providers data provides information on provider demographic and associated certification information. In this file you will find provider number (CMS Certification Number), name, address, and other characteristics of the participating institution providers.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
AHRQ's database on Social Determinants of Health (SDOH) was created under a project funded by the Patient-Centered Outcomes Research (PCOR) Trust Fund. The purpose of this project is to create easy-to-use, easily linkable SDOH-focused data for PCOR research, inform approaches to emerging health issues, and ultimately contribute to improved health outcomes. The database was developed to make it easier to find a range of well-documented, readily linkable SDOH variables across domains without having to access multiple source files, facilitating SDOH research and analysis. Variables in the files correspond to five key SDOH domains: social context (e.g., age, race/ethnicity, veteran status), economic context (e.g., income, unemployment rate), education, physical infrastructure (e.g., housing, crime, transportation), and healthcare context (e.g., health insurance). The files can be linked to other data by geography (county, ZIP Code, and census tract). The database includes data files and codebooks by year at three levels of geography, as well as a documentation file. The data in the SDOH database are drawn from multiple sources, and variables may differ in availability, patterns of missingness, and methodological considerations across sources, geographies, and years. Users should refer to the data source documentation and codebooks, as well as the original data sources, to help identify these patterns.
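Linking by geography usually means joining on a code such as the 5-digit county FIPS code, which consists of a 2-digit state code followed by a 3-digit county code. A common pitfall is that spreadsheet or numeric parsing strips leading zeros, so `01001` becomes `1001`. The sketch below normalizes codes before an inner join. The key column names are hypothetical, and the sketch assumes the right-hand table has one row per county.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def normalize_county_fips(code: Any) -> str:
    """Return a 5-digit county FIPS string, restoring leading zeros lost in parsing."""
    text = str(code).strip()
    if text.endswith(".0"):  # value was read as a float, e.g. 1001.0
        text = text[:-2]
    if not text.isdigit() or len(text) > 5:
        raise ValueError(f"not a county FIPS code: {code!r}")
    return text.zfill(5)


def join_on_county(left: List[Row], right: List[Row],
                   left_key: str, right_key: str) -> List[Row]:
    """Inner-join two lists of rows on normalized county FIPS codes.

    Assumes `right` has at most one row per county; later rows overwrite earlier ones.
    """
    index = {normalize_county_fips(r[right_key]): r for r in right}
    joined = []
    for row in left:
        match = index.get(normalize_county_fips(row[left_key]))
        if match is not None:
            joined.append({**row, **match})  # right-hand values win on name clashes
    return joined
```

ZIP Codes need the same zero-padding treatment. Check the codebooks for the exact key columns and formats in each file.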
https://dataintelo.com/privacy-and-policy
The Medical Records Filing System market size was estimated at USD 14.8 billion in 2023, and it is projected to reach USD 28.6 billion by 2032, growing at a compound annual growth rate (CAGR) of 7.8% from 2024 to 2032. The rising demand for efficient healthcare management systems and the increasing adoption of digital solutions are key drivers for this market growth.
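The growth figures follow the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The small sketch below computes the rate and the reverse projection. It serves only as a consistency check. The stated 7.8% applies from 2024, and no 2024 base value is given, so the figures cannot be reconciled exactly.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Constant yearly growth rate that takes start_value to end_value."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: float) -> float:
    """Value after compounding start_value at `rate` for `years` years."""
    return start_value * (1 + rate) ** years
```

If the 2023 and 2032 figures are taken as nine years apart, `cagr(14.8, 28.6, 9)` comes out at about 0.076 and `project(14.8, 0.078, 9)` at about 29.1. The stated rate and endpoints therefore agree only approximately.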
The medical records filing system market is experiencing significant growth primarily due to the increased emphasis on improving patient care through better data management. As healthcare systems globally strive for higher efficiency, the importance of maintaining accurate and accessible patient records has become paramount. The adoption of digital solutions, such as Electronic Health Records (EHRs), is accelerating due to their ability to streamline operations and reduce errors, leading to enhanced patient outcomes. Furthermore, legislation and regulations promoting data interoperability and secure patient information exchange are encouraging healthcare providers to upgrade their filing systems. This trend is expected to continue as healthcare institutions increasingly recognize the long-term cost benefits of digital recordkeeping systems.
Technological advancements are another significant growth driver for the medical records filing system market. Innovations in cloud computing, artificial intelligence (AI), and machine learning are transforming how patient data is stored, accessed, and analyzed. Cloud-based medical records systems offer scalable solutions that can be customized to meet the diverse needs of healthcare providers. AI and machine learning technologies, on the other hand, enable predictive analytics, helping healthcare providers make informed decisions. These technological advancements are not only enhancing the functionality of medical records filing systems but also providing a competitive edge to early adopters in the healthcare sector.
Another critical factor contributing to the market growth is the increasing prevalence of chronic diseases and the aging global population. As the number of patients with chronic conditions rises, so does the volume of medical data that needs to be managed. Efficient medical records filing systems are crucial for the ongoing management of these patients, ensuring that healthcare providers have timely access to comprehensive medical histories. This need is particularly acute in regions with older populations, where the demand for long-term care facilities and ongoing medical management is higher.
The concept of Medical Data Middle is increasingly becoming a focal point in the healthcare industry. As healthcare providers strive to enhance data management and patient care, the integration of a centralized data repository, or Medical Data Middle, facilitates seamless data sharing and interoperability. This approach not only improves the accessibility of patient information across various healthcare settings but also enhances the accuracy of diagnoses and treatment plans. By centralizing medical data, healthcare providers can ensure that patient records are up-to-date and comprehensive, leading to better-informed clinical decisions. The implementation of Medical Data Middle can also streamline administrative processes, reduce redundancies, and ultimately contribute to more efficient healthcare delivery systems.
Regionally, North America is expected to dominate the medical records filing system market, followed by Europe and the Asia Pacific. The high adoption rate of advanced healthcare technologies, well-established healthcare infrastructure, and favorable regulatory environment in North America are key factors driving the market in this region. Conversely, the Asia Pacific region is projected to witness the highest growth rate during the forecast period due to increasing healthcare expenditures, rising patient awareness, and government initiatives to digitize healthcare records. Markets in Latin America and the Middle East & Africa are also expected to grow, albeit at a slower pace, driven by improvements in healthcare infrastructure and increased investments in healthcare technology.
The medical records filing system market can be segmented by product type into paper-based filing systems, electronic filing systems, and hybrid filing systems. Paper-based filing systems, while traditional, are becoming less popular due to their limitations in storage capacity and risk
https://www.caliper.com/license/maptitude-license-agreement.htm
Healthcare Data for use with GIS mapping software, databases, and web applications are from Caliper Corporation and contain point geographic files of healthcare organizations, providers, and hospitals, and a boundary file of Primary Care Service Areas.
https://www.icpsr.umich.edu/web/ICPSR/studies/37519/terms
The PATH Study was launched in 2011 to inform the Food and Drug Administration's regulatory activities under the Family Smoking Prevention and Tobacco Control Act (TCA). The PATH Study is a collaboration between the National Institute on Drug Abuse (NIDA), National Institutes of Health (NIH), and the Center for Tobacco Products (CTP), Food and Drug Administration (FDA). The study sampled over 150,000 mailing addresses across the United States to create a national sample of people who use or do not use tobacco. 45,971 adults and youth constitute the first (baseline) wave, Wave 1, of data collected by this longitudinal cohort study. These 45,971 adults and youth along with 7,207 "shadow youth" (youth ages 9 to 11 sampled at Wave 1) make up the 53,178 participants that constitute the Wave 1 Cohort. Respondents are asked to complete an interview at each follow-up wave. Youth who turn 18 by the current wave of data collection are considered "aged-up adults" and are invited to complete the Adult Interview. Additionally, "shadow youth" are considered "aged-up youth" upon turning 12 years old, when they are asked to complete an interview after parental consent. At Wave 4, a probability sample of 14,098 adults, youth, and shadow youth ages 10 to 11 was selected from the civilian, noninstitutionalized population at the time of Wave 4. This sample was recruited from residential addresses not selected for Wave 1 in the same sampled primary sampling units (PSUs) and segments using similar within-household sampling procedures. This "replenishment sample" was combined for estimation and analysis purposes with Wave 4 adult and youth respondents from the Wave 1 Cohort who were in the civilian, noninstitutionalized population at the time of Wave 4. This combined set of Wave 4 participants, 52,731 participants in total, forms the Wave 4 Cohort. At Wave 7, a probability sample of 14,863 adults, youth, and shadow youth ages 9 to 11 was selected from the civilian, noninstitutionalized population at the time of Wave 7.
This sample was recruited from residential addresses not selected for Wave 1 or Wave 4 in the same sampled PSUs and segments using similar within-household sampling procedures. This "second replenishment sample" was combined for estimation and analysis purposes with the Wave 7 adult and youth respondents from the Wave 4 Cohort who were at least age 15 and in the civilian, noninstitutionalized population at the time of Wave 7. This combined set of Wave 7 participants, 46,169 participants in total, forms the Wave 7 Cohort. Please refer to the Restricted-Use Files User Guide that provides further details about children designated as "shadow youth" and the formation of the Wave 1, Wave 4, and Wave 7 Cohorts. Wave 4.5 was a special data collection only for youth who were aged 12 to 17 at the time of the Wave 4.5 interview. Wave 4.5 was the fourth annual follow-up wave for those who were members of the Wave 1 Cohort. For those who were sampled at Wave 4, Wave 4.5 was the first annual follow-up wave. Wave 5.5, conducted in 2020, was a special data collection for Wave 4 Cohort youth and young adults ages 13 to 19 at the time of the Wave 5.5 interview. Also in 2020, a subsample of Wave 4 Cohort adults ages 20 and older was interviewed via the PATH Study Adult Telephone Survey (PATH-ATS). Wave 7.5 was a special collection for Wave 4 and Wave 7 Cohort youth and young adults ages 12 to 22 at the time of the Wave 7.5 interview. For those who were sampled at Wave 7, Wave 7.5 was the first annual follow-up wave. Dataset 1002 (DS1002) contains the data from the Wave 4.5 Youth and Parent Questionnaire. This file contains 1,617 variables and 13,131 cases. Of these cases, 11,378 are continuing youth having completed a prior Youth Interview. The other 1,753 cases are "aged-up youth" having previously been sampled as "shadow youth". Datasets 1112, 1212, and 1222 (DS1112, DS1212, and DS1222) are data files comprising the weight variables for Wave 4.5.
The "all-waves" weight file contains weights for participants in the Wave 1 Cohort who completed a Wave 4.5 Youth Interview and completed interviews (if old enough to do so) or verified their information with the study (if not old enough to be interviewed) in Waves 1, 2, 3, and 4. There are two separate files with "single wave" weights: one for the Wave 1 Cohort and one for the Wave 4 Cohort. The "single-wave" weight file for the Wave 1 Cohort contains weights for youth who c
This dataset is made available under Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0). See LICENSE.pdf for details.
Dataset description
Parquet file, with:
The file is indexed on [participant]_[month], such that 34_12 means month 12 from participant 34. All participant IDs have been replaced with randomly generated integers and the conversion table deleted.
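Because the index labels are strings, sorting them directly puts `34_12` before `34_2`. Below is a minimal sketch of splitting a label into integer participant and month parts. It assumes every label has exactly the `[participant]_[month]` form described above.

```python
from typing import Tuple


def parse_sample_index(label: str) -> Tuple[int, int]:
    """Split an index label such as '34_12' into (participant, month)."""
    participant, sep, month = label.partition("_")
    if not sep or not participant.isdigit() or not month.isdigit():
        raise ValueError(f"unexpected index label: {label!r}")
    return int(participant), int(month)
```

Calling `sorted(labels, key=parse_sample_index)` then orders rows by participant and, within each participant, chronologically by month.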
Column names and explanations are included as a separate tab-delimited file. Detailed descriptions of feature engineering are available from the linked publications.
The file contains an aggregated, derived feature matrix describing person-generated health data (PGHD) captured as part of the DiSCover Project (https://clinicaltrials.gov/ct2/show/NCT03421223). This matrix focuses on individual changes in depression status over time, as measured by the PHQ-9.
The DiSCover Project is a 1-year longitudinal study of 10,036 individuals in the United States who, between January 2018 and January 2020, wore consumer-grade wearable devices throughout the study and completed monthly surveys about their mental health and/or lifestyle changes.
The data subset used in this work comprises the following:
From these input sources we define a range of input features, both static (defined once, remain constant for all samples from a given participant throughout the study, e.g. demographic features) and dynamic (varying with time for a given participant, e.g. behavioral features derived from consumer-grade wearables).
The dataset contains a total of 35,694 rows, one for each participant-month of data collection. From these rows we can generate 3-month, non-overlapping, independent samples that capture changes in depression status over time with PGHD. We use the notation ‘SM0’ (sample month 0), ‘SM1’, ‘SM2’ and ‘SM3’ to refer to relative time points within each sample. Each 3-month sample consists of: PHQ-9 survey responses at SM0 and SM3; one set of screener survey responses; LMC survey responses at SM3 (and at SM1 and SM2, if available); and wearable PGHD for SM3 (and for SM1 and SM2, if available). The wearable PGHD includes data collected from 8 to 14 days prior to the PHQ-9 label generation date at SM3. This procedure yields a total of 10,866 samples from 4,036 unique participants.
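The sample construction above can be sketched as a greedy pass over each participant's months. This is an illustration under stated assumptions, not the authors' procedure: the input is assumed to be the index labels of months that have a PHQ-9 response, a sample is assumed to need PHQ-9 at both SM0 and SM3 = SM0 + 3, and "non-overlapping" is assumed to mean that two samples share no month. `three_month_samples` is a hypothetical helper.

```python
from collections import defaultdict

def three_month_samples(phq9_labels: list[str]) -> list[tuple[int, int, int]]:
    """Return (participant, SM0 month, SM3 month) triples.

    Assumes each label is '[participant]_[month]' for a month with a PHQ-9
    response, and that consecutive samples may not share any month.
    """
    months_by_participant = defaultdict(set)
    for label in phq9_labels:
        participant, month = label.rsplit("_", 1)
        months_by_participant[int(participant)].add(int(month))

    samples = []
    for participant, months in sorted(months_by_participant.items()):
        next_free = None  # earliest month not already used by a previous sample
        for m in sorted(months):
            if next_free is not None and m < next_free:
                continue
            if m + 3 in months:  # PHQ-9 needed at both SM0 and SM3
                samples.append((participant, m, m + 3))
                next_free = m + 4
    return samples

labels = ["34_0", "34_3", "34_6", "34_9", "7_1", "7_4"]
print(three_month_samples(labels))  # [(7, 1, 4), (34, 0, 3), (34, 6, 9)]
```

Allowing SM3 of one sample to double as SM0 of the next would yield more samples; which convention the published work uses should be checked against the linked publications.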
https://www.usa.gov/government-works/
The dataset presents the County Health Rankings for all states, taking into account various health factors. The County Health Rankings can be used to highlight regional variations in health, increase public understanding of the various factors that affect health, and inspire actions to improve community health. The Rankings capitalize on our innate desire to compete by enabling comparisons across adjacent or comparable counties within states.
The CSV file contains the rankings and data details for the measures used in the 2022/23 County Health Rankings.
1) Outcomes and Factors Rankings --Ranks are all calculated and reported WITHIN states
2) Outcomes and Factors SubRankings --Ranks are all calculated and reported WITHIN states
3) Ranked Measure Data --The measures themselves are listed in bold.
4) Ranked Measure Sources & Years
5) Additional Measure Data --These are supplemental measures reported on the Rankings web site but not used in calculating the rankings.
6) Additional Measure Sources & Years
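When the CSV file is loaded with pandas, the data type of every column is set to "object".
To convert the numeric columns, use data.apply(pd.to_numeric). Because pd.to_numeric raises an error on text such as state and county names, either apply it only to the measure columns or pass errors="coerce", which turns unparseable values into NaN.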
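A minimal pandas sketch of the conversion, converting every column except an explicit list of text columns. The column names in the toy frame are hypothetical stand-ins; the real file's headers may differ.

```python
import pandas as pd

def to_numeric_columns(df: pd.DataFrame, skip: list[str]) -> pd.DataFrame:
    """Return a copy of df with every column not in `skip` converted to numbers.

    errors="coerce" turns values that cannot be parsed (e.g. "n/a") into NaN
    instead of raising a ValueError.
    """
    out = df.copy()
    for col in out.columns:
        if col not in skip:
            out[col] = pd.to_numeric(out[col], errors="coerce")
    return out

# Hypothetical toy frame standing in for the County Health Rankings CSV.
raw = pd.DataFrame({
    "State": ["Alabama", "Alabama"],
    "County": ["Autauga", "Baldwin"],
    "Health Outcomes Rank": ["12", "3"],
    "Some Ratio": ["0.5", "n/a"],
})
clean = to_numeric_columns(raw, skip=["State", "County"])
print(clean.dtypes)
```

Listing the text columns explicitly avoids the failure mode of calling data.apply(pd.to_numeric, errors="coerce") on the whole frame, which would silently replace every state and county name with NaN.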
On an annual basis (individual hospital fiscal year), individual hospitals and hospital systems report detailed facility-level data on services capacity, inpatient/outpatient utilization, patients, revenues and expenses by type and payer, balance sheet and income statement.
Because the complete dataset is large, a selected subset covering a wide range of commonly used data items has been created so that it can be easily managed and downloaded. The selected data file includes general hospital information, utilization data by payer, revenue data by payer, expense data by natural expense category, financial ratios, and labor information.
There are two groups of data contained in this dataset: 1) Selected Data - Calendar Year: To make it easier to compare hospitals by year, hospital reports with report periods ending within a given calendar year are grouped together. The Pivot Tables for a specific calendar year are also found here. 2) Selected Data - Fiscal Year: Hospital reports with report periods ending within a given fiscal year (July-June) are grouped together.
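The two groupings differ only in which year a report is assigned to, based on the end date of its report period. The sketch below illustrates this; it assumes a July-June fiscal year is labeled by the calendar year in which it ends (so July 2021 to June 2022 is fiscal year 2022), which should be checked against the dataset's documentation. The hospital names and helper functions are hypothetical.

```python
from collections import defaultdict
from datetime import date

def calendar_year(period_end: date) -> int:
    """Calendar-year group: the year in which the report period ends."""
    return period_end.year

def fiscal_year(period_end: date) -> int:
    """July-June fiscal-year group, labeled by the year the fiscal year ends (assumption)."""
    return period_end.year + 1 if period_end.month >= 7 else period_end.year

def group_reports(reports, key):
    """Group (hospital, period_end) pairs by key(period_end)."""
    groups = defaultdict(list)
    for hospital, period_end in reports:
        groups[key(period_end)].append(hospital)
    return dict(groups)

# Hypothetical reports from two hospitals.
reports = [("Hospital A", date(2021, 6, 30)), ("Hospital B", date(2021, 9, 30))]
print(group_reports(reports, calendar_year))  # {2021: ['Hospital A', 'Hospital B']}
print(group_reports(reports, fiscal_year))    # {2021: ['Hospital A'], 2022: ['Hospital B']}
```

Under this labeling, a report ending 30 September 2021 falls into calendar year 2021 but fiscal year 2022.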
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was generated using Simio simulation software. The simulations model patient flow in healthcare settings, capturing key metrics such as queue times, patient length of stay (LOS), and nurse utilization rates. Each CSV file contains time-series data, with measured variables including patient waiting times, resource utilization percentages, and service durations.
## File Overview
- **CheckBloodPressure.csv** (9 KB): Contains blood pressure Server records of patients.
- **CheckPatientType.csv** (19 KB): Identifies the type of each patient (e.g., 1 or 3).
- **Fill_Information.csv** (2 KB): Fill information records for new patients.
- **MedicalRecord1.csv** (10 KB): Medical record dataset for patient type 1.
- **MedicalRecord2.csv** (4 KB): Medical record dataset for patient type 2.
- **MedicalRecord3.csv** (2 KB): Medical record dataset for patient type 3.
- **MedicalRecord4.csv** (13 KB): Medical record dataset for patient type 4.
- **OutPatientDepartment.csv** (18 KB): Data related to the satisfaction and length of stay of a given patient.
- **Triage.csv** (13 KB): Data related to the triage process.
- **README.txt** (4 KB): Documentation of the dataset, including structure, metadata, and usage.
## Common Fields Across Files
- **Patient ID** (Integer): Unique identifier for each patient.
- **Patient Type** (Integer): Classification of the patient (e.g., 1, 4).
- **Medical Records Arrival Time** (DateTime): Timestamp of the patient's first arrival in the medical record department.
- **Exiting Time** (DateTime): Timestamp when the patient exits a Server.
- **Waiting Time (min)** (Real): Total waiting time before being attended to.
- **Resource Used** (String): Resource (e.g., Operator) allocated to the patient.
- **Utilization %** (Real): Utilization rate of the resource as a percentage.
- **Queue Count Before Processing** (Integer): Number of patients in the queue before processing begins.
- **Queue Count After Processing** (Integer): Number of patients in the queue after processing ends.
- **Queue Difference** (Integer): Difference between the before and after queue counts.
- **Length of Stay (min)** (Real): Total time spent in the simulation by the patient.
- **LOS without Queues (min)** (Real): Length of stay excluding any queuing time.
- **Satisfaction %** (Real): Patient satisfaction rating based on their experience.
- **New Patient?** (String): Indicates whether this is a new patient or a returning one.
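Because Queue Difference is derived from the two queue counts, it can serve as a quick consistency check when loading a file. The sketch below assumes the CSV headers match the field names listed above exactly and that the difference is computed as "before" minus "after"; both are assumptions to verify against README.txt. `mismatched_queue_differences` is a hypothetical helper.

```python
import csv
import io

def mismatched_queue_differences(csv_text: str) -> list[str]:
    """Return Patient IDs of rows where 'Queue Difference' is not
    'Queue Count Before Processing' minus 'Queue Count After Processing'.

    Assumes the headers match the documented field names and a
    before-minus-after sign convention.
    """
    mismatches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        before = int(row["Queue Count Before Processing"])
        after = int(row["Queue Count After Processing"])
        if int(row["Queue Difference"]) != before - after:
            mismatches.append(row["Patient ID"])
    return mismatches

# Hypothetical two-row extract.
sample = (
    "Patient ID,Queue Count Before Processing,Queue Count After Processing,Queue Difference\n"
    "1,5,3,2\n"
    "2,4,4,1\n"
)
print(mismatched_queue_differences(sample))  # ['2']
```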
The Area Health Resources Files (AHRF), compiled by the Health Resources and Services Administration (HRSA), offer a comprehensive collection of data on health resources in the United States. These files integrate information from over 50 sources, providing extensive county-level, state-level, and national-level data. The dataset includes annual data releases, with files corresponding to the years 2020-2021, 2021-2022, and 2022-2023. Each release is accompanied by technical documentation and is available in various formats, including CSV and SAS.