The All CMS Data Feeds dataset is an expansive resource offering access to 119 unique report feeds, providing in-depth insights into various aspects of the U.S. healthcare system including nursing facility owners and accountable care organization participants contact data. With over 25.8 billion rows of data meticulously collected since 2007, this dataset is invaluable for healthcare professionals, analysts, researchers, and businesses seeking to understand and analyze healthcare trends, performance metrics, and demographic shifts over time. The dataset is updated monthly, ensuring that users always have access to the most current and relevant data available.
Dataset Overview:
118 Report Feeds:
- The dataset includes a wide array of report feeds, each providing unique insights into different dimensions of healthcare. Topics range from Medicare and Medicaid service metrics to patient demographics, provider information, financial data, and much more. The breadth of information ensures that users can find relevant data for nearly any healthcare-related analysis.
- As CMS releases new report feeds, they are automatically added to this dataset, keeping it current and expanding its utility for users.
25.8 Billion Rows of Data:
Historical Data Since 2007: - The dataset spans from 2007 to the present, offering a rich historical perspective that is essential for tracking long-term trends and changes in healthcare delivery, policy impacts, and patient outcomes. This historical data is particularly valuable for conducting longitudinal studies and evaluating the effects of various healthcare interventions over time.
Monthly Updates:
Data Sourced from CMS:
Use Cases:
Market Analysis:
Healthcare Research:
Performance Tracking:
Compliance and Regulatory Reporting:
Data Quality and Reliability:
The All CMS Data Feeds dataset is designed with a strong emphasis on data quality and reliability. Each row of data is meticulously cleaned and aligned, ensuring that it is both accurate and consistent. This attention to detail makes the dataset a trusted resource for high-stakes applications, where data quality is critical.
Integration and Usability:
Ease of Integration:
Department of State Hospitals Patient Population Demographic (Fiscal Effective Dates: 2010-2020)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Medical Lake population distribution across 18 age groups. It lists the population in each age group along with its percentage of the total population of Medical Lake. The dataset can be used to understand the population distribution of Medical Lake by age. For example, using this dataset, we can identify the largest age group in Medical Lake.
Key observations
The largest age group in Medical Lake, WA was the 30 to 34 years group, with a population of 580 (11.77%), according to the ACS 2019-2023 5-Year Estimates. The smallest age group in Medical Lake, WA was the 85 years and over group, with a population of 24 (0.49%). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates
Age groups:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Medical Lake Population by Age.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These datasets are for a cohort of n=1540 anonymised hospitalised COVID-19 patients, and the data provide information on outcomes (i.e. patient death or discharge), demographics and biomarker measurements for two New York hospitals: State University of New York (SUNY) Downstate Health Sciences University and Maimonides Medical Center.
The file "demographics_both_hospitals.csv" contains the ultimate outcomes of hospitalisation (whether a patient was discharged or died), demographic information and known comorbidities for each of the patients.
The file "dynamics_clean_both_hospitals.csv" contains cleaned dynamic biomarker measurements for the n=1233 patients for whom this information was available and whose data passed our various checks (see https://doi.org/10.1101/2021.11.12.21266248 for information on these checks and the cleaning process). Patients can be matched to demographic data via the "id" column.
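The matching on the "id" column described above can be sketched as follows. This is a minimal illustration using only the Python standard library: the two CSV snippets are tiny made-up stand-ins for the real files, and every column name other than "id" (`outcome`, `age`, `day`, `biomarker_value`) is a hypothetical placeholder, not the actual schema.

```python
import csv
import io

# Made-up stand-ins for the two files. Only the "id" column name comes from
# the dataset description; all other column names and values are hypothetical.
DEMOGRAPHICS_CSV = """id,outcome,age
1,discharged,54
2,died,71
3,discharged,63
"""
DYNAMICS_CSV = """id,day,biomarker_value
1,0,1.2
1,1,1.4
3,0,0.9
"""


def load_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def join_on_id(demographics, dynamics):
    """Attach each patient's demographic record to their biomarker rows."""
    by_id = {row["id"]: row for row in demographics}
    joined = []
    for measurement in dynamics:
        patient = by_id.get(measurement["id"])
        if patient is None:
            continue  # skip measurements with no demographic record
        joined.append({**patient, **measurement})
    return joined


joined_rows = join_on_id(load_rows(DEMOGRAPHICS_CSV), load_rows(DYNAMICS_CSV))
# Patients that have at least one biomarker measurement.
patients_with_dynamics = {row["id"] for row in joined_rows}
```

Because biomarker data are available for only a subset of the cohort, a join of this kind keeps one row per measurement and leaves out patients who have no dynamic data.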
Study approval and data collection
Study approval was obtained from the State University of New York (SUNY) Downstate Health Sciences University Institutional Review Board (IRB#1595271-1) and Maimonides Medical Center Institutional Review Board/Research Committee (IRB#2020-05-07). A retrospective query was performed among the patients who were admitted to SUNY Downstate Medical Center and Maimonides Medical Center with COVID-19-related symptoms, subsequently confirmed by RT-PCR, from the beginning of February 2020 until the end of May 2020. Stratified randomization was used to select at least 500 patients who were discharged and 500 patients who died due to the complications of COVID-19. Patient outcome was recorded as a binary choice of “discharged” versus “COVID-19 related mortality”. Patients whose outcome was unknown were excluded. Demographic, clinical history and laboratory data were extracted from the hospitals’ electronic health records.
https://cubig.ai/store/terms-of-service
1) Data Introduction
• The Healthcare Dataset is a synthetic dataset designed to mimic real-world healthcare data for data science, machine learning, and data analysis purposes. It includes patient information, medical conditions, admission details, and healthcare services provided. This dataset is ideal for developing and testing healthcare predictive models, practicing data manipulation techniques, and creating data visualizations.
2) Data Utilization
(1) Characteristics of the healthcare data:
• It includes detailed patient information such as age, gender, blood type, medical condition, and admission details. This information can be used to analyze healthcare trends, patient demographics, and the effectiveness of medical treatments.
(2) Uses of the healthcare data:
• Predictive Modeling: Helps in developing models to predict patient outcomes, treatment success rates, and disease progression.
• Healthcare Analytics: Assists in analyzing patient data to identify patterns, improve patient care, and optimize resource allocation.
• Educational Purposes: Supports learning and teaching data science concepts in a healthcare context, providing realistic data for experimentation and practice.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This is a structured, multi-table dataset designed to simulate a hospital management system. It is ideal for practicing data analysis, SQL, machine learning, and healthcare analytics.
Dataset Overview
This dataset includes five CSV files:
patients.csv – Patient demographics, contact details, registration info, and insurance data
doctors.csv – Doctor profiles with specializations, experience, and contact information
appointments.csv – Appointment dates, times, visit reasons, and statuses
treatments.csv – Treatment types, descriptions, dates, and associated costs
billing.csv – Billing amounts, payment methods, and status linked to treatments
📁 Files & Column Descriptions
patients.csv
Contains patient demographic and registration details.
Column -> Description
patient_id -> Unique ID for each patient
first_name -> Patient's first name
last_name -> Patient's last name
gender -> Gender (M/F)
date_of_birth -> Date of birth
contact_number -> Phone number
address -> Address of the patient
registration_date -> Date of first registration at the hospital
insurance_provider -> Insurance company name
insurance_number -> Policy number
email -> Email address
doctors.csv
Details about the doctors working in the hospital.
Column -> Description
doctor_id -> Unique ID for each doctor
first_name -> Doctor's first name
last_name -> Doctor's last name
specialization -> Medical field of expertise
phone_number -> Contact number
years_experience -> Total years of experience
hospital_branch -> Branch of hospital where doctor is based
email -> Official email address
appointments.csv
Records of scheduled and completed patient appointments.
Column -> Description
appointment_id -> Unique appointment ID
patient_id -> ID of the patient
doctor_id -> ID of the attending doctor
appointment_date -> Date of the appointment
appointment_time -> Time of the appointment
reason_for_visit -> Purpose of visit (e.g., checkup)
status -> Status (Scheduled, Completed, Cancelled)
treatments.csv
Information about the treatments given during appointments.
Column -> Description
treatment_id -> Unique ID for each treatment
appointment_id -> Associated appointment ID
treatment_type -> Type of treatment (e.g., MRI, X-ray)
description -> Notes or procedure details
cost -> Cost of treatment
treatment_date -> Date when treatment was given
billing.csv
Billing and payment details for treatments.
Column -> Description
bill_id -> Unique billing ID
patient_id -> ID of the billed patient
treatment_id -> ID of the related treatment
bill_date -> Date of billing
amount -> Total amount billed
payment_method -> Mode of payment (Cash, Card, Insurance)
payment_status -> Status of payment (Paid, Pending, Failed)
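The tables link to one another through their ID columns: a bill points to a treatment, a treatment to an appointment, and an appointment to a doctor. The sketch below follows that chain to total the billed amount per doctor. It uses only the Python standard library, the column names follow the descriptions above, and the rows themselves are made up for illustration (in practice they would be read from the CSV files).

```python
from collections import defaultdict

# Illustrative rows only: column names follow the column descriptions,
# the values are invented and only the columns needed here are included.
appointments = [
    {"appointment_id": "A1", "patient_id": "P1", "doctor_id": "D1"},
    {"appointment_id": "A2", "patient_id": "P2", "doctor_id": "D2"},
]
treatments = [
    {"treatment_id": "T1", "appointment_id": "A1", "treatment_type": "MRI"},
    {"treatment_id": "T2", "appointment_id": "A2", "treatment_type": "X-ray"},
    {"treatment_id": "T3", "appointment_id": "A1", "treatment_type": "X-ray"},
]
billing = [
    {"bill_id": "B1", "treatment_id": "T1", "amount": 500.0},
    {"bill_id": "B2", "treatment_id": "T2", "amount": 150.0},
    {"bill_id": "B3", "treatment_id": "T3", "amount": 120.0},
]


def billed_amount_by_doctor(appointments, treatments, billing):
    """Follow bill -> treatment -> appointment and sum amounts per doctor_id."""
    appointment_by_id = {a["appointment_id"]: a for a in appointments}
    treatment_by_id = {t["treatment_id"]: t for t in treatments}
    totals = defaultdict(float)
    for bill in billing:
        treatment = treatment_by_id[bill["treatment_id"]]
        appointment = appointment_by_id[treatment["appointment_id"]]
        totals[appointment["doctor_id"]] += bill["amount"]
    return dict(totals)


totals = billed_amount_by_doctor(appointments, treatments, billing)
```

Building a dictionary per table keyed by its ID keeps each lookup constant-time; the same chain of keys could equally be expressed as SQL joins.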
Possible Use Cases
SQL queries and relational database design
Exploratory data analysis (EDA) and dashboarding
Machine learning projects (e.g., cost prediction, no-show analysis)
Feature engineering and data cleaning practice
End-to-end healthcare analytics workflows
Recommended Tools & Resources
SQL (joins, filters, window functions)
Pandas and Matplotlib/Seaborn for EDA
Scikit-learn for ML models
Pandas Profiling for automated EDA
Plotly for interactive visualizations
Please note:
All data is synthetically generated for educational and project use. No real patient information is included.
If you find this dataset helpful, consider upvoting or sharing your insights by creating a Kaggle notebook.
The Washington State Department of Health presents this information as a service to the public. This includes information on the work status, practice characteristics, education, and demographics of healthcare providers, provided in response to the Washington Health Workforce Survey.
This is a complete set of data across all of the responding professions. The data dictionary identifies questions that are specific to an individual profession and aren't common to all surveys. The dataset is provided without identifying information for the responding providers.
More information on the Washington Health Workforce Survey can be found at www.doh.wa.gov/workforcesurvey
https://www.pioneerdatahub.co.uk/data/data-request-process/
The acute-care pathway (from the emergency department (ED) through acute medical units or ambulatory care and on to wards) is the most visible aspect of the hospital health-care system to most patients. Acute hospital admissions are increasing yearly and overcrowded emergency departments and high bed occupancy rates are associated with a range of adverse patient outcomes. Predicted growth in demand for acute care driven by an ageing population and increasing multimorbidity is likely to exacerbate these problems in the absence of innovation to improve the processes of care.
Key targets for Emergency Medicine services are changing, moving away from previous 4-hour targets. This will likely impact the assessment of patients admitted to hospital through Emergency Departments.
This data set provides highly granular patient level information, showing the day-to-day variation in case mix and acuity. The data includes detailed demography, co-morbidity, symptoms, longitudinal acuity scores, physiology and laboratory results, all investigations, prescriptions, diagnoses and outcomes. It could be used to develop new pathways or understand the prevalence or severity of specific disease presentations.
PIONEER geography: The West Midlands (WM) has a population of 5.9 million & includes a diverse ethnic & socio-economic mix.
Electronic Health Record: University Hospital Birmingham is one of the largest NHS Trusts in England, providing direct acute services & specialist care across four hospital sites, with 2.2 million patient episodes per year, 2750 beds & an expanded 250 ITU bed capacity during COVID. UHB runs a fully electronic healthcare record (EHR) (PICS; Birmingham Systems), a shared primary & secondary care record (Your Care Connected) & a patient portal “My Health”.
Scope: All patients with a medical emergency admitted to hospital, flowing through the acute medical unit. Longitudinal & individually linked, so that the preceding & subsequent health journey can be mapped & healthcare utilisation prior to & after admission understood. The dataset includes patient demographics, co-morbidities taken from ICD-10 & SNOMED-CT codes. Serial, structured data pertaining to process of care (timings, admissions, wards and readmissions), physiology readings (NEWS2 score and clinical frailty scale), Charlson comorbidity index and time dimensions.
Available supplementary data: Matched controls; ambulance data, OMOP data, synthetic data.
Available supplementary support: Analytics, Model build, validation & refinement; A.I.; Data partner support for ETL (extract, transform & load) process, Clinical expertise, Patient & end-user access, Purchaser access, Regulatory requirements, Data-driven trials, “fast screen” services.
The dataset contains estimates of the number of healthcare professionals in 15 different healthcare categories (e.g., Registered Nurse, Dentist, Licensed Clinical Social Worker, etc.) by Race/Ethnicity, based on completion of license renewal. There are two timeframes: all current licenses and recent licenses (since 2017). California population estimates are also included to provide a marker for each Race/Ethnicity. Each healthcare professional category can be compared across Race/Ethnicity groups and compared to statewide population estimates, so Race/Ethnicity shortages can be identified for each healthcare professional category. For instance, a notable difference between a group's share of a healthcare professional category and its share of the statewide population would indicate either underrepresentation or overrepresentation for that Race/Ethnicity, depending on the direction of the difference.
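The comparison described above amounts to subtracting each group's share of the statewide population from its share of a professional category. A minimal sketch follows, assuming counts are available per group; the group labels and numbers are placeholders and are not taken from the dataset.

```python
def representation_gap(professional_counts, population_counts):
    """Return, per group, professional share minus population share.

    Values are in percentage points: negative means underrepresentation,
    positive means overrepresentation.
    """
    professional_total = sum(professional_counts.values())
    population_total = sum(population_counts.values())
    gaps = {}
    for group, population in population_counts.items():
        professional_share = 100.0 * professional_counts.get(group, 0) / professional_total
        population_share = 100.0 * population / population_total
        gaps[group] = round(professional_share - population_share, 1)
    return gaps


# Placeholder counts for one professional category (not real data).
professionals = {"Group A": 600, "Group B": 300, "Group C": 100}
population = {"Group A": 400, "Group B": 400, "Group C": 200}

gaps = representation_gap(professionals, population)
```

With these placeholder counts, Group A holds 60% of licenses but 40% of the population (a gap of +20 points), while Groups B and C are each 10 points below their population share.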
This dataset describes demographic information, number of healthcare and education facilities and human resources/staff in States/Regions and Townships of Myanmar.
This dataset contains data for the Healthcare Payments Data (HPD) Healthcare Measures report. The data cover three measurement categories: Health conditions, Utilization, and Demographics. The health condition measurements quantify the prevalence of long-term illnesses and major medical events prominent in California’s communities like diabetes and heart failure. Utilization measures convey rates of healthcare system use through visits to the emergency department and different categories of inpatient stays, such as maternity or surgical stays. The demographic measures describe the health coverage and other characteristics (e.g., age) of the Californians included in the data and represented in the other measures. The data include both a count or sum of each measure and a count of the base population so that data users can calculate the percentages, rates, and averages in the visualization. Measures are grouped by year, age band, sex (assigned sex at birth), payer type, Covered California Region, and county.
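Since each record carries both a measure count and a base-population count, percentages and rates can be derived directly. The sketch below shows one way to do that; the field names (`measure_count`, `base_population`) and all values are hypothetical and do not reflect the actual column names in the HPD files.

```python
# Hypothetical records: field names and values are placeholders.
rows = [
    {"year": 2021, "county": "County X", "measure_count": 1200, "base_population": 48000},
    {"year": 2021, "county": "County Y", "measure_count": 300, "base_population": 20000},
]


def percentage(row):
    """Share of the base population counted in the measure, in percent."""
    return 100 * row["measure_count"] / row["base_population"]


def rate_per_1000(row):
    """The same ratio expressed per 1,000 people."""
    return 1000 * row["measure_count"] / row["base_population"]


def pooled_percentage(rows):
    """Combine groups by summing counts and bases before dividing."""
    total_count = sum(r["measure_count"] for r in rows)
    total_base = sum(r["base_population"] for r in rows)
    return 100 * total_count / total_base


per_row = [(r["county"], percentage(r), rate_per_1000(r)) for r in rows]
overall = round(pooled_percentage(rows), 2)
```

When combining groups (for example several counties or age bands), summing the counts and base populations first gives a population-weighted result; averaging the per-group percentages would weight small and large groups equally.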
https://aim-ima.be/Donnees-individuelles-realiser-l?lang=fr
IMA-AIM can provide you with detailed data on the health care system in Belgium. Their data collection includes information on the reimbursed care and medicines of the 11 million citizens insured in our country. The data is collected by the 7 health insurance funds and processed, analysed and made available for research by IMA-AIM.
The seven health insurance funds in Belgium collect a lot of data about their members in order to be able to carry out their tasks. IMA-AIM brings these data together in databases for the purpose of analysis and research. The databases contain three types of data: population data (demographic and socio-economic characteristics), information about reimbursed health care and information about reimbursed medicines.
The Permanent Sample (EPS) is a longitudinal dataset containing data from the Population, Health Care and Pharmanet databases, as well as data on hospitalisations. The data are available in separate datasets per calendar year. The aim of EPS is to make the administrative data of the health insurance funds permanently available to a number of federal and regional partners. More information about the EPS: https://metadata.ima-aim.be/nl/app/bdds/Ps
Longitudinal datasets of demographic, social, medical and economic information from a rural population in northern KwaZulu-Natal, South Africa, where HIV prevalence is extremely high. The data may be filtered by demographics, years, or individual questionnaires. The datasets may be used by other researchers, but the Africa Centre requests that anyone downloading the data notify them. The datasets are provided in three formats: Stata11 .dta; tables in a MS-Access .accdb database; and worksheets in a MS-Excel .xlsx workbook. Datasets are generated approximately every six months and contain information spanning the whole period of surveillance from 1/1/2000 to present.
The Health Statistics and Health Research Database is Estonia's largest set of health-related statistics and survey results, administered by the National Institute for Health Development. Use of the database is free of charge.
The database consists of eight main areas divided into sub-areas. The data tables included in the sub-areas are assigned unique codes. The data tables in the database can be viewed online or downloaded in several file formats (.px, .xlsx, .csv, .json). You can download the detailed database user manual here (.pdf).
The database is constantly updated with new data. Dates of updating the existing data tables and adding new data are provided in the release calendar. The date of the last update to each table is provided after the title of the table in the list of data tables.
A contact person for each sub-area is listed under the "Definitions and Methodology" link of that sub-area. Contact this person to ask for additional information about the published data or for any further questions and data requests.
Read more about the publication of health statistics by the National Institute for Health Development in Health Statistics Dissemination Principles.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Figure 4.1 Percentage of population with a medical card by age group, 2007 and 2016
This dataset includes the following variables: client county; number, percentage, average, and age of clients served; number and percentage of adolescent clients served; number and percentage of male clients served; clients served by race and ethnicity (Latino, White, African American, Asian and Pacific Islander, Other (including Native American)); and clients served by primary language (Spanish, English, Other).
https://www.pioneerdatahub.co.uk/data/data-request-process/
Community Acquired Pneumonia (CAP) is the leading cause of infectious death and the third leading cause of death globally. Disease severity and outcomes are highly variable, dependent on host factors (such as age, smoking history, frailty and comorbidities), microbial factors (the causative organism) and what treatments are given. Clinical decision pathways are complex and, despite guidelines, there is significant national variability both in how guidelines are adhered to and in patient outcomes.
For clinicians treating pneumonia in the hospital setting, care of these patients can be challenging. Key decisions include the type of antibiotics (oral or intravenous), the appropriate place of care (home, hospital or intensive care), and when it is appropriate to stop antibiotics. Decision support tools to help inform clinical management would be highly valuable to the clinical community.
This dataset is synthetic, formed from statistical modelling using real patient data, and represents a population with significant diversity in terms of patient demography, socio-economic status, CAP severity, treatments and outcomes. It can be used to develop code for deployment on real data, train data analysts and increase familiarity with this disease and its management.
PIONEER geography: The West Midlands (WM) has a population of 5.9 million & includes a diverse ethnic & socio-economic mix.
EHR. UHB is one of the largest NHS Trusts in England, providing direct acute services & specialist care across four hospital sites, with 2.2 million patient episodes per year, 2750 beds & an expanded 250 ITU bed capacity during COVID. UHB runs a fully electronic healthcare record (EHR) (PICS; Birmingham Systems), a shared primary & secondary care record (Your Care Connected) & a patient portal “My Health”. This synthetic dataset has been modelled to reflect data collected from this EHR.
Scope: A synthetic dataset which has been statistically modelled on all hospitalised patients admitted to UHB with Community Acquired Pneumonia. The dataset includes highly granular patient demographics & co-morbidities taken from ICD-10 & SNOMED-CT codes. Serial, structured data pertaining to process of care including timings, admissions, escalation of care to ITU, discharge outcomes, physiology readings (heart rate, blood pressure, AVPU score and others), blood results and drug prescribing and administration.
Available supplementary data: Matched synthetic controls; ambulance, OMOP data, real patient CAP data. Available supplementary support: Analytics, Model build, validation & refinement; A.I.; Data partner support for ETL (extract, transform & load) process, Clinical expertise, Patient & end-user access, Purchaser access, Regulatory requirements, Data-driven trials, “fast screen” services.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Explore the intricacies of medical costs and healthcare expenses with our meticulously curated Medical Cost Dataset. This dataset offers valuable insights into the factors influencing medical charges, enabling researchers, analysts, and healthcare professionals to gain a deeper understanding of the dynamics within the healthcare industry.
Columns:
1. ID: A unique identifier assigned to each individual record, facilitating efficient data management and analysis.
2. Age: The age of the patient, providing a crucial demographic factor that often correlates with medical expenses.
3. Sex: The gender of the patient, offering insights into potential cost variations based on biological differences.
4. BMI: The Body Mass Index (BMI) of the patient, indicating the relative weight status and its potential impact on healthcare costs.
5. Children: The number of children or dependents covered under the medical insurance, influencing family-related medical expenses.
6. Smoker: A binary indicator of whether the patient is a smoker or not, as smoking habits can significantly impact healthcare costs.
7. Region: The geographic region of the patient, helping to understand regional disparities in healthcare expenditure.
8. Charges: The medical charges incurred by the patient, serving as the target variable for analysis and predictions.
Whether you're aiming to uncover patterns in medical billing, predict future healthcare costs, or explore the relationships between different variables and charges, our Medical Cost Dataset provides a robust foundation for your research. Researchers can utilize this dataset to develop data-driven models that enhance the efficiency of healthcare resource allocation, insurers can refine pricing strategies, and policymakers can make informed decisions to improve the overall healthcare system.
Unlock the potential of healthcare data with our comprehensive Medical Cost Dataset. Gain insights, make informed decisions, and contribute to the advancement of healthcare economics and policy. Start your analysis today and pave the way for a healthier future.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data collection contains de-identified clinical health service utilisation data from Bendigo Health and the General Practitioners Practices associated with the Loddon Mallee Murray Medicare Local. The collection also includes associated population health data from the ABS, AIHW and the Municipal Health Plans. Health researchers have a major interest in how clinical data can be used to monitor population health and health care in rural and regional Australia through analysing a broad range of factors shown to impact the health of different populations. The Population Health data collection provides students, managers, clinicians and researchers the opportunity to use clinical data in the study of population health, including the analysis of health risk factors, disease trends and health care utilisation and outcomes.
Temporal range (data time period): 2004 to 2014
Spatial coverage: Bendigo (latitude -36.7587112, longitude 144.2837459)
This dataset contains demographic and personal health information for individuals, along with the corresponding medical insurance charges billed to them. It is commonly used to build predictive models for insurance costs and to explore how factors such as age, BMI, smoking status, and region relate to medical expenses.
Features:
- age: Age of the primary beneficiary (integer)
- sex: Gender of the individual (male, female)
- bmi: Body mass index, providing a measure of body fat based on height and weight (float)
- children: Number of children/dependents covered by the insurance (integer)
- smoker: Smoking status of the individual (yes, no)
- region: Residential area in the US (northeast, northwest, southeast, southwest)
- charges: Individual medical costs billed by health insurance (float, in USD)
Applications: This dataset is frequently used in regression modeling, cost prediction, and data visualization tasks. It is ideal for learning how lifestyle and demographic factors impact healthcare expenses and serves as a foundational dataset for applied machine learning in health economics.
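Before fitting a regression model on these features, the categorical columns (sex, smoker, region) typically need to be turned into numbers. The sketch below shows one possible encoding plus a simple group comparison, using only the Python standard library. The four records are invented for illustration, and the encoded field names (`is_male`, `is_smoker`, `region_*`) are arbitrary choices rather than part of the dataset.

```python
from statistics import mean

REGIONS = ("northeast", "northwest", "southeast", "southwest")

# Invented example records that follow the feature list; not real data.
records = [
    {"age": 25, "sex": "female", "bmi": 22.5, "children": 0, "smoker": "no", "region": "northeast", "charges": 3000.0},
    {"age": 40, "sex": "male", "bmi": 28.0, "children": 2, "smoker": "no", "region": "southwest", "charges": 5000.0},
    {"age": 35, "sex": "male", "bmi": 31.0, "children": 1, "smoker": "yes", "region": "southeast", "charges": 20000.0},
    {"age": 52, "sex": "female", "bmi": 29.5, "children": 3, "smoker": "yes", "region": "northwest", "charges": 30000.0},
]


def encode(record):
    """Turn one record into purely numeric features (charges excluded)."""
    features = {
        "age": record["age"],
        "bmi": record["bmi"],
        "children": record["children"],
        "is_male": 1 if record["sex"] == "male" else 0,
        "is_smoker": 1 if record["smoker"] == "yes" else 0,
    }
    # One-hot encode the region: exactly one region_* flag is set to 1.
    for region in REGIONS:
        features[f"region_{region}"] = 1 if record["region"] == region else 0
    return features


def mean_charges_by_smoker(records):
    """Average charges for each smoking status ("yes" / "no")."""
    groups = {}
    for record in records:
        groups.setdefault(record["smoker"], []).append(record["charges"])
    return {status: mean(values) for status, values in groups.items()}


encoded = [encode(r) for r in records]
smoker_means = mean_charges_by_smoker(records)
```

The encoded dictionaries can then be passed to any regression library, with `charges` as the target; the group means give a quick first look at how strongly a single factor separates the charges.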