100+ datasets found
  1. Health Care Analytics

    • kaggle.com
    Updated Jan 10, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Abishek Sudarshan (2022). Health Care Analytics [Dataset]. https://www.kaggle.com/datasets/abisheksudarshan/health-care-analytics
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 10, 2022
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Abishek Sudarshan
    Description

    Context

    Part of Janatahack Hackathon in Analytics Vidhya

    Content

    The healthcare sector has long been an early adopter of and benefited greatly from technological advances. These days, machine learning plays a key role in many health-related realms, including the development of new medical procedures, the handling of patient data, health camps and records, and the treatment of chronic diseases.

    MedCamp organizes health camps in several cities with low work life balance. They reach out to working people and ask them to register for these health camps. For those who attend, MedCamp provides them facility to undergo health checks or increase awareness by visiting various stalls (depending on the format of camp).

    MedCamp has conducted 65 such events over a period of 4 years and they see a high drop off between “Registration” and number of people taking tests at the Camps. In last 4 years, they have stored data of ~110,000 registrations they have done.

    One of the huge costs in arranging these camps is the amount of inventory you need to carry. If you carry more than required inventory, you incur unnecessarily high costs. On the other hand, if you carry less than required inventory for conducting these medical checks, people end up having bad experience.

    The Process:

    MedCamp employees / volunteers reach out to people and drive registrations.
    During the camp, People who “ShowUp” either undergo the medical tests or visit stalls depending on the format of health camp.
    

    Other things to note:

    Since this is a completely voluntary activity for the working professionals, MedCamp usually has little profile information about these people.
    For a few camps, there was hardware failure, so some information about date and time of registration is lost.
    MedCamp runs 3 formats of these camps. The first and second format provides people with an instantaneous health score. The third format provides  
    information about several health issues through various awareness stalls.
    

    Favorable outcome:

    For the first 2 formats, a favourable outcome is defined as getting a health_score, while in the third format it is defined as visiting at least a stall.
    You need to predict the chances (probability) of having a favourable outcome.
    

    Train / Test split:

    Camps started on or before 31st March 2006 are considered in Train
    Test data is for all camps conducted on or after 1st April 2006.
    

    Acknowledgements

    Credits to AV

    Inspiration

    To share with the data science community to jump start their journey in Healthcare Analytics

  2. Employee Attrition for Healthcare

    • kaggle.com
    Updated Feb 15, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    JohnM (2023). Employee Attrition for Healthcare [Dataset]. https://www.kaggle.com/datasets/jpmiller/employee-attrition-for-healthcare
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 15, 2023
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    JohnM
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Attrition of nurses in the US Healthcare system is at an all-time high. It is a major area of focus, especially for hospitals.

    This dataset contains employee and company data useful for supervised ML, unsupervised ML, and analytics. Attrition - whether an employee left or not - is included and can be used as the target variable.

    The data is synthetic and based on the IBM Watson dataset for attrition. Employee roles and departments were changed to reflect the healthcare domain. Also, known outcomes for some employees were changed to help increase the performance of ML models.

    Here's an app I use as a demo based on this dataset and an ML classification model.

    https://i.imgur.com/Aft3t1E.png"> https://i.imgur.com/QNRX2LA.png">

  3. Medical_cost_dataset

    • kaggle.com
    Updated Aug 19, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Nandita Pore (2023). Medical_cost_dataset [Dataset]. https://www.kaggle.com/datasets/nanditapore/medical-cost-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 19, 2023
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Nandita Pore
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Description:

    Explore the intricacies of medical costs and healthcare expenses with our meticulously curated Medical Cost Dataset. This dataset offers valuable insights into the factors influencing medical charges, enabling researchers, analysts, and healthcare professionals to gain a deeper understanding of the dynamics within the healthcare industry.

    Columns: 1. ID: A unique identifier assigned to each individual record, facilitating efficient data management and analysis. 2. Age: The age of the patient, providing a crucial demographic factor that often correlates with medical expenses. 3. Sex: The gender of the patient, offering insights into potential cost variations based on biological differences. 4. BMI: The Body Mass Index (BMI) of the patient, indicating the relative weight status and its potential impact on healthcare costs. 5. Children: The number of children or dependents covered under the medical insurance, influencing family-related medical expenses. 6. Smoker: A binary indicator of whether the patient is a smoker or not, as smoking habits can significantly impact healthcare costs. 7. Region: The geographic region of the patient, helping to understand regional disparities in healthcare expenditure. 8. Charges: The medical charges incurred by the patient, serving as the target variable for analysis and predictions.

    Whether you're aiming to uncover patterns in medical billing, predict future healthcare costs, or explore the relationships between different variables and charges, our Medical Cost Dataset provides a robust foundation for your research. Researchers can utilize this dataset to develop data-driven models that enhance the efficiency of healthcare resource allocation, insurers can refine pricing strategies, and policymakers can make informed decisions to improve the overall healthcare system.

    Unlock the potential of healthcare data with our comprehensive Medical Cost Dataset. Gain insights, make informed decisions, and contribute to the advancement of healthcare economics and policy. Start your analysis today and pave the way for a healthier future.

  4. g

    Healthcare Dataset

    • gts.ai
    json
    Updated Oct 19, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    GTS (2024). Healthcare Dataset [Dataset]. https://gts.ai/dataset-download/healthcare-dataset/
    Explore at:
    jsonAvailable download formats
    Dataset updated
    Oct 19, 2024
    Dataset provided by
    GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
    Authors
    GTS
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Explore our synthetic healthcare dataset designed for machine learning, data science, and healthcare analytics.

  5. c

    Healthcare Dataset

    • cubig.ai
    Updated May 7, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    CUBIG (2025). Healthcare Dataset [Dataset]. https://cubig.ai/store/products/176/healthcare-dataset
    Explore at:
    Dataset updated
    May 7, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-servicehttps://cubig.ai/store/terms-of-service

    Measurement technique
    Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
    Description

    1) Data Introduction • The Healthcare Dataset is a synthetic dataset designed to mimic real-world healthcare data for data science, machine learning, and data analysis purposes. It includes patient information, medical conditions, admission details, and healthcare services provided. This dataset is ideal for developing and testing healthcare predictive models, practicing data manipulation techniques, and creating data visualizations.

    2) Data Utilization (1) Healthcare data has characteristics that: • It includes detailed patient information such as age, gender, blood type, medical condition, and admission details. This information can be used to analyze healthcare trends, patient demographics, and the effectiveness of medical treatments. (2) Healthcare data can be used to: • Predictive Modeling: Helps in developing models to predict patient outcomes, treatment success rates, and disease progression. • Healthcare Analytics: Assists in analyzing patient data to identify patterns, improve patient care, and optimize resource allocation. • Educational Purposes: Supports learning and teaching data science concepts in a healthcare context, providing realistic data for experimentation and practice.

  6. AI medical chatbot

    • kaggle.com
    Updated Aug 15, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yousef Saeedian (2024). AI medical chatbot [Dataset]. https://www.kaggle.com/datasets/yousefsaeedian/ai-medical-chatbot
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 15, 2024
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Yousef Saeedian
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Description:

    This dataset comprises transcriptions of conversations between doctors and patients, providing valuable insights into the dynamics of medical consultations. It includes a wide range of interactions, covering various medical conditions, patient concerns, and treatment discussions. The data is structured to capture both the questions and concerns raised by patients, as well as the medical advice, diagnoses, and explanations provided by doctors.

    Key Features:

    • Doctor and Patient Roles: Each conversation is annotated with the role of the speaker (doctor or patient), making it easy to analyze communication patterns.
    • Medical Context: The dataset includes diverse scenarios, from routine check-ups to more complex medical discussions, offering a broad spectrum of healthcare dialogues.
    • Natural Language: The conversations are presented in natural language, allowing for the development and testing of NLP models focused on healthcare communication.
    • Applications: This dataset can be used for various applications, such as building dialogue systems, analyzing communication efficacy, developing medical NLP models, and enhancing patient care through better understanding of doctor-patient interactions.

    Potential Use Cases:

    • NLP Model Training: Train models to understand and generate medical dialogues.
    • Healthcare Communication Studies: Analyze communication strategies between doctors and patients to improve healthcare delivery.
    • Medical Chatbots: Develop intelligent medical chatbots that can simulate doctor-patient conversations.
    • Patient Experience Enhancement: Identify common patient concerns and doctor responses to enhance patient care strategies.

    This dataset is a valuable resource for researchers, data scientists, and healthcare professionals interested in the intersection of technology and medicine, aiming to improve healthcare communication through data-driven approaches.

  7. h

    ai-medical-dataset

    • huggingface.co
    Updated May 12, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ruslan Magana Vsevolodovna (2024). ai-medical-dataset [Dataset]. https://huggingface.co/datasets/ruslanmv/ai-medical-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 12, 2024
    Authors
    Ruslan Magana Vsevolodovna
    License

    https://choosealicense.com/licenses/creativeml-openrail-m/https://choosealicense.com/licenses/creativeml-openrail-m/

    Description

    AI Medical Dataset

      Introduction
    

    The AI Medical General Dataset is an experimental dataset designed to build a general chatbot with a strong foundation in medical knowledge. This dataset provides a large corpus of medical data, consisting of approximately 27 million rows, specifically adapted for training Large Language Models (LLMs) in the medical domain.

      Data Sources
    

    Our dataset is comprised of three primary sources:

    Source Number of Words… See the full description on the dataset page: https://huggingface.co/datasets/ruslanmv/ai-medical-dataset.

  8. m

    AHD: Arabic Healthcare Dataset

    • data.mendeley.com
    Updated Sep 4, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Hezam Gawbah (2024). AHD: Arabic Healthcare Dataset [Dataset]. http://doi.org/10.17632/mgj29ndgrk.6
    Explore at:
    Dataset updated
    Sep 4, 2024
    Authors
    Hezam Gawbah
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    • Numerous language-centric research on healthcare is conducted day by day. To address shortcomings of Arabic natural language generation models, we introduce a large Arabic Healthcare Dataset (AHD) of textual data. For this motivation, we named our dataset ‘AHD’.
    • The largest Arabic Healthcare Dataset (AHD) as we know was collected from altibbi website.

    • The AHD consists of more than 808k Question and Answer into 90 variety categories. The AHD contains one file, and the file description will be discussed here. One file is the actual data which is in Arabic language.

      • AHD.xlsx file contains dataset in excel format, which includes the question, answer, and category in Arabic.

      • AHD_english.xlsx file contains dataset in excel format, which includes the question, answer, and category translated to English.

    • Distribution of Question and Answer per category.xlsex shows the distribution of the data set by category.

  9. m

    Data from: Generating Heterogeneous Big Data Set for Healthcare and...

    • data.mendeley.com
    Updated Jan 23, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Omar Al-Obidi (2023). Generating Heterogeneous Big Data Set for Healthcare and Telemedicine Research Based on ECG, Spo2, Blood Pressure Sensors, and Text Inputs: Data set classified, Analyzed, Organized, And Presented in Excel File Format. [Dataset]. http://doi.org/10.17632/gsmjh55sfy.1
    Explore at:
    Dataset updated
    Jan 23, 2023
    Authors
    Omar Al-Obidi
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Heterogenous Big dataset is presented in this proposed work: electrocardiogram (ECG) signal, blood pressure signal, oxygen saturation (SpO2) signal, and the text input. This work is an extension version for our relevant formulating of dataset that presented in [1] and a trustworthy and relevant medical dataset library (PhysioNet [2]) was used to acquire these signals. The dataset includes medical features from heterogenous sources (sensory data and non-sensory). Firstly, ECG sensor’s signals which contains QRS width, ST elevation, peak numbers, and cycle interval. Secondly: SpO2 level from SpO2 sensor’s signals. Third, blood pressure sensors’ signals which contain high (systolic) and low (diastolic) values and finally text input which consider non-sensory data. The text inputs were formulated based on doctors diagnosing procedures for heart chronic diseases. Python software environment was used, and the simulated big data is presented along with analyses.

  10. P

    Medical Cost Personal Dataset Dataset

    • paperswithcode.com
    Updated Jun 12, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2025). Medical Cost Personal Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/medical-cost-personal-dataset
    Explore at:
    Dataset updated
    Jun 12, 2025
    Description

    This dataset contains demographic and personal health information for individuals, along with the corresponding medical insurance charges billed to them. It is commonly used to build predictive models for insurance costs and to explore relationships between factors such as age, BMI, smoking status, and region on medical expenses.

    Features: - age: Age of the primary beneficiary (integer) - sex: Gender of the individual (male, female) - bmi: Body mass index, providing a measure of body fat based on height and weight (float) - children: Number of children/dependents covered by the insurance (integer) - smoker: Smoking status of the individual (yes, no) - region: Residential area in the US (northeast, northwest, southeast, southwest) - charges: Individual medical costs billed by health insurance (float, in USD)

    Applications: This dataset is frequently used in regression modeling, cost prediction, and data visualization tasks. It is ideal for learning how lifestyle and demographic factors impact healthcare expenses and serves as a foundational dataset for applied machine learning in health economics.

  11. h

    Data from: healthcare-dataset

    • huggingface.co
    Updated May 16, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Gowtham Ravichandran (2024). healthcare-dataset [Dataset]. https://huggingface.co/datasets/gowthamrvc/healthcare-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 16, 2024
    Authors
    Gowtham Ravichandran
    Description

    gowthamrvc/healthcare-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  12. Electronic Medical Record Service

    • catalog.data.gov
    • data.va.gov
    • +5more
    Updated Apr 21, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Department of Veterans Affairs (2021). Electronic Medical Record Service [Dataset]. https://catalog.data.gov/dataset/electronic-medical-record-service
    Explore at:
    Dataset updated
    Apr 21, 2021
    Dataset provided by
    United States Department of Veterans Affairshttp://va.gov/
    Description

    This service provides web services used to obtain clinical data for patients. There are three service methods that allow write functionality signNote, writeNote and writeSimpleOrder all of the other functionality exposed by this service is read only access. The service supports multiple Vista sites data access. Users of this service are intended to be healthcare providers

  13. d

    Pixta AI | Imagery Data | Global | High volume | Annotation and Labelling...

    • datarade.ai
    .json, .xml, .csv
    Updated Jul 19, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Pixta AI (2023). Pixta AI | Imagery Data | Global | High volume | Annotation and Labelling Services Provided | Multimodal Medical Images OTS Datasets for AI and ML [Dataset]. https://datarade.ai/data-products/multimodal-medical-image-ots-datasets-pixta-ai
    Explore at:
    .json, .xml, .csvAvailable download formats
    Dataset updated
    Jul 19, 2023
    Dataset authored and provided by
    Pixta AI
    Area covered
    French Polynesia, Guernsey, Montenegro, Maldives, Pitcairn, Uruguay, Haiti, Malaysia, Serbia, Lebanon
    Description
    1. Overview This dataset is a collection of multimodal high quality image sets of medical data that are ready to use for optimizing the accuracy of computer vision models. All of the contents are sourced from Pixta AI's partner network with high quality & full data compliance.

    2. Data subject The datasets consist of various models

    3. X-ray datasets

    4. CT datasets

    5. MRI datasets

    6. Mammography datasets

    7. Segmentation datasets

    8. Classification datasets

    9. Regression datasets

    10. Use case The dataset could be used for various Healthcare & Medical models:

    11. Medical Image Analysis

    12. Remote Diagnosis

    13. Medical Record Keeping ... Each data set is supported by both AI and expert doctors review process to ensure labelling consistency and accuracy. Contact us for more custom datasets.

    14. About PIXTA PIXTASTOCK is the largest Asian-featured stock platform providing data, contents, tools and services since 2005. PIXTA experiences 15 years of integrating advanced AI technology in managing, curating, processing over 100M visual materials and serving global leading brands for their creative and data demands. Visit us at https://www.pixta.ai/ or contact via our email admin.bi@pixta.co.jp.

  14. F

    Arabic Conversation Chat Dataset for Healthcare Domain

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    FutureBee AI (2022). Arabic Conversation Chat Dataset for Healthcare Domain [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/arabic-healthcare-domain-conversation-text-dataset
    Explore at:
    wavAvailable download formats
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    The dataset comprises over 10,000 chat conversations, each focusing on specific Healthcare related topics. Each conversation provides a detailed interaction between a call center agent and a customer, capturing real-life scenarios and language nuances.

    Participants Details: 150+ native Arabic participants from the FutureBeeAI community.
    Word Count & Length: Chats are diverse, averaging 300 to 700 words and 50 to 150 turns across both speakers.

    Topic Diversity

    The chat dataset covers a wide range of conversations on Healthcare topics, ensuring that the dataset is comprehensive and relevant for training and fine-tuning models for various Healthcare use cases. It offers diversity in terms of conversation topics, chat types, and outcomes, including both inbound and outbound chats with positive, neutral, and negative outcomes.

    Inbound Chats:
    Appointment Scheduling
    New Patient Registration
    Surgery Consultation
    Consultation regarding Diet, and many more
    Outbound Chats:
    Appointment Reminder
    Health & Wellness Subscription Programs
    Lab Test Results
    Health Risk Assessments
    Preventive Care Reminders, and many more

    Language Variety & Nuances

    The conversations in this dataset capture the diverse language styles and expressions prevalent in Arabic Healthcare interactions. This diversity ensures the dataset accurately represents the language used by Arabic speakers in Healthcare contexts.

    The dataset encompasses a wide array of language elements, including:

    Naming Conventions: Chats include a variety of Arabic personal and business names.
    Localized Details: Real-world addresses, emails, phone numbers, and other contact information as according to different Arabic-speaking regions.
    Temporal and Numeric Expressions: Dates, times, currencies, and numbers in Arabic forms, adhering to local conventions.
    Idiomatic Expressions and Slang: It includes local slang, idioms, and informal phrase present in Arabic Healthcare conversations.

    This linguistic authenticity ensures that the dataset equips researchers and developers with a comprehensive understanding of the intricate language patterns, cultural references, and communication styles inherent to Arabic Healthcare interactions.

    Conversational Flow and Interaction Types

    The dataset includes a broad range of conversations, from simple inquiries to detailed discussions, capturing the dynamic nature of Healthcare customer-agent interactions.

    Simple Inquiries
    Detailed Discussions
    Transactional Interactions
    Problem-Solving Dialogues
    Advisory Sessions
    Routine Checks and Follow-Ups

    Each of these conversations contains various aspects of conversation flow like:

    Greetings
    Authentication
    Information gathering
    Resolution identification
    Solution Delivery
    Closing and Follow-ups
    Feedback, etc

    This structured and varied conversational flow enables the creation of advanced NLP models that can effectively manage and respond to a wide range of customer service scenarios.

    Data Format and Structure

    The dataset is available in JSON, CSV, and TXT formats, with each conversation containing attributes like participant identifiers and chat messages,

  15. i

    IoT Healthcare Security Dataset

    • ieee-dataport.org
    Updated Aug 16, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Faisal Hussain (2021). IoT Healthcare Security Dataset [Dataset]. https://ieee-dataport.org/documents/iot-healthcare-security-dataset
    Explore at:
    Dataset updated
    Aug 16, 2021
    Authors
    Faisal Hussain
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    smart city

  16. R

    Synthetic Dataset of Emergency Healthcare Services

    • datarepositorium.uminho.pt
    • zenodo.org
    csv, txt
    Updated Jan 17, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Repositório de Dados da Universidade do Minho (2025). Synthetic Dataset of Emergency Healthcare Services [Dataset]. http://doi.org/10.34622/datarepositorium/AKSZQG
    Explore at:
    csv(1259), txt(4064)Available download formats
    Dataset updated
    Jan 17, 2025
    Dataset provided by
    Repositório de Dados da Universidade do Minho
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Synthetic dataset of emergency services comprised of several CSV files that we have generated using a simulation software. This dataset is open for public use; please cite our work if used in research or applications. File Overview CheckBloodPressure.csv** - (9 KB): Contains blood pressure Server records of patients. CheckPatientType.csv** - (19 KB): Identifies the type of each patient (e.g., 1 or 3). Fill_Information.csv - (2 KB): Fill information records for new patients. MedicalRecord1.csv - (10 KB): Medical record dataset for patient type 1. MedicalRecord2.csv - (4 KB): Medical record dataset for patient type 2. MedicalRecord3.csv - (2 KB): Medical record dataset for patient type 3. MedicalRecord4.csv - (13 KB): Medical record dataset for patient type 4. OutPatientDepartment.csv - (18 KB): Data related to the satisfaction and length of stay of an given patient. Triage.csv - (13 KB): Data related to the triage process. README.txt - (4 KB): Documentation of the dataset, including structure, metadata, and usage. Common Fields Across Files Patient ID (Integer): Unique identifier for each patient. Patient Type (Integer): Classification of patient (e.g., 1, 4). Medical Records Arrival Time (DateTime): Timestamp of the patient's first arrival in the medical record department. Exiting Time (DateTime): Timestamp when the patient exits a Server. Waiting Time (min) (Real): Total waiting time before being attended to. Resource Used (String): Resource (e.g., Operator) allocated to the patient. Utilization % (Real): Utilization rate of the resource as a percentage. Queue Count Before Processing (Integer): Number of patients in the queue before processing begins. Queue Count After Processing (Integer): Number of patients in the queue after processing ends. Queue Difference (Integer): Difference between the before and after queue counts. Length of Stay (min) (Real): Total time spent in the simulation by the patient. LOS without Queues (min) (Real): Length of stay excluding any queuing time. Satisfaction % (Real): Patient satisfaction rating based on their experience. New Patient? (String): Indicates if this is a new patient or a returning one.

  17. o

    Healthcare Analytics Training Dataset

    • opendatabay.com
    .undefined
    Updated Jul 4, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Datasimple (2025). Healthcare Analytics Training Dataset [Dataset]. https://www.opendatabay.com/data/dataset/953c80ef-162d-467b-ae1c-867d0f9c490d
    Explore at:
    .undefinedAvailable download formats
    Dataset updated
    Jul 4, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Healthcare Insurance & Costs
    Description

    This synthetic healthcare dataset serves as a valuable resource for data science, machine learning, and data analysis enthusiasts. It is designed to mimic real-world healthcare data, enabling users to practise, develop, and showcase their data manipulation and analysis skills within the healthcare industry. The inspiration behind this dataset stems from the need for practical and diverse healthcare data for educational and research purposes, addressing the challenge of accessing sensitive real-world healthcare information. Generated using Python's Faker library, it mirrors the structure and attributes commonly found in healthcare records, aiming to foster innovation, learning, and knowledge sharing in healthcare analytics.

    Columns

    • Name: Represents the name of the patient associated with the healthcare record.
    • Age: The age of the patient at the time of admission, expressed in years.
    • Gender: Indicates the gender of the patient, either "Male" or "Female."
    • Blood Type: The patient's blood type, such as "A+" or "O-."
    • Medical Condition: Specifies the primary medical condition or diagnosis, for example, "Diabetes," "Hypertension," or "Asthma."
    • Date of Admission: The date on which the patient was admitted to the healthcare facility.
    • Doctor: The name of the doctor responsible for the patient's care during their admission.
    • Hospital: Identifies the healthcare facility or hospital where the patient was admitted.
    • Insurance Provider: Indicates the patient's insurance provider, such as "Aetna," "Blue Cross," "Cigna," "UnitedHealthcare," or "Medicare."
    • Billing Amount: The monetary amount billed for the patient's healthcare services during their admission, expressed as a floating-point number.
    • Room Number: The room number where the patient was accommodated.
    • Admission Type: Specifies the type of admission, which can be "Emergency," "Elective," or "Urgent."
    • Discharge Date: The date on which the patient was discharged, based on the admission date and a realistic range of days.
    • Medication: Identifies a medication prescribed or administered to the patient, including examples like "Aspirin," "Ibuprofen," "Penicillin," "Paracetamol," and "Lipitor."
    • Test Results: Describes the results of a medical test conducted during admission, with possible values being "Normal," "Abnormal," or "Inconclusive."

    Distribution

    This dataset is typically provided as a data file in CSV format. It is structured with columns providing specific information about the patient, their admission, and the healthcare services received. While the exact number of rows or records is not specified, it is designed to be a synthetic dataset suitable for various data analysis and modelling tasks in the healthcare domain.

    Usage

    This dataset is ideal for a wide range of applications, including: * Developing and testing healthcare predictive models. * Practising data cleaning, transformation, and analysis techniques. * Creating data visualisations to gain insights into healthcare trends. * Learning and teaching data science and machine learning concepts in a healthcare context. It can specifically be treated as a Multi-Class Classification Problem for predicting 'Test Results', which contains three categories: Normal, Abnormal, and Inconclusive.

    Coverage

    The dataset has a global geographic region. The time range for admissions and discharges, as indicated by the 'Date of Admission' and 'Discharge Date' columns, spans across several years, with examples observed from 2019 to 2024. Demographic scope is covered by patient 'Name', 'Age', 'Gender', and 'Blood Type' information. As this is a synthetic dataset, it does not contain real patient information and is created to mirror common healthcare record structures.

    License

    CCO

    Who Can Use It

    This dataset is intended for data science, machine learning, and data analysis enthusiasts. It is particularly useful for those looking to engage in learning and experimentation within the healthcare analytics domain. The dataset encourages exploration, analysis, and sharing of findings within communities like Kaggle.

    Dataset Name Suggestions

    • Healthcare Dataset
    • Healthcare Insurance & Costs Data
    • Synthetic Patient Records
    • Medical Admissions Data for Analytics
    • Healthcare Analytics Training Dataset

    Attributes

    Original Data Source: Healthcare Dataset

  18. h

    Recurv-Clinical-Dataset

    • huggingface.co
    Updated Feb 3, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Recurv-Clinical-Dataset [Dataset]. https://huggingface.co/datasets/RecurvAI/Recurv-Clinical-Dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 3, 2025
    Dataset authored and provided by
    Recurv AI
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    🩺 Recurv-Clinical-Dataset:

    The Recurv Clinical Dataset is a comprehensive resource containing 12,631 high-quality question-answer pairs specifically designed for training and fine-tuning medical AI models. Curated from trusted medical sources, this dataset focuses on real-world scenarios, including patient history, diagnostics, and treatment recommendations. It sets a new benchmark for advancing conversational AI in the healthcare field.

      📈 Dataset Statistics… See the full description on the dataset page: https://huggingface.co/datasets/RecurvAI/Recurv-Clinical-Dataset.
    
  19. Medical Assistance Dataset

    • kaggle.com
    Updated Apr 23, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Balaji Kartheek (2024). Medical Assistance Dataset [Dataset]. https://www.kaggle.com/datasets/balajikartheek/medical-assistance-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 23, 2024
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Balaji Kartheek
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset is a JSON file containing the Intents Information: 1. Greetings 2. Introduction to the Diseases 3. Types of Diseases 4. Symptoms of Disease 5. Prevention of Disease

  20. P

    MedVidCL (Medical Video Classification) Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Feb 3, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Deepak Gupta; Kush Attal; Dina Demner-Fushman (2022). MedVidCL (Medical Video Classification) Dataset [Dataset]. https://paperswithcode.com/dataset/medvidcl
    Explore at:
    Dataset updated
    Feb 3, 2022
    Authors
    Deepak Gupta; Kush Attal; Dina Demner-Fushman
    Description

    The MedVidCL dataset contains a collection of 6, 617 videos annotated into ‘medical instructional’, ‘medical non-instructional' and ‘non-medical’ classes. A two-step approach is used to construct the MedVidCL dataset. In the first step, the videos annotated by health informatics experts are used to train a machine learning model that predicts the given video to one of the three aforementioned classes. In the second step, only the high-confidence videos are used and health informatics experts assess the model’s predicted video category and update the category wherever needed.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Abishek Sudarshan (2022). Health Care Analytics [Dataset]. https://www.kaggle.com/datasets/abisheksudarshan/health-care-analytics
Organization logo

Health Care Analytics

Predicting Patient Outcome

Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jan 10, 2022
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Abishek Sudarshan
Description

Context

Part of Janatahack Hackathon in Analytics Vidhya

Content

The healthcare sector has long been an early adopter of and benefited greatly from technological advances. These days, machine learning plays a key role in many health-related realms, including the development of new medical procedures, the handling of patient data, health camps and records, and the treatment of chronic diseases.

MedCamp organizes health camps in several cities with low work life balance. They reach out to working people and ask them to register for these health camps. For those who attend, MedCamp provides them facility to undergo health checks or increase awareness by visiting various stalls (depending on the format of camp).

MedCamp has conducted 65 such events over a period of 4 years and they see a high drop off between “Registration” and number of people taking tests at the Camps. In last 4 years, they have stored data of ~110,000 registrations they have done.

One of the huge costs in arranging these camps is the amount of inventory you need to carry. If you carry more than required inventory, you incur unnecessarily high costs. On the other hand, if you carry less than required inventory for conducting these medical checks, people end up having bad experience.

The Process:

MedCamp employees / volunteers reach out to people and drive registrations.
During the camp, People who “ShowUp” either undergo the medical tests or visit stalls depending on the format of health camp.

Other things to note:

Since this is a completely voluntary activity for the working professionals, MedCamp usually has little profile information about these people.
For a few camps, there was hardware failure, so some information about date and time of registration is lost.
MedCamp runs 3 formats of these camps. The first and second format provides people with an instantaneous health score. The third format provides  
information about several health issues through various awareness stalls.

Favorable outcome:

For the first 2 formats, a favourable outcome is defined as getting a health_score, while in the third format it is defined as visiting at least a stall.
You need to predict the chances (probability) of having a favourable outcome.

Train / Test split:

Camps started on or before 31st March 2006 are considered in Train
Test data is for all camps conducted on or after 1st April 2006.

Acknowledgements

Credits to AV

Inspiration

To share with the data science community to jump start their journey in Healthcare Analytics

Search
Clear search
Close search
Google apps
Main menu