100+ datasets found

Health Care Analytics
kaggle.com
Updated Jan 10, 2022
Cite
Abishek Sudarshan (2022). Health Care Analytics [Dataset]. https://www.kaggle.com/datasets/abisheksudarshan/health-care-analytics
Dataset updated
Jan 10, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Abishek Sudarshan
Description
Context

Part of Janatahack Hackathon in Analytics Vidhya

Content

The healthcare sector has long been an early adopter of technological advances and has benefited greatly from them. These days, machine learning plays a key role in many health-related areas, including the development of new medical procedures, the handling of patient data, health camps and records, and the treatment of chronic diseases.

MedCamp organizes health camps in several cities with poor work-life balance. They reach out to working people and ask them to register for these health camps. Those who attend can undergo health checks or increase their awareness by visiting various stalls (depending on the format of the camp).

MedCamp has conducted 65 such events over a period of 4 years, and they see a high drop-off between “Registration” and the number of people taking tests at the camps. Over those 4 years, they have stored data on ~110,000 registrations.

One of the largest costs in arranging these camps is the amount of inventory that must be carried. Carrying more inventory than required incurs unnecessarily high costs. On the other hand, carrying less inventory than the medical checks require leaves people with a bad experience.

The Process:

MedCamp employees / volunteers reach out to people and drive registrations.
During the camp, people who “ShowUp” either undergo the medical tests or visit stalls, depending on the format of the health camp.

Other things to note:

Since this is a completely voluntary activity for the working professionals, MedCamp usually has little profile information about these people.
For a few camps, there was a hardware failure, so some information about the date and time of registration is lost.
MedCamp runs 3 formats of these camps. The first and second formats provide people with an instantaneous health score. The third format provides information about several health issues through various awareness stalls.

Favorable outcome:

For the first 2 formats, a favourable outcome is defined as getting a health_score, while in the third format it is defined as visiting at least one stall. The task is to predict the probability of a favourable outcome.
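
The outcome definition above can be written as a small labelling function. This is a minimal sketch, not the competition's own code: the argument names `camp_format`, `health_score` and `stalls_visited` are assumptions made for illustration, and a missing health score is assumed to be represented as `None`.

```python
from typing import Optional


def favourable_outcome(camp_format: int,
                       health_score: Optional[float],
                       stalls_visited: int) -> int:
    """Return 1 if the registration ended in a favourable outcome, else 0."""
    if camp_format in (1, 2):
        # Formats 1 and 2: favourable means a health score was recorded.
        return int(health_score is not None)
    if camp_format == 3:
        # Format 3: favourable means at least one stall was visited.
        return int(stalls_visited >= 1)
    raise ValueError(f"unknown camp format: {camp_format}")


# Hypothetical registrations, for illustration only.
print(favourable_outcome(1, 0.42, 0))
print(favourable_outcome(3, None, 2))
```

A model would then be trained to output the probability that this label equals 1.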

Train / Test split:

Train data covers camps started on or before 31st March 2006.
Test data covers all camps conducted on or after 1st April 2006.
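
A date-based split of this kind might look like the sketch below. It assumes each camp is a dictionary with a hypothetical `start_date` field and that the start date decides membership in both sets; the camps listed are made up.

```python
from datetime import date

TRAIN_CUTOFF = date(2006, 3, 31)  # last start date that belongs to Train


def split_camps(camps):
    """Split camp records into (train, test) by start date."""
    train = [c for c in camps if c["start_date"] <= TRAIN_CUTOFF]
    test = [c for c in camps if c["start_date"] > TRAIN_CUTOFF]
    return train, test


# Made-up camps, for illustration only.
camps = [
    {"camp_id": "A", "start_date": date(2005, 11, 2)},
    {"camp_id": "B", "start_date": date(2006, 3, 31)},
    {"camp_id": "C", "start_date": date(2006, 4, 1)},
]
train, test = split_camps(camps)
print([c["camp_id"] for c in train], [c["camp_id"] for c in test])
```

Splitting by time rather than at random keeps later camps out of training, which mirrors how the model would be used on future events.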

Acknowledgements

Credits to AV

Inspiration

To share with the data science community to jump start their journey in Healthcare Analytics
AI medical chatbot
kaggle.com
Updated Aug 15, 2024
Cite
Yousef Saeedian (2024). AI medical chatbot [Dataset]. https://www.kaggle.com/datasets/yousefsaeedian/ai-medical-chatbot
Dataset updated
Aug 15, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Yousef Saeedian
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Dataset Description:

This dataset comprises transcriptions of conversations between doctors and patients, providing valuable insights into the dynamics of medical consultations. It includes a wide range of interactions, covering various medical conditions, patient concerns, and treatment discussions. The data is structured to capture both the questions and concerns raised by patients, as well as the medical advice, diagnoses, and explanations provided by doctors.

Key Features:

Doctor and Patient Roles: Each conversation is annotated with the role of the speaker (doctor or patient), making it easy to analyze communication patterns.

Medical Context: The dataset includes diverse scenarios, from routine check-ups to more complex medical discussions, offering a broad spectrum of healthcare dialogues.

Natural Language: The conversations are presented in natural language, allowing for the development and testing of NLP models focused on healthcare communication.

Applications: This dataset can be used for various applications, such as building dialogue systems, analyzing communication efficacy, developing medical NLP models, and enhancing patient care through better understanding of doctor-patient interactions.

Potential Use Cases:

NLP Model Training: Train models to understand and generate medical dialogues.

Healthcare Communication Studies: Analyze communication strategies between doctors and patients to improve healthcare delivery.

Medical Chatbots: Develop intelligent medical chatbots that can simulate doctor-patient conversations.

Patient Experience Enhancement: Identify common patient concerns and doctor responses to enhance patient care strategies.

This dataset is a valuable resource for researchers, data scientists, and healthcare professionals interested in the intersection of technology and medicine, aiming to improve healthcare communication through data-driven approaches.
Medical_cost_dataset
kaggle.com
Updated Aug 19, 2023
Cite
Nandita Pore (2023). Medical_cost_dataset [Dataset]. https://www.kaggle.com/datasets/nanditapore/medical-cost-dataset
Dataset updated
Aug 19, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Nandita Pore
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Description:

Explore the intricacies of medical costs and healthcare expenses with our meticulously curated Medical Cost Dataset. This dataset offers valuable insights into the factors influencing medical charges, enabling researchers, analysts, and healthcare professionals to gain a deeper understanding of the dynamics within the healthcare industry.

Columns:
1. ID: A unique identifier assigned to each individual record, facilitating efficient data management and analysis.
2. Age: The age of the patient, providing a crucial demographic factor that often correlates with medical expenses.
3. Sex: The gender of the patient, offering insights into potential cost variations based on biological differences.
4. BMI: The Body Mass Index (BMI) of the patient, indicating the relative weight status and its potential impact on healthcare costs.
5. Children: The number of children or dependents covered under the medical insurance, influencing family-related medical expenses.
6. Smoker: A binary indicator of whether the patient is a smoker or not, as smoking habits can significantly impact healthcare costs.
7. Region: The geographic region of the patient, helping to understand regional disparities in healthcare expenditure.
8. Charges: The medical charges incurred by the patient, serving as the target variable for analysis and predictions.
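
As a first look at how a column relates to the target, charges can be averaged per group. The sketch below uses only the column names listed above; the rows are made-up values, and the exact encoding of `Smoker` ("yes"/"no") is an assumption.

```python
from collections import defaultdict
from statistics import mean


def mean_charges_by(rows, key):
    """Average the Charges column for each distinct value of `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["Charges"])
    return {value: mean(charges) for value, charges in groups.items()}


# Made-up rows, for illustration only.
rows = [
    {"ID": 1, "Age": 45, "Smoker": "yes", "Charges": 30000.0},
    {"ID": 2, "Age": 52, "Smoker": "yes", "Charges": 20000.0},
    {"ID": 3, "Age": 23, "Smoker": "no", "Charges": 4000.0},
    {"ID": 4, "Age": 31, "Smoker": "no", "Charges": 6000.0},
]
print(mean_charges_by(rows, "Smoker"))
```

The same function works for `Sex` or `Region` by changing the `key` argument.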

Whether you're aiming to uncover patterns in medical billing, predict future healthcare costs, or explore the relationships between different variables and charges, our Medical Cost Dataset provides a robust foundation for your research. Researchers can utilize this dataset to develop data-driven models that enhance the efficiency of healthcare resource allocation, insurers can refine pricing strategies, and policymakers can make informed decisions to improve the overall healthcare system.

Unlock the potential of healthcare data with our comprehensive Medical Cost Dataset. Gain insights, make informed decisions, and contribute to the advancement of healthcare economics and policy. Start your analysis today and pave the way for a healthier future.
Healthcare Dataset
gts.ai
json
Updated Oct 19, 2024
Cite
GTS (2024). Healthcare Dataset [Dataset]. https://gts.ai/dataset-download/healthcare-dataset/
Explore at:
Available download formats: json
Dataset updated
Oct 19, 2024
Dataset provided by
GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
Authors
GTS
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Explore our synthetic healthcare dataset designed for machine learning, data science, and healthcare analytics.
Employee Attrition for Healthcare
kaggle.com
Updated Feb 15, 2023
Cite
JohnM (2023). Employee Attrition for Healthcare [Dataset]. https://www.kaggle.com/datasets/jpmiller/employee-attrition-for-healthcare
Dataset updated
Feb 15, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
JohnM
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Attrition of nurses in the US Healthcare system is at an all-time high. It is a major area of focus, especially for hospitals.

This dataset contains employee and company data useful for supervised ML, unsupervised ML, and analytics. Attrition - whether an employee left or not - is included and can be used as the target variable.

The data is synthetic and based on the IBM Watson dataset for attrition. Employee roles and departments were changed to reflect the healthcare domain. Also, known outcomes for some employees were changed to help increase the performance of ML models.

Here's an app I use as a demo based on this dataset and an ML classification model.

https://i.imgur.com/Aft3t1E.png
https://i.imgur.com/QNRX2LA.png
Healthcare Dataset
cubig.ai
Updated May 7, 2025
Cite
CUBIG (2025). Healthcare Dataset [Dataset]. https://cubig.ai/store/products/176/healthcare-dataset
Explore at:
Dataset updated
May 7, 2025
Dataset authored and provided by
CUBIG
License
https://cubig.ai/store/terms-of-service
Measurement technique
Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
Description
1) Data Introduction
• The Healthcare Dataset is a synthetic dataset designed to mimic real-world healthcare data for data science, machine learning, and data analysis purposes. It includes patient information, medical conditions, admission details, and healthcare services provided. This dataset is ideal for developing and testing healthcare predictive models, practicing data manipulation techniques, and creating data visualizations.

2) Data Utilization
(1) Characteristics of the healthcare data:
• It includes detailed patient information such as age, gender, blood type, medical condition, and admission details. This information can be used to analyze healthcare trends, patient demographics, and the effectiveness of medical treatments.
(2) Uses of the healthcare data:
• Predictive Modeling: Helps in developing models to predict patient outcomes, treatment success rates, and disease progression.
• Healthcare Analytics: Assists in analyzing patient data to identify patterns, improve patient care, and optimize resource allocation.
• Educational Purposes: Supports learning and teaching data science concepts in a healthcare context, providing realistic data for experimentation and practice.
ai-medical-dataset
huggingface.co
Updated May 12, 2024
Cite
Ruslan Magana Vsevolodovna (2024). ai-medical-dataset [Dataset]. https://huggingface.co/datasets/ruslanmv/ai-medical-dataset
Dataset updated
May 12, 2024
Authors
Ruslan Magana Vsevolodovna
License
https://choosealicense.com/licenses/creativeml-openrail-m/
Description
AI Medical Dataset

Introduction

The AI Medical General Dataset is an experimental dataset designed to build a general chatbot with a strong foundation in medical knowledge. This dataset provides a large corpus of medical data, consisting of approximately 27 million rows, specifically adapted for training Large Language Models (LLMs) in the medical domain.

Data Sources

Our dataset is comprised of three primary sources:

Source Number of Words… See the full description on the dataset page: https://huggingface.co/datasets/ruslanmv/ai-medical-dataset.
AHD: Arabic Healthcare Dataset
data.mendeley.com
Updated Sep 4, 2024
+ more versions
Cite
Hezam Gawbah (2024). AHD: Arabic Healthcare Dataset [Dataset]. http://doi.org/10.17632/mgj29ndgrk.6
Explore at:
Unique identifier
https://doi.org/10.17632/mgj29ndgrk.6
Dataset updated
Sep 4, 2024
Authors
Hezam Gawbah
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
A great deal of language-centric research on healthcare is conducted every day. To address the shortcomings of Arabic natural language generation models, we introduce a large Arabic Healthcare Dataset of textual data, which we named ‘AHD’.

To our knowledge the largest Arabic healthcare dataset, the AHD was collected from the altibbi website.

The AHD consists of more than 808k question-and-answer pairs across 90 categories. The AHD contains one file with the actual data, which is in Arabic; the files are described below.

AHD.xlsx contains the dataset in Excel format, including the question, answer, and category in Arabic.

AHD_english.xlsx contains the dataset in Excel format, including the question, answer, and category translated to English.

Distribution of Question and Answer per category.xlsx shows the distribution of the dataset by category.
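
A per-category distribution like the one in that file can be recomputed from the question/answer records. The sketch below assumes each record is a dictionary with a `category` key (a hypothetical in-memory form of the spreadsheet rows); the category names are placeholders, not real AHD categories.

```python
from collections import Counter


def category_distribution(records):
    """Return {category: (count, share)} ordered from most to least frequent."""
    counts = Counter(r["category"] for r in records)
    total = sum(counts.values())
    return {cat: (n, n / total) for cat, n in counts.most_common()}


# Placeholder records, for illustration only.
records = [
    {"question": "q1", "answer": "a1", "category": "category_a"},
    {"question": "q2", "answer": "a2", "category": "category_a"},
    {"question": "q3", "answer": "a3", "category": "category_b"},
    {"question": "q4", "answer": "a4", "category": "category_c"},
]
for cat, (count, share) in category_distribution(records).items():
    print(cat, count, f"{share:.0%}")
```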
Data from: Generating Heterogeneous Big Data Set for Healthcare and...
data.mendeley.com
Updated Jan 23, 2023
Cite
Omar Al-Obidi (2023). Generating Heterogeneous Big Data Set for Healthcare and Telemedicine Research Based on ECG, Spo2, Blood Pressure Sensors, and Text Inputs: Data set classified, Analyzed, Organized, And Presented in Excel File Format. [Dataset]. http://doi.org/10.17632/gsmjh55sfy.1
Explore at:
Unique identifier
https://doi.org/10.17632/gsmjh55sfy.1
Dataset updated
Jan 23, 2023
Authors
Omar Al-Obidi
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This work presents a heterogeneous big dataset comprising electrocardiogram (ECG) signals, blood pressure signals, oxygen saturation (SpO2) signals, and text input. It extends our earlier dataset formulation presented in [1]; the signals were acquired from a trustworthy and relevant medical dataset library (PhysioNet [2]). The dataset includes medical features from heterogeneous sources (sensory and non-sensory data): first, ECG sensor signals, which contain QRS width, ST elevation, peak numbers, and cycle interval; second, the SpO2 level from SpO2 sensor signals; third, blood pressure sensor signals, which contain high (systolic) and low (diastolic) values; and finally text input, which is considered non-sensory data. The text inputs were formulated based on doctors' diagnostic procedures for chronic heart diseases. A Python software environment was used, and the simulated big data is presented along with analyses.
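
One way to hold such a heterogeneous record in Python is a dataclass that keeps the sensory features and the text input together. This is only a sketch: the field names, types and sample values are assumptions, not the layout of the published Excel file.

```python
from dataclasses import dataclass, asdict


@dataclass
class PatientRecord:
    # ECG-derived features
    qrs_width: float
    st_elevation: float
    peak_count: int
    cycle_interval: float
    # SpO2 sensor
    spo2: float
    # Blood pressure sensor
    systolic: float
    diastolic: float
    # Non-sensory text input
    text_input: str


def split_sources(record):
    """Separate the numeric sensory features from the free-text input."""
    data = asdict(record)
    text = data.pop("text_input")
    return data, text


# Made-up values, for illustration only.
record = PatientRecord(0.1, 0.0, 12, 0.8, 97.0, 120.0, 80.0, "no chest pain")
sensory, text = split_sources(record)
print(sorted(sensory), text)
```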
Electronic Medical Record Service
catalog.data.gov
data.va.gov
+5more
Updated Apr 21, 2021
Cite
Department of Veterans Affairs (2021). Electronic Medical Record Service [Dataset]. https://catalog.data.gov/dataset/electronic-medical-record-service
Explore at:
Dataset updated
Apr 21, 2021
Dataset provided by
United States Department of Veterans Affairs (http://va.gov/)
Description
This service provides web services used to obtain clinical data for patients. Three service methods allow write functionality: signNote, writeNote, and writeSimpleOrder. All other functionality exposed by this service is read-only. The service supports data access across multiple Vista sites. The intended users of this service are healthcare providers.
Data from: healthcare-dataset
huggingface.co
Updated May 16, 2024
+ more versions
Cite
Gowtham Ravichandran (2024). healthcare-dataset [Dataset]. https://huggingface.co/datasets/gowthamrvc/healthcare-dataset
Dataset updated
May 16, 2024
Authors
Gowtham Ravichandran
Description
gowthamrvc/healthcare-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
MedVidCL (Medical Video Classification) Dataset
paperswithcode.com
opendatalab.com
Updated Feb 3, 2022
Cite
Deepak Gupta; Kush Attal; Dina Demner-Fushman (2022). MedVidCL (Medical Video Classification) Dataset [Dataset]. https://paperswithcode.com/dataset/medvidcl
Explore at:
Dataset updated
Feb 3, 2022
Authors
Deepak Gupta; Kush Attal; Dina Demner-Fushman
Description
The MedVidCL dataset contains a collection of 6,617 videos annotated into ‘medical instructional’, ‘medical non-instructional’ and ‘non-medical’ classes. A two-step approach is used to construct the MedVidCL dataset. In the first step, the videos annotated by health informatics experts are used to train a machine learning model that assigns a given video to one of the three aforementioned classes. In the second step, only the high-confidence videos are used, and health informatics experts assess the model’s predicted video category and update the category wherever needed.
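
The second step can be pictured as a confidence filter followed by expert corrections. The sketch below is an illustration under stated assumptions, not the authors' pipeline: the 0.9 threshold, the record fields (`video_id`, `label`, `confidence`) and the sample predictions are all hypothetical.

```python
def select_high_confidence(predictions, threshold=0.9):
    """Keep only predictions whose confidence reaches the threshold."""
    return [p for p in predictions if p["confidence"] >= threshold]


def apply_expert_review(selected, corrections):
    """Use the expert's label where one was given, else the model's label."""
    return {
        p["video_id"]: corrections.get(p["video_id"], p["label"])
        for p in selected
    }


# Made-up model outputs, for illustration only.
predictions = [
    {"video_id": "v1", "label": "medical instructional", "confidence": 0.97},
    {"video_id": "v2", "label": "non-medical", "confidence": 0.55},
    {"video_id": "v3", "label": "non-medical", "confidence": 0.93},
]
corrections = {"v3": "medical non-instructional"}  # expert overrides the model

selected = select_high_confidence(predictions)
print(apply_expert_review(selected, corrections))
```

Low-confidence videos such as `v2` are simply left out of the final set rather than relabelled.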
The MultiCaRe Dataset: A Multimodal Case Report Dataset with Clinical Cases,...
zenodo.org
bin, csv, zip
Updated Jan 5, 2024
+ more versions
Cite
Mauro Nievas Offidani; Claudio Delrieux (2024). The MultiCaRe Dataset: A Multimodal Case Report Dataset with Clinical Cases, Labeled Images and Captions from Open Access PMC Articles [Dataset]. http://doi.org/10.5281/zenodo.10079370
Explore at:
Available download formats: zip, bin, csv
Unique identifier
https://doi.org/10.5281/zenodo.10079370
Dataset updated
Jan 5, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Mauro Nievas Offidani; Claudio Delrieux
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The dataset contains multi-modal data from over 75,000 open access and de-identified case reports, including metadata, clinical cases, image captions and more than 130,000 images. Images and clinical cases belong to different medical specialties, such as oncology, cardiology, surgery and pathology. The structure of the dataset makes it easy to map images to their corresponding article metadata, clinical case, captions and image labels. Details of the data structure can be found in the file data_dictionary.csv.

Almost 100,000 patients and almost 400,000 medical doctors and researchers were involved in the creation of the articles included in this dataset. The citation data of each article can be found in the metadata.parquet file.

Refer to the examples showcased in this GitHub repository to understand how to optimize the use of this dataset.

For a detailed insight about the contents of this dataset, please refer to this data article published in Data In Brief.
Synthetic Healthcare Database for Research (SyH-DR)
catalog.data.gov
healthdata.gov
+1more
Updated Sep 16, 2023
Cite
Agency for Healthcare Research and Quality (2023). Synthetic Healthcare Database for Research (SyH-DR) [Dataset]. https://catalog.data.gov/dataset/synthetic-healthcare-database-for-research-syh-dr
Explore at:
Dataset updated
Sep 16, 2023
Dataset provided by
Agency for Healthcare Research and Quality (http://www.ahrq.gov/)
Description
The Agency for Healthcare Research and Quality (AHRQ) created SyH-DR from eligibility and claims files for Medicare, Medicaid, and commercial insurance plans in calendar year 2016. SyH-DR contains data from a nationally representative sample of insured individuals for the 2016 calendar year. SyH-DR uses synthetic data elements at the claim level to resemble the marginal distribution of the original data elements. SyH-DR person-level data elements are not synthetic, but identifying information is aggregated or masked.
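
To illustrate what "resembling the marginal distribution" means, the sketch below draws each column independently from the values seen in the original rows. This is a generic resampling technique chosen for illustration, not AHRQ's actual method; the field names `payer` and `paid_amount` and the sample claims are hypothetical.

```python
import random


def synthesize_marginals(rows, n, seed=0):
    """Build n synthetic rows by sampling every column independently.

    Each column keeps (approximately) its original marginal distribution,
    but relationships between columns are not preserved.
    """
    rng = random.Random(seed)
    columns = list(rows[0].keys())
    pools = {col: [row[col] for row in rows] for col in columns}
    return [{col: rng.choice(pools[col]) for col in columns} for _ in range(n)]


# Made-up claims, for illustration only.
claims = [
    {"payer": "Medicare", "paid_amount": 120.0},
    {"payer": "Medicaid", "paid_amount": 75.5},
    {"payer": "Commercial", "paid_amount": 310.0},
]
for row in synthesize_marginals(claims, n=5):
    print(row)
```

Because columns are sampled independently, a synthetic row may pair a payer with an amount that never occurred together in the original data.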
Electronic Health Records (EHR) Datasets
shaip.com
json
Updated Apr 8, 2022
Cite
Shaip (2022). Electronic Health Records (EHR) Datasets [Dataset]. https://www.shaip.com/offerings/electronic-health-records-ehr-medical-data-catalog/
Explore at:
Available download formats: json
Dataset updated
Apr 8, 2022
Dataset authored and provided by
Shaip
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Get premium quality off-the-shelf EHR dataset to develop better performing machine learning models. Speak to our experts for Electronic Health Records data needs.
Pixta AI | Imagery Data | Global | High volume | Annotation and Labelling...
datarade.ai
.json, .xml, .csv
Updated Jul 19, 2023
Cite
Pixta AI (2023). Pixta AI | Imagery Data | Global | High volume | Annotation and Labelling Services Provided | Multimodal Medical Images OTS Datasets for AI and ML [Dataset]. https://datarade.ai/data-products/multimodal-medical-image-ots-datasets-pixta-ai
Explore at:
Available download formats: .json, .xml, .csv
Dataset updated
Jul 19, 2023
Dataset authored and provided by
Pixta AI
Area covered
Guernsey, Pitcairn, Uruguay, Montenegro, Haiti, Malaysia, French Polynesia, Serbia, Lebanon, Maldives
Description
Overview
This dataset is a collection of high-quality multimodal medical image sets that are ready to use for optimizing the accuracy of computer vision models. All of the contents are sourced from Pixta AI's partner network with high quality & full data compliance.

Data subject
The datasets consist of various models:

X-ray datasets

CT datasets

MRI datasets

Mammography datasets

Segmentation datasets

Classification datasets

Regression datasets

Use case
The dataset could be used for various Healthcare & Medical models:

Medical Image Analysis

Remote Diagnosis

Medical Record Keeping

...

Each dataset is supported by both AI and expert doctors' review processes to ensure labelling consistency and accuracy. Contact us for more custom datasets.

About PIXTA
PIXTASTOCK is the largest Asian-featured stock platform, providing data, contents, tools and services since 2005. PIXTA has 15 years of experience integrating advanced AI technology in managing, curating, and processing over 100M visual materials and serving leading global brands for their creative and data demands. Visit us at https://www.pixta.ai/ or contact us via email at admin.bi@pixta.co.jp.
Arabic Conversation Chat Dataset for Healthcare Domain
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). Arabic Conversation Chat Dataset for Healthcare Domain [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/arabic-healthcare-domain-conversation-text-dataset
Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
Introduction
The dataset comprises over 10,000 chat conversations, each focusing on specific Healthcare related topics. Each conversation provides a detailed interaction between a call center agent and a customer, capturing real-life scenarios and language nuances.
• Participants Details: 150+ native Arabic participants from the FutureBeeAI community.
• Word Count & Length: Chats are diverse, averaging 300 to 700 words and 50 to 150 turns across both speakers.
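
Word and turn counts like these can be checked per conversation once a chat is loaded. The sketch below assumes a JSON layout with a `messages` list whose items carry `speaker` and `text` fields; that layout is hypothetical, and the sample uses English placeholder text rather than real Arabic chats.

```python
import json

# Hypothetical conversation in an assumed JSON layout.
SAMPLE = """
{
  "conversation_id": "demo-1",
  "messages": [
    {"speaker": "customer", "text": "Hello, I need an appointment."},
    {"speaker": "agent", "text": "Sure, which day works for you?"}
  ]
}
"""


def chat_stats(conversation):
    """Count turns and whitespace-separated words across both speakers."""
    messages = conversation["messages"]
    words = sum(len(m["text"].split()) for m in messages)
    return {"turns": len(messages), "words": words}


print(chat_stats(json.loads(SAMPLE)))
```

Whitespace splitting is a rough word count; Arabic text with attached clitics may call for a proper tokenizer.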

Topic Diversity
The chat dataset covers a wide range of conversations on Healthcare topics, ensuring that the dataset is comprehensive and relevant for training and fine-tuning models for various Healthcare use cases. It offers diversity in terms of conversation topics, chat types, and outcomes, including both inbound and outbound chats with positive, neutral, and negative outcomes.
• Inbound Chats:
  • Appointment Scheduling
  • New Patient Registration
  • Surgery Consultation
  • Consultation regarding Diet, and many more
• Outbound Chats:
  • Appointment Reminder
  • Health & Wellness Subscription Programs
  • Lab Test Results
  • Health Risk Assessments
  • Preventive Care Reminders, and many more
Language Variety & Nuances
The conversations in this dataset capture the diverse language styles and expressions prevalent in Arabic Healthcare interactions. This diversity ensures the dataset accurately represents the language used by Arabic speakers in Healthcare contexts.
The dataset encompasses a wide array of language elements, including:
• Naming Conventions: Chats include a variety of Arabic personal and business names.
• Localized Details: Real-world addresses, emails, phone numbers, and other contact information according to different Arabic-speaking regions.
• Temporal and Numeric Expressions: Dates, times, currencies, and numbers in Arabic forms, adhering to local conventions.
• Idiomatic Expressions and Slang: Local slang, idioms, and informal phrases present in Arabic Healthcare conversations.

This linguistic authenticity ensures that the dataset equips researchers and developers with a comprehensive understanding of the intricate language patterns, cultural references, and communication styles inherent to Arabic Healthcare interactions.
Conversational Flow and Interaction Types
The dataset includes a broad range of conversations, from simple inquiries to detailed discussions, capturing the dynamic nature of Healthcare customer-agent interactions.
• Simple Inquiries
• Detailed Discussions
• Transactional Interactions
• Problem-Solving Dialogues
• Advisory Sessions
• Routine Checks and Follow-Ups
Each of these conversations contains various aspects of conversation flow like:
• Greetings
• Authentication
• Information gathering
• Resolution identification
• Solution Delivery
• Closing and Follow-ups
• Feedback, etc.
This structured and varied conversational flow enables the creation of advanced NLP models that can effectively manage and respond to a wide range of customer service scenarios.
Data Format and Structure
The dataset is available in JSON, CSV, and TXT formats, with each conversation containing attributes like participant identifiers and chat messages,
Recurv-Clinical-Dataset
huggingface.co
Updated Feb 3, 2025
+ more versions
Cite
Recurv AI (2025). Recurv-Clinical-Dataset [Dataset]. https://huggingface.co/datasets/RecurvAI/Recurv-Clinical-Dataset
Dataset updated
Feb 3, 2025
Dataset authored and provided by
Recurv AI
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
🩺 Recurv-Clinical-Dataset:

The Recurv Clinical Dataset is a comprehensive resource containing 12,631 high-quality question-answer pairs specifically designed for training and fine-tuning medical AI models. Curated from trusted medical sources, this dataset focuses on real-world scenarios, including patient history, diagnostics, and treatment recommendations. It sets a new benchmark for advancing conversational AI in the healthcare field.

📈 Dataset Statistics… See the full description on the dataset page: https://huggingface.co/datasets/RecurvAI/Recurv-Clinical-Dataset.
Dataset: Multinational attitudes towards AI in healthcare and diagnostics...
figshare.com
pdf
Updated Sep 1, 2024
Cite
Felix Busch; Keno K. Bressem; COMFORT consortium (2024). Dataset: Multinational attitudes towards AI in healthcare and diagnostics among hospital patients: Cross-sectional evidence from the COMFORT study [Dataset]. http://doi.org/10.6084/m9.figshare.24964488.v1
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.6084/m9.figshare.24964488.v1
Dataset updated
Sep 1, 2024
Dataset provided by
figshare
Authors
Felix Busch; Keno K. Bressem; COMFORT consortium
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Dataset and data dictionary for the manuscript entitled "Multinational attitudes towards AI in healthcare and diagnostics among hospital patients: Cross-sectional evidence from the COMFORT study." Please cite the corresponding publication as a reference.
IoT Healthcare Security Dataset
ieee-dataport.org
Updated Aug 16, 2021
Cite
Faisal Hussain (2021). IoT Healthcare Security Dataset [Dataset]. https://ieee-dataport.org/documents/iot-healthcare-security-dataset
Explore at:
Dataset updated
Aug 16, 2021
Authors
Faisal Hussain
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
smart city

