100+ datasets found

Healthcare Dataset
gts.ai
json
Updated Oct 19, 2024
Cite
GTS (2024). Healthcare Dataset [Dataset]. https://gts.ai/dataset-download/healthcare-dataset/
Available download formats: json
Dataset updated
Oct 19, 2024
Dataset provided by
GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
Authors
GTS
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Explore our synthetic healthcare dataset designed for machine learning, data science, and healthcare analytics.
Data from: Generating Heterogeneous Big Data Set for Healthcare and...
data.mendeley.com
Updated Jan 23, 2023
Cite
Omar Al-Obidi (2023). Generating Heterogeneous Big Data Set for Healthcare and Telemedicine Research Based on ECG, Spo2, Blood Pressure Sensors, and Text Inputs: Data set classified, Analyzed, Organized, And Presented in Excel File Format. [Dataset]. http://doi.org/10.17632/gsmjh55sfy.1
Unique identifier
https://doi.org/10.17632/gsmjh55sfy.1
Dataset updated
Jan 23, 2023
Authors
Omar Al-Obidi
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This work presents a heterogeneous big dataset comprising electrocardiogram (ECG) signals, blood pressure signals, oxygen saturation (SpO2) signals, and text input. It extends the dataset formulation presented in [1], and the signals were acquired from a trustworthy and relevant medical dataset library (PhysioNet [2]). The dataset includes medical features from heterogeneous sources (sensory and non-sensory data). First, the ECG sensor signals provide QRS width, ST elevation, peak numbers, and cycle interval. Second, the SpO2 sensor signals provide the SpO2 level. Third, the blood pressure sensor signals provide high (systolic) and low (diastolic) values. Finally, the text input constitutes the non-sensory data; it was formulated based on doctors' diagnostic procedures for chronic heart diseases. A Python software environment was used, and the simulated big data are presented along with analyses.
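To make one of the ECG features concrete, the cycle interval is commonly taken as the time between successive R peaks (the RR interval), and a mean heart rate in beats per minute follows as 60 divided by the mean interval in seconds. The sketch below assumes R-peak timestamps in seconds have already been detected. The function names are hypothetical, and the dataset's own column layout is not described here.

```python
from statistics import mean
from typing import List


def rr_intervals(r_peak_times: List[float]) -> List[float]:
    """Return successive R-R (cycle) intervals in seconds."""
    if len(r_peak_times) < 2:
        raise ValueError("need at least two R peaks")
    return [b - a for a, b in zip(r_peak_times, r_peak_times[1:])]


def mean_heart_rate_bpm(r_peak_times: List[float]) -> float:
    """Mean heart rate in beats per minute: 60 / mean RR interval (s)."""
    return 60.0 / mean(rr_intervals(r_peak_times))


if __name__ == "__main__":
    peaks = [0.0, 0.8, 1.6, 2.4]  # illustrative timestamps, not taken from the dataset
    print(rr_intervals(peaks))
    print(mean_heart_rate_bpm(peaks))
```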
Transcribed Medical Records datasets for Machine Learning
shaip.com
json
Updated Jun 15, 2025
Cite
Shaip (2025). Transcribed Medical Records datasets for Machine Learning [Dataset]. https://www.shaip.com/offerings/transcribed-medical-records-medical-data-catalog/
Available download formats: json
Dataset updated
Jun 15, 2025
Dataset authored and provided by
Shaip
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Get premium-quality, off-the-shelf transcribed medical records datasets to develop better-performing machine learning models. Deep domain expertise. Fast and cost-effective.
Health Care Analytics
kaggle.com
Updated Jan 10, 2022
Cite
Abishek Sudarshan (2022). Health Care Analytics [Dataset]. https://www.kaggle.com/datasets/abisheksudarshan/health-care-analytics
Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jan 10, 2022
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Abishek Sudarshan
Description
Context

Part of the Janatahack hackathon on Analytics Vidhya.

Content

The healthcare sector has long been an early adopter of and benefited greatly from technological advances. These days, machine learning plays a key role in many health-related realms, including the development of new medical procedures, the handling of patient data, health camps and records, and the treatment of chronic diseases.

MedCamp organizes health camps in several cities with low work-life balance. It reaches out to working people and asks them to register for these health camps. Those who attend can undergo health checks or increase their awareness by visiting various stalls (depending on the format of the camp).

MedCamp has conducted 65 such events over a period of 4 years and sees a high drop-off between registration and the number of people taking tests at the camps. Over the last 4 years, it has stored data on ~110,000 registrations.

One of the largest costs in arranging these camps is the amount of inventory that must be carried. Carrying more inventory than required incurs unnecessarily high costs; carrying less than required for the medical checks gives people a bad experience.

The Process:

MedCamp employees and volunteers reach out to people and drive registrations. During the camp, people who “ShowUp” either undergo the medical tests or visit stalls, depending on the format of the health camp.

Other things to note:

Since this is a completely voluntary activity for working professionals, MedCamp usually has little profile information about these people. For a few camps, a hardware failure meant that some information about the date and time of registration was lost. MedCamp runs 3 formats of these camps. The first and second formats provide people with an instantaneous health score. The third format provides information about several health issues through various awareness stalls.

Favorable outcome:

For the first 2 formats, a favourable outcome is defined as getting a health_score, while in the third format it is defined as visiting at least one stall. The task is to predict the probability of a favourable outcome.
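The favourable-outcome rule can be written as a small labelling function. This is a sketch only. The field names (`camp_format`, `health_score`, `stalls_visited`) are hypothetical because the actual column names are not given here, and format numbers 1 to 3 follow the order in which the formats are described above.

```python
from typing import Optional


def favourable_outcome(camp_format: int,
                       health_score: Optional[float],
                       stalls_visited: int) -> int:
    """Return 1 for a favourable outcome, 0 otherwise (hypothetical field names).

    Formats 1 and 2: favourable if the attendee received a health score.
    Format 3: favourable if the attendee visited at least one stall.
    """
    if camp_format in (1, 2):
        return int(health_score is not None)
    if camp_format == 3:
        return int(stalls_visited >= 1)
    raise ValueError(f"unknown camp format: {camp_format}")
```

A model would then be trained to output, for each registration, the probability that this label is 1.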

Train / Test split:

Camps that started on or before 31st March 2006 are in the training data; the test data covers all camps conducted on or after 1st April 2006.
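The date-based split can be sketched as follows. It assumes each camp record carries a start date under a hypothetical key `camp_start_date`. With calendar dates, "after 31 March 2006" is the same as "on or after 1 April 2006", so a single cutoff suffices.

```python
from datetime import date
from typing import Dict, List, Tuple

CUTOFF = date(2006, 3, 31)  # last camp start date that belongs to the training period


def split_by_camp_start(camps: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Split camp records into train (start <= 31 Mar 2006) and test (start >= 1 Apr 2006).

    `camp_start_date` is a hypothetical key holding a datetime.date.
    """
    train = [c for c in camps if c["camp_start_date"] <= CUTOFF]
    test = [c for c in camps if c["camp_start_date"] > CUTOFF]
    return train, test
```

Splitting by date rather than at random keeps later camps out of training. This roughly mirrors predicting outcomes for camps that have not yet happened.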

Acknowledgements

Credits to Analytics Vidhya (AV).

Inspiration

To share with the data science community and help people jump-start their journey in healthcare analytics.
Data for: A hybrid machine learning approach to cerebral stroke prediction...
data.mendeley.com
Updated Nov 11, 2019
Cite
Tianyu Liu (2019). Data for: A hybrid machine learning approach to cerebral stroke prediction based on imbalanced medical-datasets [Dataset]. http://doi.org/10.17632/x8ygrw87jw.1
Unique identifier
https://doi.org/10.17632/x8ygrw87jw.1
Dataset updated
Nov 11, 2019
Authors
Tianyu Liu
License
Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Description
A basic dataset for stroke prediction.
Electronic Health Records (EHR) Datasets
shaip.com
json
Updated Apr 8, 2022
Cite
Shaip (2022). Electronic Health Records (EHR) Datasets [Dataset]. https://www.shaip.com/offerings/electronic-health-records-ehr-medical-data-catalog/
Available download formats: json
Dataset updated
Apr 8, 2022
Dataset authored and provided by
Shaip
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Get premium-quality, off-the-shelf EHR datasets to develop better-performing machine learning models. Speak to our experts about your Electronic Health Records data needs.
MedCD: A Medical Clinical Dataset
ieee-dataport.org
Updated Feb 10, 2025
Cite
Ye Chen (2025). MedCD: A Medical Clinical Dataset [Dataset]. https://ieee-dataport.org/documents/medcd-medical-clinical-dataset
Dataset updated
Feb 10, 2025
Authors
Ye Chen
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
namely MedCD
A multimodal dental dataset facilitating machine learning research and...
physionet.org
Updated Oct 11, 2024
Cite
Wenjing Liu; Yunyou Huang; Suqin Tang (2024). A multimodal dental dataset facilitating machine learning research and clinic services [Dataset]. http://doi.org/10.13026/h1tt-fc69
Unique identifier
https://doi.org/10.13026/h1tt-fc69
Dataset updated
Oct 11, 2024
Authors
Wenjing Liu; Yunyou Huang; Suqin Tang
License
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Description
Oral diseases affect nearly 3.5 billion people, with the majority residing in low- and middle-income countries. Due to limited healthcare resources, many individuals are unable to access proper oral healthcare services. Image-based machine learning technology is one of the most promising approaches to improving oral healthcare services and reducing patient costs. Openly accessible datasets play a crucial role in facilitating the development of machine learning techniques. However, existing dental datasets have limitations such as a scarcity of Cone Beam Computed Tomography (CBCT) data, lack of matched multi-modal data, and insufficient complexity and diversity of the data. This project addresses these challenges by providing a dataset that includes 329 CBCT images from 169 patients, multi-modal data with matching modalities, and images representing various oral health conditions.
Deep Learning in Healthcare Market Report | Global Forecast From 2025 To...
dataintelo.com
csv, pdf, pptx
Updated Jan 7, 2025
Cite
Dataintelo (2025). Deep Learning in Healthcare Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-deep-learning-in-healthcare-market
Available download formats: pptx, pdf, csv
Dataset updated
Jan 7, 2025
Authors
Dataintelo
License
https://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Deep Learning in Healthcare Market Outlook

The global deep learning in healthcare market size was valued at approximately $2.8 billion in 2023 and is projected to reach around $13.7 billion by 2032, growing at a robust compound annual growth rate (CAGR) of 19.4% during the forecast period. The rapid integration of artificial intelligence (AI) and machine learning technologies into healthcare systems and advances in computational power and data availability are significant growth drivers for the market.
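To check the figures quoted above, the compound annual growth rate between a start value V0 and an end value Vn over n years is (Vn / V0)^(1/n) − 1. Taking 2023 to 2032 as nine years, the sketch below gives roughly 19.3% from the rounded dollar figures. That is close to the quoted 19.4%.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # Figures quoted in the report: ~$2.8B (2023) -> ~$13.7B (2032), i.e. 9 years
    rate = cagr(2.8, 13.7, 2032 - 2023)
    print(f"{rate:.1%}")
```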

One of the primary growth factors for the deep learning in healthcare market is the increasing demand for efficient and accurate diagnostic tools. Deep learning algorithms have demonstrated superior performance in interpreting medical images, detecting anomalies, and predicting outcomes compared to traditional methods. This has led to widespread adoption in medical imaging, significantly enhancing diagnostic precision and reducing the burden on healthcare professionals. The ever-increasing volume of healthcare data, coupled with the need for quick and accurate decision-making, further propels the market forward. By leveraging large datasets, deep learning can achieve a level of precision and speed unattainable by human capabilities alone.

Another significant driver is the growing emphasis on personalized medicine. Deep learning enables the analysis of complex biological data, aiding in the development of personalized treatment plans tailored to individual patient profiles. This shift towards precision medicine is transforming patient care, allowing for more effective treatment protocols and better patient outcomes. The pharmaceutical industry, in particular, is investing heavily in deep learning technologies to expedite drug discovery and development processes, thereby reducing time-to-market and costs associated with bringing new drugs to consumers.

The adoption of electronic health records (EHRs) and the integration of AI in healthcare administration are also crucial growth factors. Deep learning algorithms can process vast amounts of patient data stored in EHRs to identify patterns and predict disease outbreaks, optimize resource allocation, and enhance patient management. The demand for streamlined operations and improved patient care is driving healthcare providers to incorporate these advanced technologies. Furthermore, the ongoing advancements in computational power and the availability of high-quality healthcare datasets are crucial enablers for the application of deep learning technologies in various healthcare domains.

Computer Vision in Healthcare is revolutionizing the way medical professionals approach diagnostics and treatment planning. By leveraging advanced image processing algorithms, computer vision can analyze medical images with remarkable accuracy, identifying patterns and anomalies that might be missed by the human eye. This technology is not only enhancing the precision of medical imaging but also enabling the development of automated systems that assist radiologists in interpreting complex datasets. The integration of computer vision in healthcare is streamlining workflows, reducing diagnostic errors, and ultimately improving patient outcomes. As the technology continues to evolve, its applications are expanding beyond imaging to include areas such as surgery, pathology, and patient monitoring, offering a comprehensive toolset for modern healthcare delivery.

On the regional front, North America holds the largest share of the deep learning in healthcare market, driven by substantial investments in AI technology, well-established healthcare infrastructure, and supportive government initiatives. The region's focus on technological innovation and its robust research ecosystem are key factors contributing to market growth. Moreover, the presence of leading AI and healthcare companies in North America accelerates the adoption of deep learning technologies. Europe and Asia Pacific are also witnessing significant growth, with the latter expected to exhibit the highest CAGR during the forecast period due to increasing healthcare digitization and rising investments in AI-driven healthcare solutions.

Component Analysis

The deep learning in healthcare market is segmented by component into software, hardware, and services. The software segment is anticipated to dominate the market owing to continuous advancements in AI algorithms and the development of sophisticated software solutions tailored for healthcar
Hindi, English and Punjabi Healthcare Datasets
zenodo.org
bin, csv
Updated Jan 4, 2025
Cite
Kajol Bagga Bagga; Kajol Bagga Bagga (2025). Hindi, English and Punjabi Healthcare Datasets [Dataset]. http://doi.org/10.62762/tis.2024.585616
Available download formats: bin, csv
Unique identifier
https://doi.org/10.62762/tis.2024.585616
Dataset updated
Jan 4, 2025
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Kajol Bagga Bagga; Kajol Bagga Bagga
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time period covered
Nov 11, 2024
Description
This repository contains two healthcare datasets in Hindi and Punjabi, translated from English. The datasets cover medical diagnoses, disease names, and related healthcare information. The data has been carefully cleaned and formatted to ensure accuracy and usability for various applications, including machine learning, NLP, and healthcare analysis.

The purpose of these datasets is to facilitate research and development in regional language processing, especially in the healthcare sector.
AI medical chatbot
kaggle.com
Updated Aug 15, 2024
Cite
Yousef Saeedian (2024). AI medical chatbot [Dataset]. https://www.kaggle.com/datasets/yousefsaeedian/ai-medical-chatbot
Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Aug 15, 2024
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Yousef Saeedian
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset Description:

This dataset comprises transcriptions of conversations between doctors and patients, providing valuable insights into the dynamics of medical consultations. It includes a wide range of interactions, covering various medical conditions, patient concerns, and treatment discussions. The data is structured to capture both the questions and concerns raised by patients, as well as the medical advice, diagnoses, and explanations provided by doctors.

Key Features:

Doctor and Patient Roles: Each conversation is annotated with the role of the speaker (doctor or patient), making it easy to analyze communication patterns.

Medical Context: The dataset includes diverse scenarios, from routine check-ups to more complex medical discussions, offering a broad spectrum of healthcare dialogues.

Natural Language: The conversations are presented in natural language, allowing for the development and testing of NLP models focused on healthcare communication.

Applications: This dataset can be used for various applications, such as building dialogue systems, analyzing communication efficacy, developing medical NLP models, and enhancing patient care through better understanding of doctor-patient interactions.
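Because every turn carries a speaker role, simple communication statistics are straightforward to compute. The sketch below assumes a conversation is available as a list of (role, utterance) pairs with roles such as `"doctor"` and `"patient"`. That representation is hypothetical, since the file layout is not described here.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Turn = Tuple[str, str]  # (speaker role, utterance text); hypothetical layout


def role_statistics(conversation: List[Turn]) -> Dict[str, Dict[str, float]]:
    """Count turns and average words per turn for each speaker role."""
    turns: Counter = Counter()
    words: Dict[str, int] = defaultdict(int)
    for role, text in conversation:
        turns[role] += 1
        words[role] += len(text.split())  # whitespace word count
    return {
        role: {"turns": turns[role], "avg_words": words[role] / turns[role]}
        for role in turns
    }
```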

Potential Use Cases:

NLP Model Training: Train models to understand and generate medical dialogues.

Healthcare Communication Studies: Analyze communication strategies between doctors and patients to improve healthcare delivery.

Medical Chatbots: Develop intelligent medical chatbots that can simulate doctor-patient conversations.

Patient Experience Enhancement: Identify common patient concerns and doctor responses to enhance patient care strategies.

This dataset is a valuable resource for researchers, data scientists, and healthcare professionals interested in the intersection of technology and medicine, aiming to improve healthcare communication through data-driven approaches.
Machine Learning in Medicine Report
archivemarketresearch.com
doc, pdf, ppt
Updated Mar 14, 2025
Cite
Archive Market Research (2025). Machine Learning in Medicine Report [Dataset]. https://www.archivemarketresearch.com/reports/machine-learning-in-medicine-57296
Available download formats: pdf, ppt, doc
Dataset updated
Mar 14, 2025
Dataset authored and provided by
Archive Market Research
License
https://www.archivemarketresearch.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The global Machine Learning in Medicine market is experiencing robust growth, projected to reach $[Estimated 2025 Market Size in Millions] in 2025 and expand at a Compound Annual Growth Rate (CAGR) of 5% from 2025 to 2033. This significant expansion is fueled by several key drivers. The increasing availability of large, high-quality medical datasets, coupled with advancements in computing power and algorithm development, is enabling the creation of sophisticated machine learning models capable of enhancing diagnostic accuracy, accelerating drug discovery, and personalizing patient care. Furthermore, the rising prevalence of chronic diseases and the increasing demand for efficient and cost-effective healthcare solutions are bolstering the adoption of machine learning across various medical applications. Key trends within the market include the growing integration of AI-powered diagnostic tools, the rise of federated learning for protecting patient privacy while leveraging diverse datasets, and the expansion of machine learning applications into areas like personalized medicine and preventive healthcare. While data privacy and regulatory concerns pose challenges, the transformative potential of machine learning in improving healthcare outcomes is driving significant investment and innovation in this rapidly evolving market. The market segmentation reveals a strong focus on supervised learning techniques due to their effectiveness in tackling specific medical problems with labeled data. However, unsupervised learning and reinforcement learning are gaining traction, offering the potential for identifying novel patterns and optimizing treatment strategies, respectively. Application-wise, diagnosis and drug discovery currently lead the market, although other applications, including predictive modeling for risk assessment and personalized treatment plans, are showing considerable promise. 
Leading companies like Google, BioBeats, Jvion, and others are actively shaping the market landscape through their advanced technologies and strategic partnerships. Geographical distribution shows strong growth in North America and Europe, driven by advanced healthcare infrastructure and regulatory frameworks. However, emerging markets in Asia-Pacific are rapidly gaining ground due to increasing healthcare investment and a rising prevalence of diseases. The forecast period suggests continued expansion, particularly driven by the ongoing improvements in AI algorithms and the wider adoption across healthcare settings. We anticipate substantial growth across all segments driven by technological breakthroughs and a growing awareness of the clinical benefits.
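Whatever the 2025 base value, a constant 5% CAGR fixes the relative growth. A value growing at rate r for n years is multiplied by (1 + r)^n. Over 2025 to 2033 (8 years), that comes to about 1.48 times the 2025 size. The helper names below are illustrative.

```python
def growth_multiplier(rate: float, years: int) -> float:
    """Factor by which a value grows at a constant annual rate: (1 + rate) ** years."""
    return (1.0 + rate) ** years


def project(base_value: float, rate: float, years: int) -> float:
    """Project base_value forward under compound growth."""
    return base_value * growth_multiplier(rate, years)


if __name__ == "__main__":
    m = growth_multiplier(0.05, 2033 - 2025)
    print(f"2033 size ~= {m:.2f} x 2025 size")
```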
Disease Prediction Using Machine Learning
dataandsons.com
csv, zip
Updated Oct 31, 2022
Cite
test test (2022). Disease Prediction Using Machine Learning [Dataset]. https://www.dataandsons.com/categories/machine-learning/disease-prediction-using-machine-learning
Available download formats: csv, zip
Dataset updated
Oct 31, 2022
Authors
test test
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
About this Dataset

This dataset lets you put your existing knowledge to good use. It has 132 parameters from which 42 different diseases can be predicted. The dataset consists of 2 CSV files, one for training and the other for testing your model. Each CSV file has 133 columns: 132 of them are symptoms that a person experiences, and the last column is the prognosis. These symptoms map to 42 diseases, so each set of symptoms can be classified into a disease. Train your model on the training data and test it on the testing data.
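The file layout described above can be loaded with the standard library alone: a row of symptom columns, followed by a prognosis column. The sketch below assumes the symptom columns are encoded as 0/1 and that the first row is a header. As a deliberately simple baseline (our choice, not something the dataset prescribes), it predicts the prognosis of the training row with the fewest mismatched symptoms, that is, 1-nearest neighbour under Hamming distance. All function names are hypothetical.

```python
import csv
from typing import Iterable, List, Tuple


def load_symptom_csv(lines: Iterable[str], n_symptoms: int = 132
                     ) -> Tuple[List[str], List[List[int]], List[str]]:
    """Read a header row, then rows of n_symptoms 0/1 values followed by the prognosis.

    `lines` may be an open text file or any iterable of CSV lines.
    """
    reader = csv.reader(lines)
    header = next(reader)
    symptom_names = header[:n_symptoms]
    X: List[List[int]] = []
    y: List[str] = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        X.append([int(v) for v in row[:n_symptoms]])
        y.append(row[n_symptoms].strip())  # the column right after the symptoms
    return symptom_names, X, y


def hamming(a: List[int], b: List[int]) -> int:
    """Number of symptoms on which two binary vectors disagree."""
    return sum(u != v for u, v in zip(a, b))


def predict_1nn(X_train: List[List[int]], y_train: List[str], x: List[int]) -> str:
    """Return the prognosis of the closest training row (ties go to the first row)."""
    best = min(range(len(X_train)), key=lambda i: hamming(X_train[i], x))
    return y_train[best]


def accuracy(X_train: List[List[int]], y_train: List[str],
             X_test: List[List[int]], y_test: List[str]) -> float:
    """Fraction of test rows whose predicted prognosis matches the label."""
    hits = sum(predict_1nn(X_train, y_train, x) == t for x, t in zip(X_test, y_test))
    return hits / len(y_test)
```

Load the training file and the testing file with `load_symptom_csv`, then pass both results to `accuracy`. A stronger classifier can replace `predict_1nn` without changing the loader.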

Category

Machine Learning

Keywords

medicine,disease,Healthcare,ML,Machine Learning

Row Count

4962

Price

$109.00
EMRBots: a 10,000-patient database
figshare.com
zip
Updated Sep 3, 2018
Cite
Uri Kartoun (2018). EMRBots: a 10,000-patient database [Dataset]. http://doi.org/10.6084/m9.figshare.7040060.v3
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.7040060.v3
Dataset updated
Sep 3, 2018
Dataset provided by
Figshare: http://figshare.com/
Authors
Uri Kartoun
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A database containing 10,000 virtual patients in total, with 36,143 admissions and 10,726,505 lab observations.
Image Segmentation for Medical Imaging
gts.ai
json
Updated Nov 20, 2023
Cite
GTS (2023). Image Segmentation for Medical Imaging [Dataset]. https://gts.ai/case-study/medical-imaging-enhanced-by-image-segmentation/
Available download formats: json
Dataset updated
Nov 20, 2023
Dataset provided by
GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
Authors
GTS
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Transform healthcare diagnostics with image segmentation. Dive into advanced techniques for detailed medical imaging, aiding patient care.
Tuberculosis Dataset for Intelligent and Adaptive Medical Diagnostic System
data.mendeley.com
Updated Sep 22, 2023
Cite
Steve Ohwo (2023). Tuberculosis Dataset for Intelligent and Adaptive Medical Diagnostic System [Dataset]. http://doi.org/10.17632/ndxdx54xxx.1
Unique identifier
https://doi.org/10.17632/ndxdx54xxx.1
Dataset updated
Sep 22, 2023
Authors
Steve Ohwo
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Tuberculosis is a communicable chronic disease and, according to the World Health Organization (WHO), one of the top ten causes of death worldwide. With clean, well-encoded clinical data from tuberculosis patients, artificial intelligence and machine learning algorithms could transform the management of tuberculosis patients through intelligent prediction and intervention. This dataset contains four hundred and thirty (430) clinical records from tuberculosis patients at the Tuberculosis and Leprosy Hospital, Eku, Delta State, Nigeria. The data were gathered through a validated, structured questionnaire administered using random sampling after obtaining the patients' consent. The collated dataset was pre-processed and encoded with variables (features) for prediction, which include cough, night sweat, breathing difficulty, fever, chest pain, sputum, immune suppression, loss of pleasure, chill, lack of concentration, irritation, loss of appetite, loss of energy, lymph node enlargement, systolic blood pressure, and BMI. Predicting tuberculosis from these patient features would play an essential role in the diagnosis, intervention, and management of tuberculosis patients.
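The encoding step described above turns each questionnaire into a fixed-length feature vector. The sketch below shows one way to do this using the features listed. The encoding itself is our assumption, since the description does not state how the published dataset encodes them: symptoms as 0/1, systolic blood pressure in mmHg, and BMI computed as weight in kg divided by height in metres squared. The feature keys and function names are hypothetical.

```python
from typing import Dict, List

# Feature order follows the list in the dataset description (keys are hypothetical).
SYMPTOMS = [
    "cough", "night_sweat", "breathing_difficulty", "fever", "chest_pain",
    "sputum", "immune_suppression", "loss_of_pleasure", "chill",
    "lack_of_concentration", "irritation", "loss_of_appetite",
    "loss_of_energy", "lymph_node_enlargement",
]


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index = weight (kg) / height (m) squared."""
    return weight_kg / height_m ** 2


def encode_patient(answers: Dict[str, bool], systolic_bp: float,
                   weight_kg: float, height_m: float) -> List[float]:
    """Return 14 binary symptom flags followed by systolic BP and BMI (hypothetical encoding)."""
    flags = [1.0 if answers.get(name, False) else 0.0 for name in SYMPTOMS]
    return flags + [float(systolic_bp), bmi(weight_kg, height_m)]
```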
MedMNIST: Standardized Biomedical Images
kaggle.com
Updated Feb 2, 2024
Cite
Möbius (2024). MedMNIST: Standardized Biomedical Images [Dataset]. https://www.kaggle.com/datasets/arashnic/standardized-biomedical-images-medmnist
Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Feb 2, 2024
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Möbius
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
MedMNIST v2 - A large-scale lightweight benchmark for 2D and 3D biomedical image classification (https://www.nature.com/articles/s41597-022-01721-8)

A large-scale MNIST-like collection of standardized biomedical images, including 12 datasets for 2D and 6 datasets for 3D. All images are pre-processed into 28x28 (2D) or 28x28x28 (3D) with the corresponding classification labels, so that no background knowledge is required for users. Covering primary data modalities in biomedical images, MedMNIST is designed for classification on lightweight 2D and 3D images with various data scales (from 100 to 100,000) and diverse tasks (binary/multi-class, ordinal regression, and multi-label). The resulting dataset, consisting of approximately 708K 2D images and 10K 3D images in total, could support numerous research and educational purposes in biomedical image analysis, computer vision, and machine learning. The providers benchmark several baseline methods on MedMNIST, including 2D/3D neural networks and open-source/commercial AutoML tools.

[Figure: MedMNIST Landscape]

About the MedMNIST Landscape figure: the horizontal axis denotes the base-10 logarithm of the dataset scale, and the vertical axis denotes the base-10 logarithm of the imaging resolution. Upward and downward triangles distinguish 2D datasets from 3D datasets, and the 4 different colors represent different tasks.

Key Features


Diverse: It covers diverse data modalities, dataset scales (from 100 to 100,000), and tasks (binary/multi-class, multi-label, and ordinal regression). Like the VDD and MSD, it is diverse enough to fairly evaluate the generalizable performance of machine learning algorithms in different settings, but it provides both 2D and 3D biomedical images.

Standardized: Each sub-dataset is pre-processed into the same format, which requires no background knowledge for users. As an MNIST-like dataset collection to perform classification tasks on small images, it primarily focuses on the machine learning part rather than the end-to-end system. Furthermore, we provide standard train-validation-test splits for all datasets in MedMNIST, therefore algorithms could be easily compared.

User-Friendly: The small size of 28×28 (2D) or 28×28×28 (3D) is lightweight and ideal for evaluating machine learning algorithms. We also offer a larger-size version, MedMNIST+: 64x64 (2D), 128x128 (2D), 224x224 (2D), and 64x64x64 (3D). Serving as a complement to the 28-size MedMNIST, this could be a standardized resource for developing medical foundation models. All these datasets are accessible via the same API.

Educational: As an interdisciplinary research area, biomedical image analysis is hard for researchers from other communities to get started with, as it requires background knowledge in computer vision, machine learning, biomedical imaging, and clinical science. Our data, released under a Creative Commons (CC) License, are easy to use for educational purposes.
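Each sub-dataset ships with standard train/validation/test splits. If a sub-dataset has been downloaded as a NumPy `.npz` archive, the splits can be extracted as shown below. This is a sketch that assumes the archive stores arrays under keys of the form `<split>_images` and `<split>_labels`, which is the layout we understand the official files to use; check it against the GitHub page before relying on it. The official `medmnist` package API is not used here, and the helper names are hypothetical.

```python
from typing import Dict, Tuple

import numpy as np

SPLITS = ("train", "val", "test")


def load_medmnist_npz(path) -> Dict[str, Tuple[np.ndarray, np.ndarray]]:
    """Return {split: (images, labels)} from a MedMNIST-style .npz file.

    Assumes keys named '<split>_images' and '<split>_labels'.
    """
    with np.load(path) as data:
        # Indexing an NpzFile reads each array into memory before the file closes.
        return {s: (data[f"{s}_images"], data[f"{s}_labels"]) for s in SPLITS}


def to_float01(images: np.ndarray) -> np.ndarray:
    """Scale uint8 pixel values to float32 in [0, 1]."""
    return images.astype(np.float32) / 255.0
```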

Refer to the paper to learn more about the data: https://www.nature.com/articles/s41597-022-01721-8

Starter code for downloading more data and training:

GitHub page: https://github.com/MedMNIST/MedMNIST

My Kaggle Starter Notebook: https://www.kaggle.com/code/arashnic/medmnist-download-and-use-data?scriptVersionId=161421937

Acknowledgements

Jiancheng Yang, Rui Shi, Donglai Wei, Zequan Liu, Lin Zhao, Bilian Ke, Hanspeter Pfister, Bingbing Ni

Shanghai Jiao Tong University, Shanghai, China; Boston College, Chestnut Hill, MA; RWTH Aachen University, Aachen, Germany; Fudan Institute of Metabolic Diseases, Zhongshan Hospital, Fudan University, Shanghai, China; Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; Harvard University, Cambridge, MA

License and Citation

The code is released under the Apache-2.0 License.

The MedMNIST dataset is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0)...
AHD: Arabic Healthcare Dataset
data.mendeley.com
Updated Sep 4, 2024
Cite
Hezam Gawbah (2024). AHD: Arabic Healthcare Dataset [Dataset]. http://doi.org/10.17632/mgj29ndgrk.6
Unique identifier
https://doi.org/10.17632/mgj29ndgrk.6
Dataset updated
Sep 4, 2024
Authors
Hezam Gawbah
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A large body of language-centric healthcare research is conducted every day. To address the shortcomings of Arabic natural language generation models, we introduce a large Arabic Healthcare Dataset (AHD) of textual data, hence the name 'AHD'.

To our knowledge, AHD is the largest Arabic healthcare dataset; it was collected from the Altibbi website.

AHD consists of more than 808k question-answer pairs across 90 categories. The dataset files are described below; the main file contains the actual data in Arabic.

AHD.xlsx contains the dataset in Excel format, including the question, answer, and category in Arabic.

AHD_english.xlsx contains the dataset in Excel format, including the question, answer, and category translated into English.

Distribution of Question and Answer per category.xlsx shows the distribution of the dataset by category.
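A per-category distribution like the one in the distribution file can be recomputed from the question-answer rows. The sketch below assumes each row has already been read (for example with `pandas.read_excel`, which needs `openpyxl` for .xlsx files) into `(question, answer, category)` tuples; that column order and the helper name `category_distribution` are assumptions, not part of the dataset's documentation.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def category_distribution(
    rows: Iterable[Tuple[str, str, str]],
) -> Dict[str, Tuple[int, float]]:
    """Map each category to (row count, share of all rows), most frequent first.

    Assumes every row is a (question, answer, category) tuple.
    """
    counts = Counter(category for _question, _answer, category in rows)
    total = sum(counts.values())
    # An empty input yields an empty dict, so no division by zero occurs.
    return {cat: (n, n / total) for cat, n in counts.most_common()}


# Toy placeholder rows, not taken from AHD.
rows = [
    ("q1", "a1", "cat_a"),
    ("q2", "a2", "cat_b"),
    ("q3", "a3", "cat_a"),
    ("q4", "a4", "cat_a"),
]
print(category_distribution(rows))  # {'cat_a': (3, 0.75), 'cat_b': (1, 0.25)}
```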
A living catalogue of artificial intelligence datasets and benchmarks for...
explore.openaire.eu
zenodo.org
Updated Apr 6, 2021
Cite
Blagec Kathrin; Kraiger Jakob; Samwald Matthias (2021). A living catalogue of artificial intelligence datasets and benchmarks for medical decision making [Dataset]. http://doi.org/10.5281/zenodo.4647823
Unique identifier
https://doi.org/10.5281/zenodo.4647823
Dataset updated
Apr 6, 2021
Authors
Blagec Kathrin; Kraiger Jakob; Samwald Matthias
Description
We provide a comprehensive, curated catalogue of artificial intelligence datasets and benchmarks for medical decision making. At the time of first release (April 2021), the dataset contains more than 400 biomedical and clinical datasets, of which 252 are publicly available or available upon request. The dataset was compiled from a systematic literature review covering biomedical and computer science literature as well as grey literature sources.

All datasets were manually systematized and annotated with meta-information, such as:
- Availability and licensing information
- Type of source data
- Links to source publications, main references, or dataset repositories

Benchmark datasets were additionally annotated with the following information:
- Associated task
- Performance metrics commonly used for evaluation
- Clinical relevance
- Availability of data splits

In addition to the versioned TSV file on Zenodo, the dataset can also be explored live via a Google Spreadsheet. The dataset is intended as a living, extendable resource; edit suggestions and additions are encouraged and can be submitted via the comment function of the Google Sheet.

File descriptions:
- annotated-datasets.tsv -- contains the annotated datasets
- arXiv-literature-export.tsv -- contains the original literature record export from arXiv
- pubmed-literature-export.tsv -- contains the original literature record export from PubMed
- README.md -- contains a detailed description of all annotation fields
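Because the catalogue ships as a tab-separated file, it can be filtered with Python's standard `csv` module. The sketch below selects rows whose value in a chosen column falls in an accepted set, for example to keep only datasets that are public or available on request. The real column names are documented in README.md; the column `availability`, its values, and the helper `filter_by_column` used here are hypothetical placeholders.

```python
import csv
import io
from typing import Iterable, List, TextIO


def filter_by_column(tsv: TextIO, column: str, accepted: Iterable[str]) -> List[dict]:
    """Return rows of a tab-separated file whose `column` value is in `accepted`.

    Matching ignores case and surrounding whitespace.
    """
    wanted = {a.strip().lower() for a in accepted}
    reader = csv.DictReader(tsv, delimiter="\t")
    # Accessing fieldnames reads the header line.
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in header")
    # Short rows get None for missing fields, hence the `or ""` guard.
    return [row for row in reader if (row[column] or "").strip().lower() in wanted]


# Toy stand-in for annotated-datasets.tsv; column names are hypothetical.
sample = io.StringIO(
    "name\tavailability\n"
    "ds_a\tpublic\n"
    "ds_b\ton request\n"
    "ds_c\tnot available\n"
)
rows = filter_by_column(sample, "availability", ["public", "on request"])
print([r["name"] for r in rows])  # ['ds_a', 'ds_b']
```

With the real file, `open("annotated-datasets.tsv", newline="", encoding="utf-8")` would replace the in-memory sample, after checking the actual header against README.md.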
An Extensive Dataset for the Heart Disease Classification System
data.mendeley.com
Updated Feb 17, 2022
Cite
Sozan S. Maghdid (2022). An Extensive Dataset for the Heart Disease Classification System [Dataset]. http://doi.org/10.17632/65gxgy2nmg.2
Unique identifier
https://doi.org/10.17632/65gxgy2nmg.2
Dataset updated
Feb 17, 2022
Authors
Sozan S. Maghdid
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Finding a good data source is the first step toward creating a database. Cardiovascular diseases (CVDs) are the leading cause of death worldwide. CVDs include coronary heart disease, cerebrovascular disease, rheumatic heart disease, and other heart and blood vessel disorders. According to the World Health Organization, an estimated 17.9 million people die of CVDs each year. Heart attacks and strokes account for more than four out of every five CVD deaths, and one-third of these deaths occur before the age of 70.

A comprehensive database of factors that contribute to a heart attack has been constructed; its main purpose is to collect the characteristics of heart attacks and the factors that contribute to them. The dataset contains 1319 samples with nine fields: eight input fields and one output field. The input fields are age, gender, heart rate (impulse), systolic BP (pressurehight), diastolic BP (pressurelow), blood sugar (glucose), CK-MB (kcm), and Test-Troponin (troponin). The output field (class) indicates the presence of a heart attack and takes one of two values: negative (no heart attack) or positive (heart attack).
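The field layout above maps directly onto a feature matrix and a binary label vector. The sketch below uses the standard `csv` module and assumes the file is a comma-separated CSV whose header uses the lowercase field names given in the description (age, gender, impulse, pressurehight, pressurelow, glucose, kcm, troponin, class); that header spelling, the label strings, and the helper `load_heart_attack_csv` are assumptions. The toy rows are invented placeholders, not samples from the dataset.

```python
import csv
import io
from typing import List, TextIO, Tuple

# Field names as listed in the description; the exact header in the
# distributed file is an assumption.
FEATURES = ["age", "gender", "impulse", "pressurehight",
            "pressurelow", "glucose", "kcm", "troponin"]
LABEL = "class"
LABEL_MAP = {"negative": 0, "positive": 1}  # absence / presence of a heart attack


def load_heart_attack_csv(f: TextIO) -> Tuple[List[List[float]], List[int]]:
    """Split each record into 8 numeric inputs and a 0/1 heart-attack label.

    Raises KeyError on a missing column or an unexpected label value.
    """
    X, y = [], []
    for row in csv.DictReader(f):
        X.append([float(row[name]) for name in FEATURES])
        y.append(LABEL_MAP[row[LABEL].strip().lower()])
    return X, y


# Invented toy rows for illustration only.
toy = io.StringIO(
    "age,gender,impulse,pressurehight,pressurelow,glucose,kcm,troponin,class\n"
    "60,1,80,140,85,120,2.5,0.01,negative\n"
    "70,0,95,160,90,200,10.0,0.5,positive\n"
)
X, y = load_heart_attack_csv(toy)
print(len(X[0]), y)  # 8 [0, 1]
```

Mapping labels through an explicit dictionary, rather than treating any non-"positive" string as negative, makes an unexpected label value fail loudly instead of silently skewing the class balance.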