100+ datasets found
  1. Healthcare Dataset

    • gts.ai
    json
    Updated Oct 19, 2024
    GTS (2024). Healthcare Dataset [Dataset]. https://gts.ai/dataset-download/healthcare-dataset/
    Dataset provided by
    GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
    Authors
    GTS
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Explore our synthetic healthcare dataset designed for machine learning, data science, and healthcare analytics.

  2. Data from: Generating Heterogeneous Big Data Set for Healthcare and...

    • data.mendeley.com
    Updated Jan 23, 2023
    Omar Al-Obidi (2023). Generating Heterogeneous Big Data Set for Healthcare and Telemedicine Research Based on ECG, Spo2, Blood Pressure Sensors, and Text Inputs: Data set classified, Analyzed, Organized, And Presented in Excel File Format. [Dataset]. http://doi.org/10.17632/gsmjh55sfy.1
    Authors
    Omar Al-Obidi
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This work presents a heterogeneous big dataset comprising electrocardiogram (ECG) signals, blood pressure signals, oxygen saturation (SpO2) signals, and text input. It extends the dataset we formulated in [1], and the signals were acquired from a trustworthy and relevant medical dataset library (PhysioNet [2]). The dataset includes medical features from heterogeneous sources (sensory and non-sensory data): first, ECG sensor signals, which contain QRS width, ST elevation, number of peaks, and cycle interval; second, the SpO2 level from SpO2 sensor signals; third, blood pressure sensor signals, which contain high (systolic) and low (diastolic) values; and finally, text input, which is considered non-sensory data. The text inputs were formulated based on doctors' diagnostic procedures for chronic heart diseases. A Python software environment was used, and the simulated big data is presented along with analyses.
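    One way to hold a single record that combines these four sources is a typed structure with one field per feature. This is a minimal sketch: the class name, field names, and units below are hypothetical, since the description does not give the Excel file's column layout or units.

```python
from dataclasses import dataclass


@dataclass
class HeterogeneousRecord:
    # Hypothetical field names; units are assumptions, not stated in the source.
    # ECG-derived features
    qrs_width_ms: float
    st_elevation_mm: float
    peak_count: int
    cycle_interval_ms: float
    # SpO2 sensor
    spo2_percent: float
    # Blood pressure sensor: high (systolic) and low (diastolic) values
    systolic_mmhg: float
    diastolic_mmhg: float
    # Non-sensory input
    text_input: str

    def sensory_vector(self) -> list:
        """Return the numeric sensory features in a fixed order (text excluded)."""
        return [
            self.qrs_width_ms,
            self.st_elevation_mm,
            float(self.peak_count),
            self.cycle_interval_ms,
            self.spo2_percent,
            self.systolic_mmhg,
            self.diastolic_mmhg,
        ]


# Illustrative values only.
r = HeterogeneousRecord(96.0, 0.5, 12, 820.0, 97.0, 128.0, 84.0,
                        "chest pain on exertion")
print(len(r.sensory_vector()))  # 7
```

    Keeping the text input out of the numeric vector reflects the sensory/non-sensory split in the description; the text would typically be handled by a separate NLP step.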

  3. Transcribed Medical Records datasets for Machine Learning

    • shaip.com
    json
    Updated Jun 15, 2025
    Shaip (2025). Transcribed Medical Records datasets for Machine Learning [Dataset]. https://www.shaip.com/offerings/transcribed-medical-records-medical-data-catalog/
    Dataset authored and provided by
    Shaip
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Get premium-quality, off-the-shelf transcribed medical records datasets to develop better-performing machine learning models. Deep domain expertise. Fast and cost-effective.

  4. Health Care Analytics

    • kaggle.com
    Updated Jan 10, 2022
    Abishek Sudarshan (2022). Health Care Analytics [Dataset]. https://www.kaggle.com/datasets/abisheksudarshan/health-care-analytics
    Available format: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Abishek Sudarshan
    Description

    Context

    Part of the Janatahack hackathon on Analytics Vidhya.

    Content

    The healthcare sector has long been an early adopter of technological advances and has benefited greatly from them. These days, machine learning plays a key role in many health-related realms, including the development of new medical procedures, the handling of patient data, health camps and records, and the treatment of chronic diseases.

    MedCamp organizes health camps in several cities with low work-life balance. They reach out to working people and ask them to register for these health camps. For those who attend, MedCamp provides the facility to undergo health checks or to increase awareness by visiting various stalls (depending on the format of the camp).

    MedCamp has conducted 65 such events over a period of 4 years and sees a high drop-off between “Registration” and the number of people taking tests at the camps. Over these 4 years, it has stored data on ~110,000 registrations.

    One of the largest costs in arranging these camps is the amount of inventory you need to carry. If you carry more inventory than required, you incur unnecessarily high costs. On the other hand, if you carry less inventory than required for conducting these medical checks, people end up having a bad experience.

    The Process:

    MedCamp employees / volunteers reach out to people and drive registrations.
    During the camp, people who “ShowUp” either undergo the medical tests or visit stalls, depending on the format of the health camp.
    

    Other things to note:

    Since this is a completely voluntary activity for working professionals, MedCamp usually has little profile information about these people.
    For a few camps, there was a hardware failure, so some information about the date and time of registration was lost.
    MedCamp runs 3 formats of these camps. The first and second formats provide people with an instantaneous health score. The third format provides information about several health issues through various awareness stalls.
    

    Favorable outcome:

    For the first 2 formats, a favourable outcome is defined as getting a health_score, while in the third format it is defined as visiting at least one stall.
    You need to predict the chances (probability) of having a favourable outcome.
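    The favourable-outcome rule above can be written as a small labeling function. A minimal sketch, assuming each attendance record carries the camp format (1, 2 or 3), an optional health score, and a count of stalls visited; the names `Attendance`, `camp_format`, `health_score` and `stalls_visited` are hypothetical, not the dataset's actual column names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Attendance:
    # Hypothetical fields; the real dataset's columns may be named differently.
    camp_format: int               # 1, 2 or 3
    health_score: Optional[float]  # None if no health score was obtained
    stalls_visited: int            # number of awareness stalls visited


def favourable_outcome(a: Attendance) -> int:
    """Return 1 for a favourable outcome, 0 otherwise."""
    if a.camp_format in (1, 2):
        # Formats 1 and 2: favourable if a health score was obtained.
        return int(a.health_score is not None)
    if a.camp_format == 3:
        # Format 3: favourable if at least one stall was visited.
        return int(a.stalls_visited >= 1)
    raise ValueError(f"unknown camp format: {a.camp_format}")


print(favourable_outcome(Attendance(1, 0.72, 0)))  # 1
print(favourable_outcome(Attendance(3, None, 0)))  # 0
```

    A model would then be trained to predict the probability that this label equals 1.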
    

    Train / Test split:

    Camps that started on or before 31st March 2006 are included in the training data.
    Test data covers all camps conducted on or after 1st April 2006.
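    The date-based split described above can be sketched as follows, assuming each camp is represented as a hypothetical `(camp_id, start_date)` pair; the real files may already be split, in which case this only illustrates the rule.

```python
from datetime import date

# Cutoff from the description: camps starting on or before 31 March 2006
# go to training; camps on or after 1 April 2006 go to test.
TRAIN_CUTOFF = date(2006, 3, 31)


def split_camps(camps):
    """Split hypothetical (camp_id, start_date) pairs into train and test ids."""
    train, test = [], []
    for camp_id, start in camps:
        (train if start <= TRAIN_CUTOFF else test).append(camp_id)
    return train, test


camps = [("A", date(2005, 11, 2)), ("B", date(2006, 3, 31)), ("C", date(2006, 4, 1))]
print(split_camps(camps))  # (['A', 'B'], ['C'])
```

    Splitting on time rather than at random keeps later camps out of the training data, so evaluation better reflects how a model would perform on future camps.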
    

    Acknowledgements

    Credits to Analytics Vidhya (AV)

    Inspiration

    To share with the data science community and help jump-start their journey in healthcare analytics.

  5. Data for: A hybrid machine learning approach to cerebral stroke prediction...

    • data.mendeley.com
    Updated Nov 11, 2019
    Tianyu Liu (2019). Data for: A hybrid machine learning approach to cerebral stroke prediction based on imbalanced medical-datasets [Dataset]. http://doi.org/10.17632/x8ygrw87jw.1
    Authors
    Tianyu Liu
    License

    Attribution-NonCommercial 3.0 (CC BY-NC 3.0) (https://creativecommons.org/licenses/by-nc/3.0/)
    License information was derived automatically

    Description

    Basic dataset for stroke prediction.

  6. Electronic Health Records (EHR) Datasets

    • shaip.com
    json
    Updated Apr 8, 2022
    Shaip (2022). Electronic Health Records (EHR) Datasets [Dataset]. https://www.shaip.com/offerings/electronic-health-records-ehr-medical-data-catalog/
    Dataset authored and provided by
    Shaip
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Get premium-quality, off-the-shelf EHR datasets to develop better-performing machine learning models. Speak to our experts about your electronic health records data needs.

  7. MedCD: A Medical Clinical Dataset

    • ieee-dataport.org
    Updated Feb 10, 2025
    Ye Chen (2025). MedCD: A Medical Clinical Dataset [Dataset]. https://ieee-dataport.org/documents/medcd-medical-clinical-dataset
    Authors
    Ye Chen
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    namely MedCD

  8. A multimodal dental dataset facilitating machine learning research and...

    • physionet.org
    Updated Oct 11, 2024
    + more versions
    Wenjing Liu; Yunyou Huang; Suqin Tang (2024). A multimodal dental dataset facilitating machine learning research and clinic services [Dataset]. http://doi.org/10.13026/h1tt-fc69
    Authors
    Wenjing Liu; Yunyou Huang; Suqin Tang
    License

    https://github.com/MIT-LCP/license-and-dua/tree/master/drafts

    Description

    Oral diseases affect nearly 3.5 billion people, with the majority residing in low- and middle-income countries. Due to limited healthcare resources, many individuals are unable to access proper oral healthcare services. Image-based machine learning technology is one of the most promising approaches to improving oral healthcare services and reducing patient costs. Openly accessible datasets play a crucial role in facilitating the development of machine learning techniques. However, existing dental datasets have limitations such as a scarcity of Cone Beam Computed Tomography (CBCT) data, lack of matched multi-modal data, and insufficient complexity and diversity of the data. This project addresses these challenges by providing a dataset that includes 329 CBCT images from 169 patients, multi-modal data with matching modalities, and images representing various oral health conditions.

  9. Deep Learning in Healthcare Market Report | Global Forecast From 2025 To...

    • dataintelo.com
    csv, pdf, pptx
    Updated Jan 7, 2025
    Dataintelo (2025). Deep Learning in Healthcare Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-deep-learning-in-healthcare-market
    Authors
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Deep Learning in Healthcare Market Outlook



    The global deep learning in healthcare market size was valued at approximately $2.8 billion in 2023 and is projected to reach around $13.7 billion by 2032, growing at a robust compound annual growth rate (CAGR) of 19.4% during the forecast period. The rapid integration of artificial intelligence (AI) and machine learning technologies into healthcare systems, and advances in computational power and data availability, are significant growth drivers for the market.
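    The growth rate quoted above follows the standard CAGR formula, (end / start)^(1 / years) - 1. A quick arithmetic check using the rounded dollar figures from the summary, assuming a 9-year span from 2023 to 2032:

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures from the report summary: about $2.8B in 2023, about $13.7B in 2032.
rate = cagr(2.8, 13.7, 2032 - 2023)
print(f"{rate:.1%}")  # 19.3%
```

    The rounded figures give roughly 19.3%, close to the stated 19.4%; the small gap is plausibly due to rounding in the published dollar values.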



    One of the primary growth factors for the deep learning in healthcare market is the increasing demand for efficient and accurate diagnostic tools. Deep learning algorithms have demonstrated superior performance in interpreting medical images, detecting anomalies, and predicting outcomes compared to traditional methods. This has led to widespread adoption in medical imaging, significantly enhancing diagnostic precision and reducing the burden on healthcare professionals. The ever-increasing volume of healthcare data, coupled with the need for quick and accurate decision-making, further propels the market forward. By leveraging large datasets, deep learning can achieve a level of precision and speed unattainable by human capabilities alone.



    Another significant driver is the growing emphasis on personalized medicine. Deep learning enables the analysis of complex biological data, aiding in the development of personalized treatment plans tailored to individual patient profiles. This shift towards precision medicine is transforming patient care, allowing for more effective treatment protocols and better patient outcomes. The pharmaceutical industry, in particular, is investing heavily in deep learning technologies to expedite drug discovery and development processes, thereby reducing time-to-market and costs associated with bringing new drugs to consumers.



    The adoption of electronic health records (EHRs) and the integration of AI in healthcare administration are also crucial growth factors. Deep learning algorithms can process vast amounts of patient data stored in EHRs to identify patterns and predict disease outbreaks, optimize resource allocation, and enhance patient management. The demand for streamlined operations and improved patient care is driving healthcare providers to incorporate these advanced technologies. Furthermore, the ongoing advancements in computational power and the availability of high-quality healthcare datasets are crucial enablers for the application of deep learning technologies in various healthcare domains.



    Computer vision in healthcare is revolutionizing the way medical professionals approach diagnostics and treatment planning. By leveraging advanced image processing algorithms, computer vision can analyze medical images with remarkable accuracy, identifying patterns and anomalies that might be missed by the human eye. This technology is not only enhancing the precision of medical imaging but also enabling the development of automated systems that assist radiologists in interpreting complex datasets. The integration of computer vision in healthcare is streamlining workflows, reducing diagnostic errors, and ultimately improving patient outcomes. As the technology continues to evolve, its applications are expanding beyond imaging to include areas such as surgery, pathology, and patient monitoring, offering a comprehensive toolset for modern healthcare delivery.



    On the regional front, North America holds the largest share of the deep learning in healthcare market, driven by substantial investments in AI technology, well-established healthcare infrastructure, and supportive government initiatives. The region's focus on technological innovation and its robust research ecosystem are key factors contributing to market growth. Moreover, the presence of leading AI and healthcare companies in North America accelerates the adoption of deep learning technologies. Europe and Asia Pacific are also witnessing significant growth, with the latter expected to exhibit the highest CAGR during the forecast period due to increasing healthcare digitization and rising investments in AI-driven healthcare solutions.



    Component Analysis



    The deep learning in healthcare market is segmented by component into software, hardware, and services. The software segment is anticipated to dominate the market owing to continuous advancements in AI algorithms and the development of sophisticated software solutions tailored for healthcar

  10. Hindi, English and Punjabi Healthcare Datasets

    • zenodo.org
    bin, csv
    Updated Jan 4, 2025
    Kajol Bagga Bagga (2025). Hindi, English and Punjabi Healthcare Datasets [Dataset]. http://doi.org/10.62762/tis.2024.585616
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Kajol Bagga Bagga
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Time period covered
    Nov 11, 2024
    Description

    This repository contains two healthcare datasets in Hindi and Punjabi, translated from English. The datasets cover medical diagnoses, disease names, and related healthcare information. The data has been carefully cleaned and formatted to ensure accuracy and usability for various applications, including machine learning, NLP, and healthcare analysis.

    The purpose of these datasets is to facilitate research and development in regional language processing, especially in the healthcare sector.

  11. AI medical chatbot

    • kaggle.com
    Updated Aug 15, 2024
    Yousef Saeedian (2024). AI medical chatbot [Dataset]. https://www.kaggle.com/datasets/yousefsaeedian/ai-medical-chatbot
    Available format: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Yousef Saeedian
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset Description:

    This dataset comprises transcriptions of conversations between doctors and patients, providing valuable insights into the dynamics of medical consultations. It includes a wide range of interactions, covering various medical conditions, patient concerns, and treatment discussions. The data is structured to capture both the questions and concerns raised by patients, as well as the medical advice, diagnoses, and explanations provided by doctors.

    Key Features:

    • Doctor and Patient Roles: Each conversation is annotated with the role of the speaker (doctor or patient), making it easy to analyze communication patterns.
    • Medical Context: The dataset includes diverse scenarios, from routine check-ups to more complex medical discussions, offering a broad spectrum of healthcare dialogues.
    • Natural Language: The conversations are presented in natural language, allowing for the development and testing of NLP models focused on healthcare communication.
    • Applications: This dataset can be used for various applications, such as building dialogue systems, analyzing communication efficacy, developing medical NLP models, and enhancing patient care through better understanding of doctor-patient interactions.
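    Because each turn is annotated with the speaker's role, conversations can be reshaped for tasks such as chatbot training. A minimal sketch, assuming a conversation is loaded as a list of hypothetical `(role, utterance)` pairs with roles `"doctor"` and `"patient"`; the dataset's actual file format is not described here. It counts turns per role and pairs each patient turn with the doctor's immediate reply.

```python
from collections import Counter

# Hypothetical in-memory representation of one conversation.
conversation = [
    ("patient", "I've had a headache for three days."),
    ("doctor", "Have you had any fever or vision changes?"),
    ("patient", "No fever, but light bothers me."),
    ("doctor", "Let's check your blood pressure first."),
]


def turns_by_role(turns):
    """Count how many turns each speaker role takes."""
    return Counter(role for role, _ in turns)


def to_qa_pairs(turns):
    """Pair each patient turn with the doctor turn that immediately follows it."""
    pairs = []
    for (role, text), (next_role, next_text) in zip(turns, turns[1:]):
        if role == "patient" and next_role == "doctor":
            pairs.append((text, next_text))
    return pairs


print(turns_by_role(conversation)["doctor"])  # 2
print(len(to_qa_pairs(conversation)))  # 2
```

    Patient-to-doctor pairs like these are a common input shape for training question-answering or dialogue models.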

    Potential Use Cases:

    • NLP Model Training: Train models to understand and generate medical dialogues.
    • Healthcare Communication Studies: Analyze communication strategies between doctors and patients to improve healthcare delivery.
    • Medical Chatbots: Develop intelligent medical chatbots that can simulate doctor-patient conversations.
    • Patient Experience Enhancement: Identify common patient concerns and doctor responses to enhance patient care strategies.

    This dataset is a valuable resource for researchers, data scientists, and healthcare professionals interested in the intersection of technology and medicine, aiming to improve healthcare communication through data-driven approaches.

  12. Machine Learning in Medicine Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Mar 14, 2025
    Archive Market Research (2025). Machine Learning in Medicine Report [Dataset]. https://www.archivemarketresearch.com/reports/machine-learning-in-medicine-57296
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global Machine Learning in Medicine market is experiencing robust growth and is projected to expand at a compound annual growth rate (CAGR) of 5% from 2025 to 2033. This expansion is fueled by several key drivers. The increasing availability of large, high-quality medical datasets, coupled with advances in computing power and algorithm development, is enabling sophisticated machine learning models that can enhance diagnostic accuracy, accelerate drug discovery, and personalize patient care. Furthermore, the rising prevalence of chronic diseases and the increasing demand for efficient, cost-effective healthcare solutions are bolstering the adoption of machine learning across various medical applications.

    Key trends within the market include the growing integration of AI-powered diagnostic tools, the rise of federated learning for protecting patient privacy while leveraging diverse datasets, and the expansion of machine learning into areas such as personalized medicine and preventive healthcare. While data privacy and regulatory concerns pose challenges, the transformative potential of machine learning for improving healthcare outcomes is driving significant investment and innovation in this rapidly evolving market.

    The market segmentation reveals a strong focus on supervised learning techniques because of their effectiveness on specific medical problems with labeled data. However, unsupervised learning and reinforcement learning are gaining traction, offering the potential to identify novel patterns and to optimize treatment strategies, respectively. By application, diagnosis and drug discovery currently lead the market, although other applications, including predictive modeling for risk assessment and personalized treatment plans, are showing considerable promise.

    Leading companies such as Google, BioBeats, and Jvion are actively shaping the market landscape through their advanced technologies and strategic partnerships. Geographically, North America and Europe show strong growth, driven by advanced healthcare infrastructure and regulatory frameworks. However, emerging markets in Asia-Pacific are rapidly gaining ground due to increasing healthcare investment and a rising prevalence of diseases. The forecast period suggests continued expansion, driven particularly by ongoing improvements in AI algorithms and wider adoption across healthcare settings. We anticipate substantial growth across all segments, driven by technological breakthroughs and a growing awareness of the clinical benefits.

  13. Disease Prediction Using Machine Learning

    • dataandsons.com
    csv, zip
    Updated Oct 31, 2022
    test test (2022). Disease Prediction Using Machine Learning [Dataset]. https://www.dataandsons.com/categories/machine-learning/disease-prediction-using-machine-learning
    Authors
    test test
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
    License information was derived automatically

    Description

    About this Dataset

    This dataset will help you put your existing knowledge to good use. It has 132 parameters from which 42 different diseases can be predicted. The dataset consists of 2 CSV files: one for training and one for testing your model. Each CSV file has 133 columns: 132 of them are symptoms that a person experiences, and the last column is the prognosis. The symptoms are mapped to 42 diseases, so you can classify each set of symptoms into one of them. Train your model on the training data and evaluate it on the testing data.
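    Separating the symptom columns from the prognosis column is the first step before training. A minimal sketch using only the standard library, following the description's layout (every column but the last is a symptom, the last is the prognosis) and assuming the symptom cells are 0/1 integers, which the description does not state. The demo uses a tiny in-memory table with hypothetical column names instead of the real files.

```python
import csv
import io


def load_symptom_table(f):
    """Split a symptom CSV into symptom names, feature rows, and prognosis labels.

    Assumes all columns except the last are symptoms encoded as 0/1 integers
    and the last column is the prognosis.
    """
    reader = csv.reader(f)
    header = next(reader)
    symptoms = header[:-1]
    X, y = [], []
    for row in reader:
        if not row:
            continue  # skip blank lines
        X.append([int(v) for v in row[:-1]])
        y.append(row[-1])
    return symptoms, X, y


# Tiny stand-in for the real files, which have 132 symptom columns.
demo = io.StringIO(
    "symptom_a,symptom_b,symptom_c,prognosis\n"
    "1,1,0,disease_x\n"
    "0,0,1,disease_y\n"
)
symptoms, X, y = load_symptom_table(demo)
print(len(symptoms))  # 3
```

    For the real data, the same function would be applied to each file opened with `open(path, newline="")`; the file names are not given in the description. Checking that `len(symptoms) == 132` is a cheap guard against a malformed file.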

    Category

    Machine Learning

    Keywords

    medicine,disease,Healthcare,ML,Machine Learning

    Row Count

    4962

    Price

    $109.00

  14. EMRBots: a 10,000-patient database

    • figshare.com
    zip
    Updated Sep 3, 2018
    + more versions
    Uri Kartoun (2018). EMRBots: a 10,000-patient database [Dataset]. http://doi.org/10.6084/m9.figshare.7040060.v3
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Uri Kartoun
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    A 10,000-patient database containing a total of 10,000 virtual patients, 36,143 admissions, and 10,726,505 lab observations.
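    The three counts above imply average sizes that can help when planning storage or batching. A quick arithmetic check using only those figures:

```python
# Counts from the dataset description.
patients = 10_000
admissions = 36_143
lab_observations = 10_726_505

print(f"{admissions / patients:.2f} admissions per patient")  # 3.61 admissions per patient
print(f"{lab_observations / admissions:.1f} lab observations per admission")  # 296.8 lab observations per admission
```

    These are means only; the description says nothing about how admissions or lab observations are distributed across patients.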

  15. Image Segmentation for Medical Imaging

    • gts.ai
    json
    Updated Nov 20, 2023
    GTS (2023). Image Segmentation for Medical Imaging [Dataset]. https://gts.ai/case-study/medical-imaging-enhanced-by-image-segmentation/
    Dataset provided by
    GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
    Authors
    GTS
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Transform healthcare diagnostics with image segmentation. Dive into advanced techniques for detailed medical imaging, aiding patient care.

  16. Tuberculosis Dataset for Intelligent and Adaptive Medical Diagnostic System

    • data.mendeley.com
    Updated Sep 22, 2023
    Steve Ohwo (2023). Tuberculosis Dataset for Intelligent and Adaptive Medical Diagnostic System [Dataset]. http://doi.org/10.17632/ndxdx54xxx.1
    Authors
    Steve Ohwo
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Tuberculosis is a communicable chronic disease and, according to the World Health Organization (WHO), one of the top ten causes of death worldwide. With the availability of clean, well-encoded clinical data from tuberculosis patients, artificial intelligence and machine learning algorithms would be able to transform the management of tuberculosis patients through intelligent prediction and intervention. This dataset contains four hundred and thirty (430) clinical records from patients with tuberculosis at the Tuberculosis and Leprosy Hospital, Eku, Delta State, Nigeria. The dataset was gathered through a validated, structured questionnaire administered using random sampling after obtaining the patients' consent. The collated dataset was pre-processed and encoded with prediction variables (features), which include cough, night sweat, breathing difficulty, fever, chest pain, sputum, immune suppression, loss of pleasure, chill, lack of concentration, irritation, loss of appetite, loss of energy, lymph node enlargement, systolic blood pressure, and BMI. Predicting tuberculosis from patients' clinical features would play an essential role in the diagnosis, intervention, and management of tuberculosis patients.
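    Most classifiers expect each patient as a fixed-order numeric vector. A minimal sketch of such an encoding, using the 16 features listed above; the snake_case names, the 0/1 encoding of symptoms, and the example patient are assumptions for illustration, not the dataset's documented encoding.

```python
# Feature order follows the description; names and 0/1 encoding are hypothetical.
SYMPTOMS = [
    "cough", "night_sweat", "breathing_difficulty", "fever", "chest_pain",
    "sputum", "immune_suppression", "loss_of_pleasure", "chill",
    "lack_of_concentration", "irritation", "loss_of_appetite",
    "loss_of_energy", "lymph_node_enlargement",
]
NUMERIC = ["systolic_bp", "bmi"]


def encode(record: dict) -> list:
    """Turn one patient record into a fixed-order feature vector.

    Symptoms absent from the record are treated as 0; numeric
    features are required and converted to float.
    """
    vector = [1 if record.get(s, False) else 0 for s in SYMPTOMS]
    vector += [float(record[n]) for n in NUMERIC]
    return vector


# Hypothetical patient, not taken from the dataset.
patient = {"cough": True, "night_sweat": True, "fever": True,
           "systolic_bp": 118, "bmi": 17.9}
v = encode(patient)
print(len(v))  # 16
```

    Fixing the feature order in one place keeps training and prediction vectors aligned.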

  17. MedMNIST: Standardized Biomedical Images

    • kaggle.com
    Updated Feb 2, 2024
    + more versions
    Möbius (2024). MedMNIST: Standardized Biomedical Images [Dataset]. https://www.kaggle.com/datasets/arashnic/standardized-biomedical-images-medmnist
    Available format: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Möbius
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    MedMNIST v2 - A large-scale lightweight benchmark for 2D and 3D biomedical image classification (https://www.nature.com/articles/s41597-022-01721-8)

    A large-scale MNIST-like collection of standardized biomedical images, including 12 datasets for 2D and 6 datasets for 3D. All images are pre-processed into 28x28 (2D) or 28x28x28 (3D) with the corresponding classification labels, so that no background knowledge is required for users. Covering primary data modalities in biomedical images, MedMNIST is designed to perform classification on lightweight 2D and 3D images with various data scales (from 100 to 100,000) and diverse tasks (binary/multi-class, ordinal regression and multi-label). The resulting dataset, consisting of approximately 708K 2D images and 10K 3D images in total, could support numerous research and educational purposes in biomedical image analysis, computer vision and machine learning. The providers benchmark several baseline methods on MedMNIST, including 2D / 3D neural networks and open-source / commercial AutoML tools.

    MedMNIST Landscape (figure): The horizontal axis denotes the base-10 logarithm of the dataset scale, and the vertical axis denotes the base-10 logarithm of the imaging resolution. Upward and downward triangles distinguish 2D datasets from 3D datasets, and the 4 colors represent different tasks.

    Key Features


    Diverse: It covers diverse data modalities, dataset scales (from 100 to 100,000), and tasks (binary/multi-class, multi-label, and ordinal regression). It is as diverse as the Visual Domain Decathlon (VDD) and the Medical Segmentation Decathlon (MSD), so it can fairly evaluate the generalizable performance of machine learning algorithms in different settings, but it provides both 2D and 3D biomedical images.

    Standardized: Each sub-dataset is pre-processed into the same format, which requires no background knowledge from users. As an MNIST-like dataset collection for classification tasks on small images, it focuses primarily on the machine learning part rather than on the end-to-end system. Furthermore, we provide standard train-validation-test splits for all datasets in MedMNIST, so algorithms can be easily compared.

    User-Friendly: The small size of 28×28 (2D) or 28×28×28 (3D) is lightweight and ideal for evaluating machine learning algorithms. We also offer larger-size versions, MedMNIST+: 64×64 (2D), 128×128 (2D), 224×224 (2D), and 64×64×64 (3D). As a complement to the 28-size MedMNIST, MedMNIST+ could serve as a standardized resource for developing medical foundation models. All of these datasets are accessible via the same API.

    Educational: Biomedical image analysis is an interdisciplinary research area that is hard for researchers from other communities to get started with, as it requires background knowledge in computer vision, machine learning, biomedical imaging, and clinical science. Because our data is released under a Creative Commons (CC) license, it is easy to use for educational purposes.
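    The sizes listed under User-Friendly translate directly into per-sample storage. The Python sketch below is illustrative only: the helper names are hypothetical, it assumes raw uncompressed uint8 values (1 byte each), and the `channels` parameter is there because some 2D sub-datasets are RGB rather than grayscale.

```python
from math import prod

# Image shapes named in the text (28-size MedMNIST and MedMNIST+).
SHAPES = {
    "28x28 (2D)": (28, 28),
    "64x64 (2D)": (64, 64),
    "128x128 (2D)": (128, 128),
    "224x224 (2D)": (224, 224),
    "28x28x28 (3D)": (28, 28, 28),
    "64x64x64 (3D)": (64, 64, 64),
}


def values_per_sample(shape, channels=1):
    """Pixels (2D) or voxels (3D) per sample, multiplied by the channel count."""
    return prod(shape) * channels


def raw_megabytes(n_samples, shape, channels=1, bytes_per_value=1):
    """Uncompressed size in MB (1 MB = 10**6 bytes); assumes 1-byte uint8 values by default."""
    return n_samples * values_per_sample(shape, channels) * bytes_per_value / 1e6


if __name__ == "__main__":
    n = 10_000  # hypothetical sample count, for illustration only
    for name, shape in SHAPES.items():
        print(f"{name:>15}: {values_per_sample(shape):>7} values/sample, "
              f"{raw_megabytes(n, shape):8.2f} MB for {n} grayscale samples")
```

    For example, 28×28 gives 784 values per sample and 64×64×64 gives 262,144, so a 64-size 3D volume holds over 300 times as many values as a 28-size 2D image.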

    Refer to the paper to learn more about the data: https://www.nature.com/articles/s41597-022-01721-8

    Starter code for downloading more data and training:

    Github Page: https://github.com/MedMNIST/MedMNIST

    My Kaggle Starter Notebook: https://www.kaggle.com/code/arashnic/medmnist-download-and-use-data?scriptVersionId=161421937
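    The standard train-validation-test splits described under Standardized can be read directly from a downloaded archive. The sketch below is a minimal illustration that assumes the `.npz` layout used by the MedMNIST Python package, with arrays named `train_images`, `train_labels`, `val_images`, `val_labels`, `test_images`, and `test_labels`. `load_splits` is a hypothetical helper, not part of the MedMNIST API, and the demo writes a toy archive of zeros instead of real data.

```python
import numpy as np

SPLITS = ("train", "val", "test")


def load_splits(npz_path):
    """Return {split: (images, labels)} from a MedMNIST-style .npz archive.

    Assumes arrays are stored under '<split>_images' and '<split>_labels'.
    """
    with np.load(npz_path) as archive:
        return {
            split: (archive[f"{split}_images"], archive[f"{split}_labels"])
            for split in SPLITS
        }


if __name__ == "__main__":
    import os
    import tempfile

    # Toy archive with the assumed key layout (zeros, not real MedMNIST data).
    path = os.path.join(tempfile.mkdtemp(), "toy.npz")
    sizes = {"train": 8, "val": 2, "test": 3}
    arrays = {}
    for split, n in sizes.items():
        arrays[f"{split}_images"] = np.zeros((n, 28, 28), dtype=np.uint8)
        arrays[f"{split}_labels"] = np.zeros((n, 1), dtype=np.int64)
    np.savez(path, **arrays)
    for split, (images, labels) in load_splits(path).items():
        print(split, images.shape, labels.shape)
```

    In practice you would pass the path of a downloaded sub-dataset archive; the GitHub page above documents the official loaders.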

    Acknowledgements

    Jiancheng Yang, Rui Shi, Donglai Wei, Zequan Liu, Lin Zhao, Bilian Ke, Hanspeter Pfister, Bingbing Ni

    Affiliations: Shanghai Jiao Tong University, Shanghai, China; Boston College, Chestnut Hill, MA; RWTH Aachen University, Aachen, Germany; Fudan Institute of Metabolic Diseases, Zhongshan Hospital, Fudan University, Shanghai, China; Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; Harvard University, Cambridge, MA

    License and Citation

    The code is under Apache-2.0 License.

    The MedMNIST dataset is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0)...

  18. m

    AHD: Arabic Healthcare Dataset

    • data.mendeley.com
    Updated Sep 4, 2024
    Cite
    Hezam Gawbah (2024). AHD: Arabic Healthcare Dataset [Dataset]. http://doi.org/10.17632/mgj29ndgrk.6
    Dataset updated
    Sep 4, 2024
    Authors
    Hezam Gawbah
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    • Numerous language-centric studies on healthcare are conducted every day. To address the shortcomings of Arabic natural language generation models, we introduce a large textual Arabic Healthcare Dataset (AHD).
    • To our knowledge, AHD is the largest Arabic healthcare dataset; it was collected from the Altibbi website.

    • The AHD consists of more than 808k question-answer pairs in 90 categories. The dataset files are described below; the primary data file is in Arabic.

      • AHD.xlsx contains the dataset in Excel format, including the question, answer, and category in Arabic.

      • AHD_english.xlsx contains the dataset in Excel format, including the question, answer, and category translated into English.

    • Distribution of Question and Answer per category.xlsx shows the distribution of the dataset by category.
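    A per-category distribution like the one in the distribution file can be recomputed from the question-answer rows. This is a sketch under stated assumptions: the rows are taken to be already loaded from AHD.xlsx or AHD_english.xlsx (for example with an Excel reader such as pandas or openpyxl) as dictionaries, and `category` is a hypothetical key name, not necessarily the header used in the files.

```python
from collections import Counter


def category_distribution(rows, key="category"):
    """Return (category, count, share) tuples, most frequent category first.

    `rows` is any iterable of dict-like records; `key` is an assumed column name.
    """
    counts = Counter(row[key] for row in rows)
    total = sum(counts.values())
    return [(category, n, n / total) for category, n in counts.most_common()]


if __name__ == "__main__":
    # Made-up toy rows for illustration; not taken from AHD.
    toy_rows = [
        {"question": "q1", "answer": "a1", "category": "Cardiology"},
        {"question": "q2", "answer": "a2", "category": "Dermatology"},
        {"question": "q3", "answer": "a3", "category": "Cardiology"},
    ]
    for category, n, share in category_distribution(toy_rows):
        print(f"{category}: {n} ({share:.1%})")
```

    An empty input returns an empty list, so no division by zero occurs.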

  19. o

    A living catalogue of artificial intelligence datasets and benchmarks for...

    • explore.openaire.eu
    • zenodo.org
    • +1more
    Updated Apr 6, 2021
    Cite
    Blagec Kathrin; Kraiger Jakob; Samwald Matthias (2021). A living catalogue of artificial intelligence datasets and benchmarks for medical decision making [Dataset]. http://doi.org/10.5281/zenodo.4647823
    Dataset updated
    Apr 6, 2021
    Authors
    Blagec Kathrin; Kraiger Jakob; Samwald Matthias
    Description

    We provide a comprehensive curated catalogue of artificial intelligence datasets and benchmarks for medical decision making. At the time of first release (April 2021), the dataset contains more than 400 biomedical and clinical datasets, of which 252 are publicly available or available upon request. The dataset was compiled based on a systematic literature review covering biomedical and computer science literature as well as grey literature data sources. All datasets were manually systematized and annotated for meta-information, such as:

    • Availability and licensing information
    • Type of source data
    • Links to source publications, main references, or dataset repositories

    Benchmark datasets were additionally annotated for the following information:

    • Associated task
    • Performance metrics commonly used for evaluation
    • Clinical relevance
    • The availability of data splits

    In addition to the versioned TSV file on Zenodo, the dataset can also be explored live via this Google Spreadsheet. The dataset is intended as a living, extendable resource. Edit suggestions and additions are encouraged and can be submitted via the comment function of the Google sheet.

    File descriptions:

    • annotated-datasets.tsv: contains the annotated datasets
    • arXiv-literature-export.tsv: contains the original literature record export from arXiv
    • pubmed-literature-export.tsv: contains the original literature record export from PubMed
    • README.md: contains a detailed description of all annotation fields
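    Because the catalogue is distributed as tab-separated files, it can be loaded with Python's standard `csv` module. The sketch below assumes annotated-datasets.tsv has a single header row; the column names in the demo (`name`, `availability`) are hypothetical stand-ins, since the real annotation fields are documented in README.md.

```python
import csv
import io


def load_catalogue(stream):
    """Read a tab-separated file object with a header row into a list of dicts."""
    return list(csv.DictReader(stream, delimiter="\t"))


def filter_rows(rows, column, needle):
    """Keep rows whose `column` value contains `needle` (case-insensitive)."""
    needle = needle.lower()
    return [row for row in rows if needle in (row.get(column) or "").lower()]


if __name__ == "__main__":
    # Toy TSV with hypothetical columns; the real file is read the same way, e.g.
    # open("annotated-datasets.tsv", newline="", encoding="utf-8").
    toy = io.StringIO("name\tavailability\nDataset A\tPublic\nDataset B\tUpon request\n")
    rows = load_catalogue(toy)
    print(len(rows), [row["name"] for row in filter_rows(rows, "availability", "public")])
```

    Rows missing the requested column are skipped rather than raising an error, which keeps exploratory filtering forgiving while the field names are being checked against README.md.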

  20. m

    An Extensive Dataset for the Heart Disease Classification System

    • data.mendeley.com
    Updated Feb 17, 2022
    Cite
    Sozan S. Maghdid (2022). An Extensive Dataset for the Heart Disease Classification System [Dataset]. http://doi.org/10.17632/65gxgy2nmg.2
    Dataset updated
    Feb 17, 2022
    Authors
    Sozan S. Maghdid
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Finding a good data source is the first step toward creating a database. Cardiovascular diseases (CVDs) are the leading cause of death worldwide. CVDs include coronary heart disease, cerebrovascular disease, rheumatic heart disease, and other heart and blood vessel disorders. According to the World Health Organization, 17.9 million people die of CVDs each year. Heart attacks and strokes account for more than four out of every five CVD deaths, and one-third of these deaths occur before the age of 70. A comprehensive database of factors that contribute to a heart attack has been constructed; its main purpose is to collect the characteristics of heart attacks and the factors that contribute to them.

    The dataset contains 1319 samples with nine fields: eight input fields and one output field. The input fields are age, gender, heart rate (impulse), systolic BP (pressurehight), diastolic BP (pressurelow), blood sugar (glucose), CK-MB (kcm), and Test-Troponin (troponin). The output field (class) indicates the presence of a heart attack and has two categories: negative (no heart attack) and positive (heart attack).
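    The field layout described above maps naturally onto a feature matrix and a binary label vector. The following Python sketch assumes the data is stored as a CSV file whose header uses the field names given in the description (`impulse`, `pressurehight`, `pressurelow`, `glucose`, `kcm`, `troponin`, `class`) plus `age` and `gender`, whose exact spellings are assumptions; `load_heart_attack` and the toy values are hypothetical.

```python
import csv
import io

# Field names from the description; "age" and "gender" spellings are assumed.
FEATURES = ["age", "gender", "impulse", "pressurehight", "pressurelow",
            "glucose", "kcm", "troponin"]
LABEL = "class"
LABEL_MAP = {"negative": 0, "positive": 1}  # 1 = heart attack present


def load_heart_attack(stream):
    """Return (X, y): numeric feature rows and 0/1 labels from a CSV file object."""
    X, y = [], []
    for row in csv.DictReader(stream):
        X.append([float(row[name]) for name in FEATURES])
        y.append(LABEL_MAP[row[LABEL].strip().lower()])
    return X, y


if __name__ == "__main__":
    # Two made-up rows for illustration only; not taken from the dataset.
    toy = io.StringIO(
        "age,gender,impulse,pressurehight,pressurelow,glucose,kcm,troponin,class\n"
        "60,1,70,130,80,110,2.5,0.01,negative\n"
        "55,0,90,150,95,160,8.0,0.50,positive\n"
    )
    X, y = load_heart_attack(toy)
    print(len(X), "samples,", len(X[0]), "features, labels:", y)
```

    With the real file, a quick sanity check is that X has 1319 rows of 8 values each.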
