100+ datasets found
  1. Medical Diagnosis Dataset

    • kaggle.com
    Updated Feb 7, 2025
    + more versions
    Cite
    Srestha Jain (2025). Medical Diagnosis Dataset [Dataset]. https://www.kaggle.com/datasets/sresthajain/medical-diagnosis-dataset/discussion
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 7, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Srestha Jain
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Context: This synthetic healthcare dataset has been created to serve as a valuable resource for data science, machine learning, and data analysis enthusiasts. It is designed to mimic real-world healthcare data, enabling users to practice, develop, and showcase their data manipulation and analysis skills in the context of the healthcare industry.

    Inspiration: The inspiration behind this dataset is rooted in the need for practical and diverse healthcare data for educational and research purposes. Healthcare data is often sensitive and subject to privacy regulations, making it challenging to access for learning and experimentation. To address this gap, I have leveraged Python's Faker library to generate a dataset that mirrors the structure and attributes commonly found in healthcare records. By providing this synthetic data, I hope to foster innovation, learning, and knowledge sharing in the healthcare analytics domain.

    Dataset Information: Each column provides specific information about the patient, their admission, and the healthcare services provided, making this dataset suitable for various data analysis and modeling tasks in the healthcare domain. Here's a brief explanation of each column in the dataset -

    • Name: This column represents the name of the patient associated with the healthcare record.
    • Age: The age of the patient at the time of admission, expressed in years.
    • Gender: Indicates the gender of the patient, either "Male" or "Female."
    • Blood Type: The patient's blood type, which can be one of the common blood types (e.g., "A+", "O-", etc.).
    • Medical Condition: This column specifies the primary medical condition or diagnosis associated with the patient, such as "Diabetes," "Hypertension," "Asthma," and more.
    • Date of Admission: The date on which the patient was admitted to the healthcare facility.
    • Doctor: The name of the doctor responsible for the patient's care during their admission.
    • Hospital: Identifies the healthcare facility or hospital where the patient was admitted.
    • Insurance Provider: This column indicates the patient's insurance provider, which can be one of several options, including "Aetna," "Blue Cross," "Cigna," "UnitedHealthcare," and "Medicare."
    • Billing Amount: The amount of money billed for the patient's healthcare services during their admission. This is expressed as a floating-point number.
    • Room Number: The room number where the patient was accommodated during their admission.
    • Admission Type: Specifies the type of admission, which can be "Emergency," "Elective," or "Urgent," reflecting the circumstances of the admission.
    • Discharge Date: The date on which the patient was discharged from the healthcare facility, based on the admission date and a random number of days within a realistic range.
    • Medication: Identifies a medication prescribed or administered to the patient during their admission. Examples include "Aspirin," "Ibuprofen," "Penicillin," "Paracetamol," and "Lipitor."
    • Test Results: Describes the results of a medical test conducted during the patient's admission. Possible values include "Normal," "Abnormal," or "Inconclusive," indicating the outcome of the test.

    Usage Scenarios: This dataset can be utilized for a wide range of purposes, including:

    • Developing and testing healthcare predictive models.
    • Practicing data cleaning, transformation, and analysis techniques.
    • Creating data visualizations to gain insights into healthcare trends.
    • Learning and teaching data science and machine learning concepts in a healthcare context.

    You can treat it as a Multi-Class Classification Problem and solve it for Test Results, which contains 3 categories (Normal, Abnormal, and Inconclusive).

    Acknowledgments: I acknowledge the importance of healthcare data privacy and security and emphasize that this dataset is entirely synthetic. It does not contain any real patient information or violate any privacy regulations. I hope that this dataset contributes to the advancement of data science and healthcare analytics and inspires new ideas. Feel free to explore, analyze, and share your findings with the Kaggle community.
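
    The description says the discharge date is derived from the admission date plus a random number of days, and that Test Results takes one of three values. The sketch below shows one way such records could be produced using only the Python standard library. It is not the author's generator (which reportedly uses Faker); the function name make_record, the date window, the 1 to 30 day stay, and the age and billing bounds are all assumptions chosen for illustration.

```python
import random
from datetime import date, timedelta

# Category values quoted in the column descriptions.
ADMISSION_TYPES = ["Emergency", "Elective", "Urgent"]
TEST_RESULTS = ["Normal", "Abnormal", "Inconclusive"]


def make_record(rng: random.Random) -> dict:
    """Build one synthetic admission record (illustrative subset of columns)."""
    # Assumption: admissions fall within a one-year window starting 2024-01-01.
    admitted = date(2024, 1, 1) + timedelta(days=rng.randint(0, 364))
    # Assumption: a "realistic" stay is 1 to 30 days; the source gives no range.
    stay_days = rng.randint(1, 30)
    return {
        "Age": rng.randint(18, 90),  # assumed age bounds
        "Admission Type": rng.choice(ADMISSION_TYPES),
        "Date of Admission": admitted,
        "Discharge Date": admitted + timedelta(days=stay_days),
        "Billing Amount": round(rng.uniform(1000.0, 50000.0), 2),  # assumed bounds
        "Test Results": rng.choice(TEST_RESULTS),
    }


rng = random.Random(0)  # fixed seed so the sketch is repeatable
records = [make_record(rng) for _ in range(5)]
for rec in records:
    print(rec["Date of Admission"], rec["Discharge Date"], rec["Test Results"])
```

    Because every value here is drawn independently at random, a classifier trained to predict Test Results from such columns should not be expected to beat chance; whether the published dataset behaves the same way is not stated in the description.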

  2. Data from: Generating Heterogeneous Big Data Set for Healthcare and...

    • data.mendeley.com
    Updated Jan 23, 2023
    Cite
    Omar Al-Obidi (2023). Generating Heterogeneous Big Data Set for Healthcare and Telemedicine Research Based on ECG, Spo2, Blood Pressure Sensors, and Text Inputs: Data set classified, Analyzed, Organized, And Presented in Excel File Format. [Dataset]. http://doi.org/10.17632/gsmjh55sfy.1
    Explore at:
    Dataset updated
    Jan 23, 2023
    Authors
    Omar Al-Obidi
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This work presents a heterogeneous big dataset comprising electrocardiogram (ECG) signals, blood pressure signals, oxygen saturation (SpO2) signals, and text input. It extends our earlier dataset formulation presented in [1]; the signals were acquired from PhysioNet [2], a trustworthy and relevant medical dataset library. The dataset includes medical features from heterogeneous sources (sensory and non-sensory data). First, the ECG sensor signals, which contain QRS width, ST elevation, peak numbers, and cycle interval. Second, the SpO2 level from the SpO2 sensor signals. Third, the blood pressure sensor signals, which contain high (systolic) and low (diastolic) values. Finally, the text input, which is considered non-sensory data. The text inputs were formulated based on doctors' diagnostic procedures for chronic heart diseases. A Python software environment was used, and the simulated big data is presented along with analyses.

  3. Medical Data Collection

    • gts.ai
    json
    Updated Jul 19, 2022
    Cite
    GTS (2022). Medical Data Collection [Dataset]. https://gts.ai/case-study/medical-data-collection/
    Explore at:
    Available download formats: json
    Dataset updated
    Jul 19, 2022
    Dataset provided by
    GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
    Authors
    GTS
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Guide to Medical Data Collection: key techniques, ethics, and tech advancements reshaping healthcare data management for improved care.

  4. Explainable AI (XAI) and Interpretable Machine Learning (IML) in Healthcare...

    • data.mendeley.com
    Updated Jan 20, 2025
    Cite
    Shatha Alghamdi (2025). Explainable AI (XAI) and Interpretable Machine Learning (IML) in Healthcare Dataset [Dataset]. http://doi.org/10.17632/5tcdzzsmx8.1
    Explore at:
    Dataset updated
    Jan 20, 2025
    Authors
    Shatha Alghamdi
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Explainable AI (XAI) and Interpretable Machine Learning (IML) in Healthcare Dataset was compiled from the Web of Science database, encompassing a comprehensive collection of 5,083 research articles focused on AI explainability in healthcare. The dataset includes various document types, such as Articles, Review Articles, Early Access Papers, and Proceeding Papers, all in English. Spanning from 1986 to April 2023, the dataset features key attributes, including Authors, Author Full Names, Article Title, Document Type, Author Keywords, Keywords Plus, Abstract, Publication Year, DOI Link, WoS Categories, and Research Areas.

    This dataset has been created to provide essential insights into the utilization of XAI and IML in healthcare contexts. In our work [1], we utilized this dataset to perform a systematic analysis, resulting in the identification and categorization of 13 key parameters across three macro-parameters: Research Methods, Health Disorders, and Disease Prevention. Informed by a focused review of over 200 articles, this analysis illuminates specific applications and highlights challenges in XAI, showcasing its impact on enhancing diagnostic accuracy, treatment efficacy, and preventive strategies. We then developed the FIXAIH framework to transform these insights into actionable guidelines, enhancing the interpretability, explainability, and accountability of AI systems in healthcare. Designed to ensure that AI technologies are ethically sound and comprehensible to healthcare professionals, the FIXAIH framework bridges the gap between technical proficiency and clinical utility, promoting the practical application of AI for a more reliable and patient-centric approach.

    This dataset and its analysis are integral to our broader research and development strategy, focusing on multiperspective parameter discovery and the advancement of autonomous systems [2]. Our approach leverages big data, deep learning, and digital media to explore and analyze cross-sectional, multi-perspective insights, supporting improved decision-making and more effective governance frameworks. These perspectives span academic, public, industrial, and governmental domains. We have applied this approach across various fields and sectors, including AI governance [3], energy [4], education [5], healthcare [6–8], transportation [9,10], labor markets [11,12], tourism [13], service industries [14], and others.

    References
    1. doi:10.2139/SSRN.5086713
    2. doi:10.54377/95e5-08b3
    3. doi:10.3389/FNINF.2024.1472653/BIBTEX
    4. doi:10.3389/FENRG.2023.1071291
    5. doi:10.3389/FRSC.2022.871171/BIBTEX
    6. doi:10.3390/SU14063313
    7. doi:10.3390/TOXICS11030287
    8. doi:10.3390/app10041398
    9. doi:10.3390/SU14095711
    10. doi:10.3390/s21092993
    11. doi:10.3390/JOURNALMEDIA4010010
    12. doi:10.1177/00368504231213788
    13. doi:10.3390/SU15054166
    14. doi:10.3390/SU152216003

  5. MedQuAD: Medical Question-Answer Dataset

    • kaggle.com
    Updated Sep 7, 2024
    Cite
    Afroz (2024). MedQuAD: Medical Question-Answer Dataset [Dataset]. https://www.kaggle.com/datasets/pythonafroz/medquad-medical-question-answer-for-ai-research
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 7, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Afroz
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
    License information was derived automatically

    Description

    Medical Questions: Unveiling the MedQuAD Dataset

    Have you ever wondered where medical chatbots or intelligent search engines for health information get their knowledge? The answer lies in large datasets like MedQuAD! This rich resource provides a treasure trove of real-world medical questions and informative answers, paving the way for advancements in Natural Language Processing (NLP) and Information Retrieval (IR) within the healthcare domain.

    What is MedQuAD?

    MedQuAD, short for Medical Question Answering Dataset, is a collection of question-answer pairs meticulously curated from 12 trusted National Institutes of Health (NIH) websites. These websites cover a wide range of health topics, from cancer.gov to GARD (Genetic and Rare Diseases Information Resource).

    What makes MedQuAD unique?

    Beyond the sheer volume of data, MedQuAD offers unique features that empower researchers and developers:

    1. Diversity of Questions: MedQuAD encompasses a spectrum of 37 question types, ranging from treatment options and diagnosis inquiries to understanding side effects. This variety reflects the diverse needs of individuals seeking medical information.
    2. Focus on Specific Entities: MedQuAD goes beyond just questions and answers. It associates each question with the entity it focuses on, such as a disease, a drug, or another medical entity like a test. This targeted approach facilitates more focused research and NLP applications.
    3. Rich Annotations: While the answers from MedlinePlus collections are excluded due to copyright restrictions, MedQuAD retains valuable annotations within its XML files. These annotations include question type, synonyms, unique identifiers (CUI) for medical concepts, and semantic types. This additional information opens doors for more sophisticated NLP tasks.
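
    To make the annotation idea concrete, here is a minimal sketch of reading question type, focus, CUI, and semantic type from one XML record with Python's standard library. The sample document is hypothetical: its element and attribute names (Document, Focus, CUI, SemanticType, Question, qtype) are assumptions modeled on the fields listed above, not a confirmed copy of the MedQuAD schema, so check the actual files before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample record; tag and attribute names are assumptions,
# not the confirmed MedQuAD schema.
SAMPLE = """
<Document id="0000001" source="GARD">
  <Focus>Example Disease</Focus>
  <FocusAnnotations>
    <UMLS>
      <CUIs><CUI>C0000000</CUI></CUIs>
      <SemanticTypes><SemanticType>T047</SemanticType></SemanticTypes>
    </UMLS>
  </FocusAnnotations>
  <QAPairs>
    <QAPair pid="1">
      <Question qid="0000001-1" qtype="treatment">What are the treatments for Example Disease?</Question>
      <Answer>Placeholder answer text.</Answer>
    </QAPair>
  </QAPairs>
</Document>
"""


def parse_document(xml_text: str) -> dict:
    """Collect the focus entity, its annotations, and typed questions."""
    root = ET.fromstring(xml_text.strip())
    return {
        "focus": root.findtext("Focus"),
        "cuis": [e.text for e in root.iter("CUI")],
        "semantic_types": [e.text for e in root.iter("SemanticType")],
        "questions": [
            {"qtype": q.get("qtype"), "text": q.text}
            for q in root.iter("Question")
        ],
    }


doc = parse_document(SAMPLE)
print(doc["focus"], doc["cuis"], [q["qtype"] for q in doc["questions"]])
```

    Grouping questions by the qtype attribute in this way is one route to the question-type analyses the description mentions.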

    The Power of MedQuAD

    MedQuAD serves as a valuable springboard for various applications in the medical NLP and IR field. Here are some potential uses:

    1. Training Chatbots and Virtual Assistants: AI-powered medical chatbots can leverage MedQuAD to learn how to respond accurately and informatively to a wide range of health inquiries from users.
    2. Developing Intelligent Search Engines: Search engines can be enhanced to provide more relevant and accurate health information by drawing insights from the question types and focuses presented in MedQuAD.
    3. Studying User Concerns in Healthcare: Analyzing the types of questions within MedQuAD can reveal valuable insights into what information users are most interested in and what areas require clearer explanations.

    In essence, MedQuAD is a powerful tool for unlocking the potential of NLP and IR in the medical domain. By leveraging this rich dataset, researchers and developers are paving the way for a future where individuals can access accurate and comprehensive health information with increasing ease and efficiency.

    Reference:

    If you use the MedQuAD dataset or the associated QA test collection, please cite the following paper: Ben Abacha, A., & Demner-Fushman, D. (2019). A Question-Entailment Approach to Question Answering. BMC Bioinformatics, 20(1), 511. https://doi.org/10.1186/s12859-019-3119-4

  6. AHD: Arabic Healthcare Dataset

    • data.mendeley.com
    Updated Sep 4, 2024
    + more versions
    Cite
    Hezam Gawbah (2024). AHD: Arabic Healthcare Dataset [Dataset]. http://doi.org/10.17632/mgj29ndgrk.6
    Explore at:
    Dataset updated
    Sep 4, 2024
    Authors
    Hezam Gawbah
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description
    • Numerous language-centric studies on healthcare are conducted every day. To address the shortcomings of Arabic natural language generation models, we introduce a large Arabic Healthcare Dataset of textual data, which we named ‘AHD’.
    • To our knowledge, AHD is the largest Arabic healthcare dataset; it was collected from the altibbi website.

    • The AHD consists of more than 808k question-and-answer pairs in 90 categories. The AHD contains one file of actual data, which is in Arabic; the files are described here.

      • AHD.xlsx contains the dataset in Excel format, including the question, answer, and category in Arabic.

      • AHD_english.xlsx contains the dataset in Excel format, including the question, answer, and category translated into English.

    • Distribution of Question and Answer per category.xlsx shows the distribution of the dataset by category.

  7. IoT Healthcare Security Dataset

    • ieee-dataport.org
    • outspacevarieties.store
    Updated Aug 16, 2021
    Cite
    Faisal Hussain (2021). IoT Healthcare Security Dataset [Dataset]. http://doi.org/10.21227/9w13-2t13
    Explore at:
    Dataset updated
    Aug 16, 2021
    Dataset provided by
    IEEE Dataport
    Authors
    Faisal Hussain
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The Internet of Things (IoT) has emerged as a topic of intense interest in the research and industrial communities because of its revolutionary impact on human life. The rapid growth of IoT technology has introduced the concepts of smart devices, smart healthcare, smart industry, smart cities, and smart grids, among others. The security of IoT devices has become a serious concern, especially in the healthcare domain, where recent attacks exposed damaging IoT security vulnerabilities. Traditional network security solutions are well established. However, because IoT devices are resource-constrained and IoT protocols behave distinctly, existing security mechanisms cannot be deployed directly to secure IoT devices and networks against cyber-attacks. To enhance IoT security, researchers need IoT-specific tools, methods, and datasets. To address this problem, we provide a framework for developing IoT context-aware security solutions to detect malicious traffic in IoT use cases. The proposed framework includes a newly created, open-source IoT data generator tool named IoT-Flock. IoT-Flock allows researchers to develop an IoT use case comprising both normal and malicious IoT devices and to generate traffic. Additionally, the framework provides an open-source utility for converting the captured traffic generated by IoT-Flock into an IoT dataset. Using the proposed framework, we first generated an IoT healthcare dataset that comprises both normal and IoT attack traffic. Afterwards, we applied different machine learning techniques to the generated dataset to detect cyber-attacks and protect the healthcare system. The proposed framework will help in developing context-aware IoT security solutions, especially for a sensitive use case like the IoT healthcare environment.

  8. Skin Cancer Binary Classification Dataset.

    • gts.ai
    json
    Updated Nov 20, 2023
    Cite
    GTS (2023). Skin Cancer Binary Classification Dataset. [Dataset]. https://gts.ai/dataset-download/skin-cancer-binary-dataset-ai-data-collection-company/
    Explore at:
    Available download formats: json
    Dataset updated
    Nov 20, 2023
    Dataset provided by
    GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
    Authors
    GTS
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    A skin cancer binary classification dataset is a collection of images or data related to the detection and classification of skin lesions as either benign (non-cancerous) or malignant (cancerous).

  9. Transcribed Medical Records datasets for Machine Learning

    • lb.shaip.com
    • nl.shaip.com
    • +74 more
    json
    Updated Dec 24, 2024
    Cite
    Shaip (2024). Transcribed Medical Records datasets for Machine Learning [Dataset]. https://lb.shaip.com/offerings/transcribed-medical-records-medical-data-catalog/
    Explore at:
    Available download formats: json
    Dataset updated
    Dec 24, 2024
    Dataset authored and provided by
    Shaip
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Transcription of 257,977 hours of real-world physician dictation from 31 specialties to train healthcare speech models. Transcribed medical records refer to transcriptions of physician-patient conversations, medical reports, and medical assessments. They help map the patient's medical history for future visits and act as a reference point for doctors. They also help the doctor evaluate the patient's present condition and suggest a suitable treatment.

  10. FileMarket | 20,000 pictures | Object Detection Data | AI Training Data |...

    • datarade.ai
    Updated Jun 28, 2024
    Cite
    FileMarket (2024). FileMarket | 20,000 pictures | Object Detection Data | AI Training Data | Deep Learning (DL) Data| Gesture Recognition / Machine Learning (ML) Data [Dataset]. https://datarade.ai/data-products/filemarket-ai-training-data-gesture-recognition-machine-filemarket
    Explore at:
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Jun 28, 2024
    Dataset authored and provided by
    FileMarket
    Area covered
    French Polynesia, South Africa, State of, Switzerland, Equatorial Guinea, Bhutan, Korea (Democratic People's Republic of), Moldova (Republic of), Cook Islands, Peru
    Description

    FileMarket offers premium Machine Learning (ML) Data tailored for gesture recognition and various AI applications. Our globally sourced datasets are meticulously curated to ensure high quality and accuracy, providing a solid foundation for training robust and reliable ML models. In addition to ML data, we also specialize in Object Detection Data, Medical Imaging Data, Large Language Model (LLM) Data, and Deep Learning (DL) Data. Each category is crafted with precision to meet the diverse needs of cutting-edge AI and machine learning projects.

    Use cases of our Machine Learning (ML) Data:

    • Gesture recognition
    • Computer vision
    • Natural language processing (NLP)
    • Predictive analysis
    • Autonomous systems

    Why work with our data:

    • Global coverage: Our datasets are sourced from a worldwide network, ensuring diversity and inclusiveness.
    • Scalability: We offer scalable solutions that grow with your project.
    • Customization: Datasets can be tailored to fit your specific requirements, whether it’s for ML, object detection, medical imaging, or any other AI application.
    • Enhance model performance: High-quality data that boosts the reliability and accuracy of your models.
    • Versatility: Our datasets are applicable across various domains, from healthcare to autonomous systems.

    Empower your AI projects with FileMarket’s top-tier Machine Learning (ML) Data, Object Detection Data, Medical Imaging Data, Large Language Model (LLM) Data, and Deep Learning (DL) Data.

  11. Table_2_Applying machine-learning to rapidly analyze large qualitative text...

    • figshare.com
    docx
    Updated Oct 31, 2023
    + more versions
    Cite
    Lauren Towler; Paulina Bondaronek; Trisevgeni Papakonstantinou; Richard Amlôt; Tim Chadborn; Ben Ainsworth; Lucy Yardley (2023). Table_2_Applying machine-learning to rapidly analyze large qualitative text datasets to inform the COVID-19 pandemic response: comparing human and machine-assisted topic analysis techniques.DOCX [Dataset]. http://doi.org/10.3389/fpubh.2023.1268223.s002
    Explore at:
    Available download formats: docx
    Dataset updated
    Oct 31, 2023
    Dataset provided by
    Frontiers
    Authors
    Lauren Towler; Paulina Bondaronek; Trisevgeni Papakonstantinou; Richard Amlôt; Tim Chadborn; Ben Ainsworth; Lucy Yardley
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Introduction: Machine-assisted topic analysis (MATA) uses artificial intelligence methods to help qualitative researchers analyze large datasets. This is useful for researchers to rapidly update healthcare interventions during changing healthcare contexts, such as a pandemic. We examined the potential to support healthcare interventions by comparing MATA with “human-only” thematic analysis techniques on the same dataset (1,472 user responses from a COVID-19 behavioral intervention).

    Methods: In MATA, an unsupervised topic-modeling approach identified latent topics in the text, from which researchers identified broad themes. In human-only codebook analysis, researchers developed an initial codebook based on previous research that was applied to the dataset by the team, who met regularly to discuss and refine the codes. Formal triangulation using a “convergence coding matrix” compared findings between methods, categorizing them as “agreement”, “complementary”, “dissonant”, or “silent”.

    Results: Human analysis took much longer than MATA (147.5 vs. 40 h). Both methods identified key themes about what users found helpful and unhelpful. Formal triangulation showed both sets of findings were highly similar. All MATA codes were classified as in agreement or complementary to the human themes. When findings differed slightly, this was due to human researcher interpretations or nuance from human-only analysis.

    Discussion: Results produced by MATA were similar to human-only thematic analysis, with substantial time savings. For simple analyses that do not require an in-depth or subtle understanding of the data, MATA is a useful tool that can support qualitative researchers to interpret and analyze large datasets quickly. This approach can support intervention development and implementation, such as enabling rapid optimization during public health emergencies.
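
    The reported analysis times (147.5 h for human-only analysis versus 40 h for MATA) can be turned into a relative saving with a few lines of arithmetic. The sketch below uses only the two figures quoted in the abstract; the variable names are illustrative.

```python
# Analysis times reported in the abstract, in hours.
human_hours = 147.5
mata_hours = 40.0

hours_saved = human_hours - mata_hours       # absolute saving in hours
fraction_saved = hours_saved / human_hours   # share of human-only time avoided
speedup = human_hours / mata_hours           # how many times faster MATA was

print(f"{hours_saved:.1f} h saved, {fraction_saved:.1%} less time, {speedup:.1f}x faster")
```

    By these figures, MATA took a little over a quarter of the human-only analysis time.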

  12. A multimodal dental dataset facilitating machine learning research and...

    • physionet.org
    Updated Oct 11, 2024
    + more versions
    Cite
    Wenjing Liu; Yunyou Huang; Suqin Tang (2024). A multimodal dental dataset facilitating machine learning research and clinic services [Dataset]. http://doi.org/10.13026/h1tt-fc69
    Explore at:
    Dataset updated
    Oct 11, 2024
    Authors
    Wenjing Liu; Yunyou Huang; Suqin Tang
    License

    https://github.com/MIT-LCP/license-and-dua/tree/master/drafts

    Description

    Oral diseases affect nearly 3.5 billion people, with the majority residing in low- and middle-income countries. Due to limited healthcare resources, many individuals are unable to access proper oral healthcare services. Image-based machine learning technology is one of the most promising approaches to improving oral healthcare services and reducing patient costs. Openly accessible datasets play a crucial role in facilitating the development of machine learning techniques. However, existing dental datasets have limitations such as a scarcity of Cone Beam Computed Tomography (CBCT) data, lack of matched multi-modal data, and insufficient complexity and diversity of the data. This project addresses these challenges by providing a dataset that includes 329 CBCT images from 169 patients, multi-modal data with matching modalities, and images representing various oral health conditions.

  13. Medical Imaging Datasets for Multimodal Disease Detection and Diagnosis...

    • ieee-dataport.org
    Updated Jan 24, 2025
    Cite
    Nur Rusyidah Azri (2025). Medical Imaging Datasets for Multimodal Disease Detection and Diagnosis Research [Dataset]. http://doi.org/10.21227/kebq-7h82
    Explore at:
    Dataset updated
    Jan 24, 2025
    Dataset provided by
    IEEE Dataport
    Authors
    Nur Rusyidah Azri
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This collection of medical image datasets is a valuable resource for anyone involved in medical imaging and disease research. It includes a variety of images from different medical fields, all designed to support research in diagnosis and treatment. The datasets cover chest CT-scans, lung radiography, brain MRI, retinal imaging, and gastrointestinal tract imaging. The chest CT-scan dataset includes 867 images of normal lungs and three types of lung cancer—adenocarcinoma, large cell carcinoma, and squamous cell carcinoma—providing essential data for understanding lung cancer. The lung radiography dataset offers 1,198 X-ray images that include normal lungs and conditions like COVID-19, lung opacity, and viral pneumonia, making it useful for comparing COVID-19 with other respiratory diseases. The brain MRI dataset contains 253 scans of both normal brains and those with tumors, ideal for studying brain tumor detection. The retinal imaging dataset features 2,757 images covering normal retinas and seven types of retinal conditions, such as diabetic retinopathy and glaucoma, offering a comprehensive resource for eye disease research. Lastly, the gastrointestinal tract dataset contains 2,000 images of normal and diseased colons, including conditions like esophagitis, polyps, and ulcerative colitis, important for studying gastrointestinal health. Each dataset is carefully curated and resampled to ensure accuracy and prevent data leakage, making them reliable tools for research and machine learning.

  14. MID: Medicines Information Dataset

    • data.mendeley.com
    Updated Nov 26, 2024
    + more versions
    Cite
    Hezam Gawbah (2024). MID: Medicines Information Dataset [Dataset]. http://doi.org/10.17632/2vk5khfn6v.3
    Explore at:
    Dataset updated
    Nov 26, 2024
    Authors
    Hezam Gawbah
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Numerous studies on medicines are conducted every day. To address the shortcomings of models for generating, predicting, and classifying medicines information, the authors introduce a large medicines information dataset of textual data, which they named ‘MID’.

    • Value of the data
      • The dataset comprises extensive medicines information, featuring over 192k rows distributed across 22 diverse therapeutic classes.
      • The dataset can support the classification of therapeutic classes and is robust for the prediction and generation of medicines information, such as indications or interactions. This can improve efficiency in clinical trial management and facilitate a detailed analysis of the risks affecting participants in clinical trials.
      • The dataset includes the name, link, contains, introduction, uses, benefits, side effects, how to use, how the drug works, quick tips, chemical class, habit forming, therapeutic class, action class, safety advice to alcohol, safety advice to pregnancy, safety advice to breastfeeding, safety advice to driving, safety advice to kidney, and safety advice to the liver.
      • The dataset is big data, making it a suitable corpus for implementing both classical and deep learning models.
      • The dataset provides a useful resource for medical researchers, healthcare professionals, drug manufacturers, data scientists, and enthusiasts interested in exploring medicines and healthcare products for preclinical drug development and design.

    • MID.xlsx provides the raw data, including medicine information. The data was collected to accelerate work on medicines and save experimental effort by helping to predict, generate, or classify medicine information preclinically.

    • Therapeutic_class_counts.xlsx summarizes the distribution of medicines per therapeutic class.

  15. Reproducibility in Machine Learning and Healthcare Paper Annotation Datasets...

    • zenodo.org
    csv
    Updated Mar 3, 2021
    Cite
    Matthew B.A. McDermott; Wang; Marinsek; Ranganath; Foschini; Ghassemi (2021). Reproducibility in Machine Learning and Healthcare Paper Annotation Datasets [Dataset]. http://doi.org/10.5281/zenodo.4574378
    Explore at:
    Available download formats: csv
    Dataset updated
    Mar 3, 2021
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Matthew B.A. McDermott; Wang; Marinsek; Ranganath; Foschini; Ghassemi
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Paper Annotations for an extended version of https://arxiv.org/abs/1907.01463

    Artificial intelligence (AI) and machine learning (ML) for healthcare (ML4H) must be reproducible for reliable clinical use. We evaluate over 200 ML4H research papers and find that health compares poorly to other application areas for AI and ML, particularly concerning data and code accessibility, and propose recommendations for reproducible research.

  16. Data from: Prediction model in medical science and health care

    • data.mendeley.com
    Updated May 16, 2023
    Cite
    Jan Horas Veryady Purba (2023). Prediction model in medical science and health care [Dataset]. http://doi.org/10.17632/8cnj532383.1
    Explore at:
    Dataset updated
    May 16, 2023
    Authors
    Jan Horas Veryady Purba
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This research concerns health technology that assists decision-making and extracts useful knowledge. Specialists in the field apply techniques such as predictive analytics, learning algorithms, and machine learning. In medical science, a prediction model is used to determine the risk of developing a disease, which can enable early treatment or prevention.

  17. Deep Learning in Healthcare Market Research Report 2032

    • dataintelo.com
    csv, pdf, pptx
    Updated Jan 7, 2025
    Cite
    Dataintelo (2025). Deep Learning in Healthcare Market Research Report 2032 [Dataset]. https://dataintelo.com/report/global-deep-learning-in-healthcare-market
    Explore at:
    Available download formats: pdf, csv, pptx
    Dataset updated
    Jan 7, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Deep Learning in Healthcare Market Outlook



    The global deep learning in healthcare market size was valued at approximately $2.8 billion in 2023 and is projected to reach around $13.7 billion by 2032, growing at a robust compound annual growth rate (CAGR) of 19.4% during the forecast period. The rapid integration of artificial intelligence (AI) and machine learning technologies in healthcare systems, alongside advancements in computational power and data availability, are significant growth drivers for the market.
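
As a rough check on the figures quoted above, the compound annual growth rate can be recomputed from the start and end values. This is only a sketch: it assumes nine annual compounding periods between 2023 and 2032, which the report does not state, so the result may differ slightly from the quoted 19.4%.

```python
# Figures quoted in the description above (USD billions).
start_value = 2.8   # 2023 market size
end_value = 13.7    # projected 2032 market size

# Assumption: one compounding period per year from 2023 to 2032.
periods = 2032 - 2023

# CAGR = (end / start) ** (1 / periods) - 1
cagr = (end_value / start_value) ** (1 / periods) - 1
print(f"Implied CAGR: {cagr:.1%}")
```

Under that assumption the implied rate lands a little above 19%, in line with the quoted figure. A report that counts periods from a 2024 base year would arrive at a different number.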



    One of the primary growth factors for the deep learning in healthcare market is the increasing demand for efficient and accurate diagnostic tools. Deep learning algorithms have demonstrated superior performance in interpreting medical images, detecting anomalies, and predicting outcomes compared to traditional methods. This has led to widespread adoption in medical imaging, significantly enhancing diagnostic precision and reducing the burden on healthcare professionals. The ever-increasing volume of healthcare data, coupled with the need for quick and accurate decision-making, further propels the market forward. By leveraging large datasets, deep learning can achieve a level of precision and speed unattainable by human capabilities alone.



    Another significant driver is the growing emphasis on personalized medicine. Deep learning enables the analysis of complex biological data, aiding in the development of personalized treatment plans tailored to individual patient profiles. This shift towards precision medicine is transforming patient care, allowing for more effective treatment protocols and better patient outcomes. The pharmaceutical industry, in particular, is investing heavily in deep learning technologies to expedite drug discovery and development processes, thereby reducing time-to-market and costs associated with bringing new drugs to consumers.



    The adoption of electronic health records (EHRs) and the integration of AI in healthcare administration are also crucial growth factors. Deep learning algorithms can process vast amounts of patient data stored in EHRs to identify patterns and predict disease outbreaks, optimize resource allocation, and enhance patient management. The demand for streamlined operations and improved patient care is driving healthcare providers to incorporate these advanced technologies. Furthermore, the ongoing advancements in computational power and the availability of high-quality healthcare datasets are crucial enablers for the application of deep learning technologies in various healthcare domains.



    Computer Vision in Healthcare is revolutionizing the way medical professionals approach diagnostics and treatment planning. By leveraging advanced image processing algorithms, computer vision can analyze medical images with remarkable accuracy, identifying patterns and anomalies that might be missed by the human eye. This technology is not only enhancing the precision of medical imaging but also enabling the development of automated systems that assist radiologists in interpreting complex datasets. The integration of computer vision in healthcare is streamlining workflows, reducing diagnostic errors, and ultimately improving patient outcomes. As the technology continues to evolve, its applications are expanding beyond imaging to include areas such as surgery, pathology, and patient monitoring, offering a comprehensive toolset for modern healthcare delivery.



    On the regional front, North America holds the largest share of the deep learning in healthcare market, driven by substantial investments in AI technology, well-established healthcare infrastructure, and supportive government initiatives. The region's focus on technological innovation and its robust research ecosystem are key factors contributing to market growth. Moreover, the presence of leading AI and healthcare companies in North America accelerates the adoption of deep learning technologies. Europe and Asia Pacific are also witnessing significant growth, with the latter expected to exhibit the highest CAGR during the forecast period due to increasing healthcare digitization and rising investments in AI-driven healthcare solutions.



    Component Analysis



    The deep learning in healthcare market is segmented by component into software, hardware, and services. The software segment is anticipated to dominate the market owing to continuous advancements in AI algorithms and the development of sophisticated software solutions tailored for healthcar

  18. Medical Students Dataset

    • kaggle.com
    Updated Jul 2, 2023
    Cite
    Salem S. (2023). Medical Students Dataset [Dataset]. https://www.kaggle.com/datasets/slmsshk/medical-students-dataset
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 2, 2023
    Dataset provided by
    Kaggle http://kaggle.com/
    Authors
    Salem S.
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Medical Student Dataset

    The Medical Student Dataset is a simulated dataset containing 100,000 rows and 12 columns. The dataset is designed to mimic real-world data commonly encountered in medical education and research. It includes various preprocessing issues commonly observed in data, such as missing values, duplicates, and inconsistencies.

    Dataset Description

    The dataset consists of the following columns:

    1. StudentID: Unique identifier for each medical student.
    2. Gender: Gender of the student (e.g., Male, Female).
    3. Age: Age of the student in years.
    4. Ethnicity: Ethnicity of the student.
    5. Year: Academic year of the student.
    6. University: Name of the university where the student is enrolled.
    7. GPA: Grade Point Average of the student.
    8. MCAT Score: Medical College Admission Test (MCAT) score of the student.
    9. Clinical Experience: Indicator of whether the student has previous clinical experience (Yes/No).
    10. Research Experience: Indicator of whether the student has previous research experience (Yes/No).
    11. Publication Count: Number of publications attributed to the student.
    12. Exam Score: Performance score on a standardized medical examination.

    Data Preprocessing Issues

    The dataset has been intentionally created to include various preprocessing issues, such as:

    • Missing values: Some columns may have missing values represented as NaN.
    • Duplicates: Duplicate records may exist in the dataset, representing identical student entries.
    • Inconsistencies: The dataset may contain inconsistent or erroneous values in certain columns.

    Data Usage

    This dataset can be used for various purposes, including data cleaning and preprocessing exercises, exploring data analysis techniques, and evaluating machine learning algorithms. It provides an opportunity to practice handling real-world data challenges often encountered in the field of medical education and research.
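
The cleaning exercise described above can be sketched with the Python standard library alone. The sample rows below are invented for illustration, and dropping incomplete rows is just one possible strategy (imputation is another); only the column names come from the dataset description.

```python
import csv
import io

# Hypothetical sample rows; the values are made up for this sketch.
SAMPLE = """StudentID,Age,GPA
1,24,3.6
2,,3.1
1,24,3.6
3,27,NaN
"""

def clean(rows):
    """Drop exact duplicate rows and rows with a missing value."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row.items())
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        # Treat empty strings and the literal "NaN" as missing.
        if any(value.strip() in ("", "NaN") for value in row.values()):
            continue
        kept.append(row)
    return kept

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
cleaned = clean(rows)
print(len(rows), "rows read,", len(cleaned), "rows kept")
```

Inconsistent values (for example, an out-of-range GPA) would need column-specific rules that the description does not spell out, so they are left out of this sketch.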

  19. Machine Learning in Medicine Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Mar 14, 2025
    Cite
    AMA Research & Media LLP (2025). Machine Learning in Medicine Report [Dataset]. https://www.archivemarketresearch.com/reports/machine-learning-in-medicine-57296
    Explore at:
    Available download formats: pdf, ppt, doc
    Dataset updated
    Mar 14, 2025
    Dataset provided by
    AMA Research & Media LLP
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global Machine Learning in Medicine market is experiencing robust growth, projected to reach $[Estimated 2025 Market Size in Millions] in 2025 and expand at a Compound Annual Growth Rate (CAGR) of 5% from 2025 to 2033. This significant expansion is fueled by several key drivers. The increasing availability of large, high-quality medical datasets, coupled with advancements in computing power and algorithm development, is enabling the creation of sophisticated machine learning models capable of enhancing diagnostic accuracy, accelerating drug discovery, and personalizing patient care. Furthermore, the rising prevalence of chronic diseases and the increasing demand for efficient and cost-effective healthcare solutions are bolstering the adoption of machine learning across various medical applications. Key trends within the market include the growing integration of AI-powered diagnostic tools, the rise of federated learning for protecting patient privacy while leveraging diverse datasets, and the expansion of machine learning applications into areas like personalized medicine and preventive healthcare. While data privacy and regulatory concerns pose challenges, the transformative potential of machine learning in improving healthcare outcomes is driving significant investment and innovation in this rapidly evolving market. The market segmentation reveals a strong focus on supervised learning techniques due to their effectiveness in tackling specific medical problems with labeled data. However, unsupervised learning and reinforcement learning are gaining traction, offering the potential for identifying novel patterns and optimizing treatment strategies, respectively. Application-wise, diagnosis and drug discovery currently lead the market, although other applications, including predictive modeling for risk assessment and personalized treatment plans, are showing considerable promise. 
Leading companies like Google, BioBeats, Jvion, and others are actively shaping the market landscape through their advanced technologies and strategic partnerships. Geographical distribution shows strong growth in North America and Europe, driven by advanced healthcare infrastructure and regulatory frameworks. However, emerging markets in Asia-Pacific are rapidly gaining ground due to increasing healthcare investment and a rising prevalence of diseases. The forecast period suggests continued expansion, particularly driven by the ongoing improvements in AI algorithms and the wider adoption across healthcare settings. We anticipate substantial growth across all segments driven by technological breakthroughs and a growing awareness of the clinical benefits.

  20. Uganda HealthCare Facilities Features and Services

    • ieee-dataport.org
    Updated Dec 19, 2024
    + more versions
    Cite
    Rashid Kisejjere (2024). Uganda HealthCare Facilities Features and Services [Dataset]. http://doi.org/10.21227/q51p-v543
    Explore at:
    Dataset updated
    Dec 19, 2024
    Dataset provided by
    IEEE Dataport
    Authors
    Rashid Kisejjere
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Uganda
    Description

    All the healthcare facilities in this dataset were collected from the MOH 2018 list of Uganda healthcare facilities (https://library.health.go.ug/sites/default/files/resources/National%20Health%20Facility%20MasterLlist%202017.pdf). Additional features were scraped using the Google Maps API and from some of the websites of the healthcare facilities themselves. PLEASE NOTE: The data collected isn't verified and might be completely outdated, but nonetheless, for research purposes, this dataset can help in understanding recommendation systems. This dataset was funded by Pollicy; more about Pollicy can be found here: https://pollicy.org/

Cite
Srestha Jain (2025). Medical Diagnosis Dataset [Dataset]. https://www.kaggle.com/datasets/sresthajain/medical-diagnosis-dataset/discussion
Medical Diagnosis Dataset

Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Feb 7, 2025
Dataset provided by
Kaggle http://kaggle.com/
Authors
Srestha Jain
License

Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically

Description

Context: This synthetic healthcare dataset has been created to serve as a valuable resource for data science, machine learning, and data analysis enthusiasts. It is designed to mimic real-world healthcare data, enabling users to practice, develop, and showcase their data manipulation and analysis skills in the context of the healthcare industry.

Inspiration: The inspiration behind this dataset is rooted in the need for practical and diverse healthcare data for educational and research purposes. Healthcare data is often sensitive and subject to privacy regulations, making it challenging to access for learning and experimentation. To address this gap, I have leveraged Python's Faker library to generate a dataset that mirrors the structure and attributes commonly found in healthcare records. By providing this synthetic data, I hope to foster innovation, learning, and knowledge sharing in the healthcare analytics domain.

Dataset Information: Each column provides specific information about the patient, their admission, and the healthcare services provided, making this dataset suitable for various data analysis and modeling tasks in the healthcare domain. Here's a brief explanation of each column in the dataset -

• Name: This column represents the name of the patient associated with the healthcare record.
• Age: The age of the patient at the time of admission, expressed in years.
• Gender: Indicates the gender of the patient, either "Male" or "Female."
• Blood Type: The patient's blood type, which can be one of the common blood types (e.g., "A+", "O-", etc.).
• Medical Condition: This column specifies the primary medical condition or diagnosis associated with the patient, such as "Diabetes," "Hypertension," "Asthma," and more.
• Date of Admission: The date on which the patient was admitted to the healthcare facility.
• Doctor: The name of the doctor responsible for the patient's care during their admission.
• Hospital: Identifies the healthcare facility or hospital where the patient was admitted.
• Insurance Provider: This column indicates the patient's insurance provider, which can be one of several options, including "Aetna," "Blue Cross," "Cigna," "UnitedHealthcare," and "Medicare."
• Billing Amount: The amount of money billed for the patient's healthcare services during their admission. This is expressed as a floating-point number.
• Room Number: The room number where the patient was accommodated during their admission.
• Admission Type: Specifies the type of admission, which can be "Emergency," "Elective," or "Urgent," reflecting the circumstances of the admission.
• Discharge Date: The date on which the patient was discharged from the healthcare facility, based on the admission date and a random number of days within a realistic range.
• Medication: Identifies a medication prescribed or administered to the patient during their admission. Examples include "Aspirin," "Ibuprofen," "Penicillin," "Paracetamol," and "Lipitor."
• Test Results: Describes the results of a medical test conducted during the patient's admission. Possible values include "Normal," "Abnormal," or "Inconclusive," indicating the outcome of the test.

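
The Discharge Date column is described as the admission date plus a random number of days. A minimal sketch of that idea follows, using only the standard library rather than Faker. The 1 to 30 day range and the helper name `make_discharge_date` are assumptions for illustration; the description does not give the actual range or code.

```python
import random
from datetime import date, timedelta

def make_discharge_date(admission, rng, max_stay_days=30):
    """Return the admission date plus a random stay of 1..max_stay_days days."""
    return admission + timedelta(days=rng.randint(1, max_stay_days))

rng = random.Random(0)          # fixed seed so the sketch is repeatable
admission = date(2024, 1, 15)   # hypothetical admission date
discharge = make_discharge_date(admission, rng)

# Length of stay is a natural derived feature for analysis.
length_of_stay = (discharge - admission).days
print(admission, discharge, length_of_stay)
```
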
Usage Scenarios: This dataset can be utilized for a wide range of purposes, including:

• Developing and testing healthcare predictive models.
• Practicing data cleaning, transformation, and analysis techniques.
• Creating data visualizations to gain insights into healthcare trends.
• Learning and teaching data science and machine learning concepts in a healthcare context.
• You can treat it as a Multi-Class Classification Problem and solve it for Test Results, which contains 3 categories (Normal, Abnormal, and Inconclusive).

Acknowledgments: I acknowledge the importance of healthcare data privacy and security and emphasize that this dataset is entirely synthetic. It does not contain any real patient information or violate any privacy regulations. I hope that this dataset contributes to the advancement of data science and healthcare analytics and inspires new ideas. Feel free to explore, analyze, and share your findings with the Kaggle community.
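
For the three-class Test Results problem mentioned above, a majority-class baseline gives a reference accuracy that any trained classifier should beat. The labels below are hypothetical stand-ins; in practice they would be read from the Test Results column.

```python
from collections import Counter

# Hypothetical labels standing in for the "Test Results" column.
labels = ["Normal", "Abnormal", "Inconclusive", "Normal", "Abnormal", "Normal"]

# Always predicting the most frequent class is the simplest baseline.
counts = Counter(labels)
majority_label, majority_count = counts.most_common(1)[0]
baseline_accuracy = majority_count / len(labels)
print(majority_label, round(baseline_accuracy, 2))
```

If the three classes are close to evenly balanced, this baseline sits near one third, so reported accuracies are best read against that floor.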