24 datasets found
  1. Disease and symptoms dataset 2023

    • data.mendeley.com
    Updated Mar 3, 2025
    Cite
    Bran Stark (2025). Disease and symptoms dataset 2023 [Dataset]. http://doi.org/10.17632/2cxccsxydc.1
    Dataset updated
    Mar 3, 2025
    Authors
    Bran Stark
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains disease names along with the symptoms experienced by each patient. There are 773 unique diseases and 377 symptoms, with ~246,000 rows. The dataset was artificially generated, preserving Symptom Severity and Disease Occurrence Possibility. Several distinct groups of symptoms may all indicate the same disease. A row may even contain a single symptom for a disease, which indicates a very high correlation between that symptom and the disease. A larger number of rows for a particular disease corresponds to a higher probability of its occurrence in the real world. Similarly, a row whose feature vector contains a single symptom implies that this symptom is more strongly correlated with the disease than any one symptom in a multi-symptom feature vector from another sample.
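
    The description says that row counts reflect how likely a disease is to occur. A minimal Python sketch of that idea follows; the `(disease, symptoms)` row layout, the function name `disease_priors`, and the toy rows are illustrative assumptions, not the dataset's actual schema.

```python
from collections import Counter


def disease_priors(rows):
    """Estimate P(disease) as the share of rows labelled with each disease."""
    counts = Counter(disease for disease, _symptoms in rows)
    total = sum(counts.values())
    return {disease: n / total for disease, n in counts.items()}


# Hypothetical toy rows: (disease, set of symptoms present in that row).
rows = [
    ("influenza", {"fever", "cough"}),
    ("influenza", {"fever"}),
    ("influenza", {"cough", "fatigue"}),
    ("migraine", {"headache"}),
]
priors = disease_priors(rows)
print(priors)
```

    On the real file the same counting would run over the disease label column, whatever it is named there.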

  2. Diseases_Dataset

    • huggingface.co
    Updated Jun 1, 2025
    Cite
    Kamruzzaman Asif (2025). Diseases_Dataset [Dataset]. https://huggingface.co/datasets/kamruzzaman-asif/Diseases_Dataset
    Dataset updated
    Jun 1, 2025
    Authors
    Kamruzzaman Asif
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    🩺 Diseases Dataset

    A consolidated medical dataset combining disease names, symptoms, and treatments collected from multiple public datasets across Hugging Face and Kaggle. This dataset can be used for building disease prediction, symptom clustering, and medical assistant models.

    📦 Dataset Summary

    Field        Type     Description
    Disease      string   Name of the disease or condition
    Symptoms     string   List of symptoms or description of symptoms
    Treatments   string   (Optional)…

    See the full description on the dataset page: https://huggingface.co/datasets/kamruzzaman-asif/Diseases_Dataset.

  3. Synthetic Respiratory Symptoms Dataset

    • opendatabay.com
    .csv
    Updated Jun 3, 2025
    Cite
    Opendatabay Labs (2025). Synthetic Respiratory Symptoms Dataset [Dataset]. https://www.opendatabay.com/data/synthetic/38f90264-2312-43c5-9666-e1b18e7ce7e2
    Available download formats: .csv
    Dataset updated
    Jun 3, 2025
    Dataset authored and provided by
    Opendatabay Labs
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Patient Health Records & Digital Health
    Description

    The Synthetic Respiratory Symptoms Dataset is created for educational and research use to analyze associations between respiratory symptoms, underlying diseases, treatment types, and the severity of the condition. This synthetic dataset helps simulate real-world scenarios involving respiratory illnesses while ensuring privacy compliance through full anonymization.

    Dataset Features

    • Symptoms: Descriptions of reported respiratory symptoms (e.g., yellow cough, chest congestion).
    • Age: Age of the individual (in years).
    • Sex: Biological sex of the individual (Male/Female).
    • Disease: Diagnosed respiratory condition (e.g., Mesothelioma, bronchitis).
    • Treatment: Medical intervention applied (e.g., Chemotherapy, Antibiotic, or none).
    • Nature: Severity of the case (e.g., low, medium, high).

    Distribution

    Figure: Synthetic Respiratory Symptoms Dataset Distribution by Disease Count (https://storage.googleapis.com/opendatabay_public/38f90264-2312-43c5-9666-e1b18e7ce7e2/4fb39ac546e8_disease_count.png)

    Figure: Synthetic Respiratory Symptoms Dataset Distribution by Nature (https://storage.googleapis.com/opendatabay_public/38f90264-2312-43c5-9666-e1b18e7ce7e2/9b0990ccf2df_age_by_nature.png)

    Usage

    This dataset can be used for the following applications:

    • Respiratory Disease Research: Study the correlation between symptoms, severity, and diagnosed diseases in synthetic patients.
    • Predictive Modelling: Train machine learning models to predict disease types or severity based on symptoms and demographic data.
    • Clinical Insights: Analyse how different symptoms align with specific respiratory conditions and the common treatments used.
    • Educational Purposes: Serve as a teaching tool for learners and educators in public health, respiratory medicine, and medical data science.

    Coverage

    This synthetic dataset is fully anonymised and complies with modern data privacy standards. It incorporates a diverse array of symptom profiles, diseases, and treatments relevant to respiratory illnesses, enabling a broad range of analytical and educational applications.

    License

    CC0 (Public Domain)

    Who Can Use It

    • Medical Researchers: To investigate symptom-disease-treatment relationships in respiratory medicine.
    • Pulmonologists and Healthcare Professionals: To explore treatment trends and symptom patterns in respiratory conditions.
    • Data Scientists and Machine Learning Engineers: To build and evaluate predictive models for diagnosis or severity classification.
    • Educators and Students: As a practice dataset for courses in health data analytics and medical informatics.
  4. Healthcare Data Industry Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Feb 24, 2025
    Cite
    Data Insights Market (2025). Healthcare Data Industry Report [Dataset]. https://www.datainsightsmarket.com/reports/healthcare-data-industry-8463
    Available download formats: doc, pdf, ppt
    Dataset updated
    Feb 24, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Healthcare Data Industry market was valued at USD XX Million in 2023 and is projected to reach USD XXX Million by 2032, with an expected CAGR of 16.20% during the forecast period.

    Healthcare data is all the information created or gathered in the healthcare industry: patient records, electronic health records (EHRs), genomic data, health insurance claims, medical images, and clinical trial data. This data underpins modern healthcare and supports many critical applications.

    First, health data improves patient care. Analysing patterns in patient records helps providers diagnose diseases accurately and apply personalized treatment plans. Medical images such as X-rays and MRIs help detect abnormalities and guide surgery. Genomic data reveals genetic susceptibility, which enables customised treatment plans for diseases such as cancer.

    Second, health data is crucial for research and the development of new medical knowledge. Researchers use large datasets to study the epidemiology of diseases, develop new drugs and treatments, and evaluate the effectiveness of healthcare programs. For instance, clinical trial data provides evidence on the safety and efficacy of new treatments, and health insurance claims data can reveal healthcare utilization patterns and areas in need of improvement.

    Third, healthcare data supports the administrative and operational functions of healthcare organizations. EHRs simplify the maintenance of patient data, enable sound communication among providers, and minimize errors. Analytics on insurance claims support billing and reimbursement so that providers are paid correctly for the services they render. Analytics can also optimize resource utilization, identify potential cost savings, and make organizations more efficient overall.

    Healthcare information is therefore a valuable asset that drives innovation, promotes better patient outcomes, and supports the coherent functioning of the healthcare system. Used effectively by providers, researchers, and administrators, it can improve the quality and efficiency of care delivery and the health of individuals and communities.

    Recent developments include:

    • March 2022: Microsoft launched Azure Health Data Services in the United States. It is a platform as a service (PaaS) offering designed exclusively to support protected health information (PHI) in the cloud.
    • March 2022: The government of Thailand launched a big data portal for healthcare facilities. The National Reforms Committee on Public Health recently joined hands with 12 government agencies to improve the quality of healthcare services by implementing digital technologies.

    Key drivers for this market are: Increase in Demand for Analytics Solutions for Population Health Management; Rise in Need for Business Intelligence to Optimize Health Administration and Strategy; Surge in Adoption of Big Data in the Healthcare Industry.

    Potential restraints include: Security Concerns Related to Sensitive Patients Medical Data; High Cost of Implementation and Deployment.

    Notable trends are: Cloud Segment is Expected to Register a High Growth Rate Over the Forecast Period.
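
    Because the report masks the market values (USD XX and XXX Million), only the growth factor implied by the stated CAGR can be worked out. The sketch below assumes annual compounding from 2023 to 2032; the function name `project_value` and the unit base value are illustrative, not taken from the report.

```python
def project_value(base_value, cagr, years):
    """Compound a base value forward: base * (1 + cagr) ** years."""
    return base_value * (1.0 + cagr) ** years


CAGR = 0.162          # 16.20%, as stated in the report description
YEARS = 2032 - 2023   # assumed: 9 annual compounding periods

# A base of 1.0 yields the growth factor, since the real base value is masked.
growth_factor = project_value(1.0, CAGR, YEARS)
print(f"Implied growth factor over {YEARS} years: {growth_factor:.2f}x")
```

    Multiplying the actual 2023 valuation by this factor would give the projected 2032 figure under the same assumptions.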

  5. MedQuAD Dataset

    • paperswithcode.com
    Updated Feb 16, 2024
    + more versions
    Cite
    Asma Ben Abacha; Dina Demner-Fushman (2024). MedQuAD Dataset [Dataset]. https://paperswithcode.com/dataset/medquad
    Dataset updated
    Feb 16, 2024
    Authors
    Asma Ben Abacha; Dina Demner-Fushman
    Description

    MedQuAD includes 47,457 medical question-answer pairs created from 12 NIH websites (e.g. cancer.gov, niddk.nih.gov, GARD, MedlinePlus Health Topics). The collection covers 37 question types (e.g. Treatment, Diagnosis, Side Effects) associated with diseases, drugs and other medical entities such as tests.

  6. Covid19_Dataset

    • kaggle.com
    Updated Apr 26, 2024
    Cite
    Somin Ramchiary (2024). Covid19_Dataset [Dataset]. https://www.kaggle.com/datasets/somin69/covid19-dataset
    Available formats: Croissant, a format for machine-learning datasets (see mlcommons.org/croissant).
    Dataset updated
    Apr 26, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Somin Ramchiary
    Description

    The dataset contains 199,999 rows and 21 variables describing patients' medical conditions and health indicators.

    • USMER (Medical Unit Code): An identifier for the medical unit where the patient was treated or admitted.
    • MEDICAL_UNIT: The name or code of the medical unit where the patient received treatment.
    • SEX: The gender of the patient (e.g., Male, Female).
    • PATIENT_TYPE: Indicates whether the patient is an outpatient or hospitalized (e.g., Outpatient, Hospitalized).
    • DATE_DIED: The date of death of the patient, if applicable. This variable may be null for surviving patients.
    • INTUBED: Indicates whether the patient was intubated (i.e., received mechanical ventilation) during treatment (e.g., Yes, No).
    • PNEUMONIA: Indicates whether the patient had pneumonia as a complication of COVID-19 infection (e.g., Yes, No).
    • AGE: The age of the patient at the time of diagnosis or admission.
    • PREGNANT: Indicates whether the patient was pregnant at the time of diagnosis or admission (e.g., Yes, No).
    • DIABETES: Indicates whether the patient had diabetes as a pre-existing condition (e.g., Yes, No).
    • COPD (Chronic Obstructive Pulmonary Disease): Indicates whether the patient had COPD as a pre-existing condition (e.g., Yes, No).
    • ASTHMA: Indicates whether the patient had asthma as a pre-existing condition (e.g., Yes, No).
    • INMSUPR (Immunosuppression): Indicates whether the patient had immunosuppression (weakened immune system) as a pre-existing condition (e.g., Yes, No).
    • HIPERTENSION (Hypertension): Indicates whether the patient had hypertension (high blood pressure) as a pre-existing condition (e.g., Yes, No).
    • OTHER_DISEASE: Indicates whether the patient had other pre-existing diseases or medical conditions not specifically listed (e.g., Yes, No).
    • CARDIOVASCULAR: Indicates whether the patient had cardiovascular disease as a pre-existing condition (e.g., Yes, No).
    • OBESITY: Indicates whether the patient had obesity as a pre-existing condition (e.g., Yes, No).
    • RENAL_CHRONIC (Chronic Renal Insufficiency): Indicates whether the patient had chronic renal insufficiency (chronic kidney disease) as a pre-existing condition (e.g., Yes, No).
    • TOBACCO: Indicates whether the patient was a tobacco user (e.g., Yes, No).
    • CLASIFFICATION_FINAL (Final Classification): The final classification of the patient's COVID-19 case based on severity or outcome (e.g., Mild, Severe, Deceased).
    • ICU (Intensive Care Unit): Indicates whether the patient was admitted to the intensive care unit (ICU) during treatment (e.g., Yes, No).
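
    Since DATE_DIED may be null for surviving patients, a survival flag can be derived from it. The following is a sketch under assumptions: the helper `add_died_flag`, the `DIED` output field, the set of null markers, and the two sample rows are all hypothetical, and the real file may encode missing dates differently.

```python
import csv
import io


def add_died_flag(rows, null_markers=("", "NA", "null")):
    """Yield each row with an extra DIED field derived from DATE_DIED."""
    for row in rows:
        date_died = (row.get("DATE_DIED") or "").strip()
        out = dict(row)
        # Assumption: a missing date means the patient survived.
        out["DIED"] = date_died not in null_markers
        yield out


# Hypothetical two-row sample using column names from the description.
sample = io.StringIO(
    "SEX,AGE,DATE_DIED,ICU\n"
    "Female,54,,No\n"
    "Male,71,2020-06-12,Yes\n"
)
flagged = list(add_died_flag(csv.DictReader(sample)))
print([(row["AGE"], row["DIED"]) for row in flagged])
```

    If the file uses a sentinel value for surviving patients, it can be passed in through `null_markers` rather than changing the function.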

  7. Synthetic Dry Eye Disease Patient Records

    • opendatabay.com
    .csv
    Updated Jun 15, 2025
    Cite
    Opendatabay Labs (2025). Synthetic Dry Eye Disease Patient Records [Dataset]. https://www.opendatabay.com/data/synthetic/f4e9ad52-5d13-4d2e-ac19-207a5b71522e
    Available download formats: .csv
    Dataset updated
    Jun 15, 2025
    Dataset authored and provided by
    Opendatabay Labs
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Patient Health Records & Digital Health
    Description

    The Dry Eye Disease Patient Records (Synthetic) is designed for educational and research purposes to analyze patterns in sleep behavior, stress levels, lifestyle factors, and their potential links to dry eye disease. It provides anonymized, synthetic data on various health conditions and behavioral habits.

    Dataset Features

    • Gender: Gender of the individual (Male/Female).
    • Age: Age of the individual.
    • Sleep Duration: Average sleep duration in hours.
    • Sleep Quality: Subjective assessment of sleep quality (scale-based).
    • Stress Level: Measured stress level (scale-based).
    • Heart Rate: Resting heart rate (bpm).
    • Daily Steps: Number of steps taken per day.
    • Physical Activity: Minutes of physical activity per day.
    • Height & Weight: Individual’s height (cm) and weight (kg).
    • Sleep Disorder: Presence of a diagnosed sleep disorder (Yes/No).
    • Wake Up During Night: Frequency of waking up during the night (Yes/No).
    • Feel Sleepy During Day: Self-reported daytime sleepiness (Yes/No).
    • Caffeine Consumption: Frequency of caffeine intake (Yes/No).
    • Alcohol Consumption: Frequency of alcohol intake (Yes/No).
    • Smoking: Smoking habits (Yes/No).
    • Medical Issue: Presence of any medical conditions (Yes/No).
    • Ongoing Medication: Use of any ongoing medication (Yes/No).
    • Smart Device Before Bed: Usage of smart devices before sleeping (Yes/No).
    • Average Screen Time: Daily screen time in hours.
    • Blue-Light Filter: Use of blue-light filters on devices (Yes/No).
    • Eye Discomfort & Strain: Presence of discomfort and eye strain (Yes/No).
    • Redness in Eye: Occurrence of eye redness (Yes/No).
    • Itchiness/Irritation in Eye: Symptoms of eye itchiness or irritation (Yes/No).
    • Dry Eye Disease: Diagnosis of Dry Eye Disease (Yes/No).

    Distribution

    Figure: Dry Eye Disease Patient Records Synthetic Data (https://storage.googleapis.com/opendatabay_public/f4e9ad52-5d13-4d2e-ac19-207a5b71522e/2e2e949519d7_eye.png)

    Usage

    This dataset can be used for the following applications:

    • Healthcare Analytics: Identify patterns between lifestyle factors and dry eye disease.
    • Predictive Modeling: Develop machine learning models to predict eye health risks.
    • Clinical Research: Investigate associations between screen time, sleep habits, and eye conditions.
    • Educational Purposes: Provide a dataset for students in medical, data science, and public health fields to analyze real-world health trends.

    Coverage

    This synthetic dataset is fully anonymized and complies with data privacy standards. It includes a variety of demographic and lifestyle factors to support a broad range of research and analysis.

    License

    CC0 (Public Domain)

    Who Can Use It

    • Healthcare Researchers: To explore correlations between lifestyle habits and dry eye disease.
    • Clinicians and Medical Practitioners: To analyze factors contributing to eye health issues.
    • Data Scientists and Machine Learning Practitioners: To develop predictive models for eye-related conditions.
    • Educators and Students: As a resource for studying health analytics and medical research.
  8. Data from: Discovering health topics in social media using topic models

    • figshare.com
    txt
    Updated Jan 18, 2016
    Cite
    Michael Paul (2016). Discovering health topics in social media using topic models [Dataset]. http://doi.org/10.6084/m9.figshare.1007712.v5
    Available download formats: txt
    Dataset updated
    Jan 18, 2016
    Dataset provided by
    figshare
    Authors
    Michael Paul
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data set for M. Paul and M. Dredze, "Discovering health topics in social media using topic models". This includes the set of tweets used in the experiments, and the words associated with ailments discovered by the Ailment Topic Aspect Model (ATAM).

    Contact: Michael Paul (mpaul39@gmail.com)
    Released June 26, 2014

    • atam.topwords.csv: The most probable words for each ailment. The first column is the ailment ID. The second column indicates if it is a general (G), symptom (S), or treatment (T) word. The third column is the word. The fourth column is the probability. Words are shown in descending order of probability until 90% of the probability mass is accumulated for each ailment or until probabilities drop below 1.0e-4.
    • atam.tweets.x.csv (for x=[0-9]): The tweets used in the study. The first column is the tweet ID. The second column indicates the ailment ID for the ailment sampled for that tweet. (See the atam.topwords.csv file for the most probable words associated with each ailment ID.) Full tweets can be downloaded using the tweet ID through the Twitter API (https://dev.twitter.com/docs/api/1.1).
    • keywords.txt: The set of 269 health-related keywords used in our keyword-filtered Twitter stream as part of our dataset.
    • keywords_x.txt (for x={diseases,symptoms,treatments}): The set of approximately 20,000 keyphrases crawled from wrongdiagnosis.com describing the names of diseases, symptoms, and treatments and medications. These keyword lists are used to create input for ATAM (which requires phrases to be labeled as symptoms or treatments), and also to initially filter our dataset when constructing our health classifiers.
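
    The four-column layout described for atam.topwords.csv can be read with a few lines of Python. This is a sketch only: it assumes the file is comma-delimited with no header row, and the function name `load_topwords` and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def load_topwords(fileobj):
    """Group (word, probability) pairs by ailment ID and word type (G/S/T)."""
    table = defaultdict(lambda: {"G": [], "S": [], "T": []})
    for ailment_id, kind, word, prob in csv.reader(fileobj):
        table[ailment_id][kind].append((word, float(prob)))
    return table


# Hypothetical rows in the documented layout: ailment ID, type, word, probability.
sample = io.StringIO(
    "1,G,sick,0.05\n"
    "1,S,cough,0.04\n"
    "1,T,tylenol,0.01\n"
    "2,S,headache,0.06\n"
)
topwords = load_topwords(sample)
print(topwords["1"]["S"])
```

    With the real file, `open("atam.topwords.csv", newline="")` would replace the in-memory sample, and the ailment IDs would join against the second column of the atam.tweets.x.csv files.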

  9. Synthetic Cardiovascular Disease Prediction Dataset

    • opendatabay.com
    .csv
    Updated Apr 26, 2025
    Cite
    Opendatabay Labs (2025). Synthetic Cardiovascular Disease Prediction Dataset [Dataset]. https://www.opendatabay.com/data/synthetic/fd585f02-456f-4aa1-b7d1-4ea4499c7824
    Available download formats: .csv
    Dataset updated
    Apr 26, 2025
    Dataset authored and provided by
    Opendatabay Labs
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Patient Health Records & Digital Health
    Description

    This Synthetic Cardiovascular Disease Dataset is created for educational and research purposes in cardiology, public health, and data science. It provides demographic, medical, and diagnostic details related to cardiovascular diseases, enabling analysis of risk factors, disease progression, and treatment outcomes. The dataset can be utilized for building predictive models and exploring disease management strategies.

    Dataset Features

    • Age: Age of the individual in years.
    • Gender: Biological sex of the individual (Male/Female).
    • Smoking: Current smoking status (Yes/No).
    • Hx Smoking: History of smoking (Yes/No).
    • Hx Hypertension: History of hypertension (Yes/No).
    • Cholesterol Level: Cholesterol level in mg/dL (e.g., Normal, Borderline, High).
    • Blood Pressure: Blood pressure reading in mmHg (e.g., Normal, Elevated, Hypertension Stage 1).
    • Diabetes: Presence of diabetes (Yes/No).
    • BMI: Body Mass Index, a measure of body fat based on height and weight.
    • Physical Activity: Physical activity level (e.g., Sedentary, Moderate, Active).
    • Risk Score: Cardiovascular risk score (Low, Moderate, High).
    • Angina: Presence of chest pain or discomfort (Yes/No).
    • Heart Attack: Indicates if the patient has suffered a myocardial infarction (Yes/No).
    • Heart Failure: Indicates the presence of heart failure (Yes/No).
    • Treatment Type: Type of treatment administered (e.g., Medication, Surgery, Lifestyle Changes).
    • Response: Patient's response to treatment (e.g., Excellent, Partial, Poor).
    • Recurred: Indicates whether cardiovascular issues have recurred (Yes/No).

    Distribution

    Figure: Synthetic Cardiovascular Disease Prediction Data Distribution (https://storage.googleapis.com/opendatabay_public/fd585f02-456f-4aa1-b7d1-4ea4499c7824/73db4d04d538_cardio.png)

    Usage

    This dataset is suited for the following applications:

    • Risk Prediction: Develop predictive models to identify patients at risk of cardiovascular complications or recurrence.
    • Treatment Outcome Analysis: Evaluate the effectiveness of treatments based on response and recurrence data.
    • Disease Progression Modeling: Study the progression of cardiovascular diseases using features like cholesterol, hypertension, and risk scores.
    • Public Health Research: Analyze demographic and clinical patterns to inform cardiovascular disease management strategies.
    • Predictive Modeling: Build machine learning models to predict disease outcomes based on demographic and clinical features.

    Coverage

    This synthetic dataset is anonymized and adheres to data privacy standards. It is designed for research and learning purposes, with diverse cases representing varying levels of cardiovascular risk, treatment responses, and clinical conditions.

    License

    CC0 (Public Domain)

    Who Can Use It

    • Data Science Practitioners: For practicing data preprocessing, classification, and regression tasks related to cardiovascular diseases.
    • Healthcare Professionals and Researchers: To explore relationships between clinical metrics and cardiovascular disease outcomes.
    • Public Health Analysts: To understand trends and design strategies for managing cardiovascular diseases.
    • Policy Makers and Regulators: For data-driven decision-making in cardiovascular disease prevention and treatment policies.
  10. Lung Disease

    • kaggle.com
    Updated Mar 9, 2024
    Cite
    Fatemeh Mehrparvar (2024). Lung Disease [Dataset]. https://www.kaggle.com/datasets/fatemehmehrparvar/lung-disease
    Available formats: Croissant, a format for machine-learning datasets (see mlcommons.org/croissant).
    Dataset updated
    Mar 9, 2024
    Dataset provided by
    Kaggle
    Authors
    Fatemeh Mehrparvar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description


    Lung disease encompasses a wide range of conditions that affect the lungs and their ability to function effectively. These conditions can be caused by various factors, including infections, environmental factors, genetic predispositions, and lifestyle choices. Lung diseases can result in symptoms such as coughing, shortness of breath, chest pain, and reduced lung function. Detecting and diagnosing lung diseases is crucial for patient care, as they can have a significant impact on an individual's health and quality of life.

    Global Impact:

    Lung diseases have a substantial global impact. According to the World Health Organization (WHO), respiratory diseases, including lung diseases, are responsible for a significant portion of global mortality. In 2016, respiratory diseases were the fourth leading cause of death worldwide, with an estimated 3.0 million deaths attributed to them. Conditions like pneumonia, chronic obstructive pulmonary disease (COPD), and lung cancer contribute to this high mortality rate. Early detection and accurate diagnosis are essential for reducing the burden of lung diseases on public health.

    The Need to Detect Lung Diseases:

    Detecting lung diseases is vital for several reasons:

    Early Intervention: Early detection allows for timely medical intervention and treatment, increasing the chances of successful management and recovery.

    Disease Classification: Differentiating between various lung diseases, such as pneumonia, tuberculosis, and lung cancer, is crucial for appropriate treatment planning.

    Public Health: Effective disease detection and management can have a positive impact on public health by reducing the overall disease burden.

    Lung X-Ray Image Dataset:

    The "Lung X-Ray Image Dataset" is a comprehensive collection of X-ray images that plays a pivotal role in the detection and diagnosis of lung diseases. This dataset contains a large number of high-quality X-ray images, meticulously collected from diverse sources, including hospitals, clinics, and healthcare institutions.

    Dataset Contents:

    Total Number of Images: The dataset comprises a total of 3,475 X-ray images.

    Classes within the Dataset:

    Normal (1250 Images): These images represent healthy lung conditions, serving as a reference for comparison in diagnostic procedures.

    Lung Opacity (1125 Images): This class includes X-ray images depicting various degrees of lung abnormalities, providing a diverse set of cases for analysis.

    Viral Pneumonia (1100 Images): Images in this category are associated with viral pneumonia cases, contributing to the understanding and identification of this specific lung infection.
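
    The class counts above add up to the stated total and are only mildly imbalanced. The sketch below checks that arithmetic and derives inverse-frequency class weights; the weighting scheme is a common convention chosen here for illustration, not something the dataset page prescribes.

```python
# Image counts per class, as listed in the dataset description.
class_counts = {"Normal": 1250, "Lung Opacity": 1125, "Viral Pneumonia": 1100}
total = sum(class_counts.values())

# Share of the dataset taken by each class.
shares = {name: n / total for name, n in class_counts.items()}

# Assumed scheme: weight = total / (number of classes * class count).
weights = {
    name: total / (len(class_counts) * n) for name, n in class_counts.items()
}

for name in class_counts:
    print(f"{name}: share={shares[name]:.3f}, weight={weights[name]:.3f}")
```

    Weights near 1.0 for every class suggest that a classifier could reasonably be trained without resampling, although that remains a modelling decision.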

    In conclusion, the "Lung X-Ray Image Dataset" plays a crucial role in the healthcare sector by providing a diverse and well-documented collection of X-ray images that support the detection, classification, and understanding of lung diseases. This resource is instrumental in advancing the field of respiratory medicine and improving patient outcomes.

  11. Heart Disease Dataset (Comprehensive)

    • ieee-dataport.org
    Updated Oct 24, 2019
    Cite
    MANU SIDDHARTHA (2019). Heart Disease Dataset (Comprehensive) [Dataset]. https://ieee-dataport.org/open-access/heart-disease-dataset-comprehensive
    Dataset updated
    Oct 24, 2019
    Authors
    MANU SIDDHARTHA
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This heart disease dataset is curated by combining 5 popular heart disease datasets already available independently but not combined before. In this dataset

  12. Medical Imaging Datasets for Multimodal Disease Detection and Diagnosis...

    • ieee-dataport.org
    Updated Aug 10, 2024
    Cite
    Nur Rusyidah Azri (2024). Medical Imaging Datasets for Multimodal Disease Detection and Diagnosis Research [Dataset]. https://ieee-dataport.org/documents/medical-imaging-datasets-multimodal-disease-detection-and-diagnosis-research
    Dataset updated
    Aug 10, 2024
    Authors
    Nur Rusyidah Azri
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    brain MRI

  13. Synthetic Parkinson's Disease Detection Dataset

    • opendatabay.com
    .csv
    Updated Apr 28, 2025
    Cite
    Opendatabay Labs (2025). Synthetic Parkinson's Disease Detection Dataset [Dataset]. https://www.opendatabay.com/data/synthetic/b71f3ac4-9476-4c1f-b8f0-aa3f0277603b
    Available download formats: .csv
    Dataset updated
    Apr 28, 2025
    Dataset authored and provided by
    Opendatabay Labs
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Patient Health Records & Digital Health
    Description

    This synthetic Parkinson's Disease Detection Dataset is designed for educational and research purposes in the fields of data science, healthcare analytics, and medical research. It contains key clinical and speech features from individuals with Parkinson's Disease, which can be used to build predictive models, analyze disease progression, and assess the impact of motor and speech symptoms. The dataset is ideal for tasks such as classification, regression, and the study of biomarkers for Parkinson’s disease.

    Dataset Features:
    Index: Row identifier for each record.
    Age: The age of the patient.
    Sex: Gender of the patient (Male/Female).
    Test_time: Duration or time of the test conducted (in minutes).
    Motor_UPDRS: Motor component score from the Unified Parkinson’s Disease Rating Scale (UPDRS).
    Total_UPDRS: Total score from the UPDRS, including both motor and non-motor components.
    Jitter(%): Percentage of frequency variation in speech, a key indicator of Parkinson’s disease.
    Jitter(Abs): Absolute jitter value, quantifying frequency variation.
    Jitter:RAP: Jitter measured using the Relative Average Perturbation method.
    Jitter:PPQ5: Jitter measured using the 5-point Period Perturbation Quotient.
    Jitter:DDP: Jitter measured using the Difference of Differences of Periods method.
    Shimmer: Amplitude variation in speech, indicating vocal instability.
    Shimmer(dB): Amplitude variation in decibels.
    Shimmer:APQ3: Shimmer measured using the 3-point Amplitude Perturbation Quotient.
    Shimmer:APQ5: Shimmer measured using the 5-point Amplitude Perturbation Quotient.
    Shimmer:APQ11: Shimmer measured using the 11-point Amplitude Perturbation Quotient.
    Shimmer:DDA: Shimmer measured using the Difference of Differences of Amplitudes method.
    NHR: Noise to Harmonics Ratio, a measure of voice quality and periodicity.
    HNR: Harmonics to Noise Ratio, reflecting the periodicity of speech sounds.
    RPDE: Recurrence Period Density Entropy, derived from voice signal analysis.
    DFA: Detrended Fluctuation Analysis, studying self-similarity in speech signals.
    PPE: Pitch Period Entropy, quantifying irregularity in pitch periods during speech.

    Usage
    This dataset is suited to several applications related to Parkinson's Disease detection and analysis:
    Disease Prediction: Develop machine learning models to predict the presence and progression of Parkinson’s Disease.
    Speech Analysis: Study speech features like jitter and shimmer for early diagnosis and monitoring of Parkinson's Disease.
    Predictive Modeling: Build models using clinical and speech features to assess disease severity.
    Clinical Research: Investigate the relationship between motor and non-motor symptoms of Parkinson's Disease.
    Healthcare Analytics: Apply data science techniques to improve the diagnosis and treatment of Parkinson’s Disease.

    Coverage
    This synthetic dataset is anonymized and designed for research and learning purposes. It includes a diverse range of speech and clinical data, simulating different stages of Parkinson’s Disease for analysis.

    License
    CC0 (Public Domain)

    Who Can Use It
    Data Science Practitioners: For practicing data preprocessing, classification, and regression tasks.
    Healthcare Analysts and Researchers: To explore relationships between clinical and speech features in Parkinson's Disease.
    Medical Professionals: To enhance understanding of Parkinson’s Disease symptoms and progression.
    Machine Learning Enthusiasts: To experiment with models for predicting Parkinson’s Disease using diverse features.
    Academic Institutions: For use in educational settings to teach data science applications in healthcare.
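
The jitter features named in this entry are perturbation measures over consecutive pitch periods. The sketch below shows one common way such values are defined (mean absolute difference between consecutive periods, optionally divided by the mean period). The dataset description does not say how its synthetic values were produced, so the formulas, function names, and sample periods here are assumptions for illustration only.

```python
from statistics import mean


def consecutive_abs_diffs(periods):
    """Absolute differences between each pair of consecutive pitch periods."""
    if len(periods) < 2:
        raise ValueError("need at least two pitch periods")
    return [abs(b - a) for a, b in zip(periods, periods[1:])]


def jitter_abs(periods):
    """Absolute jitter: mean absolute difference of consecutive periods (seconds)."""
    return mean(consecutive_abs_diffs(periods))


def jitter_percent(periods):
    """Relative jitter: absolute jitter divided by the mean period, in percent."""
    return 100.0 * jitter_abs(periods) / mean(periods)


if __name__ == "__main__":
    # Hypothetical pitch periods in seconds, not taken from the dataset.
    sample_periods = [0.0100, 0.0102, 0.0099, 0.0101]
    print("Jitter(Abs):", jitter_abs(sample_periods))
    print("Jitter(%):", jitter_percent(sample_periods))
```

A perfectly regular voice signal (identical periods) gives zero jitter under these definitions; larger values indicate less stable pitch.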

  14. WHO Health Indicators - Dataset - Data Catalog Armenia

    • data.opendata.am
    Updated May 31, 2023
    Cite
    (2023). WHO Health Indicators - Dataset - Data Catalog Armenia [Dataset]. https://data.opendata.am/dataset/who-health-indicators
    Explore at:
    Dataset updated
    May 31, 2023
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Contains data from the World Health Organization's data portal covering the following categories: Mortality and global health estimates, Sustainable development goals, Millennium Development Goals (MDGs), Health systems, Malaria, Tuberculosis, Child health, Infectious diseases, Neglected Tropical Diseases, World Health Statistics, Health financing, Tobacco, Substance use and mental health, Injuries and violence, HIV/AIDS and other STIs, Public health and environment, Nutrition, Urban health, Child mortality, Noncommunicable diseases, Noncommunicable diseases CCS, Neglected tropical diseases, Infrastructure, Essential health technologies, Medical equipment, Demographic and socioeconomic statistics, Health inequality monitor, Health Equity Monitor, Child malnutrition, TOBACCO, Neglected tropical diseases, International Health Regulations (2005) monitoring framework, 0, Insecticide resistance, Oral health, Universal Health Coverage, Global Observatory for eHealth (GOe), RSUD: GOVERNANCE, POLICY AND FINANCING: PREVENTION, RSUD: GOVERNANCE, POLICY AND FINANCING: TREATMENT, RSUD: GOVERNANCE, POLICY AND FINANCING: FINANCING, RSUD: SERVICE ORGANIZATION AND DELIVERY: TREATMENT SECTORS AND PROVIDERS, RSUD: SERVICE ORGANIZATION AND DELIVERY: TREATMENT CAPACITY AND TREATMENT COVERAGE, RSUD: SERVICE ORGANIZATION AND DELIVERY: PHARMACOLOGICAL TREATMENT, RSUD: SERVICE ORGANIZATION AND DELIVERY: SCREENING AND BRIEF INTERVENTIONS, RSUD: SERVICE ORGANIZATION AND DELIVERY: PREVENTION PROGRAMS AND PROVIDERS, RSUD: SERVICE ORGANIZATION AND DELIVERY: SPECIAL PROGRAMMES AND SERVICES, RSUD: HUMAN RESOURCES, RSUD: INFORMATION SYSTEMS, RSUD: YOUTH, FINANCIAL PROTECTION, AMR GLASS, Noncommunicable diseases and mental health, Health workforce, AMR GASP, ICD, SEXUAL AND REPRODUCTIVE HEALTH, Immunization, NLIS, AMC GLASS. For links to individual indicator metadata, see resource descriptions.

  15. Medicaid Coverage Of Cessation Treatments And Barriers To Treatments

    • data.cdc.gov
    • data.virginia.gov
    • +3more
    application/rdfxml +5
    Updated Apr 22, 2014
    Cite
    Centers for Disease Control and Prevention National Center for Disease Prevention and Health Promotion Office on Smoking and Health (2014). Medicaid Coverage Of Cessation Treatments And Barriers To Treatments [Dataset]. https://data.cdc.gov/Cessation-Coverage-/Medicaid-Coverage-Of-Cessation-Treatments-And-Barr/ntaa-dtex
    Explore at:
    Available download formats: application/rdfxml, json, csv, application/rssxml, tsv, xml
    Dataset updated
    Apr 22, 2014
    Dataset provided by
    Centers for Disease Control and Prevention http://www.cdc.gov/
    Authors
    Centers for Disease Control and Prevention National Center for Disease Prevention and Health Promotion Office on Smoking and Health
    License

    U.S. Government Works https://www.usa.gov/government-works
    License information was derived automatically

    Description

    2008-2024. American Lung Association. Cessation Coverage. Medicaid data compiled by the Centers for Disease Control and Prevention’s Office on Smoking and Health were obtained from the State Tobacco Cessation Coverage Database, developed and administered by the American Lung Association. Data from 2008-2012 are reported on an annual basis; beginning in 2013, data are reported on a quarterly basis. Data include state-level information on Medicaid coverage of medications approved by the Food and Drug Administration (FDA) for tobacco cessation treatment, types of counseling recommended by the Public Health Service (PHS), and barriers to accessing cessation treatment. Note: Section 2502 of the Patient Protection and Affordable Care Act requires all state Medicaid programs to cover all FDA-approved tobacco cessation medications as of January 1, 2014. However, states are currently in the process of modifying their coverage to come into compliance with this requirement. Data in the STATE System on Medicaid coverage of tobacco cessation medications reflect evidence of coverage that is found in documentable sources, and may not yet reflect medications covered under this requirement.

  16. Instruction formatted data for LLaMA2-chat fine tuning

    • figshare.com
    application/x-rar
    Updated Feb 10, 2024
    Cite
    ForexPredict BoEC (2024). Instruction formatted data for LLaMA2-chat fine tuning [Dataset]. http://doi.org/10.6084/m9.figshare.25200836.v1
    Explore at:
    Available download formats: application/x-rar
    Dataset updated
    Feb 10, 2024
    Dataset provided by
    figshare
    Authors
    ForexPredict BoEC
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Medical scope, Prompt #1: 61K instruction, PubMED-Unlabeled

    Instruction: You are a language model specialized in generating JSON or YAML output. Given an instruction, you should generate a valid JSON or YAML object according to the provided guidelines. Your output should conform to the following JSON format: { "key1": "value1", "key2": "value2", ... } Make sure to adhere to the specified key-value pairs and maintain the correct syntax throughout your response. Your goal is to generate accurate and properly structured JSON or YAML output in response to the given instructions without any extra comment.

    Input: As a doctor that only speak in JSON, provide a "answer", "context and "keywords" about following question: Is naturopathy as effective as conventional therapy for treatment of menopausal symptoms?

    Output: {"answer" :Naturopathy appears to be an effective alternative for relief of specific menopausal symptoms compared to conventional therapy.,"context" :Although the use of alternative medicine in the United States is increasing, no published studies have documented the effectiveness of naturopathy for treatment of menopausal symptoms compared to women receiving conventional therapy in the clinical setting."keywords" :['Anxiety', 'Cohort Studies', 'Confidence Intervals', 'Estrogen Replacement Therapy', 'Female', 'Health Status', 'Hot Flashes', 'Humans', 'Menopause', 'Middle Aged','Naturopathy', 'Odds Ratio', 'Patient Satisfaction', 'Retrospective Studies', 'Risk Factors', 'Severity of Illness Index', 'Sleep Initiation and Maintenance Disorders','Surveys and Questionnaires', 'Time Factors', 'Urinary Incontinence', 'Vaginal Diseases', 'Washington']}

    Medical scope, Prompt #2: symptom-disease dataset, 1K instruction

    Instruction: You are a language model specialized in generating JSON or YAML output. Given an instruction, you should generate a valid JSON or YAML object according to the provided guidelines. Your output should conform to the following JSON format: { "key1": "value1", "key2": "value2", ... } Make sure to adhere to the specified key-value pairs and maintain the correct syntax throughout your response. Your goal is to generate accurate and properly structured JSON or YAML output in response to the given instructions without any extra comment.

    Input: You are a doctor that only speak in JSON. Do not output that is'nt properly formated JSON. Provide following information about Panic disorder disease. Fill information about "symptom", "reason", "TestsAndProcedures" and "commonMedications" about Panic disorder disease in JSON format.

    Output: {"symptom" :['Anxiety and nervousness', 'Depression', 'Shortness of breath', 'Depressive or psychotic symptoms', 'Sharp chest pain', 'Dizziness', 'Insomnia', 'Abnormal involuntary movements', 'Chest tightness', 'Palpitations', 'Irregular heartbeat', 'Breathing fast'],"reason" :Panic disorder is an anxiety disorder characterized by recurring severe panic attacks. It may also include significant behavioral changes lasting at least a month and of ongoing worry about the implications or concern about having other attacks. The latter are called anticipatory attacks (DSM-IVR). Panic disorder is not the same as agoraphobia (fear of public places), although many afflicted with panic disorder also suffer from agoraphobia. Panic attacks cannot be predicted, therefore an individual may become stressed, anxious or worried wondering when the next panic attack will occur. Panic disorder may be differentiated as a medical condition, or chemical imbalance. The DSM-IV-TR describes panic disorder and anxiety differently. Whereas anxiety is preceded by chronic stressors which build to reactions of moderate intensity that can last for days, weeks or months, panic attacks are acute events triggered by a sudden, out-of-the-blue cause: duration is short and symptoms are more intense. Panic attacks can occur in children, as well as adults. Panic in young people may be particularly distressing because children tend to have less insight about what is happening, and parents are also likely to experience distress when attacks occur. "TestsAndProcedures" : ['Psychotherapy', 'Mental health counseling', 'Electrocardiogram', 'Depression screen (Depression screening)', 'Toxicology screen', 'Psychological and psychiatric evaluation and therapy']"commonMedications" : ['Lorazepam', 'Alprazolam (Xanax)', 'Clonazepam', 'Paroxetine (Paxil)', 'Venlafaxine (Effexor)', 'Mirtazapine', 'Buspirone (Buspar)', 'Fluvoxamine (Luvox)', 'Imipramine', 'Desvenlafaxine (Pristiq)', 'Clomipramine', 'Acamprosate (Campral)']}

    Medical scope, Prompt #3: 194K instruction, MED-MCQA dataset

    Instruction: You are a language model specialized in generating JSON or YAML output. Given an instruction, you should generate a valid JSON or YAML object according to the provided guidelines. Your output should conform to the following JSON format: { "key1": "value1", "key2": "value2", ... } Make sure to adhere to the specified key-value pairs and maintain the correct syntax throughout your response. Your goal is to generate accurate and properly structured JSON or YAML output in response to the given instructions without any extra comment.

    Input: You are a medical teacher that only speak in JSON. Do not output that is'nt properly formated JSON. Generate a multichoice question and answer in Anatomy field and Urinary tract topic with following format: { "question": "expresion" , "OPA": "expresion", "OPB" : "expresion", "OPC": "expresion","OPD": "expresion", "Answer": "3"}.

    Output: { "question": Chronic urethral obstruction due to benign prismatic hyperplasia can lead to the following change in kidney parenchyma, "opa": Hyperplasia, "opb": Hyperophy, "opc": Atrophy, "opd": Dyplasia, "Answer": 2.

    Finance scope, Prompt #4: 38K instruction

    Instruction: You are a language model specialized in generating JSON or YAML output. Given an instruction, you should generate a valid JSON or YAML object according to the provided guidelines. Your output should conform to the following JSON format: { "key1": "value1", "key2": "value2", ... } Make sure to adhere to the specified key-value pairs and maintain the correct syntax throughout your response. Your goal is to generate accurate and properly structured JSON or YAML output in response to the given instructions without any extra comment.

    Input: You are a financial news analyzer that only speaks in JSON. Do not output that isn't properly formatted JSON. Analyze this news title "Here Why Bitcoin is Likely to See Historic Volatility in the Next Few Days". Provide sentiment as a probability distribution also indicate target market and related asset to the news title in JSON format.

    Output: {"sentiment": {"positive": 0.4463904500007629, "negative": 0.06607405841350555,"neutral":0.4875355064868927}, "market": "cryptocurrency" , "relatedAsset": ['BTC/USDT']}.

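
Several of the sample outputs quoted in this description would not parse under a strict JSON parser (for example, list items in single quotes or unquoted string values). The sketch below is one way a reader might check this before fine-tuning, using only Python's standard `json` module. The helper name `is_valid_json` and the "repaired" variant are hypothetical and not part of the dataset.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if `text` parses under Python's strict json module."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Finance sample output as quoted in the description (single-quoted list item,
# trailing period omitted).
quoted = (
    '{"sentiment": {"positive": 0.4463904500007629, '
    '"negative": 0.06607405841350555,"neutral":0.4875355064868927}, '
    '"market": "cryptocurrency" , "relatedAsset": [\'BTC/USDT\']}'
)

# Hypothetical repaired variant: single quotes swapped for double quotes.
# This naive replace is only safe here because no value contains an apostrophe.
repaired = quoted.replace("'", '"')

if __name__ == "__main__":
    print("quoted sample parses:", is_valid_json(quoted))
    print("repaired sample parses:", is_valid_json(repaired))
```

A check like this only tests syntax; it says nothing about whether the keys or values match what the prompt asked for.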
  17. Prediction of Heart Attack

    • data.mendeley.com
    Updated May 10, 2024
    Cite
    Rakin Sad Aftab (2024). Prediction of Heart Attack [Dataset]. http://doi.org/10.17632/yrwd336rkz.1
    Explore at:
    Dataset updated
    May 10, 2024
    Authors
    Rakin Sad Aftab
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset consists of 303 observations, each representing a unique patient, and 14 different attributes associated with heart disease. This dataset is a critical resource for researchers focusing on predictive analytics in cardiovascular diseases.

    Variables Overview:
    1. Age: A continuous variable indicating the age of the patient.
    2. Sex: A categorical variable with two levels ('Male', 'Female'), indicating the gender of the patient.
    3. CP (Chest Pain type): A categorical variable describing the type of chest pain experienced by the patient, with categories such as 'Asymptomatic', 'Atypical Angina', 'Typical Angina', and 'Non-Angina'.
    4. TRTBPS (Resting Blood Pressure): A continuous variable indicating the resting blood pressure (in mm Hg) on admission to the hospital.
    5. Chol (Serum Cholesterol): A continuous variable measuring the serum cholesterol in mg/dl.
    6. FBS (Fasting Blood Sugar): A binary variable where 1 represents fasting blood sugar > 120 mg/dl, and 0 otherwise.
    7. Rest ECG (Resting Electrocardiographic Results): Categorizes the resting electrocardiographic results of the patient into 'Normal', 'ST Elevation', and other categories.
    8. Thalachh (Maximum Heart Rate Achieved): A continuous variable indicating the maximum heart rate achieved by the patient.
    9. Exng (Exercise Induced Angina): A binary variable where 1 indicates the presence of exercise-induced angina, and 0 otherwise.
    10. Oldpeak (ST Depression Induced by Exercise Relative to Rest): A continuous variable indicating the ST depression induced by exercise relative to rest.
    11. Slope (Slope of the Peak Exercise ST Segment): A categorical variable with levels such as 'Flat', 'Up Sloping', representing the slope of the peak exercise ST segment.
    12. CA (Number of Major Vessels Colored by Fluoroscopy): A continuous variable ranging from 0 to 3, indicating the number of major vessels colored by fluoroscopy.
    13. Thall (Thalassemia): A categorical variable, indicating different types of thalassemia (a blood disorder).
    14. Target: A binary target variable indicating the presence (1) or absence (0) of heart disease.

    Descriptive Statistics: The patients' age ranges from 29 to 77 years, with a mean age of approximately 54 years. The resting blood pressure spans from 94 to 200 mm Hg, and the average cholesterol level is about 246 mg/dl. The maximum heart rate achieved varies widely among patients, from 71 to 202 beats per minute.

    Importance for Research: This dataset covers a range of factors that could be linked to heart disease, which makes it useful for developing predictive models. By analyzing relationships and patterns within these variables, researchers can identify key predictors of heart disease and enhance the accuracy of diagnostic tools. This could lead to better preventive measures and treatment strategies, ultimately improving patient outcomes in cardiovascular health.

  18. The medication rate for aspirin (AMI) patients within three months after...

    • data.gov.tw
    csv
    Updated Jun 1, 2025
    Cite
    National Health Insurance Administration (2025). The medication rate for aspirin (AMI) patients within three months after discharge - Aspirin (Disease-specific indicator) [Dataset]. https://data.gov.tw/en/datasets/39230
    Explore at:
    Available download formats: csv
    Dataset updated
    Jun 1, 2025
    Dataset authored and provided by
    National Health Insurance Administration
    License

    https://data.gov.tw/license

    Description

    The Bureau of National Health Insurance has selected evidence-based medical indicators to be made public in order to enhance the quality of medical care for patients with myocardial infarction, and to serve as a reference for disease treatment and healthcare quality. The public indicators are divided into three categories based on the treatment period and quality aspects, including inpatient process assessment, assessment of continued medication after discharge, and outcome assessment. This includes the ratio of inpatients with acute myocardial infarction (AMI) who undergo LDL cholesterol testing during hospitalization, the medication ratio for AMI patients during hospitalization, and within three, six, and nine months after discharge, and the ratio of patients returning to the emergency department within three days after discharge due to the same primary diagnosis or related conditions, as well as the unplanned readmission ratio within fourteen days after discharge due to the same primary diagnosis or related conditions.

  19. Data from: ADNI Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Jul 1, 2024
    Cite
    (2024). ADNI Dataset [Dataset]. https://paperswithcode.com/dataset/adni
    Explore at:
    Dataset updated
    Jul 1, 2024
    Description

    The Alzheimer's Disease Neuroimaging Initiative (ADNI) is a multisite study that aims to improve clinical trials for the prevention and treatment of Alzheimer’s disease (AD).[1] This cooperative study combines expertise and funding from the private and public sector to study subjects with AD, as well as those who may develop AD and controls with no signs of cognitive impairment.[2] Researchers at 63 sites in the US and Canada track the progression of AD in the human brain with neuroimaging, biochemical, and genetic biological markers.[2][3] This knowledge helps to find better clinical trials for the prevention and treatment of AD. ADNI has made a global impact,[4] firstly by developing a set of standardized protocols to allow the comparison of results from multiple centers,[4] and secondly by its data-sharing policy, which makes all of the data available without embargo to qualified researchers worldwide.[5] To date, over 1000 scientific publications have used ADNI data.[6] A number of other initiatives related to AD and other diseases have been designed and implemented using ADNI as a model.[4] ADNI has been running since 2004 and is currently funded until 2021.[7]

    Source: Wikipedia, https://en.wikipedia.org/wiki/Alzheimer%27s_Disease_Neuroimaging_Initiative

  20. Medication-Assisted Treatment in Medi-Cal for Opioid Use Disorders, by...

    • data.chhs.ca.gov
    • data.ca.gov
    • +2more
    csv, zip
    Updated Apr 28, 2025
    Cite
    Department of Health Care Services (2025). Medication-Assisted Treatment in Medi-Cal for Opioid Use Disorders, by County [Dataset]. https://data.chhs.ca.gov/dataset/medication-assisted-treatment-in-medi-cal-for-opioid-use-disorders-quarterly-by-county
    Explore at:
    Available download formats: csv(20602), zip, csv(107785)
    Dataset updated
    Apr 28, 2025
    Dataset provided by
    California Department of Health Care Services http://www.dhcs.ca.gov/
    Authors
    Department of Health Care Services
    Description

    Medi-Cal claims for persons receiving medication-assisted treatment for opioid use disorders are unduplicated by medication. Methadone program participation is identified from claims submitted by the counties. Buprenorphine and naltrexone pharmacy claims are those used for treating opioid use disorders. Naloxone pharmacy claims are those dispensed as a first aid item (injectables and nasal sprays). The county source is taken from submitted claims.
