Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains disease names along with the symptoms experienced by the corresponding patient. It covers 773 unique diseases and 377 symptoms across roughly 246,000 rows. The data was artificially generated in a way that preserves symptom severity and the likelihood of disease occurrence. Several distinct groups of symptoms may indicate the same disease, and a row (sample) may even contain a single symptom for a disease; such a row indicates a very high correlation between that symptom and the disease. The more rows a disease has, the more likely it is to occur in the real world. Likewise, if a row's feature vector contains only one symptom, that symptom is more strongly correlated with the disease than any single symptom in another sample whose feature vector contains several symptoms.
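The description implies two practical consequences: each row can be treated as a binary symptom vector, and per-disease row counts can serve as a rough prior probability. The sketch below assumes rows are available as hypothetical `(disease, symptoms)` pairs; the real file layout (for example, one 0/1 column per symptom) may differ. The `symptom_disease_weights` score is a heuristic of our own that mirrors the "single symptom means stronger correlation" remark, not something the dataset authors prescribe.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical row layout: (disease name, symptoms present in that row).
Row = Tuple[str, Sequence[str]]


def symptom_vocabulary(rows: List[Row]) -> List[str]:
    """Every symptom seen in the data, sorted so feature columns have a stable order."""
    return sorted({s for _, symptoms in rows for s in symptoms})


def to_binary_vector(symptoms: Sequence[str], vocab: List[str]) -> List[int]:
    """Encode one row as a 0/1 feature vector over the symptom vocabulary."""
    present = set(symptoms)
    return [1 if s in present else 0 for s in vocab]


def disease_priors(rows: List[Row]) -> Dict[str, float]:
    """Estimate P(disease) from row counts: more rows means a more common disease."""
    counts = Counter(disease for disease, _ in rows)
    total = sum(counts.values())
    return {disease: n / total for disease, n in counts.items()}


def symptom_disease_weights(rows: List[Row]) -> Dict[Tuple[str, str], float]:
    """Heuristic association score: a row with k symptoms gives each symptom 1/k credit,
    so a symptom that appears alone in a row gets full credit for that disease."""
    weights: Dict[Tuple[str, str], float] = defaultdict(float)
    for disease, symptoms in rows:
        unique = set(symptoms)
        if not unique:
            continue
        share = 1.0 / len(unique)
        for symptom in unique:
            weights[(symptom, disease)] += share
    return dict(weights)
```

For toy rows `[("flu", ["fever", "cough"]), ("flu", ["fever"]), ("migraine", ["headache"])]`, the prior for "flu" is 2/3, and "fever" earns 1.5 credit toward "flu" because it once appears alone.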
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
🩺 Diseases Dataset
A consolidated medical dataset combining disease names, symptoms, and treatments collected from multiple public datasets across Hugging Face and Kaggle. This dataset can be used for building disease prediction, symptom clustering, and medical assistant models.
📦 Dataset Summary
| Field | Type | Description |
|---|---|---|
| Disease | string | Name of the disease or condition |
| Symptoms | string | List of symptoms or description of symptoms |
| Treatments | string | (Optional)… |

See the full description on the dataset page: https://huggingface.co/datasets/kamruzzaman-asif/Diseases_Dataset.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Synthetic Respiratory Symptoms Dataset is created for educational and research use to analyze associations between respiratory symptoms, underlying diseases, treatment types, and the severity of the condition. This synthetic dataset helps simulate real-world scenarios involving respiratory illnesses while ensuring privacy compliance through full anonymization.
[Figure: Synthetic Respiratory Symptoms Dataset Distribution by Disease Count]
[Figure: Synthetic Respiratory Symptoms Dataset Distribution by Nature]
This dataset can be used for the following applications:
This synthetic dataset is fully anonymised and complies with modern data privacy standards. It incorporates a diverse array of symptom profiles, diseases, and treatments relevant to respiratory illnesses, enabling a broad range of analytical and educational applications.
CC0 (Public Domain)
https://www.datainsightsmarket.com/privacy-policy
The Healthcare Data Industry market was valued at USD XX Million in 2023 and is projected to reach USD XXX Million by 2032, growing at an expected compound annual growth rate (CAGR) of 16.20% over the forecast period. Healthcare data refers to all information created or collected in the healthcare industry, including patient records, electronic health records (EHRs), genomic data, health insurance claims, medical images, and clinical trial data. This data underpins modern healthcare and supports many critical applications. First and foremost, it improves patient care: analyzing patterns in patient records helps healthcare providers diagnose diseases accurately and apply personalized treatment plans. Medical images such as X-rays and MRIs help detect abnormalities and support surgical procedures. Genomic data reveals genetic susceptibility to disease, which enables customized treatment plans for conditions such as cancer. Healthcare data is also crucial for research and for developing new medical knowledge. Researchers use large datasets to study the epidemiology of diseases, develop new drugs and treatments, and evaluate the effectiveness of healthcare programs. For instance, clinical trial data builds evidence about the safety and efficacy of new treatment options, and health insurance claims data can reveal healthcare utilization patterns that point to areas in need of improvement. Finally, healthcare data supports the administrative and operational functions of healthcare organizations. EHRs make patient data easy to maintain, enable sound communication among healthcare providers, and minimize errors.
In addition, analytics on health insurance claims support billing and reimbursement, ensuring that healthcare providers are paid the correct amount for the services they render. Analytics can also optimize resource utilization, identify potential cost savings, and make healthcare organizations more efficient overall. Healthcare information is a valuable asset that drives innovation, promotes better patient outcomes, and supports the coherent functioning of the healthcare system. When healthcare providers, researchers, and administrators use it effectively, they can improve the quality and efficiency of care delivery and, in turn, the health of individuals and communities. Recent developments include: March 2022: Microsoft launched Azure Health Data Services in the United States, a platform-as-a-service (PaaS) offering designed exclusively to support protected health information (PHI) in the cloud. March 2022: The government of Thailand launched a big data portal for healthcare facilities; the National Reforms Committee on Public Health recently joined hands with 12 government agencies to improve the quality of healthcare services by implementing digital technologies. Key drivers for this market are: Increase in Demand for Analytics Solutions for Population Health Management, Rise in Need for Business Intelligence to Optimize Health Administration and Strategy, and Surge in Adoption of Big Data in the Healthcare Industry. Potential restraints include: Security Concerns Related to Sensitive Patient Medical Data, and High Cost of Implementation and Deployment. Notable trends are: Cloud Segment is Expected to Register a High Growth Rate Over the Forecast Period.
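The forecast rests on the standard CAGR relation, end value = start value × (1 + CAGR)^years. Because the report's dollar figures are redacted (XX/XXX), the sketch below only shows the arithmetic; it assumes 2023 is the base year, so 2032 is 9 compounding years away, and the function names are illustrative.

```python
def project_value(start: float, cagr: float, years: int) -> float:
    """Grow `start` at a constant annual rate `cagr` (0.162 means 16.2%) for `years` years."""
    return start * (1.0 + cagr) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Invert the projection: the constant annual rate that turns `start` into `end`."""
    if start <= 0 or years <= 0:
        raise ValueError("start and years must be positive")
    return (end / start) ** (1.0 / years) - 1.0
```

Under that assumption, `project_value(1.0, 0.162, 9)` gives a growth multiple of roughly 3.86, meaning the 2032 value would be about 3.86 times the 2023 value, whatever the redacted starting figure is.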
MedQuAD includes 47,457 medical question-answer pairs created from 12 NIH websites (e.g. cancer.gov, niddk.nih.gov, GARD, MedlinePlus Health Topics). The collection covers 37 question types (e.g. Treatment, Diagnosis, Side Effects) associated with diseases, drugs and other medical entities such as tests.
The dataset contains 199,999 rows and 21 variables covering patients' medical details and health indicators.
USMER (Medical Unit Code): An identifier for the medical unit where the patient was treated or admitted.
MEDICAL_UNIT: The name or code of the medical unit where the patient received treatment.
SEX: The gender of the patient (e.g., Male, Female).
PATIENT_TYPE: Indicates whether the patient is an outpatient or hospitalized (e.g., Outpatient, Hospitalized).
DATE_DIED: The date of death of the patient, if applicable. This variable may be null for surviving patients.
INTUBED: Indicates whether the patient was intubated (i.e., received mechanical ventilation) during treatment (e.g., Yes, No).
PNEUMONIA: Indicates whether the patient had pneumonia as a complication of COVID-19 infection (e.g., Yes, No).
AGE: The age of the patient at the time of diagnosis or admission.
PREGNANT: Indicates whether the patient was pregnant at the time of diagnosis or admission (e.g., Yes, No).
DIABETES: Indicates whether the patient had diabetes as a pre-existing condition (e.g., Yes, No).
COPD (Chronic Obstructive Pulmonary Disease): Indicates whether the patient had COPD as a pre-existing condition (e.g., Yes, No).
ASTHMA: Indicates whether the patient had asthma as a pre-existing condition (e.g., Yes, No).
INMSUPR (Immunosuppression): Indicates whether the patient had immunosuppression (weakened immune system) as a pre-existing condition (e.g., Yes, No).
HIPERTENSION (Hypertension): Indicates whether the patient had hypertension (high blood pressure) as a pre-existing condition (e.g., Yes, No).
OTHER_DISEASE: Indicates whether the patient had other pre-existing diseases or medical conditions not specifically listed (e.g., Yes, No).
CARDIOVASCULAR: Indicates whether the patient had cardiovascular disease as a pre-existing condition (e.g., Yes, No).
OBESITY: Indicates whether the patient had obesity as a pre-existing condition (e.g., Yes, No).
RENAL_CHRONIC (Chronic Renal Insufficiency): Indicates whether the patient had chronic renal insufficiency (chronic kidney disease) as a pre-existing condition (e.g., Yes, No).
TOBACCO: Indicates whether the patient was a tobacco user (e.g., Yes, No).
CLASIFFICATION_FINAL (Final Classification): The final classification of the patient's COVID-19 case based on severity or outcome (e.g., Mild, Severe, Deceased).
ICU (Intensive Care Unit): Indicates whether the patient was admitted to the intensive care unit (ICU) during treatment (e.g., Yes, No).
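Two kinds of variables need care before modeling: the Yes/No flags and DATE_DIED, which is null for survivors and therefore doubles as a death label. The sketch below turns one raw record (a dict of strings) into numeric features. It assumes the values are literally "Yes"/"No" and that a missing date is empty or None. Published copies of this data may use numeric codes or a placeholder date instead, so `missing_dates` is a hypothetical hook for adjusting that check. SEX, PATIENT_TYPE, MEDICAL_UNIT and CLASIFFICATION_FINAL are left out and would need their own encoding.

```python
from typing import Dict, Iterable, Optional

# Variables described above as Yes/No flags.
BINARY_FLAGS = [
    "INTUBED", "PNEUMONIA", "PREGNANT", "DIABETES", "COPD", "ASTHMA",
    "INMSUPR", "HIPERTENSION", "OTHER_DISEASE", "CARDIOVASCULAR",
    "OBESITY", "RENAL_CHRONIC", "TOBACCO", "ICU",
]


def encode_flag(value: Optional[str]) -> Optional[int]:
    """Map 'Yes' -> 1 and 'No' -> 0; any other value is treated as missing (None)."""
    if value is None:
        return None
    normalized = value.strip().lower()
    if normalized == "yes":
        return 1
    if normalized == "no":
        return 0
    return None


def died(date_died: Optional[str], missing_dates: Iterable[str] = ("",)) -> int:
    """Return 1 if DATE_DIED holds a value, 0 if it is empty (the patient survived)."""
    if date_died is None or date_died.strip() in set(missing_dates):
        return 0
    return 1


def encode_record(
    record: Dict[str, Optional[str]], missing_dates: Iterable[str] = ("",)
) -> Dict[str, Optional[int]]:
    """Build numeric features: 0/1 flags, integer AGE, and a derived DIED label."""
    features: Dict[str, Optional[int]] = {
        flag: encode_flag(record.get(flag)) for flag in BINARY_FLAGS
    }
    age = record.get("AGE")
    features["AGE"] = int(age) if age not in (None, "") else None
    features["DIED"] = died(record.get("DATE_DIED"), missing_dates)
    return features
```

Flags absent from a record come back as None rather than 0, so missing data stays distinguishable from a recorded "No".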
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Dry Eye Disease Patient Records (Synthetic) is designed for educational and research purposes to analyze patterns in sleep behavior, stress levels, lifestyle factors, and their potential links to dry eye disease. It provides anonymized, synthetic data on various health conditions and behavioral habits.
[Figure: Dry Eye Disease Patient Records Synthetic Data]
This dataset can be used for the following applications:
This synthetic dataset is fully anonymized and complies with data privacy standards. It includes a variety of demographic and lifestyle factors to support a broad range of research and analysis.
CC0 (Public Domain)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data set for M. Paul and M. Dredze, "Discovering health topics in social media using topic models". This includes the set of tweets used in the experiments, and the words associated with ailments discovered by the Ailment Topic Aspect Model (ATAM).
Contact: Michael Paul (mpaul39@gmail.com)
Released June 26, 2014
atam.topwords.csv - The most probable words for each ailment. The first column is the ailment ID. The second column indicates if it is a general (G), symptom (S), or treatment (T) word. The third column is the word. The fourth column is the probability. Words are shown in descending order of probability until 90% of the probability mass is accumulated for each ailment or until probabilities drop below 1.0e-4.
atam.tweets.x.csv (for x=[0-9]) - The tweets used in the study. The first column is the tweet ID. The second column indicates the ailment ID for the ailment sampled for that tweet. (See the atam.topwords.csv file for the most probable words associated with each ailment ID.) Full tweets can be downloaded using the tweet ID through the Twitter API (https://dev.twitter.com/docs/api/1.1).
keywords.txt - The set of 269 health-related keywords used in our keyword-filtered Twitter stream as part of our dataset.
keywords_x.txt (for x={diseases,symptoms,treatments}) - The set of approximately 20,000 keyphrases crawled from wrongdiagnosis.com describing the names of diseases, symptoms, and treatments and medications. These keyword lists are used to create input for ATAM (which requires phrases to be labeled as symptoms or treatments), and also to initially filter our dataset when constructing our health classifiers.
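The atam.topwords.csv layout described above (ailment ID, word type, word, probability) maps naturally onto a per-ailment lookup. The sketch below assumes the file is plain comma-separated with no header row, which the description does not state. It also re-implements the stated truncation rule purely as an illustration; our reading is that the word which pushes the total past 90% is kept. Neither function comes from the original release.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (word type, word, probability); word type is G (general), S (symptom) or T (treatment).
Entry = Tuple[str, str, float]


def load_topwords(lines: Iterable[str]) -> Dict[str, List[Entry]]:
    """Group atam.topwords.csv rows by ailment ID (assumes no header row)."""
    by_ailment: Dict[str, List[Entry]] = defaultdict(list)
    for row in csv.reader(lines):
        if not row:
            continue  # tolerate blank lines
        ailment_id, word_type, word, prob = row[0], row[1], row[2], float(row[3])
        by_ailment[ailment_id].append((word_type, word, prob))
    return dict(by_ailment)


def truncate_by_mass(
    probs: Dict[str, float], mass: float = 0.9, floor: float = 1.0e-4
) -> List[Tuple[str, float]]:
    """Keep words in descending probability until `mass` is accumulated
    or the next probability falls below `floor`."""
    kept: List[Tuple[str, float]] = []
    cumulative = 0.0
    for word, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        if cumulative >= mass or p < floor:
            break
        kept.append((word, p))
        cumulative += p
    return kept
```

Because `csv.reader` accepts any iterable of strings, an open file works directly: `with open("atam.topwords.csv", newline="") as f: topwords = load_topwords(f)`.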
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This Synthetic Cardiovascular Disease Dataset is created for educational and research purposes in cardiology, public health, and data science. It provides demographic, medical, and diagnostic details related to cardiovascular diseases, enabling analysis of risk factors, disease progression, and treatment outcomes. The dataset can be utilized for building predictive models and exploring disease management strategies.
[Figure: Synthetic Cardiovascular Disease Prediction Data Distribution]
This dataset is suited for the following applications:
CC0 (Public Domain)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Lung disease encompasses a wide range of conditions that affect the lungs and their ability to function effectively. These conditions can be caused by various factors, including infections, environmental factors, genetic predispositions, and lifestyle choices. Lung diseases can result in symptoms such as coughing, shortness of breath, chest pain, and reduced lung function. Detecting and diagnosing lung diseases is crucial for patient care, as they can have a significant impact on an individual's health and quality of life.
Global Impact:
Lung diseases have a substantial global impact. According to the World Health Organization (WHO), respiratory diseases, including lung diseases, are responsible for a significant portion of global mortality. In 2016, respiratory diseases were the fourth leading cause of death worldwide, with an estimated 3.0 million deaths attributed to them. Conditions like pneumonia, chronic obstructive pulmonary disease (COPD), and lung cancer contribute to this high mortality rate. Early detection and accurate diagnosis are essential for reducing the burden of lung diseases on public health.
The Need to Detect Lung Diseases:
Detecting lung diseases is vital for several reasons:
Early Intervention: Early detection allows for timely medical intervention and treatment, increasing the chances of successful management and recovery.
Disease Classification: Differentiating between various lung diseases, such as pneumonia, tuberculosis, and lung cancer, is crucial for appropriate treatment planning.
Public Health: Effective disease detection and management can have a positive impact on public health by reducing the overall disease burden.
Lung X-Ray Image Dataset:
The "Lung X-Ray Image Dataset" is a comprehensive collection of X-ray images that plays a pivotal role in the detection and diagnosis of lung diseases. This dataset contains a large number of high-quality X-ray images, meticulously collected from diverse sources, including hospitals, clinics, and healthcare institutions.
Dataset Contents:
Total Number of Images: The dataset comprises a total of 3,475 X-ray images.
Classes within the Dataset:
Normal (1250 Images): These images represent healthy lung conditions, serving as a reference for comparison in diagnostic procedures.
Lung Opacity (1125 Images): This class includes X-ray images depicting various degrees of lung abnormalities, providing a diverse set of cases for analysis.
Viral Pneumonia (1100 Images): Images in this category are associated with viral pneumonia cases, contributing to the understanding and identification of this specific lung infection.
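The three class counts add up to the stated total (1250 + 1125 + 1100 = 3,475), so the classes are close to balanced. If you still want to correct for the small imbalance during training, one common choice is inverse-frequency weighting, w_c = N / (K · n_c). This is the same formula scikit-learn uses for `class_weight="balanced"`. It is a generic technique, not something the dataset authors prescribe, and the helper names below are illustrative.

```python
from typing import Dict

# Per-class image counts as listed in the dataset description.
CLASS_COUNTS = {"Normal": 1250, "Lung Opacity": 1125, "Viral Pneumonia": 1100}


def class_proportions(counts: Dict[str, int]) -> Dict[str, float]:
    """Fraction of the dataset that each class makes up."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def inverse_frequency_weights(counts: Dict[str, int]) -> Dict[str, float]:
    """Balanced weights N / (K * n_c): rarer classes get weights above 1."""
    total = sum(counts.values())
    k = len(counts)
    return {name: total / (k * n) for name, n in counts.items()}
```

With these counts the weights come out at roughly 0.93 (Normal), 1.03 (Lung Opacity) and 1.05 (Viral Pneumonia), which confirms the imbalance is mild.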
In conclusion, the "Lung X-Ray Image Dataset" plays a crucial role in the healthcare sector by providing a diverse and well-documented collection of X-ray images that support the detection, classification, and understanding of lung diseases. This resource is instrumental in advancing the field of respiratory medicine and improving patient outcomes.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This heart disease dataset was curated by combining 5 popular heart disease datasets that were previously available independently but had never been combined. In this dataset
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
brain MRI
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This synthetic Parkinson's Disease Detection Dataset is designed for educational and research purposes in the fields of data science, healthcare analytics, and medical research. It contains key clinical and speech features from individuals with Parkinson's Disease, which can be used to build predictive models, analyze disease progression, and assess the impact of motor and speech symptoms. The dataset is ideal for tasks such as classification, regression, and the study of biomarkers for Parkinson’s disease.
Dataset Features:
Index: Row identifier for each record.
Age: The age of the patient.
Sex: Gender of the patient (Male/Female).
Test_time: Duration or time of the test conducted (in minutes).
Motor_UPDRS: Motor component score from the Unified Parkinson's Disease Rating Scale (UPDRS).
Total_UPDRS: Total score from the UPDRS, including both motor and non-motor components.
Jitter(%): Percentage of frequency variation in speech, a key indicator of Parkinson's disease.
Jitter(Abs): Absolute jitter value, quantifying frequency variation.
Jitter:RAP: Jitter measured using the Relative Average Perturbation method.
Jitter:PPQ5: Jitter measured using the 5-point Period Perturbation Quotient.
Jitter:DDP: Jitter measured using the Difference of Differences of Periods method.
Shimmer: Amplitude variation in speech, indicating vocal instability.
Shimmer(dB): Amplitude variation in decibels.
Shimmer:APQ3: Shimmer measured using the 3-point Amplitude Perturbation Quotient.
Shimmer:APQ5: Shimmer measured using the 5-point Amplitude Perturbation Quotient.
Shimmer:APQ11: Shimmer measured using the 11-point Amplitude Perturbation Quotient.
Shimmer:DDA: Shimmer measured using the Difference of Differences of Amplitudes method.
NHR: Noise to Harmonics Ratio, a measure of voice quality and periodicity.
HNR: Harmonics to Noise Ratio, reflecting the periodicity of speech sounds.
RPDE: Recurrence Period Density Entropy, derived from voice signal analysis.
DFA: Detrended Fluctuation Analysis, studying self-similarity in speech signals.
PPE: Pitch Period Entropy, quantifying irregularity in pitch periods during speech.
Usage
This dataset is suited to various applications related to Parkinson's Disease detection and analysis:
Disease Prediction: Develop machine learning models to predict the presence and progression of Parkinson's Disease.
Speech Analysis: Study speech features like jitter and shimmer for early diagnosis and monitoring of Parkinson's Disease.
Predictive Modeling: Build models using clinical and speech features to assess disease severity.
Clinical Research: Investigate the relationship between motor and non-motor symptoms of Parkinson's Disease.
Healthcare Analytics: Apply data science techniques to improve the diagnosis and treatment of Parkinson's Disease.
Coverage
This synthetic dataset is anonymized and designed for research and learning purposes. It includes a diverse range of speech and clinical data, simulating different stages of Parkinson's Disease for analysis.
License
CC0 (Public Domain)
Who Can Use It
Data Science Practitioners: For practicing data preprocessing, classification, and regression tasks.
Healthcare Analysts and Researchers: To explore relationships between clinical and speech features in Parkinson's Disease.
Medical Professionals: To enhance understanding of Parkinson's Disease symptoms and progression.
Machine Learning Enthusiasts: To experiment with models for predicting Parkinson's Disease using diverse features.
Academic Institutions: For use in educational settings to teach data science applications in healthcare.
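The jitter features listed for this dataset are perturbation measures computed from the sequence of consecutive pitch periods of a sustained voice signal. The sketch below gives simplified versions of the usual Praat-style definitions and assumes `periods` is a list of period lengths in seconds. Tools such as Praat also discard periods outside configurable bounds, so the numbers will not match exactly. Because the dataset is synthetic, its columns were not necessarily computed this way. Shimmer measures follow the same pattern with peak amplitudes in place of periods.

```python
from statistics import mean
from typing import List, Sequence


def jitter_abs(periods: Sequence[float]) -> float:
    """Mean absolute difference between consecutive pitch periods (same unit as input)."""
    return mean(abs(b - a) for a, b in zip(periods, periods[1:]))


def jitter_percent(periods: Sequence[float]) -> float:
    """Jitter(Abs) relative to the mean period, as a percentage."""
    return 100.0 * jitter_abs(periods) / mean(periods)


def _perturbation_quotient(periods: Sequence[float], window: int) -> float:
    """Mean |T_i - local average over `window` periods|, divided by the mean period."""
    half = window // 2
    deviations: List[float] = []
    for i in range(half, len(periods) - half):
        local = periods[i - half:i + half + 1]
        deviations.append(abs(periods[i] - mean(local)))
    return mean(deviations) / mean(periods)


def jitter_rap(periods: Sequence[float]) -> float:
    """Relative Average Perturbation: 3-point window (needs at least 3 periods)."""
    return _perturbation_quotient(periods, 3)


def jitter_ppq5(periods: Sequence[float]) -> float:
    """5-point Period Perturbation Quotient (needs at least 5 periods)."""
    return _perturbation_quotient(periods, 5)


def jitter_ddp(periods: Sequence[float]) -> float:
    """Mean absolute difference of consecutive period differences, over the mean period."""
    diffs = [b - a for a, b in zip(periods, periods[1:])]
    return mean(abs(d2 - d1) for d1, d2 in zip(diffs, diffs[1:])) / mean(periods)
```

Since |T(i+1) - 2T(i) + T(i-1)| equals 3 |T(i) - (T(i-1) + T(i) + T(i+1))/3|, `jitter_ddp` is exactly three times `jitter_rap`. That identity is a handy sanity check when exploring these columns.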
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Contains data from World Health Organization's data portal covering the following categories: Mortality and global health estimates, Sustainable development goals, Millennium Development Goals (MDGs), Health systems, Malaria, Tuberculosis, Child health, Infectious diseases, Neglected Tropical Diseases, World Health Statistics, Health financing, Tobacco, Substance use and mental health, Injuries and violence, HIV/AIDS and other STIs, Public health and environment, Nutrition, Urban health, Child mortality, Noncommunicable diseases, Noncommunicable diseases CCS, Negelected tropical diseases, Infrastructure, Essential health technologies, Medical equipment, Demographic and socioeconomic statistics, Health inequality monitor, Health Equity Monitor, Child malnutrition, TOBACCO, Neglected tropical diseases, International Health Regulations (2005) monitoring framework, 0, Insecticide resistance, Oral health, Universal Health Coverage, Global Observatory for eHealth (GOe), RSUD: GOVERNANCE, POLICY AND FINANCING : PREVENTION, RSUD: GOVERNANCE, POLICY AND FINANCING: TREATMENT, RSUD: GOVERNANCE, POLICY AND FINANCING: FINANCING, RSUD: SERVICE ORGANIZATION AND DELIVERY: TREATMENT SECTORS AND PROVIDERS, RSUD: SERVICE ORGANIZATION AND DELIVERY: TREATMENT CAPACITY AND TREATMENT COVERAGE, RSUD: SERVICE ORGANIZATION AND DELIVERY: PHARMACOLOGICAL TREATMENT, RSUD: SERVICE ORGANIZATION AND DELIVERY: SCREENING AND BRIEF INTERVENTIONS, RSUD: SERVICE ORGANIZATION AND DELIVERY: PREVENTION PROGRAMS AND PROVIDERS, RSUD: SERVICE ORGANIZATION AND DELIVERY: SPECIAL PROGRAMMES AND SERVICES, RSUD: HUMAN RESOURCES, RSUD: INFORMATION SYSTEMS, RSUD: YOUTH, FINANCIAL PROTECTION, AMR GLASS, Noncommunicable diseases and mental health, Health workforce, AMR GASP, ICD, SEXUAL AND REPRODUCTIVE HEALTH, Immunization, NLIS, AMC GLASS. For links to individual indicator metadata, see resource descriptions.
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
2008-2024. American Lung Association. Cessation Coverage. Medicaid data compiled by the Centers for Disease Control and Prevention's Office on Smoking and Health were obtained from the State Tobacco Cessation Coverage Database, developed and administered by the American Lung Association. Data from 2008-2012 are reported annually; beginning in 2013, data are reported quarterly. Data include state-level information on Medicaid coverage of Food and Drug Administration (FDA)-approved medications for tobacco cessation treatment, the types of counseling recommended by the Public Health Service (PHS), and barriers to accessing cessation treatment. Note: Section 2502 of the Patient Protection and Affordable Care Act requires all state Medicaid programs to cover all FDA-approved tobacco cessation medications as of January 1, 2014. However, states are currently in the process of modifying their coverage to comply with this requirement. Data in the STATE System on Medicaid coverage of tobacco cessation medications reflect evidence of coverage found in documentable sources and may not yet reflect medications covered under this requirement.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset consists of 303 observations, each representing a unique patient, and 14 different attributes associated with heart disease. This dataset is a critical resource for researchers focusing on predictive analytics in cardiovascular diseases.
Variables Overview:
1. Age: A continuous variable indicating the age of the patient.
2. Sex: A categorical variable with two levels ('Male', 'Female'), indicating the gender of the patient.
3. CP (Chest Pain type): A categorical variable describing the type of chest pain experienced by the patient, with categories such as 'Asymptomatic', 'Atypical Angina', 'Typical Angina', and 'Non-Angina'.
4. TRTBPS (Resting Blood Pressure): A continuous variable indicating the resting blood pressure (in mm Hg) on admission to the hospital.
5. Chol (Serum Cholesterol): A continuous variable measuring the serum cholesterol in mg/dl.
6. FBS (Fasting Blood Sugar): A binary variable where 1 represents fasting blood sugar > 120 mg/dl, and 0 otherwise.
7. Rest ECG (Resting Electrocardiographic Results): Categorizes the resting electrocardiographic results of the patient into 'Normal', 'ST Elevation', and other categories.
8. Thalachh (Maximum Heart Rate Achieved): A continuous variable indicating the maximum heart rate achieved by the patient.
9. Exng (Exercise Induced Angina): A binary variable where 1 indicates the presence of exercise-induced angina, and 0 otherwise.
10. Oldpeak (ST Depression Induced by Exercise Relative to Rest): A continuous variable indicating the ST depression induced by exercise relative to rest.
11. Slope (Slope of the Peak Exercise ST Segment): A categorical variable with levels such as 'Flat' and 'Up Sloping', representing the slope of the peak exercise ST segment.
12. CA (Number of Major Vessels Colored by Fluoroscopy): A discrete variable ranging from 0 to 3, indicating the number of major vessels colored by fluoroscopy.
13. Thall (Thalassemia): A categorical variable indicating different types of thalassemia (a blood disorder).
14. Target: A binary target variable indicating the presence (1) or absence (0) of heart disease.
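Most of the listed variables can go into a model directly. The categorical ones (CP, Rest ECG, Slope, Thall) need encoding, and FBS is itself a thresholded glucose reading. The sketch below shows both steps. The helper names are hypothetical, and it assumes CP is stored as the text labels quoted in the overview. Some copies of this dataset may store integer codes instead, and because the overview says "categories such as", the label list may not be exhaustive.

```python
from typing import List, Sequence

# Chest-pain labels quoted in the overview; possibly not exhaustive.
CP_CATEGORIES = ["Asymptomatic", "Atypical Angina", "Typical Angina", "Non-Angina"]


def fbs_flag(fasting_glucose_mg_dl: float) -> int:
    """FBS as defined above: 1 if fasting blood sugar exceeds 120 mg/dl, else 0."""
    return 1 if fasting_glucose_mg_dl > 120 else 0


def one_hot(value: str, categories: Sequence[str]) -> List[int]:
    """Encode a categorical value as 0/1 indicators; unknown labels raise an error."""
    if value not in categories:
        raise ValueError(f"unexpected category: {value!r}")
    return [1 if value == c else 0 for c in categories]
```

Raising on unknown labels, rather than emitting an all-zero vector, surfaces any category the overview did not mention.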
Descriptive Statistics: The patients' age ranges from 29 to 77 years, with a mean age of approximately 54 years. The resting blood pressure spans from 94 to 200 mm Hg, and the average cholesterol level is about 246 mg/dl. The maximum heart rate achieved varies widely among patients, from 71 to 202 beats per minute.
Importance for Research: This dataset provides a comprehensive view of various factors potentially linked to heart disease, making it a valuable resource for developing predictive models. By analyzing relationships and patterns within these variables, researchers can identify key predictors of heart disease and enhance the accuracy of diagnostic tools. This could lead to better preventive measures and treatment strategies, ultimately improving patient outcomes in cardiovascular health.
https://data.gov.tw/license
The Bureau of National Health Insurance has selected evidence-based medical indicators for public release in order to improve the quality of care for patients with myocardial infarction and to serve as a reference for disease treatment and healthcare quality. The public indicators are divided into three categories according to treatment period and quality aspect: inpatient process assessment, assessment of continued medication after discharge, and outcome assessment. They include the ratio of inpatients with acute myocardial infarction (AMI) who undergo LDL cholesterol testing during hospitalization; the medication ratio for AMI patients during hospitalization and within three, six, and nine months after discharge; the ratio of patients who return to the emergency department within three days after discharge for the same primary diagnosis or related conditions; and the unplanned readmission ratio within fourteen days after discharge for the same primary diagnosis or related conditions.
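Each of these indicators is a ratio: eligible patients who meet a condition divided by all eligible patients. The sketch below computes the 3-day emergency-department return ratio. It assumes a hypothetical record per AMI discharge holding the discharge date and the date of the first ED visit for the same or a related diagnosis (None if there was none). The Bureau's exact inclusion rules and day-counting convention are not given here, so treating "within three days" as 0 to 3 days after discharge is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Discharge:
    """One AMI discharge (hypothetical record layout)."""
    discharged_on: date
    first_related_ed_visit: Optional[date]  # None if no qualifying ED return


def return_within_ratio(discharges: List[Discharge], days: int = 3) -> float:
    """Share of discharges followed by a related ED visit within `days` days."""
    if not discharges:
        raise ValueError("no discharges to evaluate")
    returned = 0
    for d in discharges:
        visit = d.first_related_ed_visit
        if visit is not None and 0 <= (visit - d.discharged_on).days <= days:
            returned += 1
    return returned / len(discharges)
```

The same function covers the 14-day unplanned readmission indicator if the second date holds the first related readmission and `days=14`.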
Alzheimer's Disease Neuroimaging Initiative (ADNI) is a multisite study that aims to improve clinical trials for the prevention and treatment of Alzheimer's disease (AD).[1] This cooperative study combines expertise and funding from the private and public sectors to study subjects with AD, as well as those who may develop AD and controls with no signs of cognitive impairment.[2] Researchers at 63 sites in the US and Canada track the progression of AD in the human brain with neuroimaging, biochemical, and genetic biological markers.[2][3] This knowledge helps in designing better clinical trials for the prevention and treatment of AD. ADNI has made a global impact,[4] first by developing a set of standardized protocols that allow results from multiple centers to be compared,[4] and second through its data-sharing policy, which makes all of the data available without embargo to qualified researchers worldwide.[5] To date, over 1000 scientific publications have used ADNI data.[6] A number of other initiatives related to AD and other diseases have been designed and implemented using ADNI as a model.[4] ADNI has been running since 2004 and is currently funded until 2021.[7]
Source: Wikipedia, https://en.wikipedia.org/wiki/Alzheimer%27s_Disease_Neuroimaging_Initiative
Medi-Cal claims for persons receiving medication-assisted treatment for opioid use disorders are unduplicated by medication. Methadone program participation is identified from claims submitted by the counties. Buprenorphine and naltrexone pharmacy claims are those used for treating opioid use disorders. Naloxone pharmacy claims are those dispensed as a first aid item (injectables and nasal sprays). The county source is taken from submitted claims.
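As we read it, "unduplicated by medication" means a person is counted once per medication no matter how many claims they have, while someone on two medications appears under both. The sketch below assumes hypothetical claim tuples of (person ID, county, medication). The real Medi-Cal extract's field names are not described here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Claim = Tuple[str, str, str]  # (person_id, county, medication): hypothetical layout


def unduplicated_counts(claims: Iterable[Claim]) -> Dict[Tuple[str, str], int]:
    """Count distinct persons per (county, medication); repeat claims add nothing."""
    persons: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for person_id, county, medication in claims:
        persons[(county, medication)].add(person_id)
    return {key: len(ids) for key, ids in persons.items()}
```

Collecting IDs in a set per group, rather than incrementing a counter, is what makes repeated claims from the same person harmless.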