100+ datasets found
  1. Covid_19_Weather_Dataset

    • kaggle.com
    Updated Apr 17, 2020
    Cite
    Prasanth Antonyraj (2020). Covid_19_Weather_Dataset [Dataset]. https://www.kaggle.com/johnprasanth/covid-19-weather-dataset/code
    Dataset updated
    Apr 17, 2020
    Dataset provided by
    Kaggle - http://kaggle.com/
    Authors
    Prasanth Antonyraj
    Description

    Context

    This dataset contains weather details for five major countries, including Germany and Italy, that were greatly affected by the spread of Covid_19.

    Content

    It is believed that climate conditions might be one of the major factors in the spread of Covid_19. This dataset contains weather conditions recorded from 19 February to 17 April 2020, sampled every 10 minutes for the aforementioned countries.

    File Description

    The file contains the following columns:

    • Temperature - Actual temperature recorded, in degrees Celsius
    • Wind_speed - Wind speed
    • Description - Description of the current weather
    • Weather - Categorical value giving the type of weather
    • name - Country name
    • temp_min - Minimum temperature recorded
    • temp_max - Maximum temperature recorded

    The other variables are largely self-explanatory.

    Acknowledgements

    This dataset was prepared as part of my thesis project, using a web scraper that called an open-source REST API endpoint every 10 minutes. The scraper was hosted on an EC2 instance and updated a CSV file periodically. I thought it could contribute to the analysis of the spread of Covid_19, so I am sharing it.
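
    The collection loop described above can be sketched as follows. This is only an illustration, not the author's scraper: the `fetch` callable stands in for the unnamed REST API, and the column order in `FIELDS` is an assumption based on the file description.

```python
import csv
import time
from pathlib import Path
from typing import Callable, Dict, List

# Column names follow the file description; the exact header of the
# published CSV is an assumption.
FIELDS = ["name", "Temperature", "temp_min", "temp_max",
          "Wind_speed", "Weather", "Description"]


def append_reading(csv_path: Path, reading: Dict[str, object]) -> None:
    """Append one weather reading, writing the header if the file is new."""
    is_new = not csv_path.exists()
    with csv_path.open("a", newline="", encoding="utf-8") as handle:
        # Keys not listed in FIELDS are dropped; missing keys become "".
        writer = csv.DictWriter(handle, fieldnames=FIELDS, extrasaction="ignore")
        if is_new:
            writer.writeheader()
        writer.writerow(reading)


def poll(fetch: Callable[[str], Dict[str, object]],
         countries: List[str],
         csv_path: Path,
         rounds: int,
         interval_seconds: float = 600.0,
         sleep: Callable[[float], None] = time.sleep) -> None:
    """Call the (hypothetical) `fetch` once per country per round.

    The default interval of 600 seconds matches the 10-minute cadence
    mentioned in the description.
    """
    for round_index in range(rounds):
        for country in countries:
            append_reading(csv_path, fetch(country))
        if round_index < rounds - 1:
            sleep(interval_seconds)
```

    Passing `sleep` as a parameter keeps the loop testable without waiting ten minutes between rounds.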

    Hope this could be useful!

    Inspiration

    As mentioned earlier, climate could be one of the significant factors in the spread of Covid_19, and this needs further analysis. Italy is a good candidate for such research: climate data is available for it, and the country was severely affected.

  2. Disease and symptoms dataset 2023

    • data.mendeley.com
    Updated Mar 3, 2025
    Cite
    Bran Stark (2025). Disease and symptoms dataset 2023 [Dataset]. http://doi.org/10.17632/2cxccsxydc.1
    Dataset updated
    Mar 3, 2025
    Authors
    Bran Stark
    License

    Attribution 4.0 (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains disease names along with the symptoms faced by the respective patient. There are a total of 773 unique diseases and 377 symptoms, with ~246,000 rows. The dataset was artificially generated, preserving Symptom Severity and Disease Occurrence Possibility. Several distinct groups of symptoms might all be indicators of the same disease. There may even be one single symptom contributing to a disease in a row or sample. This is an indicator of a very high correlation between the symptom and that particular disease. A larger number of rows for a particular disease corresponds to its higher probability of occurrence in the real world. Similarly, in a row, if the feature vector has the occurrence of a single symptom, it implies that this symptom has more correlation to classify the disease than any one symptom of a feature vector with multiple symptoms in another sample.
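
    Since the description says that row counts reflect how likely a disease is in the real world, a relative frequency per disease can be read directly off the label column. A minimal sketch, using made-up labels rather than the actual file:

```python
from collections import Counter
from typing import Dict, Iterable


def disease_frequencies(labels: Iterable[str]) -> Dict[str, float]:
    """Return each disease's share of the rows (empty input gives {})."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {disease: n / total for disease, n in counts.items()}


# Hypothetical labels, one per row; not taken from the dataset.
example_labels = ["flu", "flu", "flu", "asthma"]
print(disease_frequencies(example_labels))
```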

  3. Cleveland Clinic Heart Disease Dataset - Dataset - CKAN

    • data.poltekkes-smg.ac.id
    Updated Oct 8, 2024
    Cite
    (2024). Cleveland Clinic Heart Disease Dataset - Dataset - CKAN [Dataset]. https://data.poltekkes-smg.ac.id/dataset/cleveland-clinic-heart-disease-dataset
    Dataset updated
    Oct 8, 2024
    License

    Attribution 4.0 (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Coronary heart disease (CHD), also known as coronary artery disease (CAD), involves the reduction of blood flow to the heart muscle due to build-up of plaque in the arteries of the heart. It is the most common form of cardiovascular disease. Currently, invasive coronary angiography represents the gold standard for establishing the presence, location, and severity of CAD; however, this diagnostic method is costly and associated with morbidity and mortality in CAD patients. Therefore, it would be beneficial to develop a non-invasive alternative to replace the current gold standard. Other, less invasive diagnostic methods have been proposed in the scientific literature, including exercise electrocardiogram, thallium scintigraphy and fluoroscopy of coronary calcification. However, the diagnostic accuracy of these tests only ranges between 35% and 75%. Therefore, it would be beneficial to develop a computer-aided diagnostic tool that could utilize the combined results of these non-invasive tests in conjunction with other patient attributes to boost the diagnostic power of these non-invasive methods, with the aim of ultimately replacing the current invasive gold standard.

  4. ‘Death Cause by Country’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Feb 13, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Death Cause by Country’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-death-cause-by-country-3051/00ae526f/?iid=001-918&v=presentation
    Dataset updated
    Feb 13, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Death Cause by Country’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/majyhain/death-cause-by-country on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    Across low- and middle-income countries, deaths from infectious disease, malnutrition, nutritional deficiencies, and neonatal and maternal causes are common, and in some cases dominant. In Kenya, for example, diarrheal infections are still the primary cause of death. HIV/AIDS is the major cause of death in South Africa and Botswana. However, in high-income countries, the proportion of deaths due to these causes is quite low.

    Content

    The dataset contains thirty-two columns and covers causes of death for all genders (male, female) and all age groups.

    Acknowledgements

    Users are allowed to use, copy, distribute and cite the dataset as follows: “Majyhain, Death Causes by Country, Kaggle Dataset, February 04, 2022.”

    Inspiration

    The ideas for this data are to explore:

    • The number of people dying from various diseases.

    • What the causes of death are, by country.

    • The number of people dying from various diseases.

    • Which disease is causing more deaths, by country.

    • Which disease is causing more deaths worldwide.

    References:

    The Data is collected from the following sites:

    https://www.who.int/

    --- Original source retains full ownership of the source dataset ---

  5. Heart Disease Dataset (Comprehensive)

    • ieee-dataport.org
    Updated Oct 24, 2019
    Cite
    MANU SIDDHARTHA (2019). Heart Disease Dataset (Comprehensive) [Dataset]. https://ieee-dataport.org/open-access/heart-disease-dataset-comprehensive
    Dataset updated
    Oct 24, 2019
    Authors
    MANU SIDDHARTHA
    License

    Attribution 4.0 (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This heart disease dataset is curated by combining 5 popular heart disease datasets already available independently but not combined before. In this dataset

  6. Cardiovascular Disease Dataset

    • ieee-dataport.org
    Updated Oct 25, 2022
    Cite
    Rajib Kumar Halder Halder (2022). Cardiovascular Disease Dataset [Dataset]. https://ieee-dataport.org/documents/cardiovascular-disease-dataset
    Dataset updated
    Oct 25, 2022
    Authors
    Rajib Kumar Halder Halder
    License

    Attribution 4.0 (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This heart disease dataset is curated by combining 3 popular heart disease datasets. The first dataset (collected from Kaggle) contains 70,000 records with 11 independent features, which makes it the largest heart disease dataset available so far for research purposes. These data were collected at the moment of medical examination and from information given by the patient. The second and third datasets contain 303 and 293 instances respectively, with 13 common features. The three datasets used for its curation are: Cardio Data (Kaggle Dataset)

  7. Tomato-Village dataset

    • kaggle.com
    Updated Aug 27, 2023
    Cite
    mvgehlot (2023). Tomato-Village dataset [Dataset]. https://www.kaggle.com/datasets/mamtag/tomato-village/suggestions?status=pending&yourSuggestions=true
    Dataset updated
    Aug 27, 2023
    Dataset provided by
    Kaggle - http://kaggle.com/
    Authors
    mvgehlot
    Description

    Problem statement: Tomato is one of the most extensively grown vegetables in any country, and its diseases can significantly affect yield and quality. Accurate and early detection of tomato diseases is crucial for reducing losses and improving crop management. Current deep learning research has produced multiple CNN designs, making automated plant disease identification a viable alternative to traditional detection by visual inspection. When using deep learning methods, the dataset plays one of the most crucial roles in disease prediction. PlantVillage is the most widely used publicly available dataset for tomato disease detection, but it was created in a lab/controlled environment, and models trained on it do not perform well on real-world images. Some natural or real-world datasets exist, but they are private and not publicly available. Also, when attempting to predict tomato diseases in the field in the Jodhpur and Jaipur districts of Rajasthan, India, we found that the majority of diseases are Leaf Miner, spotted wilt virus, and nutrition deficiency diseases, but there are no public datasets containing such categories.

    Proposed solution: To overcome these challenges, we propose the creation of a new dataset called "Tomato-Village" with three variants: a) multiclass tomato disease classification, b) multilabel tomato disease classification and c) object-detection-based tomato disease detection. To the best of our knowledge, “Tomato-Village” will be the first such dataset to be available publicly. Further, we have applied various CNN architectures/models to this dataset to obtain baseline results.

    To use the dataset, please cite the following article: Gehlot, M., Saxena, R.K. & Gandhi, G.C. “Tomato-Village”: a dataset for end-to-end tomato disease detection in a real-world environment. Multimedia Systems (2023). DOI: https://doi.org/10.1007/s00530-023-01158-y

    Article link: https://link.springer.com/article/10.1007/s00530-023-01158-y

  8. Data from: Full dataset.

    • plos.figshare.com
    xlsx
    Updated Nov 21, 2023
    Cite
    Josephine Bourner; Lovarivelo Andriamarohasina; Alex Salam; Nzelle Delphine Kayem; Rindra Randremanana; Piero Olliaro (2023). Full dataset. [Dataset]. http://doi.org/10.1371/journal.pntd.0011509.s006
    Available download formats: xlsx
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    PLOS Neglected Tropical Diseases
    Authors
    Josephine Bourner; Lovarivelo Andriamarohasina; Alex Salam; Nzelle Delphine Kayem; Rindra Randremanana; Piero Olliaro
    License

    Attribution 4.0 (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Plague is a zoonotic disease that, despite affecting humans for more than 5000 years, has historically been the subject of limited drug development activity. Drugs that are currently recommended in treatment guidelines have been approved based on animal studies alone–no pivotal clinical trials in humans have yet been completed. As a result of the sparse clinical research attention received, there are a number of methodological challenges that need to be addressed in order to facilitate the collection of clinical trial data that can meaningfully inform clinicians and policy-makers. One such challenge is the identification of clinically-relevant endpoints, which are informed by understanding the clinical characterisation of the disease–how it presents and evolves over time, and important patient outcomes, and how these can be modified by treatment.

    Methodology/Principal findings: This systematic review aims to summarise the clinical profile of 1343 patients with bubonic plague described in 87 publications, identified by searching bibliographic databases for studies that meet pre-defined eligibility criteria. The majority of studies were individual case reports. A diverse group of signs and symptoms were reported at baseline and post-baseline timepoints–the most common of which was presence of a bubo, for which limited descriptive and longitudinal information was available. Death occurred in 15% of patients; although this varied from an average 10% in high-income countries to an average 17% in low- and middle-income countries. The median time to death was 1 day, ranging from 0 to 16 days.

    Conclusions/Significance: This systematic review elucidates the restrictions that limited disease characterisation places on clinical trials for infectious diseases such as plague, which not only impacts the definition of trial endpoints but has the knock-on effect of challenging the interpretation of a trial’s results. For this reason and despite interventional trials for plague having taken place, questions around optimal treatment for plague persist.

  9. Deaths from Liver Disease - Datasets - Lincolnshire Open Data

    • lincolnshire.ckan.io
    Updated May 10, 2017
    Cite
    ckan.io (2017). Deaths from Liver Disease - Datasets - Lincolnshire Open Data [Dataset]. https://lincolnshire.ckan.io/dataset/deaths-from-liver-disease
    Dataset updated
    May 10, 2017
    Dataset provided by
    CKAN - https://ckan.org/
    License

    Open Government Licence 3.0 - http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    This data shows premature deaths (Age under 75) from Liver Disease, numbers and rates by gender, as 3-year moving-averages. Most liver disease is preventable and much is influenced by alcohol consumption and obesity prevalence, which are both amenable to public health interventions. Directly Age-Standardised Rates (DASR) are shown in the data (where numbers are sufficient) so that death rates can be directly compared between areas. The DASR calculation applies Age-specific rates to a Standard (European) population to cancel out possible effects on crude rates due to different age structures among populations, thus enabling direct comparisons of rates. A limitation on using mortalities as a proxy for prevalence of health conditions is that mortalities may give an incomplete view of health conditions in an area, as ill-health might not lead to premature death. Low numbers may result in zero values or missing data. Data source: Office for Health Improvement and Disparities (OHID), Public Health Outcomes Framework (PHOF) indicator 40601 (E06a). The data is updated annually.
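
    The direct standardisation described above amounts to a weighted average of age-specific rates, with the standard population as weights. The sketch below illustrates that arithmetic along with a simple 3-year moving average. The function names, the choice of a rate per 100,000 and the example figures are all assumptions, and the weights are not the actual European Standard Population.

```python
from typing import List, Sequence


def directly_standardised_rate(deaths: Sequence[float],
                               populations: Sequence[float],
                               standard_population: Sequence[float],
                               per: float = 100_000) -> float:
    """Weight each age band's death rate by the standard population.

    All three sequences must list the same age bands in the same order.
    """
    if not (len(deaths) == len(populations) == len(standard_population)):
        raise ValueError("all inputs must cover the same age bands")
    weighted = 0.0
    for band_deaths, band_pop, weight in zip(deaths, populations, standard_population):
        weighted += (band_deaths / band_pop) * weight
    return per * weighted / sum(standard_population)


def moving_average_3(values: Sequence[float]) -> List[float]:
    """Average each run of three consecutive yearly values."""
    return [sum(values[i:i + 3]) / 3 for i in range(len(values) - 2)]


# Made-up figures for two age bands (illustration only).
rate = directly_standardised_rate(
    deaths=[10, 20], populations=[1_000, 2_000], standard_population=[500, 500]
)
print(round(rate, 1))
```

    Two areas with identical age-specific rates but different age structures get the same standardised rate, which is the property that makes the rates directly comparable.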

  10. Covid-19 latest news dataset

    • data.mendeley.com
    Updated Oct 27, 2021
    Cite
    Rajat Thakur (2021). Covid-19 latest news dataset [Dataset]. http://doi.org/10.17632/8rbm7d874k.1
    Dataset updated
    Oct 27, 2021
    Authors
    Rajat Thakur
    License

    Attribution 4.0 (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Coronavirus disease 2019 (COVID19) time series that lists confirmed cases, reported deaths, and reported recoveries. Data is broken down by country (and sometimes by sub-region).

    Coronavirus disease (COVID19) is caused by severe acute respiratory syndrome Coronavirus 2 (SARSCoV2) and has had an effect worldwide. On March 11, 2020, the World Health Organization (WHO) declared it a pandemic, currently indicating more than 118,000 cases of coronavirus disease in more than 110 countries and territories around the world.

    This dataset contains the latest news related to Covid-19 and it was fetched with the help of Newsdata.io news API.

  11. BRFSS 2020 Heart Disease Dataset(Cleaned Version)

    • zenodo.org
    csv
    Updated May 8, 2025
    Cite
    Koushal Kumar; BP Pande; Koushal Kumar; BP Pande (2025). BRFSS 2020 Heart Disease Dataset(Cleaned Version) [Dataset]. http://doi.org/10.5281/zenodo.15364962
    Available download formats: csv
    Dataset updated
    May 8, 2025
    Dataset provided by
    Zenodo - http://zenodo.org/
    Authors
    Koushal Kumar; BP Pande; Koushal Kumar; BP Pande
    License

    Attribution 4.0 (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Originally, the dataset comes from the CDC and is a major part of the Behavioral Risk Factor Surveillance System (BRFSS), which conducts annual telephone surveys to gather data on the health status of U.S. residents. As the CDC describes: "Established in 1984 with 15 states, BRFSS now collects data in all 50 states as well as the District of Columbia and three U.S. territories. BRFSS completes more than 400,000 adult interviews each year, making it the largest continuously conducted health survey system in the world." The most recent dataset (as of February 15, 2022) includes data from 2020. It consists of 401,958 rows and 279 columns. The vast majority of columns are questions asked to respondents about their health status, such as "Do you have serious difficulty walking or climbing stairs?" or "Have you smoked at least 100 cigarettes in your entire life? [Note: 5 packs = 100 cigarettes]".

    To improve the efficiency and relevance of our analysis, we removed certain attributes from the original BRFSS dataset. Many of the 279 original attributes included administrative codes, metadata, or survey-specific variables that do not contribute meaningfully to heart disease prediction—such as respondent IDs, timestamps, state-level identifiers, and detailed lifestyle questions unrelated to cardiovascular health. By focusing on a carefully selected subset of 18 attributes directly linked to medical, behavioral, and demographic factors known to influence heart health, we streamlined the dataset. This not only reduced computational complexity but also improved model interpretability and performance by eliminating noise and irrelevant information. All predicting variables could be divided into 4 broad categories:

    1. Demographic factors: sex, age category (14 levels), race, BMI (Body Mass Index)

    2. Diseases: whether the respondent ever had diseases such as asthma, skin cancer, diabetes, stroke or kidney disease (not including kidney stones, bladder infection or incontinence)

    3. Unhealthy habits:

      • Smoking - respondents that smoked at least 100 cigarettes in their entire life (5 packs = 100 cigarettes)
      • Alcohol Drinking - heavy drinkers (adult men having more than 14 drinks per week and adult women having more than 7 drinks per week)

    4. General Health:

      • Difficulty Walking - whether the respondent has serious difficulty walking or climbing stairs
      • Physical Activity - adults who reported doing physical activity or exercise during the past 30 days other than their regular job
      • Sleep Time - respondent’s reported average hours of sleep in a 24-hour period
      • Physical Health - number of days being physically ill or injured (0-30 days)
      • Mental Health - number of days having bad mental health (0-30 days)
      • General Health - respondents declared their health as ’Excellent’, ’Very good’, ’Good’ ,’Fair’ or ’Poor’

    Below is a description of the features collected for each patient:

    S. No. | Original Variable/Attribute | Coded Variable/Attribute | Interpretation
    1. | CVDINFR4 | HeartDisease | Those who have ever had CHD or myocardial infarction
    2. | _BMI5CAT | BMI | Body Mass Index
    3. | _SMOKER3 | Smoking | Have you ever smoked more than 100 cigarettes in your life? (The answer is either yes or no)
    4. | _RFDRHV7 | AlcoholDrinking | Adult men who drink more than 14 drinks per week and adult women who consume more than 7 drinks per week are considered heavy drinkers
    5. | CVDSTRK3 | Stroke | (Ever told) (you had) a stroke?
    6. | PHYSHLTH | PhysicalHealth | It includes physical illness and injury during the past 30 days
    7. | MENTHLTH | MentalHealth | How many days in the last 30 days have you had poor mental health?
    8. | DIFFWALK | DiffWalking | Are you having trouble walking or climbing stairs?
    9. | SEXVAR | Sex | Are you male or female?
    10. | _AGE_G | AgeCategory | Out of given fourteen age groups, which group do you fall into?
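
    The original-to-coded variable names listed above can be expressed as a lookup table. The sketch below covers only the ten variables shown here, not the full set of 18. `select_and_rename` is a hypothetical helper for illustration, not part of the published dataset's tooling.

```python
from typing import Any, Dict

# Original BRFSS variable -> coded attribute, as listed in the description.
BRFSS_TO_CODED: Dict[str, str] = {
    "CVDINFR4": "HeartDisease",
    "_BMI5CAT": "BMI",
    "_SMOKER3": "Smoking",
    "_RFDRHV7": "AlcoholDrinking",
    "CVDSTRK3": "Stroke",
    "PHYSHLTH": "PhysicalHealth",
    "MENTHLTH": "MentalHealth",
    "DIFFWALK": "DiffWalking",
    "SEXVAR": "Sex",
    "_AGE_G": "AgeCategory",
}


def select_and_rename(record: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the mapped variables and rename them to their coded names."""
    return {
        BRFSS_TO_CODED[name]: value
        for name, value in record.items()
        if name in BRFSS_TO_CODED
    }


# "RESPONDENT_ID" is a made-up unmapped column used to show filtering.
print(select_and_rename({"CVDSTRK3": 2, "RESPONDENT_ID": 17, "SEXVAR": 1}))
```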

  12. COVID-19 Cases by Country

    • console.cloud.google.com
    Updated Jul 23, 2020
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:European%20Centre%20for%20Disease%20Prevention%20and%20Control&inv=1&invt=Ab2tgg (2020). COVID-19 Cases by Country [Dataset]. https://console.cloud.google.com/marketplace/product/european-cdc/covid-19-global-cases
    Dataset updated
    Jul 23, 2020
    Dataset provided by
    Google - http://google.com/
    Description

    This dataset is maintained by the European Centre for Disease Prevention and Control (ECDC) and reports on the geographic distribution of COVID-19 cases worldwide. This data includes COVID-19 reported cases and deaths broken out by country. This data can be visualized via ECDC’s Situation Dashboard. More information on ECDC’s response to COVID-19 is available here. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. This dataset is hosted in both the EU and US regions of BigQuery. See the links below for the appropriate dataset copy: US region, EU region. This dataset has significant public interest in light of the COVID-19 crisis. All bytes processed in queries against this dataset will be zeroed out, making this part of the query free. Data joined with the dataset will be billed at the normal rate to prevent abuse. After September 15, queries over these datasets will revert to the normal billing rate. Users of ECDC public-use data files must comply with data use restrictions to ensure that the information will be used solely for statistical analysis or reporting purposes.

  13. ViMedical_Disease Dataset

    • paperswithcode.com
    Updated Jul 27, 2024
    Cite
    (2024). ViMedical_Disease Dataset [Dataset]. https://paperswithcode.com/dataset/vimedical-disease
    Dataset updated
    Jul 27, 2024
    Description

    This dataset contains more than 12K questions and symptoms related to various common diseases in Vietnamese. It is designed to aid in the classification of medical symptoms and provide preliminary disease identification. The dataset covers a wide range of diseases, including cardiovascular, digestive, neurological, dermatological, endocrine, and others.

    For more information and updates about the dataset, please refer to the main repository here.

    This dataset can be used for:

    • Data analysis
    • Building disease prediction models
    • Creating chatbots
    • Providing information to users

    The dataset has two columns:

    • Disease: The name of the disease in Vietnamese.
    • Question: Questions and descriptions of disease symptoms in Vietnamese, often posed as a query seeking information about a possible diagnosis.

    Important Notes: This dataset provides information on disease symptoms, not official medical diagnoses. Users should consult a doctor for proper diagnosis and treatment.

  14. PHIDU - Prevalence of Chronic Diseases (PHA) 2017-2018 - Dataset - AURIN

    • data.aurin.org.au
    Updated Mar 6, 2025
    Cite
    (2025). PHIDU - Prevalence of Chronic Diseases (PHA) 2017-2018 - Dataset - AURIN [Dataset]. https://data.aurin.org.au/dataset/tua-phidu-phidu-estimates-chronic-disease-pha-2017-18-pha2016
    Dataset updated
    Mar 6, 2025
    License

    Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0) - https://creativecommons.org/licenses/by-nc-sa/3.0/
    License information was derived automatically

    Description

    This dataset, released January 2020, contains for the time period of 2017-2018 the Estimated population, aged 18 years and over with diabetes mellitus; Estimated male population with mental and behavioural problems; Estimated female population with mental and behavioural problems; Estimated population with mental and behavioural problems; Estimated population with heart, stroke and vascular disease; Estimated population with asthma; Estimated population with chronic obstructive pulmonary disease; Estimated population with arthritis; Estimated population with osteoporosis. The data is by Population Health Area (PHA) 2016 geographic boundaries based on the 2016 Australian Statistical Geography Standard (ASGS). Population Health Areas, developed by PHIDU, are comprised of a combination of whole SA2s and multiple (aggregates of) SA2s, where the SA2 is an area in the ABS structure. For more information please see the data source notes on the data. Source: Estimates for Population Health Areas (PHAs) are modelled estimates and were produced by the ABS; estimates at the LGA and PHN level were derived from the PHA estimates. AURIN has spatially enabled the original data. Data that was not shown/not applicable/not published/not available for the specific area ('#', '..', '^', 'np', 'n.a.', 'n.y.a.' in the original PHIDU data) was removed and replaced by blank cells. For other keys and abbreviations refer to PHIDU Keys.
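
    The blanking of suppressed values mentioned above could look like the sketch below. It illustrates the rule as described, not AURIN's actual processing; the helper name and the whitespace handling are assumptions.

```python
# Markers listed in the description for data that was not shown,
# not applicable, not published or not available.
SUPPRESSION_MARKERS = frozenset({"#", "..", "^", "np", "n.a.", "n.y.a."})


def blank_suppressed(cell: str) -> str:
    """Return an empty string for a suppression marker, else the cell unchanged."""
    # Stripping surrounding whitespace before comparison is an assumption.
    return "" if cell.strip() in SUPPRESSION_MARKERS else cell


# Hypothetical row of raw cell values.
print([blank_suppressed(c) for c in ["12.4", "np", "..", "7"]])
```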

  15. Data from: MonkeyPox2022Tweets: The First Public Twitter Dataset on the 2022...

    • data.mendeley.com
    Updated Jul 25, 2022
    Cite
    Nirmalya Thakur (2022). MonkeyPox2022Tweets: The First Public Twitter Dataset on the 2022 MonkeyPox Outbreak [Dataset]. http://doi.org/10.17632/xmcg82mx9k.3
    Dataset updated
    Jul 25, 2022
    Authors
    Nirmalya Thakur
    License

    Attribution 4.0 (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Please cite the following paper when using this dataset: N. Thakur, “MonkeyPox2022Tweets: The first public Twitter dataset on the 2022 MonkeyPox outbreak,” Preprints, 2022, DOI: 10.20944/preprints202206.0172.v2

    Abstract: The world is currently facing an outbreak of the monkeypox virus, and confirmed cases have been reported from 28 countries. Following a recent “emergency meeting”, the World Health Organization just declared monkeypox a global health emergency. As a result, people from all over the world are using social media platforms, such as Twitter, for information seeking and sharing related to the outbreak, as well as for familiarizing themselves with the guidelines and protocols that are being recommended by various policy-making bodies to reduce the spread of the virus. This is resulting in the generation of tremendous amounts of Big Data related to such paradigms of social media behavior. Mining this Big Data and compiling it in the form of a dataset can serve a wide range of use-cases and applications such as analysis of public opinions, interests, views, perspectives, attitudes, and sentiment towards this outbreak. Therefore, this work presents MonkeyPox2022Tweets, an open-access dataset of Tweets related to the 2022 monkeypox outbreak that were posted on Twitter since the first detected case of this outbreak on May 7, 2022. The dataset is compliant with the privacy policy, developer agreement, and guidelines for content redistribution of Twitter, as well as with the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) for scientific data management.

    Data Description: The dataset consists of a total of 255,363 Tweet IDs of the same number of tweets about monkeypox that were posted on Twitter from 7th May 2022 to 23rd July 2022 (the most recent date at the time of dataset upload). The Tweet IDs are presented in 6 different .txt files based on the timelines of the associated tweets. The following provides the details of these dataset files.

    • Filename: TweetIDs_Part1.txt (No. of Tweet IDs: 13926, Date Range of the Tweet IDs: May 7, 2022 to May 21, 2022)
    • Filename: TweetIDs_Part2.txt (No. of Tweet IDs: 17705, Date Range of the Tweet IDs: May 21, 2022 to May 27, 2022)
    • Filename: TweetIDs_Part3.txt (No. of Tweet IDs: 17585, Date Range of the Tweet IDs: May 27, 2022 to June 5, 2022)
    • Filename: TweetIDs_Part4.txt (No. of Tweet IDs: 19718, Date Range of the Tweet IDs: June 5, 2022 to June 11, 2022)
    • Filename: TweetIDs_Part5.txt (No. of Tweet IDs: 47718, Date Range of the Tweet IDs: June 12, 2022 to June 30, 2022)
    • Filename: TweetIDs_Part6.txt (No. of Tweet IDs: 138711, Date Range of the Tweet IDs: July 1, 2022 to July 23, 2022)

    The dataset contains only Tweet IDs in compliance with the terms and conditions mentioned in the privacy policy, developer agreement, and guidelines for content redistribution of Twitter. The Tweet IDs need to be hydrated to be used.
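
    Before hydration, the IDs have to be read from the six files and grouped into request-sized batches. The sketch below is one possible way to do that. It assumes one Tweet ID per line in each .txt file (the source does not state the layout), and the batch size of 100 is a placeholder to be adjusted to whatever limit the hydration tool or API in use imposes. It does not call any Twitter API, and the function names are illustrative.

```python
from pathlib import Path
from typing import Iterable, Iterator, List

# Per-file Tweet ID counts as listed in the data description.
EXPECTED_COUNTS = {
    "TweetIDs_Part1.txt": 13926,
    "TweetIDs_Part2.txt": 17705,
    "TweetIDs_Part3.txt": 17585,
    "TweetIDs_Part4.txt": 19718,
    "TweetIDs_Part5.txt": 47718,
    "TweetIDs_Part6.txt": 138711,
}


def read_tweet_ids(path: Path) -> List[str]:
    """Read Tweet IDs from a text file, assuming one ID per line."""
    with path.open(encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def batched(ids: Iterable[str], size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` IDs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[str] = []
    for tweet_id in ids:
        batch.append(tweet_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter batch
        yield batch


def load_all(directory: Path) -> List[str]:
    """Load every listed file and warn if a count differs from the listing."""
    ids: List[str] = []
    for name, expected in EXPECTED_COUNTS.items():
        file_ids = read_tweet_ids(directory / name)
        if len(file_ids) != expected:
            print(f"warning: {name} has {len(file_ids)} IDs, expected {expected}")
        ids.extend(file_ids)
    return ids


# The listed per-file counts add up to the stated total of 255,363.
assert sum(EXPECTED_COUNTS.values()) == 255_363
```

    Each batch produced by `batched(load_all(Path("...")))` can then be handed to the hydration tool of choice.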

  16. Wheat Plant Diseases Dataset Dataset

    • paperswithcode.com
    Updated Mar 21, 2025
    Cite
    (2025). Wheat Plant Diseases Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/wheat-plant-diseases-dataset
    Description

    The Wheat Plant Diseases Dataset is a comprehensive collection of high-resolution images designed to assist researchers, agronomists, and developers in the development of advanced machine learning models for the classification and diagnosis of various wheat plant diseases. This dataset aims to contribute to the sustainable management of wheat crops by enabling the early detection and treatment of diseases, ultimately safeguarding food security.

    Dataset Content

    Total Number of Images: 14,155

    Image Quality: High-resolution images capturing real-world disease conditions, devoid of any artificial augmentations to preserve the authenticity and natural variability of the dataset.

    Disease Classes: The dataset covers a wide range of wheat plant diseases, categorized into the following classes:

    • Pest-related Diseases:
      • Aphid: A common pest known to cause yellowing and stunted growth in wheat plants.
      • Mite: Tiny arachnids that feed on the plant sap, leading to discoloration and leaf curling.
      • Stem Fly: Insects that lay eggs in the stems of wheat plants, causing structural damage and reduced yield.
    • Fungal Diseases:
      • Rusts: A group of fungal diseases, each causing different symptoms but all leading to significant crop loss.
        • Black Rust / Stem Rust: Causes dark, elongated pustules on stems and leaves.
        • Brown Rust / Leaf Rust: Results in orange-brown pustules primarily on the leaves.
        • Yellow Rust / Stripe Rust: Characterized by yellow stripes running along the length of the leaves.

    Benefits of the Wheat Plant Diseases Dataset

    Extensive Coverage: With over 14,000 images, the dataset provides a robust foundation for developing machine learning models capable of identifying a wide range of wheat diseases.

    Authenticity: The dataset contains real-world images, free from artificial augmentation, ensuring that the trained models are more likely to perform well in practical scenarios.

    Educational Value: The inclusion of disease causes and visual monitoring guides makes this dataset not only a tool for machine learning but also an educational resource for understanding wheat plant health.

    Enhanced Agricultural Practices: By utilizing this dataset, stakeholders in agriculture can adopt more proactive and informed approaches to disease management, leading to healthier crops and higher yields.

    Conclusion

    The Wheat Plant Diseases Dataset is an indispensable resource for anyone involved in agricultural research, disease diagnosis, and crop management. Its extensive and varied image collection, coupled with detailed disease information, makes it a powerful tool for advancing wheat disease detection through AI and machine learning.

    This dataset is sourced from Kaggle.

  17. Potato Leaf Disease Dataset in Uncontrolled Environment

    • data.mendeley.com
    Updated Nov 10, 2023
    Cite
    Nabila Husna Shabrina (2023). Potato Leaf Disease Dataset in Uncontrolled Environment [Dataset]. http://doi.org/10.17632/ptz377bwb8.1
    Authors
    Nabila Husna Shabrina
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Existing potato leaf datasets might not accurately reflect the real-world conditions of potato leaf diseases, both because their images were captured in a controlled environment and because they cover a narrow range of disease types, capturing only diseases caused by fungi. Therefore, we collected new primary data that offers several advantages over previous datasets and better represents the types of disease commonly found on the leaves of potato plants. The dataset was captured in an uncontrolled setting, so the images vary widely in background, viewing direction, and distance. It includes classes for potato leaf diseases caused by fungi, viruses, pests, bacteria, Phytophthora, and nematodes, plus a class of healthy leaves. This new dataset should allow a more accurate representation of potato leaf diseases and support further research on potato leaf disease identification.

    Image size: 1500 x 1500 pixels
    Data format: .jpg
    Number of images: 3076
    Category: bacteria, fungi, healthy, nematode, pest, phytophthora, and virus
    Data source location: Central Java, Indonesia
    How data were acquired: Captured from potato farms located in Central Java, Indonesia, using several smartphone cameras.

  18. PotatoCare: Deep learning based potato disease dataset

    • data.mendeley.com
    Updated Apr 25, 2025
    Cite
    Samiul Islam (2025). PotatoCare: Deep learning based potato disease dataset [Dataset]. http://doi.org/10.17632/7vm7xskfg4.2
    Authors
    Samiul Islam
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset consists of 10,117 images categorized into 10 classes, representing different potato diseases and healthy samples. The classes include Black Scurf (49 images), Blackleg (47), Blackspot Bruising (770), Brown Rot (105), Common Scab (60), Dry Rot (1,355), Healthy Potatoes (815), Miscellaneous (73), Pink Rot (57), and Soft Rot (560). The dataset was compiled from various sources and merged to create a diverse and representative collection of images. However, the distribution of images across classes is imbalanced, with some diseases like Dry Rot and Blackspot Bruising having significantly more samples than others like Blackleg and Pink Rot. This dataset is useful for training deep learning models for automated disease detection in potatoes, enabling early identification and reducing the risk of crop damage. The diverse nature of the dataset enhances model generalizability, making it suitable for real-world agricultural applications.
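
    One common way to compensate for this kind of imbalance during training is to weight each class inversely to its frequency. The sketch below illustrates that with the per-class counts quoted above; it is not described in the source as the authors' method. Note that the quoted counts sum to 3,891 rather than the stated 10,117 images. The source does not explain the difference, so the sketch uses the quoted counts as they are.

```python
from typing import Dict

# Per-class image counts exactly as quoted in the description.
CLASS_COUNTS: Dict[str, int] = {
    "Black Scurf": 49,
    "Blackleg": 47,
    "Blackspot Bruising": 770,
    "Brown Rot": 105,
    "Common Scab": 60,
    "Dry Rot": 1355,
    "Healthy Potatoes": 815,
    "Miscellaneous": 73,
    "Pink Rot": 57,
    "Soft Rot": 560,
}


def inverse_frequency_weights(counts: Dict[str, int]) -> Dict[str, float]:
    """Return weight_c = N / (K * n_c), where N is the total number of
    images, K the number of classes, and n_c the count for class c."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {name: total / (n_classes * n) for name, n in counts.items()}


weights = inverse_frequency_weights(CLASS_COUNTS)

# Print classes from rarest (largest weight) to most common.
for name, weight in sorted(weights.items(), key=lambda item: -item[1]):
    print(f"{name:20s} n={CLASS_COUNTS[name]:5d} weight={weight:.3f}")
```

    With this weighting, rare classes such as Blackleg receive a much larger weight than Dry Rot, and the weighted counts still sum to the original total.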

  19. Synthetic Gastrointestinal Disease Patient Records Dataset

    • opendatabay.com
    Updated Jun 14, 2025
    Cite
    Opendatabay Labs (2025). Synthetic Gastrointestinal Disease Patient Records Dataset [Dataset]. https://www.opendatabay.com/data/synthetic/02185296-ec00-4159-ba19-2df70ea680f6
    Dataset provided by
    Buy & Sell Data | Opendatabay - AI & Synthetic Data Marketplace
    Authors
    Opendatabay Labs
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Patient Health Records & Digital Health
    Description

    The Synthetic Gastrointestinal Disease Dataset has been generated to support research, model development, and education related to gastrointestinal (GI) health. This comprehensive dataset captures a wide range of patient features, lifestyle factors, test results, symptoms, and clinical diagnoses to simulate real-world diagnostic complexity.

    Dataset Features

    • Age: Age of the patient in years.
    • Gender: Biological sex of the patient (M/F).
    • BMI: Body Mass Index.
    • Body_Weight: Patient's weight in kilograms.
    • Obesity_Status: Categorized as Normal, Overweight, or Obese based on BMI.
    • Ethnicity: Ethnic background (e.g., White, Hispanic, Asian, etc.).
    • Family_History: Indicates presence of family history of GI conditions (Yes/No).
    • Genetic_Markers: Count of relevant genetic risk markers detected.
    • Microbiome_Index: Numerical score representing gut microbiota diversity or imbalance.
    • Autoimmune_Disorders: Presence of autoimmune conditions (Yes/No).
    • H_Pylori_Status: Helicobacter pylori infection status (Yes/No).
    • Fecal_Calprotectin: Inflammatory marker measured in stool (numeric count).
    • Occult_Blood_Test: Result of hidden blood detection in stool (Positive/Negative).
    • CRP_ESR: Combined C-Reactive Protein / Erythrocyte Sedimentation Rate value, an inflammation marker.
    • Endoscopy_Result / Colonoscopy_Result / Stool_Culture: Clinical test results (e.g., Normal, Abnormal).
    • Diet_Type: Type of diet followed (e.g., Vegetarian, Western, etc.).
    • Food_Intolerance: Reported intolerances (Yes/No).
    • Smoking_Status / Alcohol_Use / Physical_Activity: Lifestyle habits.
    • Stress_Level: Reported level of psychological stress (Low/Moderate/High). Note: Some entries missing.
    • GI Symptoms: Includes:
      • Abdominal_Pain, Bloating, Diarrhea, Constipation
      • Rectal_Bleeding, Appetite_Loss, Weight_Loss
    • Bowel_Habits: Overall pattern (e.g., Normal, Frequent, Irregular).
    • Bowel_Movement_Frequency: Number of bowel movements per week.
    • Medication Use: Includes:
      • NSAID_Use (e.g., ibuprofen), Antibiotic_Use, PPI_Use (proton-pump inhibitors), Medications (Yes/No)
    • Disease_Class: Primary GI-related condition diagnosed (e.g., Blood in stool, Nausea or vomiting, Abdominal cramps or pain, Unexplained weight loss).
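
    The feature list says Obesity_Status is derived from BMI but does not give the cut-offs. The sketch below assumes the widely used WHO adult thresholds (overweight from 25, obese from 30) and folds every value below 25 into Normal, because only three categories are listed. Both choices are assumptions, not the documented rule used to generate the data, and the function name is illustrative.

```python
def obesity_status(
    bmi: float,
    overweight_from: float = 25.0,  # assumed WHO cut-off, not from the source
    obese_from: float = 30.0,       # assumed WHO cut-off, not from the source
) -> str:
    """Map a BMI value to one of the three listed Obesity_Status labels."""
    if bmi <= 0:
        raise ValueError("BMI must be positive")
    if bmi >= obese_from:
        return "Obese"
    if bmi >= overweight_from:
        return "Overweight"
    return "Normal"
```

    Comparing the output of such a function against the Obesity_Status column is a quick consistency check on whichever thresholds the data actually follows.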

    Distribution

    [Figure: Synthetic Gastrointestinal Disease Patient Records data distribution (EDA summary plots), https://storage.googleapis.com/opendatabay_public/02185296-ec00-4159-ba19-2df70ea680f6/e72683f668b8_eda_summary_plots.png]

    Usage

    This dataset is ideal for:

    • Disease Classification: Predict GI disease categories using symptoms and clinical test results.
    • Feature Importance Analysis: Understand contributing factors in diagnosis.
    • Pattern Mining: Detect associations among lifestyle, symptoms, and microbiome/genetic indicators.
    • Model Training: Useful for supervised learning (e.g., random forest, XGBoost) or unsupervised clustering.

    Coverage

    The data integrates symptoms, lifestyle, inflammation markers, test outcomes, and genetics—making it valuable for both biological and behavioral models of disease. It reflects realistic distributions of obesity, diet, and ethnicity found in contemporary populations.

    License

    CC0 (Public Domain)

    Who Can Use It

    • Medical Researchers and GI Specialists: For testing diagnostic hypotheses and exploring symptom clusters.
    • Data Scientists and ML Engineers: For building diagnostic classifiers or recommender systems.
    • Educators and Students: For practical exercises in predictive modeling and health analytics.
  20. Black Pod Disease Dataset

    • universe.roboflow.com
    zip
    Updated Nov 28, 2024
    Cite
    Red Oscar Lopez (2024). Black Pod Disease Dataset [Dataset]. https://universe.roboflow.com/red-oscar-lopez/black-pod-disease/dataset/2
    Available download formats: zip
    Dataset authored and provided by
    Red Oscar Lopez
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Blac
    Description