100+ datasets found

Covid_19_Weather_Dataset
kaggle.com
Updated Apr 17, 2020
Cite
Prasanth Antonyraj (2020). Covid_19_Weather_Dataset [Dataset]. https://www.kaggle.com/johnprasanth/covid-19-weather-dataset/code
Dataset updated
Apr 17, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Prasanth Antonyraj
Description
Context

This dataset contains weather details for five of the countries most affected by the spread of COVID-19, including Germany and Italy.

Content

It is believed that climate conditions might be one of the major factors in the spread of COVID-19. This dataset contains weather conditions recorded every 10 minutes for each of the aforementioned countries from 19 February to 17 April 2020.
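Because readings were taken every 10 minutes, the collection window implies a fixed number of readings per country, which gives a quick completeness check. The sketch below assumes, hypothetically, that collection ran from midnight on 19 February to midnight on 17 April 2020 and that each CSV row is read as a dict keyed by the `name` (country) column listed under File Description; the exact start and end times and the helper names are illustrative, not taken from the dataset.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, Iterable, Mapping


def expected_readings(start: datetime, end: datetime, interval_minutes: int = 10) -> int:
    """Readings expected between start and end (both inclusive) at a fixed interval."""
    step = timedelta(minutes=interval_minutes)
    # timedelta / timedelta returns a float; int() drops any partial final step.
    return int((end - start) / step) + 1


def missing_per_country(rows: Iterable[Mapping[str, str]], expected: int) -> Dict[str, int]:
    """Expected minus observed row counts per country (the `name` column)."""
    counts = Counter(row["name"] for row in rows)
    return {country: expected - observed for country, observed in counts.items()}


if __name__ == "__main__":
    # Hypothetical window: midnight 19 Feb to midnight 17 Apr 2020 (2020 is a leap year).
    n = expected_readings(datetime(2020, 2, 19), datetime(2020, 4, 17))
    print(n)  # 58 days * 144 readings/day + 1 = 8353
```

Passing a `csv.DictReader` over the CSV to `missing_per_country(reader, n)` would report how many readings each country lacks under these assumptions; a negative value would point to duplicate rows rather than gaps.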

File Description

The file contains the following columns:

- Temperature - actual temperature recorded, in degrees Celsius
- Wind_speed - wind speed
- Description - description of the current weather
- Weather - categorical value indicating the type of weather
- name - country name
- temp_min - minimum temperature recorded
- temp_max - maximum temperature recorded

The remaining variables are self-explanatory.

Acknowledgements

As part of my thesis project, I prepared this dataset with the help of a web scraper that called an open-source REST API endpoint every 10 minutes. The scraper was hosted on an EC2 instance and periodically updated a CSV file. I thought this could contribute to the analysis of the spread of COVID-19, so I am sharing it.

Hope this could be useful!

Inspiration

As mentioned earlier, climate could be one of the significant factors in the spread of COVID-19, and this needs further analysis. Italy could be a good focus for such research, since the dataset includes climate data for that country and it was heavily affected.
Disease and symptoms dataset 2023
data.mendeley.com
Updated Mar 3, 2025
Cite
Bran Stark (2025). Disease and symptoms dataset 2023 [Dataset]. http://doi.org/10.17632/2cxccsxydc.1
Unique identifier
https://doi.org/10.17632/2cxccsxydc.1
Dataset updated
Mar 3, 2025
Authors
Bran Stark
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset contains disease names along with the symptoms faced by the respective patient. There are a total of 773 unique diseases and 377 symptoms, with ~246,000 rows. The dataset was artificially generated, preserving Symptom Severity and Disease Occurrence Possibility. Several distinct groups of symptoms might all be indicators of the same disease. Some rows (samples) even contain only a single symptom for a disease, which indicates a very high correlation between that symptom and that particular disease. A larger number of rows for a particular disease corresponds to a higher probability of its occurrence in the real world. Similarly, if a row's feature vector contains only a single symptom, that symptom is more strongly correlated with the disease, for classification purposes, than any one symptom in a sample whose feature vector contains multiple symptoms.
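Since the description ties the number of rows per disease to its real-world likelihood, a natural first step is to turn row counts into relative frequencies and to pick out rows with a single active symptom. The sketch below assumes a hypothetical in-memory layout of `(disease, {symptom: 0 or 1})` pairs; the actual file layout and column names may differ.

```python
from collections import Counter
from typing import Dict, List, Mapping, Sequence, Tuple

Row = Tuple[str, Mapping[str, int]]  # hypothetical layout: (disease, {symptom: 0/1})


def disease_priors(rows: Sequence[Row]) -> Dict[str, float]:
    """Relative frequency of each disease label, usable as a rough occurrence prior."""
    counts = Counter(disease for disease, _ in rows)
    total = sum(counts.values())
    return {disease: count / total for disease, count in counts.items()}


def single_symptom_rows(rows: Sequence[Row]) -> List[Tuple[str, str]]:
    """(disease, symptom) pairs for rows whose feature vector has exactly one active symptom."""
    result = []
    for disease, features in rows:
        active = [symptom for symptom, present in features.items() if present]
        if len(active) == 1:
            result.append((disease, active[0]))
    return result
```

For a wide one-hot file, the same logic applies per row after separating the disease label column from the symptom columns.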
Cleveland Clinic Heart Disease Dataset - Dataset - CKAN
data.poltekkes-smg.ac.id
Updated Oct 8, 2024
Cite
(2024). Cleveland Clinic Heart Disease Dataset - Dataset - CKAN [Dataset]. https://data.poltekkes-smg.ac.id/dataset/cleveland-clinic-heart-disease-dataset
Dataset updated
Oct 8, 2024
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Coronary heart disease (CHD) involves reduced blood flow to the heart muscle due to the build-up of plaque in the arteries of the heart. It is the most common form of cardiovascular disease. Currently, invasive coronary angiography is the gold standard for establishing the presence, location, and severity of coronary artery disease (CAD); however, this diagnostic method is costly and is associated with morbidity and mortality in CAD patients. It would therefore be beneficial to develop a non-invasive alternative to the current gold standard. Other, less invasive diagnostic methods have been proposed in the scientific literature, including exercise electrocardiogram, thallium scintigraphy, and fluoroscopy of coronary calcification. However, the diagnostic accuracy of these tests ranges only from 35% to 75%. It would therefore be beneficial to develop a computer-aided diagnostic tool that combines the results of these non-invasive tests with other patient attributes to boost their diagnostic power, with the ultimate aim of replacing the current invasive gold standard.
‘Death Cause by Country’ analyzed by Analyst-2
analyst-2.ai
Updated Feb 13, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Death Cause by Country’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-death-cause-by-country-3051/00ae526f/?iid=001-918&v=presentation
Dataset updated
Feb 13, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Death Cause by Country’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/majyhain/death-cause-by-country on 13 February 2022.

--- Dataset description provided by original source is as follows ---

Context

Across low- and middle-income countries, mortality from infectious disease, malnutrition, nutritional deficiencies, and neonatal and maternal deaths is common and, in some cases, dominant. In Kenya, for example, diarrheal infections are still the primary cause of death. HIV/AIDS is the major cause of death in South Africa and Botswana. However, in high-income countries, the proportion of deaths due to these causes is quite low.

Content

The dataset contains 32 columns covering causes of death for all genders (male, female) and all age groups.

Acknowledgements

Users are allowed to use, copy, distribute and cite the dataset as follows: “Majyhain, Death Causes by Country, Kaggle Dataset, February 04, 2022.”

Inspiration

Possible questions to explore with this data:

• How many people die of various diseases.

• What the causes of death are, by country.

• The number of people dying of each disease.

• Which diseases cause the most deaths in each country.

• Which diseases cause the most deaths worldwide.

References:

The data was collected from the following site:

https://www.who.int/

--- Original source retains full ownership of the source dataset ---
Heart Disease Dataset (Comprehensive)
ieee-dataport.org
Updated Oct 24, 2019
Cite
MANU SIDDHARTHA (2019). Heart Disease Dataset (Comprehensive) [Dataset]. https://ieee-dataport.org/open-access/heart-disease-dataset-comprehensive
Dataset updated
Oct 24, 2019
Authors
MANU SIDDHARTHA
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This heart disease dataset was curated by combining 5 popular heart disease datasets that were already available independently but had not been combined before. In this dataset
Cardiovascular Disease Dataset
ieee-dataport.org
Updated Oct 25, 2022
Cite
Rajib Kumar Halder Halder (2022). Cardiovascular Disease Dataset [Dataset]. https://ieee-dataport.org/documents/cardiovascular-disease-dataset
Dataset updated
Oct 25, 2022
Authors
Rajib Kumar Halder Halder
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This heart disease dataset was curated by combining 3 popular heart disease datasets. The first dataset (collected from Kaggle) contains 70,000 records with 11 independent features, which makes it the largest heart disease dataset available so far for research purposes. These data were collected during medical examinations and from information given by the patients. The second and third datasets contain 303 and 293 instances, respectively, with 13 common features. The three datasets used for its curation are: Cardio Data (Kaggle Dataset)
Tomato-Village dataset
kaggle.com
Updated Aug 27, 2023
Cite
mvgehlot (2023). Tomato-Village dataset [Dataset]. https://www.kaggle.com/datasets/mamtag/tomato-village/suggestions?status=pending&yourSuggestions=true
Dataset updated
Aug 27, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
mvgehlot
Description
Problem statement: Tomato is one of the most extensively grown vegetables worldwide, and its diseases can significantly affect yield and quality. Accurate and early detection of tomato diseases is crucial for reducing losses and improving crop management. Recent research in deep learning and CNNs has produced multiple CNN architectures, making automated plant disease identification a viable alternative to traditional disease detection by visual inspection. When using deep learning methods, the dataset plays one of the most crucial roles in disease prediction. PlantVillage is the most widely used publicly available dataset for tomato disease detection, but it was created in a lab (controlled) environment, and models trained on it do not perform well on real-world images. Some natural or real-world datasets exist, but they are private. Also, when attempting to predict tomato diseases in the field in the Jodhpur and Jaipur districts of Rajasthan, India, we found that most diseases were leaf miner, spotted wilt virus, and nutrition deficiency, but no public dataset contains these categories.

Proposed solution: To overcome these challenges, we propose a new dataset called "Tomato-Village" with three variants: a) multiclass tomato disease classification, b) multilabel tomato disease classification, and c) object-detection-based tomato disease detection. To the best of our knowledge, “Tomato-Village” will be the first such dataset to be publicly available. Further, we have applied various CNN architectures to this dataset and report baseline results.

If you use the dataset, please cite the following article: Gehlot, M., Saxena, R.K. & Gandhi, G.C. “Tomato-Village”: a dataset for end-to-end tomato disease detection in a real-world environment. Multimedia Systems (2023). DOI: https://doi.org/10.1007/s00530-023-01158-y

Article Link : https://link.springer.com/article/10.1007/s00530-023-01158-y
Data from: Full dataset.
plos.figshare.com
xlsx
Updated Nov 21, 2023
Cite
Josephine Bourner; Lovarivelo Andriamarohasina; Alex Salam; Nzelle Delphine Kayem; Rindra Randremanana; Piero Olliaro (2023). Full dataset. [Dataset]. http://doi.org/10.1371/journal.pntd.0011509.s006
Available download formats: xlsx
Unique identifier
https://doi.org/10.1371/journal.pntd.0011509.s006
Dataset updated
Nov 21, 2023
Dataset provided by
PLOS Neglected Tropical Diseases
Authors
Josephine Bourner; Lovarivelo Andriamarohasina; Alex Salam; Nzelle Delphine Kayem; Rindra Randremanana; Piero Olliaro
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background
Plague is a zoonotic disease that, despite affecting humans for more than 5000 years, has historically been the subject of limited drug development activity. Drugs currently recommended in treatment guidelines have been approved based on animal studies alone; no pivotal clinical trials in humans have yet been completed. Because of the sparse clinical research attention it has received, a number of methodological challenges need to be addressed to facilitate the collection of clinical trial data that can meaningfully inform clinicians and policy-makers. One such challenge is the identification of clinically relevant endpoints, which are informed by understanding the clinical characterisation of the disease: how it presents and evolves over time, what the important patient outcomes are, and how these can be modified by treatment.

Methodology/Principal findings
This systematic review aims to summarise the clinical profile of 1343 patients with bubonic plague described in 87 publications, identified by searching bibliographic databases for studies that met pre-defined eligibility criteria. The majority of studies were individual case reports. A diverse group of signs and symptoms were reported at baseline and post-baseline timepoints; the most common was the presence of a bubo, for which limited descriptive and longitudinal information was available. Death occurred in 15% of patients, although this varied from an average of 10% in high-income countries to an average of 17% in low- and middle-income countries. The median time to death was 1 day, ranging from 0 to 16 days.

Conclusions/Significance
This systematic review elucidates the restrictions that limited disease characterisation places on clinical trials for infectious diseases such as plague, which not only affects the definition of trial endpoints but also has the knock-on effect of making a trial's results harder to interpret. For this reason, and despite interventional trials for plague having taken place, questions around optimal treatment for plague persist.
Deaths from Liver Disease - Datasets - Lincolnshire Open Data
lincolnshire.ckan.io
Updated May 10, 2017
Cite
ckan.io (2017). Deaths from Liver Disease - Datasets - Lincolnshire Open Data [Dataset]. https://lincolnshire.ckan.io/dataset/deaths-from-liver-disease
Dataset updated
May 10, 2017
Dataset provided by
CKAN (https://ckan.org/)
License
Open Government Licence 3.0, http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Description
This data shows premature deaths (aged under 75) from liver disease, as numbers and rates by gender, expressed as 3-year moving averages. Most liver disease is preventable, and much of it is influenced by alcohol consumption and obesity prevalence, both of which are amenable to public health interventions. Directly Age-Standardised Rates (DASR) are shown in the data (where numbers are sufficient) so that death rates can be directly compared between areas. The DASR calculation applies age-specific rates to a standard (European) population to cancel out the effect that different age structures have on crude rates, thus enabling direct comparison of rates. A limitation of using mortality as a proxy for the prevalence of health conditions is that it may give an incomplete view of health in an area, as ill health does not always lead to premature death. Low numbers may result in zero values or missing data. Data source: Office for Health Improvement and Disparities (OHID), Public Health Outcomes Framework (PHOF) indicator 40601 (E06a). The data is updated annually.
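The direct standardisation step described above can be written as a weighted average of age-specific rates, using the standard population's age-band sizes as weights. The sketch below is a minimal illustration only: the age bands and standard-population weights in the example are hypothetical placeholders, not the European Standard Population values used by OHID, and expressing the result per 100,000 is an assumption.

```python
from typing import Sequence


def dasr(deaths: Sequence[float], population: Sequence[float],
         standard_population: Sequence[float], per: float = 100_000) -> float:
    """Directly age-standardised rate: local age-specific rates weighted by a standard population."""
    if not (len(deaths) == len(population) == len(standard_population)):
        raise ValueError("all inputs must have one entry per age band")
    # Deaths the standard population would have if it experienced the local age-specific rates.
    expected = sum(d / n * w for d, n, w in zip(deaths, population, standard_population))
    return expected / sum(standard_population) * per


if __name__ == "__main__":
    # Hypothetical two-band example: age-specific rates of 0.001 and 0.0015.
    rate = dasr(deaths=[10, 30], population=[10_000, 20_000],
                standard_population=[60_000, 40_000])
    print(round(rate, 1))  # about 120 deaths per 100,000
```

Because the same standard weights are applied to every area, two areas with identical age-specific rates get the same DASR even if one has a much older population, which is exactly what the crude rate fails to do.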
Covid-19 latest news dataset
data.mendeley.com
Updated Oct 27, 2021
Cite
Rajat Thakur (2021). Covid-19 latest news dataset [Dataset]. http://doi.org/10.17632/8rbm7d874k.1
Unique identifier
https://doi.org/10.17632/8rbm7d874k.1
Dataset updated
Oct 27, 2021
Authors
Rajat Thakur
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Coronavirus disease 2019 (COVID-19) time series listing confirmed cases, reported deaths, and reported recoveries. Data is broken down by country (and sometimes by sub-region).

Coronavirus disease (COVID-19) is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and has spread worldwide. On March 11, 2020, the World Health Organization (WHO) declared it a pandemic, reporting at that time more than 118,000 cases of coronavirus disease in more than 110 countries and territories around the world.

This dataset contains the latest news related to COVID-19, fetched using the Newsdata.io news API.

BRFSS 2020 Heart Disease Dataset(Cleaned Version)

zenodo.org

csv

Updated May 8, 2025

Cite

Koushal Kumar; BP Pande (2025). BRFSS 2020 Heart Disease Dataset(Cleaned Version) [Dataset]. http://doi.org/10.5281/zenodo.15364962

Available download formats: csv

Unique identifier

https://doi.org/10.5281/zenodo.15364962

Dataset updated

May 8, 2025

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

Koushal Kumar; BP Pande

License

Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

The dataset originally comes from the CDC and is a major part of the Behavioral Risk Factor Surveillance System (BRFSS), which conducts annual telephone surveys to gather data on the health status of U.S. residents. As the CDC describes it: "Established in 1984 with 15 states, BRFSS now collects data in all 50 states as well as the District of Columbia and three U.S. territories. BRFSS completes more than 400,000 adult interviews each year, making it the largest continuously conducted health survey system in the world." The most recent dataset (as of February 15, 2022) includes data from 2020. It consists of 401,958 rows and 279 columns. The vast majority of columns are questions asked to respondents about their health status, such as "Do you have serious difficulty walking or climbing stairs?" or "Have you smoked at least 100 cigarettes in your entire life? [Note: 5 packs = 100 cigarettes]".

To improve the efficiency and relevance of our analysis, we removed certain attributes from the original BRFSS dataset. Many of the 279 original attributes were administrative codes, metadata, or survey-specific variables that do not contribute meaningfully to heart disease prediction, such as respondent IDs, timestamps, state-level identifiers, and detailed lifestyle questions unrelated to cardiovascular health. By focusing on a carefully selected subset of 18 attributes directly linked to medical, behavioral, and demographic factors known to influence heart health, we streamlined the dataset. This not only reduced computational complexity but also improved model interpretability and performance by eliminating noise and irrelevant information. All predictor variables can be divided into 4 broad categories:

Demographic factors: sex, age category (14 levels), race, BMI (Body Mass Index)
Diseases: whether the respondent ever had diseases such as asthma, skin cancer, diabetes, stroke, or kidney disease (not including kidney stones, bladder infection, or incontinence)
Unhealthy habits:
- Smoking - respondents who smoked at least 100 cigarettes in their entire life (5 packs = 100 cigarettes)
- Alcohol Drinking - heavy drinkers (adult men having more than 14 drinks per week and adult women having more than 7 drinks per week)
General Health:
- Difficulty Walking - whether the respondent has serious difficulty walking or climbing stairs
- Physical Activity - adults who reported doing physical activity or exercise during the past 30 days other than their regular job
- Sleep Time - respondent's reported average hours of sleep in a 24-hour period
- Physical Health - number of days being physically ill or injured (0-30 days)
- Mental Health - number of days having bad mental health (0-30 days)
- General Health - respondents declared their health as 'Excellent', 'Very good', 'Good', 'Fair', or 'Poor'
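The heavy-drinking definition above is a sex-specific threshold, and the Physical Health and Mental Health variables are day counts bounded to 0-30. A minimal sketch of both rules follows; the `"Male"`/`"Female"` label spelling and the function names are hypothetical and not taken from the BRFSS codebook.

```python
# Weekly drink thresholds from the definition above; strictly more than these counts is "heavy".
HEAVY_DRINKING_THRESHOLD = {"Male": 14, "Female": 7}  # hypothetical label spelling


def is_heavy_drinker(sex: str, drinks_per_week: float) -> bool:
    """True if weekly drinks exceed the sex-specific threshold (raises KeyError for other labels)."""
    return drinks_per_week > HEAVY_DRINKING_THRESHOLD[sex]


def is_valid_day_count(days: int) -> bool:
    """Physical Health and Mental Health count days within the past 30 days, so 0-30 inclusive."""
    return 0 <= days <= 30
```

Using a strict `>` mirrors the wording "more than 14" and "more than 7", so exactly 14 drinks for men or 7 for women is not classified as heavy drinking.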

Below is a description of the features collected for each patient:

S. No.	Original Variable/Attribute	Coded Variable/Attribute	Interpretation
1.	CVDINFR4	HeartDisease	Those who have ever had CHD or myocardial infarction
2.	_BMI5CAT	BMI	Body Mass Index
3.	_SMOKER3	Smoking	Have you ever smoked more than 100 cigarettes in your life? (The answer is either yes or no)
4.	_RFDRHV7	AlcoholDrinking	Adult men who drink more than 14 drinks per week and adult women who consume more than 7 drinks per week are considered heavy drinkers
5.	CVDSTRK3	Stroke	(Ever told) (you had) a stroke?
6.	PHYSHLTH	PhysicalHealth	It includes physical illness and injury during the past 30 days
7.	MENTHLTH	MentalHealth	How many days in the last 30 days have you had poor mental health?
8.	DIFFWALK	DiffWalking	Are you having trouble walking or climbing stairs?
9.	SEXVAR	Sex	Are you male or female?
10.	_AGE_G	AgeCategory	Which of the fourteen age groups do you fall into?
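The table amounts to a rename from original BRFSS variable names to coded names. A minimal sketch using a plain dictionary follows; it covers only the ten variables listed in the table (the description mentions 18 selected attributes in total), and `to_coded` is a hypothetical helper name.

```python
from typing import Dict, Mapping

# Original BRFSS variable -> coded variable, as listed in the table above.
BRFSS_TO_CODED: Dict[str, str] = {
    "CVDINFR4": "HeartDisease",
    "_BMI5CAT": "BMI",
    "_SMOKER3": "Smoking",
    "_RFDRHV7": "AlcoholDrinking",
    "CVDSTRK3": "Stroke",
    "PHYSHLTH": "PhysicalHealth",
    "MENTHLTH": "MentalHealth",
    "DIFFWALK": "DiffWalking",
    "SEXVAR": "Sex",
    "_AGE_G": "AgeCategory",
}


def to_coded(record: Mapping[str, str]) -> Dict[str, str]:
    """Keep only the mapped BRFSS fields and rename them; all other fields are dropped."""
    return {BRFSS_TO_CODED[key]: value for key, value in record.items() if key in BRFSS_TO_CODED}
```

Dropping unmapped fields in the same step mirrors the attribute reduction described earlier, where administrative and unrelated survey variables were removed.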

COVID-19 Cases by Country
console.cloud.google.com
Updated Jul 23, 2020
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:European%20Centre%20for%20Disease%20Prevention%20and%20Control&inv=1&invt=Ab2tgg (2020). COVID-19 Cases by Country [Dataset]. https://console.cloud.google.com/marketplace/product/european-cdc/covid-19-global-cases
Dataset updated
Jul 23, 2020
Dataset provided by
Google (http://google.com/)
Description
This dataset is maintained by the European Centre for Disease Prevention and Control (ECDC) and reports on the geographic distribution of COVID-19 cases worldwide. It includes reported COVID-19 cases and deaths broken out by country. This data can be visualized via ECDC’s Situation Dashboard. This public dataset is hosted in Google BigQuery and is included in BigQuery's free tier of 1 TB of query processing per month, which each user can use to run queries on this public dataset. The dataset is hosted in both the EU and US regions of BigQuery. Because this dataset has significant public interest in light of the COVID-19 crisis, all bytes processed in queries against it will be zeroed out, making this part of the query free. Data joined with the dataset will be billed at the normal rate to prevent abuse. After September 15, queries over these datasets will revert to the normal billing rate. Users of ECDC public-use data files must comply with data use restrictions to ensure that the information is used solely for statistical analysis or reporting purposes.
ViMedical_Disease Dataset
paperswithcode.com
Updated Jul 27, 2024
Cite
(2024). ViMedical_Disease Dataset [Dataset]. https://paperswithcode.com/dataset/vimedical-disease
Dataset updated
Jul 27, 2024
Description
This dataset contains more than 12,000 questions and symptom descriptions related to various common diseases, in Vietnamese. It is designed to aid in the classification of medical symptoms and to provide preliminary disease identification. The dataset covers a wide range of diseases, including cardiovascular, digestive, neurological, dermatological, endocrine, and others.

For more information and updates about the dataset, please refer to the main repository here.

This dataset can be used for:

- Data analysis
- Building disease prediction models
- Creating chatbots
- Providing information to users

The dataset has two columns:
- Disease: the name of the disease in Vietnamese.
- Question: questions and descriptions of disease symptoms in Vietnamese, often posed as a query seeking information about a possible diagnosis.

Important Notes: This dataset provides information on disease symptoms, not official medical diagnoses. Users should consult a doctor for proper diagnosis and treatment.
PHIDU - Prevalence of Chronic Diseases (PHA) 2017-2018 - Dataset - AURIN
data.aurin.org.au
Updated Mar 6, 2025
Cite
(2025). PHIDU - Prevalence of Chronic Diseases (PHA) 2017-2018 - Dataset - AURIN [Dataset]. https://data.aurin.org.au/dataset/tua-phidu-phidu-estimates-chronic-disease-pha-2017-18-pha2016
Dataset updated
Mar 6, 2025
License
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0), https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
Description
This dataset, released in January 2020, contains the following estimates for the period 2017-2018:
- Estimated population aged 18 years and over with diabetes mellitus
- Estimated male population with mental and behavioural problems
- Estimated female population with mental and behavioural problems
- Estimated population with mental and behavioural problems
- Estimated population with heart, stroke and vascular disease
- Estimated population with asthma
- Estimated population with chronic obstructive pulmonary disease
- Estimated population with arthritis
- Estimated population with osteoporosis

The data is by Population Health Area (PHA) 2016 geographic boundaries, based on the 2016 Australian Statistical Geography Standard (ASGS). Population Health Areas, developed by PHIDU, consist of a combination of whole SA2s and multiple (aggregates of) SA2s, where the SA2 is an area in the ABS structure. For more information, please see the data source notes on the data. Source: Estimates for Population Health Areas (PHAs) are modelled estimates and were produced by the ABS; estimates at the LGA and PHN level were derived from the PHA estimates. AURIN has spatially enabled the original data. Data that was not shown, not applicable, not published, or not available for the specific area ('#', '..', '^', 'np', 'n.a.', 'n.y.a.' in the original PHIDU data) was removed and replaced by blank cells. For other keys and abbreviations, refer to PHIDU Keys.
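The cleaning rule described above (suppressed or unavailable values replaced by blank cells) can be reproduced when reading the original PHIDU files. This sketch maps blanks and the listed markers to `None` and parses any other cell as a number; `parse_estimate` is a hypothetical helper, and treating every remaining cell as a plain number without thousands separators is an assumption.

```python
from typing import Optional

# Markers listed in the description for not shown / not applicable / not published / not available.
MISSING_MARKERS = {"#", "..", "^", "np", "n.a.", "n.y.a."}


def parse_estimate(cell: str) -> Optional[float]:
    """Return None for blank cells or missing-value markers, otherwise the numeric estimate."""
    value = cell.strip()
    if value == "" or value in MISSING_MARKERS:
        return None
    return float(value)
```

Returning `None` rather than `0` keeps suppressed areas distinct from areas with a genuine zero estimate.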
Data from: MonkeyPox2022Tweets: The First Public Twitter Dataset on the 2022...
data.mendeley.com
Updated Jul 25, 2022
Cite
Nirmalya Thakur (2022). MonkeyPox2022Tweets: The First Public Twitter Dataset on the 2022 MonkeyPox Outbreak [Dataset]. http://doi.org/10.17632/xmcg82mx9k.3
Unique identifier
https://doi.org/10.17632/xmcg82mx9k.3
Dataset updated
Jul 25, 2022
Authors
Nirmalya Thakur
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Please cite the following paper when using this dataset: N. Thakur, “MonkeyPox2022Tweets: The first public Twitter dataset on the 2022 MonkeyPox outbreak,” Preprints, 2022, DOI: 10.20944/preprints202206.0172.v2

Abstract
The world is currently facing an outbreak of the monkeypox virus, and confirmed cases have been reported from 28 countries. Following a recent “emergency meeting”, the World Health Organization just declared monkeypox a global health emergency. As a result, people from all over the world are using social media platforms, such as Twitter, to seek and share information related to the outbreak and to familiarize themselves with the guidelines and protocols recommended by various policy-making bodies to reduce the spread of the virus. This is generating tremendous amounts of Big Data related to these patterns of social media behavior. Mining this Big Data and compiling it into a dataset can serve a wide range of use cases and applications, such as analysis of public opinions, interests, views, perspectives, attitudes, and sentiment towards this outbreak. Therefore, this work presents MonkeyPox2022Tweets, an open-access dataset of Tweets related to the 2022 monkeypox outbreak that were posted on Twitter since the first detected case of this outbreak on May 7, 2022. The dataset is compliant with the privacy policy, developer agreement, and guidelines for content redistribution of Twitter, as well as with the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) for scientific data management.

Data Description
The dataset consists of a total of 255,363 Tweet IDs of the same number of tweets about monkeypox that were posted on Twitter from 7th May 2022 to 23rd July 2022 (the most recent date at the time of dataset upload). The Tweet IDs are presented in 6 different .txt files based on the timelines of the associated tweets. The details of these dataset files are as follows:
• Filename: TweetIDs_Part1.txt (No. of Tweet IDs: 13926, Date Range of the Tweet IDs: May 7, 2022 to May 21, 2022)
• Filename: TweetIDs_Part2.txt (No. of Tweet IDs: 17705, Date Range of the Tweet IDs: May 21, 2022 to May 27, 2022)
• Filename: TweetIDs_Part3.txt (No. of Tweet IDs: 17585, Date Range of the Tweet IDs: May 27, 2022 to June 5, 2022)
• Filename: TweetIDs_Part4.txt (No. of Tweet IDs: 19718, Date Range of the Tweet IDs: June 5, 2022 to June 11, 2022)
• Filename: TweetIDs_Part5.txt (No. of Tweet IDs: 47718, Date Range of the Tweet IDs: June 12, 2022 to June 30, 2022)
• Filename: TweetIDs_Part6.txt (No. of Tweet IDs: 138711, Date Range of the Tweet IDs: July 1, 2022 to July 23, 2022)
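The per-file counts add up to the stated total of 255,363, which gives a quick integrity check after download. The sketch below assumes, hypothetically, that each `.txt` file holds one Tweet ID per line; `parse_tweet_ids` and `read_tweet_ids` are illustrative names, and hydrating the IDs into full tweets (which requires the Twitter API) is not shown.

```python
from typing import Iterable, List

# Per-file counts as stated in the dataset description.
FILE_COUNTS = {
    "TweetIDs_Part1.txt": 13926,
    "TweetIDs_Part2.txt": 17705,
    "TweetIDs_Part3.txt": 17585,
    "TweetIDs_Part4.txt": 19718,
    "TweetIDs_Part5.txt": 47718,
    "TweetIDs_Part6.txt": 138711,
}


def parse_tweet_ids(lines: Iterable[str]) -> List[str]:
    """Strip whitespace and skip blank lines; IDs stay strings so large values are never rounded."""
    return [line.strip() for line in lines if line.strip()]


def read_tweet_ids(path: str) -> List[str]:
    """Read one Tweet ID per line from a downloaded file (assumed layout)."""
    with open(path, encoding="utf-8") as handle:
        return parse_tweet_ids(handle)


if __name__ == "__main__":
    print(sum(FILE_COUNTS.values()))  # 255363, matching the stated total
```

After downloading, comparing `len(read_tweet_ids(name))` with `FILE_COUNTS[name]` for each file would flag truncated or incomplete downloads.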

The dataset contains only Tweet IDs in compliance with the terms and conditions mentioned in the privacy policy, developer agreement, and guidelines for content redistribution of Twitter. The Tweet IDs need to be hydrated to be used.
Wheat Plant Diseases Dataset Dataset
paperswithcode.com
Updated Mar 21, 2025
Cite
(2025). Wheat Plant Diseases Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/wheat-plant-diseases-dataset
Dataset updated
Mar 21, 2025
Description
The Wheat Plant Diseases Dataset is a comprehensive collection of high-resolution images designed to assist researchers, agronomists, and developers in the development of advanced machine learning models for the classification and diagnosis of various wheat plant diseases. This dataset aims to contribute to the sustainable management of wheat crops by enabling the early detection and treatment of diseases, ultimately safeguarding food security.

Dataset Content

Total Number of Images: 14,155

Image Quality: High-resolution images capturing real-world disease conditions, devoid of any artificial augmentations to preserve the authenticity and natural variability of the dataset.

Disease Classes: The dataset covers a wide range of wheat plant diseases, categorized into the following classes:

Pest-related Diseases:

Aphid: A common pest known to cause yellowing and stunted growth in wheat plants.

Mite: Tiny arachnids that feed on the plant sap, leading to discoloration and leaf curling.

Stem Fly: Insects that lay eggs in the stems of wheat plants, causing structural damage and reduced yield.

Fungal Diseases:

Rusts: A group of fungal diseases, each causing different symptoms but all leading to significant crop loss.

Black Rust / Stem Rust: Causes dark, elongated pustules on stems and leaves.

Brown Rust / Leaf Rust: Results in orange-brown pustules primarily on the leaves.

Yellow Rust / Stripe Rust: Characterized by yellow stripes running along the length of the leaves.
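The classes above form a two-level hierarchy: pests and rust fungi. A minimal Python sketch of encoding that hierarchy and scoring a classifier at the group level follows, which shows whether errors stay within a group (e.g. brown vs. yellow rust) or cross groups. The label strings are hypothetical, since the description does not say how classes are named in the files, and the sketch covers only the six classes listed here.

```python
# Class taxonomy as described above; label strings are hypothetical.
CLASS_GROUPS = {
    "Pest": ["Aphid", "Mite", "Stem Fly"],
    "Fungal (rust)": ["Black Rust", "Brown Rust", "Yellow Rust"],
}

# Invert to a fine-label -> group lookup.
GROUP_OF = {label: group for group, labels in CLASS_GROUPS.items() for label in labels}


def coarse_accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of predictions that fall in the correct group (pest vs. rust),
    even when the exact class is wrong."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(GROUP_OF[t] == GROUP_OF[p] for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


if __name__ == "__main__":
    # One within-group error (Aphid -> Mite) and one cross-group error.
    print(coarse_accuracy(["Aphid", "Yellow Rust"], ["Mite", "Aphid"]))
```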

Benefits of the Wheat Plant Diseases Dataset

Extensive Coverage: With over 14,000 images, the dataset provides a robust foundation for developing machine learning models capable of identifying a wide range of wheat diseases.

Authenticity: The dataset contains real-world images, free from artificial augmentation, ensuring that the trained models are more likely to perform well in practical scenarios.

Educational Value: The inclusion of disease causes and visual monitoring guides makes this dataset not only a tool for machine learning but also an educational resource for understanding wheat plant health.

Enhanced Agricultural Practices: By utilizing this dataset, stakeholders in agriculture can adopt more proactive and informed approaches to disease management, leading to healthier crops and higher yields.

Conclusion

The Wheat Plant Diseases Dataset is an indispensable resource for anyone involved in agricultural research, disease diagnosis, and crop management. Its extensive and varied image collection, coupled with detailed disease information, makes it a powerful tool for advancing wheat disease detection through AI and machine learning.

This dataset is sourced from Kaggle.
Potato Leaf Disease Dataset in Uncontrolled Environment
data.mendeley.com
Updated Nov 10, 2023
Cite
Nabila Husna Shabrina (2023). Potato Leaf Disease Dataset in Uncontrolled Environment [Dataset]. http://doi.org/10.17632/ptz377bwb8.1
Unique identifier
https://doi.org/10.17632/ptz377bwb8.1
Dataset updated
Nov 10, 2023
Authors
Nabila Husna Shabrina
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Existing potato leaf datasets may not accurately reflect real-world conditions: their images were captured in controlled environments, and they cover only a narrow range of disease types, typically those caused by fungi. Therefore, we collected new primary data that offers several advantages over previous datasets and better represents the various diseases commonly found on the leaves of potato plants. Our dataset was captured in an uncontrolled setting, so it varies widely in background and in the direction and distance from which each image was taken. It includes classes of potato leaf diseases caused by fungi, viruses, pests, bacteria, Phytophthora, and nematodes, as well as healthy leaves. This new dataset will allow a more accurate representation of potato leaf diseases and support further research on potato leaf disease identification.

Image size: 1500 x 1500 pixels

Data format: .jpg

Number of images: 3,076

Category: bacteria, fungi, healthy, nematode, pest, phytophthora, and virus

Data source location: Central Java, Indonesia

How data were acquired: Captured from potato farms located in Central Java, Indonesia, using several smartphone cameras.
PotatoCare: Deep learning based potato disease dataset
data.mendeley.com
Updated Apr 25, 2025
Cite
Samiul Islam (2025). PotatoCare: Deep learning based potato disease dataset [Dataset]. http://doi.org/10.17632/7vm7xskfg4.2
Unique identifier
https://doi.org/10.17632/7vm7xskfg4.2
Dataset updated
Apr 25, 2025
Authors
Samiul Islam
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset consists of 10,117 images categorized into 10 classes, representing different potato diseases and healthy samples. The classes include Black Scurf (49 images), Blackleg (47), Blackspot Bruising (770), Brown Rot (105), Common Scab (60), Dry Rot (1,355), Healthy Potatoes (815), Miscellaneous (73), Pink Rot (57), and Soft Rot (560). The dataset was compiled from various sources and merged to create a diverse and representative collection of images. However, the distribution of images across classes is imbalanced, with some diseases like Dry Rot and Blackspot Bruising having significantly more samples than others like Blackleg and Pink Rot. This dataset is useful for training deep learning models for automated disease detection in potatoes, enabling early identification and reducing the risk of crop damage. The diverse nature of the dataset enhances model generalizability, making it suitable for real-world agricultural applications.
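Because the class distribution above is imbalanced, a common remedy is to weight the training loss by inverse class frequency. Below is a minimal Python sketch using the per-class counts listed in the description. Note that these counts sum to 3,891, fewer than the stated total of 10,117, so the weights should be recomputed from the files actually downloaded.

```python
# Per-class image counts as listed in the PotatoCare description.
CLASS_COUNTS = {
    "Black Scurf": 49,
    "Blackleg": 47,
    "Blackspot Bruising": 770,
    "Brown Rot": 105,
    "Common Scab": 60,
    "Dry Rot": 1355,
    "Healthy Potatoes": 815,
    "Miscellaneous": 73,
    "Pink Rot": 57,
    "Soft Rot": 560,
}


def balanced_class_weights(counts: dict[str, int]) -> dict[str, float]:
    """Inverse-frequency weights: w_c = N / (K * n_c).

    N is the total number of samples and K the number of classes. A class
    with exactly N / K samples gets weight 1.0; rarer classes get more.
    """
    total = sum(counts.values())
    k = len(counts)
    return {label: total / (k * n) for label, n in counts.items()}


if __name__ == "__main__":
    weights = balanced_class_weights(CLASS_COUNTS)
    for label, w in sorted(weights.items(), key=lambda kv: -kv[1]):
        print(f"{label:20s} {w:6.2f}")
```

This is the same formula scikit-learn uses for `class_weight="balanced"`; with it, every class contributes equally to the total weighted loss.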
Synthetic Gastrointestinal Disease Patient Records Dataset
opendatabay.com
Updated Jun 14, 2025
Cite
Opendatabay Labs (2025). Synthetic Gastrointestinal Disease Patient Records Dataset [Dataset]. https://www.opendatabay.com/data/synthetic/02185296-ec00-4159-ba19-2df70ea680f6
Dataset updated
Jun 14, 2025
Dataset provided by
Buy & Sell Data | Opendatabay - AI & Synthetic Data Marketplace
Authors
Opendatabay Labs
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Patient Health Records & Digital Health
Description
The Synthetic Gastrointestinal Disease Dataset has been generated to support research, model development, and education related to gastrointestinal (GI) health. This comprehensive dataset captures a wide range of patient features, lifestyle factors, test results, symptoms, and clinical diagnoses to simulate real-world diagnostic complexity.

Dataset Features

Age: Age of the patient in years.

Gender: Biological sex of the patient (M/F).

BMI: Body Mass Index.

Body_Weight: Patient's weight in kilograms.

Obesity_Status: Categorized as Normal, Overweight, or Obese based on BMI.

Ethnicity: Ethnic background (e.g., White, Hispanic, Asian, etc.).

Family_History: Indicates presence of family history of GI conditions (Yes/No).

Genetic_Markers: Count of relevant genetic risk markers detected.

Microbiome_Index: Numerical score representing gut microbiota diversity or imbalance.

Autoimmune_Disorders: Presence of autoimmune conditions (Yes/No).

H_Pylori_Status: Helicobacter pylori infection status (Yes/No).

Fecal_Calprotectin: Inflammatory marker measured in stool (numeric count).

Occult_Blood_Test: Result of hidden blood detection in stool (Positive/Negative).

CRP_ESR: Combined C-Reactive Protein / Erythrocyte Sedimentation Rate value, an inflammation marker.

Endoscopy_Result / Colonoscopy_Result / Stool_Culture: Clinical test results (e.g., Normal, Abnormal).

Diet_Type: Type of diet followed (e.g., Vegetarian, Western, etc.).

Food_Intolerance: Reported intolerances (Yes/No).

Smoking_Status / Alcohol_Use / Physical_Activity: Lifestyle habits.

Stress_Level: Reported level of psychological stress (Low/Moderate/High). Note: Some entries missing.

GI Symptoms: Includes:

Abdominal_Pain, Bloating, Diarrhea, Constipation

Rectal_Bleeding, Appetite_Loss, Weight_Loss

Bowel_Habits: Overall pattern (e.g., Normal, Frequent, Irregular).

Bowel_Movement_Frequency: Number of bowel movements per week.

Medication Use: Includes:

NSAID_Use (e.g., ibuprofen), Antibiotic_Use, PPI_Use (proton-pump inhibitors), Medications (Yes/No)

Disease_Class: Primary GI-related condition diagnosed (e.g., Blood in stool, Nausea or vomiting, Abdominal cramps or pain, Unexplained weight loss).
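Two of the features above need care during data preparation: Obesity_Status is derived from BMI, and Stress_Level has missing entries. The following Python sketch assumes the standard WHO adult BMI cut-offs of 25 and 30 kg/m². The description does not state which thresholds the generator actually used, and the function names are hypothetical.

```python
from typing import Optional


def obesity_status(bmi: float) -> str:
    """Map BMI to the dataset's three Obesity_Status labels.

    Assumes WHO adult cut-offs (25 and 30). The dataset has no
    "Underweight" label, so any BMI below 25 is treated as "Normal".
    """
    if bmi >= 30:
        return "Obese"
    if bmi >= 25:
        return "Overweight"
    return "Normal"


# Ordinal encoding that preserves the Low < Moderate < High ordering.
STRESS_LEVELS = {"Low": 0, "Moderate": 1, "High": 2}


def encode_stress(value: Optional[str]) -> Optional[int]:
    """Ordinal-encode Stress_Level, returning None for missing entries.

    Unknown labels raise KeyError so that data problems surface early.
    """
    if value is None or value.strip() == "":
        return None
    return STRESS_LEVELS[value.strip()]
```

Keeping missing stress values as `None` (rather than filling them with a default) leaves the imputation choice to the downstream model pipeline.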

Distribution

Figure: Synthetic Gastrointestinal Disease Patient Records data distribution (EDA summary plots), https://storage.googleapis.com/opendatabay_public/02185296-ec00-4159-ba19-2df70ea680f6/e72683f668b8_eda_summary_plots.png

Usage

This dataset is ideal for:

Disease Classification: Predict GI disease categories using symptoms and clinical test results.

Feature Importance Analysis: Understand contributing factors in diagnosis.

Pattern Mining: Detect associations among lifestyle, symptoms, and microbiome/genetic indicators.

Model Training: Useful for supervised learning (e.g., random forest, XGBoost) or unsupervised clustering.

Coverage

The data integrates symptoms, lifestyle, inflammation markers, test outcomes, and genetics—making it valuable for both biological and behavioral models of disease. It reflects realistic distributions of obesity, diet, and ethnicity found in contemporary populations.

License

CC0 (Public Domain)

Who Can Use It

Medical Researchers and GI Specialists: For testing diagnostic hypotheses and exploring symptom clusters.

Data Scientists and ML Engineers: For building diagnostic classifiers or recommender systems.

Educators and Students: For practical exercises in predictive modeling and health analytics.
Black Pod Disease Dataset
universe.roboflow.com
zip
Updated Nov 28, 2024
Cite
Red Oscar Lopez (2024). Black Pod Disease Dataset [Dataset]. https://universe.roboflow.com/red-oscar-lopez/black-pod-disease/dataset/2
Available download formats: zip
Dataset updated
Nov 28, 2024
Dataset authored and provided by
Red Oscar Lopez
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Blac
Description
This work contains some images I captured myself, but most of the images come from the following sources. I would like to thank the authors of the respective projects:

1. https://search.app?link=https%3A%2F%2Fwww.kaggle.com%2Fdatasets%2Fzaldyjr%2Fcacao-diseases&utm_campaign=aga&utm_source=agsadl1%2Csh%2Fx%2Fgs%2Fm2%2F4

2. https://search.app?link=https%3A%2F%2Funiverse.roboflow.com%2Fmiss-nyarko-s2gtm%2Fcocoa-disease-detection%2Fbrowse%3FqueryText%3Dclass%253AFROSTYPOD%26pageSize%3D50%26startingIndex%3D0%26browseQuery%3Dtrue&utm_campaign=aga&utm_source=agsadl1%2Csh%2Fx%2Fgs%2Fm2%2F4

3. https://www.kaggle.com/datasets/serranosebas/enfermedades-cacao-yolov4

4. https://universe.roboflow.com/class-oirrr/enfermedades-del-cacao/dataset/

5. https://universe.roboflow.com/lab-apkst/cocoa-fxvcr/browse?queryText=&pageSize=50&startingIndex=0&browseQuery=true

I selected the images that were most useful for my model. Because I am evaluating saturation at -25% and +25% and capture distances of 10 cm and 30 cm, quite a few images look similar.
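The description does not say how the -25% and +25% saturation variants were produced. One common interpretation is to scale each pixel's HSV saturation by 0.75 or 1.25. The sketch below takes that assumption and uses Python's standard-library `colorsys` module on a single 8-bit RGB pixel; the function name is hypothetical.

```python
import colorsys


def adjust_saturation(rgb: tuple[int, int, int], percent: float) -> tuple[int, int, int]:
    """Scale the HSV saturation of one 8-bit RGB pixel by (1 + percent / 100).

    percent=-25 keeps 75% of the original saturation; percent=+25 gives 125%,
    clipped to the valid range [0, 1]. Hue and value are unchanged.
    """
    r, g, b = (c / 255.0 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    s = min(1.0, max(0.0, s * (1.0 + percent / 100.0)))
    r2, g2, b2 = colorsys.hsv_to_rgb(h, s, v)
    return tuple(round(c * 255) for c in (r2, g2, b2))


if __name__ == "__main__":
    pixel = (200, 100, 100)  # a muted red, HSV saturation 0.5
    print(adjust_saturation(pixel, -25))
    print(adjust_saturation(pixel, +25))
```

For whole images, Pillow's `ImageEnhance.Color` with factors 0.75 and 1.25 is a common alternative. However, it blends toward a grayscale copy of the image rather than scaling HSV saturation, so its results differ slightly.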
