12 datasets found
  1. Cancer survival in England - adults diagnosed

    • ons.gov.uk
    • cy.ons.gov.uk
    xlsx
    Updated Aug 12, 2019
    Cite
    Office for National Statistics (2019). Cancer survival in England - adults diagnosed [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/datasets/cancersurvivalratescancersurvivalinenglandadultsdiagnosed
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Aug 12, 2019
    Dataset provided by
    Office for National Statistics (http://www.ons.gov.uk/)
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Description

    One-year and five-year net survival for adults (15-99) in England diagnosed with one of 29 common cancers, by age and sex.

  2. Cancer and Deaths Dataset : 1990~2019 Globally

    • kaggle.com
    zip
    Updated Feb 25, 2023
    Cite
    Belayet HossainDS (2023). Cancer and Deaths Dataset : 1990~2019 Globally [Dataset]. https://www.kaggle.com/belayethossainds/cancer-and-deaths-dataset-19902019-globally
    Explore at:
    Available download formats: zip (2214619 bytes)
    Dataset updated
    Feb 25, 2023
    Authors
    Belayet HossainDS
    Description

    The "Cancer and Deaths Dataset: 1990~2019 Globally" is a comprehensive dataset containing information on cancer incidence and mortality rates across the world from 1990 to 2019.

    The dataset is an excellent resource for researchers, healthcare professionals, and policymakers who are interested in understanding the global burden of cancer and its impact on populations.

    [ Total: 9 files, 160 columns, all countries, 30 years of data ]

    In 2017, 9.6 million people are estimated to have died from the various forms of cancer. Every sixth death in the world is due to cancer, making it the second leading cause of death – second only to cardiovascular diseases.
    
    Progress against many other causes of deaths and demographic drivers of increasing population size, life expectancy and — particularly in higher-income countries — aging populations mean that the total number of cancer deaths continues to increase. This is a very personal topic to many: nearly everyone knows or has lost someone dear to them from this collection of diseases.
    

    Data files in this dataset:

    1. annual-number-of-deaths-by-cause data
    2. total-cancer-deaths-by-type data
    3. cancer-death-rates-by-age data
    4. share-of-population-with-cancer-types data
    5. share-of-population-with-cancer data
    6. number-of-people-with-cancer-by-age data
    7. share-of-population-with-cancer-by-age data
    8. disease-burden-rates-by-cancer-types data
    9. cancer-deaths-rate-and-age-standardized-rate-index data

  3. Crowds Cure Cancer 2017

    • kaggle.com
    zip
    Updated Jun 11, 2018
    Cite
    K Scott Mader (2018). Crowds Cure Cancer 2017 [Dataset]. https://www.kaggle.com/kmader/crowds-cure-cancer-2017
    Explore at:
    Available download formats: zip (20147893029 bytes)
    Dataset updated
    Jun 11, 2018
    Authors
    K Scott Mader
    Description

    Context

    (from the original page) Many Cancers routinely identified by imaging haven’t yet benefited from recent advances in computer science. Approaches such as machine learning and deep learning can generate quantitative tumor 3D volumes, complex features and therapy-tracking temporal dynamics. However, cross-disciplinary researchers striving to develop new approaches often lack disease understanding or sufficient contacts within the medical community. Their research can greatly benefit from labeling and annotating basic information in the images such as tumor locations, which are obvious to radiologists.

    Crowd-sourcing the creation of publicly-accessible reference data sets could address this challenge. In 2011 the National Cancer Institute funded development of The Cancer Imaging Archive (TCIA), a free and open-access database of medical images. However, most of these collections lack the labeling and annotations needed by image processing researchers for progress in deep learning and radiomics. As a result, TCIA has partnered with the Radiological Society of North America (RSNA) and numerous academic centers to harness the vast knowledge of RSNA meeting attendees to generate these tumor markups.

    Content

    The csv file contains a list of all annotations on the images, organized by author, disease type, location, and patient. There are two subfolders:

    1. annotated_dicoms: contains all of the DICOM slices referenced in the CSV file (but nothing else, no above / below and no full patient context)
    2. compressed_stacks: the nifti (.nii.gz) stacks of the entire scans, covering around 70% of the data (because of Kaggle's file size limit). The nifti files are much more useful for testing models, since you won't know a priori which slices to look for.

    Acknowledgements

    The original dataset was downloaded from https://wiki.cancerimagingarchive.net/plugins/servlet/mobile?contentId=33948774#content/view/33948774. The data should be cited as below: Jayashree Kalpathy-Cramer, Andrew Beers, Artem Mamonov, Erik Ziegler, Rob Lewis, Andre Botelho Almeida, Gordon Harris, Steve Pieper, Ashish Sharma, Lawrence Tarbox, Jeff Tobler, Fred Prior, Adam Flanders, Jamie Dulkowski, Brenda Fevrier-Sullivan, Carl Jaffe, John Freymann, Justin Kirby. Crowds Cure Cancer: Data collected at the RSNA 2017 annual meeting. The Cancer Imaging Archive. doi: 10.7937/K9/TCIA.2018.OW73VLO2

    Inspiration

    The work was done by volunteer, unpaid radiologists and non-radiologists, which makes it a very unreliable dataset. Even in the example image it is clear the definition of a tumor and where its boundaries are varies from person to person.

    The biggest question is how do you perform quality control?

    How can you determine which annotators create the best data?

    Are bad annotations useful or should they be deleted?

  4. Cancer Survival in England

    • digital.nhs.uk
    Updated Feb 16, 2023
    Cite
    (2023). Cancer Survival in England [Dataset]. https://digital.nhs.uk/data-and-information/publications/statistical/cancer-survival-in-england
    Explore at:
    Dataset updated
    Feb 16, 2023
    License

    https://digital.nhs.uk/about-nhs-digital/terms-and-conditions

    Description

    This release summarises the survival of adults diagnosed with cancer in England between 2016 and 2020 and followed to 2021, and children diagnosed with cancer in England between 2002 and 2020 and followed to 2021. Adult cancer survival estimates are presented by age, deprivation, gender, stage at diagnosis, and geography.

  5. Data from: S1 Dataset -

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    Updated Feb 20, 2025
    Cite
    Jie, Wang; Khan, Rabnawaz (2025). S1 Dataset - [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001306137
    Explore at:
    Dataset updated
    Feb 20, 2025
    Authors
    Jie, Wang; Khan, Rabnawaz
    Description

    Cancer, the second-leading cause of mortality, kills 16% of people worldwide. Unhealthy lifestyles, smoking, alcohol abuse, obesity, and a lack of exercise have been linked to cancer incidence and mortality. However, it is hard. Cancer and lifestyle correlation analysis and cancer incidence and mortality prediction in the next several years are used to guide people’s healthy lives and target medical financial resources. This paper has two key research areas.

    Data preprocessing and sample expansion design: Using experimental analysis and comparison, this study chooses the best cubic spline interpolation technology on the original data from 32 entry points to 420 entry points and converts annual data into monthly data to solve the problem of insufficient correlation analysis and prediction. Factor analysis is possible because data sources indicate changing factors.

    TSA-LSTM two-stage attention design: A popular tool with advanced visualization functions, Tableau, simplifies this paper’s study. Tableau’s testing findings indicate it cannot analyze and predict this paper’s time series data. LSTM is utilized by the TSA-LSTM optimization model. By commencing with input feature attention, this model attention technique guarantees that the model encoder converges to a subset of input sequence features during the prediction of output sequence features. As a result, the model’s natural learning trend and prediction quality are enhanced. The second step, time performance attention, maintains We can choose network features and improve forecasts based on real-time performance.

    Validating the data source with factor correlation analysis and trend prediction using the TSA-LSTM model: Most cancers have overlapping risk factors, and excessive drinking, lack of exercise, and obesity can cause breast, colorectal, and colon cancer. A poor lifestyle directly promotes lung, laryngeal, and oral cancers, according to visual tests. Cancer incidence is expected to climb 18–21% between 2020 and 2025, according to 2021. Long-term projection accuracy is 98.96 percent, and smoking and obesity may be the main cancer causes.
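
The description above mentions converting annual data into monthly data with cubic spline interpolation. The following is a minimal sketch of that general idea, not the authors' code: the function names, the choice of natural boundary conditions, and the sample values are all assumptions made for illustration.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return a function that evaluates the natural cubic spline through (xs, ys).

    xs must be strictly increasing. "Natural" means the second derivative
    is zero at both ends (an assumption; the paper does not say which
    boundary condition it used).
    """
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]

    # Right-hand side of the tridiagonal system for the quadratic coefficients.
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3 * (ys[i + 1] - ys[i]) / h[i] - 3 * (ys[i] - ys[i - 1]) / h[i - 1]

    # Forward sweep of the tridiagonal solve.
    l = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        l[i] = 2 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]

    # Back substitution gives the per-interval coefficients b, c, d.
    b = [0.0] * n
    c = [0.0] * (n + 1)
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])

    def evaluate(x):
        # Pick the interval containing x, clamped to the first/last interval.
        j = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        dx = x - xs[j]
        return ys[j] + b[j] * dx + c[j] * dx ** 2 + d[j] * dx ** 3

    return evaluate


def annual_to_monthly(years, values):
    """Expand one value per (integer) year into one value per month."""
    spline = natural_cubic_spline(years, values)
    n_months = (years[-1] - years[0]) * 12 + 1
    times = [years[0] + k / 12 for k in range(n_months)]
    return [(t, spline(t)) for t in times]


if __name__ == "__main__":
    # Hypothetical annual values, for illustration only.
    sample_years = [2000, 2001, 2002, 2003]
    sample_values = [10.0, 12.0, 11.0, 15.0]
    for t, v in annual_to_monthly(sample_years, sample_values)[:4]:
        print(round(t, 3), round(v, 3))
```

The interpolated series passes through every original annual value, so the expansion adds intermediate points without altering the observed ones.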

  6. Demographic Trends and Health Outcomes in the U.S

    • kaggle.com
    zip
    Updated Jan 12, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    The Devastator (2023). Demographic Trends and Health Outcomes in the U.S [Dataset]. https://www.kaggle.com/datasets/thedevastator/demographic-trends-and-health-outcomes-in-the-u
    Explore at:
    Available download formats: zip (1726637 bytes)
    Dataset updated
    Jan 12, 2023
    Authors
    The Devastator
    Area covered
    United States
    Description

    Demographic Trends and Health Outcomes in the U.S

    Inequalities, Risk Factors and Access to Care

    By Data Society [source]

    About this dataset

    This dataset contains key demographic data, health status indicators, and leading-cause-of-death data to help us understand current trends and health outcomes in communities across the United States. The data show how different states, counties, and populations have changed over time. With this data we can analyze levels of national health services use, such as vaccination rates or mammography rates; review leading causes of death to inform public policy initiatives; and identify risk factors for specific conditions that may be associated with certain populations or regions. The files include State FIPS Code, County FIPS Code, CHSI County Name, CHSI State Name, CHSI State Abbreviation, and the report count and expected case rate per 100K population for Influenza B (FluB), Hepatitis A (HepA), Hepatitis B (HepB), Measles (Meas), Pertussis (Pert), CRS, and Syphilis.

    We also look at measures related to preventive care services, each with separate lower and upper confidence limits: Pap smear screening among women aged 18-64, mammograms among women aged 40-64, colonoscopy/proctoscopy among men aged 50+, and pneumonia vaccination among those aged 65+. Additionally, there are trend-indicating variables such as measures of birth and death, which include the general fertility rate, teen birth rate by mother's age group, etc. Summary Measures covers mortality trends and life expectancy by sex and age categories. Vulnerable Populations gives insight into disability ratios and into environmental issues due to poor-quality housing. Finally, Risk Factors covers specific conditions such as asthma diagnosis prevalence, cancer, diabetes, alcohol abuse, and smoking trends. Together this information gives a good understanding of Healthy People 2020 target settings from a demographic standpoint, and so will aid in generating more evidence-backed policies.

    How to use the dataset

    What the Dataset Contains

    This dataset contains valuable information about public health relevant to each county in the United States, broken down into 9 indicator domains: Demographics, Leading Causes of Death, Summary Measures of Health, Measures of Birth and Death Rates, Relative Health Importance, Vulnerable Populations and Environmental Health Conditions, Preventive Services Use Data from BRFSS Survey System Data, Risk Factors and Access to Care/Health Insurance Coverage & State Developed Types of Measurements such as CRS with Multiple Categories Identified for Each Type. The data includes indicators such as percentages or rates for influenza (FLU), hepatitis (HepA/B), measles (MEAS), pertussis (PERT), syphilis (Syphilis), cervical cancer (CI_Min_Pap_Smear - CI_Max_Pap_Smear), breast cancer (CI_Min_Mammogram - CI_Max_Mammogram), proctoscopy (CI Min Proctoscopy - CI Max Proctoscopy), pneumococcal vaccinations (Ci min Pneumo Vax - Ci max Pneumo Vax) and flu vaccinations (Ci min Flu Vac - Ci Max Flu Vac). Additionally, it provides information on leading causes of death at both county and national level, including age-adjusted mortality rates due to suicide among teens aged 15-19 per 100,000 population. Furthermore, it provides summary measures such as the age-adjusted percentage who consider their physical health fair or poor; vulnerable-population indicators such as the relative importance score for disabled adults; and preventive service use indicators such as self-reported vaccination coverage against hepatitis B virus among men aged 40-64, etc...

    Getting Started With The Dataset

    To get started with exploring this dataset, you first need to understand what each column in the table represents: State FIPS Code is a unique identifier used by various US government agencies to denote states. County FIPS code denotes counties wi...

  7. Aishwarya

    • kaggle.com
    zip
    Updated May 12, 2025
    Cite
    AISHWARYA ROY (2025). Aishwarya [Dataset]. https://www.kaggle.com/datasets/aishwarya2025009/aishwarya
    Explore at:
    Available download formats: zip (243325896 bytes)
    Dataset updated
    May 12, 2025
    Authors
    AISHWARYA ROY
    Description

    About Dataset

    This dataset includes preprocessed brain MRI images for tumor classification tasks. Each image is available in three versions:

    1. Normal
    2. Noisy
    3. Blur

    All images are categorized into their respective classes (e.g., glioma, meningioma, pituitary, no tumor) and stored in structured folders, making it ideal for training deep learning models for classification or segmentation tasks.

    Abstract

    Brain tumors are among the most aggressive and life-threatening diseases in both children and adults. Approximately 11,700 individuals are diagnosed each year, with a survival rate as low as 34% for men and 36% for women. MRI (Magnetic Resonance Imaging) remains the most effective method for detecting and analyzing brain tumors.

    Due to the complexity of tumors and the need for experienced radiologists, manual diagnosis is time-consuming and error-prone. Automated classification using Deep Learning, particularly CNNs and Transfer Learning, offers a promising solution for accurate and scalable brain tumor diagnostics.

    Context

    The variability in tumor size, shape, and location introduces significant diagnostic challenges. Additionally, in developing countries, limited access to skilled radiologists makes rapid and accurate diagnosis more difficult. An AI-based solution hosted on the cloud can help overcome these barriers and assist in early tumor detection. This dataset and project are built upon the original contributions of https://www.kaggle.com/datasets/sartajbhuvaji/brain-tumor-classification-mri/data.

  8. Summary of quantitative comparison of models.

    • datasetcatalog.nlm.nih.gov
    Updated Feb 20, 2025
    Cite
    Jie, Wang; Khan, Rabnawaz (2025). Summary of quantitative comparison of models. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001306160
    Explore at:
    Dataset updated
    Feb 20, 2025
    Authors
    Jie, Wang; Khan, Rabnawaz
    Description

    Cancer, the second-leading cause of mortality, kills 16% of people worldwide. Unhealthy lifestyles, smoking, alcohol abuse, obesity, and a lack of exercise have been linked to cancer incidence and mortality. However, it is hard. Cancer and lifestyle correlation analysis and cancer incidence and mortality prediction in the next several years are used to guide people’s healthy lives and target medical financial resources. This paper has two key research areas.

    Data preprocessing and sample expansion design: Using experimental analysis and comparison, this study chooses the best cubic spline interpolation technology on the original data from 32 entry points to 420 entry points and converts annual data into monthly data to solve the problem of insufficient correlation analysis and prediction. Factor analysis is possible because data sources indicate changing factors.

    TSA-LSTM two-stage attention design: A popular tool with advanced visualization functions, Tableau, simplifies this paper’s study. Tableau’s testing findings indicate it cannot analyze and predict this paper’s time series data. LSTM is utilized by the TSA-LSTM optimization model. By commencing with input feature attention, this model attention technique guarantees that the model encoder converges to a subset of input sequence features during the prediction of output sequence features. As a result, the model’s natural learning trend and prediction quality are enhanced. The second step, time performance attention, maintains We can choose network features and improve forecasts based on real-time performance.

    Validating the data source with factor correlation analysis and trend prediction using the TSA-LSTM model: Most cancers have overlapping risk factors, and excessive drinking, lack of exercise, and obesity can cause breast, colorectal, and colon cancer. A poor lifestyle directly promotes lung, laryngeal, and oral cancers, according to visual tests. Cancer incidence is expected to climb 18–21% between 2020 and 2025, according to 2021. Long-term projection accuracy is 98.96 percent, and smoking and obesity may be the main cancer causes.

  9. Nursing professional perspective on the grieving process in oncological...

    • dataverse.unisimon.edu.co
    • unisimon.digitalcommonsdata.com
    png, tsv
    Updated Aug 20, 2025
    Cite
    en ncia e Innovación en Salud; en ncia e Innovación en Salud (2025). Nursing professional perspective on the grieving process in oncological situations. A study from grounded theory [Dataset]. http://doi.org/10.17632/GWNKJC98PT.1
    Explore at:
    Available download formats: tsv (1297), tsv (648), tsv (2048), png (113048), tsv (3300)
    Dataset updated
    Aug 20, 2025
    Dataset provided by
    UNISIMON
    Authors
    en ncia e Innovación en Salud; en ncia e Innovación en Salud
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Background: This article discusses what nursing professionals experience when facing care situations in the final stages of life and the grief of people with cancer. Objective: To understand the oncological grief process from the nurse’s perspective in a group in Bogotá, Colombia. Methods: Qualitative study based on grounded theory. The information was collected through in-depth interviews with 7 nurses specialized in cancer care, from public hospitals in Bogotá D.C. Results: 5 categories were found that describe how nurses live a grieving process: from the moment they recognize themselves and their reality, feel defeated, act for a good death, learn from the losses, and transcend from multiple losses. Conclusions: A substantive theory called “Resignifying death and mourning to empower my NURSE I” was derived, allowing the conclusion of the hypothetical existence of 5 profiles: novice, impotent, human, reflexive, and leading nurse.

  10. Malmö Offspring Study

    • data.europa.eu
    • researchdata.se
    • +1 more
    Updated Feb 7, 2017
    Cite
    Lunds universitet (2017). Malmö Offspring Study [Dataset]. https://data.europa.eu/data/datasets/https-snd-se-catalogue-dataset-ext0202-1~~1?locale=en
    Explore at:
    Dataset updated
    Feb 7, 2017
    Dataset authored and provided by
    Lunds universitet
    Area covered
    Malmö
    Description

    The steering group for Malmö Offspring Study:

    • Peter Nilsson, Lund University, Department of Clinical Sciences, Malmö, Internal Medicine Research Unit
    • Olle Melander, Lund University, Faculty of Medicine
    • Jan Nilsson, Lund University, Department of Clinical Sciences
    • Gunnar Engström, Lund University, Department of Clinical Sciences, Malmö
    • Margaretha Persson, Skåne University Hospital, Department of Clinical Sciences, Malmö
    • Marju Orho-Melander, Lund University, Department of Clinical Sciences, Malmö, Division of Diabetes and cardiovascular disease - genetic epidemiology

    In Malmö Offspring Study, children and grandchildren of participants from the previous population study Malmö Diet Cancer are invited to participate. The children are today aged 50-55, while the grandchildren are 20-30 years old. The objective is to examine 5,000-6,000 individuals by the year 2020.

    There is a long tradition of larger population studies in Malmö. The main ones are Malmö Preventive Project and Malmö Diet Cancer, which together have engaged over 50 000 unique participants. They have created a foundation for future studies and research projects both in Sweden and internationally. This has resulted in new knowledge about, for example, diabetes, cardiovascular diseases, cancer, alcohol abuse and the importance of nutrition and diet.

    Researchers are now hoping to obtain more relevant data than in the earlier population studies. This will be done by using new methods for functional analysis of blood vessels, lungs, brain and body metabolism during examinations and tests. The connection between what people eat daily and the intestinal bacterial flora, and how this affects people’s health, is of special interest. Examinations and testing take place at Skåne University Hospital in Malmö.

    The participants are monitored clinically through tests as well as in records for a long period of time, based on informed consent and in accordance with ethical approval and the Privacy Act (PUL).

    Purpose:

    The purpose of the study is to give future research access to new information about how diseases are spread within families, not only through genetic inheritance but also through lifestyle, social patterns and health habits.

  11. Global Suicide, Mental Health, Substance Use

    • kaggle.com
    zip
    Updated Jan 24, 2023
    Cite
    The Devastator (2023). Global Suicide, Mental Health, Substance Use [Dataset]. https://www.kaggle.com/datasets/thedevastator/global-suicide-mental-health-substance-use-disor
    Explore at:
    Available download formats: zip (69880 bytes)
    Dataset updated
    Jan 24, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Global Suicide, Mental Health, Substance Use Disorders Trends

    Analyzing the Impact Across Countries

    By [source]

    About this dataset

    This dataset contains comprehensive data on global suicide, mental health, substance use disorders, and economic trends from 1990 to 2017. Using this data, researchers can examine the effects of these trends across countries and uncover insights about the state of global health. The dataset contains information about suicide rates (per 100,000 people), mental disorder prevalence (as a percentage of population size in 2017), population share with substance use disorders (as a percentage, 1990-2016), GDP per capita by purchasing power parity (in current US$, 1990-2017), and net national income per capita adjusted for inflation (in current US$, as of 2016). Additionally, it tracks the unemployment rate over time (% of population, 1991-2017). All this helps us better understand how issues such as suicide, mental health, and substance use disorders affect the lives of people globally.

    How to use the dataset

    This dataset offers insights into how mental health, substance use disorders, and economic status can impact global suicide trends. To get the most out of this data set, it is important to note the various columns available and their purpose as outlined above.

    To analyze global suicide rates, look at the column “Probability (%) of dying between age 30 and exact age 70 from any of cardiovascular disease, cancer, diabetes or chronic respiratory disease” for a summary of estimated suicide rates for different countries over time. Additionally the columns “Entity” and “Code” provide useful information on which country is being discussed in each row.

    The column “Prevalence- Alcohol and Substance Use Disorders” provides an overview of substance use disorders across different countries while the year column indicates when these trends are taking place.

    For economic indicators related to mental health there is data available on national income per capita (current US$, 2016) as well as unemployment rate (population % 1991-2017). Together these metrics give a detailed picture into how economics can be interlinked with mental health and potentially suicide rates.

    Finally, this dataset also allows you to compare different countries on any common metric within one specific year, by applying appropriate filters when exploring the data set in more detail.

    Research Ideas

    • Analyzing the correlation between mental health and economic indicators.
    • Identifying countries with the highest prevalence of substance use disorders and developing targeted interventions for those populations.
    • Examining the impact of global suicide rates over time to increase awareness and reduce stigma surrounding mental health issues in different countries

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: share-with-alcohol-and-substance-use-disorders 1990-2016.csv

    | Column name | Description |
    |:---|:---|
    | Entity | The name of the country. (String) |
    | Code | The ISO code of the country. (String) |
    | Year | The year of the data. (Integer) |
    | Prevalence - Alcohol and substance use disorders | The percentage of the population with alcohol and substance use disorders. (Float) |
    | Prevalence | Both (age-standardized percent) (%) |
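
As a rough illustration of filtering this file to a single year and comparing countries, here is a minimal sketch using only the Python standard library. The column names follow the table above, but the helper names and the sample rows (including the country names and values) are hypothetical, and the header of the actual CSV may differ.

```python
import csv
import io

PREVALENCE_COLUMN = "Prevalence - Alcohol and substance use disorders"

# Hypothetical sample rows standing in for the real file contents.
SAMPLE_CSV = (
    "Entity,Code,Year,Prevalence - Alcohol and substance use disorders\n"
    "Exampleland,EXL,2015,2.5\n"
    "Exampleland,EXL,2016,2.6\n"
    "Samplestan,SMP,2016,1.9\n"
)


def rows_for_year(csv_text, year):
    """Return the rows (as dicts) whose Year column equals the given year."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if int(row["Year"]) == year]


def rank_by_prevalence(rows, column=PREVALENCE_COLUMN):
    """Sort rows from highest to lowest value in the given numeric column."""
    return sorted(rows, key=lambda row: float(row[column]), reverse=True)


if __name__ == "__main__":
    for row in rank_by_prevalence(rows_for_year(SAMPLE_CSV, 2016)):
        print(row["Entity"], row["Code"], row[PREVALENCE_COLUMN])
```

To run this against the downloaded file, the text passed to `rows_for_year` would come from reading the CSV from disk instead of from `SAMPLE_CSV`.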

    **File: crude suicide rate...

  12. MedQuAD: Medical Question-Answer Dataset

    • kaggle.com
    zip
    Updated Sep 7, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Afroz (2024). MedQuAD: Medical Question-Answer Dataset [Dataset]. https://www.kaggle.com/datasets/pythonafroz/medquad-medical-question-answer-for-ai-research
    Explore at:
    Available download formats: zip (5188686 bytes)
    Dataset updated
    Sep 7, 2024
    Authors
    Afroz
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
    License information was derived automatically

    Description

    Medical Questions: Unveiling the MedQuAD Dataset

    Have you ever wondered where medical chatbots or intelligent search engines for health information get their knowledge? The answer lies in large datasets like MedQuAD! This rich resource provides a treasure trove of real-world medical questions and informative answers, paving the way for advancements in Natural Language Processing (NLP) and Information Retrieval (IR) within the healthcare domain.

    What is MedQuAD?

    MedQuAD, short for Medical Question Answering Dataset, is a collection of question-answer pairs meticulously curated from 12 trusted National Institutes of Health (NIH) websites. These websites cover a wide range of health topics, from cancer.gov to GARD (Genetic and Rare Diseases Information Resource).

    What makes MedQuAD unique?

    Beyond the sheer volume of data, MedQuAD offers unique features that empower researchers and developers:

    1. Diversity of Questions: MedQuAD encompasses a spectrum of 37 question types, ranging from treatment options and diagnosis inquiries to understanding side effects. This variety reflects the diverse needs of individuals seeking medical information.
    2. Focus on Specific Entities: MedQuAD goes beyond just questions and answers. It delves deeper by associating each question with the entity it focuses on, such as diseases, drugs, or other medical tests. This targeted approach facilitates more focused research and NLP applications.
    3. Rich Annotations: While the answers from MedlinePlus collections are excluded due to copyright restrictions, MedQuAD retains valuable annotations within its XML files. These annotations include question type, synonyms, unique identifiers (CUI) for medical concepts, and semantic types. This additional information opens doors for more sophisticated NLP tasks.

    The Power of MedQuAD

    MedQuAD serves as a valuable springboard for various applications in the medical NLP and IR field. Here are some potential uses:

    1. Training Chatbots and Virtual Assistants: AI-powered medical chatbots can leverage MedQuAD to learn how to respond accurately and informatively to a wide range of health inquiries from users.
    2. Developing Intelligent Search Engines: Search engines can be enhanced to provide more relevant and accurate health information by drawing insights from the question types and focuses presented in MedQuAD.
    3. Studying User Concerns in Healthcare: Analyzing the types of questions within MedQuAD can reveal valuable insights into what information users are most interested in and what areas require clearer explanations.

    In essence, MedQuAD is a powerful tool for unlocking the potential of NLP and IR in the medical domain. By leveraging this rich dataset, researchers and developers are paving the way for a future where individuals can access accurate and comprehensive health information with increasing ease and efficiency.

    Reference:

    If you use the MedQuAD dataset or the associated QA test collection, please cite the following paper: Ben Abacha, A., & Demner-Fushman, D. (2019). A Question-Entailment Approach to Question Answering. BMC Bioinformatics, 20(1), 511. https://doi.org/10.1186/s12859-019-3119-4

Cite
Office for National Statistics (2019). Cancer survival in England - adults diagnosed [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/datasets/cancersurvivalratescancersurvivalinenglandadultsdiagnosed
Cancer survival in England - adults diagnosed

Explore at:
104 scholarly articles cite this dataset (View in Google Scholar)