100+ datasets found
  1. Lung Cancer Mortality Datasets v2

    • kaggle.com
    zip
    Updated Jun 1, 2024
    Cite
    MasterDataSan (2024). Lung Cancer Mortality Datasets v2 [Dataset]. https://www.kaggle.com/datasets/masterdatasan/lung-cancer-mortality-datasets-v2
    Explore at:
    Available download formats: zip (81127029 bytes)
    Dataset updated
    Jun 1, 2024
    Authors
    MasterDataSan
    Description

    This dataset contains data about lung cancer mortality. It is a comprehensive collection of patient information, focused on individuals diagnosed with cancer, and is designed to facilitate the analysis of factors that may influence cancer prognosis and treatment outcomes. The database includes a range of demographic, medical, and treatment-related variables, capturing essential details about each patient's condition and history.

    Key components of the database include:

    Demographic Information: Basic details about the patients such as age, gender, and country of residence. This helps in understanding the distribution of cancer cases across different populations and regions.

    Medical History: Information about each patient’s medical background, including family history of cancer, smoking status, Body Mass Index (BMI), cholesterol levels, and the presence of other health conditions such as hypertension, asthma, cirrhosis, and other cancers. This section is crucial for identifying potential risk factors and comorbidities.

    Cancer Diagnosis: Detailed data about the cancer diagnosis itself, including the date of diagnosis and the stage of cancer at the time of diagnosis. This helps in tracking the progression and severity of the disease.

    Treatment Details: Information regarding the type of treatment each patient received, the end date of the treatment, and the outcome (whether the patient survived or not). This is essential for evaluating the effectiveness of different treatment approaches.

    The structure of the database allows for in-depth analysis and research, making it possible to identify patterns, correlations, and potential causal relationships between various factors and cancer outcomes. It is a valuable resource for medical researchers, epidemiologists, and healthcare providers aiming to improve cancer treatment and patient care.

    • id: A unique identifier for each patient in the dataset.
    • age: The age of the patient at the time of diagnosis.
    • gender: The gender of the patient (e.g., male, female).
    • country: The country or region where the patient resides.
    • diagnosis_date: The date on which the patient was diagnosed with lung cancer.
    • cancer_stage: The stage of lung cancer at the time of diagnosis (e.g., Stage I, Stage II, Stage III, Stage IV).
    • family_history: Indicates whether there is a family history of cancer (e.g., yes, no).
    • smoking_status: The smoking status of the patient (e.g., current smoker, former smoker, never smoked, passive smoker).
    • bmi: The Body Mass Index of the patient at the time of diagnosis.
    • cholesterol_level: The cholesterol level of the patient (value).
    • hypertension: Indicates whether the patient has hypertension (high blood pressure) (e.g., yes, no).
    • asthma: Indicates whether the patient has asthma (e.g., yes, no).
    • cirrhosis: Indicates whether the patient has cirrhosis of the liver (e.g., yes, no).
    • other_cancer: Indicates whether the patient has had any other type of cancer in addition to the primary diagnosis (e.g., yes, no).
    • treatment_type: The type of treatment the patient received (e.g., surgery, chemotherapy, radiation, combined).
    • end_treatment_date: The date on which the patient completed their cancer treatment or died.
    • survived: Indicates whether the patient survived (e.g., yes, no).
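
    A minimal sketch of how the columns described above might be read, assuming the file is a CSV with ISO-formatted dates and lowercase yes/no flags (the listing does not confirm either). The two rows are invented for illustration, and the helper name `treatment_days` is hypothetical.

```python
import csv
import io
from datetime import date

# Two invented rows using a subset of the listed columns (not real data).
# ISO dates and "yes"/"no" values are assumptions about the file format.
SAMPLE = """id,age,gender,country,diagnosis_date,cancer_stage,treatment_type,end_treatment_date,survived
1,64,male,Sweden,2016-04-05,Stage I,chemotherapy,2017-09-10,no
2,50,female,Netherlands,2023-04-20,Stage III,surgery,2024-06-17,yes
"""

def treatment_days(row):
    """Days between diagnosis and the end of treatment (or death)."""
    start = date.fromisoformat(row["diagnosis_date"])
    end = date.fromisoformat(row["end_treatment_date"])
    return (end - start).days

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
durations = [treatment_days(r) for r in rows]
# Share of rows whose "survived" flag is "yes".
survival_rate = sum(r["survived"] == "yes" for r in rows) / len(rows)
print(durations, survival_rate)
```

    Because end_treatment_date covers both completed treatment and death, a duration computed this way should be read together with the survived flag.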

    This dataset contains artificially generated data with as close a representation of reality as possible. This data is free to use without any licence required.

    Good luck Gakusei!

  2. Breast Cancer Dataset [Wisconsin Diagnostic UCI]

    • kaggle.com
    zip
    Updated Jan 22, 2024
    Cite
    Abhinav Mangalore (2024). Breast Cancer Dataset [Wisconsin Diagnostic UCI] [Dataset]. https://www.kaggle.com/datasets/abhinavmangalore/breast-cancer-dataset-wisconsin-diagnostic-uci
    Explore at:
    Available download formats: zip (49831 bytes)
    Dataset updated
    Jan 22, 2024
    Authors
    Abhinav Mangalore
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Wisconsin
    Description

    This dataset is taken from the UCI Machine Learning Repository (link: https://data.world/health/breast-cancer-wisconsin) and was donated by Nick Street.

    The main idea and inspiration behind the upload was to provide datasets for Machine Learning as practice and reference for my peers at college. The main purpose is to analyze data and experiment with different machine learning ideas and techniques for this binary classification task. As such, this dataset is a very useful resource to practice on.

    Breast cancer is when breast cells mutate and become cancerous cells that multiply and form tumors. It accounts for 25% of all cancer cases and affected over 2.1 Million people in 2015 alone. Breast cancer typically affects women and people assigned female at birth (AFAB) age 50 and older, but it can also affect men and people assigned male at birth (AMAB), as well as younger women. Healthcare providers may treat breast cancer with surgery to remove tumors or treatment to kill cancerous cells.

    Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image. A few of the images can be found at http://www.cs.wisc.edu/~street/images/

    The task: To classify whether the tumor is benign (B) or malignant (M).

    Relevant information

    Separating plane described above was obtained using
    Multisurface Method-Tree (MSM-T) [K. P. Bennett, "Decision Tree
    Construction Via Linear Programming." Proceedings of the 4th
    Midwest Artificial Intelligence and Cognitive Science Society,
    pp. 97-101, 1992], a classification method which uses linear
    programming to construct a decision tree. Relevant features
    were selected using an exhaustive search in the space of 1-4
    features and 1-3 separating planes.
    
    The actual linear program used to obtain the separating plane
    in the 3-dimensional space is that described in:
    [K. P. Bennett and O. L. Mangasarian: "Robust Linear
    Programming Discrimination of Two Linearly Inseparable Sets",
    Optimization Methods and Software 1, 1992, 23-34].
    
    
    This database is also available through the UW CS ftp server:
    
    ftp ftp.cs.wisc.edu
    cd math-prog/cpo-dataset/machine-learn/WDBC/
    

    Number of instances: 569

    Number of attributes: 32 (ID, diagnosis, 30 real-valued input features)
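
    As a rough illustration of the record layout stated above (ID, diagnosis, then 30 real-valued features), the sketch below parses one comma-separated record. The record itself is invented, the comma-separated encoding is an assumption about the file, and `parse_record` is a hypothetical helper rather than part of the dataset.

```python
# One invented record in the ID, diagnosis, 30-features layout (not real data).
fake_features = [round(0.1 * i, 1) for i in range(1, 31)]
line = ",".join(["900001", "M"] + [str(v) for v in fake_features])

def parse_record(text):
    """Split a comma-separated record into (id, label, features)."""
    fields = text.strip().split(",")
    if len(fields) != 32:
        raise ValueError(f"expected 32 attributes, got {len(fields)}")
    record_id, diagnosis = fields[0], fields[1]
    if diagnosis not in ("M", "B"):
        raise ValueError(f"unknown diagnosis code: {diagnosis!r}")
    features = [float(v) for v in fields[2:]]
    label = 1 if diagnosis == "M" else 0  # 1 = malignant, 0 = benign
    return record_id, label, features

record_id, label, features = parse_record(line)
print(record_id, label, len(features))
```

    Encoding malignant as 1 is a choice made here for the binary classification task; the opposite convention works equally well as long as it is applied consistently.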

    Original Creators:

    Dr. William H. Wolberg, General Surgery Dept., University of
    Wisconsin, Clinical Sciences Center, Madison, WI 53792
    wolberg@eagle.surgery.wisc.edu
    
    W. Nick Street, Computer Sciences Dept., University of
    Wisconsin, 1210 West Dayton St., Madison, WI 53706
    street@cs.wisc.edu 608-262-6619
    
    Olvi L. Mangasarian, Computer Sciences Dept., University of
    Wisconsin, 1210 West Dayton St., Madison, WI 53706
    olvi@cs.wisc.edu 
    

    Donor: Nick Street

    Date: November 1995

    Past Usage:

    first usage:

    W.N. Street, W.H. Wolberg and O.L. Mangasarian 
    Nuclear feature extraction for breast tumor diagnosis.
    IS&T/SPIE 1993 International Symposium on Electronic Imaging: Science
    and Technology, volume 1905, pages 861-870, San Jose, CA, 1993.
    

    OR literature:

    O.L. Mangasarian, W.N. Street and W.H. Wolberg. 
    Breast cancer diagnosis and prognosis via linear programming. 
    Operations Research, 43(4), pages 570-577, July-August 1995.
    

    Medical literature:

    W.H. Wolberg, W.N. Street, and O.L. Mangasarian. 
    Machine learning techniques to diagnose breast cancer from
    fine-needle aspirates. 
    Cancer Letters 77 (1994) 163-171.
    
    W.H. Wolberg, W.N. Street, and O.L. Mangasarian. 
    Image analysis and machine learning applied to breast cancer
    diagnosis and prognosis. 
    Analytical and Quantitative Cytology and Histology, Vol. 17
    No. 2, pages 77-87, April 1995. 
    
    W.H. Wolberg, W.N. Street, D.M. Heisey, and O.L. Mangasarian. 
    Computerized breast cancer diagnosis and prognosis from fine
    needle aspirates. 
    Archives of Surgery 1995;130:511-516.
    
    W.H. Wolberg, W.N. Street, D.M. Heisey, and O.L. Mangasarian. 
    Computer-derived nuclear features distinguish malignant from
    benign breast cytology. 
    Human Pathology, 26:792--796, 1995.
    

    See also: http://www.cs.wisc.edu/~olvi/uwmp/mpml.html http://www.cs.wisc.edu/~olvi/uwmp/cancer.html

  3. Appendix Cancer Prediction Dataset

    • kaggle.com
    zip
    Updated Feb 4, 2025
    Cite
    Ankush Panday (2025). Appendix Cancer Prediction Dataset [Dataset]. https://www.kaggle.com/datasets/ankushpanday1/appendix-cancer-prediction-dataset
    Explore at:
    Available download formats: zip (7343922 bytes)
    Dataset updated
    Feb 4, 2025
    Authors
    Ankush Panday
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset contains clinical, demographic, and lifestyle data for 260,000 individuals from 25 countries. Designed for healthcare research and predictive modeling, it includes diverse variables relevant to appendix cancer diagnosis and risk factors. The dataset can support machine learning tasks, statistical analysis, and exploratory data studies in oncology and public health domains.

  4. Lung Cancer Dataset

    • kaggle.com
    Updated May 6, 2025
    Cite
    Aman_Kumar094 (2025). Lung Cancer Dataset [Dataset]. https://www.kaggle.com/datasets/amankumar094/lung-cancer-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 6, 2025
    Dataset provided by
    Kaggle
    Authors
    Aman_Kumar094
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset contains data about lung cancer mortality and is a comprehensive collection of patient information, specifically focused on individuals diagnosed with cancer. It contains information on 800,000 individuals related to lung cancer diagnosis, treatment, and outcomes, organized in 16 columns. This large-scale dataset is designed to aid researchers, data scientists, and healthcare professionals in studying patterns, building predictive models, and enhancing early detection and treatment strategies.

    🌍 The Societal Impact of Lung Cancer

    Lung cancer is not just a disease — it's a global crisis that steals time, health, and hope from millions of people every year. As the #1 cause of cancer deaths worldwide, it takes more lives annually than breast, colon, and prostate cancer combined.

    But behind every statistic is a story:

    A parent who never saw their child graduate.

    A worker who had to leave their job too soon.

    A community that lost a leader, a friend, a neighbor.

    Why does this matter? Lung cancer often goes undetected until it's too late. It’s aggressive, silent, and devastating — especially in underserved areas where early detection is rare and treatment options are limited. It doesn’t just affect patients. It affects families, economies, and healthcare systems on a massive scale.

    This dataset represents more than numbers. It represents 800,000 real-world stories — people who can help us unlock patterns, train models, and advance life-saving research.

    By working with this data, you're not just analyzing a dataset — you're stepping into the fight against one of humanity’s deadliest diseases.

    Let’s turn insight into impact. (😊 The above description was generated with the help of AI. I just wanted to share this dataset, that's all. Thank you.)

  5. [MI] Rapid Cancer Registration Data

    • digital.nhs.uk
    Updated Nov 27, 2025
    Cite
    (2025). [MI] Rapid Cancer Registration Data [Dataset]. https://digital.nhs.uk/data-and-information/publications/statistical/mi-rapid-cancer-registration-data
    Explore at:
    Dataset updated
    Nov 27, 2025
    License

    https://digital.nhs.uk/about-nhs-digital/terms-and-conditions

    Description

    Rapid Cancer Registration Data (RCRD) provides a quick, indicative source of cancer data. It is provided to support the planning and provision of cancer services. The data is based on a rapid processing of cancer registration data sources, in particular on Cancer Outcomes and Services Dataset (COSD) information. In comparison, National Cancer Registration Data (NCRD) relies on additional data sources, enhanced follow-up with trusts and expert processing by cancer registration officers. The Rapid Cancer Registration Data (RCRD) may be useful for service improvement projects including healthcare planning and prioritisation. However, it is poorly suited for epidemiological research due to limitations in the data quality and completeness.

  6. One-year survival from all cancers (NHSOF 1.4.i) - Dataset - data.gov.uk

    • ckan.publishing.service.gov.uk
    Updated Aug 4, 2015
    Cite
    ckan.publishing.service.gov.uk (2015). One-year survival from all cancers (NHSOF 1.4.i) - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/one-year-survival-from-all-cancers-nhsof-1-4-i
    Explore at:
    Dataset updated
    Aug 4, 2015
    Dataset provided by
    CKAN: https://ckan.org/
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    A measure of the number of adults diagnosed with any type of cancer in a year who are still alive one year after diagnosis. Purpose: This indicator attempts to capture the success of the NHS in preventing people from dying once they have been diagnosed with any type of cancer. Current version updated: Feb-17. Next version due: Feb-18.

  7. Cancer survival in England - adults diagnosed

    • ons.gov.uk
    • cy.ons.gov.uk
    xlsx
    Updated Aug 12, 2019
    Cite
    Office for National Statistics (2019). Cancer survival in England - adults diagnosed [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/datasets/cancersurvivalratescancersurvivalinenglandadultsdiagnosed
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Aug 12, 2019
    Dataset provided by
    Office for National Statistics: http://www.ons.gov.uk/
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    One-year and five-year net survival for adults (15-99) in England diagnosed with one of 29 common cancers, by age and sex.

  8. Number and rates of new cases of primary cancer, by cancer type, age group...

    • www150.statcan.gc.ca
    • datasets.ai
    • +2more
    Updated May 19, 2021
    Cite
    Government of Canada, Statistics Canada (2021). Number and rates of new cases of primary cancer, by cancer type, age group and sex [Dataset]. http://doi.org/10.25318/1310011101-eng
    Explore at:
    Dataset updated
    May 19, 2021
    Dataset provided by
    Statistics Canada: https://statcan.gc.ca/en
    Area covered
    Canada
    Description

    Number and rate of new cancer cases diagnosed annually from 1992 to the most recent diagnosis year available. Included are all invasive cancers and in situ bladder cancer with cases defined using the Surveillance, Epidemiology and End Results (SEER) Groups for Primary Site based on the World Health Organization International Classification of Diseases for Oncology, Third Edition (ICD-O-3). Random rounding of case counts to the nearest multiple of 5 is used to prevent inappropriate disclosure of health-related information.
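
    The random rounding mentioned above can be sketched as follows. The listing does not describe Statistics Canada's exact procedure, so this shows one common scheme (unbiased random rounding to base 5), where a count is rounded up with probability proportional to its remainder. The function name `random_round` and the sample counts are illustrative assumptions.

```python
import random

def random_round(count, base=5, rng=random):
    """Unbiased random rounding of a non-negative count to a multiple of `base`."""
    remainder = count % base
    if remainder == 0:
        return count  # already a multiple of the base
    lower = count - remainder
    # Round up with probability remainder/base so the expected value equals count.
    if rng.random() < remainder / base:
        return lower + base
    return lower

rng = random.Random(0)  # seeded only to make the demo repeatable
true_counts = (0, 3, 12, 20, 47)  # invented counts
published = [random_round(c, rng=rng) for c in true_counts]
print(published)
```

    Under this scheme a published count never differs from the true count by 5 or more, which is why small differences between rows and their totals can appear in rounded tables.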

  9. Five-year survival from all cancers (NHSOF 1.4.ii) - Dataset - data.gov.uk

    • ckan.publishing.service.gov.uk
    Updated Aug 4, 2015
    Cite
    ckan.publishing.service.gov.uk (2015). Five-year survival from all cancers (NHSOF 1.4.ii) - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/five-year-survival-from-all-cancers-nhsof-1-4-ii
    Explore at:
    Dataset updated
    Aug 4, 2015
    Dataset provided by
    CKAN: https://ckan.org/
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    A measure of the number of adults diagnosed with any type of cancer in a year who are still alive five years after diagnosis. Purpose: This indicator attempts to capture the success of the NHS in preventing people from dying once they have been diagnosed with any type of cancer. Current version updated: Feb-17. Next version due: Feb-18.

  10. Breast Cancer Dataset - Dataset - CKAN

    • data.poltekkes-smg.ac.id
    Updated Oct 7, 2024
    Cite
    (2024). Breast Cancer Dataset - Dataset - CKAN [Dataset]. https://data.poltekkes-smg.ac.id/dataset/breast-cancer-dataset
    Explore at:
    Dataset updated
    Oct 7, 2024
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Description: Breast cancer is the most common cancer among women worldwide. It accounts for 25% of all cancer cases and affected over 2.1 million people in 2015 alone. It starts when cells in the breast begin to grow out of control. These cells usually form tumors that can be seen via X-ray or felt as lumps in the breast area. The key challenge in its detection is how to classify tumors as malignant (cancerous) or benign (non-cancerous). We ask you to complete the analysis of classifying these tumors using machine learning (with SVMs) and the Breast Cancer Wisconsin (Diagnostic) Dataset. Acknowledgements: This dataset was sourced from Kaggle. Objective: Understand the dataset and clean it up (if required). Build classification models to predict whether the cancer type is malignant or benign. Also fine-tune the hyperparameters and compare the evaluation metrics of various classification algorithms.

  11. lung-cancer

    • huggingface.co
    Updated Jun 24, 2022
    Cite
    Nate Raw (2022). lung-cancer [Dataset]. https://huggingface.co/datasets/nateraw/lung-cancer
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 24, 2022
    Authors
    Nate Raw
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for Lung Cancer

      Dataset Summary
    

    An effective cancer prediction system helps people learn their cancer risk at low cost and make appropriate decisions based on their risk status. The data was collected from the website of an online lung cancer prediction system.

      Supported Tasks and Leaderboards
    

    [More Information Needed]

      Languages
    

    [More Information Needed]

      Dataset Structure… See the full description on the dataset page: https://huggingface.co/datasets/nateraw/lung-cancer.
    
  12. breast cancer

    • figshare.com
    txt
    Updated Mar 28, 2022
    Cite
    Ariel Silva (2022). breast cancer [Dataset]. http://doi.org/10.6084/m9.figshare.19441766.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Mar 28, 2022
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Ariel Silva
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Cancer affects people of different ages, ethnicities and sexes. Collecting and storing data from these people assists in the development, understanding and analysis of statistics on the disease. In Brazil, the oncology hospital units, which receive patients diagnosed with cancer, store the information in a national database called the Hospital Registry of Cancer (RHC). The following variables were selected: age, sex, race, alcohol consumption, tobacco consumption and cancer staging.

  13. Five-year survival from breast, lung and colorectal cancer (NHSOF 1.4.iv) -...

    • ckan.publishing.service.gov.uk
    Updated Aug 4, 2015
    Cite
    (2015). Five-year survival from breast, lung and colorectal cancer (NHSOF 1.4.iv) - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/five-year-survival-from-breast-lung-and-colorectal-cancer-nhsof-1-4-iv
    Explore at:
    Dataset updated
    Aug 4, 2015
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    A measure of the number of adults diagnosed with breast, lung or colorectal cancer in a year who are still alive five years after diagnosis. ONS still publish survival percentages for individual types of cancers. These can be found at: http://www.ons.gov.uk/ons/rel/cancer-unit/cancer-survival/cancer-survival-in-england--patients-diagnosed-2007-2011-and-followed-up-to-2012/index.html. A time series for five-year survival figures for breast, lung and colorectal cancer individually (previous NHS Outcomes Framework indicators 1.4.ii, 1.4.iv and 1.4.vi) is still published and can be found under the link 'Indicator data - previous methodology (.xls)' below. Purpose: This indicator attempts to capture the success of the NHS in preventing people from dying once they have been diagnosed with breast, lung or colorectal cancer. Current version updated: May-14. Next version due: To be confirmed.

  14. Mortality rate from oral cancer, all ages - WMCA

    • cityobservatory.birmingham.gov.uk
    csv, excel, geojson +1
    Updated Nov 3, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2025). Mortality rate from oral cancer, all ages - WMCA [Dataset]. https://cityobservatory.birmingham.gov.uk/explore/dataset/mortality-rate-from-oral-cancer-all-ages-wmca/
    Explore at:
    Available download formats: csv, geojson, json, excel
    Dataset updated
    Nov 3, 2025
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    Age-standardised rate of mortality from oral cancer (ICD-10 codes C00-C14) in persons of all ages and sexes per 100,000 population.

    Rationale

    Over the last decade in the UK (between 2003-2005 and 2012-2014), oral cancer mortality rates have increased by 20% for males and 19% for females (1). Five year survival rates are 56%. Most oral cancers are triggered by tobacco and alcohol, which together account for 75% of cases (2). Cigarette smoking is associated with an increased risk of the more common forms of oral cancer. The risk among cigarette smokers is estimated to be 10 times that for non-smokers. More intense use of tobacco increases the risk, while ceasing to smoke for 10 years or more reduces it to almost the same as that of non-smokers (3). Oral cancer mortality rates can be used in conjunction with registration data to inform service planning as well as comparing survival rates across areas of England to assess the impact of public health prevention policies such as smoking cessation.

    References:

    (1) Cancer Research Campaign. Cancer Statistics: Oral – UK. London: CRC, 2000.
    (2) Blot WJ, McLaughlin JK, Winn DM et al. Smoking and drinking in relation to oral and pharyngeal cancer. Cancer Res 1988; 48: 3282-7.
    (3) La Vecchia C, Tavani A, Franceschi S et al. Epidemiology and prevention of oral cancer. Oral Oncology 1997; 33: 302-12.

    Definition of numerator

    All cancer mortality for lip, oral cavity and pharynx (ICD-10 C00-C14) in the respective calendar years aggregated into quinary age bands (0-4, 5-9, …, 85-89, 90+). This does not include secondary cancers or recurrences. Data are reported according to the calendar year in which the cancer was diagnosed.

    Counts of deaths for years up to and including 2019 have been adjusted where needed to take account of the MUSE ICD-10 coding change introduced in 2020. Detailed guidance on the MUSE implementation is available at: https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/articles/causeofdeathcodinginmortalitystatisticssoftwarechanges/january2020

    Counts of deaths for years up to and including 2013 have been double adjusted by applying comparability ratios from both the IRIS coding change and the MUSE coding change where needed to take account of both the MUSE ICD-10 coding change and the IRIS ICD-10 coding change introduced in 2014. The detailed guidance on the IRIS implementation is available at: https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/bulletins/impactoftheimplementationofirissoftwareforicd10causeofdeathcodingonmortalitystatisticsenglandandwales/2014-08-08

    Counts of deaths for years up to and including 2010 have been triple adjusted by applying comparability ratios from the 2011 coding change, the IRIS coding change and the MUSE coding change where needed to take account of the MUSE ICD-10 coding change, the IRIS ICD-10 coding change and the ICD-10 coding change introduced in 2011. The detailed guidance on the 2011 implementation is available at: https://webarchive.nationalarchives.gov.uk/ukgwa/20160108084125/http://www.ons.gov.uk/ons/guide-method/classifications/international-standard-classifications/icd-10-for-mortality/comparability-ratios/index.html

    Definition of denominator

    Population-years (aggregated populations for the three years) for people of all ages, aggregated into quinary age bands (0-4, 5-9, …, 85-89, 90+)
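
    The description says the rate is age-standardised but does not spell out the calculation. The sketch below assumes direct standardisation, a standard way to combine age-band numerators (deaths) and denominators (population-years) into a single rate per 100,000. The three age bands, the counts, and the standard-population weights are invented, and the function name is hypothetical.

```python
def directly_standardised_rate(deaths, population_years, standard_weights, per=100_000):
    """Weighted average of age-specific rates, expressed per `per` population."""
    if not (len(deaths) == len(population_years) == len(standard_weights)):
        raise ValueError("all inputs must have one entry per age band")
    weighted = sum(
        w * d / n for d, n, w in zip(deaths, population_years, standard_weights)
    )
    return per * weighted / sum(standard_weights)

# Three invented age bands (the indicator itself uses quinary bands 0-4 ... 90+).
deaths = [2, 10, 30]
population_years = [100_000, 50_000, 20_000]
standard_weights = [5_000, 3_000, 2_000]  # hypothetical standard population

rate = directly_standardised_rate(deaths, population_years, standard_weights)
print(round(rate, 1))
```

    Weighting each band by a fixed standard population is what makes rates comparable between areas with different age structures.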

  15. Lung-Cancer-Risk-Dataset

    • kaggle.com
    Updated Aug 23, 2025
    Cite
    Mikey-TraceGod (2025). Lung-Cancer-Risk-Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/12844025
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 23, 2025
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Mikey-TraceGod
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Lung Cancer Risk Dataset

    Overview

    This dataset contains 50,000 patient profiles designed for lung cancer risk analysis and machine learning applications. The dataset is clean, preprocessed, and ready for immediate use in classification tasks, statistical analysis, and data visualization.

    • Rows: 50,000
    • Columns: 11
    • File: preprocessed_lung_cancer_dataset.csv
    • License: CC0: Public Domain

    Dataset Description

    The dataset includes patient profiles with features based on established lung cancer risk factors such as smoking history, environmental exposures, and chronic lung conditions. All data is synthetic and designed to reflect realistic risk factor distributions while maintaining patient privacy.

    Features

    Column                     Type     Description                                      Values/Range
    patient_id                 Integer  Unique patient identifier                        100000-149999
    age                        Integer  Patient age in years                             18-100
    gender                     String   Patient gender                                   'Male', 'Female'
    pack_years                 Float    Smoking exposure (years × packs per day)         0-100
    radon_exposure             String   Residential radon exposure level                 'Low', 'Medium', 'High'
    asbestos_exposure          String   Occupational asbestos exposure history           'Yes', 'No'
    secondhand_smoke_exposure  String   Passive smoking exposure                         'Yes', 'No'
    copd_diagnosis             String   Chronic obstructive pulmonary disease diagnosis  'Yes', 'No'
    alcohol_consumption        String   Alcohol consumption pattern                      'None', 'Moderate', 'Heavy'
    family_history             String   Family history of lung cancer                    'Yes', 'No'
    lung_cancer                String   Target variable: Lung cancer diagnosis           'Yes', 'No'
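
    The feature table above can be turned into a simple row check. This is a sketch only: the allowed values and ranges are copied from the table, while the example row and the helper names `pack_years` and `validate` are invented for illustration.

```python
# Allowed categorical values, copied from the feature table.
ALLOWED = {
    "gender": {"Male", "Female"},
    "radon_exposure": {"Low", "Medium", "High"},
    "asbestos_exposure": {"Yes", "No"},
    "secondhand_smoke_exposure": {"Yes", "No"},
    "copd_diagnosis": {"Yes", "No"},
    "alcohol_consumption": {"None", "Moderate", "Heavy"},
    "family_history": {"Yes", "No"},
    "lung_cancer": {"Yes", "No"},
}

def pack_years(years_smoked, packs_per_day):
    """Smoking exposure as defined in the table: years × packs per day."""
    return years_smoked * packs_per_day

def validate(row):
    """Return a list of problems found in one row (empty if it fits the table)."""
    problems = []
    if not 100000 <= row["patient_id"] <= 149999:
        problems.append("patient_id out of range")
    if not 18 <= row["age"] <= 100:
        problems.append("age out of range")
    if not 0 <= row["pack_years"] <= 100:
        problems.append("pack_years out of range")
    for column, allowed in ALLOWED.items():
        if row[column] not in allowed:
            problems.append(f"{column}: unexpected value {row[column]!r}")
    return problems

# An invented row, not taken from the dataset.
example = {
    "patient_id": 100001, "age": 62, "gender": "Female",
    "pack_years": pack_years(20, 1.5), "radon_exposure": "Low",
    "asbestos_exposure": "No", "secondhand_smoke_exposure": "Yes",
    "copd_diagnosis": "No", "alcohol_consumption": "Moderate",
    "family_history": "No", "lung_cancer": "No",
}
print(validate(example))
```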

    Data Quality

    • Complete: No missing values or duplicates
    • Clean: All values within realistic ranges
    • Balanced Features: Realistic distribution of risk factors
    • Target Distribution: Approximately 25% positive cases, reflecting real-world lung cancer prevalence

    Use Cases

    • Binary classification modeling
    • Risk factor correlation analysis
    • Data visualization and exploratory analysis
    • Machine learning pipeline development
    • Statistical hypothesis testing

  16. Urinary biomarkers for pancreatic cancer - Dataset - CKAN

    • data.poltekkes-smg.ac.id
    Updated Oct 8, 2024
    Cite
    (2024). Urinary biomarkers for pancreatic cancer - Dataset - CKAN [Dataset]. https://data.poltekkes-smg.ac.id/dataset/urinary-biomarkers-for-pancreatic-cancer
    Explore at:
    Dataset updated
    Oct 8, 2024
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Can a simple urine test detect one of the deadliest cancers?

    About Dataset

    This is a brand-new (!) dataset from an open-access paper published December 10, 2020. The paper and the full dataset are open-access (CC-BY), so please give attribution to the original authors in your work.

    Background

    Pancreatic cancer is an extremely deadly type of cancer. Once diagnosed, the five-year survival rate is less than 10%. However, if pancreatic cancer is caught early, the odds of surviving are much better. Unfortunately, many cases of pancreatic cancer show no symptoms until the cancer has spread throughout the body. A diagnostic test to identify people with pancreatic cancer could be enormously helpful.

    The paper

    In a paper by Silvana Debernardi and colleagues, published in 2020 in the journal PLOS Medicine, a multi-national team of researchers sought to develop an accurate diagnostic test for the most common type of pancreatic cancer, called pancreatic ductal adenocarcinoma or PDAC. They gathered a series of biomarkers from the urine of three groups of patients:

    • Healthy controls
    • Patients with non-cancerous pancreatic conditions, like chronic pancreatitis
    • Patients with pancreatic ductal adenocarcinoma

    When possible, these patients were age- and sex-matched. The goal was to develop an accurate way to identify patients with pancreatic cancer.

    The data

    The key features are four urinary biomarkers: creatinine, LYVE1, REG1B, and TFF1.

    • Creatinine is a metabolic waste product that is often used as an indicator of kidney function.
    • LYVE1 is lymphatic vessel endothelial hyaluronan receptor 1, a protein that may play a role in tumor metastasis.
    • REG1B is a protein that may be associated with pancreas regeneration.
    • TFF1 is trefoil factor 1, which may be related to regeneration and repair of the urinary tract.

    Age and sex, both included in the dataset, may also play a role in who gets pancreatic cancer.

    The dataset includes a few other biomarkers as well, but these were not measured in all patients (they were collected partly to measure how various blood biomarkers compared to urine biomarkers). I have not changed any of the data from the paper, other than renaming the columns for easy importing and use. The file Debernardi et al 2020 data.csv contains the raw data, while the file Debernardi et al 2020 documentation.csv contains detailed documentation of what each column represents (as well as the original column names from the paper).

    Prediction task

    The goal in this dataset is predicting diagnosis, and more specifically, distinguishing 3 (pancreatic cancer) from 2 (non-cancerous pancreas condition) and 1 (healthy). The dataset includes information on stage of pancreatic cancer, and diagnosis for non-cancerous patients, but remember: these won't be available to a predictive model. The goal, after all, is to predict the presence of disease before it's diagnosed, not after!

    Acknowledgements

    I would like to thank the authors of this paper for graciously sharing their raw data with the research community.
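The prediction task described above can be sketched as a small preprocessing step. This is only an illustration: the column names (`diagnosis`, `age`, `sex`, `creatinine`, `LYVE1`, `REG1B`, `TFF1`, `stage`) and the sample values are assumptions made for the example, not confirmed by this listing, so check them against the documentation file before relying on them.

```python
import csv
import io

# Hypothetical sample rows. Column names and values are assumptions
# for illustration only; they are not taken from the real data file.
SAMPLE_CSV = """diagnosis,age,sex,creatinine,LYVE1,REG1B,TFF1,stage
1,33,F,1.83,0.89,52.9,654.3,
2,58,M,0.97,2.04,94.5,209.5,
3,71,M,0.78,7.02,208.5,411.9,III
"""

# Features a model may see. Stage and the detailed benign diagnosis are
# deliberately left out, since they are only known after diagnosis.
FEATURES = ["age", "sex", "creatinine", "LYVE1", "REG1B", "TFF1"]


def load_rows(text):
    """Parse CSV text into a list of dictionaries, one per patient."""
    return list(csv.DictReader(io.StringIO(text)))


def make_target(row):
    """Return 1 for pancreatic cancer (code 3), 0 for codes 1 and 2."""
    return 1 if int(row["diagnosis"]) == 3 else 0


def make_features(row):
    """Keep only the pre-diagnosis features listed in FEATURES."""
    return {name: row[name] for name in FEATURES}


rows = load_rows(SAMPLE_CSV)
X = [make_features(r) for r in rows]
y = [make_target(r) for r in rows]
print(y)
```

Dropping the stage column before training is the point of the sketch: keeping it would leak the answer into the model.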

  17. IDC Breast Cancer Dataset Descriptions.

    • plos.figshare.com
    xls
    Updated Sep 3, 2025
    Mudhafar Jalil Jassim Ghrabat; Arkan A. Ghaib; Auhood Al-Hossenat; Zaid Ameen Abduljabbar; Vincent Omollo Nyangaresi; Junchao Ma; Abdulla J. Y. Aldarwish; Iman Qays Abduljaleel; Dhafer G. Honi; Husam A. Neamah (2025). IDC Breast Cancer Dataset Descriptions. [Dataset]. http://doi.org/10.1371/journal.pone.0329078.t005
    Explore at:
    Available download formats: xls
    Dataset updated
    Sep 3, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Mudhafar Jalil Jassim Ghrabat; Arkan A. Ghaib; Auhood Al-Hossenat; Zaid Ameen Abduljabbar; Vincent Omollo Nyangaresi; Junchao Ma; Abdulla J. Y. Aldarwish; Iman Qays Abduljaleel; Dhafer G. Honi; Husam A. Neamah
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Breast cancer is highlighted in recent research as one of the most prevalent types of cancer. Timely identification is essential for enhancing patient results and decreasing fatality rates. Utilizing computer-assisted detection and diagnosis early on may greatly improve the chances of recovery by accurately predicting outcomes and developing suitable treatment plans. Grading breast cancer properly, especially evaluating nuclear atypia, is difficult owing to faults and inconsistencies in slide preparation and the intricate nature of tissue patterns. This work explores the capability of deep learning to extract characteristics from histopathology photos of breast cancer. The research introduces a new method called SMOTE-based Convolutional Neural Network (CNN) technology to detect areas impacted by Invasive Ductal Carcinoma (IDC) in whole slide pictures. The trials used a dataset of 162 individuals with IDC, split into training (113 images) and testing (49 images) groups. Every model was subjected to individual testing. The SMO_CNN model we developed demonstrated exceptional testing and training accuracies of 98.95% and 99.20% respectively, surpassing CNN, VGG19, and ResNet50 models. The results highlight the effectiveness of the created model in properly detecting IDC-affected tissue areas, showing great promise for improving breast cancer diagnosis and treatment planning.

  18. One year survival from all cancers - ICP Outcomes Framework - Birmingham and...

    • cityobservatory.birmingham.gov.uk
    csv, excel, geojson +1
    Updated Sep 10, 2025
    (2025). One year survival from all cancers - ICP Outcomes Framework - Birmingham and Solihull [Dataset]. https://cityobservatory.birmingham.gov.uk/explore/dataset/one-year-survival-from-all-cancers-icp-outcomes-framework-birmingham-and-solihull/
    Explore at:
    Available download formats: excel, csv, geojson, json
    Dataset updated
    Sep 10, 2025
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Area covered
    Solihull
    Description

    This dataset provides insights into one-year survival rates from all cancers, serving as a key indicator of early cancer outcomes. It measures the proportion of individuals diagnosed with an invasive cancer who survive for at least one year following their diagnosis. The dataset includes all invasive tumours classified under ICD-10 codes C00 to C97, excluding non-melanoma skin cancer (C44). It supports analysis across different population groups and geographies, including ethnicity, deprivation levels, and the Birmingham and Solihull (BSol) area.

    Rationale

    Improving one-year survival rates is a critical goal in cancer care, as it reflects the effectiveness of early diagnosis and initial treatment. This indicator helps monitor progress in reducing early mortality from cancer and supports targeted interventions to improve outcomes.

    Numerator

    The numerator includes individuals who were diagnosed with a specific type of cancer and died from the same type of cancer within one year of diagnosis. Only invasive cancers are included, as defined by ICD-10 codes C00 to C97, excluding non-melanoma skin cancer (C44). Data is sourced from the National Cancer Registration and Analysis Service (NCRAS).

    Denominator

    The denominator comprises all individuals diagnosed with an invasive cancer (ICD-10 codes C00 to C97, excluding C44) within a five-year period. This data is also sourced from the National Cancer Registration and Analysis Service (NCRAS).
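A minimal sketch of how the indicator could be computed from the two counts described above. The numerator text describes deaths within one year, while the indicator itself is described as a survival proportion, so the sketch assumes survival = 1 - deaths / diagnosed. That reading is an assumption and not a confirmed methodology, and the function name and the counts are hypothetical.

```python
def one_year_survival(deaths_within_one_year, diagnosed):
    """Return the proportion surviving at least one year.

    Assumes the numerator counts deaths within one year of diagnosis
    and the denominator counts everyone diagnosed in the period.
    """
    if diagnosed <= 0:
        raise ValueError("diagnosed must be positive")
    if not 0 <= deaths_within_one_year <= diagnosed:
        raise ValueError("deaths must lie between 0 and diagnosed")
    return 1.0 - deaths_within_one_year / diagnosed


# Hypothetical counts, chosen only to illustrate the arithmetic.
rate = one_year_survival(250, 1000)
print(f"{rate:.1%}")
```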

    Caveats

    This dataset uses a simplified methodology that differs from the national calculation of one-year cancer survival. As a result, the figures presented here may not align with nationally published statistics. However, this approach enables the provision of survival data disaggregated by ethnicity, deprivation, and local geographies such as BSol, which is not always possible with national data.

    External references

    For more information, visit the National Cancer Registration and Analysis Service (NCRAS).

    Click here to explore more from the Birmingham and Solihull Integrated Care Partnerships Outcome Framework.

  19. DataSheet_1_Triple-negative breast cancer survival prediction:...

    • frontiersin.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 10, 2024
    Yu Qiu; Yan Chen; Haoyang Shen; Shuixin Yan; Jiadi Li; Weizhu Wu (2024). DataSheet_1_Triple-negative breast cancer survival prediction: population-based research using the SEER database and an external validation cohort.xls [Dataset]. http://doi.org/10.3389/fonc.2024.1388869.s001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 10, 2024
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Yu Qiu; Yan Chen; Haoyang Shen; Shuixin Yan; Jiadi Li; Weizhu Wu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction

    Triple-negative breast cancer (TNBC) is linked to a poorer outlook, heightened aggressiveness relative to other breast cancer variants, and limited treatment choices. The absence of conventional treatment methods makes TNBC patients susceptible to metastasis. The objective of this research was to assess the clinical and pathological traits of TNBC patients, predict the influence of risk elements on their outlook, and create a prediction model to assist doctors in treating TNBC patients and enhancing their prognosis.

    Methods

    We included 23,394 individuals with complete baseline clinical data and survival information who were diagnosed with primary TNBC between 2010 and 2015 based on the SEER database. External validation utilised a group from The Affiliated Lihuili Hospital of Ningbo University. Independent risk factors linked to TNBC prognosis were identified through univariate, multivariate, and least absolute shrinkage and selection operator regression methods. These characteristics were chosen as parameters to develop 3- and 5-year overall survival (OS) and breast cancer-specific survival (BCSS) nomogram models. Model accuracy was assessed using calibration curves, consistency indices (C-indices), receiver operating characteristic curves (ROCs), and decision curve analyses (DCAs). Finally, TNBC patients were divided into groups of high, medium, and low risk, employing the nomogram model for conducting a Kaplan-Meier survival analysis.

    Results

    In the training cohort, variables such as age at diagnosis, marital status, grade, T stage, N stage, M stage, surgery, radiation, and chemotherapy were linked to OS and BCSS. For the nomogram, the C-indices stood at 0.762, 0.747, and 0.764 in forecasting OS across the training, internal validation, and external validation groups, respectively. Additionally, the C-index values for the training, internal validation, and external validation groups in BCSS prediction stood at 0.793, 0.755, and 0.811, in that order. The findings revealed that the calibration of our nomogram model was successful, and the time-variant ROC curves highlighted its effectiveness in clinical settings. Ultimately, the clinical DCA showcased the prospective clinical advantages of the suggested model. Furthermore, the online version was simple to use, and nomogram classification may enhance the differentiation of TNBC prognosis and distinguish risk groups more accurately.

    Conclusion

    These nomograms are precise tools for assessing risk in patients with TNBC and forecasting survival. They can help doctors identify prognostic markers and create more effective treatment plans for patients with TNBC, providing more accurate assessments of their 3- and 5-year OS and BCSS.
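For readers unfamiliar with the C-index reported above, here is a minimal sketch of Harrell's concordance index for right-censored data. It is a generic textbook formulation written for illustration, not the authors' code, and the toy follow-up times, event flags, and risk scores are made up.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    A pair (i, j) is comparable when subject i had the event and its
    time is strictly shorter than subject j's time. The pair is
    concordant when subject i also has the higher predicted risk.
    Ties in predicted risk count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: follow-up time, event flag (1 = died, 0 = censored), risk score.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.7, 0.8, 0.1]
print(concordance_index(times, events, risks))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives some context for reported values in the 0.75 to 0.81 range.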

  20. Data set - What Defines Quality of Life for Older Patients Diagnosed with...

    • nde-dev.biothings.io
    • data.niaid.nih.gov
    Updated Oct 5, 2022
    Seghers, PAL (2022). Data set - What Defines Quality of Life for Older Patients Diagnosed with Cancer? A Qualitative Study [Dataset]. https://nde-dev.biothings.io/resources?id=zenodo_7062210
    Explore at:
    Dataset updated
    Oct 5, 2022
    Dataset provided by
    Jolina A. Kregting
    Siri Rostoft
    Seghers, PAL
    Shane O'Hanlon
    Marije E. Hamaker
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data set from: What Defines Quality of Life for Older Patients Diagnosed with Cancer? A Qualitative Study

    Abstract of the study: The treatment of cancer can have a significant impact on quality of life in older patients and this needs to be taken into account in decision making. However, quality of life can consist of many different components with varying importance between individuals. We set out to assess how older patients with cancer define quality of life and the components that are most significant to them. This was a single-centre, qualitative interview study. Patients aged 70 years or older with cancer were asked to answer open-ended questions: What makes life worthwhile? What does quality of life mean to you? What could affect your quality of life? Subsequently, they were asked to choose the five most important determinants of quality of life from a predefined list: cognition, contact with family or with community, independence, staying in your own home, helping others, having enough energy, emotional well-being, life satisfaction, religion and leisure activities. Afterwards, answers to the open-ended questions were independently categorized by two authors. The proportion of patients mentioning each category in the open-ended questions were compared to the predefined questions. Overall, 63 patients (median age 76 years) were included. When asked, “What makes life worthwhile?”, patients identified social functioning (86%) most frequently. Moreover, to define quality of life, patients most frequently mentioned categories in the domains of physical functioning (70%) and physical health (48%). Maintaining cognition was mentioned in 17% of the open-ended questions and it was the most commonly chosen option from the list of determinants (72% of respondents). In conclusion, physical functioning, social functioning, physical health and cognition are important components in quality of life. When discussing treatment options, the impact of treatment on these aspects should be taken into consideration.

    Reference of research paper: Seghers PAL, Kregting JA, van Huis-Tanja LH, Soubeyran P, O'Hanlon S, Rostoft S, Hamaker ME, Portielje JEA. What Defines Quality of Life for Older Patients Diagnosed with Cancer? A Qualitative Study. Cancers. 2022; 14(5):1123. https://doi.org/10.3390/cancers14051123

    Content of the data set: The first Tab describes what questions were asked, the second tab shows all individual anonymised answers to the open questions, the fourth shows the definitions that were used to classify all answers. Q1-Q4 show how the answers were categorised.

MasterDataSan (2024). Lung Cancer Mortality Datasets v2 [Dataset]. https://www.kaggle.com/datasets/masterdatasan/lung-cancer-mortality-datasets-v2

Lung Cancer Mortality Datasets v2

Dataset of lung cancer with time observations during the treatment period

Explore at:
Available download formats: zip (81127029 bytes)
Dataset updated
Jun 1, 2024
Authors
MasterDataSan
Description

This dataset contains data about lung cancer Mortality. This database is a comprehensive collection of patient information, specifically focused on individuals diagnosed with cancer. It is designed to facilitate the analysis of various factors that may influence cancer prognosis and treatment outcomes. The database includes a range of demographic, medical, and treatment-related variables, capturing essential details about each patient's condition and history.

Key components of the database include:

Demographic Information: Basic details about the patients such as age, gender, and country of residence. This helps in understanding the distribution of cancer cases across different populations and regions.

Medical History: Information about each patient’s medical background, including family history of cancer, smoking status, Body Mass Index (BMI), cholesterol levels, and the presence of other health conditions such as hypertension, asthma, cirrhosis, and other cancers. This section is crucial for identifying potential risk factors and comorbidities.

Cancer Diagnosis: Detailed data about the cancer diagnosis itself, including the date of diagnosis and the stage of cancer at the time of diagnosis. This helps in tracking the progression and severity of the disease.

Treatment Details: Information regarding the type of treatment each patient received, the end date of the treatment, and the outcome (whether the patient survived or not). This is essential for evaluating the effectiveness of different treatment approaches.

The structure of the database allows for in-depth analysis and research, making it possible to identify patterns, correlations, and potential causal relationships between various factors and cancer outcomes. It is a valuable resource for medical researchers, epidemiologists, and healthcare providers aiming to improve cancer treatment and patient care.

id: A unique identifier for each patient in the dataset.
age: The age of the patient at the time of diagnosis.
gender: The gender of the patient (e.g., male, female).
country: The country or region where the patient resides.
diagnosis_date: The date on which the patient was diagnosed with lung cancer.
cancer_stage: The stage of lung cancer at the time of diagnosis (e.g., Stage I, Stage II, Stage III, Stage IV).
family_history: Indicates whether there is a family history of cancer (e.g., yes, no).
smoking_status: The smoking status of the patient (e.g., current smoker, former smoker, never smoked, passive smoker).
bmi: The Body Mass Index of the patient at the time of diagnosis.
cholesterol_level: The cholesterol level of the patient (value).
hypertension: Indicates whether the patient has hypertension (high blood pressure) (e.g., yes, no).
asthma: Indicates whether the patient has asthma (e.g., yes, no).
cirrhosis: Indicates whether the patient has cirrhosis of the liver (e.g., yes, no).
other_cancer: Indicates whether the patient has had any other type of cancer in addition to the primary diagnosis (e.g., yes, no).
treatment_type: The type of treatment the patient received (e.g., surgery, chemotherapy, radiation, combined).
end_treatment_date: The date on which the patient completed their cancer treatment or died.
survived: Indicates whether the patient survived (e.g., yes, no).
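As a rough sketch of how the date and outcome columns above could be turned into a time-to-event record, the example below derives the observation time in days and an event flag. It assumes ISO-formatted dates (YYYY-MM-DD) and a yes/no or 1/0 encoding for `survived`; neither is confirmed by this description, and the helper names and the sample row are hypothetical.

```python
from datetime import date


def follow_up_days(row):
    """Days between diagnosis_date and end_treatment_date.

    Assumes both fields are ISO strings (YYYY-MM-DD); adjust the
    parsing if the real file uses another format.
    """
    start = date.fromisoformat(row["diagnosis_date"])
    end = date.fromisoformat(row["end_treatment_date"])
    if end < start:
        raise ValueError("end_treatment_date precedes diagnosis_date")
    return (end - start).days


def survival_record(row):
    """Return (duration_in_days, event), where event is 1 for death.

    Assumes 'survived' is encoded as yes/no or 1/0 (an assumption).
    """
    survived = row["survived"].strip().lower() in {"yes", "1"}
    return follow_up_days(row), 0 if survived else 1


# Hypothetical row for illustration only.
example = {
    "diagnosis_date": "2020-01-01",
    "end_treatment_date": "2020-01-31",
    "survived": "no",
}
print(survival_record(example))
```

Because `end_treatment_date` marks either the end of treatment or death, the duration is an observation window, so surviving patients are best treated as censored at that date.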

This dataset contains artificially generated data with as close a representation of reality as possible. This data is free to use without any licence required.

Good luck Gakusei!
