100+ datasets found
  1. Data from: County-level cumulative environmental quality associated with...

    • s.cnmilf.com
    • catalog.data.gov
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). County-level cumulative environmental quality associated with cancer incidence. [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/county-level-cumulative-environmental-quality-associated-with-cancer-incidence
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    Population-based cancer incidence rates were abstracted from the National Cancer Institute's State Cancer Profiles for all counties in the United States for which data were available. This is a national county-level database of cancer data collected by state public health surveillance systems. All-site cancer is defined as any type of cancer captured in the state registry data, excluding non-melanoma skin cancer. All-site age-adjusted cancer incidence rates were abstracted separately for males and females. County-level annual age-adjusted all-site cancer incidence rates for 2006–2010 were available for 2687 of 3142 (85.5%) counties in the U.S. Counties with fewer than 16 reported cases in a specific area-sex-race category are suppressed to ensure confidentiality and stability of rate estimates; this accounted for 14 counties in our study. Two states, Kansas and Virginia, do not provide data because state legislation and regulations prohibit the release of county-level data to outside entities. Data from Michigan do not include cases diagnosed in other states because data exchange agreements prohibit the release of data to third parties. Finally, state data are not available for three states: Minnesota, Ohio, and Washington. The age-adjusted average annual incidence rate for all counties was 453.7 per 100,000 persons. We selected 2006–2010 because it follows the period covered by the EQI exposure data, which was constructed to represent 2000–2005. We also gathered data for the three most common cancers in males (lung, prostate, and colorectal) and females (lung, breast, and colorectal). The EQI was used as an exposure metric, serving as an indicator of cumulative county-level environmental exposures for 2000 to 2005. The datasets used in the EQI are fully described in Lobdell et al., and the methods used for index construction are described by Messer et al.

    The EQI was developed for 2000–2005 because that was the most recent period for which data were available when index construction began. The EQI includes variables representing each of the environmental domains. The air domain includes 87 variables representing criteria and hazardous air pollutants. The water domain includes 80 variables representing overall water quality, general water contamination, recreational water quality, drinking water quality, atmospheric deposition, drought, and chemical contamination. The land domain includes 26 variables representing agriculture, pesticides, contaminants, facilities, and radon. The built domain includes 14 variables representing roads, highway/road safety, public transit behavior, business environment, and subsidized housing environment. The sociodemographic domain includes 12 variables representing socioeconomics and crime.

    This dataset is not publicly accessible because EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: Human health data are not publicly available. EQI data are available at https://edg.epa.gov/data/Public/ORD/NHEERL/EQI. Format: Data are stored as CSV files.

    This dataset is associated with the following publication: Jagai, J., L. Messer, K. Rappazzo, C. Gray, S. Grabich, and D. Lobdell. County-level environmental quality and associations with cancer incidence. Cancer. John Wiley & Sons Incorporated, New York, NY, USA, 123(15): 2901-2908, (2017).
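
    The suppression rule and the coverage figure in the description reduce to simple arithmetic. The Python sketch below applies the stated threshold (counts below 16 are withheld) and recomputes the 2687-of-3142 coverage percentage; the function names are hypothetical, and the sketch does not reproduce the State Cancer Profiles processing itself.

```python
from typing import Optional

# Counts below this value are suppressed, per the dataset description.
SUPPRESSION_THRESHOLD = 16


def suppress(count: int) -> Optional[int]:
    """Return None for a suppressed count, otherwise the count itself."""
    return None if count < SUPPRESSION_THRESHOLD else count


def coverage_pct(available: int, total: int) -> float:
    """Share of counties with a usable rate, as a percentage to one decimal."""
    return round(100 * available / total, 1)


print(coverage_pct(2687, 3142))                  # 85.5
print([suppress(c) for c in (3, 15, 16, 120)])   # [None, None, 16, 120]
```

    Suppressed cells are kept as None rather than 0 so that later averages do not silently treat withheld counts as zero cases.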

  2. Number and rates of new cases of primary cancer, by cancer type, age group...

    • www150.statcan.gc.ca
    • datasets.ai
    • +3 more
    Updated May 19, 2021
    Cite
    Government of Canada, Statistics Canada (2021). Number and rates of new cases of primary cancer, by cancer type, age group and sex [Dataset]. http://doi.org/10.25318/1310011101-eng
    Dataset updated
    May 19, 2021
    Dataset provided by
    Statistics Canada (https://statcan.gc.ca/en)
    Area covered
    Canada
    Description

    Number and rate of new cancer cases diagnosed annually from 1992 to the most recent diagnosis year available. Included are all invasive cancers and in situ bladder cancer with cases defined using the Surveillance, Epidemiology and End Results (SEER) Groups for Primary Site based on the World Health Organization International Classification of Diseases for Oncology, Third Edition (ICD-O-3). Random rounding of case counts to the nearest multiple of 5 is used to prevent inappropriate disclosure of health-related information.
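
    The description does not say exactly how the random rounding is drawn. As an assumption, the sketch below uses a common unbiased scheme: a count that is not already a multiple of 5 is moved to one of the two neighbouring multiples, rounding up with probability equal to the remainder divided by 5, so the expected rounded value equals the true count. The function name `random_round` is hypothetical.

```python
import random
from typing import Optional


def random_round(count: int, base: int = 5, rng: Optional[random.Random] = None) -> int:
    """Randomly round a non-negative count to a multiple of `base` (unbiased)."""
    rng = rng or random.Random()
    remainder = count % base
    if remainder == 0:
        return count  # already a multiple of base; left unchanged
    lower = count - remainder
    # Round up with probability remainder/base, so E[result] == count.
    return lower + base if rng.random() < remainder / base else lower


rng = random.Random(2021)
print([random_round(c, rng=rng) for c in (0, 3, 7, 10, 12)])
```

    Because each cell is rounded independently, totals computed from rounded counts need not equal the rounded total.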

  3. [MI] Rapid Cancer Registration Data

    • digital.nhs.uk
    Updated Oct 2, 2025
    Cite
    (2025). [MI] Rapid Cancer Registration Data [Dataset]. https://digital.nhs.uk/data-and-information/publications/statistical/mi-rapid-cancer-registration-data
    Dataset updated
    Oct 2, 2025
    License

    https://digital.nhs.uk/about-nhs-digital/terms-and-conditions

    Description

    Rapid Cancer Registration Data (RCRD) provides a quick, indicative source of cancer data. It is provided to support the planning and provision of cancer services. The data is based on rapid processing of cancer registration data sources, in particular Cancer Outcomes and Services Dataset (COSD) information. In comparison, National Cancer Registration Data (NCRD) relies on additional data sources, enhanced follow-up with trusts, and expert processing by cancer registration officers. RCRD may be useful for service improvement projects, including healthcare planning and prioritisation. However, it is poorly suited to epidemiological research because of limitations in data quality and completeness.

  4. Breast Cancer Dataset - Dataset - CKAN

    • data.poltekkes-smg.ac.id
    Updated Oct 7, 2024
    Cite
    (2024). Breast Cancer Dataset - Dataset - CKAN [Dataset]. https://data.poltekkes-smg.ac.id/dataset/breast-cancer-dataset
    Dataset updated
    Oct 7, 2024
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Description: Breast cancer is the most common cancer among women in the world. It accounts for 25% of all cancer cases and affected over 2.1 million people in 2015 alone. It starts when cells in the breast begin to grow out of control. These cells usually form tumors that can be seen via X-ray or felt as lumps in the breast area. A key challenge in its detection is classifying tumors as malignant (cancerous) or benign (non-cancerous). We ask you to complete the analysis of classifying these tumors using machine learning (with SVMs) and the Breast Cancer Wisconsin (Diagnostic) Dataset. Acknowledgements: This dataset was sourced from Kaggle. Objective: Understand the dataset and clean it up (if required). Build classification models to predict whether the cancer type is malignant or benign. Also fine-tune the hyperparameters and compare the evaluation metrics of various classification algorithms.
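
    A minimal sketch of the stated task (SVM classification with hyperparameter tuning), assuming scikit-learn is installed. It uses scikit-learn's bundled copy of the Breast Cancer Wisconsin (Diagnostic) data (569 samples, 30 features) rather than the file on this CKAN page, whose column layout is not described here; the parameter grid is illustrative, not a recommended setting.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# In this copy, target 0 = malignant and 1 = benign.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scaling inside the pipeline means each CV fold fits its own scaler.
pipeline = make_pipeline(StandardScaler(), SVC())
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": ["scale", 0.01, 0.001],
}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

y_pred = search.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print(search.best_params_)
print(f"Test accuracy: {test_accuracy:.3f}")
print(classification_report(y_test, y_pred, target_names=["malignant", "benign"]))
```

    Swapping `SVC()` for another estimator (with its own grid) gives the comparison of classification algorithms that the objective asks for.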

  5. Cancer Mortality & Incidence Rates: (Country LVL)

    • kaggle.com
    Updated Dec 3, 2022
    Cite
    The Devastator (2022). Cancer Mortality & Incidence Rates: (Country LVL) [Dataset]. https://www.kaggle.com/datasets/thedevastator/us-county-level-cancer-mortality-and-incidence-r/versions/2
    Dataset updated
    Dec 3, 2022
    Dataset provided by
    Kaggle
    Authors
    The Devastator
    Description

    Cancer Mortality & Incidence Rates: (Country LVL)

    Investigating Cancer Trends over time

    By Data Exercises [source]

    About this dataset

    This dataset is a collection of county-level cancer mortality and incidence rates in the United States between 2000 and 2014. It provides a detailed view of cancer cases, deaths, and trends at the local level. The columns include County, FIPS, age-adjusted death rate, average death rate per year, recent trend (2) in death rates, recent 5-year trend (2) in death rates, and average annual count for each county. The dataset can be used to examine the patterns and effects of cancer on communities and to inform policy decisions on mitigating risk factors or increasing preventive measures such as screenings. With records from across the United States spanning 15 years, it can help inform care and policy decisions within your own community.

    How to use the dataset

    This dataset provides US county-level cancer mortality and incidence rates from 2000 to 2014. It includes the mortality and incidence rate for each county, as well as whether the county met the objective of 45.5 deaths per 100,000 people. It also provides information on recent trends in death rates and average annual counts of cases over the five-year period studied.
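
    As an illustration of the objective flag, the sketch below reads county rows and marks whether each age-adjusted death rate is at or below 45.5 per 100,000. The column names and sample rows are hypothetical (the column table for death .csv is truncated on this page), and reading "met the objective" as "rate ≤ 45.5" is an assumption. Non-numeric entries such as "*" are reported as unknown rather than as failures.

```python
import csv
import io
from typing import Optional

# Hypothetical sample rows; the real column names in death .csv may differ.
SAMPLE_CSV = """County,FIPS,Age-Adjusted Death Rate,Recent Trend
Alpha County,1001,52.3,falling
Beta County,1003,41.0,stable
Gamma County,1005,*,
"""

OBJECTIVE = 45.5  # deaths per 100,000, from the dataset description


def parse_rate(value: str) -> Optional[float]:
    """Convert a rate cell to float; non-numeric cells (e.g. "*") become None."""
    try:
        return float(value)
    except ValueError:
        return None


def objective_status(rows) -> dict:
    """Map county -> True/False (met objective) or None (rate unavailable)."""
    status = {}
    for row in rows:
        rate = parse_rate(row["Age-Adjusted Death Rate"])
        status[row["County"]] = None if rate is None else rate <= OBJECTIVE
    return status


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(objective_status(rows))
# {'Alpha County': False, 'Beta County': True, 'Gamma County': None}
```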

    This dataset can be useful to researchers studying trends in cancer death rates across counties. Researchers can use it to compare how counties perform in providing cancer treatment and prevention services, and whether preventive measures and healthcare access are reducing cancer mortality rates over time. The data can also help policymakers identify counties that need more targeted prevention efforts or additional resources to improve healthcare access in at-risk communities.

    When using this dataset, pay close attention to qualitative columns such as “Recent Trend” or “Recent 5-Year Trend (2)”, which may reveal long-term changes that are not readily apparent from quantitative variables such as the age-adjusted death rate or average deaths per year, which summarize shorter periods of time. Additionally, when comparing counties, note any differences in FIPS codes, which may indicate that data were collected by a different source using a different methodology than in other areas studied.
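
    One common, concrete source of FIPS mismatches is leading zeros lost when five-digit county codes are stored as numbers (01001 becomes 1001). The sketch below normalises codes back to five digits and groups trend labels by the two-digit state prefix; the helper names and the input shape (pairs of FIPS code and trend label) are hypothetical.

```python
from collections import Counter, defaultdict


def normalize_fips(raw) -> str:
    """Return a five-digit county FIPS string, restoring lost leading zeros."""
    text = str(raw).strip()
    if text.endswith(".0"):  # codes that were read in as floats, e.g. 1001.0
        text = text[:-2]
    return text.zfill(5)


def trends_by_state(records):
    """Count trend labels per state, keyed by the two-digit FIPS prefix.

    `records` is an iterable of (fips, trend_label) pairs.
    """
    counts = defaultdict(Counter)
    for fips, trend in records:
        counts[normalize_fips(fips)[:2]][trend] += 1
    return dict(counts)


print(normalize_fips(1001), normalize_fips("6037"), normalize_fips(6037.0))
# 01001 06037 06037
print(trends_by_state([(1001, "falling"), ("01003", "stable"), (6037, "falling")]))
```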

    Research Ideas

    • Identify statistically significant patterns in cancer mortality and incidence rates to design treatment regimens or preventive measures targeting specific areas.
    • Help policymakers target areas with elevated cancer mortality and incidence rates so that financial resources can be allocated more efficiently.
    • Investigate which factors (such as pollution levels, access to medical care, or genetic makeup) may influence cancer mortality and incidence rates in different US counties.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: Dataset copyright by authors. You are free to:
    • Share: copy and redistribute the material in any medium or format for any purpose, even commercially.
    • Adapt: remix, transform, and build upon the material for any purpose, even commercially.
    You must:
    • Give appropriate credit, provide a link to the license, and indicate if changes were made.
    • ShareAlike: distribute your contributions under the same license as the original.
    • Keep intact all notices that refer to this license, including copyright notices.

    Columns

    File: death .csv | Column name | Description | |:-------------------------------------------|:-------------------------------------------------------------------...

  6. lungs_cancer

    • huggingface.co
    Updated Dec 23, 2024
    Cite
    virtualcollaborationhub (2024). lungs_cancer [Dataset]. https://huggingface.co/datasets/virtual10/lungs_cancer
    Dataset updated
    Dec 23, 2024
    Dataset authored and provided by
    virtualcollaborationhub
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for Lung Cancer

      Dataset Summary
    

    An effective cancer prediction system helps people learn their cancer risk at low cost and make appropriate decisions based on their risk status. The data were collected from an online lung cancer prediction system website.

      Supported Tasks and Leaderboards
    

    [More Information Needed]

      Languages
    

    [More Information Needed]

      Dataset Structure… See the full description on the dataset page: https://huggingface.co/datasets/virtual10/lungs_cancer.
    
  7. Cancer (in persons of all ages): England

    • data.catchmentbasedapproach.org
    • hub.arcgis.com
    Updated Apr 6, 2021
    Cite
    The Rivers Trust (2021). Cancer (in persons of all ages): England [Dataset]. https://data.catchmentbasedapproach.org/datasets/cancer-in-persons-of-all-ages-england
    Dataset updated
    Apr 6, 2021
    Dataset authored and provided by
    The Rivers Trust
    Description

    SUMMARY
    This analysis, designed and executed by Ribble Rivers Trust, identifies areas across England with the greatest levels of cancer (in persons of all ages). Please read the below information to gain a full understanding of what the data shows and how it should be interpreted.

    ANALYSIS METHODOLOGY
    The analysis was carried out using Quality and Outcomes Framework (QOF) data, derived from NHS Digital, relating to cancer (in persons of all ages).

    This information was recorded at the GP practice level. However, GP catchment areas are not mutually exclusive: they overlap, with some areas covered by 30+ GP practices. Therefore, to increase the clarity and usability of the data, the GP-level statistics were converted into statistics based on Middle Layer Super Output Area (MSOA) census boundaries.

    The percentage of each MSOA’s population (all ages) with cancer was estimated. This was achieved by calculating a weighted average based on:
    • The percentage of the MSOA area that was covered by each GP practice’s catchment area
    • Of the GPs that covered part of that MSOA: the percentage of registered patients that have that illness

    The estimated percentage of each MSOA’s population with cancer was then combined with Office for National Statistics Mid-Year Population Estimates (2019) data for MSOAs, to estimate the number of people in each MSOA with cancer, within the relevant age range.

    Each MSOA was assigned a relative score between 1 and 0 (1 = worst, 0 = best) based on:
    A) the PERCENTAGE of the population within that MSOA who are estimated to have cancer
    B) the NUMBER of people within that MSOA who are estimated to have cancer

    An average of scores A & B was taken, and converted to a relative score between 1 and 0 (1 = worst, 0 = best). The closer to 1 the score, the greater both the number and percentage of the population in the MSOA that are estimated to have cancer, compared to other MSOAs. In other words, those are areas where it’s estimated a large number of people suffer from cancer, and where those people make up a large percentage of the population, indicating there is a real issue with cancer within the population and the investment of resources to address that issue could have the greatest benefits.

    LIMITATIONS
    1. GP data for the financial year 1st April 2018 – 31st March 2019 was used in preference to data for the financial year 1st April 2019 – 31st March 2020, as the onset of the COVID-19 pandemic during the latter year could have affected the reporting of medical statistics by GPs. However, for 53 GPs (out of 7670) that did not submit data in 2018/19, data from 2019/20 was used instead. Note also that some GPs (997 out of 7670) did not submit data in either year. This dataset should be viewed in conjunction with the ‘Health and wellbeing statistics (GP-level, England): Missing data and potential outliers’ dataset, to determine areas where data from 2019/20 was used, where one or more GPs did not submit data in either year, or where there were large discrepancies between the 2018/19 and 2019/20 data (differences in statistics that were > mean +/- 1 St.Dev.), which suggests erroneous data in one of those years (it was not feasible for this study to investigate this further), and thus where data should be interpreted with caution. Note also that there are some rural areas (with little or no population) that do not officially fall into any GP catchment area (although this will not affect the results of this analysis if there are no people living in those areas).
    2. Although all of the obesity/inactivity-related illnesses listed can be caused or exacerbated by inactivity and obesity, it was not possible to distinguish from the data the cause of the illnesses in patients: obesity and inactivity are highly unlikely to be the cause of all cases of each illness. By combining the data with data relating to levels of obesity and inactivity in adults and children (see the ‘Levels of obesity, inactivity and associated illnesses: Summary (England)’ dataset), we can identify where obesity/inactivity could be a contributing factor, and where interventions to reduce obesity and increase activity could be most beneficial for the health of the local population.
    3. It was not feasible to incorporate ultra-fine-scale geographic distribution of populations that are registered with each GP practice or who live within each MSOA. Populations might be concentrated in certain areas of a GP practice’s catchment area or MSOA and relatively sparse in other areas. Therefore, the dataset should be used to identify general areas where there are high levels of cancer, rather than interpreting the boundaries between areas as ‘hard’ boundaries that mark definite divisions between areas with differing levels of cancer.

    TO BE VIEWED IN COMBINATION WITH
    This dataset should be viewed alongside the following datasets, which highlight areas of missing data and potential outliers in the data:
    • Health and wellbeing statistics (GP-level, England): Missing data and potential outliers
    • Levels of obesity, inactivity and associated illnesses (England): Missing data

    DOWNLOADING THIS DATA
    To access this data on your desktop GIS, download the ‘Levels of obesity, inactivity and associated illnesses: Summary (England)’ dataset.

    DATA SOURCES
    This dataset was produced using:
    • Quality and Outcomes Framework data: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital.
    • GP Catchment Outlines. Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital. Data was cleaned by Ribble Rivers Trust before use.
    • MSOA boundaries: © Office for National Statistics licensed under the Open Government Licence v3.0. Contains OS data © Crown copyright and database right 2021.
    • Population data: Mid-2019 (June 30) Population Estimates for Middle Layer Super Output Areas in England and Wales. © Office for National Statistics licensed under the Open Government Licence v3.0. © Crown Copyright 2020.

    COPYRIGHT NOTICE
    The reproduction of this data must be accompanied by the following statement:
    © Ribble Rivers Trust 2021. Analysis carried out using data that is: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital; © Office for National Statistics licensed under the Open Government Licence v3.0. Contains OS data © Crown copyright and database right 2021. © Crown Copyright 2020.

    CaBA HEALTH & WELLBEING EVIDENCE BASE
    This dataset forms part of the wider CaBA Health and Wellbeing Evidence Base.
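
    The MSOA scoring steps in the methodology can be sketched in Python. Two details are assumptions, since the methodology does not spell them out: the GP weights are normalised by their sum (catchments overlap, so area fractions for one MSOA can add up to more than 1), and "converted to a relative score between 1 and 0" is implemented as min-max scaling across all MSOAs. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GpCoverage:
    area_fraction: float   # share of the MSOA area inside this GP's catchment
    prevalence_pct: float  # % of this GP's registered patients with cancer


def msoa_prevalence(coverages: List[GpCoverage]) -> float:
    """Area-weighted average prevalence (%) for one MSOA."""
    total_weight = sum(c.area_fraction for c in coverages)
    return sum(c.area_fraction * c.prevalence_pct for c in coverages) / total_weight


def min_max(values: List[float]) -> List[float]:
    """Rescale to 0..1 (0 = lowest, 1 = highest); constant input maps to 0."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]


def relative_scores(percentages: List[float], populations: List[int]) -> List[float]:
    """Combine score A (percentage) and score B (number) into one 0..1 score."""
    counts = [pct / 100 * pop for pct, pop in zip(percentages, populations)]
    score_a = min_max(percentages)
    score_b = min_max(counts)
    return min_max([(a + b) / 2 for a, b in zip(score_a, score_b)])


print(msoa_prevalence([GpCoverage(0.6, 2.0), GpCoverage(0.7, 4.0)]))
print(relative_scores([2.0, 3.5, 3.0], [8000, 6000, 12000]))
```

    Averaging A and B before the final rescale means an MSOA scores near 1 only when it ranks high on both the percentage and the number of people affected, matching the interpretation given in the summary.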

  8. RSNA Mammography Breast Cancer TFRecord Dataset

    • kaggle.com
    Updated Dec 17, 2023
    Cite
    muhammed (2023). RSNA Mammography Breast Cancer TFRecord Dataset [Dataset]. https://www.kaggle.com/datasets/clkmuhammed/rsna-mammography-breast-cancer-tfrecord-dataset
    Dataset updated
    Dec 17, 2023
    Dataset provided by
    Kaggle
    Authors
    muhammed
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Source: RSNA Screening Mammography Breast Cancer Detection

    The competition's large 314 GB+ dataset (54,713 images) was processed into TFRecords for fast data loading during training.

    All images were resized to 768x1280 and saved in 100 TFRecord files, so each file contains roughly 548 images, for a total of 8.6 GB+.

    TFRecords allow large chunks of data containing many samples to be loaded at once, instead of loading every image and label separately.
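
    The dataset author's conversion code is not shown. As a sketch, splitting 54,713 images as evenly as possible across 100 files gives 13 files of 548 images and 87 of 547, consistent with the "roughly 548" figure. The hypothetical helpers below compute those contiguous slices without depending on TensorFlow; writing the records themselves (for example with `tf.io.TFRecordWriter`) is omitted.

```python
from typing import List, Tuple


def shard_sizes(num_items: int, num_shards: int) -> List[int]:
    """Split num_items as evenly as possible across num_shards files."""
    base, extra = divmod(num_items, num_shards)
    # The first `extra` shards each take one additional item.
    return [base + 1 if i < extra else base for i in range(num_shards)]


def shard_ranges(num_items: int, num_shards: int) -> List[Tuple[int, int]]:
    """Half-open (start, end) index ranges, one per shard."""
    ranges, start = [], 0
    for size in shard_sizes(num_items, num_shards):
        ranges.append((start, start + size))
        start += size
    return ranges


sizes = shard_sizes(54_713, 100)
print(min(sizes), max(sizes), sum(sizes))  # 547 548 54713
print(shard_ranges(54_713, 100)[:2])       # [(0, 548), (548, 1096)]
```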

    Dataset Description

    Note: The dataset for this challenge contains radiographic breast images of female subjects. The goal of this competition is to identify cases of breast cancer in mammograms from screening exams. It is important to identify cases of cancer for obvious reasons, but false positives also have downsides for patients. As millions of women get mammograms each year, a useful machine learning tool could help a great many people. This competition uses a hidden test set. When your submitted notebook is scored, the actual test data (including a full-length sample submission) will be made available to your notebook.

    Files

    [train/test]_images/[patient_id]/[image_id].dcm The mammograms, in DICOM format. You can expect roughly 8,000 patients in the hidden test set. There are usually, but not always, 4 images per patient. Note that many of the images use the JPEG 2000 format, which may require special libraries to load.

    sample_submission.csv A valid sample submission. Only the first few rows are available for download.

    [train/test].csv Metadata for each patient and image. Only the first few rows of the test set are available for download.

    • site_id - ID code for the source hospital.
    • patient_id - ID code for the patient.
    • image_id - ID code for the image.
    • laterality - Whether the image is of the left or right breast.
    • view - The orientation of the image. The default for a screening exam is to capture two views per breast.
    • age - The patient's age in years.
    • implant - Whether or not the patient had breast implants. Site 1 only provides breast implant information at the patient level, not at the breast level.
    • density - A rating for how dense the breast tissue is, with A being the least dense and D being the most dense. Extremely dense tissue can make diagnosis more difficult. Only provided for train.
    • machine_id - An ID code for the imaging device.
    • cancer - Whether or not the breast was positive for malignant cancer. The target value. Only provided for train.
    • biopsy - Whether or not a follow-up biopsy was performed on the breast. Only provided for train.
    • invasive - If the breast is positive for cancer, whether or not the cancer proved to be invasive. Only provided for train.
    • BIRADS - 0 if the breast required follow-up, 1 if the breast was rated as negative for cancer, and 2 if the breast was rated as normal. Only provided for train.
    • prediction_id - The ID for the matching submission row. Multiple images will share the same prediction ID. Test only.
    • difficult_negative_case - True if the case was unusually difficult. Only provided for train.
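
    Because several images share one prediction_id, per-image model outputs have to be combined into one score per submission row. A minimal sketch, assuming a prediction_id of the form "{patient_id}_{laterality}" (the field list only says that IDs are shared) and mean aggregation; `max` is a common alternative when a single suspicious view should dominate. The helper names are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, Iterable, List, Tuple


def make_prediction_id(patient_id: int, laterality: str) -> str:
    """Assumed format: '<patient_id>_<laterality>', e.g. '10006_L'."""
    return f"{patient_id}_{laterality}"


def aggregate_by_prediction_id(
    image_preds: Iterable[Tuple[str, float]],
    reducer: Callable[[List[float]], float] = mean,
) -> Dict[str, float]:
    """Collapse per-image probabilities into one value per prediction_id."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for pred_id, prob in image_preds:
        grouped[pred_id].append(prob)
    return {pid: reducer(probs) for pid, probs in grouped.items()}


images = [
    (make_prediction_id(10006, "L"), 0.2),  # hypothetical first view
    (make_prediction_id(10006, "L"), 0.4),  # hypothetical second view
    (make_prediction_id(10006, "R"), 0.9),
]
print(aggregate_by_prediction_id(images))
print(aggregate_by_prediction_id(images, reducer=max))
```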

  9. Cancer survival in England - adults diagnosed

    • ons.gov.uk
    • cy.ons.gov.uk
    xlsx
    Updated Aug 12, 2019
    Cite
    Office for National Statistics (2019). Cancer survival in England - adults diagnosed [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/datasets/cancersurvivalratescancersurvivalinenglandadultsdiagnosed
    Available download formats: xlsx
    Dataset updated
    Aug 12, 2019
    Dataset provided by
    Office for National Statistics (http://www.ons.gov.uk/)
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    One-year and five-year net survival for adults (aged 15 to 99) in England diagnosed with one of 29 common cancers, by age and sex.

  10. One-year survival from all cancers (NHSOF 1.4.i) - Dataset - data.gov.uk

    • ckan.publishing.service.gov.uk
    Updated Aug 4, 2015
    Cite
    ckan.publishing.service.gov.uk (2015). One-year survival from all cancers (NHSOF 1.4.i) - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/one-year-survival-from-all-cancers-nhsof-1-4-i
    Dataset updated
    Aug 4, 2015
    Dataset provided by
    CKAN (https://ckan.org/)
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    A measure of the number of adults diagnosed with any type of cancer in a year who are still alive one year after diagnosis. Purpose: This indicator attempts to capture the success of the NHS in preventing people from dying once they have been diagnosed with any type of cancer. Current version updated: Feb-17. Next version due: Feb-18.

  11. Mortality rate from oral cancer, all ages - WMCA

    • cityobservatory.birmingham.gov.uk
    csv, excel, geojson +1
    Updated Oct 3, 2025
    Cite
    (2025). Mortality rate from oral cancer, all ages - WMCA [Dataset]. https://cityobservatory.birmingham.gov.uk/explore/dataset/mortality-rate-from-oral-cancer-all-ages-wmca/
    Available download formats: csv, geojson, json, excel
    Dataset updated
    Oct 3, 2025
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    Age-standardised rate of mortality from oral cancer (ICD-10 codes C00-C14) in persons of all ages and sexes per 100,000 population.

    Rationale
    Over the last decade in the UK (between 2003-2005 and 2012-2014), oral cancer mortality rates have increased by 20% for males and 19% for females (1). Five-year survival rates are 56%. Most oral cancers are triggered by tobacco and alcohol, which together account for 75% of cases (2). Cigarette smoking is associated with an increased risk of the more common forms of oral cancer. The risk among cigarette smokers is estimated to be 10 times that for non-smokers. More intense use of tobacco increases the risk, while ceasing to smoke for 10 years or more reduces it to almost the same as that of non-smokers (3). Oral cancer mortality rates can be used in conjunction with registration data to inform service planning as well as comparing survival rates across areas of England to assess the impact of public health prevention policies such as smoking cessation.

    References
    (1) Cancer Research Campaign. Cancer Statistics: Oral – UK. London: CRC, 2000.
    (2) Blot WJ, McLaughlin JK, Winn DM et al. Smoking and drinking in relation to oral and pharyngeal cancer. Cancer Res 1988; 48: 3282-7.
    (3) La Vecchia C, Tavani A, Franceschi S et al. Epidemiology and prevention of oral cancer. Oral Oncology 1997; 33: 302-12.

    Definition of numerator
    All cancer mortality for lip, oral cavity and pharynx (ICD-10 C00-C14) in the respective calendar years aggregated into quinary age bands (0-4, 5-9, …, 85-89, 90+). This does not include secondary cancers or recurrences. Data are reported according to the calendar year in which the cancer was diagnosed.

    Counts of deaths for years up to and including 2019 have been adjusted where needed to take account of the MUSE ICD-10 coding change introduced in 2020. Detailed guidance on the MUSE implementation is available at: https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/articles/causeofdeathcodinginmortalitystatisticssoftwarechanges/january2020

    Counts of deaths for years up to and including 2013 have been double adjusted by applying comparability ratios from both the IRIS coding change and the MUSE coding change where needed to take account of both the MUSE ICD-10 coding change and the IRIS ICD-10 coding change introduced in 2014. The detailed guidance on the IRIS implementation is available at: https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/bulletins/impactoftheimplementationofirissoftwareforicd10causeofdeathcodingonmortalitystatisticsenglandandwales/2014-08-08

    Counts of deaths for years up to and including 2010 have been triple adjusted by applying comparability ratios from the 2011 coding change, the IRIS coding change and the MUSE coding change where needed to take account of the MUSE ICD-10 coding change, the IRIS ICD-10 coding change and the ICD-10 coding change introduced in 2011. The detailed guidance on the 2011 implementation is available at https://webarchive.nationalarchives.gov.uk/ukgwa/20160108084125/http://www.ons.gov.uk/ons/guide-method/classifications/international-standard-classifications/icd-10-for-mortality/comparability-ratios/index.html

    Definition of denominator
    Population-years (aggregated populations for the three years) for people of all ages, aggregated into quinary age bands (0-4, 5-9, …, 85-89, 90+).
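
    The numerator and denominator definitions describe a directly age-standardised rate. A sketch under one assumption: the standard population is the 2013 European Standard Population (ESP 2013) collapsed to the 19 quinary bands listed, which is common for English public health indicators but is not named in this description. Deaths and population-years are each summed over the three calendar years before calling the function, whose name is hypothetical.

```python
from typing import Sequence

# ESP 2013 weights for bands 0-4, 5-9, ..., 85-89, 90+ (assumed standard).
ESP_2013 = [5000, 5500, 5500, 5500, 6000, 6000, 6500, 7000, 7000, 7000,
            7000, 6500, 6000, 5500, 5000, 4000, 2500, 1500, 1000]


def age_standardised_rate(
    deaths: Sequence[float],
    population_years: Sequence[float],
    weights: Sequence[float] = ESP_2013,
) -> float:
    """Directly standardised rate per 100,000.

    deaths: deaths per age band, summed over the three calendar years.
    population_years: population per age band, summed over the same years.
    """
    if not (len(deaths) == len(population_years) == len(weights)):
        raise ValueError("need one value per age band in every input")
    # Weight each band's crude rate by its share of the standard population.
    weighted = sum(w * d / p for w, d, p in zip(weights, deaths, population_years))
    return weighted / sum(weights) * 100_000


# If every band has the same crude rate, the standardised rate equals it.
print(round(age_standardised_rate([1] * 19, [10_000] * 19), 6))  # 10.0
```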

  12. Breast Cancer Coimbra

    • kaggle.com
    Updated Jan 7, 2024
    Vivek Agrawal (2024). Breast Cancer Coimbra [Dataset]. https://www.kaggle.com/datasets/atom1991/breast-cancer-coimbra
    Dataset updated: Jan 7, 2024
    Dataset provided by: Kaggle
    Authors: Vivek Agrawal
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset was generated by a deep learning model trained on the "Coimbra Breast Cancer" dataset, and its feature distributions closely resemble those of the original. The original data include clinical observations from 64 patients with breast cancer and 52 healthy controls, comprising nine quantitative predictors and a binary dependent variable indicating the presence or absence of breast cancer.

    Quantitative Attributes:

    Age (years): Represents the age of individuals in the dataset.

    BMI (kg/m²): Body Mass Index, a measure of body fat based on weight and height.

    Glucose (mg/dL): Reflects blood glucose levels, a vital metabolic indicator.

    Insulin (µU/mL): Indicates insulin levels, a hormone associated with glucose regulation.

    HOMA: Homeostatic Model Assessment, a method for assessing insulin resistance and beta-cell function.

    Leptin (ng/mL): Represents leptin levels, a hormone involved in appetite and energy balance regulation.

    Adiponectin (µg/mL): Reflects adiponectin levels, a protein associated with metabolic regulation.

    Resistin (ng/mL): Indicates resistin levels, a protein implicated in insulin resistance.

    MCP-1 (pg/dL): Reflects Monocyte Chemoattractant Protein-1 levels, a cytokine involved in inflammation.
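    HOMA-IR, the insulin-resistance index from the homeostatic model assessment, is commonly approximated as fasting glucose (mmol/L) × fasting insulin (µU/mL) / 22.5. With glucose in mg/dL, as in this dataset, the constant becomes 405. A minimal sketch follows. We have not verified that the HOMA column in this synthetic dataset was computed exactly this way, and the function name is hypothetical.

    ```python
    def homa_ir(glucose_mg_dl, insulin_uU_ml):
        """HOMA-IR from fasting glucose (mg/dL) and fasting insulin (µU/mL).

        Uses the common approximation glucose[mmol/L] * insulin / 22.5; with glucose
        in mg/dL (1 mmol/L is about 18 mg/dL) the constant becomes 22.5 * 18 = 405.
        """
        if glucose_mg_dl <= 0 or insulin_uU_ml < 0:
            raise ValueError("glucose must be positive and insulin non-negative")
        return glucose_mg_dl * insulin_uU_ml / 405.0


    print(homa_ir(90, 4.5))  # 1.0
    ```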

    Labels:

    1: Healthy controls

    2: Patients with breast cancer

    These quantitative attributes, including anthropometric data and parameters gathered from routine blood analysis, serve as the foundation for potential biomarkers of breast cancer. The dataset presents an opportunity for developing accurate prediction models, aiding in the identification and understanding of factors associated with breast cancer.
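    The labels are coded 1 (healthy) and 2 (cancer) rather than 0/1, so many modelling tools need a recode first. Any prediction model should also be compared against the accuracy of always predicting the majority class. The sketch below does both, using hypothetical helper names. The class counts come from the original Coimbra data described above (64 patients, 52 controls), and the synthetic Kaggle version may be balanced differently.

    ```python
    from collections import Counter


    def to_binary(labels):
        """Recode 1 = healthy control, 2 = breast cancer patient into 0/1 (1 = cancer)."""
        mapping = {1: 0, 2: 1}
        return [mapping[y] for y in labels]


    def majority_baseline_accuracy(labels):
        """Accuracy obtained by always predicting the most frequent class."""
        counts = Counter(labels)
        return max(counts.values()) / len(labels)


    original = [2] * 64 + [1] * 52  # class sizes of the original Coimbra data
    print(round(majority_baseline_accuracy(original), 3))  # 0.552
    print(sum(to_binary(original)))  # 64
    ```

    A model that does not clearly beat this baseline of about 55% accuracy has learned little from the biomarkers.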

  13. Cancer incidence, by selected sites of cancer and sex, three-year average,...

    • www150.statcan.gc.ca
    • data.urbandatacentre.ca
    • +2more
    Updated Feb 14, 2018
    Government of Canada, Statistics Canada (2018). Cancer incidence, by selected sites of cancer and sex, three-year average, census metropolitan areas [Dataset]. http://doi.org/10.25318/1310011201-eng
    Dataset updated: Feb 14, 2018
    Dataset provided by: Statistics Canada, https://statcan.gc.ca/en
    Area covered: Canada
    Description

    Age-standardized rate of cancer incidence, by selected sites of cancer and sex, three-year average, census metropolitan areas.

  14. SHIP Cancer Mortality Rate 2009-2021

    • catalog.data.gov
    • opendata.maryland.gov
    • +2more
    Updated Aug 16, 2024
    opendata.maryland.gov (2024). SHIP Cancer Mortality Rate 2009-2021 [Dataset]. https://catalog.data.gov/dataset/ship-cancer-mortality-rate-2009-2017
    Dataset updated: Aug 16, 2024
    Dataset provided by: opendata.maryland.gov
    Description

    This is historical data. The update frequency has been set to "Static Data", and the dataset is kept for its historical value. Updated on 8/14/2024.

    Cancer Mortality Rate: This indicator shows the age-adjusted mortality rate from cancer (per 100,000 population). Maryland’s age-adjusted cancer mortality rate is higher than the US cancer mortality rate. Cancer affects people across all population groups; however, wide racial disparities exist.

  15. Mortality Rates

    • catalog.data.gov
    • data.amerigeoss.org
    • +3more
    Updated Nov 22, 2024
    Lake County Illinois GIS (2024). Mortality Rates [Dataset]. https://catalog.data.gov/dataset/mortality-rates-6fb72
    Dataset updated: Nov 22, 2024
    Dataset provided by: Lake County Illinois GIS
    Description

    Mortality rates for Lake County, Illinois. Explanation of field attributes:

    • Average Age of Death: the average age at which people in the given ZIP code die.
    • Cancer Deaths: deaths with cancer as the underlying cause, expressed as a rate per 100,000.
    • Heart Disease Related Deaths: deaths with heart disease as the underlying cause, expressed as a rate per 100,000.
    • COPD Related Deaths: deaths with chronic obstructive pulmonary disease (COPD) as the underlying cause, expressed as a rate per 100,000.

  16. Digital Pathology Dataset for Prostate Cancer Diagnosis

    • zenodo.org
    Updated Dec 5, 2022
    Mustafa Umit Oner; Mei Ying Ng; Danilo Medina Giron; Cecilia Ee Chen Xi; Louis Ang Yuan Xiang; Malay Singh; Weimiao Yu; Wing-Kin Sung; Chin Fong Wong; Hwee Kuan Lee (2022). Digital Pathology Dataset for Prostate Cancer Diagnosis [Dataset]. http://doi.org/10.5281/zenodo.5971764
    Available download formats: zip
    Dataset updated: Dec 5, 2022
    Dataset provided by: Zenodo, http://zenodo.org/
    Authors: Mustafa Umit Oner; Mei Ying Ng; Danilo Medina Giron; Cecilia Ee Chen Xi; Louis Ang Yuan Xiang; Malay Singh; Weimiao Yu; Wing-Kin Sung; Chin Fong Wong; Hwee Kuan Lee
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Links to code and bioRxiv pre-print:

    1. Multi-lens Neural Machine (MLNM) Code

    2. An AI-assisted Tool For Efficient Prostate Cancer Diagnosis (bioRxiv Pre-print)

    Digitized hematoxylin and eosin (H&E)-stained whole-slide images (WSIs) of 40 prostatectomy and 59 core needle biopsy specimens were collected from 99 prostate cancer patients at Tan Tock Seng Hospital, Singapore. There were 99 WSIs in total, one per specimen. H&E-stained slides were scanned at 40× magnification (specimen-level pixel size 0.25 μm × 0.25 μm) using an Aperio AT2 Slide Scanner (Leica Biosystems). Institutional review board approval was obtained from the hospital for this study, and all data were de-identified.

    Prostate glandular structures in core needle biopsy slides were manually annotated and classified using the ASAP annotation tool. A senior pathologist reviewed 10% of the annotations in each slide, so that the researcher had reference annotations in different regions of the core. Note that partial glands appearing at the edges of the biopsy cores were not annotated.

    Patches of size 512 × 512 pixels were cropped from whole slide images at resolutions 5×, 10×, 20×, and 40× with an annotated gland centered at each patch. This dataset contains these cropped images.
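    The scan resolution (0.25 µm per pixel at 40×) and the patch size (512 × 512 pixels) together determine how much tissue each patch covers at each magnification. The sketch below computes that coverage and the 40× bounding box of a gland-centred patch. It assumes that pixel size scales inversely with magnification and that gland centres are stored in 40× pixel coordinates; the description states neither. The function names are hypothetical.

    ```python
    BASE_MAG = 40          # scan magnification stated in the description
    BASE_PIXEL_UM = 0.25   # µm per pixel at 40x, stated in the description
    PATCH_PX = 512         # patch edge length in pixels


    def pixel_size_um(mag):
        """Assumed µm per pixel at a given magnification (inverse scaling)."""
        return BASE_PIXEL_UM * BASE_MAG / mag


    def patch_fov_um(mag, patch_px=PATCH_PX):
        """Physical edge length (µm) covered by a square patch at this magnification."""
        return patch_px * pixel_size_um(mag)


    def level0_box(cx, cy, mag, patch_px=PATCH_PX):
        """(x0, y0, x1, y1) in 40x pixels for a patch centred on the gland at (cx, cy).

        The region is larger than patch_px at lower magnifications and would be
        downsampled to patch_px x patch_px; coordinates can fall outside the slide
        near its edges, so callers must clip or pad.
        """
        half = patch_px * (BASE_MAG / mag) / 2
        return (cx - half, cy - half, cx + half, cy + half)


    for mag in (5, 10, 20, 40):
        print(mag, patch_fov_um(mag))
    # prints: 5 1024.0 / 10 512.0 / 20 256.0 / 40 128.0 (one pair per line)
    ```

    Under these assumptions, the four resolutions give the same gland progressively more surrounding context, from 128 µm to about 1 mm of tissue.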

    This dataset was used to train two AI models: one for gland segmentation (99 patients) and one for gland classification (46 patients). Tables 1 and 2 summarize the gland segmentation and gland classification datasets. The two corresponding sub-datasets are provided as two zip files:

    1. gland_segmentation_dataset.zip
    2. gland_classification_dataset.zip

    Table 1: The number of slides and patches in training, validation, and test sets for the gland segmentation task. There is one H&E-stained WSI for each prostatectomy or core needle biopsy specimen.

    #Slides            Train   Valid    Test   Total
    Prostatectomy         17       8      15      40
    Biopsy                26      13      20      59
    Total                 43      21      35      99

    #Patches           Train   Valid    Test   Total
    Prostatectomy       7795    3753    7224   18772
    Biopsy              5559    4028    5981   15568
    Total              13354    7781   13205   34340

    Table 2: The number of slides and patches in training, validation, and test sets for the gland classification task. There is one H&E-stained WSI for each prostatectomy or core needle biopsy specimen. The gland classification datasets are subsets of the gland segmentation datasets. GS: Gleason Score. B: Benign. M: Malignant.

    #Slides (GS 3+3:3+4:4+3)        Train      Valid       Test      Total
    Biopsy                         10:9:1      3:7:0     6:10:0    19:26:1

    #Patches (B:M)                  Train      Valid       Test      Total
    Biopsy                      1557:2277  1216:1341  1543:2718  4316:6336

    NB: The gland classification folder (gland_classification_dataset.zip) may contain extra patches whose labels could not be identified from the H&E slides. These patches were not used in the machine learning study.

  17. Computed Tomography (CT) of the Brain

    • kaggle.com
    Updated Oct 13, 2023
    Training Data (2023). Computed Tomography (CT) of the Brain [Dataset]. https://www.kaggle.com/datasets/trainingdatapro/computed-tomography-ct-of-the-brain
    Dataset updated: Oct 13, 2023
    Dataset provided by: Kaggle
    Authors: Training Data
    License: Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0), https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Computed Tomography (CT) of the Brain - Object Detection dataset

    The dataset consists of CT brain scans showing cancer, tumors, and aneurysms. Each scan is a detailed image of a patient's brain taken using computed tomography (CT). The data are provided in two formats: .jpg and .dcm (DICOM).

    💴 For Commercial Usage: The full version of the dataset includes many more brain scans of people with different conditions; leave a request on TrainingData to buy the dataset.

    The dataset of CT brain scans is valuable for research in neurology, radiology, and oncology. It allows the development and evaluation of computer-based algorithms, machine learning models, and deep learning techniques for automated detection, diagnosis, and classification of these conditions.


    Types of brain diseases in the dataset:

    • cancer
    • tumor
    • aneurysm


    💴 Buy the Dataset: This is just an example of the data. Leave a request on https://trainingdata.pro/datasets to discuss your requirements, learn about the price, and buy the dataset.

    Content

    The "files" folder contains three subfolders, one per brain disease (cancer, tumor, or aneurysm). Each subfolder holds CT scans of people with that disease in two formats: .jpg and .dcm.

    The .csv file includes the following information for each media file:

    • dcm: link to access the .dcm file,
    • jpg: link to access the .jpg file,
    • type: name of the brain disease shown on the CT scan
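    The .csv file links each media file to its disease label, so a few lines of standard-library Python are enough to index the dataset by condition. The sketch assumes that the header row uses exactly the column names listed above (dcm, jpg, type). The file paths in the sample rows are invented for illustration, and `group_by_type` is a hypothetical helper.

    ```python
    import csv
    import io
    from collections import defaultdict


    def group_by_type(csv_file):
        """Group (dcm, jpg) link pairs by the disease named in the `type` column."""
        groups = defaultdict(list)
        for row in csv.DictReader(csv_file):
            groups[row["type"].strip().lower()].append((row["dcm"], row["jpg"]))
        return dict(groups)


    # Made-up rows; in practice pass an open file handle for the dataset's .csv file.
    sample = io.StringIO(
        "dcm,jpg,type\n"
        "files/cancer/1.dcm,files/cancer/1.jpg,cancer\n"
        "files/tumor/7.dcm,files/tumor/7.jpg,tumor\n"
        "files/cancer/2.dcm,files/cancer/2.jpg,cancer\n"
    )
    groups = group_by_type(sample)
    print({k: len(v) for k, v in groups.items()})  # {'cancer': 2, 'tumor': 1}
    ```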

    Medical data might be collected in accordance with your requirements.

    TrainingData provides high-quality data annotation tailored to your needs

    keywords: aneurysm, cancer detection, cancer segmentation, tumor, computed tomography, head, skull, brain scan, eye sockets, sinuses, medical imaging, radiology dataset, neurology dataset, oncology dataset, image dataset, abnormalities detection, brain anatomy, health, brain formations, imaging procedure, x-rays measurements, machine learning, computer vision, deep learning

  18. Five-year survival from all cancers (NHSOF 1.4.ii) - Dataset - data.gov.uk

    • ckan.publishing.service.gov.uk
    Updated Aug 4, 2015
    ckan.publishing.service.gov.uk (2015). Five-year survival from all cancers (NHSOF 1.4.ii) - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/five-year-survival-from-all-cancers-nhsof-1-4-ii
    Dataset updated: Aug 4, 2015
    Dataset provided by: CKAN, https://ckan.org/
    License: Open Government Licence 3.0, http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    A measure of the number of adults diagnosed with any type of cancer in a year who are still alive five years after diagnosis.

    Purpose: This indicator attempts to capture the success of the NHS in preventing people from dying once they have been diagnosed with any type of cancer.

    Current version updated: Feb-17. Next version due: Feb-18.

  19. Cancer Incidence and mortality in a population based investigation in the...

    • researchdata.se
    • demo.researchdata.se
    Updated Oct 16, 2024
    Håkan Olsson (2024). Cancer Incidence och mortality in a population based investigation in the southern health care region - Cost for health care for controls [Dataset]. https://researchdata.se/en/catalogue/dataset/ext0119-2
    Dataset updated: Oct 16, 2024
    Dataset provided by: Lund University
    Authors: Håkan Olsson
    Time period covered: 2000-2007
    Description

    All individuals diagnosed with cancer from 2000 to 2007 were identified in the Cancer Register of Southern Sweden, but only individuals who were also identified in the Population Register of Scania were included in this cohort. Age- and gender-matched controls were identified in the Population Register of Scania. The controls were checked against the Cancer Register of Southern Sweden to confirm that they had no prior cancer diagnosis, and against the Population Register of Scania to confirm that they were alive at the time of diagnosis of their matched case. Spouses of cancer patients were also used as controls.
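    The description does not specify the matching ratio, the age tolerance or the matching algorithm. The sketch below therefore assumes greedy 1:1 matching on sex and birth year. A hypothetical `is_eligible` callback stands in for the register checks (no prior cancer, alive at the case's diagnosis). Spouse controls are not modelled, and all identifiers are invented.

    ```python
    from dataclasses import dataclass
    from typing import Callable, Dict, List, Optional


    @dataclass(frozen=True)
    class Person:
        pid: str          # hypothetical person identifier
        birth_year: int
        sex: str


    def match_controls(
        cases: List[Person],
        pool: List[Person],
        is_eligible: Callable[[Person, Person], bool],
        age_tolerance: int = 0,
    ) -> Dict[str, Optional[str]]:
        """Greedy 1:1 age- and sex-matching; each control is used at most once.

        is_eligible(control, case) stands in for the register linkage checks,
        e.g. no prior cancer and alive at the case's date of diagnosis.
        """
        used = set()
        matches: Dict[str, Optional[str]] = {}
        for case in cases:
            chosen = None
            for cand in pool:
                if cand.pid in used or cand.sex != case.sex:
                    continue
                if abs(cand.birth_year - case.birth_year) > age_tolerance:
                    continue
                if not is_eligible(cand, case):
                    continue
                chosen = cand
                break
            if chosen is not None:
                used.add(chosen.pid)
            matches[case.pid] = chosen.pid if chosen is not None else None
        return matches


    cases = [Person("c1", 1950, "F"), Person("c2", 1960, "M")]
    pool = [Person("k1", 1950, "F"), Person("k2", 1950, "F"), Person("k3", 1961, "M")]
    # Pretend k1 has a prior cancer diagnosis and so fails the eligibility check.
    eligible = lambda control, case: control.pid != "k1"
    print(match_controls(cases, pool, eligible, age_tolerance=1))  # {'c1': 'k2', 'c2': 'k3'}
    ```

    Greedy matching depends on the order of cases and candidates. Optimal or randomised matching avoids that dependence at some extra cost.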

    For each individual, healthcare costs were tracked relative to the date of diagnosis. Costs for outpatient care, inpatient care, number of days in hospital and medications were included. Costs were also calculated for the controls.

    Other information available for individuals in the cohort includes age, sex, domicile, tumor type, and medication.

    Purpose:

    To study the health cost per individual in relation to mortality and comorbidity.

    The dataset includes the study controls (individuals matched by age and sex). Spouses of cancer patients were also included in the control group.

  20. DataSheet_1_Triple-negative breast cancer survival prediction:...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Jun 10, 2024
    Yan, Shuixin; Shen, Haoyang; Chen, Yan; Li, Jiadi; Qiu, Yu; Wu, Weizhu (2024). DataSheet_1_Triple-negative breast cancer survival prediction: population-based research using the SEER database and an external validation cohort.xls [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001388252
    Dataset updated: Jun 10, 2024
    Authors: Yan, Shuixin; Shen, Haoyang; Chen, Yan; Li, Jiadi; Qiu, Yu; Wu, Weizhu
    Description

    Introduction: Triple-negative breast cancer (TNBC) is linked to a poorer prognosis, greater aggressiveness relative to other breast cancer subtypes, and limited treatment options. The absence of conventional treatment methods makes TNBC patients susceptible to metastasis. The objective of this research was to assess the clinical and pathological traits of TNBC patients, predict the influence of risk factors on their prognosis, and create a prediction model to assist doctors in treating TNBC patients and improving their prognosis.

    Methods: We included 23,394 individuals from the SEER database with complete baseline clinical data and survival information who were diagnosed with primary TNBC between 2010 and 2015. External validation used a cohort from The Affiliated Lihuili Hospital of Ningbo University. Independent risk factors linked to TNBC prognosis were identified through univariate, multivariate, and least absolute shrinkage and selection operator (LASSO) regression. These characteristics were chosen as parameters to develop nomogram models for 3- and 5-year overall survival (OS) and breast cancer-specific survival (BCSS). Model accuracy was assessed using calibration curves, concordance indices (C-indices), receiver operating characteristic (ROC) curves, and decision curve analyses (DCAs). Finally, TNBC patients were divided into high-, medium-, and low-risk groups using the nomogram model, and a Kaplan-Meier survival analysis was conducted.

    Results: In the training cohort, variables such as age at diagnosis, marital status, grade, T stage, N stage, M stage, surgery, radiation, and chemotherapy were linked to OS and BCSS. For the nomogram, the C-indices for predicting OS were 0.762, 0.747, and 0.764 in the training, internal validation, and external validation groups, respectively. The corresponding C-indices for predicting BCSS were 0.793, 0.755, and 0.811. The nomogram model was well calibrated, and the time-dependent ROC curves highlighted its effectiveness in clinical settings. The clinical DCA showed the potential clinical benefit of the proposed model. Furthermore, the online version was simple to use, and nomogram classification may improve the differentiation of TNBC prognosis and distinguish risk groups more accurately.

    Conclusion: These nomograms are precise tools for assessing risk in patients with TNBC and forecasting survival. They can help doctors identify prognostic markers and create more effective treatment plans for patients with TNBC, providing more accurate assessments of their 3- and 5-year OS and BCSS.
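    The abstract summarises model discrimination with the concordance index (C-index). This is the fraction of usable patient pairs in which the patient whose event occurred earlier was assigned the higher predicted risk. The sketch below computes Harrell's C-index for right-censored data. It is a generic illustration rather than the authors' code; the abstract does not say which software they used. It skips pairs with tied follow-up times, which some implementations handle differently.

    ```python
    def harrell_c_index(times, events, risk_scores):
        """Harrell's C-index for right-censored survival data.

        times       -- follow-up time for each patient
        events      -- 1 if the event (e.g. death) was observed, 0 if censored
        risk_scores -- model output; higher means the event is expected sooner
        A pair is usable when the patient with the shorter time had an observed event;
        pairs with tied times are skipped, and tied risk scores count as half-concordant.
        """
        concordant = 0.0
        usable = 0
        for t_i, e_i, r_i in zip(times, events, risk_scores):
            if not e_i:
                continue  # a censored patient cannot be the earlier event in a pair
            for t_j, r_j in zip(times, risk_scores):
                if t_i < t_j:
                    usable += 1
                    if r_i > r_j:
                        concordant += 1.0
                    elif r_i == r_j:
                        concordant += 0.5
        if usable == 0:
            raise ValueError("no usable pairs")
        return concordant / usable


    # Toy data, not from the study: 5 usable pairs, 4 of them concordant.
    print(harrell_c_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.8, 0.1]))  # 0.8
    ```

    A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. This is the scale on which the reported values of 0.747 to 0.811 should be read.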

U.S. EPA Office of Research and Development (ORD) (2020). County-level cumulative environmental quality associated with cancer incidence. [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/county-level-cumulative-environmental-quality-associated-with-cancer-incidence

Data from: County-level cumulative environmental quality associated with cancer incidence.

Dataset updated: Nov 12, 2020
Dataset provided by: United States Environmental Protection Agency, http://www.epa.gov/
Description

Population-based cancer incidence rates were abstracted from the National Cancer Institute's State Cancer Profiles for all U.S. counties for which data were available. This is a national county-level database of cancer data collected by state public health surveillance systems. All-site cancer is defined as any type of cancer captured in the state registry data, excluding non-melanoma skin cancer. All-site age-adjusted cancer incidence rates were abstracted separately for males and females. County-level annual age-adjusted all-site cancer incidence rates for 2006–2010 were available for 2687 of 3142 (85.5%) U.S. counties. Counties with fewer than 16 reported cases in a specific area-sex-race category are suppressed to ensure confidentiality and stability of rate estimates; this affected 14 counties in our study. Two states, Kansas and Virginia, do not provide data because state legislation and regulations prohibit the release of county-level data to outside entities. Data from Michigan do not include cases diagnosed in other states because data exchange agreements prohibit the release of data to third parties. Finally, state data are not available for three states: Minnesota, Ohio, and Washington. The age-adjusted average annual incidence rate across all counties was 453.7 per 100,000 persons. We selected 2006–2010 because it follows the EQI exposure period, which was constructed to represent 2000–2005. We also gathered data for the three leading cancer sites for males (lung, prostate, and colorectal) and females (lung, breast, and colorectal). The EQI was used as the exposure metric, an indicator of cumulative environmental exposures at the county level for 2000–2005. The datasets used in the EQI are described in full by Lobdell et al., and the methods used for index construction are described by Messer et al.

The EQI was developed for 2000–2005 because this was the most recent period for which data were available when index construction began. The EQI includes variables representing each of five environmental domains. The air domain includes 87 variables representing criteria and hazardous air pollutants. The water domain includes 80 variables representing overall water quality, general water contamination, recreational water quality, drinking water quality, atmospheric deposition, drought, and chemical contamination. The land domain includes 26 variables representing agriculture, pesticides, contaminants, facilities, and radon. The built domain includes 14 variables representing roads, highway/road safety, public transit behavior, business environment, and subsidized housing environment. The sociodemographic domain includes 12 variables representing socioeconomics and crime.

This dataset is not publicly accessible because EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: human health data are not available publicly; EQI data are available at https://edg.epa.gov/data/Public/ORD/NHEERL/EQI. Format: data are stored as csv files.

This dataset is associated with the following publication: Jagai, J., L. Messer, K. Rappazzo, C. Gray, S. Grabich, and D. Lobdell. County-level environmental quality and associations with cancer incidence. Cancer. John Wiley & Sons Incorporated, New York, NY, USA, 123(15): 2901-2908, (2017).
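Two numeric details above can be checked quickly: the coverage figure (2687 of 3142 counties) and the rule that categories with fewer than 16 reported cases are suppressed. The sketch below mirrors both. The helper names and the example rate of 480.2 are hypothetical. It also applies the threshold to a single count, whereas the description applies it per area-sex-race category.

```python
SUPPRESSION_THRESHOLD = 16  # categories with fewer than 16 reported cases are suppressed


def publishable_rate(case_count, rate):
    """Return the rate, or None if the category falls below the suppression threshold."""
    return rate if case_count >= SUPPRESSION_THRESHOLD else None


def coverage_pct(available, total):
    """Share of counties with usable data, as a percentage."""
    return 100.0 * available / total


print(round(coverage_pct(2687, 3142), 1))  # 85.5
print(publishable_rate(15, 480.2), publishable_rate(16, 480.2))  # None 480.2
```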
