https://digital.nhs.uk/about-nhs-digital/terms-and-conditions
Rapid Cancer Registration Data (RCRD) provides a quick, indicative source of cancer data. It is provided to support the planning and provision of cancer services. The data is based on a rapid processing of cancer registration data sources, in particular on Cancer Outcomes and Services Dataset (COSD) information. In comparison, National Cancer Registration Data (NCRD) relies on additional data sources, enhanced follow-up with trusts and expert processing by cancer registration officers. The Rapid Cancer Registration Data (RCRD) may be useful for service improvement projects including healthcare planning and prioritisation. However, it is poorly suited for epidemiological research due to limitations in the data quality and completeness.
By Data Exercises [source]
This dataset collects county-level cancer mortality and incidence rates in the United States between 2000 and 2014, giving a detailed view of cancer cases, deaths, and trends at a local level. Its columns include County, FIPS, age-adjusted death rate, average death rate per year, recent trend (2) in death rates, recent 5-year trend (2) in death rates, and average annual count for each county. The dataset can be used to study the patterns and effects of cancer on communities and to help inform policy decisions on mitigating risk factors or increasing preventive measures such as screenings. With this set of records from across the United States over 15 years, you can make better-informed decisions about individual patient care or policy development within your own community.
This dataset provides comprehensive US county-level cancer mortality and incidence rates from 2000 to 2014. It includes the mortality and incidence rate for each county, as well as whether the county met the objective of 45.5 deaths per 100,000 people. It also provides information on recent trends in death rates and average annual counts of cases over the five year period studied.
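As a small illustration of the objective flag described above, the sketch below compares a county's age-adjusted death rate against the 45.5 deaths per 100,000 objective. It assumes that "meeting" the objective means a rate at or below that value; the function name and the example rates are hypothetical and are not taken from the dataset.

```python
OBJECTIVE_RATE = 45.5  # deaths per 100,000, the objective named in the description


def met_objective(age_adjusted_death_rate: float,
                  objective: float = OBJECTIVE_RATE) -> bool:
    """Return True when the rate is at or below the objective (assumed meaning of 'met')."""
    return age_adjusted_death_rate <= objective


# Made-up example rates for illustration only.
example_rates = {"County A": 39.2, "County B": 45.5, "County C": 61.0}
flags = {county: met_objective(rate) for county, rate in example_rates.items()}
```

If the dataset's own flag column treats the boundary differently (strictly below rather than at or below), the comparison operator would need to change accordingly.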
This dataset can be useful to researchers studying trends in cancer death rates across counties. With this data, researchers can gain insight into how different counties are performing in providing treatment and prevention services for cancer patients, and whether preventative measures and healthcare access are reducing cancer mortality rates over time. The data can also be used to inform policymakers about counties that need more targeted prevention efforts or additional resources to improve healthcare access within at-risk communities.
When using this dataset, pay close attention to qualitative columns such as “Recent Trend” or “Recent 5-Year Trend (2)”, which may reveal long-term changes that are not readily apparent from quantitative variables such as age-adjusted death rate or average deaths per year over shorter periods such as one year or five years. Additionally, when comparing counties, take note of any differences in standard FIPS codes, which may indicate that data was collected by a different source using a different methodology from the one used in other areas.
- Using this dataset, we can identify patterns in cancer mortality and incidence rates that are statistically significant to create treatment regimens or preventive measures specifically targeting those areas.
- This data can be useful for policymakers to target areas with elevated cancer mortality and incidence rates so they can allocate financial resources to these areas more efficiently.
- This dataset can be used to investigate which factors (such as pollution levels, access to medical care, genetic makeup) may have an influence on the cancer mortality and incidence rates in different US counties.
If you use this dataset in your research, please credit the original authors. Data Source
License: Dataset copyright by authors
- You are free to:
  - Share - copy and redistribute the material in any medium or format for any purpose, even commercially.
  - Adapt - remix, transform, and build upon the material for any purpose, even commercially.
- You must:
  - Give appropriate credit - Provide a link to the license, and indicate if changes were made.
  - ShareAlike - You must distribute your contributions under the same license as the original.
  - Keep intact - all notices that refer to this license, including copyright notices.
File: death .csv | Column name | Description | |:-------------------------------------------|:-------------------------------------------------------------------...
Number and rate of new cancer cases diagnosed annually from 1992 to the most recent diagnosis year available. Included are all invasive cancers and in situ bladder cancer with cases defined using the Surveillance, Epidemiology and End Results (SEER) Groups for Primary Site based on the World Health Organization International Classification of Diseases for Oncology, Third Edition (ICD-O-3). Random rounding of case counts to the nearest multiple of 5 is used to prevent inappropriate disclosure of health-related information.
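To make the disclosure-control step concrete, here is a minimal sketch of random rounding to a multiple of 5. It assumes the common unbiased scheme in which a count is rounded up with probability proportional to its distance from the lower multiple; the description above does not state the exact procedure used, so the function name and the probabilities are assumptions.

```python
import random
from typing import Optional


def random_round(count: int, base: int = 5,
                 rng: Optional[random.Random] = None) -> int:
    """Randomly round a non-negative count to an adjacent multiple of `base`.

    Assumed scheme: round up with probability remainder/base, otherwise
    round down, so the expected value of the result equals the count.
    """
    rng = rng if rng is not None else random.Random()
    remainder = count % base
    if remainder == 0:
        return count  # already a multiple of base, left unchanged
    lower = count - remainder
    if rng.random() < remainder / base:
        return lower + base
    return lower


# Illustrative counts, not taken from the dataset.
seeded = random.Random(0)
rounded = [random_round(c, rng=seeded) for c in (0, 3, 12, 15, 48)]
```

Every published value is then a multiple of 5, and any individual rounded count differs from the true count by less than 5.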
SUMMARY
This analysis, designed and executed by Ribble Rivers Trust, identifies areas across England with the greatest levels of cancer (in persons of all ages). Please read the below information to gain a full understanding of what the data shows and how it should be interpreted.

ANALYSIS METHODOLOGY
The analysis was carried out using Quality and Outcomes Framework (QOF) data, derived from NHS Digital, relating to cancer (in persons of all ages). This information was recorded at the GP practice level. However, GP catchment areas are not mutually exclusive: they overlap, with some areas covered by 30+ GP practices. Therefore, to increase the clarity and usability of the data, the GP-level statistics were converted into statistics based on Middle Layer Super Output Area (MSOA) census boundaries.

The percentage of each MSOA’s population (all ages) with cancer was estimated. This was achieved by calculating a weighted average based on:
- The percentage of the MSOA area that was covered by each GP practice’s catchment area
- Of the GPs that covered part of that MSOA: the percentage of registered patients that have that illness

The estimated percentage of each MSOA’s population with cancer was then combined with Office for National Statistics Mid-Year Population Estimates (2019) data for MSOAs, to estimate the number of people in each MSOA with cancer, within the relevant age range.

Each MSOA was assigned a relative score between 1 and 0 (1 = worst, 0 = best) based on:
A) the PERCENTAGE of the population within that MSOA who are estimated to have cancer
B) the NUMBER of people within that MSOA who are estimated to have cancer

An average of scores A & B was taken, and converted to a relative score between 1 and 0 (1 = worst, 0 = best). The closer to 1 the score, the greater both the number and percentage of the population in the MSOA that are estimated to have cancer, compared to other MSOAs. In other words, those are areas where it’s estimated a large number of people suffer from cancer, and where those people make up a large percentage of the population, indicating there is a real issue with cancer within the population and the investment of resources to address that issue could have the greatest benefits.

LIMITATIONS
1. GP data for the financial year 1st April 2018 – 31st March 2019 was used in preference to data for the financial year 1st April 2019 – 31st March 2020, as the onset of the COVID19 pandemic during the latter year could have affected the reporting of medical statistics by GPs. However, for 53 GPs (out of 7670) that did not submit data in 2018/19, data from 2019/20 was used instead. Note also that some GPs (997 out of 7670) did not submit data in either year. This dataset should be viewed in conjunction with the ‘Health and wellbeing statistics (GP-level, England): Missing data and potential outliers’ dataset, to determine areas where data from 2019/20 was used, where one or more GPs did not submit data in either year, or where there were large discrepancies between the 2018/19 and 2019/20 data (differences in statistics that were > mean +/- 1 St.Dev.), which suggests erroneous data in one of those years (it was not feasible for this study to investigate this further), and thus where data should be interpreted with caution. Note also that there are some rural areas (with little or no population) that do not officially fall into any GP catchment area (although this will not affect the results of this analysis if there are no people living in those areas).
2. Although all of the obesity/inactivity-related illnesses listed can be caused or exacerbated by inactivity and obesity, it was not possible to distinguish from the data the cause of the illnesses in patients: obesity and inactivity are highly unlikely to be the cause of all cases of each illness. By combining the data with data relating to levels of obesity and inactivity in adults and children (see the ‘Levels of obesity, inactivity and associated illnesses: Summary (England)’ dataset), we can identify where obesity/inactivity could be a contributing factor, and where interventions to reduce obesity and increase activity could be most beneficial for the health of the local population.
3. It was not feasible to incorporate ultra-fine-scale geographic distribution of populations that are registered with each GP practice or who live within each MSOA. Populations might be concentrated in certain areas of a GP practice’s catchment area or MSOA and relatively sparse in other areas. Therefore, the dataset should be used to identify general areas where there are high levels of cancer, rather than interpreting the boundaries between areas as ‘hard’ boundaries that mark definite divisions between areas with differing levels of cancer.

TO BE VIEWED IN COMBINATION WITH:
This dataset should be viewed alongside the following datasets, which highlight areas of missing data and potential outliers in the data:
- Health and wellbeing statistics (GP-level, England): Missing data and potential outliers
- Levels of obesity, inactivity and associated illnesses (England): Missing data

DOWNLOADING THIS DATA
To access this data on your desktop GIS, download the ‘Levels of obesity, inactivity and associated illnesses: Summary (England)’ dataset.

DATA SOURCES
This dataset was produced using:
- Quality and Outcomes Framework data: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital.
- GP Catchment Outlines. Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital. Data was cleaned by Ribble Rivers Trust before use.
- MSOA boundaries: © Office for National Statistics licensed under the Open Government Licence v3.0. Contains OS data © Crown copyright and database right 2021.
- Population data: Mid-2019 (June 30) Population Estimates for Middle Layer Super Output Areas in England and Wales. © Office for National Statistics licensed under the Open Government Licence v3.0. © Crown Copyright 2020.

COPYRIGHT NOTICE
The reproduction of this data must be accompanied by the following statement:
© Ribble Rivers Trust 2021. Analysis carried out using data that is: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital; © Office for National Statistics licensed under the Open Government Licence v3.0. Contains OS data © Crown copyright and database right 2021. © Crown Copyright 2020.

CaBA HEALTH & WELLBEING EVIDENCE BASE
This dataset forms part of the wider CaBA Health and Wellbeing Evidence Base.
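The MSOA scoring described in the analysis methodology can be sketched roughly as follows. This is an illustration under stated assumptions, not the Trust's actual implementation: the description does not say how values were converted to a relative 0 to 1 score, so min-max scaling is assumed here, and all function names, field names and figures are hypothetical.

```python
def weighted_prevalence(practices):
    """Area-weighted average prevalence for one MSOA.

    `practices` is a list of (pct_of_msoa_area_covered, pct_of_patients_with_cancer)
    pairs, one per GP practice whose catchment overlaps the MSOA.
    """
    total_weight = sum(coverage for coverage, _ in practices)
    if total_weight == 0:
        return 0.0
    return sum(coverage * prev for coverage, prev in practices) / total_weight


def min_max(values):
    """Rescale values to the range 0..1 (assumed meaning of 'relative score')."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def msoa_scores(msoas):
    """Combine score A (percentage) and score B (number) into one relative score."""
    names = list(msoas)
    pct = [weighted_prevalence(msoas[n]["practices"]) for n in names]
    num = [p / 100 * msoas[n]["population"] for p, n in zip(pct, names)]
    score_a = min_max(pct)   # A) percentage of population estimated to have cancer
    score_b = min_max(num)   # B) number of people estimated to have cancer
    combined = min_max([(a + b) / 2 for a, b in zip(score_a, score_b)])
    return dict(zip(names, combined))


# Made-up inputs for illustration only.
example = {
    "MSOA 1": {"practices": [(60.0, 3.0), (40.0, 2.0)], "population": 8000},
    "MSOA 2": {"practices": [(100.0, 4.0)], "population": 10000},
    "MSOA 3": {"practices": [(50.0, 1.0), (50.0, 2.0)], "population": 6000},
}
scores = msoa_scores(example)
```

In this toy input "MSOA 2" has both the highest estimated percentage and the highest estimated number, so it receives the score closest to 1 (worst).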
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Synthetic Oral Cancer Prediction Dataset is designed for educational and research purposes to analyse factors associated with oral cancer risk, progression, and treatment outcomes. The dataset includes anonymised, synthetic data on various clinical, lifestyle, and demographic factors for individuals diagnosed with oral cancer.
This dataset can be used for the following applications:
This synthetic dataset is fully anonymized and complies with data privacy standards. It includes a wide array of factors that support diverse research and analysis in the oncology and public health domains.
CC0 (Public Domain)
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
One-year and five-year net survival for adults (15-99) in England diagnosed with one of 29 common cancers, by age and sex.
I was interested in investigating cancer incidence levels in the US by looking at how they vary by race or state. All the data was collected online from the Centers for Disease Control and Prevention, State Cancer Profiles, and the United States Census Bureau. This dataset can be used to answer questions on the correlation between poverty levels, insurance levels and cancer incidence levels. Further, one can find which cancers affect a given race or state more than others.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
A measure of the number of adults diagnosed with any type of cancer in a year who are still alive one year after diagnosis.
Purpose: This indicator attempts to capture the success of the NHS in preventing people from dying once they have been diagnosed with any type of cancer.
Current version updated: Feb-17
Next version due: Feb-18
By UCI [source]
This dataset contains data on breast cancer diagnosis, a devastating medical condition that affects thousands of people around the world each year. The data comprises patient ID, diagnosis (Malignant or Benign), and 30 computed features extracted from a digitized image of a fine needle aspirate (FNA) of a breast mass. Features include radius, texture, perimeter, area, smoothness, compactness, concavity and concave points, as well as symmetry and fractal dimension.
Created by researchers in General Surgery and Computer Science at the University of Wisconsin-Madison, led by Dr. William H Wolberg with contributions from Professor W Nick Street and Olvi L Mangasarian, this dataset was used in groundbreaking research to predict breast cancer prognosis using linear programming methods. More recently, statistical methods such as support vector machines have been employed to classify tumour types from this dataset, as well as for other tasks such as identifying hidden patterns through pattern recognition techniques like Artificial Neural Networks (ANN).
It has also been used for studies exploring unsupervised classification tools like Ant Colony Optimization for discovering meaningful relationships among different variables, which can help physicians better understand the progression of certain types of tumors over time. For example, types cardinality analysis allowed researchers to determine a tumor’s heterogeneity before deciding on appropriate treatments, potentially leading to improved prognosis success rates overall. This Wisconsin Breast Cancer Diagnostic dataset provides an invaluable resource to scientists working on preventing or curing this disease, a goal we all hope to achieve soon.
- Developing a classifier that can accurately predict breast cancer diagnoses based on the provided features.
- Clustering patient data with similar diagnosis to discover trends or connections between certain symptoms and diagnoses.
- Optimizing feature selection algorithms to identify the most relevant predictors of breast cancer diagnosis from a set of given cell nuclei features.
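Before any of the ideas above can be tried, the records need to be read into a usable structure. The sketch below parses rows laid out as the description states (patient ID, then diagnosis, then the numeric features). It assumes a comma-separated file with no header row and a diagnosis coded as M or B; the function name and the shortened sample rows are hypothetical, and real rows would carry 30 feature values.

```python
import csv
import io


def parse_wdbc(text):
    """Parse rows of the form: id, diagnosis (M or B), feature_1, ..., feature_n."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        patient_id, diagnosis, *features = row
        records.append({
            "id": patient_id.strip(),
            "malignant": diagnosis.strip() == "M",
            "features": [float(value) for value in features],
        })
    return records


# Made-up, shortened rows for illustration only.
sample = "1001,M,17.5,10.2,120.0\n1002,B,12.1,14.3,78.0\n"
records = parse_wdbc(sample)
malignant_share = sum(r["malignant"] for r in records) / len(records)
```

Treating every row as data matters here, because reading the first record as a header would silently drop one patient.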
If you use this dataset in your research, please credit the original authors. Data Source
See the dataset description for more information.
File: unformatted-data.csv
File: wpbc.data.csv

| Column name | Description |
|:--------------|:--------------------------------|
| 119513 | ID number (Integer) |
| N | Diagnosis (Binary) |
| 31 | Radius (Real-valued) |
| 18.02 | Texture (Real-valued) |
| 27.6 | Perimeter (Real-valued) |
| 117.5 | Area (Real-valued) |
| 1013 | Smoothness (Real-valued) |
| 0.09489 | Compactness (Real-valued) |
| 0.1036 | Concavity (Real-valued) |
| 0.1086 | Symmetry (Real-valued) |
| 0.07055 | Fractal Dimension (Real-valued) |
| 0.1865 | Mean Intensity (Real-valued) |
| 0.06333 | Standard Error (Real-valued) |
| 0.6249 | Worst Radius (Real-valued) |
| 1.89 | Worst Texture (Real-valued) |
| 3.972 | Worst Perimeter (Real-valued) |
| 71.55 | Worst Area (Real-valued) |
| 0.004433 | Worst Smoothness (Real-valued) |
| 0.01421 | Worst Compactness (Real-valued) |
| 0.03233 | Worst Concavity (Real-valued) |
File: breast-cancer-wisconsin.data.csv

| Column name | Description |
|:--------------|:--------------------------------------|
| 119513 | ID number (Integer) |
| 1000025 | ID number (Integer) |
| 1.1 | Uniformity of Cell Size (Integer) |
| 1.2 | Uniformity of Cell Shape (Integer) |
| 1.3 | Single Epithelial Cell Size (Integer) |
| 1.4 | Bland Chromatin (Integer) |
| 1.5 | Normal Nucleoli (Integer) |
| 2.1 | Mitoses (Integer) |
File: wdbc.data.csv | Column name | Description | |:--------------|:----------------------------------------| | 842302 | Patient ID number (Integer Type) | | M | Diagnosis (Binary Type) | | **...
Mortality Rates for Lake County, Illinois. Explanation of field attributes:
- Average Age of Death – The average age at which people in the given zip code die.
- Cancer Deaths – Cancer deaths refers to individuals who have died of cancer as the underlying cause. This is a rate per 100,000.
- Heart Disease Related Deaths – Heart Disease Related Deaths refers to individuals who have died of heart disease as the underlying cause. This is a rate per 100,000.
- COPD Related Deaths – COPD Related Deaths refers to individuals who have died of chronic obstructive pulmonary disease (COPD) as the underlying cause. This is a rate per 100,000.
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This dataset presents the footprint of cancer incidence statistics in Australia for all cancers combined and the 5 top cancer groupings (breast - female only, colorectal, lung, melanoma of the skin and prostate) and their respective ICD-10 codes. The data spans the years 2006-2010 and is aggregated to Statistical Area Level 3 (SA3) from the 2011 Australian Statistical Geography Standard (ASGS). Incidence data refer to the number of new cases of cancer diagnosed in a given time period. It does not refer to the number of people newly diagnosed (because one person can be diagnosed with more than one cancer in a year). Cancer incidence data come from the Australian Institute of Health and Welfare (AIHW) 2012 Australian Cancer Database (ACD).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cervical cancer is one of the leading causes of cancer-related deaths among women worldwide. Early detection and accurate prediction of cervical cancer can significantly improve the chances of successful treatment and save lives. This dataset helps to develop a predictive model using machine learning techniques to identify individuals at high risk of cervical cancer, allowing for timely intervention and medical care.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Age-standardised rate of mortality from oral cancer (ICD-10 codes C00-C14) in persons of all ages and sexes per 100,000 population.

Rationale
Over the last decade in the UK (between 2003-2005 and 2012-2014), oral cancer mortality rates have increased by 20% for males and 19% for females (1). Five year survival rates are 56%. Most oral cancers are triggered by tobacco and alcohol, which together account for 75% of cases (2). Cigarette smoking is associated with an increased risk of the more common forms of oral cancer. The risk among cigarette smokers is estimated to be 10 times that for non-smokers. More intense use of tobacco increases the risk, while ceasing to smoke for 10 years or more reduces it to almost the same as that of non-smokers (3). Oral cancer mortality rates can be used in conjunction with registration data to inform service planning as well as comparing survival rates across areas of England to assess the impact of public health prevention policies such as smoking cessation.

References:
(1) Cancer Research Campaign. Cancer Statistics: Oral – UK. London: CRC, 2000.
(2) Blot WJ, McLaughlin JK, Winn DM et al. Smoking and drinking in relation to oral and pharyngeal cancer. Cancer Res 1988; 48: 3282-7.
(3) La Vecchia C, Tavani A, Franceschi S et al. Epidemiology and prevention of oral cancer. Oral Oncology 1997; 33: 302-12.

Definition of numerator
All cancer mortality for lip, oral cavity and pharynx (ICD-10 C00-C14) in the respective calendar years aggregated into quinary age bands (0-4, 5-9,…, 85-89, 90+). This does not include secondary cancers or recurrences. Data are reported according to the calendar year in which the cancer was diagnosed.

Counts of deaths for years up to and including 2019 have been adjusted where needed to take account of the MUSE ICD-10 coding change introduced in 2020. Detailed guidance on the MUSE implementation is available at: https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/articles/causeofdeathcodinginmortalitystatisticssoftwarechanges/january2020

Counts of deaths for years up to and including 2013 have been double adjusted by applying comparability ratios from both the IRIS coding change and the MUSE coding change where needed to take account of both the MUSE ICD-10 coding change and the IRIS ICD-10 coding change introduced in 2014. The detailed guidance on the IRIS implementation is available at: https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/bulletins/impactoftheimplementationofirissoftwareforicd10causeofdeathcodingonmortalitystatisticsenglandandwales/2014-08-08

Counts of deaths for years up to and including 2010 have been triple adjusted by applying comparability ratios from the 2011 coding change, the IRIS coding change and the MUSE coding change where needed to take account of the MUSE ICD-10 coding change, the IRIS ICD-10 coding change and the ICD-10 coding change introduced in 2011. The detailed guidance on the 2011 implementation is available at https://webarchive.nationalarchives.gov.uk/ukgwa/20160108084125/http://www.ons.gov.uk/ons/guide-method/classifications/international-standard-classifications/icd-10-for-mortality/comparability-ratios/index.html

Definition of denominator
Population-years (aggregated populations for the three years) for people of all ages, aggregated into quinary age bands (0-4, 5-9, …, 85-89, 90+)
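To show how a numerator and denominator split into age bands turn into a single rate per 100,000, here is a minimal sketch of direct age standardisation. It assumes the direct method with an external standard population as weights; the indicator text above only says "age-standardised", so that choice, the function name, and the three made-up age bands are all assumptions (the real indicator uses quinary bands from 0-4 to 90+).

```python
def directly_standardised_rate(deaths, population_years, standard_population,
                               per=100_000):
    """Weighted sum of age-specific rates, using standard-population weights.

    All three sequences must list the same age bands in the same order.
    """
    if not (len(deaths) == len(population_years) == len(standard_population)):
        raise ValueError("all inputs must cover the same age bands")
    weighted = sum(
        weight * band_deaths / band_population
        for band_deaths, band_population, weight
        in zip(deaths, population_years, standard_population)
    )
    return per * weighted / sum(standard_population)


# Made-up figures for three illustrative age bands.
deaths = [1, 10, 50]
population_years = [100_000, 100_000, 50_000]
standard_population = [50_000, 30_000, 20_000]
rate = directly_standardised_rate(deaths, population_years, standard_population)
```

Any comparability-ratio adjustment of the death counts, as described in the numerator definition, would be applied to `deaths` before this step.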
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
One woman in nine can expect to develop breast cancer during her lifetime and one in 25 will die from the disease. Statistically low incidences of breast cancer are found in Newfoundland and Labrador, the territories, and northern areas of most provinces. Otherwise, each province has one or more pockets of significantly high breast cancer incidence. These are often located in more southerly areas, but they do not seem to be restricted to either urban or rural areas alone. Breast cancer rates are a health status indicator. They can be used to help assess health conditions. Health status refers to the state of health of a person or group, and measures causes of sickness and death. It can also include people’s assessment of their own health.
This dataset presents the footprint of male cancer incidence statistics in Australia for all cancers combined and the 11 top cancer groupings (bladder, colorectal, head and neck, kidney, leukaemia, lung, lymphoma, melanoma of the skin, pancreas, prostate and stomach) and their respective ICD-10 codes. The data spans the years 2006-2010 and is aggregated to Statistical Area Level 4 (SA4) from the 2011 Australian Statistical Geography Standard (ASGS). Incidence data refer to the number of new cases of cancer diagnosed in a given time period. It does not refer to the number of people newly diagnosed (because one person can be diagnosed with more than one cancer in a year). Cancer incidence data come from the Australian Institute of Health and Welfare (AIHW) 2012 Australian Cancer Database (ACD). For further information about this dataset, please visit: Australian Institute of Health and Welfare - Cancer Incidence and Mortality Across Regions (CIMAR) books. Australian Cancer Database 2012 Data Quality Statement. Please note: AURIN has spatially enabled the original data. Due to changes in geographic classifications over time, long-term trends are not available. Values assigned to "n.p." in the original data have been removed from the data. The Australian and jurisdictional totals include people who could not be assigned an SA4 category. The number of people who could not be assigned an SA4 category is less than 1% of the total. The Australian total also includes residents of Other Territories (Cocos (Keeling) Islands, Christmas Island and Jervis Bay Territory). The ACD records all primary cancers except for basal and squamous cell carcinomas of the skin (BCCs and SCCs).
These cancers are not notifiable diseases and are not collected by the state and territory cancer registries. The diseases coded to ICD-10 codes D45-D46, D47.1 and D47.3-D47.5, which cover most of the myelodysplastic and myeloproliferative cancers, were not considered cancer at the time the ICD-10 was first published and were not routinely registered by all Australian cancer registries. The ACD contains all cases of these cancers which were diagnosed from 1982 onwards and which have been registered but the collection is not considered complete until 2003 onwards. Note that the incidence data presented are for 2006-2010 because 2011 and 2012 data for NSW and ACT were not able to be provided for the 2012 ACD. Copyright attribution: Government of the Commonwealth of Australia - Australian Institute of Health and Welfare, (2016): ; accessed from AURIN on 12/3/2020. Licence type: Creative Commons Attribution 3.0 Australia (CC BY 3.0 AU)
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This dataset presents the footprint of cancer incidence statistics in Australia for all cancers combined and the 6 top cancer groupings (colorectal, leukaemia, lung, lymphoma, melanoma of the skin and pancreas) and their respective ICD-10 codes. The data spans the years 2006-2010 and is aggregated to 2015 Department of Health Primary Health Network (PHN) areas, based on the 2011 Australian Statistical Geography Standard (ASGS).
Incidence data refer to the number of new cases of cancer diagnosed in a given time period. It does not refer to the number of people newly diagnosed (because one person can be diagnosed with more than one cancer in a year). Cancer incidence data come from the Australian Institute of Health and Welfare (AIHW) 2012 Australian Cancer Database (ACD).
For further information about this dataset, please visit:
Please note:
AURIN has spatially enabled the original data using the Department of Health - PHN Areas.
Due to changes in geographic classifications over time, long-term trends are not available.
Values assigned to "n.p." in the original data have been removed from the data.
The Australian and jurisdictional totals include people who could not be assigned a PHN. The number of people who could not be assigned a PHN is less than 1% of the total.
The Australian total also includes residents of Other Territories (Cocos (Keeling) Islands, Christmas Island and Jervis Bay Territory).
The ACD records all primary cancers except for basal and squamous cell carcinomas of the skin (BCCs and SCCs). These cancers are not notifiable diseases and are not collected by the state and territory cancer registries.
The diseases coded to ICD-10 codes D45-D46, D47.1 and D47.3-D47.5, which cover most of the myelodysplastic and myeloproliferative cancers, were not considered cancer at the time the ICD-10 was first published and were not routinely registered by all Australian cancer registries. The ACD contains all cases of these cancers which were diagnosed from 1982 onwards and which have been registered but the collection is not considered complete until 2003 onwards.
Note that the incidence data presented are for 2006-2010 because 2011 and 2012 data for NSW and ACT were not able to be provided for the 2012 ACD.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This paper demonstrates the flexibility of a general approach for the analysis of discrete time competing risks data that can accommodate complex data structures, different time scales for different causes, and nonstandard sampling schemes. The data may involve a single data source where all individuals contribute to analyses of both cause-specific hazard functions, overlapping datasets where some individuals contribute to the analysis of the cause-specific hazard function of only one cause while other individuals contribute to analyses of both cause-specific hazard functions, or separate data sources where each individual contributes to the analysis of the cause-specific hazard function of only a single cause. The approach is modularized into estimation and prediction. For the estimation step, the parameters and the variance-covariance matrix can be estimated using widely available software. The prediction step utilizes a generic program with plug-in estimates from the estimation step. The approach is illustrated with three prognostic models for stage IV male oral cancer using different data structures. The first model uses only men with stage IV oral cancer from population-based registry data. The second model strategically extends the cohort to improve the efficiency of the estimates. The third model improves the accuracy for those with a lower risk of other causes of death, by bringing in an independent data source collected under a complex sampling design with additional other-cause covariates. These analyses represent novel extensions of existing methodology, broadly applicable for the development of prognostic models capturing both the cancer and non-cancer aspects of a patient's health.
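The prediction step can be pictured with the textbook discrete-time relationship between cause-specific hazards and cumulative incidence. The sketch below is not the paper's generic program: it assumes two causes measured on a common time scale, takes the hazards as given plug-in values, and uses hypothetical names and made-up numbers.

```python
def cumulative_incidence(hazards_cancer, hazards_other):
    """Discrete-time cumulative incidence for two competing causes.

    For each interval, the probability of an event from a cause is the
    probability of surviving all earlier intervals times that cause's
    hazard in the current interval. Returns one tuple per interval:
    (cumulative incidence of cancer death, of other-cause death, survival).
    """
    survival = 1.0
    cif_cancer = 0.0
    cif_other = 0.0
    results = []
    for h_cancer, h_other in zip(hazards_cancer, hazards_other):
        cif_cancer += survival * h_cancer
        cif_other += survival * h_other
        survival *= 1.0 - h_cancer - h_other
        results.append((cif_cancer, cif_other, survival))
    return results


# Made-up hazards for two intervals, for illustration only.
curve = cumulative_incidence([0.10, 0.20], [0.05, 0.05])
```

At every interval the two cumulative incidences and the overall survival probability sum to 1, which is a quick sanity check on any plug-in estimates.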
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This dataset presents the footprint of female cancer incidence statistics in Australia for all cancers combined and the 11 top cancer groupings (breast, cervical, colorectal, leukaemia, lung, lymphoma, melanoma of the skin, ovary, pancreas, thyroid and uterus) and their respective ICD-10 codes. The data spans the years 2006-2010 and is aggregated to Greater Capital City Statistical Areas (GCCSA) from the 2011 Australian Statistical Geography Standard (ASGS). Incidence data refer to the number of new cases of cancer diagnosed in a given time period. It does not refer to the number of people newly diagnosed (because one person can be diagnosed with more than one cancer in a year). Cancer incidence data come from the Australian Institute of Health and Welfare (AIHW) 2012 Australian Cancer Database (ACD).
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Directly age-standardised registration rate for oral cancer (ICD-10 C00-C14), in persons of all ages, per 100,000 2013 European Standard Population

Rationale
Tobacco is a known risk factor for oral cancers (1). In England, 65% of hospital admissions (2014–15) for oral cancer and 64% of deaths (2014) due to oral cancer were attributed to smoking (2). Oral cancer registration is therefore a direct measure of smoking-related harm. Given the high proportion of these registrations that are due to smoking, a reduction in the prevalence of smoking would reduce the incidence of oral cancer.
Towards a Smokefree Generation: A Tobacco Control Plan for England states that tobacco use remains one of our most significant public health challenges and that smoking is the single biggest cause of inequalities in death rates between the richest and poorest in our communities (3).
In January 2012 the Public Health Outcomes Framework was published, then updated in 2016. Smoking and smoking-related death play a key role in two of the four domains: Health Improvement and Preventing premature mortality (4).

References:
(1) GBD 2013 Risk Factors Collaborators. Global, regional and national comparative risk assessment of 79 behavioural, environmental and occupational, and metabolic risks or clusters of risk factors in 188 countries, 1990–2013: a systematic analysis for the Global Burden of Disease Study 2013. The Lancet 2015; 386(10010): 2287–2323.
(2) Statistics on smoking, England 2016, May 2016; http://content.digital.nhs.uk/catalogue/PUB20781
(3) Towards a Smokefree Generation: A Tobacco Control Plan for England, July 2017; https://www.gov.uk/government/publications/towards-a-smoke-free-generation-tobacco-control-plan-for-england
(4) Public Health Outcomes Framework 2016 to 2019, August 2016; https://www.gov.uk/government/publications/public-health-outcomes-framework-2016-to-2019

Definition of numerator
Cancer registrations for oral cancer (ICD-10 C00-C14) in the three-year calendar periods 2007-09 to 2017-19. The National Cancer Registration and Analysis Service collects data relating to each new diagnosis of cancer that occurs in England. This does not include secondary cancers. Data are reported according to the calendar year in which the cancer was diagnosed.

Definition of denominator
Population-years (ONS mid-year population estimates aggregated for the respective years) for people of all ages, aggregated into quinary age bands (0-4, 5-9, …, 85-89, 90+).

Caveats
Reviews of the quality of UK cancer registry data [1, 2] have concluded that registrations are largely complete, accurate and reliable. The data on cancer registration 'quality indicators' (mortality to incidence ratios, zero survival cases and unspecified site) demonstrate that, although there is some variability, overall ascertainment and reliability are good. However, cancer registrations are continuously being updated, so the number of registrations for each year may not be complete: there is a small but steady stream of late registrations, some of which only come to light through death certification.
[1] Huggett C (1995). Review of the Quality and Comparability of Data held by Regional Cancer Registries. Bristol: Bristol Cancer Epidemiology Unit incorporating the South West Cancer Registry.
[2] Seddon DJ, Williams EMI (1997). Data quality in population based cancer registration. British Journal of Cancer 76: 667-674.

The data presented here replace versions previously published. Population data and the European Standard Population have been revised. ONS have provided an explanation of the change in standard population (available at http://www.ons.gov.uk/ons/guide-method/user-guidance/health-and-life-events/revised-european-standard-population-2013--2013-esp-/index.html )
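As a rough illustration of how a directly age-standardised rate combines the numerator and denominator defined above, the sketch below weights each age band's crude rate (registrations divided by population-years) by the standard population for that band. It assumes the usual direct standardisation formula. The function name `directly_standardised_rate` and the three-band figures in the usage note are invented for illustration; they are not the indicator's data or the 2013 European Standard Population weights.

```python
from typing import Sequence


def directly_standardised_rate(
    cases: Sequence[float],
    person_years: Sequence[float],
    std_pop: Sequence[float],
    per: float = 100_000,
) -> float:
    """Directly age-standardised rate per `per` standard population.

    cases[i]        registrations in age band i
    person_years[i] population-years at risk in age band i
    std_pop[i]      standard population weight for age band i
    """
    if not (len(cases) == len(person_years) == len(std_pop)):
        raise ValueError("all inputs must cover the same age bands")
    if any(n <= 0 for n in person_years):
        raise ValueError("population-years must be positive in every band")

    # Weight each age-specific rate by the standard population in that band.
    weighted = sum(w * c / n for c, n, w in zip(cases, person_years, std_pop))
    return per * weighted / sum(std_pop)


# Toy example with three hypothetical age bands (not real data).
toy_rate = directly_standardised_rate(
    cases=[2, 10, 30],
    person_years=[100_000, 50_000, 20_000],
    std_pop=[50_000, 30_000, 20_000],
)
```

In this toy example the standardised rate is 37 per 100,000, whereas the crude rate (42 registrations over 170,000 population-years) is about 24.7 per 100,000. The gap arises because the standard population gives more weight to the oldest band than the toy population does.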
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Purpose: The dataset is designed to explore the potential relationship between lifestyle habits and the probability of developing cancer.

Variables:
- Sr No.: A unique identifier for each observation.
- Smoking Habit: Categorizes individuals based on their smoking frequency (e.g., Heavy, Moderate, Occasional, None).
- Drinking Habit: Categorizes individuals based on their alcohol consumption frequency (e.g., Frequent, Occasional, None).
- Biking Habit: Measures the frequency of biking activity (e.g., High, Medium, Low).
- Walking Habit: Measures the frequency of walking activity (e.g., High, Medium, Low).
- Jogging Habit: Measures the frequency of jogging activity (e.g., High, Medium, Low).
- Probability of Cancer: A numerical value representing the estimated likelihood of developing cancer, ranging from 0 to 1.

Assumptions: The dataset assumes a causal relationship between lifestyle habits and cancer risk. However, correlation does not necessarily imply causation, and other factors may influence cancer development. The probability of cancer is a simplified representation and may vary based on individual factors, genetics, and environmental influences.

Potential Use Cases:
- Exploratory Analysis: To identify potential correlations between lifestyle habits and cancer risk.
- Predictive Modeling: To build models that predict the probability of cancer based on lifestyle factors.
- Public Health Initiatives: To inform public health campaigns and interventions aimed at promoting healthy lifestyles and reducing cancer risk.