64 datasets found
  1. Cancer County-Level

    • kaggle.com
    zip
    Updated Dec 3, 2022
    Cite
    The Devastator (2022). Cancer County-Level [Dataset]. https://www.kaggle.com/datasets/thedevastator/exploring-county-level-correlations-in-cancer-ra
    Available download formats: zip (146998 bytes)
    Dataset updated
    Dec 3, 2022
    Authors
    The Devastator
    Description

    Exploring County-Level Correlations in Cancer Rates and Trends

    A Multivariate Ordinary Least Squares Regression Model

    By Noah Rippner [source]

    About this dataset

    This dataset offers an opportunity to examine patterns and trends in cancer rates across the United States at the individual county level. Using data from cancer.gov and the US Census American Community Survey, it shows how the age-adjusted death rate, average deaths per year, and recent trends vary between counties, along with other key metrics such as average annual counts, whether an objective of 45.5 was met, and the recent trend (2) in death rates. Linear regression models built on this data can identify correlations between variables, which helps in understanding cancer prevalence across counties over time and in targeting health initiatives and resources more accurately.

    How to use the dataset

    This Kaggle dataset combines county-level data from the US Census American Community Survey and cancer.gov for exploring correlations between county-level cancer rates, trends, and mortality statistics. It contains records from all U.S. counties covering the age-adjusted death rate, average deaths per year, recent trend (2) in death rates, average annual count of cases detected within 5 years, and whether an objective of 45.5 (1) was met in the county associated with each row.

    To use this dataset to its fullest potential, you should be comfortable with basic descriptive analytics: calculating summary statistics such as the mean and median; summarizing categorical variables with frequency tables; creating visualizations such as charts and histograms; applying linear regression or other machine learning techniques such as support vector machines (SVMs), random forests, or neural networks; distinguishing supervised from unsupervised learning; reviewing diagnostic tests to evaluate your models; interpreting your findings; hypothesizing reasons for the patterns that appear during exploration; and communicating results through effective slides or documents. This understanding will let you apply different methods of analysis to the dataset accurately and effectively.

    Once these concepts are understood, you can start exploring the dataset by importing it into your tool of choice, such as Tableau Public or Desktop, QlikView, the SAS analytics suite, or a Python notebook, where predictive models can be built with packages such as scikit-learn. A brief description of the table's column structure is provided above. With a working knowledge of basic SQL, you can run statistical operations as simple queries, select subsets of columns under given conditions, and sort by specific attributes. In Python, you can parse the portion of the data you need, group and aggregate categories before modeling, and join additional tables where necessary; the details vary with the tool used. From there, you can analyze the available features in depth by building correlation and covariance matrices, plotting distributions, and using scatter plots of selected metrics to reveal trends.
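
    As a rough illustration of the correlation and regression steps described above, the following Python sketch computes a Pearson correlation and a one-predictor least-squares fit. The rows and the column names poverty_pct and death_rate are made up for the example and are not taken from the Kaggle file; the dataset's own model is described as multivariate, so this single-predictor version is only a simplified stand-in.

```python
import math

# Hypothetical county rows; the real file's column names and values may differ.
counties = [
    {"poverty_pct": 10.0, "death_rate": 150.0},
    {"poverty_pct": 15.0, "death_rate": 170.0},
    {"poverty_pct": 20.0, "death_rate": 185.0},
    {"poverty_pct": 25.0, "death_rate": 210.0},
]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ols_fit(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


x = [row["poverty_pct"] for row in counties]
y = [row["death_rate"] for row in counties]
r = pearson_r(x, y)
slope, intercept = ols_fit(x, y)
print(f"r={r:.3f} slope={slope:.2f} intercept={intercept:.2f}")
```

    With real data, the same two functions could be applied to any pair of numeric columns after the rows are read from the downloaded file.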

    Research Ideas

    • Building a predictive cancer incidence model based on county-level demographic data to identify high-risk areas and target public health interventions.
    • Analyzing correlations between age-adjusted death rate, average annual count, and recent trends in order to develop more effective policy initiatives for cancer prevention and healthcare access.
    • Utilizing the dataset to construct a machine learning algorithm that can predict county-level mortality rates based on socio-economic factors such as poverty levels and educational attainment rates

    Acknowledgements

    If you use this dataset i...

  2. Cervical Cancer Risk Classification - Dataset - CKAN

    • data.poltekkes-smg.ac.id
    Updated Oct 7, 2024
    Cite
    (2024). Cervical Cancer Risk Classification - Dataset - CKAN [Dataset]. https://data.poltekkes-smg.ac.id/dataset/cervical-cancer-risk-classification
    Dataset updated
    Oct 7, 2024
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Cervical Cancer Risk Factors for Biopsy: This dataset was obtained from the UCI Repository, which is kindly acknowledged. This file contains a list of risk factors for cervical cancer leading to a biopsy examination. About 11,000 new cases of invasive cervical cancer are diagnosed each year in the U.S. However, the number of new cervical cancer cases has been declining steadily over the past decades. Although it is the most preventable type of cancer, each year cervical cancer kills about 4,000 women in the U.S. and about 300,000 women worldwide. In the United States, cervical cancer mortality rates plunged by 74% from 1955 - 1992 thanks to increased screening and early detection with the Pap test.

    AGE

    Fifty percent of cervical cancer diagnoses occur in women ages 35 - 54, and about 20% occur in women over 65 years of age. The median age of diagnosis is 48 years. About 15% of women develop cervical cancer between the ages of 20 - 30. Cervical cancer is extremely rare in women younger than age 20. However, many young women become infected with multiple types of human papilloma virus, which then can increase their risk of getting cervical cancer in the future. Young women with early abnormal changes who do not have regular examinations are at high risk for localized cancer by the time they are age 40, and for invasive cancer by age 50.

    SOCIOECONOMIC AND ETHNIC FACTORS

    Although the rate of cervical cancer has declined among both Caucasian and African-American women over the past decades, it remains much more prevalent in African-Americans, whose death rates are twice as high as those of Caucasian women. Hispanic American women have more than twice the risk of invasive cervical cancer as Caucasian women, also due to a lower rate of screening. These differences, however, are almost certainly due to social and economic differences. Numerous studies report that high poverty levels are linked with low screening rates. In addition, lack of health insurance, limited transportation, and language difficulties hinder a poor woman’s access to screening services.

    HIGH SEXUAL ACTIVITY

    Human papilloma virus (HPV) is the main risk factor for cervical cancer. In adults, the most important risk factor for HPV is sexual activity with an infected person. Women most at risk for cervical cancer are those with a history of multiple sexual partners, sexual intercourse at age 17 years or younger, or both. A woman who has never been sexually active has a very low risk for developing cervical cancer. Sexual activity with multiple partners increases the likelihood of many other sexually transmitted infections (chlamydia, gonorrhea, syphilis). Studies have found an association between chlamydia and cervical cancer risk, including the possibility that chlamydia may prolong HPV infection.

    FAMILY HISTORY

    Women have a higher risk of cervical cancer if they have a first-degree relative (mother, sister) who has had cervical cancer.

    USE OF ORAL CONTRACEPTIVES

    Studies have reported a strong association between cervical cancer and long-term use of oral contraception (OC). Women who take birth control pills for more than 5 - 10 years appear to have a much higher risk of HPV infection (up to four times higher) than those who do not use OCs. (Women taking OCs for fewer than 5 years do not have a significantly higher risk.) The reasons for this risk from OC use are not entirely clear. Women who use OCs may be less likely to use a diaphragm, condoms, or other methods that offer some protection against sexually transmitted diseases, including HPV. Some research also suggests that the hormones in OCs might help the virus enter the genetic material of cervical cells.

    HAVING MANY CHILDREN

    Studies indicate that having many children increases the risk for developing cervical cancer, particularly in women infected with HPV.

    SMOKING

    Smoking is associated with a higher risk for precancerous changes (dysplasia) in the cervix and for progression to invasive cervical cancer, especially for women infected with HPV.

    IMMUNOSUPPRESSION

    Women with weak immune systems (such as those with HIV / AIDS) are more susceptible to acquiring HPV. Immunocompromised patients are also at higher risk for having cervical precancer develop rapidly into invasive cancer.

    DIETHYLSTILBESTROL (DES)

    From 1938 - 1971, diethylstilbestrol (DES), an estrogen-related drug, was widely prescribed to pregnant women to help prevent miscarriages. The daughters of these women face a higher risk for cervical cancer. DES is no longer prescribed.

  3. National Lung Screening Trial

    • cancerimagingarchive.net
    • stage.cancerimagingarchive.net
    dicom, docx, n/a +2
    Updated Sep 24, 2021
    Cite
    The Cancer Imaging Archive (2021). National Lung Screening Trial [Dataset]. http://doi.org/10.7937/TCIA.HMQ8-J677
    Available download formats: docx, svs, dicom, n/a, sas, zip, and doc
    Dataset updated
    Sep 24, 2021
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    Sep 24, 2021
    Dataset funded by
    National Cancer Institute http://www.cancer.gov/
    Description


    Demographic Summary of Available Imaging

    Characteristic: Value (N = 26254)

    Age (years)
    • Mean ± SD: 61.4 ± 5
    • Median (IQR): 60 (57-65)
    • Range: 43-75

    Sex
    • Male: 15512 (59%)
    • Female: 10742 (41%)

    Race
    • White: 23969 (91.3%)
    • Black: 1135 (4.3%)
    • Asian: 547 (2.1%)
    • American Indian/Alaska Native: 88 (0.3%)
    • Native Hawaiian/Other Pacific Islander: 87 (0.3%)
    • Unknown: 428 (1.6%)

    Ethnicity
    • Not Available

    Background: The aggressive and heterogeneous nature of lung cancer has thwarted efforts to reduce mortality from this cancer through the use of screening. The advent of low-dose helical computed tomography (CT) altered the landscape of lung-cancer screening, with studies indicating that low-dose CT detects many tumors at early stages. The National Lung Screening Trial (NLST) was conducted to determine whether screening with low-dose CT could reduce mortality from lung cancer.

    Methods: From August 2002 through April 2004, we enrolled 53,454 persons at high risk for lung cancer at 33 U.S. medical centers. Participants were randomly assigned to undergo three annual screenings with either low-dose CT (26,722 participants) or single-view posteroanterior chest radiography (26,732). Data were collected on cases of lung cancer and deaths from lung cancer that occurred through December 31, 2009. This dataset includes the low-dose CT scans from 26,254 of these subjects, as well as digitized histopathology images from 451 subjects.

    Results: The rate of adherence to screening was more than 90%. The rate of positive screening tests was 24.2% with low-dose CT and 6.9% with radiography over all three rounds. A total of 96.4% of the positive screening results in the low-dose CT group and 94.5% in the radiography group were false positive results. The incidence of lung cancer was 645 cases per 100,000 person-years (1060 cancers) in the low-dose CT group, as compared with 572 cases per 100,000 person-years (941 cancers) in the radiography group (rate ratio, 1.13; 95% confidence interval [CI], 1.03 to 1.23). There were 247 deaths from lung cancer per 100,000 person-years in the low-dose CT group and 309 deaths per 100,000 person-years in the radiography group, representing a relative reduction in mortality from lung cancer with low-dose CT screening of 20.0% (95% CI, 6.8 to 26.7; P=0.004). The rate of death from any cause was reduced in the low-dose CT group, as compared with the radiography group, by 6.7% (95% CI, 1.2 to 13.6; P=0.02).
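
    The rate ratio and relative mortality reduction quoted in the Results can be approximately reproduced from the rounded per-100,000 rates given there. This is only an arithmetic check, not the trial's analysis: the variable names are illustrative, and because the inputs are already rounded the reduction comes out slightly above the reported 20.0%.

```python
# Rounded rates per 100,000 person-years, as quoted in the Results paragraph.
ct_incidence = 645.0      # lung cancer cases, low-dose CT group
xray_incidence = 572.0    # lung cancer cases, radiography group
ct_mortality = 247.0      # lung cancer deaths, low-dose CT group
xray_mortality = 309.0    # lung cancer deaths, radiography group

# Incidence rate ratio: CT group relative to radiography group.
rate_ratio = ct_incidence / xray_incidence

# Relative reduction in lung cancer mortality with low-dose CT.
relative_reduction = 1.0 - ct_mortality / xray_mortality

print(f"rate ratio: {rate_ratio:.2f}")
print(f"relative mortality reduction: {relative_reduction:.1%}")
```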

    Conclusions: Screening with the use of low-dose CT reduces mortality from lung cancer. (Funded by the National Cancer Institute; National Lung Screening Trial ClinicalTrials.gov number, NCT00047385).

    Data Availability: A summary of the National Lung Screening Trial and its available datasets is provided on the Cancer Data Access System (CDAS). CDAS is maintained by Information Management System (IMS), contracted by the National Cancer Institute (NCI) as keepers and statistical analyzers of the NLST trial data. The full clinical data set from NLST is available through CDAS. Users of TCIA can download without restriction a publicly distributable subset of that clinical data, along with the CT and histopathology images collected during the trial. (These were previously restricted.)

  4. Cancer incidence, by selected sites of cancer and sex, three-year average,...

    • www150.statcan.gc.ca
    • data.urbandatacentre.ca
    • +2more
    Updated Feb 14, 2018
    Cite
    Government of Canada, Statistics Canada (2018). Cancer incidence, by selected sites of cancer and sex, three-year average, census metropolitan areas [Dataset]. http://doi.org/10.25318/1310011201-eng
    Dataset updated
    Feb 14, 2018
    Dataset provided by
    Statistics Canada https://statcan.gc.ca/en
    Area covered
    Canada
    Description

    Age standardized rate of cancer incidence, by selected sites of cancer and sex, three-year average, census metropolitan areas.
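
    For readers unfamiliar with the term, an age-standardized rate is commonly obtained by direct standardization: each age group's rate is weighted by that group's share of a standard population. The sketch below shows the general technique only; the age bands, rates, and weights are invented for illustration and are not Statistics Canada's figures or its documented method.

```python
def age_standardized_rate(age_specific_rates, standard_weights):
    """Directly standardized rate: weighted mean of age-specific rates.

    Both arguments are dicts keyed by age group. Weights are normalized,
    so they may be population counts or proportions.
    """
    if set(age_specific_rates) != set(standard_weights):
        raise ValueError("rates and weights must cover the same age groups")
    total_weight = sum(standard_weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted = sum(
        age_specific_rates[group] * standard_weights[group]
        for group in age_specific_rates
    )
    return weighted / total_weight


# Hypothetical rates per 100,000 and hypothetical standard-population shares.
rates = {"0-39": 50.0, "40-64": 400.0, "65+": 1800.0}
weights = {"0-39": 0.5, "40-64": 0.3, "65+": 0.2}
standardized = age_standardized_rate(rates, weights)
print(f"{standardized:.1f} per 100,000")
```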

  5. A geospatiotemporal and causal inference epidemiological exploration of...

    • researchdata.edu.au
    • data.mendeley.com
    Updated Aug 12, 2021
    Cite
    Psychiatry; Albert Stuart Reece (2021). A geospatiotemporal and causal inference epidemiological exploration of substance and cannabinoid exposure as drivers of rising US pediatric cancer rates [Dataset]. http://doi.org/10.17632/CNWV9HDSPD.1
    Dataset updated
    Aug 12, 2021
    Dataset provided by
    Edith Cowan University
    Authors
    Psychiatry; Albert Stuart Reece
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United States
    Description

    Background. Age-adjusted US total pediatric cancer incidence rates (TPCIR) rose 49% 1975-2015 for unknown reasons. Prenatal cannabis exposure has been linked with several pediatric cancers which together comprise the majority of pediatric cancer types. We investigated whether cannabis use was related spatiotemporally and causally to TPCIR.

    Methods. State-based age-adjusted TPCIR data was taken from the CDC Surveillance, Epidemiology and End Results cancer database 2003-2017. Drug exposure was taken from the nationally-representative National Survey of Drug Use and Health, response rate 74.1%. Drugs included were: tobacco, alcohol, cannabis, opioid analgesics and cocaine. This was supplemented by cannabinoid concentration data from the Drug Enforcement Agency and ethnicity and median household income data from US Census.

    Results. TPCIR rose while all drug use nationally fell, except for cannabis which rose. TPCIR in the highest cannabis use quintile was greater than in the lowest (β-estimate=1.31 (95% C.I. 0.82, 1.80), P=1.80x10^-7) and the time:highest two quintiles interaction was significant (β-estimate=0.1395 (0.82, 1.80), P=1.00x10^-14). In robust inverse probability weighted additive regression models cannabis was independently associated with TPCIR (β-estimate=9.55 (3.95, 15.15), P=0.0016). In interactive geospatiotemporal models including all drug, ethnic and income variables cannabis use was independently significant (β-estimate=45.67 (18.77, 72.56), P=0.0009). In geospatial models temporally lagged to 1, 2, 4 and 6 years interactive terms including cannabis were significant. Cannabis interactive terms at one and two degrees of spatial lagging were significant (from β-estimate=3954.04 (1565.01, 6343.09), P=0.0012). The interaction between the cannabinoids THC and cannabigerol was significant at zero, 2 and 6 years lag (from β-estimate=46.22 (30.06, 62.38), P=2.10x10^-8). Cannabis legalization was associated with higher TPCIR (β-estimate=1.51 (0.68, 2.35), P=0.0004) and cannabis-liberal regimes were associated with higher time:TPCIR interaction (β-estimate=1.87x10^-4 (2.9x10^-5, 2.45x10^-4), P=0.0208). 33/56 minimum e-Values were >5 and 6 were infinite.

    Conclusion. Data confirm a close relationship across space and lagged time between cannabis and TPCIR which was robust to adjustment, supported by inverse probability weighting procedures and accompanied by high e-Values making confounding unlikely and establishing the causal relationship. Cannabis-liberal jurisdictions were associated with higher rates of TPCIR and a faster rate of TPCIR increase. Data inform the broader general consideration of cannabinoid-induced genotoxicity.

  6. Cancer Statistics in US States

    • kaggle.com
    zip
    Updated Jun 17, 2022
    Cite
    Ms. Nancy Al Aswad (2022). Cancer Statistics in US States [Dataset]. https://www.kaggle.com/nancyalaswad90/cancer-statistics-in-us-states
    Available download formats: zip (3328656 bytes)
    Dataset updated
    Jun 17, 2022
    Authors
    Ms. Nancy Al Aswad
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    United States
    Description

    What are Cancer Statistics in US States?

    The circled group of good survivors has genetic indicators of poor survivors (i.e. low ESR1 levels, which is typically the prognostic indicator of poor outcomes in breast cancer) – understanding this group could be critical for helping improve mortality rates for this disease. Why this group survived was quickly analysed by using the Outcome Column (here Event Death - which is binary - 0,1) as a Data Lens (which we term Supervised vs Unsupervised analyses).

    How to use this dataset

    • A network was built using only gene expression with 272 breast cancer patients (as rows), and 1570 columns.

    • Metadata includes patient info, treatment, and survival.

    • Each node is a group of patients similar to each other. Flares (left) represent sub-populations that are distinct from the larger population. (One differentiating factor between the two flares is estrogen expression (low = top flare, high = bottom flare)).

    • A bottom flare is a group of patients with 100% survival. The top flare shows a range of survival – very poor towards the tip (red), and very good near the base (circled).

    Acknowledgments

    When we use this dataset in our research, we credit the authors as :

    The main idea behind uploading this dataset is to practice data analysis with my students. I work at a college and want my students to apply what we study to a large dataset. It may not be up to date, and I mention the collection years, but it is a good resource for practice.

  7. One year survival from all cancers - ICP Outcomes Framework - Registered...

    • cityobservatory.birmingham.gov.uk
    csv, excel, geojson +1
    Updated Sep 9, 2025
    Cite
    (2025). One year survival from all cancers - ICP Outcomes Framework - Registered Locality [Dataset]. https://cityobservatory.birmingham.gov.uk/explore/dataset/one-year-survival-from-all-cancers-icp-outcomes-framework-registered-locality/
    Available download formats: csv, excel, json, geojson
    Dataset updated
    Sep 9, 2025
    License

    Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    This dataset provides insights into one-year survival rates from all cancers, serving as a key indicator of early cancer outcomes. It measures the proportion of individuals diagnosed with an invasive cancer who survive for at least one year following their diagnosis. The dataset includes all invasive tumours classified under ICD-10 codes C00 to C97, excluding non-melanoma skin cancer (C44). It supports analysis across different population groups and geographies, including ethnicity, deprivation levels, and the Birmingham and Solihull (BSol) area.

    Rationale

    Improving one-year survival rates is a critical goal in cancer care, as it reflects the effectiveness of early diagnosis and initial treatment. This indicator helps monitor progress in reducing early mortality from cancer and supports targeted interventions to improve outcomes.

    Numerator

    The numerator includes individuals who were diagnosed with a specific type of cancer and died from the same type of cancer within one year of diagnosis. Only invasive cancers are included, as defined by ICD-10 codes C00 to C97, excluding non-melanoma skin cancer (C44). Data is sourced from the National Cancer Registration and Analysis Service (NCRAS).

    Denominator

    The denominator comprises all individuals diagnosed with an invasive cancer (ICD-10 codes C00 to C97, excluding C44) within a five-year period. This data is also sourced from the National Cancer Registration and Analysis Service (NCRAS).
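
    The description does not spell out how the numerator and denominator are combined. One plausible reading, since the numerator counts deaths within one year and the indicator reports survival, is that survival is the complement of their ratio. The sketch below works through that reading; the function name and the example counts are hypothetical, and the formula itself is an assumption rather than the publisher's documented method.

```python
def one_year_survival_pct(deaths_within_one_year, diagnosed):
    """Percentage surviving at least one year, assuming
    survival = 1 - (deaths within one year / all diagnosed)."""
    if diagnosed <= 0:
        raise ValueError("diagnosed must be positive")
    if not 0 <= deaths_within_one_year <= diagnosed:
        raise ValueError("deaths must be between 0 and diagnosed")
    return 100.0 * (1.0 - deaths_within_one_year / diagnosed)


# Hypothetical locality: 1,000 invasive cancers diagnosed over the period,
# of which 250 patients died of the same cancer within a year of diagnosis.
survival = one_year_survival_pct(250, 1000)
print(f"{survival:.1f}%")
```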

    Caveats

    This dataset uses a simplified methodology that differs from the national calculation of one-year cancer survival. As a result, the figures presented here may not align with nationally published statistics. However, this approach enables the provision of survival data disaggregated by ethnicity, deprivation, and local geographies such as BSol, which is not always possible with national data.

    External references

    For more information, visit the National Cancer Registration and Analysis Service (NCRAS).

    Localities Explained

    This dataset contains data based on either the resident locality or the registered locality of the patient. A distinction is made between resident locality and registered locality populations:

    Resident Locality refers to individuals who live within the defined geographic boundaries of the locality. These boundaries are aligned with official administrative areas such as wards and Lower Layer Super Output Areas (LSOAs).

    Registered Locality refers to individuals who are registered with GP practices that are assigned to a locality based on the Primary Care Network (PCN) they belong to. These assignments are approximate: PCNs are mapped to a locality based on the location of most of their GP surgeries. As a result, locality-registered patients may live outside the locality, sometimes even in different towns or cities.

    This distinction is important because some health indicators are only available at GP practice level, without information on where patients actually reside. In such cases, data is attributed to the locality based on GP registration, not residential address.

    Click here to explore more from the Birmingham and Solihull Integrated Care Partnerships Outcome Framework.

  8. Colorectal Cancer Global Dataset & Predictions

    • kaggle.com
    zip
    Updated Feb 27, 2025
    Cite
    Ankush Panday (2025). Colorectal Cancer Global Dataset & Predictions [Dataset]. https://www.kaggle.com/datasets/ankushpanday2/colorectal-cancer-global-dataset-and-predictions
    Available download formats: zip (4118299 bytes)
    Dataset updated
    Feb 27, 2025
    Authors
    Ankush Panday
    License

    MIT License https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset contains real-world information about colorectal cancer cases from different countries. It includes patient demographics, lifestyle risks, medical history, cancer stage, treatment types, survival chances, and healthcare costs. The dataset follows global trends in colorectal cancer incidence, mortality, and prevention.

    Use this dataset to build models for cancer prediction, survival analysis, healthcare cost estimation, and disease risk factors.

    Dataset Structure Each row represents an individual case, and the columns include:

    • Patient_ID (Unique identifier)
    • Country (Based on incidence distribution)
    • Age (Following colorectal cancer age trends)
    • Gender (M/F, considering men have 30-40% higher risk)
    • Cancer_Stage (Localized, Regional, Metastatic)
    • Tumor_Size_mm (Randomized within medical limits)
    • Family_History (Yes/No)
    • Smoking_History (Yes/No)
    • Alcohol_Consumption (Yes/No)
    • Obesity_BMI (Normal/Overweight/Obese)
    • Diet_Risk (Low/Moderate/High)
    • Physical_Activity (Low/Moderate/High)
    • Diabetes (Yes/No)
    • Inflammatory_Bowel_Disease (Yes/No)
    • Genetic_Mutation (Yes/No)
    • Screening_History (Regular/Irregular/Never)
    • Early_Detection (Yes/No)
    • Treatment_Type (Surgery/Chemotherapy/Radiotherapy/Combination)
    • Survival_5_years (Yes/No)
    • Mortality (Yes/No)
    • Healthcare_Costs (Country-dependent, $25K-$100K+)
    • Incidence_Rate_per_100K (Country-level prevalence)
    • Mortality_Rate_per_100K (Country-level mortality)
    • Urban_or_Rural (Urban/Rural)
    • Economic_Classification (Developed/Developing)
    • Healthcare_Access (Low/Moderate/High)
    • Insurance_Status (Insured/Uninsured)
    • Survival_Prediction (Yes/No, based on factors)
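
    As a first look at a table with this structure, one might tabulate the share of five-year survivors per cancer stage. The sketch below uses the column names Cancer_Stage and Survival_5_years from the list above, but the six rows are invented sample data, and the exact spelling of values in the real file (for example "Yes" versus "yes") is an assumption.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows; the real download is a much larger CSV.
SAMPLE_CSV = """Patient_ID,Cancer_Stage,Survival_5_years
1,Localized,Yes
2,Localized,Yes
3,Regional,Yes
4,Regional,No
5,Metastatic,No
6,Localized,No
"""


def survival_share_by_stage(csv_text):
    """Return {stage: fraction of rows with Survival_5_years == 'Yes'}."""
    tallies = defaultdict(lambda: [0, 0])  # stage -> [survived, total]
    for row in csv.DictReader(io.StringIO(csv_text)):
        tally = tallies[row["Cancer_Stage"]]
        tally[1] += 1
        if row["Survival_5_years"] == "Yes":
            tally[0] += 1
    return {stage: survived / total for stage, (survived, total) in tallies.items()}


shares = survival_share_by_stage(SAMPLE_CSV)
for stage, share in shares.items():
    print(f"{stage}: {share:.0%}")
```

    To run this against the actual file, the text passed to the function would come from reading the extracted CSV instead of the inline sample.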

  9. Association of Arsenic Exposure with Lung Cancer Incidence Rates in the...

    • plos.figshare.com
    txt
    Updated May 31, 2023
    Cite
    Joseph J. Putila; Nancy Lan Guo (2023). Association of Arsenic Exposure with Lung Cancer Incidence Rates in the United States [Dataset]. http://doi.org/10.1371/journal.pone.0025886
    Available download formats: txt
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS http://plos.org/
    Authors
    Joseph J. Putila; Nancy Lan Guo
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United States
    Description

    Background: Although strong exposure to arsenic has been shown to be carcinogenic, its contribution to lung cancer incidence in the United States is not well characterized. We sought to determine if the low-level exposures to arsenic seen in the U.S. are associated with lung cancer incidence after controlling for possible confounders, and to assess the interaction with smoking behavior.

    Methodology: Measurements of arsenic stream sediment and soil concentration obtained from the USGS National Geochemical Survey were combined, respectively, with 2008 BRFSS estimates on smoking prevalence and 2000 U.S. Census county level income to determine the effects of these factors on lung cancer incidence, as estimated from respective state-wide cancer registries and the SEER database. Poisson regression was used to determine the association between each variable and age-adjusted county-level lung cancer incidence. ANOVA was used to assess interaction effects between covariates.

    Principal Findings: Sediment levels of arsenic were significantly associated with an increase in incident cases of lung cancer (P

  10. The Combined Effect of Individual and Neighborhood Socioeconomic Status on...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    doc
    Updated May 31, 2023
    Cite
    Chun-Ming Chang; Yu-Chieh Su; Ning-Sheng Lai; Kuang-Yung Huang; Sou-Hsin Chien; Yu-Han Chang; Wei-Cheng Lian; Ta-Wen Hsu; Ching-Chih Lee (2023). The Combined Effect of Individual and Neighborhood Socioeconomic Status on Cancer Survival Rates [Dataset]. http://doi.org/10.1371/journal.pone.0044325
    Available download formats: doc
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS http://plos.org/
    Authors
    Chun-Ming Chang; Yu-Chieh Su; Ning-Sheng Lai; Kuang-Yung Huang; Sou-Hsin Chien; Yu-Han Chang; Wei-Cheng Lian; Ta-Wen Hsu; Ching-Chih Lee
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: This population-based study investigated the relationship between individual and neighborhood socioeconomic status (SES) and mortality rates for major cancers in Taiwan.

    Methods: A population-based follow-up study was conducted with 20,488 cancer patients diagnosed in 2002. Each patient was traced to death or for 5 years. The individual income-related insurance payment amount was used as a proxy measure of individual SES for patients. Neighborhood SES was defined by income, and neighborhoods were grouped as living in advantaged or disadvantaged areas. The Cox proportional hazards model was used to compare the death-free survival rates between the different SES groups after adjusting for possible confounding and risk factors.

    Results: After adjusting for patient characteristics (age, gender, Charlson Comorbidity Index Score, urbanization, and area of residence), tumor extent, treatment modalities (operation and adjuvant therapy), and hospital characteristics (ownership and teaching level), colorectal cancer, and head and neck cancer patients under 65 years old with low individual SES in disadvantaged neighborhoods conferred a 1.5 to 2-fold higher risk of mortality, compared with patients with high individual SES in advantaged neighborhoods. A cross-level interaction effect was found in lung cancer and breast cancer. Lung cancer and breast cancer patients less than 65 years old with low SES in advantaged neighborhoods carried the highest risk of mortality. Prostate cancer patients aged 65 and above with low SES in disadvantaged neighborhoods incurred the highest risk of mortality. There was no association between SES and mortality for cervical cancer and pancreatic cancer.

    Conclusions: Our findings indicate that cancer patients with low individual SES have the highest risk of mortality even under a universal health-care system. Public health strategies and welfare policies must continue to focus on this vulnerable group.

  11. Data from: Factors that affect survival in vaginal cancer: a seer analysis

    • tandf.figshare.com
    Updated Mar 21, 2024
    Cite
    Batuhan Bakirarar; Muberra Namli Kalem; Ziya Kalem (2024). Factors that affect survival in vaginal cancer: a seer analysis [Dataset]. http://doi.org/10.6084/m9.figshare.19745627.v1
    Explore at:
    Available download formats: application/x-dosexec
    Dataset updated
    Mar 21, 2024
    Dataset provided by
    Taylor & Francis https://taylorandfrancis.com/
    Authors
    Batuhan Bakirarar; Muberra Namli Kalem; Ziya Kalem
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This study aimed to investigate the factors that affect survival in vaginal cancer by means of a large population-based database that had been monitored over a 42-year period (1975–2017), and to determine which factors were most predictive of survival. The study evaluated the factors that affect survival in primary vaginal cancer, one of the rarest gynaecological cancers. Relationships were explored between survival and age and race of patient, in situ/invasive behaviour of tumour, histological type, stage, grade, surgical treatment, and year of diagnosis. Survival rate was found to be higher at younger ages and earlier stages, in in situ and squamous cell carcinomas, in the presence of previous surgery, and with diagnosis from 2000 onward. It was shown that other causes were more predictive of mortality in older patients and that mortality due to other causes decreased in patients diagnosed from 2000 onward. Mortality due to cancer was found to be lower in patients who had undergone surgery. At the end of this study, an estimation model was developed for 10-year survival in vaginal cancer and software was created for the model.

    Impact Statement

    What is already known on this subject? Primary vaginal cancer is very rare, accounting for 2% of female genital tract malignancies. Due to its low incidence and the difficulty of its final diagnosis, vaginal cancer has the least amount of data among all female genital tract malignancies. It is difficult for clinicians to estimate survival with the already limited data on vaginal cancer in the literature.

    What do the results of this study add? Survival rate was found to be higher at younger ages and earlier stages, in in situ and squamous cell carcinomas, in the presence of previous surgery, and with diagnosis from 2000 onward. It was shown that other causes contributed more to mortality at older ages and that mortality due to other causes decreased in patients diagnosed from 2000 onward. Mortality due to cancer was found to be lower in patients who had undergone surgery.

    What are the implications of these findings for clinical practice and/or further research? It is anticipated that such studies will contribute to making societal data collection methods prospective in nature and lead the way for stronger survival estimation models to be developed in the future.

  12. The associations of sitting time and physical activity on total and...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    pdf
    Updated Jun 1, 2023
    Cite
    Vegar Rangul; Erik R. Sund; Paul Jarle Mork; Oluf Dimitri Røe; Adrian Bauman (2023). The associations of sitting time and physical activity on total and site-specific cancer incidence: Results from the HUNT study, Norway [Dataset]. http://doi.org/10.1371/journal.pone.0206015
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS http://plos.org/
    Authors
    Vegar Rangul; Erik R. Sund; Paul Jarle Mork; Oluf Dimitri Røe; Adrian Bauman
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Norway
    Description

    Background: Sedentary behavior is thought to pose different risks to those attributable to physical inactivity. However, few studies have examined the association between physical activity and sitting time with cancer incidence within the same population.

    Methods: We followed 38,154 healthy Norwegian adults in the Nord-Trøndelag Health Study (HUNT) for cancer incidence from 1995–97 to 2014. Cox proportional hazards regression was used to estimate risk of site-specific and total cancer incidence by baseline sitting time and physical activity.

    Results: During the 16-year follow-up, 4,196 (11%) persons were diagnosed with cancer. We found no evidence that people who had prolonged sitting per day or had low levels of physical activity had an increased risk of total cancer incidence, compared to those who had low sitting time and were physically active. In the multivariate model, sitting ≥8 h/day was associated with 22% (95% CI, 1.05–1.42) higher risk of prostate cancer compared to sitting 16.6 MET-h/week). The joint effects of physical activity and sitting time indicated that prolonged sitting time increased the risk of CRC independent of physical activity in men.

    Conclusions: Our findings suggest that prolonged sitting and low physical activity are positively associated with colorectal, prostate, and lung cancer among men. Sitting time and physical activity were not associated with cancer incidence among women. The findings emphasize the importance of reducing sitting time and increasing physical activity.
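
    The prostate cancer result above is reported as a 22% higher risk with a 95% CI of 1.05–1.42. A minimal sketch of how such an interval relates to the point estimate, assuming the interval is symmetric on the log scale (the usual case for Cox model output, though the abstract does not say so explicitly); the function names are illustrative only.

```python
import math


def hr_from_ci(ci_low, ci_high):
    """Geometric midpoint of the interval; equals the hazard ratio
    when the CI is symmetric on the log scale."""
    return math.sqrt(ci_low * ci_high)


def log_hr_standard_error(ci_low, ci_high, z=1.96):
    """Approximate standard error of log(HR) implied by a 95% CI."""
    return (math.log(ci_high) - math.log(ci_low)) / (2 * z)


hr = hr_from_ci(1.05, 1.42)
se = log_hr_standard_error(1.05, 1.42)
print(f"implied HR = {hr:.2f}, SE of log(HR) = {se:.3f}")
```

    The implied hazard ratio rounds to 1.22, which agrees with the "22% higher risk" wording in the abstract.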

  13. Table_1_Racial and regional disparities of triple negative breast cancer...

    • frontiersin.figshare.com
    docx
    Updated May 31, 2023
    Cite
    Wei Zhang; Yuhui Bai; Caixing Sun; Zhangchun Lv; Shihua Wang (2023). Table_1_Racial and regional disparities of triple negative breast cancer incidence rates in the United States: An analysis of 2011–2019 NPCR and SEER incidence data.docx [Dataset]. http://doi.org/10.3389/fpubh.2022.1058722.s001
    Explore at:
    Available download formats: docx
    Dataset updated
    May 31, 2023
    Dataset provided by
    Frontiers Media http://www.frontiersin.org/
    Authors
    Wei Zhang; Yuhui Bai; Caixing Sun; Zhangchun Lv; Shihua Wang
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United States
    Description

    Objective: Triple negative breast cancer (TNBC) is a more aggressive subtype resistant to conventional treatments with a poorer prognosis. This study was to update the status of TNBC and the temporal changes of its incidence rate in the US.

    Methods: Women diagnosed with breast cancer during 2011–2019 were obtained from the National Program of Cancer Registries (NPCR) and Surveillance, Epidemiology and End Results (SEER) Program SEER*Stat Database which covers the entire population of the US. The TNBC incidence and its temporal trends by race, age, region (state) and disease stage were determined during the period.

    Results: A total of 238,848 (or 8.8%) TNBC women were diagnosed during the study period. TNBC occurred disproportionally higher in women of Non-Hispanic Black, younger ages, with cancer at a distant stage or poorly/undifferentiated. The age adjusted incidence rate (AAIR) for TNBC in all races decreased from 14.8 per 100,000 in 2011 to 14.0 in 2019 (annual percentage change (APC) = −0.6, P = 0.024). Incidence rates of TNBC significantly decreased with APCs of −0.8 in Non-Hispanic White women, −1.3 in West and −0.7 in Northeastern regions. Women with TNBC at the age of 35–49, 50–59, and 60–69 years, and the disease at the regional stage displayed significantly decreased trends. Among state levels, Mississippi (20.6) and Louisiana (18.9) had the highest, while Utah (9.1) and Montana (9.6) had the lowest AAIRs in 2019. New Hampshire and Indiana had significant and highest decreases, while Louisiana and Arkansas had significant and largest increases in AAIR. In individual races, TNBC displayed disparities in temporal trends among age groups, regions and disease stages. Surprisingly, Non-Hispanic White and Hispanic TNBC women (0–34 years), and Non-Hispanic Black women (≥70 years) during the entire period, as well as Asian or Pacific Islander women in the South region had increased trends between 2011 and 2017.

    Conclusion: Our study demonstrates an overall decreased trend of TNBC incidence in the past decade. Its incidence displayed disparities among races, age groups, regions and disease stages. Special attention is needed for a heavy burden in Non-Hispanic Black and increased trends in certain groups.
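
    The abstract reports an AAIR falling from 14.8 to 14.0 per 100,000 between 2011 and 2019 with an APC of −0.6. The abstract does not state how the APC was estimated; trend analyses usually fit a log-linear model over all annual rates. The sketch below is a cruder endpoint-only approximation, included only to show what an annual percentage change means arithmetically.

```python
def endpoint_apc(rate_start, rate_end, years):
    """Constant yearly percent change that takes rate_start to rate_end
    over the given number of years (endpoint-only approximation)."""
    return ((rate_end / rate_start) ** (1 / years) - 1) * 100


apc = endpoint_apc(14.8, 14.0, 2019 - 2011)
print(f"endpoint-based APC = {apc:.2f}% per year")
```

    The endpoint estimate lands near −0.7% per year, the same order as the reported −0.6; a regression over every year's rate would generally differ somewhat from this two-point shortcut.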

  14. Oncotype-DX Breast Cancer Dataset

    • kaggle.com
    zip
    Updated Aug 28, 2022
    Cite
    Ryad Z (2022). Oncotype-DX Breast Cancer Dataset [Dataset]. https://www.kaggle.com/datasets/rzemouri/oncotypedx-breast-cancer-dataset
    Explore at:
    Available download formats: zip (148884 bytes)
    Dataset updated
    Aug 28, 2022
    Authors
    Ryad Z
    Description

    Oncotype DX (ODX) is a multi-gene expression signature designed for estrogen receptor (ER)-positive and human epidermal growth factor receptor 2 (HER2)-negative breast cancer patients to predict the recurrence score (RS) and chemotherapy (CT) benefit. The aim of our study is to develop a prediction tool for the three RS categories based on deep multi-layer perceptrons (DMLP) and using only the morphoimmunohistological variables. We performed a retrospective cohort study of 320 patients who underwent ODX testing at three French hospitals. Clinico-pathological characteristics were recorded. We built a supervised machine learning classification model using Matlab software with 152 cases for training and 168 cases for testing. Three classifiers were used to learn the three risk categories of the ODX, namely low, intermediate, and high risk. Experimental results provide the area under the curve (AUC), respectively, for the three risk categories: 0.63 [95% confidence interval: (0.5446, 0.7154), p < 0.001], 0.59 [95% confidence interval: (0.5031, 0.6769), p < 0.001], 0.75 [95% confidence interval: (0.6184, 0.8816), p < 0.001]. Concordance rate between actual RS and predicted RS ranged from 53 to 56% for each class between DMLP and ODX. The concordance rate of the combined low and intermediate risk group was 85%. We developed a predictive machine learning model that could help to define a patient's RS. Moreover, we integrated histopathological data and DMLP results to select tumors for ODX testing. Thus, this process allows more relevant use of histopathological data, and optimizes and enhances this information.
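
    The description reports a concordance rate of 53 to 56% per class and 85% once the low and intermediate groups are combined. A minimal sketch of that comparison, using hypothetical labels (not data from this dataset) and an illustrative helper for merging the two lower-risk categories:

```python
def concordance_rate(actual, predicted):
    """Fraction of cases where the predicted category equals the actual one."""
    if len(actual) != len(predicted):
        raise ValueError("label lists must have the same length")
    matches = sum(a == p for a, p in zip(actual, predicted))
    return matches / len(actual)


def merge_low_intermediate(labels):
    """Collapse 'low' and 'intermediate' into a single combined category."""
    return ["low_or_intermediate" if label in ("low", "intermediate") else label
            for label in labels]


# Hypothetical recurrence-score categories for six patients.
actual = ["low", "intermediate", "high", "low", "intermediate", "high"]
predicted = ["low", "low", "high", "intermediate", "intermediate", "low"]

three_class = concordance_rate(actual, predicted)
combined = concordance_rate(merge_low_intermediate(actual),
                            merge_low_intermediate(predicted))
print(f"three-class: {three_class:.2f}, combined: {combined:.2f}")
```

    Merging categories can only keep or raise the concordance, since low/intermediate confusions stop counting as errors; this is one reason the combined figure is higher than the per-class figures.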

    Relevant Papers and Citation Request:

    Prediction of Oncotype DX recurrence score using deep multi-layer perceptrons in estrogen receptor-positive, HER2-negative breast cancer May 2020 Breast Cancer 27(5) DOI: 10.1007/s12282-020-01100-4

    Breast cancer diagnosis based on joint variable selection and Constructive Deep Neural Network, March 2018, Conference: 2018 IEEE 4th Middle East Conference on Biomedical Engineering (MECBME), DOI:10.1109/MECBME.2018.8402426

    Constructive Deep Neural Network for Breast Cancer Diagnosis, January 2018, IFAC-PapersOnLine 51(27):98-103, DOI:10.1016/j.ifacol.2018.11.660

  15. Duke Lung Cancer Screening Dataset 2024 - part 2

    • zenodo.org
    Updated Jun 3, 2025
    + more versions
    Cite
    Joseph Lo; Fakrul Islam Tushar (2025). Duke Lung Cancer Screening Dataset 2024 - part 2 [Dataset]. http://doi.org/10.5281/zenodo.12784601
    Explore at:
    Dataset updated
    Jun 3, 2025
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Joseph Lo; Fakrul Islam Tushar
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Note - This is part 2 of the dataset.

    Part 1 can be found at : https://zenodo.org/records/13799069
    Part 2 can be found at : https://zenodo.org/records/12784601
    Part 3 can be found at : https://zenodo.org/records/14659131

    Background: Lung cancer risk classification is an increasingly important area of research as low-dose thoracic CT screening programs have become standard of care for patients at high risk for lung cancer. There is limited availability of large, annotated public databases for the training and testing of algorithms for lung nodule classification.

    Methods: Screening chest CT scans done between January 1, 2015 and June 30, 2021 at Duke University Health System were considered for this study. Efficient nodule annotation was performed semi-automatically by using a publicly available deep learning nodule detection algorithm trained on the LUNA16 dataset to identify initial candidates, which were then accepted based on nodule location in the radiology text report or manually annotated by a medical student and a fellowship-trained cardiothoracic radiologist.

    Results: The dataset contains 1613 CT volumes with 2487 annotated nodules, selected from a total dataset of 2061 patients, with the remaining data reserved for future testing. Radiologist spot-checking confirmed the semi-automated annotation had an accuracy rate of >90%.

    Conclusions: The Duke Lung Cancer Screening Dataset 2024 is the first large dataset for CT screening for lung cancer reflecting the use of current CT technology. This represents a useful resource for lung cancer risk classification research, and the efficient annotation methods described for its creation may be used to generate similar databases for research in the future.

    Dataset part Details:
    Part 1: DLCS subsets 1 to 7, metadata, and annotations.
    Part 2: DLCS subsets 8 and 9, and CT image info metadata.
    Part 3: DLCS subset 10.

    Updates and Versions:

    1. Part 1, Version 1.0 (Published on [03/05/2024]): Released initial dataset, including partial data subsets 1 to 7 and 3D bounding box annotations of the lung nodules.
    2. Part 1, Version 1.1 (Published on [09/19/2024]): Added metadata file (DLCSD24_metadata_v1.1.xlsx) and updated the dataset description and title. 10.5281/zenodo.13799069
    3. Part 2, Version 1.0 (Published on [02/04/2025]): Released DLCS subset 8,9, CT image info metadata (DLCSD24_CT_ImageInfo_v1.csv and metadata documentation).
    4. Part 3, Version 1.0 (Published on [02/04/2025]): Released DLCS subset 10.


    Code Repository:

    To support reproducible open-access research and benchmarking, we have shared several pre-trained models and baseline results in a GitHub and GitLab repository.

    GitLab: https://gitlab.oit.duke.edu/cvit-public/ai_lung_health_benchmarking
    GitHub: https://github.com/fitushar/AI-in-Lung-Health-Benchmarking-Detection-and-Diagnostic-Models-Across-Multiple-CT-Scan-Datasets

    Funding:
    This work was supported by the Duke Department of Radiology Charles E. Putman Vision Award, NIH/NIBIB P41-EB028744, and NIH/NCI R01-CA261457.

  16. Table_1_Evaluating Neighborhood Correlates and Geospatial Distribution of...

    • frontiersin.figshare.com
    pdf
    Updated Jun 1, 2023
    Cite
    Aracelis Z. Torres; Darcy Phelan-Emrick; Carlos Castillo-Salgado (2023). Table_1_Evaluating Neighborhood Correlates and Geospatial Distribution of Breast, Cervical, and Colorectal Cancer Incidence.pdf [Dataset]. http://doi.org/10.3389/fonc.2018.00471.s001
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers
    Authors
    Aracelis Z. Torres; Darcy Phelan-Emrick; Carlos Castillo-Salgado
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction: Though cancer research has traditionally centered on individual-level exposures, there is growing interest in the geography of both cancer and its risk factors. This geographic and epidemiological research has consistently shown that cancer outcomes and their known causal exposures exhibit geographic variation that coincides with area-level socioeconomic status and the composition of neighborhoods. A retrospective study was conducted to evaluate geospatial variation for female breast, cervical, and colorectal cancer incidence in Baltimore City.

    Materials and Methods: Using a Maryland Cancer Registry dataset of incident breast, cervical, and colorectal cancers (N = 4,966) among Baltimore City female residents diagnosed from 2000 to 2010, spatial and epidemiological analyses were conducted through choropleth maps, spatial cluster identification, and local Moran's I. Ordinary least squares regression models identified characteristics associated with the geospatial clusters.

    Results: Each cancer type exhibited geographic variation across Baltimore City, with the neighborhoods showing high incidence differing by cancer type. Specifically, breast cancer had significantly low incidence in downtown Baltimore while cervical cancer had high incidence. The neighborhood covariates associated with the geographic variation also differed by cancer type, while local Moran's I identified discordant clusters.

    Discussion: Cancer incidence varied geographically by cancer type within a single city (county). Small area estimates are needed to detect local patterns of disease when developing health and preventative programs. Given the observed variability of community-level characteristics associated with each cancer type incidence, local information is essential for developing place-, social-, and outcome-specific interventions.
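
    The study above uses local Moran's I to find spatial clusters. A minimal sketch of one common form of the statistic, I_i = (z_i / m2) * sum_j w_ij z_j with row-standardized weights and m2 = sum(z^2) / n (other software scales by n - 1 instead). The neighborhood rates and adjacency list are hypothetical, and the permutation test normally used for significance is omitted.

```python
def local_morans_i(values, neighbors):
    """Local Moran's I for each area.

    values:    one rate per area
    neighbors: dict mapping area index -> list of adjacent area indices
               (weights are row-standardized: each neighbor gets 1/k)
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n  # assumes the values are not all equal
    result = []
    for i in range(n):
        nbrs = neighbors.get(i, [])
        # Spatial lag: average deviation among the area's neighbors.
        lag = sum(z[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        result.append(z[i] * lag / m2)
    return result


# Four hypothetical neighborhoods along a line: 0 - 1 - 2 - 3.
rates = [10.0, 12.0, 30.0, 32.0]
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
local_i = local_morans_i(rates, adjacency)
print([round(v, 3) for v in local_i])
```

    Positive values indicate an area whose rate resembles its neighbors' (a high-high or low-low cluster); negative values indicate a discordant area, such as a high rate surrounded by low rates.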

  17. Dataset related to article "Incidence and predictors of hepatocellular...

    • nde-dev.biothings.io
    • data.niaid.nih.gov
    Updated Jan 19, 2024
    Cite
    Maisonneuve, P (2024). Dataset related to article "Incidence and predictors of hepatocellular carcinoma in patients with autoimmune hepatitis" [Dataset]. https://nde-dev.biothings.io/resources?id=zenodo_10532882
    Explore at:
    Dataset updated
    Jan 19, 2024
    Dataset provided by
    Dalekos, GN
    van den Berg, AP
    de Boer, YS
    Colapietro, D
    Aghemo, Alessio
    van der Meer, AJ
    Lytvyak, E
    Macedo, G
    van den Brand, FF
    Carella, F
    Slooter, CD
    Muratori, P
    Beuers, U
    Di Zeo-Sánchez, DE
    Andrade, RJ
    International Autoimmune Hepatitis Group
    Dutch AIH Study Group
    Maisonneuve, P
    LLEO, Ana
    Verdonk, RC
    Montano-Loza, AJ
    Liberal, R
    Brouwer, JT
    Kuiken, SD
    van Hoek, B
    Robles, M
    Zachou, K
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This record contains raw data related to the article “Incidence and predictors of hepatocellular carcinoma in patients with autoimmune hepatitis”

    Abstract

    Background and aims: Autoimmune hepatitis (AIH) is a rare chronic liver disease of unknown aetiology; the risk of hepatocellular carcinoma (HCC) remains unclear and risk factors are not well-defined. We aimed to investigate the risk of HCC across a multicentre AIH cohort and to identify predictive factors.

    Methods: We performed a retrospective, observational, multicentric study of patients included in the International Autoimmune Hepatitis Group Retrospective Registry. The assessed clinical outcomes were HCC development, liver transplantation, and death. Fine and Gray regression analysis stratified by centre was applied to determine the effects of individual covariates; the cumulative incidence of HCC was estimated using the competing risk method with death as a competing risk.

    Results: A total of 1,428 patients diagnosed with AIH from 1980 to 2020 from 22 eligible centres across Europe and Canada were included, with a median follow-up of 11.1 years (interquartile range 5.2-15.9). Two hundred and ninety-three (20.5%) patients had cirrhosis at diagnosis. During follow-up, 24 patients developed HCC (1.7%), an incidence rate of 1.44 cases/1,000 patient-years; the cumulative incidence of HCC increased over time (0.6% at 5 years, 0.9% at 10 years, 2.7% at 20 years, and 6.6% at 30 years of follow-up). Patients who developed cirrhosis during follow-up had a significantly higher incidence of HCC. The cumulative incidence of HCC was 2.6%, 4.6%, 5.6% and 6.6% at 5, 10, 15, and 20 years after the development of cirrhosis, respectively. Obesity (hazard ratio [HR] 2.94, p = 0.04), cirrhosis (HR 3.17, p = 0.01), and AIH/PSC variant syndrome (HR 5.18, p = 0.007) at baseline were independent risk factors for HCC development.

    Conclusions: HCC incidence in AIH is low even after cirrhosis development and is associated with risk factors including obesity, cirrhosis, and AIH/PSC variant syndrome.

    Impact and implications: The risk of developing hepatocellular carcinoma (HCC) in individuals with autoimmune hepatitis (AIH) seems to be lower than for other aetiologies of chronic liver disease. Yet, solid data for this specific patient group remain elusive, given that most of the existing evidence comes from small, single-centre studies. In our study, we found that HCC incidence in patients with AIH is low even after the onset of cirrhosis. Additionally, factors such as advanced age, obesity, cirrhosis, alcohol consumption, and the presence of the AIH/PSC variant syndrome at the time of AIH diagnosis are linked to a higher risk of HCC. Based on these findings, there seems to be merit in adopting a specialized HCC monitoring programme for patients with AIH based on their individual risk factors.
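
    The Results paragraph above reports 24 HCC cases among 1,428 patients and an incidence rate of 1.44 cases per 1,000 patient-years. A small sketch of the arithmetic behind those figures; the total follow-up time is not stated in the abstract, so the patient-years value here is back-calculated and should be read as an approximation.

```python
def percent(part, whole):
    """Share of the cohort, in percent."""
    return part / whole * 100


def incidence_rate_per_1000(cases, patient_years):
    """Cases per 1,000 patient-years of follow-up."""
    return cases / patient_years * 1000


def implied_patient_years(cases, rate_per_1000):
    """Follow-up time implied by a case count and a reported rate."""
    return cases / rate_per_1000 * 1000


cirrhosis_pct = percent(293, 1428)          # cirrhosis at diagnosis
hcc_pct = percent(24, 1428)                 # developed HCC during follow-up
patient_years = implied_patient_years(24, 1.44)
print(f"{cirrhosis_pct:.1f}% cirrhosis, {hcc_pct:.1f}% HCC, "
      f"about {patient_years:,.0f} patient-years")
```

    The two percentages reproduce the 20.5% and 1.7% quoted in the abstract, and the implied follow-up is roughly 16,700 patient-years.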

  18. Data from: Multivariate Analyses to Assess the Effects of Surgeon and...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Jul 17, 2012
    Cite
    Lee, Ching-Chih; Huang, Kuang-Yung; Chen, Ting-Chang; Chang, Chun-Ming; Hsu, Ta-Wen; Su, Yu-Chieh; Yang, Wei-Zhen; Chou, Pesus (2012). Multivariate Analyses to Assess the Effects of Surgeon and Hospital Volume on Cancer Survival Rates: A Nationwide Population-Based Study in Taiwan [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001144908
    Explore at:
    Dataset updated
    Jul 17, 2012
    Authors
    Lee, Ching-Chih; Huang, Kuang-Yung; Chen, Ting-Chang; Chang, Chun-Ming; Hsu, Ta-Wen; Su, Yu-Chieh; Yang, Wei-Zhen; Chou, Pesus
    Area covered
    Taiwan
    Description

    Background: Positive results between caseloads and outcomes have been validated in several procedures and cancer treatments. However, there is limited information available on the combined effects of surgeon and hospital caseloads. We used nationwide population-based data to explore the association between surgeon and hospital caseloads and survival rates for major cancers.

    Methodology: A total of 11677 patients with incident cancer diagnosed in 2002 were identified from the Taiwan National Health Insurance Research Database. Survival analysis, the Cox proportional hazards model, and propensity scores were used to assess the relationship between 5-year survival rates and different caseload combinations.

    Results: Based on the Cox proportional hazard model, cancer patients treated by low-volume surgeons in low-volume hospitals had poorer survival rates, and hazard ratios ranged from 1.3 in head and neck cancer to 1.8 in lung cancer after adjusting for patients’ demographic variables, co-morbidities, and treatment modality. When analyzed using the propensity scores, the adjusted 5-year survival rates were poorer for patients treated by low-volume surgeons in low-volume hospitals, compared to those treated by high-volume surgeons in high-volume hospitals (P<0.005).

    Conclusions: After adjusting for differences in the case mix, cancer patients treated by low-volume surgeons in low-volume hospitals had poorer 5-year survival rates. Payers may implement quality care improvement in low-volume surgeons.

  19. LGA15 Breast and Cervical Cancer Screening Program - 2010-2012 - Dataset -...

    • data.aurin.org.au
    Updated Mar 6, 2025
    + more versions
    Cite
    (2025). LGA15 Breast and Cervical Cancer Screening Program - 2010-2012 - Dataset - AURIN [Dataset]. https://data.aurin.org.au/dataset/tua-phidu-tua-phidu-2015-lga-aust-scr-asgc-exc-tas-nt-2010-12-lga2011
    Explore at:
    Dataset updated
    Mar 6, 2025
    License

    Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0) https://creativecommons.org/licenses/by-nc-sa/3.0/
    License information was derived automatically

    Description

    The number of females who participated in a breast cancer screening program and their proportion of the relevant population, as well as the number of people diagnosed with breast cancer as a rate of those who participated, 2010-2011 (NSW, Vic, Qld, SA & WA). Source: Compiled by PHIDU based on data from BreastScreen NSW, BreastScreen Vic, BreastScreen Qld, BreastScreen WA - 2010 and 2011.

    The dataset also contains the number of females who participated in a cervical cancer screening program and their proportion of the relevant population, as well as the number of people diagnosed with low/high cervical cancer as a rate of those who participated, 2010-2011 (NSW, Vic, Qld, SA, WA & ACT). Source: Compiled by PHIDU based on data from the NSW Department of Health and NSW Central Cancer Registry, 2011 and 2012; Victorian Cervical Cytology Registry, 2011 and 2012; Queensland Health Cancer Services Screening Branch, 2011 and 2012; SA Cervix Screening Program, 2011 and 2012; Western Australia Cervical Cytology Register, 2011 and 2012; and ACT Cytology Register, 2011 and 2012.

    For both sets of screening, if a woman was screened more than twice in the two-year period she is counted once only (all entries that were classified as not shown, not published or not applicable were assigned a null value; no data was provided for Maralinga Tjarutja LGA, in South Australia). The data is by LGA 2015 profile (based on the LGA 2011 geographic boundaries). For more information on the statistics used, please refer to the PHIDU website, available from: http://phidu.torrens.edu.au/
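
    The description says repeat screenings within the period count once and that suppressed entries are stored as null. A minimal sketch of how a participation proportion could be derived under those two rules; the record layout (person identifier, screening date) and the numbers are hypothetical and are not the PHIDU file format.

```python
def participation(screen_events, eligible_population):
    """Count each person once and express participants as a percentage
    of the eligible population; return None when the denominator is
    suppressed (null) or zero."""
    participants = len({person_id for person_id, _date in screen_events})
    if eligible_population in (None, 0):
        return participants, None
    return participants, participants / eligible_population * 100


# Hypothetical screening events: person 2 was screened twice.
events = [(1, "2010-03-01"), (2, "2010-06-15"), (2, "2011-07-20"), (3, "2011-01-09")]

count, pct = participation(events, eligible_population=12)
print(count, pct)

# A suppressed population figure yields a null proportion instead of an error.
print(participation(events, eligible_population=None))
```

    Keeping suppressed values as None (rather than 0) avoids reporting a misleading 0% participation for areas where the data were simply not published.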

  20. Counties rankings for each risk factor associated with breast cancer...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    • +1more
    Updated Feb 19, 2013
    Cite
    Fan, Kang-Hsien; Cook, Rebecca S.; Brantley-Sieders, Dana M.; Shyr, Yu; Deming-Halverson, Sandra L. (2013). Counties rankings for each risk factor associated with breast cancer generated an integrated quartile score. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001663528
    Explore at:
    Dataset updated
    Feb 19, 2013
    Authors
    Fan, Kang-Hsien; Cook, Rebecca S.; Brantley-Sieders, Dana M.; Shyr, Yu; Deming-Halverson, Sandra L.
    Description

    Counties were ranked for each risk factor associated with breast cancer in numerical order according to data presented in Table 2 and Table 3. Based on their numerical ranking in each dataset category, each county was assigned a risk factor quartile score, with 1 indicating the lowest quartile, and 4 indicating the highest quartile. The quartile score for breast cancer mortality rate and breast cancer incidence rate was weighted double. The sum of the quartile scores of each category was calculated for each county to generate the integrated quartile score. A high integrated quartile score is intended to reflect the county with the greatest need of breast cancer-related resources aimed at reducing breast cancer mortality.
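
    The scoring procedure described above (rank, assign quartiles 1 to 4, double-weight mortality and incidence, sum) can be sketched as follows. The county names, factor names, and values are hypothetical; the sketch also assumes that a higher value always means higher risk and does not handle ties, neither of which the description specifies.

```python
# Factors whose quartile score counts twice, per the description.
DOUBLE_WEIGHTED = {"mortality_rate", "incidence_rate"}


def quartile_scores(values):
    """Map each county to a quartile score 1 (lowest) to 4 (highest)."""
    ordered = sorted(values, key=values.get)
    n = len(ordered)
    return {county: rank * 4 // n + 1 for rank, county in enumerate(ordered)}


def integrated_quartile_score(table):
    """Sum weighted quartile scores across all risk factors per county."""
    totals = {}
    for factor, values in table.items():
        weight = 2 if factor in DOUBLE_WEIGHTED else 1
        for county, score in quartile_scores(values).items():
            totals[county] = totals.get(county, 0) + weight * score
    return totals


# Hypothetical data for four counties and three risk factors.
table = {
    "mortality_rate": {"A": 20.1, "B": 25.3, "C": 22.0, "D": 30.2},
    "incidence_rate": {"A": 110.0, "B": 130.5, "C": 125.0, "D": 120.0},
    "percent_uninsured": {"A": 12.0, "B": 9.0, "C": 15.0, "D": 18.0},
}
scores = integrated_quartile_score(table)
print(sorted(scores.items(), key=lambda item: item[1], reverse=True))
```

    In this toy example county D ends up with the highest integrated score, so it would be flagged as having the greatest need under the scheme described.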

Cite
The Devastator (2022). Cancer County-Level [Dataset]. https://www.kaggle.com/datasets/thedevastator/exploring-county-level-correlations-in-cancer-ra

Cancer County-Level

Study country level cancer correlations

Explore at:
21 scholarly articles cite this dataset (View in Google Scholar)
Available download formats: zip (146998 bytes)
Dataset updated
Dec 3, 2022
Authors
The Devastator
Description

Exploring County-Level Correlations in Cancer Rates and Trends

A Multivariate Ordinary Least Squares Regression Model

By Noah Rippner [source]

About this dataset

This dataset offers an opportunity to examine patterns and trends in cancer rates in the United States at the individual county level. Using data from cancer.gov and the US Census American Community Survey, it gives insight into how the age-adjusted death rate, average deaths per year, and recent trends vary between counties, along with other key metrics such as average annual counts, whether the objective of 45.5 was met, and recent trends (2) in death rates. Linear regression models built on these data can identify correlations between variables, which helps in understanding cancer prevalence across counties over time and in targeting health initiatives and resources accurately.

How to use the dataset

This Kaggle dataset provides county-level data from the US Census American Community Survey and cancer.gov for exploring correlations between county-level cancer rates, trends, and mortality statistics. It contains records from all U.S. counties covering the age-adjusted death rate, average deaths per year, recent trend (2) in death rates, average annual count of cases detected within 5 years, and whether or not an objective of 45.5 (1) was met in the county associated with each row in the table.

To use this dataset to its fullest potential, you need to be comfortable with basic descriptive and predictive analytics:

  • calculating summary statistics such as the mean and median;
  • summarizing categorical variables with frequency tables;
  • creating data visualizations such as charts and histograms;
  • applying linear regression or other machine learning techniques such as support vector machines (SVMs), random forests, or neural networks;
  • distinguishing supervised from unsupervised learning techniques;
  • reviewing diagnostic tests to evaluate your models;
  • interpreting your findings and hypothesizing reasons for the patterns that the visualizations reveal;
  • communicating results through effective slides or documents.

With this understanding, you can apply different methods of analysis to this dataset accurately and effectively.
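As a small illustration of the summary statistics and frequency tables mentioned above, here is a minimal sketch using only the Python standard library. The rows are made-up values, and the field names (`death_rate`, `recent_trend`) are assumptions for illustration, not the dataset's actual column headers.

```python
import statistics
from collections import Counter

# Made-up rows for illustration only; the field names are assumptions,
# not the dataset's actual column headers.
rows = [
    {"county": "A", "death_rate": 150.0, "recent_trend": "falling"},
    {"county": "B", "death_rate": 180.0, "recent_trend": "stable"},
    {"county": "C", "death_rate": 165.0, "recent_trend": "falling"},
    {"county": "D", "death_rate": 205.0, "recent_trend": "rising"},
]


def summarize(rows, numeric_key, categorical_key):
    """Summary statistics for one numeric field and a frequency table for one categorical field."""
    values = [row[numeric_key] for row in rows]
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "frequencies": Counter(row[categorical_key] for row in rows),
    }


summary = summarize(rows, "death_rate", "recent_trend")
print(summary)
```

With the real file, the same function could be applied to rows read through `csv.DictReader`, after converting the numeric fields from strings to floats.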

Once these concepts are understood, you are ready to start exploring the dataset. First, import it into your tool of choice, for example Tableau Public or Desktop, QlikView, the SAS analytics suite, or a Python notebook; for predictive modeling in Python, load the packages you need, such as scikit-learn. Second, review the description of the table's columns given above.

With basic SQL knowledge, you can run statistical operations as simple queries, query subsets by selecting columns and specifying conditions, and sort results by specific attributes. In Python, you can parse the portion of the data you need and group or aggregate categories before building any predictive model, joining additional tables where necessary; the details vary with the tool used.

From there, examine the available features in more depth: build correlation and covariance matrices, plot the distributions and relationships between variables, and use the corresponding scatter plots to identify trends in the metrics of interest.
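The correlation and covariance matrices mentioned above can be sketched with NumPy as follows. The numbers are synthetic and the column names (`death_rate`, `poverty_pct`, `median_income`) are hypothetical; with the real data you would substitute the numeric columns you are interested in.

```python
import numpy as np

# Synthetic values for illustration only; the column names are hypothetical.
columns = ["death_rate", "poverty_pct", "median_income"]
data = np.array([
    [150.0, 10.0, 62000.0],
    [165.0, 13.0, 55000.0],
    [180.0, 17.0, 48000.0],
    [205.0, 22.0, 41000.0],
])


def correlation_and_covariance(data):
    """Return (correlation, covariance) matrices, treating each column as a variable."""
    corr = np.corrcoef(data, rowvar=False)  # Pearson correlation coefficients
    cov = np.cov(data, rowvar=False)        # sample covariance (divides by n - 1)
    return corr, cov


corr, cov = correlation_and_covariance(data)
for name, row in zip(columns, corr):
    print(name, np.round(row, 3))
```

Each entry of the correlation matrix lies between -1 and 1, which makes it easier to compare across variables than the covariance matrix, whose entries depend on the units of each column.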

Research Ideas

  • Building a predictive cancer incidence model based on county-level demographic data to identify high-risk areas and target public health interventions.
  • Analyzing correlations between age-adjusted death rate, average annual count, and recent trends in order to develop more effective policy initiatives for cancer prevention and healthcare access.
  • Utilizing the dataset to construct a machine learning algorithm that can predict county-level mortality rates based on socio-economic factors such as poverty levels and educational attainment rates.
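As a starting point for the modeling ideas above, the following is a minimal sketch of a multivariate ordinary least squares fit using NumPy. It is not the original author's model: the two predictors (a poverty percentage and an education percentage), the helper names `fit_ols` and `predict`, and the data are all assumptions. The target is generated from a known linear rule so the recovered coefficients can be checked.

```python
import numpy as np


def fit_ols(X, y):
    """Fit y ~ b0 + X @ b by ordinary least squares; returns [b0, b1, ...]."""
    design = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs


def predict(coeffs, X):
    """Apply fitted coefficients to new rows of predictors."""
    return coeffs[0] + np.asarray(X) @ coeffs[1:]


# Synthetic predictors (hypothetical: poverty %, education %), not real data.
X = np.array([
    [10.0, 30.0],
    [13.0, 25.0],
    [17.0, 22.0],
    [22.0, 15.0],
    [15.0, 28.0],
])
# Target built from a known rule so the fit can be verified.
y = 120.0 + 3.0 * X[:, 0] - 0.5 * X[:, 1]

coeffs = fit_ols(X, y)
print(np.round(coeffs, 3))
print(predict(coeffs, [[20.0, 20.0]]))
```

On real county data the fit will not be exact, so residuals and diagnostics should be reviewed before the coefficients are interpreted.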

Acknowledgements

If you use this dataset i...
