100+ datasets found
  1. SEER Breast Cancer Data

    • ieee-dataport.org
    • zenodo.org
    • +1 more
    Updated Jun 16, 2025
    Cite
    jing teng (2025). SEER Breast Cancer Data [Dataset]. https://ieee-dataport.org/open-access/seer-breast-cancer-data
    Dataset updated
    Jun 16, 2025
    Authors
    jing teng
    Description

    examined regional LNs

  2. NCI State Late Stage Breast Cancer Incidence Rates

    • hub.arcgis.com
    Updated Jan 21, 2020
    Cite
    National Cancer Institute (2020). NCI State Late Stage Breast Cancer Incidence Rates [Dataset]. https://hub.arcgis.com/datasets/9dd0d923f8034cc8806173fdc224777d
    Dataset updated
    Jan 21, 2020
    Dataset authored and provided by
    National Cancer Institute: http://www.cancer.gov/
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Area covered
    Description

    This dataset contains Cancer Incidence data for Breast Cancer (Late Stage^) including: Age-Adjusted Rate, Confidence Interval, Average Annual Count, and Trend field information for US States for the average 5 year span from 2016 to 2020.

    Data are for females segmented by age (All Ages, Ages Under 50, Ages 50 & Over, Ages Under 65, and Ages 65 & Over), with field names and aliases describing the sex and age group tabulated.

    For more information, visit statecancerprofiles.cancer.gov

    Data Notations

    State Cancer Registries may provide more current or more local data.

    Trend

    • Rising when 95% confidence interval of average annual percent change is above 0.

    • Stable when 95% confidence interval of average annual percent change includes 0.

    • Falling when 95% confidence interval of average annual percent change is below 0.

    † Incidence rates (cases per 100,000 population per year) are age-adjusted to the 2000 US standard population (19 age groups: <1, 1-4, 5-9, ... , 80-84, 85+). Rates are for invasive cancer only (except for bladder cancer which is invasive and in situ) or unless otherwise specified. Rates calculated using SEER*Stat. Population counts for denominators are based on Census populations as modified by NCI. The US Population Data File is used for SEER and NPCR incidence rates.

    ‡ Incidence Trend data come from different sources. Due to different years of data availability, most of the trends are AAPCs based on APCs but some are APCs calculated in SEER*Stat. Please refer to the source for each area for additional information.

    Rates and trends are computed using different standards for malignancy. For more information see malignant.

    ^ Late Stage is defined as cases determined to be regional or distant. Due to changes in stage coding, Combined Summary Stage (2004+) is used for data from Surveillance, Epidemiology, and End Results (SEER) databases and Merged Summary Stage is used for data from National Program of Cancer Registries databases. Due to the increased complexity with staging, other staging variables may be used if necessary.

    Data Source Field Key

    (1) Source: National Program of Cancer Registries and Surveillance, Epidemiology, and End Results SEER*Stat Database - United States Department of Health and Human Services, Centers for Disease Control and Prevention and National Cancer Institute. Based on the 2022 submission.

    (5) Source: National Program of Cancer Registries and Surveillance, Epidemiology, and End Results SEER*Stat Database - United States Department of Health and Human Services, Centers for Disease Control and Prevention and National Cancer Institute. Based on the 2022 submission.

    (6) Source: National Program of Cancer Registries SEER*Stat Database - United States Department of Health and Human Services, Centers for Disease Control and Prevention (based on the 2022 submission).

    (7) Source: SEER November 2022 submission.

    (8) Source: Incidence data provided by the SEER Program. AAPCs are calculated by the Joinpoint Regression Program and are based on APCs. Data are age-adjusted to the 2000 US standard population (19 age groups: <1, 1-4, 5-9, ... , 80-84, 85+). Rates are for invasive cancer only (except for bladder cancer which is invasive and in situ) or unless otherwise specified. Population counts for denominators are based on Census populations as modified by NCI. The US Population Data File is used with SEER November 2022 data.

    Some data are not available, see Data Not Available for combinations of geography, cancer site, age, and race/ethnicity.

    Data for the United States does not include data from Nevada.

    Data for the United States does not include Puerto Rico.
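
    The trend rule in the data notations above (Rising, Stable, Falling, depending on where the 95% confidence interval of the average annual percent change sits relative to 0) can be sketched as a small function. The function name and arguments are illustrative assumptions, not field names from the dataset.

```python
def classify_trend(ci_low: float, ci_high: float) -> str:
    """Label a trend from the 95% confidence interval of the
    average annual percent change (AAPC).

    ci_low and ci_high are hypothetical argument names; the dataset's
    own field names may differ.
    """
    if ci_low > ci_high:
        raise ValueError("ci_low must not exceed ci_high")
    if ci_low > 0:
        return "Rising"   # whole interval above 0
    if ci_high < 0:
        return "Falling"  # whole interval below 0
    return "Stable"       # interval includes 0
```

    An interval such as (-1.2, 0.3) includes 0, so it would be labelled Stable under this rule.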

  3. Breast Cancer Prediction Dataset - Dataset - CKAN

    • data.poltekkes-smg.ac.id
    Updated Oct 7, 2024
    Cite
    (2024). Breast Cancer Prediction Dataset - Dataset - CKAN [Dataset]. https://data.poltekkes-smg.ac.id/dataset/breast-cancer-prediction-dataset
    Dataset updated
    Oct 7, 2024
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Worldwide, breast cancer is the most common type of cancer in women and the second highest in terms of mortality rates. Diagnosis of breast cancer is performed when an abnormal lump is found (from self-examination or x-ray) or a tiny speck of calcium is seen (on an x-ray). After a suspicious lump is found, the doctor will conduct a diagnosis to determine whether it is cancerous and, if so, whether it has spread to other parts of the body. This breast cancer dataset was obtained from the University of Wisconsin Hospitals, Madison from Dr. William H. Wolberg.

  4. BREAST CANCER - Dataset - CKAN

    • data.poltekkes-smg.ac.id
    Updated Oct 7, 2024
    Cite
    (2024). BREAST CANCER - Dataset - CKAN [Dataset]. https://data.poltekkes-smg.ac.id/dataset/breast-cancer
    Dataset updated
    Oct 7, 2024
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset of breast cancer patients was obtained from the 2017 November update of the SEER Program of the NCI, which provides information on population-based cancer statistics. The dataset involved female patients with infiltrating duct and lobular carcinoma breast cancer (SEER primary sites recode NOS histology codes 8522/3) diagnosed in 2006-2010. Patients with unknown tumour size, examined regional LNs, positive regional LNs, and patients whose survival months were less than 1 month were excluded; thus, 4024 patients were ultimately included.
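
    The exclusion criteria described above could be applied as a simple record filter. The sketch below is illustrative only: the field names are hypothetical (not the dataset's actual column names), unknown values are assumed to be represented as None, and treating an unknown survival time as an exclusion is an added assumption.

```python
from typing import List, Optional, TypedDict


class Patient(TypedDict):
    # Hypothetical field names; None stands for "unknown".
    tumor_size_mm: Optional[float]
    regional_nodes_examined: Optional[int]
    regional_nodes_positive: Optional[int]
    survival_months: Optional[int]


def is_eligible(patient: Patient) -> bool:
    """Return True if the record passes the exclusion rules."""
    required = (
        patient["tumor_size_mm"],
        patient["regional_nodes_examined"],
        patient["regional_nodes_positive"],
    )
    if any(value is None for value in required):
        return False  # unknown tumour size or lymph-node counts
    months = patient["survival_months"]
    # Assumption: an unknown survival time is excluded as well.
    return months is not None and months >= 1


def apply_exclusions(patients: List[Patient]) -> List[Patient]:
    """Keep only the records that pass every exclusion rule."""
    return [p for p in patients if is_eligible(p)]
```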

  5. Number and rates of new cases of primary cancer, by cancer type, age group...

    • www150.statcan.gc.ca
    • datasets.ai
    • +2 more
    Updated May 19, 2021
    Cite
    Government of Canada, Statistics Canada (2021). Number and rates of new cases of primary cancer, by cancer type, age group and sex [Dataset]. http://doi.org/10.25318/1310011101-eng
    Dataset updated
    May 19, 2021
    Dataset provided by
    Statistics Canada: https://statcan.gc.ca/en
    Area covered
    Canada
    Description

    Number and rate of new cancer cases diagnosed annually from 1992 to the most recent diagnosis year available. Included are all invasive cancers and in situ bladder cancer with cases defined using the Surveillance, Epidemiology and End Results (SEER) Groups for Primary Site based on the World Health Organization International Classification of Diseases for Oncology, Third Edition (ICD-O-3). Random rounding of case counts to the nearest multiple of 5 is used to prevent inappropriate disclosure of health-related information.
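
    The description above does not spell out the random rounding procedure. As a rough illustration, one common unbiased variant rounds a count up or down to a multiple of 5 with a probability proportional to its distance from each neighbour. This is a sketch under that assumption, not the agency's documented method; the function name is hypothetical.

```python
import random
from typing import Optional


def random_round(count: int, base: int = 5,
                 rng: Optional[random.Random] = None) -> int:
    """Randomly round a non-negative count to a multiple of `base`.

    Assumed scheme: a count with remainder r is rounded up with
    probability r / base and down otherwise, so the expected value
    equals the original count. Exact multiples are left unchanged.
    """
    rng = rng or random.Random()
    remainder = count % base
    if remainder == 0:
        return count
    lower = count - remainder
    return lower + base if rng.random() < remainder / base else lower
```

    Under this scheme a count of 13 would be published as 15 with probability 3/5 and as 10 with probability 2/5.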

  6. Late-Stage Female Breast Cancer Incidence Rate (cases per 100,000 females),...

    • healthdata.nj.gov
    • data.wu.ac.at
    application/rdfxml +5
    Updated May 10, 2019
    Cite
    New Jersey State Cancer Registry, Cancer Epidemiology Services, New Jersey Department of Health (2019). Late-Stage Female Breast Cancer Incidence Rate (cases per 100,000 females), New Jersey, by year: Beginning 2010 [Dataset]. https://healthdata.nj.gov/dataset/Late-Stage-Female-Breast-Cancer-Incidence-Rate-cas/3hep-nd78
    Explore at:
    Available download formats: json, tsv, csv, application/rssxml, application/rdfxml, xml
    Dataset updated
    May 10, 2019
    Dataset provided by
    New Jersey Department of Health: https://www.nj.gov/health/
    Authors
    New Jersey State Cancer Registry, Cancer Epidemiology Services, New Jersey Department of Health
    Area covered
    New Jersey
    Description

    Rate: Number of new cases of breast cancer (per 100,000) diagnosed at the regional or distant stage among females.

    Definition: Age-adjusted incidence rate of invasive breast cancer per 100,000 female population.

    Data Sources:

    (1) NJ State Cancer Registry, Dec 31, 2015 Analytic File, using NCI SEER*Stat ver 8.2.1 (www.seer.cancer.gov/seerstat)

    (2) NJ population estimates as calculated by the NCI's SEER Program, released January 2015, http://www.seer.cancer.gov/popdata/download.html.
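
    For readers unfamiliar with age adjustment, the rate defined above is typically obtained by direct standardization: each age group's crude rate is weighted by that group's share of a standard population. The sketch below assumes this method; the function name, arguments, and the numbers in the usage note are illustrative and not taken from the registry.

```python
from typing import Sequence


def age_adjusted_rate(cases: Sequence[float],
                      populations: Sequence[float],
                      standard_weights: Sequence[float],
                      per: float = 100_000) -> float:
    """Directly standardized rate per `per` population.

    cases[i] and populations[i] describe age group i in the study
    population; standard_weights[i] is the size (or share) of the same
    age group in the standard population.
    """
    if not (len(cases) == len(populations) == len(standard_weights)):
        raise ValueError("all inputs must have one entry per age group")
    total_weight = sum(standard_weights)
    rate = 0.0
    for n_cases, population, weight in zip(cases, populations, standard_weights):
        # Crude age-specific rate times the group's standard share.
        rate += (n_cases / population) * (weight / total_weight)
    return rate * per
```

    With two equally weighted age groups of 100,000 females each and 10 and 30 cases respectively, the adjusted rate is about 20 per 100,000.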

  7. Cancer Statistics in US States

    • kaggle.com
    Updated Jun 17, 2022
    Cite
    Ms. Nancy Al Aswad (2022). Cancer Statistics in US States [Dataset]. https://www.kaggle.com/datasets/nancyalaswad90/cancer-statistics-in-us-states
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 17, 2022
    Dataset provided by
    Kaggle
    Authors
    Ms. Nancy Al Aswad
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    United States
    Description

    What are Cancer Statistics in US States?

    The circled group of good survivors has genetic indicators of poor survivors (i.e. low ESR1 levels, which is typically the prognostic indicator of poor outcomes in breast cancer) – understanding this group could be critical for helping improve mortality rates for this disease. Why this group survived was quickly analysed by using the Outcome Column (here Event Death - which is binary - 0,1) as a Data Lens (which we term Supervised vs Unsupervised analyses).

    How to use this dataset

    • A network was built using only gene expression with 272 breast cancer patients (as rows), and 1570 columns.

    • Metadata includes patient info, treatment, and survival.

    • Each node is a group of patients similar to each other. Flares (left) represent sub-populations that are distinct from the larger population. (One differentiating factor between the two flares is estrogen expression (low = top flare, high = bottom flare)).

    • A bottom flare is a group of patients with 100% survival. The top flare shows a range of survival – very poor towards the tip (red), and very good near the base (circled).

    Acknowledgments

    When we use this dataset in our research, we credit the authors as:

    The main idea for uploading this dataset is to practice data analysis with my students, as I work in a college and want my students to apply what we study to a large dataset. It may not be up to date, and I mention the collection years, but it is a good resource for practice.

  8. Data from: BreCaHAD: A Dataset for Breast Cancer Histopathological...

    • figshare.com
    png
    Updated Jan 28, 2019
    Cite
    Alper Aksac; Douglas J. Demetrick; Tansel Özyer; Reda Alhajj (2019). BreCaHAD: A Dataset for Breast Cancer Histopathological Annotation and Diagnosis [Dataset]. http://doi.org/10.6084/m9.figshare.7379186.v3
    Explore at:
    Available download formats: png
    Dataset updated
    Jan 28, 2019
    Dataset provided by
    figshare
    Authors
    Alper Aksac; Douglas J. Demetrick; Tansel Özyer; Reda Alhajj
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset consists of 1 .xlsx file, 2 .png files, 1 .json file and 1 .zip file:

    • annotation_details.xlsx: The distribution of annotations in the previously mentioned six classes (mitosis, apoptosis, tumor nuclei, non-tumor nuclei, tubule, and non-tubule) is presented in an Excel spreadsheet.

    • original.png: The input image.

    • annotated.png: An example from the dataset. In the annotated image, blue circles indicate the tumor nuclei, pink circles show non-tumor nuclei such as blood cells, stroma nuclei, and lymphocytes; orange and green circles are mitosis and apoptosis, respectively; light blue circles are true lumen for tubules, and yellow circles represent white regions (non-lumen) such as fat, blood vessel, and broken tissues.

    • data.json: The annotations for the BreCaHAD dataset are provided in JSON (JavaScript Object Notation) format. In the given example, the JSON file (ground truth) contains two mitosis and only one tumor nuclei annotations. Here, x and y are the coordinates of the centroid of the annotated object, and the values are between 0 and 1.

    • BreCaHAD.zip: An archive file containing the dataset. Three folders are included: images (original images), groundTruth (json files), and groundTruth_display (groundTruth applied on original images).
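
    Because the centroid coordinates in data.json are normalized to the range 0 to 1, they have to be scaled by the image size before they can be drawn on an image. The sketch below shows one way to do that. The JSON layout (a mapping from class name to a list of objects with x and y keys) and the image size are assumptions made for illustration; the actual file may be organized differently.

```python
import json
from typing import Dict, List, Tuple


def to_pixel_coords(annotations: Dict[str, List[Dict[str, float]]],
                    width: int, height: int) -> Dict[str, List[Tuple[int, int]]]:
    """Convert normalized (x, y) centroids to integer pixel positions."""
    pixels: Dict[str, List[Tuple[int, int]]] = {}
    for label, points in annotations.items():
        pixels[label] = [
            # Clamp so that a coordinate of exactly 1.0 stays inside the image.
            (min(int(pt["x"] * width), width - 1),
             min(int(pt["y"] * height), height - 1))
            for pt in points
        ]
    return pixels


# Hypothetical ground-truth content: two mitosis and one tumor annotation.
example = json.loads(
    '{"mitosis": [{"x": 0.5, "y": 0.25}, {"x": 1.0, "y": 0.0}],'
    ' "tumor": [{"x": 0.75, "y": 0.5}]}'
)
pixel_annotations = to_pixel_coords(example, width=1000, height=800)
```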

  9. Year, State-wise Estimated Incidence of Breast Cancer and Cervical Cancer

    • dataful.in
    Updated Apr 30, 2025
    Cite
    Dataful (Factly) (2025). Year, State-wise Estimated Incidence of Breast Cancer and Cervical Cancer [Dataset]. https://dataful.in/datasets/20842
    Explore at:
    Available download formats: xlsx, application/x-parquet, csv
    Dataset updated
    Apr 30, 2025
    Dataset authored and provided by
    Dataful (Factly)
    License

    https://dataful.in/terms-and-conditions

    Area covered
    States of India
    Variables measured
    Estimated incidence
    Description

    The dataset consists of the state-wise estimated incidence of breast cancer and cervical cancer in India as per the National Cancer Registry Programme. The estimates are computed using the age-specific incidence rates of 28 PBCRs for 2012-2016 and the projected population (person-years). NB: Incidence estimates of breast cancer are available since 2016, while those of cervical cancer are available since 2015.
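
    An estimate of this kind is usually the sum, over age groups, of the age-specific incidence rate multiplied by the projected person-years in that group. The sketch below assumes rates expressed per 100,000 person-years; the function name and the example figures are illustrative, not values from the dataset.

```python
from typing import Sequence


def estimated_cases(rates_per_100k: Sequence[float],
                    person_years: Sequence[float]) -> float:
    """Expected number of new cases across all age groups.

    rates_per_100k[i] is the age-specific incidence rate (assumed to be
    per 100,000 person-years) and person_years[i] is the projected
    population of age group i.
    """
    if len(rates_per_100k) != len(person_years):
        raise ValueError("one rate and one population per age group required")
    return sum(rate * pop / 100_000
               for rate, pop in zip(rates_per_100k, person_years))
```

    For instance, rates of 10 and 50 per 100,000 applied to projected populations of 200,000 and 100,000 give roughly 20 + 50 = 70 expected cases.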

  10. Cancer registration statistics, England

    • ons.gov.uk
    • cy.ons.gov.uk
    xlsx
    Updated Apr 26, 2019
    Cite
    Office for National Statistics (2019). Cancer registration statistics, England [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/datasets/cancerregistrationstatisticscancerregistrationstatisticsengland
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Apr 26, 2019
    Dataset provided by
    Office for National Statistics: http://www.ons.gov.uk/
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    Cancer diagnoses and age-standardised incidence rates for all types of cancer by age and sex including breast, prostate, lung and colorectal cancer.

  11. CSAW-CC (mammography) – a dataset for AI research to improve screening,...

    • researchdata.se
    • demo.researchdata.se
    Updated Jan 7, 2025
    Cite
    Fredrik Strand (2025). CSAW-CC (mammography) – a dataset for AI research to improve screening, diagnostics and prognostics of breast cancer [Dataset]. http://doi.org/10.5878/45vm-t798
    Explore at:
    Available download formats: (9211529), (29050)
    Dataset updated
    Jan 7, 2025
    Dataset provided by
    Karolinska Institutet
    Authors
    Fredrik Strand
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2008 - 2015
    Area covered
    Stockholm County
    Description

    The dataset contains x-ray images, mammography, from breast cancer screening at the Karolinska University Hospital, Stockholm, Sweden, collected by principal investigator Fredrik Strand at Karolinska Institutet. The purpose for compiling the dataset was to perform AI research to improve screening, diagnostics and prognostics of breast cancer.

    The dataset is based on a selection of cases with and without a breast cancer diagnosis, taken from a more comprehensive source dataset.

    1,103 cases of first-time breast cancer for women in the screening age range (40-74 years) during the included time period (November 2008 to December 2015) were included. Of these, a random selection of 873 cases have been included in the published dataset.

    A random selection of 10,000 healthy controls during the same time period were included. Of these, a random selection of 7,850 cases have been included in the published dataset.

    For each individual all screening mammograms, also repeated over time, were included; as well as the date of screening and the age. In addition, there are pixel-level annotations of the tumors created by a breast radiologist (small lesions such as micro-calcifications have been annotated as an area). Annotations were also drawn in mammograms prior to diagnosis; if these contain a single pixel it means no cancer was seen but the estimated location of the center of the future cancer was shown by a single pixel annotation.

    In addition to images, the dataset also contains cancer data created at the Karolinska University Hospital and extracted through the Regional Cancer Center Stockholm-Gotland. This data contains information about the time of diagnosis and cancer characteristics including tumor size, histology and lymph node metastasis.

    The precision of non-image data was decreased, through categorisation and jittering, to ensure that no single individual can be identified.

    The following types of files are available:

    • CSV: The following data is included (if applicable): cancer/no cancer (meaning breast cancer during 2008 to 2015), age group at screening, days from image to diagnosis (if any), cancer histology, cancer size group, ipsilateral axillary lymph node metastasis. There is one csv file for the entire dataset, with one row per image. Any information about cancer diagnosis is repeated for all rows for an individual who was diagnosed (i.e., it is also included in rows before diagnosis). For each exam date there is the assessment by radiologist 1, radiologist 2 and the consensus decision.

    • DICOM: Mammograms. For each screening, four images for the standard views were acquired: left and right, mediolateral oblique and craniocaudal. There should be four files per examination date.

    • PNG: Cancer annotations. For each DICOM image containing a visible tumor.

    Access: The dataset is available upon request due to the size of the material. The image files in DICOM and PNG format comprise approximately 2.5 TB. Access to the CSV file including parametric data is possible via download as associated documentation.
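
    Since the CSV holds one row per image and each examination date should have four standard views, a quick consistency check is to group rows by individual and exam date and flag any exam that lacks the full set. The sketch below assumes hypothetical column names (patient_id, exam_date, laterality, view) and view codes (L/R, CC/MLO); the published CSV may name and encode these differently.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# The four standard screening views: left/right x craniocaudal/mediolateral oblique.
STANDARD_VIEWS = {("L", "CC"), ("L", "MLO"), ("R", "CC"), ("R", "MLO")}


def incomplete_exams(rows: Iterable[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Return (patient_id, exam_date) pairs lacking exactly the four views."""
    seen: Dict[Tuple[str, str], List[Tuple[str, str]]] = defaultdict(list)
    for row in rows:
        key = (row["patient_id"], row["exam_date"])
        seen[key].append((row["laterality"], row["view"]))
    return sorted(
        key for key, views in seen.items()
        if len(views) != 4 or set(views) != STANDARD_VIEWS
    )
```

    Rows read with csv.DictReader from the standard library could be passed to this function directly, provided the column names are mapped to the ones assumed here.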

  12. Cancer Incidence - Surveillance, Epidemiology, and End Results (SEER)...

    • catalog.data.gov
    • healthdata.gov
    • +2 more
    Updated Jul 16, 2025
    Cite
    National Cancer Institute (NCI), National Institutes of Health (NIH) (2025). Cancer Incidence - Surveillance, Epidemiology, and End Results (SEER) Registries Limited-Use [Dataset]. https://catalog.data.gov/dataset/cancer-incidence-surveillance-epidemiology-and-end-results-seer-registries-limited-use
    Dataset updated
    Jul 16, 2025
    Dataset provided by
    National Cancer Institute: http://www.cancer.gov/
    Description

    SEER Limited-Use cancer incidence data with associated population data. Geographic areas available are county and SEER registry. The Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute collects and distributes high quality, comprehensive cancer data from a number of population-based cancer registries. Data include patient demographics, primary tumor site, morphology, stage at diagnosis, first course of treatment, and follow-up for vital status. The SEER Program is the only comprehensive source of population-based information in the United States that includes stage of cancer at the time of diagnosis and survival rates within each stage.

  13. Breast Cancer Mortality

    • data.lacounty.gov
    • geohub.lacity.org
    • +1 more
    Updated Dec 19, 2023
    Cite
    County of Los Angeles (2023). Breast Cancer Mortality [Dataset]. https://data.lacounty.gov/items/6caa94a1a238475fa5a33cab41fcd63b
    Dataset updated
    Dec 19, 2023
    Dataset authored and provided by
    County of Los Angeles
    Area covered
    Description

    Death rate has been age-adjusted to the 2000 U.S. standard population. Single-year data are only available for Los Angeles County overall, Service Planning Areas, Supervisorial Districts, City of Los Angeles overall, and City of Los Angeles Council Districts.

    Obesity can increase an individual’s lifetime risk of breast cancer. Promoting healthy food retail and physical activity and improving access to preventive care services are important measures that cities and communities can take to prevent breast cancer.

    For more information about the Community Health Profiles Data Initiative, please see the initiative homepage.

  14. Data from: Metadata and data files supporting the related article:...

    • springernature.figshare.com
    zip
    Updated May 30, 2023
    Cite
    Maeve Mullooly; Babak Ehteshami Bejnordi; Ruth M. Pfeiffer; Shaoqi Fan; Maya Palakal; Manila Hada; Pamela M. Vacek; Donald L. Weaver; John A. Shepherd; Bo Fan; Amir Pasha Mahmoudzadeh; Jeff Wang; Serghei Malkov; Jason M. Johnson; Sally D. Herschorn; Brian L. Sprague; Stephen Hewitt; Louise A. Brinton; Nico Karssemeijer; Jeroen van der Laak; Andrew Beck; Mark E. Sherman; Gretchen L. Gierach (2023). Metadata and data files supporting the related article: Application of convolutional neural networks to breast biopsies to delineate tissue correlates of mammographic breast density [Dataset]. http://doi.org/10.6084/m9.figshare.9786152.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Maeve Mullooly; Babak Ehteshami Bejnordi; Ruth M. Pfeiffer; Shaoqi Fan; Maya Palakal; Manila Hada; Pamela M. Vacek; Donald L. Weaver; John A. Shepherd; Bo Fan; Amir Pasha Mahmoudzadeh; Jeff Wang; Serghei Malkov; Jason M. Johnson; Sally D. Herschorn; Brian L. Sprague; Stephen Hewitt; Louise A. Brinton; Nico Karssemeijer; Jeroen van der Laak; Andrew Beck; Mark E. Sherman; Gretchen L. Gierach
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Breast density is a radiologic feature that reflects fibroglandular tissue content relative to breast area or volume, and it is a breast cancer risk factor.

    This study employed deep learning approaches to identify histologic correlates in radiologically-guided biopsies that may underlie breast density and distinguish cancer among women with elevated and low density.

    Data access: Datasets supporting figure 2, tables 2 and 3 and supplementary table 2 of the published article are publicly available in the figshare repository, as part of this data record (https://doi.org/10.6084/m9.figshare.9786152). These datasets are contained in the zip file NPJ FigShare.zip. Datasets supporting figure 3, table 1 and supplementary table 1 of the published article are not publicly available to protect patient privacy, but can be made available on request from Dr. Gretchen L. Gierach, Senior Investigator, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA, email address: gierachg@mail.nih.gov.

    Study description and aims: The study aimed to identify tissue correlates of breast density that may be important for distinguishing malignant from benign biopsy diagnoses separately among women with high and low breast density, to help inform cancer risk stratification among women undergoing a biopsy following an abnormal mammogram.

    Haematoxylin and eosin (H&E)-stained digitized images from image-guided breast biopsies (n=852 patients) were evaluated. Breast density was assessed as global and localized fibroglandular volume (%). A convolutional neural network characterized H&E composition. 37 features were extracted from the network output, describing tissue quantities and morphological structure. A random forest regression model was trained to identify correlates most predictive of fibroglandular volume (n=588). Correlations between predicted and radiologically quantified fibroglandular volume were assessed in 264 independent patients. A second random forest classifier was trained to predict diagnosis (invasive vs. benign); performance was assessed using area under receiver-operating characteristics curves (AUC). For more details on the methodology please see the published article.
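
    The AUC used above to assess the classifier can be read as the probability that a randomly chosen positive case (here, invasive) receives a higher score than a randomly chosen negative case (benign), with ties counted as one half. The sketch below computes it directly from that definition. It illustrates the metric only and is not the study's code; the labels and scores in the usage note are made up.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    labels use 1 for the positive class and 0 for the negative class.
    Runs in O(n_pos * n_neg), which is fine for small examples.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))
```

    With labels [0, 0, 1, 1] and scores [0.1, 0.4, 0.35, 0.8], three of the four positive-negative pairs are ranked correctly, giving an AUC of 0.75.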

    Study approval: The Institutional Review Boards at the NCI and the University of Vermont approved the protocol for this project for either active consenting or a waiver of consent to enrol participants, link data and perform analytical studies.

    Dataset descriptions:

    Data supporting figure 2: Datasets Figure 2A H&E.jpg, Figure 2A Mammogram.jpg, Figure 2B H&E.jpg and Figure 2B Mammogram.jpg are in .jpg file format and consist of histological whole slide H&E images and corresponding full-field digital mammograms from patients whose biopsies yielded diagnoses of atypical ductal hyperplasia and invasive carcinoma.

    Data supporting figure 3: Dataset Figure 3.xls is in .xls file format and contains raw data used to generate the Receiver Operating Characteristic (ROC) curves for the prediction of invasive cancer among women with high percent global fibroglandular volume, low percent global fibroglandular volume, high percent localized fibroglandular volume and low percent localized fibroglandular volume.

    Data supporting table 1: Dataset Table1_analysis.sas7bdat is in SAS file format and contains the characteristics of study participants in the BREAST Stamp Project, who were referred for an image-guided breast biopsy, stratified by the training and testing sets (n = 852).

    Data supporting table 2: Datasets Global FGV.xls (accompanying Global FGV.png file) and Localized FGV.xls (accompanying Localized FGV.png file) are in .xls file format and the accompanying files are in .png file format. The data contain histologic features identified in the random forest model for the prediction of global and localized % fibroglandular volume.

    Data supporting table 3: Datasets HighGlobal_feature_importance.xls, HighGlobal_feature_importance.pdf, HighLocal_feature_importance.xls, HighLocal_feature_importance.pdf, LowGlobal_feature_importance.xls, LowGlobal_feature_importance.pdf, LowLocal_feature_importance.xls, LowLocal_feature_importance.pdf are in .xls file format. The accompanying figures generated from the data in the .xls files are in .pdf file format. These files contain histologic features identified in the random forest model for the prediction of invasive cancer status among women with high vs. low % fibroglandular volume.

    Data supporting supplementary table 1: Datasets testfeatures.xls and trainfeatures.xls are in .xls file format and include the distribution and description of the 37 histologic features extracted from the convolutional neural network deep learning output in the H&E stained whole slide images from the training and testing sets.

    Data supporting supplementary table 2: Datasets All_samples_global.xls, All_samples_global.png, All_samples_local.xls, All_samples_local.png, PostMeno_global.xls, PostMeno_global.png, PostMeno_local.xls, PostMeno_local.png, PreMeno_global.xls, PreMeno_global.png, PreMeno_local.xls, PreMeno_local.png are in .xls file format. The accompanying figures generated from the data in the .xls files are in .png file format. These data include the histologic features identified in the random forest model that included BMI for the prediction of global and localized % fibroglandular volume.

    Software needed to access the data: Data files in SAS file format require the SAS software to be accessed.

  15. Breast cancer: Mortality rate

    • cloud.csiss.gmu.edu
    • data.europa.eu
    html
    Updated Dec 25, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    United Kingdom (2019). Breast cancer: Mortality rate [Dataset]. https://cloud.csiss.gmu.edu/uddi/dataset/breast_cancer_-_mortality_rate
    Explore at:
    Available download formats: html
    Dataset updated
    Dec 25, 2019
    Dataset provided by
    United Kingdom
    License

    http://reference.data.gov.uk/id/open-government-licence

    Description

    Deaths from breast cancer - Directly age-Standardised Rates (DSR) per 100,000 population

    Source: Office for National Statistics (ONS)

    Publisher: Information Centre (IC) - Clinical and Health Outcomes Knowledge Base

    Geographies: Local Authority District (LAD), Government Office Region (GOR), National, Primary Care Trust (PCT), Strategic Health Authority (SHA)

    Geographic coverage: England

    Time coverage: 2005-07, 2007

    Type of data: Administrative data

  16. Breast cancer screening programmes and breast cancer mortality: An...

    • figshare.com
    pdf
    Updated May 25, 2025
    Cite
    Syed Mahfuz Al Hasan (2025). Breast cancer screening programmes and breast cancer mortality: An observational study of 194 countries: Supplementary materials (Online repository) [Dataset]. http://doi.org/10.6084/m9.figshare.29144678.v1
    Explore at:
    Available download formats: pdf
    Dataset updated
    May 25, 2025
    Dataset provided by
    figshare
    Authors
    Syed Mahfuz Al Hasan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Supplementary content:

    Table S1: GATHER statement checklist, with a description of compliance and location of information for “Breast Cancer Screening Programs and Breast Cancer Mortality: An Observational Study of 194 Countries”
    Table S2: Countries with regular breast cancer screening programs at the national level, 2015–2021
    Table S3: Countries with inconsistent breast cancer screening programs at the national level, 2015–21
    Table S4: Countries without any breast cancer screening programs at the national level, 2015–2021
    Table S5: Annualized changes in age-standardized breast cancer mortality across the countries from 2015–2021
    Table S6: European countries with mammographic screening coverage data, 2013–21
    Table S7: European countries with screening coverage data included or excluded from the analysis
    Fig. S1. Annualized per cent changes in breast cancer deaths between countries with regular and irregular national breast cancer screening programmes, overall and by age group, 2015–2021 and 2015–2019 (analysis excluding the 21 outlier countries)
    Fig. S2. Annualized per cent changes in breast cancer deaths between countries with regular and irregular national breast cancer screening programmes, overall and by age group, 2015–2021 and 2015–2019 (among the 21 outlier countries)

  17. RSNA Screening Mammography Breast Cancer Detection (RSNA-SMBC) Dataset

    • registry.opendata.aws
    Updated Aug 1, 2024
    Radiological Society of North America (https://www.rsna.org/) (2024). RSNA Screening Mammography Breast Cancer Detection (RSNA-SMBC) Dataset [Dataset]. https://registry.opendata.aws/rsna-screening-mammography-breast-cancer-detection/
    Dataset updated
    Aug 1, 2024
    Dataset provided by
    Radiological Society of North America
    Description

    According to the WHO, breast cancer is the most commonly occurring cancer worldwide. In 2020 alone, there were 2.3 million new breast cancer diagnoses and 685,000 deaths. Yet breast cancer mortality in high-income countries has dropped by 40% since the 1980s when health authorities implemented regular mammography screening in age groups considered at risk. Early detection and treatment are critical to reducing cancer fatalities, and your machine learning skills could help streamline the process radiologists use to evaluate screening mammograms. Currently, early detection of breast cancer requires the expertise of highly-trained human observers, making screening mammography programs expensive to conduct. RSNA collected screening mammograms and supporting information from two sites, totaling just under 20,000 imaging studies.

  18. Cancer Registration Statistics, England 2020

    • digital.nhs.uk
    Updated Oct 20, 2022
    (2022). Cancer Registration Statistics, England 2020 [Dataset]. https://digital.nhs.uk/data-and-information/publications/statistical/cancer-registration-statistics
    Dataset updated
    Oct 20, 2022
    License

    https://digital.nhs.uk/about-nhs-digital/terms-and-conditions

    Area covered
    England
    Description

    This publication reports on newly diagnosed cancers registered in England in addition to cancer deaths registered in England during 2020. It includes this summary report showing key findings, spreadsheet tables with more detailed estimates, and a methodology document.

  19. machine learning models on the WDBC dataset

    • scidb.cn
    Updated Apr 15, 2025
    Mahdi Aghaziarati (2025). machine learning models on the WDBC dataset [Dataset]. http://doi.org/10.57760/sciencedb.23537
    Available formats: Croissant, a format for machine-learning datasets (see mlcommons.org/croissant)
    Dataset updated
    Apr 15, 2025
    Dataset provided by
    Science Data Bank
    Authors
    Mahdi Aghaziarati
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset used in this study is the Wisconsin Diagnostic Breast Cancer (WDBC) dataset, originally provided by the University of Wisconsin and obtained via Kaggle. It consists of 569 observations, each corresponding to a digitized image of a fine needle aspirate (FNA) of a breast mass. The dataset contains 32 attributes: one identifier column (discarded during preprocessing), one diagnosis label (malignant or benign), and 30 continuous real-valued features that describe the morphology of cell nuclei. These features are grouped into three statistical descriptors (mean, standard error (SE), and worst, i.e. the mean of the three largest values) for ten morphological properties including radius, perimeter, area, concavity, and fractal dimension.

    All feature values were normalized using z-score standardization to ensure a uniform scale across models sensitive to input ranges. No missing values were present in the original dataset. Label encoding was applied to the diagnosis column, assigning 1 to malignant and 0 to benign cases. The dataset was split into training (80%) and testing (20%) sets while preserving class balance via stratified sampling.

    The accompanying Python source code (breast_cancer_classification_models.py) performs data loading, preprocessing, model training, evaluation, and result visualization. Four lightweight classifiers (Decision Tree, Naïve Bayes, Perceptron, and K-Nearest Neighbors (KNN)) were implemented using the scikit-learn library (version 1.2 or later). Performance metrics including Accuracy, Precision, Recall, F1-score, and ROC-AUC were calculated for each model. Confusion matrices and ROC curves were generated and saved as PNG files for interpretability. All results are saved in a structured CSV file (classification_results.csv) that contains the performance metrics for each model.

    Supplementary visualizations include all_feature_histograms.png (distribution plots for all standardized features), model_comparison.png (metric-wise bar plot), and feature_correlation_heatmap.png (Pearson correlation matrix of all 30 features). The data files are in standard CSV and PNG formats and can be opened using any spreadsheet or image viewer, respectively. No rare file types are used, and all scripts are compatible with any Python 3.x environment. This data package enables reproducibility and offers a transparent overview of how baseline machine learning models perform in the domain of breast cancer diagnosis using a clinically relevant dataset.
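
The preprocessing steps described above (label encoding, z-score standardization, and an 80/20 stratified split) can be sketched without any dependencies. This is not the package's breast_cancer_classification_models.py, which uses scikit-learn; it is a standard-library illustration under stated assumptions. The "M"/"B" label codes, the toy feature rows, the function names, and the fixed seed are all hypothetical choices made for the example.

```python
import random
import statistics


def encode_diagnosis(label):
    """Map a diagnosis code to 1 (malignant) or 0 (benign).

    Assumes the codes are "M" and "B"; adjust if the file uses other values.
    """
    return {"M": 1, "B": 0}[label]


def zscore_columns(rows):
    """Standardize each feature column to mean 0 and population std 1."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return [
        [(v - m) / s if s else 0.0 for v, m, s in zip(row, means, stds)]
        for row in rows
    ]


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices), sampling each class separately."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Toy stand-in for the 569 x 30 feature table (hypothetical values).
raw_labels = ["B", "M", "B", "B", "M", "B", "M", "B", "B", "M"]
features = [[float(i), float(i * i)] for i in range(10)]

y = [encode_diagnosis(label) for label in raw_labels]
X = zscore_columns(features)
train_idx, test_idx = stratified_split(y, test_fraction=0.2, seed=0)
print(len(train_idx), "training rows;", len(test_idx), "test rows")
```

Splitting per class is what keeps the malignant-to-benign ratio roughly equal in both subsets, which is the property the description calls preserving class balance.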

  20. Breast Cancer Death Rate (per 100,000 females), New Jersey, by year:...

    • healthdata.nj.gov
    application/rdfxml +5
    Updated Dec 8, 2020
    Death Certificate Database, Office of Vital Statistics and Registry, New Jersey Department of Health (2020). Breast Cancer Death Rate (per 100,000 females), New Jersey, by year: Beginning 2010 [Dataset]. https://healthdata.nj.gov/dataset/Breast-Cancer-Death-Rate-per-100-000-females-New-J/u6nf-k4bd
    Available download formats: application/rdfxml, json, application/rssxml, tsv, csv, xml
    Dataset updated
    Dec 8, 2020
    Dataset provided by
    New Jersey Department of Health (https://www.nj.gov/health/)
    Authors
    Death Certificate Database, Office of Vital Statistics and Registry, New Jersey Department of Health
    Area covered
    New Jersey
    Description

    Rate: Number of deaths among females due to breast cancer per 100,000 female population.

    Definition: Number of deaths per 100,000 with malignant neoplasm (cancer) of the female breast as the underlying cause (ICD-10 code: C50).

    Data Sources:

    (1) Centers for Disease Control and Prevention, National Center for Health Statistics. Compressed Mortality File. CDC WONDER On-line Database accessed at http://wonder.cdc.gov/cmf-icd10.html

    (2) Death Certificate Database, Office of Vital Statistics and Registry, New Jersey Department of Health

    (3) Population Estimates, State Data Center, New Jersey Department of Labor and Workforce Development

jing teng (2025). SEER Breast Cancer Data [Dataset]. https://ieee-dataport.org/open-access/seer-breast-cancer-data

SEER Breast Cancer Data

15 scholarly articles cite this dataset.
Dataset updated
Jun 16, 2025
Authors
jing teng
Description

examined regional LNs
