Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description: Breast cancer is the most common cancer among women worldwide. It accounts for 25% of all cancer cases and affected over 2.1 million people in 2015 alone. It starts when cells in the breast begin to grow out of control. These cells usually form tumors that can be seen via X-ray or felt as lumps in the breast area. The key challenge in its detection is how to classify tumors as malignant (cancerous) or benign (non-cancerous). We ask you to complete the analysis of classifying these tumors using machine learning (with SVMs) and the Breast Cancer Wisconsin (Diagnostic) Dataset.
Acknowledgements: This dataset was taken from Kaggle.
Objective: Understand the dataset and clean it up (if required). Build classification models to predict whether the cancer type is malignant or benign. Also fine-tune the hyperparameters and compare the evaluation metrics of various classification algorithms.
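As a rough illustration of the objective above, the following sketch trains an SVM with a small hyperparameter search. It assumes that the copy of the Breast Cancer Wisconsin (Diagnostic) data bundled with scikit-learn is an acceptable stand-in for the Kaggle file, and the parameter grid and split sizes are arbitrary choices made for the example, not part of the task description.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: scikit-learn's bundled copy of the dataset (569 samples, 30 features).
# In this copy, target 0 means malignant and target 1 means benign.
X, y = load_breast_cancer(return_X_y=True)

# Hold out 20% of the samples, keeping the class balance in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# SVMs are sensitive to feature scale, so standardize inside the pipeline.
pipeline = make_pipeline(StandardScaler(), SVC())

# Illustrative grid only; a real study would likely search more widely.
param_grid = {"svc__C": [0.1, 1, 10], "svc__kernel": ["linear", "rbf"]}
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)

test_accuracy = accuracy_score(y_test, search.predict(X_test))
print("best parameters:", search.best_params_)
print("held-out accuracy:", round(test_accuracy, 3))
```

Keeping the scaler inside the pipeline means each cross-validation fold is standardized using only its own training portion, which avoids leaking information from the validation fold.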
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Some racial and ethnic categories are suppressed for privacy and to avoid misleading estimates when the relative standard error exceeds 30% or the unweighted sample size is less than 50 respondents.
Data Source: Centers for Disease Control and Prevention (CDC). Behavioral Risk Factor Surveillance System Survey (BRFSS) Data
Why This Matters
Breast cancer is the most commonly diagnosed cancer in women and people assigned female at birth (AFAB) and the second leading cause of cancer death in the U.S. Breast cancer screenings can save lives by helping to detect breast cancer in its early stages when treatment is more effective.
While non-Hispanic white women and AFAB individuals are more likely to be diagnosed with breast cancer than their counterparts of other races and ethnicities, non-Hispanic Black women and AFAB individuals die from breast cancer at a significantly higher rate than their counterparts of other races and ethnicities.
Later-stage diagnoses and prolonged treatment duration partly explain these disparities in mortality rate. Structural barriers to quality health care, insurance, education, affordable housing, and sustainable income that disproportionately affect communities of color also drive racial inequities in breast cancer screenings and mortality.
The District Response
Project Women Into Staying Healthy (WISH) provides free breast and cervical cancer screenings to uninsured or underinsured women and AFAB adults aged 21 to 64. Patient navigation, transportation assistance, and cancer education are also provided.
DC Health’s Cancer and Chronic Disease Prevention Bureau works with healthcare providers to improve the use of preventative health services and provide breast cancer screening services.
DC Health maintains the District of Columbia Cancer Registry (DCCR) to track cancer incidences, examine environmental substances that cause cancer, and identify differences in cancer incidences by age, gender, race, and geographical location.
https://choosealicense.com/licenses/cc/
breastcanc-ultrasound-class
Background
Cancer is the second leading cause of death worldwide, according to IHME - Global Burden of Disease, with 10.7 million deaths in 2019.
Among the various types of cancer, breast cancer plays a major role: it ranks 4th among the deadliest tumors, with more than 700,000 deaths in 2019 (IHME - Global Burden of Disease).
Moreover, breast cancer has the highest share of number of cases/100 people worldwide… See the full description on the dataset page: https://huggingface.co/datasets/as-cle-bert/breastcancer-auto-segmentation.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset processes the huge 314GB+ competition dataset (54,713 images) into TFRecords for fast data loading during training.
All images are resized to 768x1280 and saved in 100 TFRecords, so that each TFRecord contains roughly 548 images, giving a dataset of 8.6GB+.
TFRecords have the benefit of loading large chunks of data containing many samples instead of loading every image and label separately.
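The "roughly 548 images" per record follows from spreading 54,713 images over 100 files. The sketch below shows one way such an even split could be computed. The helper `shard_sizes` is hypothetical, and how the dataset author actually assigned images to records is not stated.

```python
def shard_sizes(n_items, n_shards):
    """Split n_items into n_shards parts whose sizes differ by at most one."""
    base, extra = divmod(n_items, n_shards)
    # The first `extra` shards take one additional item each.
    return [base + 1 if i < extra else base for i in range(n_shards)]


# Figures from the description: 54,713 images saved in 100 TFRecords.
sizes = shard_sizes(54713, 100)
print(min(sizes), max(sizes), sum(sizes))
```

Under this assumption every record holds either 547 or 548 images.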
Dataset Description
Note: The dataset for this challenge contains radiographic breast images of female subjects. The goal of this competition is to identify cases of breast cancer in mammograms from screening exams. It is important to identify cases of cancer for obvious reasons, but false positives also have downsides for patients. As millions of women get mammograms each year, a useful machine learning tool could help a great many people. This competition uses a hidden test. When your submitted notebook is scored the actual test data (including a full length sample submission) will be made available to your notebook.
Files
[train/test]_images/[patient_id]/[image_id].dcm The mammograms, in DICOM format. You can expect roughly 8,000 patients in the hidden test set. There are usually but not always 4 images per patient. Note that many of the images use the JPEG 2000 format, which you may need special libraries to load.
sample_submission.csv A valid sample submission. Only the first few rows are available for download.
[train/test].csv Metadata for each patient and image. Only the first few rows of the test set are available for download.
site_id - ID code for the source hospital.
patient_id - ID code for the patient.
image_id - ID code for the image.
laterality - Whether the image is of the left or right breast.
view - The orientation of the image. The default for a screening exam is to capture two views per breast.
age - The patient's age in years.
implant - Whether or not the patient had breast implants. Site 1 only provides breast implant information at the patient level, not at the breast level.
density - A rating for how dense the breast tissue is, with A being the least dense and D being the most dense. Extremely dense tissue can make diagnosis more difficult. Only provided for train.
machine_id - An ID code for the imaging device.
cancer - Whether or not the breast was positive for malignant cancer. The target value. Only provided for train.
biopsy - Whether or not a follow-up biopsy was performed on the breast. Only provided for train.
invasive - If the breast is positive for cancer, whether or not the cancer proved to be invasive. Only provided for train.
BIRADS - 0 if the breast required follow-up, 1 if the breast was rated as negative for cancer, and 2 if the breast was rated as normal. Only provided for train.
prediction_id - The ID for the matching submission row. Multiple images will share the same prediction ID. Test only.
difficult_negative_case - True if the case was unusually difficult. Only provided for train.
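Because the cancer label describes a breast while the metadata has one row per image, a common first step is to collapse image rows into one label per patient and side. The sketch below does this with the standard library only. The sample rows are invented for illustration and use only a subset of the columns, and `breast_level_labels` is a hypothetical helper, not part of the competition materials.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows for illustration only; the IDs and values are made up.
SAMPLE_TRAIN_CSV = """patient_id,image_id,laterality,view,cancer
1,101,L,CC,0
1,102,L,MLO,0
1,103,R,CC,1
1,104,R,MLO,1
2,201,L,CC,0
2,202,L,MLO,0
"""


def breast_level_labels(csv_text):
    """Map (patient_id, laterality) to 1 if any of its images is labelled cancer."""
    labels = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["patient_id"], row["laterality"])
        labels[key] = max(labels[key], int(row["cancer"]))
    return dict(labels)


labels = breast_level_labels(SAMPLE_TRAIN_CSV)
for (patient_id, laterality), cancer in sorted(labels.items()):
    print(patient_id, laterality, cancer)
```

Grouping by patient and laterality mirrors the note that multiple images share one prediction ID, although the exact format of prediction_id is not given here.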
http://reference.data.gov.uk/id/open-government-licence
A measure of the number of adults diagnosed with breast, lung or colorectal cancer in a year who are still alive five years after diagnosis.
ONS still publish survival percentages for individual types of cancers. These can be found at: http://www.ons.gov.uk/ons/rel/cancer-unit/cancer-survival/cancer-survival-in-england--patients-diagnosed-2007-2011-and-followed-up-to-2012/index.html
A time series for five-year survival figures for breast, lung and colorectal cancer individually (previous NHS Outcomes Framework indicators 1.4.ii, 1.4.iv and 1.4.vi) is still published and can be found under the link 'Indicator data - previous methodology (.xls)' below.
Purpose
This indicator attempts to capture the success of the NHS in preventing people from dying once they have been diagnosed with breast, lung or colorectal cancer.
Current version updated: May-14
Next version due: To be confirmed
https://choosealicense.com/licenses/cc/
breastcanc-ultrasound-class
Background
Cancer is the second leading cause of death worldwide, according to IHME - Global Burden of Disease, with 10.7 million deaths in 2019.
Among the various types of cancer, breast cancer plays a major role: it ranks 4th among the deadliest tumors, with more than 700,000 deaths in 2019 (IHME - Global Burden of Disease).
Moreover, breast cancer has the highest share of number of cases/100 people worldwide… See the full description on the dataset page: https://huggingface.co/datasets/as-cle-bert/breastcancer-auto-objdetect.
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
One woman in nine can expect to develop breast cancer during her lifetime and one in 25 will die from the disease. Statistically low incidences of breast cancer are found in Newfoundland and Labrador, the territories, and northern areas of most provinces. Otherwise, each province has one or more pockets of significantly high breast cancer incidence. These are often located in more southerly areas, but they do not seem to be restricted to either urban or rural areas alone. Breast cancer rates are a health status indicator. They can be used to help assess health conditions. Health status refers to the state of health of a person or group, and measures causes of sickness and death. It can also include people’s assessment of their own health.
https://creativecommons.org/publicdomain/zero/1.0/
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Breast Cancer (METABRIC)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/gunesevitan/breast-cancer-metabric on 12 November 2021.
--- Original source retains full ownership of the source dataset ---
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
A measure of the number of adults diagnosed with breast, lung or colorectal cancer in a year who are still alive one year after diagnosis.
ONS still publish survival percentages for individual types of cancers. These can be found at: http://www.ons.gov.uk/ons/rel/cancer-unit/cancer-survival/cancer-survival-in-england--patients-diagnosed-2007-2011-and-followed-up-to-2012/index.html
A time series for one-year survival figures for breast, lung and colorectal cancer individually (previous NHS Outcomes Framework indicators 1.4.i, 1.4.iii and 1.4.v) is still published and can be found under the link 'Indicator data - previous methodology (.xls)' below.
Purpose
This indicator attempts to capture the success of the NHS in preventing people from dying once they have been diagnosed with breast, lung or colorectal cancer.
Current version updated: Feb-14
Next version due: To be confirmed
The dataset contains cases from a study that was conducted between 1958 and 1970 at the University of Chicago's Billings Hospital on the survival of patients who had undergone surgery for breast cancer.
https://creativecommons.org/publicdomain/zero/1.0/
The training data set contains 78 samples, of which 34 are from patients who developed distant metastases within 5 years (labelled as relapse); the remaining 44 samples are from those who are healthy (labelled as non-relapse). The test data set contains 19 samples: 12 relapse and 7 non-relapse. There are 2448 genes (features) in the dataset.
Number and rate of new cancer cases diagnosed annually from 1992 to the most recent diagnosis year available. Included are all invasive cancers and in situ bladder cancer with cases defined using the Surveillance, Epidemiology and End Results (SEER) Groups for Primary Site based on the World Health Organization International Classification of Diseases for Oncology, Third Edition (ICD-O-3). Random rounding of case counts to the nearest multiple of 5 is used to prevent inappropriate disclosure of health-related information.
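The description does not say how the random rounding is performed. The sketch below shows one common unbiased scheme, in which a count moves to one of the two adjacent multiples of 5 with a probability that depends on how close it is to each. This scheme is an assumption, and `random_round` is a hypothetical helper rather than the publisher's actual procedure.

```python
import random
from typing import Optional


def random_round(count: int, base: int = 5, rng: Optional[random.Random] = None) -> int:
    """Randomly round count to an adjacent multiple of base (assumed scheme)."""
    rng = rng or random.Random()
    remainder = count % base
    if remainder == 0:
        return count  # already a multiple of base, left unchanged
    lower = count - remainder
    # Round up with probability remainder/base, so the expected value equals count.
    return lower + base if rng.random() < remainder / base else lower


rng = random.Random(0)  # fixed seed so the example is reproducible
rounded = [random_round(c, rng=rng) for c in range(20)]
print(rounded)
```

With this scheme a true count of 7 is published as 5 with probability 3/5 and as 10 with probability 2/5, so small counts cannot be read off exactly from the table.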
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
This dataset consists of 256 breast ultrasound scans collected from 256 patients and 266 benign and malignant segmented lesions. It includes patient-level labels, image-level annotations, and tumor-level labels with all cases confirmed by follow-up care or biopsy result. Each scan was manually annotated and labeled by a radiologist experienced in breast ultrasound examination. In particular, each tumor was identified in the image via a freehand annotation and labeled according to BIRADS features. The tumor histopathological classification is stated for patients who underwent a biopsy. Patient-level labels include clinical data such as age, breast tissue composition, signs and symptoms. Image-level freehand annotations identify the tumor and other abnormal areas in the image. The tumor and image are labeled with BIRADS category, 7 BIRADS descriptors, and interpretation of critical findings as presence of breast diseases. Additional labels include the method of verification, tumor classification and histopathological diagnosis.
Since the role of machine learning and theoretical computing towards the development of augmented inference in the field of cancer detection is indisputable, the quality of the data used to develop any explainable augmented inference methods is extremely important. This dataset can be used as an external testing set for assessing a model’s performance and for developing explainable AI or supervised machine learning models for the detection, segmentation and classification of breast abnormalities in ultrasound images.
A detailed description of this dataset can be found in the following publication, which should be cited along with the data citation:
Pawłowska, A., Ćwierz-Pieńkowska, A., Domalik, A., Jaguś, D., Kasprzak, P., Matkowski, R., Fura, Ł., Nowicki, A., & Zolek, N. A Curated benchmark dataset for ultrasound based breast lesion analysis. Sci Data 11, 148 (2024). https://doi.org/10.1038/s41597-024-02984-z.
https://choosealicense.com/licenses/cc/
breastcanc-ultrasound-class
Background
Cancer is the second leading cause of death worldwide, according to IHME - Global Burden of Disease, with 10.7 million deaths in 2019.
Among the various types of cancer, breast cancer plays a major role: it ranks 4th among the deadliest tumors, with more than 700,000 deaths in 2019 (IHME - Global Burden of Disease).
Moreover, breast cancer has the highest share of number of cases/100 people worldwide… See the full description on the dataset page: https://huggingface.co/datasets/as-cle-bert/breastcanc-ultrasound-class.
The current study aims to build the first digitalized mammogram dataset for breast cancer in Saudi Arabia, based on the BI-RADS categories, to address the lack of local public datasets by collecting, categorizing, and annotating mammogram images, and to support the medical field by providing physicians with a range of diagnosed cases, especially in Saudi Arabia. The dataset was collected from the Sheikh Mohammed Hussein Al-Amoudi Center of Excellence in Breast Cancer at King Abdulaziz University in Jeddah, Saudi Arabia, from April 2019 to March 2020, and the annotation was carried out between April and June 2020. The dataset contains 1521 cases; all cases include images with two types of views (CC and MLO) for both breasts (right and left), making a total of 6109 mammogram images. The dataset was classified into categories 0 to 5 in accordance with BI-RADS.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: A rapid surge of female breast cancer has been observed in young women in several East Asian countries. The BIM deletion polymorphism, which confers cell resistance to apoptosis, was recently found exclusively in East Asian people with prevalence rate of 12%. We aimed to evaluate the possible role of this genetic alteration in carcinogenesis of breast cancer in East Asians.
Method: Female healthy volunteers (n = 307), patients in one consecutive stage I-III breast cancer cohort (n = 692) and one metastatic breast cancer cohort (n = 189) were evaluated. BIM wild-type and deletion alleles were separately genotyped in genomic DNAs.
Results: Both cancer cohorts consistently showed inverse associations between the BIM deletion polymorphism and patient age (≤35 y vs. 36-50 y vs. >50 y: 29% vs. 22% vs. 15%, P = 0.006 in the consecutive cohort, and 40% vs. 23% vs. 13%, P = 0.023 in the metastatic cohort). In healthy volunteers, the frequencies of the BIM deletion polymorphism were similar (13%-14%) in all age groups. Further analyses indicated that the BIM deletion polymorphism was not associated with specific clinicopathologic features, but it was associated with poor overall survival (adjusted hazard ratio 1.71) in the consecutive cohort.
Conclusions: BIM deletion polymorphism may be involved in the tumorigenesis of the early-onset breast cancer among East Asians.
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
This map uses age-standardized ratios to further aid in regional comparisons. A value of 1.0 would indicate that the region rate is identical to the overall Canadian rate; a value greater than 1.0 would indicate that the rate for that region is higher than the Canadian rate; and, in turn, a ratio value less than 1.0 would indicate that the rate for the specific region is lower than the Canadian rate. Statistically low incidences of breast cancer are found in Newfoundland and Labrador, the territories, and northern areas of most provinces. Otherwise, each province has one or more pockets of significantly high breast cancer incidence. Health status refers to the state of health of a person or group, and measures causes of sickness and death. It can also include people’s assessment of their own health.
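The reading of the ratio described above can be written down as a small helper. In this sketch the ratio is assumed to be the regional age-standardized rate divided by the overall Canadian rate; the function names and the example rates are hypothetical and do not come from the map data.

```python
def standardized_ratio(region_rate: float, canada_rate: float) -> float:
    """Ratio of a regional rate to the national rate (assumed definition)."""
    return region_rate / canada_rate


def interpret_ratio(ratio: float, tol: float = 1e-9) -> str:
    """Describe a ratio relative to 1.0, following the text above."""
    if abs(ratio - 1.0) <= tol:
        return "identical to the Canadian rate"
    if ratio > 1.0:
        return "higher than the Canadian rate"
    return "lower than the Canadian rate"


# Hypothetical rates per 100,000, chosen only to exercise each branch.
for region_rate in (80.0, 100.0, 125.0):
    ratio = standardized_ratio(region_rate, 100.0)
    print(ratio, interpret_ratio(ratio))
```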
This dataset is sourced from Public Health England and consists of the percentage of people in the resident population eligible for cervical screening who were screened adequately within the previous years (2010 to 2016) for bowel, cervical and breast cancer.
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Characteristic | Value (N = 984) |
---|---|
Age (years) | Mean ± SD: 53.2 ± 11; Median (IQR): 53 (45-60); Range: 25-86 |
Sex | Female: 984 (100%) |
Race | White: 898 (91.3%) |
Ethnicity | Hispanic: 38 (3.9%) |
This dataset relates to NCI Clinical trial, "Magnetic Resonance Imaging in Women Recently Diagnosed With Unilateral Breast Cancer (ACRIN-6667)". The dataset consists of 984 patients but only 969 were included in the primary data analysis due to study criteria.
Even after careful clinical and mammographic evaluation, cancer is found in the contralateral breast in up to 10% of women who have received treatment for unilateral breast cancer. ACRIN 6667 was conducted to determine whether magnetic resonance imaging (MRI) could improve on clinical breast examination and mammography in detecting contralateral breast cancer soon after the initial diagnosis of unilateral breast cancer. Additional information about the trial is available in the Study Protocol and Case Report Forms.
METHODS
A total of 969 women with a recent diagnosis of unilateral breast cancer and no abnormalities on mammographic and clinical examination of the contralateral breast underwent breast MRI. The diagnosis of MRI-detected cancer was confirmed by means of biopsy within 12 months after study entry. The absence of breast cancer was determined by means of biopsy, the absence of positive findings on repeat imaging and clinical examination, or both at 1 year of follow-up.
RESULTS
MRI detected clinically and mammographically occult breast cancer in the contralateral breast in 30 of 969 women who were enrolled in the study (3.1%). The sensitivity of MRI in the contralateral breast was 91%, and the specificity was 88%. The negative predictive value of MRI was 99%. A biopsy was performed on the basis of a positive MRI finding in 121 of the 969 women (12.5%), 30 of whom had specimens that were positive for cancer (24.8%); 18 of the 30 specimens were positive for invasive cancer. The mean diameter of the invasive tumors detected was 10.9 mm. The additional number of cancers detected was not influenced by breast density, menopausal status, or the histologic features of the primary tumor.
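The percentages in the results above can be re-derived from the reported counts. The short check below uses only the numbers stated in the paragraph. The variable names are illustrative, and the invasive share is a derived figure that the text does not state as a percentage.

```python
# Counts reported in the RESULTS paragraph.
enrolled = 969   # women who underwent breast MRI
biopsied = 121   # biopsies performed on the basis of a positive MRI finding
cancers = 30     # biopsies positive for cancer
invasive = 18    # of those, positive for invasive cancer

detection_rate = 100 * cancers / enrolled   # reported as 3.1%
biopsy_rate = 100 * biopsied / enrolled     # reported as 12.5%
biopsy_yield = 100 * cancers / biopsied     # reported as 24.8%
invasive_share = 100 * invasive / cancers   # derived here, not stated in the text

print(f"MRI-detected cancers: {detection_rate:.1f}% of enrolled women")
print(f"Biopsies after positive MRI: {biopsy_rate:.1f}% of enrolled women")
print(f"Biopsies positive for cancer: {biopsy_yield:.1f}%")
print(f"Invasive among detected cancers: {invasive_share:.1f}%")
```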
CONCLUSIONS
MRI can detect cancer in the contralateral breast that is missed by mammography and clinical examination at the time of the initial breast-cancer diagnosis.