By UCI [source]
This dataset contains data on breast cancer diagnosis, a devastating medical condition that affects millions of people around the world each year. The data consist of a patient ID, the diagnosis (malignant or benign), and 30 features computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. The features describe the cell nuclei in the image: radius, texture, perimeter, area, smoothness, compactness, concavity, concave points, symmetry, and fractal dimension, each summarized as a mean, a standard error, and a "worst" value (the mean of the three largest values), giving 10 × 3 = 30 features.
Created by renowned researchers in General Surgery and Computer Science at the University of Wisconsin-Madison, led by Dr. William H. Wolberg with contributions from Professor W. Nick Street and Olvi L. Mangasarian, this dataset was used in groundbreaking research that predicted breast cancer prognosis with linear programming methods. More recently, statistical methods such as support vector machines have been used to classify tumour types from this dataset, and pattern recognition techniques such as artificial neural networks (ANNs) have been used to identify hidden patterns.
It has also been used in studies of unsupervised classification tools such as Ant Colony Optimization, which look for meaningful relationships among variables that can help physicians better understand how certain types of tumors progress over time. For example, cardinality analysis of tumor types has been used to assess a tumor's heterogeneity before choosing a treatment, potentially improving prognosis. The Wisconsin Breast Cancer Diagnostic dataset is an invaluable resource for scientists working to prevent or cure this dreaded disease, a goal we all eagerly hope to achieve soon.
- Developing a classifier that accurately predicts breast cancer diagnoses from the provided features.
- Clustering patients with similar diagnoses to discover trends or connections between certain cell features and diagnoses.
- Optimizing feature selection algorithms to identify the most relevant predictors of breast cancer diagnosis among the given cell nuclei features.
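As one possible starting point for the feature-selection idea, the sketch below ranks features with a univariate Fisher score, (mean_M - mean_B)^2 / (var_M + var_B), computed from the two diagnosis groups. This is just one simple filter method, chosen for illustration; the function names and the toy values in the test are hypothetical, not taken from the dataset.

```python
from statistics import mean, pvariance


def fisher_scores(rows, labels):
    """Return one Fisher score per feature column.

    rows: list of equal-length lists of floats (one list per patient)
    labels: diagnosis codes aligned with rows, e.g. 'M' / 'B'
    Larger scores mean the feature separates the two classes better on its own.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("expected exactly two diagnosis classes")
    scores = []
    for j in range(len(rows[0])):
        # Split column j into the two diagnosis groups.
        groups = [[r[j] for r, y in zip(rows, labels) if y == c] for c in classes]
        m0, m1 = mean(groups[0]), mean(groups[1])
        denom = pvariance(groups[0]) + pvariance(groups[1])
        if denom == 0:
            # Zero within-class spread: perfect separation if the means differ.
            scores.append(float("inf") if m0 != m1 else 0.0)
        else:
            scores.append((m0 - m1) ** 2 / denom)
    return scores


def rank_features(names, rows, labels):
    """Pair feature names with scores, most discriminative first."""
    scores = fisher_scores(rows, labels)
    return sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)
```

A filter score like this ignores correlations between features (radius, perimeter, and area are closely related, for instance), so it is usually a first pass before a wrapper or embedded selection method.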
If you use this dataset in your research, please credit the original authors (see Data Source).
See the dataset description for more information.
File: unformatted-data.csv
File: wpbc.data.csv
The column names below are the values in the first data row, which indicates that the file has no header row.

| Column name | Description |
|:--------------|:--------------------------------|
| 119513 | ID number (Integer) |
| N | Outcome (Binary: R = recur, N = nonrecur) |
| 31 | Time (Integer; recurrence time if R, disease-free time if N) |
| 18.02 | Mean Radius (Real-valued) |
| 27.6 | Mean Texture (Real-valued) |
| 117.5 | Mean Perimeter (Real-valued) |
| 1013 | Mean Area (Real-valued) |
| 0.09489 | Mean Smoothness (Real-valued) |
| 0.1036 | Mean Compactness (Real-valued) |
| 0.1086 | Mean Concavity (Real-valued) |
| 0.07055 | Mean Concave Points (Real-valued) |
| 0.1865 | Mean Symmetry (Real-valued) |
| 0.06333 | Mean Fractal Dimension (Real-valued) |
| 0.6249 | Radius Standard Error (Real-valued) |
| 1.89 | Texture Standard Error (Real-valued) |
| 3.972 | Perimeter Standard Error (Real-valued) |
| 71.55 | Area Standard Error (Real-valued) |
| 0.004433 | Smoothness Standard Error (Real-valued) |
| 0.01421 | Compactness Standard Error (Real-valued) |
| 0.03233 | Concavity Standard Error (Real-valued) |
File: breast-cancer-wisconsin.data.csv

| Column name | Description |
|:--------------|:--------------------------------------|
| 119513 | ID number (Integer) |
| 1000025 | ID number (Integer) |
| 1.1 | Uniformity of Cell Size (Integer) |
| 1.2 | Uniformity of Cell Shape (Integer) |
| 1.3 | Single Epithelial Cell Size (Integer) |
| 1.4 | Bland Chromatin (Integer) |
| 1.5 | Normal Nucleoli (Integer) |
| 2.1 | Mitoses (Integer) |
File: wdbc.data.csv

| Column name | Description |
|:--------------|:----------------------------------------|
| 842302 | Patient ID number (Integer Type) |
| M | Diagnosis (Binary Type) |
...
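Because the column names in these tables are values from the first data row, the CSV files appear to lack a header row. A minimal loader for wdbc.data.csv using only the Python standard library could attach names explicitly. The layout (ID, diagnosis, then the mean, standard error, and worst values of the ten nucleus features, in that order) follows the UCI description of the diagnostic dataset; the snake_case names and the `load_wdbc` helper are hypothetical.

```python
import csv
import io

# The ten base measurements computed for each cell nucleus.
BASE_FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points", "symmetry",
    "fractal_dimension",
]

# 30 features = all 10 means, then all 10 standard errors, then all 10 "worst" values.
COLUMNS = ["id", "diagnosis"] + [
    f"{base}_{stat}" for stat in ("mean", "se", "worst") for base in BASE_FEATURES
]


def load_wdbc(text):
    """Parse headerless wdbc CSV text into a list of dicts keyed by COLUMNS."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        if len(row) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(row)}")
        record = {"id": int(row[0]), "diagnosis": row[1]}
        for name, value in zip(COLUMNS[2:], row[2:]):
            record[name] = float(value)
        records.append(record)
    return records
```

Usage would look like `records = load_wdbc(open("wdbc.data.csv").read())`, after which `records[0]["diagnosis"]` holds `"M"` or `"B"`.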
Number and rate of new cancer cases diagnosed annually from 1992 to the most recent diagnosis year available. Included are all invasive cancers and in situ bladder cancer with cases defined using the Surveillance, Epidemiology and End Results (SEER) Groups for Primary Site based on the World Health Organization International Classification of Diseases for Oncology, Third Edition (ICD-O-3). Random rounding of case counts to the nearest multiple of 5 is used to prevent inappropriate disclosure of health-related information.
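The description does not say which random-rounding scheme is used. A common unbiased variant rounds a count up to the next multiple of 5 with probability remainder/5 and down otherwise, so that published counts equal the true counts on average. The sketch below assumes that variant; `random_round` is a hypothetical helper, not the publisher's code.

```python
import random


def random_round(count, base=5, rng=random):
    """Randomly round a non-negative integer count to a multiple of `base`.

    Counts already divisible by `base` are unchanged. Otherwise the count is
    rounded up with probability remainder / base and down otherwise, which
    keeps the expected published value equal to the true count.
    """
    if count < 0:
        raise ValueError("counts must be non-negative")
    remainder = count % base
    if remainder == 0:
        return count
    lower = count - remainder
    return lower + base if rng.random() < remainder / base else lower
```

Under this scheme a count of 7 is published as 5 about 60% of the time and as 10 about 40% of the time, so no single published figure reveals the exact count.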
https://creativecommons.org/publicdomain/zero/1.0/
Breast cancer is the most frequently diagnosed cancer and the leading cause of cancer-related death in women worldwide. Globally, breast cancer accounted for 2.08 million of 18.08 million new cancer cases (11.6% of all new cases) and 626,679 of 9.55 million cancer-related deaths (6.6% of all cancer-related deaths) in 2018 [1,2]. In India, breast cancer has surpassed cancers of the cervix and the oral cavity to become the most common cancer and the leading cause of cancer death. In 2018, 159,500 new cases of breast cancer were diagnosed in India, representing 27.7% of all new cancers among Indian women and 11.1% of all cancer deaths.
In India, breast cancer case reporting and diagnostics have increased tenfold in the past three years, largely thanks to cancer awareness initiatives by private and government organisations.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
Triple-negative breast cancer (TNBC) is associated with a poorer prognosis, greater aggressiveness than other breast cancer subtypes, and limited treatment options. The absence of conventional treatment options leaves TNBC patients susceptible to metastasis. The objective of this research was to assess the clinical and pathological characteristics of TNBC patients, estimate the influence of risk factors on their prognosis, and create a prediction model to help doctors treat TNBC patients and improve their prognosis.
Methods
We included 23,394 individuals with complete baseline clinical data and survival information who were diagnosed with primary TNBC between 2010 and 2015, drawn from the SEER database. External validation used a cohort from The Affiliated Lihuili Hospital of Ningbo University. Independent risk factors for TNBC prognosis were identified through univariate, multivariate, and least absolute shrinkage and selection operator (LASSO) regression. These factors were used as parameters to build nomogram models of 3- and 5-year overall survival (OS) and breast cancer-specific survival (BCSS). Model accuracy was assessed using calibration curves, concordance indices (C-indices), receiver operating characteristic (ROC) curves, and decision curve analyses (DCAs). Finally, TNBC patients were divided into high-, medium-, and low-risk groups using the nomogram model, and a Kaplan-Meier survival analysis was performed.
Results
In the training cohort, age at diagnosis, marital status, grade, T stage, N stage, M stage, surgery, radiation, and chemotherapy were associated with OS and BCSS. For OS prediction, the nomogram's C-indices were 0.762, 0.747, and 0.764 in the training, internal validation, and external validation groups, respectively. For BCSS prediction, the corresponding C-indices were 0.793, 0.755, and 0.811. Calibration curves showed that the nomogram was well calibrated, and time-dependent ROC curves supported its usefulness in clinical settings. The clinical DCA demonstrated the potential clinical benefit of the proposed model. The online version was simple to use, and nomogram-based classification may differentiate TNBC prognosis and distinguish risk groups more accurately.
Conclusion
These nomograms are precise tools for assessing risk and forecasting survival in patients with TNBC. They can help doctors identify prognostic markers and create more effective treatment plans, providing more accurate estimates of 3- and 5-year OS and BCSS.
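The C-index used to assess the nomograms measures how well a model's risk ranking agrees with observed survival. A minimal sketch of Harrell's C-index for right-censored data follows. For simplicity it skips pairs with tied follow-up times, whereas published implementations may handle ties differently; `harrell_c_index` is a hypothetical name, not the study's code.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times: follow-up time for each patient
    events: 1 if the event (e.g. death) was observed, 0 if censored
    risk_scores: model output where a higher score means higher risk
    A pair is comparable when the patient with the shorter time had an
    observed event; it is concordant when that patient also has the higher
    risk score. Tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue  # not comparable: tied times, or earlier patient censored
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so C-indices between 0.747 and 0.811 indicate that the nomograms rank patients substantially better than chance.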
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Cancer affects people of all ages, ethnicities, and sexes. Collecting and storing data about these patients supports the development, understanding, and analysis of statistics on the disease. In Brazil, the oncology hospital units that receive patients diagnosed with cancer store this information in a national database called the Hospital Registry of Cancer (RHC). The following variables were selected: age, sex, race, alcohol consumption, tobacco consumption, and cancer staging.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
A measure of the number of adults diagnosed with breast, lung or colorectal cancer in a year who are still alive one year after diagnosis.
ONS still publish survival percentages for individual types of cancers. These can be found at: http://www.ons.gov.uk/ons/rel/cancer-unit/cancer-survival/cancer-survival-in-england--patients-diagnosed-2007-2011-and-followed-up-to-2012/index.html
A time series for one-year survival figures for breast, lung and colorectal cancer individually (previous NHS Outcomes Framework indicators 1.4.i, 1.4.iii and 1.4.v) is still published and can be found under the link 'Indicator data - previous methodology (.xls)' below.
Purpose
This indicator attempts to capture the success of the NHS in preventing people from dying once they have been diagnosed with breast, lung or colorectal cancer.
Current version updated: Feb-14
Next version due: To be confirmed
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Breast Cancer Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yasserh/breast-cancer-dataset on 28 January 2022.
--- Dataset description provided by original source is as follows ---
Breast cancer is the most common cancer amongst women in the world. It accounts for 25% of all cancer cases, and affected over 2.1 million people in 2015 alone. It starts when cells in the breast begin to grow out of control. These cells usually form tumors that can be seen via X-ray or felt as lumps in the breast area.
The key challenge in its detection is classifying tumors as malignant (cancerous) or benign (non-cancerous). We ask you to complete the analysis of classifying these tumors using machine learning (with SVMs) and the Breast Cancer Wisconsin (Diagnostic) Dataset.
This dataset was sourced from Kaggle.
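For the SVM task, a dependency-free sketch of a linear soft-margin SVM trained with the Pegasos stochastic subgradient method is shown below; in practice a library such as scikit-learn would usually be used instead. The label encoding (+1 = malignant, -1 = benign), the hyperparameters, and the helper names are assumptions for illustration, and features should be standardized first because SVMs are sensitive to feature scale.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Train a linear soft-margin SVM with Pegasos (hinge loss + L2 penalty).

    X: list of feature vectors (lists of floats), ideally standardized
    y: labels in {+1, -1}, e.g. +1 = malignant, -1 = benign (assumed encoding)
    lam: L2 regularization strength; larger values favour a wider margin
    Returns (weights, bias).
    """
    rng = random.Random(seed)
    # Append a constant 1.0 so the last weight acts as a (lightly regularized) bias.
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            # Subgradient step on the L2 penalty shrinks every weight.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                # Hinge loss is active for this sample: move w toward y_i * x_i.
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w[:-1], w[-1]


def predict(weights, bias, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    score = sum(wj * xj for wj, xj in zip(weights, x)) + bias
    return 1 if score >= 0 else -1
```

For the Wisconsin data, each `x` would be one patient's 30 standardized features. Folding the bias into the weight vector is a common simplification of the Pegasos update that keeps the code short.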
--- Original source retains full ownership of the source dataset ---
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Some racial and ethnic categories are suppressed for privacy and to avoid misleading estimates when the relative standard error exceeds 30% or the unweighted sample size is less than 50 respondents.
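The suppression rule can be expressed as a small predicate. This sketch assumes the relative standard error (RSE) is the standard error divided by the estimate, expressed as a percentage; treating a zero estimate as suppressed is an extra assumption (RSE is undefined there), and `is_suppressed` is a hypothetical helper.

```python
def is_suppressed(estimate, standard_error, unweighted_n,
                  max_rse_pct=30.0, min_n=50):
    """Return True if a survey estimate should be suppressed.

    Suppress when the unweighted sample size is below `min_n` or when the
    relative standard error, standard_error / estimate * 100, exceeds
    `max_rse_pct`.
    """
    if unweighted_n < min_n:
        return True
    if estimate == 0:
        return True  # assumption: RSE is undefined for a zero estimate
    rse_pct = abs(standard_error / estimate) * 100.0
    return rse_pct > max_rse_pct
```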
Data Source: Centers for Disease Control and Prevention (CDC). Behavioral Risk Factor Surveillance System Survey (BRFSS) Data
Why This Matters
Breast cancer is the most commonly diagnosed cancer in women and people assigned female at birth (AFAB) and the second leading cause of cancer death in the U.S. Breast cancer screenings can save lives by helping to detect breast cancer in its early stages when treatment is more effective.
While non-Hispanic white women and AFAB individuals are more likely to be diagnosed with breast cancer than their counterparts of other races and ethnicities, non-Hispanic Black women and AFAB individuals die from breast cancer at a significantly higher rate than their counterparts of other races and ethnicities.
Later-stage diagnoses and prolonged treatment duration partly explain these disparities in mortality rate. Structural barriers to quality health care, insurance, education, affordable housing, and sustainable income that disproportionately affect communities of color also drive racial inequities in breast cancer screenings and mortality.
The District Response
Project Women Into Staying Healthy (WISH) provides free breast and cervical cancer screenings to uninsured or underinsured women and AFAB adults aged 21 to 64. Patient navigation, transportation assistance, and cancer education are also provided.
DC Health’s Cancer and Chronic Disease Prevention Bureau works with healthcare providers to improve the use of preventative health services and provide breast cancer screening services.
DC Health maintains the District of Columbia Cancer Registry (DCCR) to track cancer incidence, examine environmental substances that cause cancer, and identify differences in cancer incidence by age, gender, race, and geographical location.
A measure of the number of adults diagnosed with breast, lung or colorectal cancer in a year who are still alive five years after diagnosis.
ONS still publish survival percentages for individual types of cancers. These can be found at: http://www.ons.gov.uk/ons/rel/cancer-unit/cancer-survival/cancer-survival-in-england--patients-diagnosed-2007-2011-and-followed-up-to-2012/index.html
A time series for five-year survival figures for breast, lung and colorectal cancer individually (previous NHS Outcomes Framework indicators 1.4.ii, 1.4.iv and 1.4.vi) is still published and can be found under the link 'Indicator data - previous methodology (.xls)' below.
Purpose
This indicator attempts to capture the success of the NHS in preventing people from dying once they have been diagnosed with breast, lung or colorectal cancer.
Current version updated: May-14
Next version due: To be confirmed
https://www.scilifelab.se/data/restricted-access/
Welcome to the CSAW-M dataset homepage.
This page includes the files and metadata related to CSAW-M, a curated dataset of mammograms with expert assessments of the masking of cancer. CSAW-M is collected from over 10,000 individuals and annotated with potential masking. In contrast to previous approaches, which measure breast image density as a proxy, our dataset directly provides annotations of masking potential assessments from five specialists. We trained deep learning models on CSAW-M to estimate the masking level and showed that, without being explicitly trained for these tasks, the estimated masking is significantly more predictive of screening participants diagnosed with interval and large invasive cancers than its breast density counterparts. Please find the paper corresponding to our work here and the GitHub repo here.
CSAW-M Research Use License
Please read carefully all the terms and conditions of the CSAW-M Research Use License.
How to access the dataset:
If you want to get access to the data, please use the "Request access to files" option above (currently, non-Swedish researchers need a general figshare account to request access). We will ask you to agree to our terms and conditions and to tell us what you will use the data for. We will then receive and process the request, after which you will be able to download all the files.
If you use this Work, please cite our paper:
@article{sorkhei2021csaw, title={CSAW-M: An Ordinal Classification Dataset for Benchmarking Mammographic Masking of Cancer}, author={Sorkhei, Moein and Liu, Yue and Azizpour, Hossein and Azavedo, Edward and Dembrower, Karin and Ntoula, Dimitra and Zouzos, Athanasios and Strand, Fredrik and Smith, Kevin}, year={2021} }
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background
Thyroid and breast cancers are the two most frequent types of endocrine-related tumors among women worldwide, and their incidence is still rising. Observational studies have shown a relationship between thyroid and breast cancers, but many confounders can bias such results. We therefore performed a two-sample Mendelian randomization (MR) study to investigate the causal association between thyroid and breast cancers.
Methods
We obtained breast cancer data from the UK Biobank (13,879 breast cancer cases and 198,523 controls) and the Breast Cancer Association Consortium (BCAC; 122,977 breast cancer cases and 105,974 controls), and thyroid cancer data from the FinnGen Biobank (989 thyroid cancer cases and 217,803 controls). The multiplicative random-effects inverse variance weighted (IVW), weighted median (WM), and MR-Egger methods were then used for the MR analysis.
Results
Overall, IVW showed a causal effect of breast cancer on thyroid cancer using the BCAC dataset (odds ratio [OR] = 1.17; 95% confidence interval [CI] = 1.036–1.322; P = 0.011), and this relationship was also supported by the UK Biobank dataset (OR = 23.899; 95% CI = 2.331–245.003; P = 0.007), indicating that breast cancer patients were more likely to be diagnosed with thyroid cancer. On the whole, the reverse MR analysis did not show a causal effect of thyroid cancer on breast cancer. However, IVW showed a causal effect of thyroid cancer on estrogen receptor (ER)-negative breast cancer using the BCAC dataset (OR = 1.019; 95% CI = 1.001–1.038; P = 0.043), suggesting that people with thyroid cancer were more likely to develop ER-negative breast cancer.
Conclusions
Breast cancer is a possible risk factor for thyroid cancer, and thyroid cancer is a possible risk factor for ER-negative breast cancer. Future studies using powerful genetic tools are needed to determine the causal relationship between breast and thyroid cancers.
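The IVW method combines per-variant (SNP) effects into one causal estimate, equivalent to a weighted regression of SNP-outcome effects on SNP-exposure effects through the origin with weights 1/se_outcome^2. The sketch below assumes harmonized summary statistics and follows the common multiplicative random-effects convention of inflating the fixed-effect standard error by the residual standard error when it exceeds 1. The study does not state which software it used, so this is an illustration rather than the authors' code, and the function names are hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Inverse-variance weighted MR estimate from per-SNP summary statistics.

    Returns (beta, se). The se uses the multiplicative random-effects
    correction: the fixed-effect SE is scaled by the residual standard
    error when that exceeds 1 (heterogeneity between SNPs).
    """
    weights = [1.0 / s ** 2 for s in se_outcome]
    sxx = sum(w * bx * bx for w, bx in zip(weights, beta_exposure))
    sxy = sum(w * bx * by for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    beta = sxy / sxx
    se_fixed = math.sqrt(1.0 / sxx)
    n_snps = len(beta_exposure)
    if n_snps < 2:
        return beta, se_fixed
    # Cochran's Q around the IVW slope gives the residual variance.
    q = sum(w * (by - beta * bx) ** 2
            for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    phi = math.sqrt(q / (n_snps - 1))
    return beta, se_fixed * max(1.0, phi)


def odds_ratio_ci(beta, se, z=1.96):
    """Convert a log-odds estimate into (OR, lower 95% bound, upper 95% bound)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)
```

Exponentiating the estimate gives an odds ratio when the outcome effects are log odds ratios, which is how results such as OR = 1.17 are reported.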
The Breast Cancer Histopathological Image Classification (BreakHis) dataset is composed of 7,909 microscopic images of breast tumor tissue collected from 82 patients using different magnification factors (40X, 100X, 200X, and 400X). It contains 2,480 benign and 5,429 malignant samples (700 × 460 pixels, 3-channel RGB, 8-bit depth in each channel, PNG format). This database was built in collaboration with the P&D Laboratory - Pathological Anatomy and Cytopathology, Parana, Brazil.
Paper: F. A. Spanhol, L. S. Oliveira, C. Petitjean and L. Heutte, "A Dataset for Breast Cancer Histopathological Image Classification," in IEEE Transactions on Biomedical Engineering, vol. 63, no. 7, pp. 1455-1462, July 2016, doi: 10.1109/TBME.2015.2496264
https://digital.nhs.uk/about-nhs-digital/terms-and-conditions
This publication reports on newly diagnosed cancers registered in England in addition to cancer deaths registered in England during 2020. It includes this summary report showing key findings, spreadsheet tables with more detailed estimates, and a methodology document.
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0) https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
The number of females who participated in a breast cancer screening program and their proportion of the relevant population, as well as the number of people diagnosed with breast cancer as a rate of those who participated, 2010-2011 (NSW, Vic, Qld, SA & WA). Source: Compiled by PHIDU based on data from BreastScreen NSW, BreastScreen Vic, BreastScreen Qld, BreastScreen WA - 2010 and 2011.
The dataset also contains the number of females who participated in a cervical cancer screening program and their proportion of the relevant population, as well as the number of people diagnosed with low/high cervical cancer as a rate of those who participated, 2010-2011 (NSW, Vic, Qld, SA, WA & ACT). Source: Compiled by PHIDU based on data from the NSW Department of Health and NSW Central Cancer Registry, 2011 and 2012; Victorian Cervical Cytology Registry, 2011 and 2012; Queensland Health Cancer Services Screening Branch, 2011 and 2012; SA Cervix Screening Program, 2011 and 2012; Western Australia Cervical Cytology Register, 2011 and 2012; and ACT Cytology Register, 2011 and 2012.
For both sets of screening, a woman screened more than once in the two-year period is counted only once. All entries classified as not shown, not published, or not applicable were assigned a null value, and no data was provided for Maralinga Tjarutja LGA in South Australia. The data is by LGA 2015 profile (based on the LGA 2011 geographic boundaries). For more information on the statistics used, please refer to the PHIDU website: http://phidu.torrens.edu.au/
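The rule that each woman is counted once per two-year period, together with the two derived measures (participation as a share of the relevant population, and diagnoses as a rate of participants), can be sketched as below. The record layout, the per-1,000 scale for the diagnosis rate, and the function name are hypothetical; PHIDU's exact definitions are on its website.

```python
def screening_summary(screen_records, diagnosed_ids, eligible_population):
    """Summarize one screening program over a two-year period.

    screen_records: iterable of (woman_id, screening_date) tuples; a woman
        screened several times appears in several tuples
    diagnosed_ids: ids of participating women diagnosed with cancer
    eligible_population: number of women in the relevant population
    Returns (participants, participation_pct, diagnoses_per_1000).
    """
    # Count each woman once, however many times she was screened.
    participants = {woman_id for woman_id, _ in screen_records}
    n = len(participants)
    participation_pct = 100.0 * n / eligible_population
    diagnosed = len(participants & set(diagnosed_ids))
    rate_per_1000 = 1000.0 * diagnosed / n if n else 0.0
    return n, participation_pct, rate_per_1000
```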
Sister Study is a prospective cohort of 50,884 U.S. women aged 35 to 74 years old conducted by the NIEHS. Eligible participants are women without a history of breast cancer but with at least one sister diagnosed with breast cancer at enrollment during 2003 to 2009. Datasets used in this research effort include health outcomes, lifestyle factors, socioeconomic factors, medication history, and built and natural environment factors. This dataset is not publicly accessible because EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: Contact NIEHS Sister Study (https://sisterstudy.niehs.nih.gov/English/index1.htm) for data access. Format: Datasets are provided in SAS and/or CSV format.
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This dataset presents the footprint of cancer incidence statistics in Australia for all cancers combined and the 5 top cancer groupings (breast - female only, colorectal, lung, melanoma of the skin and prostate) and their respective ICD-10 codes. The data spans the years 2006-2010 and is aggregated to Statistical Area Level 3 (SA3) from the 2011 Australian Statistical Geography Standard (ASGS). Incidence data refer to the number of new cases of cancer diagnosed in a given time period. They do not refer to the number of people newly diagnosed, because one person can be diagnosed with more than one cancer in a year. Cancer incidence data come from the Australian Institute of Health and Welfare (AIHW) 2012 Australian Cancer Database (ACD).
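The distinction between new cases and newly diagnosed people can be made concrete with a small sketch: one person with two primary cancers contributes two cases but only one person. The input layout and function name are hypothetical.

```python
from collections import Counter


def incidence_counts(diagnoses):
    """Count new cancer cases and distinct people for a period.

    diagnoses: list of (person_id, cancer_group) tuples, one per new
        primary cancer diagnosed in the period
    Returns (cases_by_group, total_cases, total_people).
    """
    cases_by_group = Counter(group for _, group in diagnoses)
    total_cases = len(diagnoses)  # every diagnosis is a case
    total_people = len({person for person, _ in diagnoses})  # each person once
    return cases_by_group, total_cases, total_people
```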
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The present study assessed the incidence of drug-induced interstitial lung disease (ILD) recurrence among breast cancer patients who underwent rechallenge with cancer-directed therapy. Japanese insurance claims data and the Diagnosis Procedure Combination database (2009–2022), covering 81,601 patients, were analyzed to evaluate 1,042 breast cancer patients who developed ILD. Of these, 566 patients underwent cancer-directed therapy rechallenge, and 42.1% of them were rechallenged with the same regimen that caused the initial ILD. ILD recurrence was observed in 18.9% of the rechallenged patients, with a median time to recurrence of 40 days. Cytotoxic agents were both the most common cause of ILD and the most frequently used rechallenge therapy. The notable ILD recurrence rate after rechallenge highlights the need for cautious treatment planning and personalised strategies that balance cancer control against ILD risk.
This article discusses a study of interstitial lung disease (ILD), a lung condition, in individuals with breast cancer who were treated again with cancer-directed therapy after recovering from ILD. ILD involves inflammation and damage to the lungs and can be a serious side effect of cancer treatment. We analyzed Japanese health insurance claims and hospital records from 2009 to 2022: of 81,601 patients with breast cancer, 1,042 developed ILD that required treatment with steroids. Of these 1,042 patients, 566 underwent cancer-directed therapy again after their initial ILD episode. Approximately 19% of them experienced ILD recurrence, typically about 40 days after restarting cancer-directed therapy. The most common rechallenge therapy was cytotoxic drugs, which are powerful agents used to kill cancer cells.
The results of this study highlight the risk of ILD recurrence in patients with breast cancer who undergo cancer-directed therapy again. This insight is crucial for doctors and patients deciding on cancer treatment, especially after a patient has already had ILD, and it demonstrates the importance of careful planning and personalized treatment strategies that manage the risk of ILD while effectively controlling cancer. The study helps clarify the trade-off between treating the cancer and avoiding serious side effects such as ILD.
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This dataset presents the footprint of female cancer incidence statistics in Australia for all cancers combined and the 11 top cancer groupings (breast, cervical, colorectal, leukaemia, lung, lymphoma, melanoma of the skin, ovary, pancreas, thyroid and uterus) and their respective ICD-10 codes. The data spans the years 2006-2010 and is aggregated to Statistical Area Level 4 (SA4) from the 2011 Australian Statistical Geography Standard (ASGS). Incidence data refer to the number of new cases of cancer diagnosed in a given time period. They do not refer to the number of people newly diagnosed, because one person can be diagnosed with more than one cancer in a year. Cancer incidence data come from the Australian Institute of Health and Welfare (AIHW) 2012 Australian Cancer Database (ACD).
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Lombardy Region has developed a network of Senology Centres that integrate screening, diagnosis and treatment services in the field of breast cancer, with the aim of governing the development of the services offered and meeting the qualitative, structural, technological and quantitative standards identified by current legislation. To date, in-house and inter-company Breast Units have been set up in the Lombardy Region. The Breast Units provide for a single, homogeneous process of taking charge of the patient, with the recognition of a multidisciplinary team that refers to a single person responsible for the specific activity.
The collection took place between 1980 and 2013, and the cohort consists of 1,800 individuals. The study population consists of consecutive women with breast cancer who all visited the same doctor at the oncology department at Skåne University Hospital in Lund. All patients were interviewed by the doctor, and the questions covered areas such as occupation, smoking, previous diseases, previous breast cancer screening, previous pregnancies, breastfeeding, age at first menstrual period and at menopause, and use of contraceptives. Information about the tumor and date of diagnosis was collected from pathology reports and patient records. Tumor samples were collected for about 30 to 40% of the individuals. Various control populations have been used, e.g., healthy controls from Dalby primary care (an urban area of the municipality of Lund), women at Skåne University Hospital with benign breast disease, and controls from the MISS cohort (SND study No. EXT 0102).
Purpose:
To study risk factors in a consecutive series of women with breast cancer.