Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Breast Cancer Wisconsin Diagnostic Dataset
Following description was retrieved from breast cancer dataset on UCI machine learning repository. Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image. A few of the images can be found at here. Separating plane described above was obtained using Multisurface Method-Tree (MSM-T), a classification method which uses linear… See the full description on the dataset page: https://huggingface.co/datasets/scikit-learn/breast-cancer-wisconsin.
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
BREAST CANCER WISCONSIN (DIAGNOSTIC) DATA SET Predict whether the cancer is benign or malignant. It consists of features that are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image.
Ten real-valued features are computed for each cell nucleus: a) radius (mean of distances from center to points on the perimeter) b) texture (standard deviation of gray-scale values) c) perimeter d) area e) smoothness (local variation in radius lengths) f) compactness (perimeter^2 / area - 1.0) g) concavity (severity of concave portions of the contour) h) concave points (number of concave portions of the contour) i) symmetry j) fractal dimension ("coastline approximation" - 1)
Breast Cancer Wisconsin (Original) dataset consists of 699 observations and 11 features
Source:
Copied from the original dataset
Creators:
Dr. William H. Wolberg, General Surgery Dept. University of Wisconsin, Clinical Sciences Center Madison, WI 53792 wolberg '@' eagle.surgery.wisc.edu
W. Nick Street, Computer Sciences Dept. University of Wisconsin, 1210 West Dayton St., Madison, WI 53706 street '@' cs.wisc.edu 608-262-6619
Olvi L. Mangasarian, Computer Sciences Dept. University of Wisconsin, 1210 West Dayton St., Madison, WI 53706 olvi '@' cs.wisc.edu… See the full description on the dataset page: https://huggingface.co/datasets/wwydmanski/wisconsin-breast-cancer.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Breast Cancer Wisconsin (Diagnostic) Data Set’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/uciml/breast-cancer-wisconsin-data on 28 January 2022.
--- Dataset description provided by original source is as follows ---
Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image. n the 3-dimensional space is that described in: [K. P. Bennett and O. L. Mangasarian: "Robust Linear Programming Discrimination of Two Linearly Inseparable Sets", Optimization Methods and Software 1, 1992, 23-34].
This database is also available through the UW CS ftp server: ftp ftp.cs.wisc.edu cd math-prog/cpo-dataset/machine-learn/WDBC/
Also can be found on UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29
Attribute Information:
1) ID number 2) Diagnosis (M = malignant, B = benign) 3-32)
Ten real-valued features are computed for each cell nucleus:
a) radius (mean of distances from center to points on the perimeter) b) texture (standard deviation of gray-scale values) c) perimeter d) area e) smoothness (local variation in radius lengths) f) compactness (perimeter^2 / area - 1.0) g) concavity (severity of concave portions of the contour) h) concave points (number of concave portions of the contour) i) symmetry j) fractal dimension ("coastline approximation" - 1)
The mean, standard error and "worst" or largest (mean of the three largest values) of these features were computed for each image, resulting in 30 features. For instance, field 3 is Mean Radius, field 13 is Radius SE, field 23 is Worst Radius.
All feature values are recoded with four significant digits.
Missing attribute values: none
Class distribution: 357 benign, 212 malignant
--- Original source retains full ownership of the source dataset ---
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by IAM THE ONE AJ
Released under MIT
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Wisconsin Diagnostic Breast Cancer (WDBC)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/mohaiminul101/wisconsin-diagnostic-breast-cancer-wdbc on 30 September 2021.
--- Dataset description provided by original source is as follows ---
Breast cancer is a disease in which cells in the breast grow out of control. There are different kinds of breast cancer. The kind of breast cancer depends on which cells in the breast turn into cancer. Wisconsin Diagnostic Breast Cancer (WDBC) dataset obtained by the university of Wisconsin Hospital is used to classify tumors as benign or malignant.
Attribute Information:
1) ID number 2) Diagnosis (M = malignant, B = benign) 3-32)
Ten real-valued features are computed for each cell nucleus:
a) radius (mean of distances from center to points on the perimeter) b) texture (standard deviation of gray-scale values) c) perimeter d) area e) smoothness (local variation in radius lengths) f) compactness (perimeter^2 / area - 1.0) g) concavity (severity of concave portions of the contour) h) concave points (number of concave portions of the contour) i) symmetry j) fractal dimension ("coastline approximation" - 1)
The mean, standard error and "worst" or largest (mean of the three largest values) of these features were computed for each image, resulting in 30 features. For instance, field 3 is Mean Radius, field 13 is Radius SE, field 23 is Worst Radius.
All feature values are recoded with four significant digits.
Missing attribute values: none
Class distribution: 357 benign, 212 malignant
Creator: Dr. WIlliam H. Wolberg (physician) University of Wisconsin Hospitals Madison, Wisconsin, USA
This database is also available through the UW CS ftp server: ftp ftp.cs.wisc.edu cd math-prog/cpo-dataset/machine-learn/WDBC/
Also can be found on UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29
--- Original source retains full ownership of the source dataset ---
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Hansel D'Souza
Released under CC0: Public Domain
Explore the field of breast cancer diagnosis with the insightful Wisconsin Breast Cancer dataset (Original). This dataset provides detailed attributes representing tumor characteristics observed in breast tissue samples. By analyzing these attributes, researchers and medical professionals can gain insights into tumor behavior and develop predictive models for cancer detection and prognosis.
Features | |
---|---|
1. Sample code number: Unique identifier for each tissue sample. | |
2. Clump Thickness: Assessment of the thickness of tumor cell clusters (1 - 10). | |
3. Uniformity of Cell Size: Uniformity in the size of tumor cells (1 - 10). | |
4. Uniformity of Cell Shape: Uniformity in the shape of tumor cells (1 - 10). | |
5. Marginal Adhesion: Degree of adhesion of tumor cells to surrounding tissue (1 - 10). | |
6. Single Epithelial Cell Size: Size of individual tumor cells (1 - 10). | |
7. Bare Nuclei: Presence of nuclei without surrounding cytoplasm (1 - 10). | |
8. Bland Chromatin: Assessment of chromatin structure in tumor cells (1 - 10). | |
9. Normal Nucleoli: Presence of normal-looking nucleoli in tumor cells (1 - 10). | |
10. Mitoses: Frequency of mitotic cell divisions (1 - 10). | |
11. Class: Classification of tumor type (2 for benign, 4 for malignant). |
The Breast Cancer Wisconsin dataset is sourced from tissue samples collected for diagnostic purposes, with attributes derived from microscopic examination. The dataset is anonymized and made available for research purposes, contributing to advancements in cancer diagnosis and treatment.
The dataset used in the paper is the Breast Cancer Wisconsin (Original) dataset, which contains 699 entries, 9 dimensions, and 2 classes.
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Aadarsh Vani
Released under Apache 2.0
This dataset was created by PAVAN KUMAR D
This dataset was created by Fatima Sohail Shaukat
Released under Other (specified in description)
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset used in this study is the Wisconsin Diagnostic Breast Cancer (WDBC) dataset, originally provided by the University of Wisconsin and obtained via Kaggle. It consists of 569 observations, each corresponding to a digitized image of a fine needle aspirate (FNA) of a breast mass. The dataset contains 32 attributes: one identifier column (discarded during preprocessing), one diagnosis label (malignant or benign), and 30 continuous real-valued features that describe the morphology of cell nuclei. These features are grouped into three statistical descriptors—mean, standard error (SE), and worst (mean of the three largest values)—for ten morphological properties including radius, perimeter, area, concavity, and fractal dimension. All feature values were normalized using z-score standardization to ensure uniform scale across models sensitive to input ranges. No missing values were present in the original dataset. Label encoding was applied to the diagnosis column, assigning 1 to malignant and 0 to benign cases. The dataset was split into training (80%) and testing (20%) sets while preserving class balance via stratified sampling. The accompanying Python source code (breast_cancer_classification_models.py) performs data loading, preprocessing, model training, evaluation, and result visualization. Four lightweight classifiers—Decision Tree, Naïve Bayes, Perceptron, and K-Nearest Neighbors (KNN)—were implemented using the scikit-learn library (version 1.2 or later). Performance metrics including Accuracy, Precision, Recall, F1-score, and ROC-AUC were calculated for each model. Confusion matrices and ROC curves were generated and saved as PNG files for interpretability. All results are saved in a structured CSV file (classification_results.csv) that contains the performance metrics for each model. Supplementary visualizations include all_feature_histograms.png (distribution plots for all standardized features), model_comparison.png (metric-wise bar plot), and feature_correlation_heatmap.png (Pearson correlation matrix of all 30 features). The data files are in standard CSV and PNG formats and can be opened using any spreadsheet or image viewer, respectively. No rare file types are used, and all scripts are compatible with any Python 3.x environment. This data package enables reproducibility and offers a transparent overview of how baseline machine learning models perform in the domain of breast cancer diagnosis using a clinically-relevant dataset.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Worldwide, breast cancer is the most common type of cancer in women and the second highest in terms of mortality rates.Diagnosis of breast cancer is performed when an abnormal lump is found (from self-examination or x-ray) or a tiny speck of calcium is seen (on an x-ray). After a suspicious lump is found, the doctor will conduct a diagnosis to determine whether it is cancerous and, if so, whether it has spread to other parts of the body. This breast cancer dataset was obtained from the University of Wisconsin Hospitals, Madison from Dr. William H. Wolberg.
This dataset was created by Sony Augustine@123
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Breast Cancer Diagnostic Dataset (BCD)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/devraikwar/breast-cancer-diagnostic on 14 February 2022.
--- Dataset description provided by original source is as follows ---
The resources for this dataset can be found at https://www.openml.org/d/13 and https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29
This data set includes 201 instances of one class and 85 instances of another class. The instances are described by 9 attributes, some of which are linear and some are nominal.
Number of Instances: 286
Number of Attributes: 9 + the class attribute
Attribute Information:
Class: no-recurrence-events, recurrence-events age: 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80-89, 90-99. menopause: lt40, ge40, premeno. tumor-size: 0-4, 5-9, 10-14, 15-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59. inv-nodes: 0-2, 3-5, 6-8, 9-11, 12-14, 15-17, 18-20, 21-23, 24-26, 27-29, 30-32, 33-35, 36-39. node-caps: yes, no. deg-malig: 1, 2, 3. breast: left, right. breast-quad: left-up, left-low, right-up, right-low, central. irradiat: yes, no.
Missing Attribute Values: (denoted by “?”) Attribute #: Number of instances with missing values: 6. 8 9. 1.
Class Distribution:
no-recurrence-events: 201 instances recurrence-events: 85 instances
Original data https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29
With the attributes described above, can you predict if a patient has recurrence event ?
--- Original source retains full ownership of the source dataset ---
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by monaheydary00
Released under Apache 2.0
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image.n the 3-dimensional space is that described in: [K. P. Bennett and O. L. Mangasarian: "Robust Linear Programming Discrimination of Two Linearly Inseparable Sets", Optimization Methods and Software 1, 1992, 23-34].This database is also available through the UW CS ftp server:ftp ftp.cs.wisc.educd math-prog/cpo-dataset/machine-learn/WDBC/Also can be found on UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)
This dataset was created by AKHIL BAWANKULE
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Breast Cancer Wisconsin Diagnostic Dataset
Following description was retrieved from breast cancer dataset on UCI machine learning repository. Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image. A few of the images can be found at here. Separating plane described above was obtained using Multisurface Method-Tree (MSM-T), a classification method which uses linear… See the full description on the dataset page: https://huggingface.co/datasets/scikit-learn/breast-cancer-wisconsin.