20 datasets found
  1. Breast Cancer Diagnosis Dataset - Wisconsin State

    • kaggle.com
    zip
    Updated Mar 31, 2024
    Cite
    Saurabh Badole (2024). Breast Cancer Diagnosis Dataset - Wisconsin State [Dataset]. https://www.kaggle.com/datasets/saurabhbadole/breast-cancer-wisconsin-state
    Available download formats: zip (5844 bytes)
    Dataset updated
    Mar 31, 2024
    Authors
    Saurabh Badole
    Area covered
    Wisconsin
    Description

    This is the Wisconsin Breast Cancer dataset (Original). It provides attributes describing tumor characteristics observed in breast tissue samples. By analyzing these attributes, researchers and medical professionals can study tumor behavior and develop predictive models for cancer detection and prognosis.

    Features
    1. Sample code number: Unique identifier for each tissue sample.
    2. Clump Thickness: Assessment of the thickness of tumor cell clusters (1 - 10).
    3. Uniformity of Cell Size: Uniformity in the size of tumor cells (1 - 10).
    4. Uniformity of Cell Shape: Uniformity in the shape of tumor cells (1 - 10).
    5. Marginal Adhesion: Degree of adhesion of tumor cells to surrounding tissue (1 - 10).
    6. Single Epithelial Cell Size: Size of individual tumor cells (1 - 10).
    7. Bare Nuclei: Presence of nuclei without surrounding cytoplasm (1 - 10).
    8. Bland Chromatin: Assessment of chromatin structure in tumor cells (1 - 10).
    9. Normal Nucleoli: Presence of normal-looking nucleoli in tumor cells (1 - 10).
    10. Mitoses: Frequency of mitotic cell divisions (1 - 10).
    11. Class: Classification of tumor type (2 for benign, 4 for malignant).
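
    A minimal Python sketch for reading rows in the column order listed above. It assumes the file follows the original UCI layout (headerless, comma-separated, missing values written as `?`; the original UCI file has 16 such values, all in Bare Nuclei), which this page does not itself confirm. The snake_case column names are our own choice, not an official schema, and the example rows are illustrative.

```python
from typing import Dict, List, Optional

# Snake_case names for the 11 columns listed above (our own spelling,
# not an official header; the raw UCI file has no header row).
COLUMNS: List[str] = [
    "sample_code_number", "clump_thickness", "uniformity_of_cell_size",
    "uniformity_of_cell_shape", "marginal_adhesion",
    "single_epithelial_cell_size", "bare_nuclei", "bland_chromatin",
    "normal_nucleoli", "mitoses", "class",
]


def parse_record(line: str) -> Dict[str, Optional[int]]:
    """Parse one comma-separated row; a '?' (missing value) becomes None."""
    fields = [field.strip() for field in line.strip().split(",")]
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    record: Dict[str, Optional[int]] = {
        name: None if raw == "?" else int(raw)
        for name, raw in zip(COLUMNS, fields)
    }
    # Check the documented domains: features 1-10, class 2 or 4.
    for name in COLUMNS[1:-1]:
        value = record[name]
        if value is not None and not 1 <= value <= 10:
            raise ValueError(f"{name}={value} is outside 1-10")
    if record["class"] not in (2, 4):
        raise ValueError(f"class must be 2 or 4, got {record['class']}")
    return record


# Illustrative rows in the UCI layout; the second has a missing Bare Nuclei.
print(parse_record("1000025,5,1,1,1,2,1,3,1,1,2"))
print(parse_record("1057013,8,4,5,1,2,?,7,3,1,4")["bare_nuclei"])  # None
```

    Returning `None` rather than guessing a value keeps imputation as a separate, explicit step.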

    Usage:

    • Cancer diagnosis: Develop machine learning models to classify tumors as benign or malignant based on their characteristics, aiding in early detection and treatment planning.
    • Feature importance analysis: Identify key attributes contributing to tumor malignancy and understand their biological significance.
    • Clinical decision support: Assist healthcare professionals in interpreting biopsy results and making informed decisions about patient care.
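
    As a sketch of the first two uses, the Python code below fits a nearest-centroid classifier (a deliberately simple stand-in chosen for brevity, not a method the dataset page prescribes) and reports the malignant-minus-benign gap in class means as a rough feature-importance signal. The toy rows are hypothetical; on the real data you would also handle missing values and evaluate on held-out samples.

```python
from statistics import mean
from typing import Dict, List, Sequence


def fit_centroids(X: Sequence[Sequence[float]], y: Sequence[int]) -> Dict[int, List[float]]:
    """Return the per-feature mean vector (centroid) of each class label."""
    rows_by_label: Dict[int, List[Sequence[float]]] = {}
    for row, label in zip(X, y):
        rows_by_label.setdefault(label, []).append(row)
    return {
        label: [float(mean(column)) for column in zip(*rows)]
        for label, rows in rows_by_label.items()
    }


def predict(centroids: Dict[int, List[float]], row: Sequence[float]) -> int:
    """Assign the label whose centroid is nearest in squared Euclidean distance."""
    def squared_distance(center: List[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


def centroid_gap(centroids: Dict[int, List[float]], benign: int = 2, malignant: int = 4) -> List[float]:
    """Malignant-minus-benign mean for each feature: a crude importance signal."""
    return [m - b for b, m in zip(centroids[benign], centroids[malignant])]


# Hypothetical toy rows (clump thickness, cell size, cell shape); not real data.
X = [[1, 1, 1], [2, 1, 2], [8, 9, 7], [9, 10, 8]]
y = [2, 2, 4, 4]
centroids = fit_centroids(X, y)
print(predict(centroids, [3, 2, 2]))   # 2 (benign)
print(centroid_gap(centroids))         # [7.0, 8.5, 6.0]
```

    Because all features share the 1-10 scale, unscaled Euclidean distance is a reasonable starting point here; features on different scales would need standardizing first.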

    Acknowledgements:

    The Breast Cancer Wisconsin dataset is sourced from tissue samples collected for diagnostic purposes, with attributes derived from microscopic examination. The dataset is anonymized and made available for research purposes, contributing to advancements in cancer diagnosis and treatment.

  2. Breast Cancer Dataset [Wisconsin Diagnostic UCI]

    • kaggle.com
    zip
    Updated Jan 22, 2024
    Cite
    Abhinav Mangalore (2024). Breast Cancer Dataset [Wisconsin Diagnostic UCI] [Dataset]. https://www.kaggle.com/datasets/abhinavmangalore/breast-cancer-dataset-wisconsin-diagnostic-uci
    Available download formats: zip (49831 bytes)
    Dataset updated
    Jan 22, 2024
    Authors
    Abhinav Mangalore
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Wisconsin
    Description

    This dataset is taken from the UCI Machine Learning Repository (link: https://data.world/health/breast-cancer-wisconsin); donor: Nick Street.

    The main idea behind the upload was to provide datasets for machine learning practice and reference for my peers at college. The main purpose is to analyze the data and experiment with different machine learning ideas and techniques for this binary classification task, which makes the dataset a useful resource for practice.

    Breast cancer occurs when breast cells mutate into cancerous cells that multiply and form tumors. It accounts for about 25% of all cancer cases in women and affected over 2.1 million people in 2015 alone. Breast cancer typically affects women and people assigned female at birth (AFAB) aged 50 and older, but it can also affect men and people assigned male at birth (AMAB), as well as younger women. Healthcare providers may treat breast cancer with surgery to remove tumors or with treatments that kill cancerous cells.

    The task: To classify whether the tumor is benign (B) or malignant (M).

    Relevant information

    Features are computed from a digitized image of a fine needle
    aspirate (FNA) of a breast mass. They describe
    characteristics of the cell nuclei present in the image.
    A few of the images can be found at
    http://www.cs.wisc.edu/~street/images/
    
    The separating plane described above was obtained using
    Multisurface Method-Tree (MSM-T) [K. P. Bennett, "Decision Tree
    Construction Via Linear Programming." Proceedings of the 4th
    Midwest Artificial Intelligence and Cognitive Science Society,
    pp. 97-101, 1992], a classification method which uses linear
    programming to construct a decision tree. Relevant features
    were selected using an exhaustive search in the space of 1-4
    features and 1-3 separating planes.
    
    The actual linear program used to obtain the separating plane
    in the 3-dimensional space is that described in:
    [K. P. Bennett and O. L. Mangasarian: "Robust Linear
    Programming Discrimination of Two Linearly Inseparable Sets",
    Optimization Methods and Software 1, 1992, 23-34].
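
    For reference, the robust linear program from the Bennett and Mangasarian paper cited above can be written as follows. This is our transcription of the standard formulation, so check it against the paper before relying on it. The m rows of A are the points of one set (e.g. malignant), the k rows of B are the points of the other, e is a vector of ones of matching length, and the separating plane is x^T w = γ.

```latex
\begin{aligned}
\min_{w,\,\gamma,\,y,\,z}\quad & \frac{1}{m}\, e^{\top} y + \frac{1}{k}\, e^{\top} z \\
\text{s.t.}\quad & A w - e\gamma + y \ge e, \\
& -B w + e\gamma + z \ge e, \\
& y \ge 0, \quad z \ge 0.
\end{aligned}
```

    A point A_i lies on the correct side with unit margin when A_i w ≥ γ + 1, so y and z measure the violations of each set; the weights 1/m and 1/k average them per set, which keeps the larger set from dominating the objective.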
    
    
    This database is also available through the UW CS ftp server:
    
    ftp ftp.cs.wisc.edu
    cd math-prog/cpo-dataset/machine-learn/WDBC/
    

    Number of instances: 569

    Number of attributes: 32 (ID, diagnosis, 30 real-valued input features)
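
    According to the original WDBC description file (not reproduced on this page), the 30 input features are the mean, standard error and "worst" (mean of the three largest values) of ten nuclear measurements, stored as three consecutive blocks of ten. A Python sketch that builds matching column names; the names themselves are our own convention and the Kaggle upload may use different headers.

```python
from typing import List

# Ten nuclear measurements, in the order used by the original WDBC file.
MEASUREMENTS = [
    "radius", "texture", "perimeter", "area", "smoothness", "compactness",
    "concavity", "concave_points", "symmetry", "fractal_dimension",
]
# Each measurement is summarized three ways, one block of ten columns each.
STATISTICS = ["mean", "se", "worst"]


def wdbc_columns() -> List[str]:
    """Return 32 column names: ID, diagnosis, then 3 blocks of 10 features."""
    columns = ["id", "diagnosis"]
    for stat in STATISTICS:
        columns.extend(f"{name}_{stat}" for name in MEASUREMENTS)
    return columns


cols = wdbc_columns()
print(len(cols))                      # 32
print(cols[2], cols[12], cols[22])    # radius_mean radius_se radius_worst
```

    This reproduces the count stated above: 2 + 3 × 10 = 32 attributes.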

    Original Creators:

    Dr. William H. Wolberg, General Surgery Dept., University of
    Wisconsin, Clinical Sciences Center, Madison, WI 53792
    wolberg@eagle.surgery.wisc.edu
    
    W. Nick Street, Computer Sciences Dept., University of
    Wisconsin, 1210 West Dayton St., Madison, WI 53706
    street@cs.wisc.edu 608-262-6619
    
    Olvi L. Mangasarian, Computer Sciences Dept., University of
    Wisconsin, 1210 West Dayton St., Madison, WI 53706
    olvi@cs.wisc.edu 
    

    Donor: Nick Street

    Date: November 1995

    Past Usage:

    first usage:

    W.N. Street, W.H. Wolberg and O.L. Mangasarian 
    Nuclear feature extraction for breast tumor diagnosis.
    IS&T/SPIE 1993 International Symposium on Electronic Imaging: Science
    and Technology, volume 1905, pages 861-870, San Jose, CA, 1993.
    

    OR literature:

    O.L. Mangasarian, W.N. Street and W.H. Wolberg. 
    Breast cancer diagnosis and prognosis via linear programming. 
    Operations Research, 43(4), pages 570-577, July-August 1995.
    

    Medical literature:

    W.H. Wolberg, W.N. Street, and O.L. Mangasarian. 
    Machine learning techniques to diagnose breast cancer from
    fine-needle aspirates. 
    Cancer Letters 77 (1994) 163-171.
    
    W.H. Wolberg, W.N. Street, and O.L. Mangasarian. 
    Image analysis and machine learning applied to breast cancer
    diagnosis and prognosis. 
    Analytical and Quantitative Cytology and Histology, Vol. 17
    No. 2, pages 77-87, April 1995. 
    
    W.H. Wolberg, W.N. Street, D.M. Heisey, and O.L. Mangasarian. 
    Computerized breast cancer diagnosis and prognosis from fine
    needle aspirates. 
    Archives of Surgery 1995;130:511-516.
    
    W.H. Wolberg, W.N. Street, D.M. Heisey, and O.L. Mangasarian. 
    Computer-derived nuclear features distinguish malignant from
    benign breast cytology. 
    Human Pathology, 26:792-796, 1995.
    

    See also: http://www.cs.wisc.edu/~olvi/uwmp/mpml.html http://www.cs.wisc.edu/~olvi/uwmp/cancer.html

  3. Breast Cancer Wisconsin (Original)

    • kaggle.com
    zip
    Updated Jun 13, 2025
    Cite
    Anik Chand (2025). Breast Cancer Wisconsin (Original) [Dataset]. https://www.kaggle.com/datasets/anikchand/breast-cancer-wisconsin-original
    Available download formats: zip (5902 bytes)
    Dataset updated
    Jun 13, 2025
    Authors
    Anik Chand
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Anik Chand

    Released under CC BY-SA 4.0

  4. Breast Cancer Wisconsin (Original)

    • kaggle.com
    zip
    Updated Jan 26, 2024
    Cite
    Chris Morgan (2024). Breast Cancer Wisconsin (Original) [Dataset]. https://www.kaggle.com/datasets/chrismorgan86/breast-cancer-wisconsin-original
    Available download formats: zip (88338 bytes)
    Dataset updated
    Jan 26, 2024
    Authors
    Chris Morgan
    License

    CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Additional Information

    Samples arrive periodically as Dr. Wolberg reports his clinical cases. The database therefore reflects this chronological grouping of the data. This grouping information appears immediately below, having been removed from the data itself:

    Group 1: 367 instances (January 1989)
    Group 2: 70 instances (October 1989)
    Group 3: 31 instances (February 1990)
    Group 4: 17 instances (April 1990)
    Group 5: 48 instances (August 1990)
    Group 6: 49 instances (Updated January 1991)
    Group 7: 31 instances (June 1991)
    Group 8: 86 instances (November 1991)

    Total: 699 points (as of the donated database on 15 July 1992)

    Note that the results summarized above in Past Usage refer to a dataset of size 369, while Group 1 has only 367 instances. This is because it originally contained 369 instances; 2 were removed. The following statements summarize changes to the original Group 1's set of data:

    Group 1 : 367 points: 200B 167M (January 1989)
    Revised Jan 10, 1991: Replaced zero bare nuclei in 1080185 & 1187805
    Revised Nov 22,1991: Removed 765878,4,5,9,7,10,10,10,3,8,1 no record
    : Removed 484201,2,7,8,8,4,3,10,3,4,1 zero epithelial
    : Changed 0 to 1 in field 6 of sample 1219406
    : Changed 0 to 1 in field 8 of following sample:
    : 1182404,2,3,1,1,1,2,0,1,1,1
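
    The group sizes listed above can be checked with a few lines of Python: they sum to the stated 699 points, and Group 1's 200 benign plus 167 malignant cases equal its 367 instances (369 originally, minus the 2 removed records). The `row_ranges` helper is hypothetical and assumes the rows are stored in chronological group order, which the description implies but does not guarantee; if that holds, it lets you hold out later groups as a time-ordered test set.

```python
from typing import List, Tuple

# (group, month received, number of instances), as listed above.
GROUPS: List[Tuple[int, str, int]] = [
    (1, "January 1989", 367),
    (2, "October 1989", 70),
    (3, "February 1990", 31),
    (4, "April 1990", 17),
    (5, "August 1990", 48),
    (6, "January 1991", 49),
    (7, "June 1991", 31),
    (8, "November 1991", 86),
]


def total_instances(groups: List[Tuple[int, str, int]]) -> int:
    """Sum of the group sizes."""
    return sum(size for _, _, size in groups)


def row_ranges(groups: List[Tuple[int, str, int]]) -> List[Tuple[int, int, int]]:
    """(group, first_row, last_row), 0-based and inclusive.

    ASSUMPTION: rows in the data file appear in group order; the
    description only says the grouping was removed from the data.
    """
    ranges, start = [], 0
    for group, _, size in groups:
        ranges.append((group, start, start + size - 1))
        start += size
    return ranges


print(total_instances(GROUPS))   # 699
print(row_ranges(GROUPS)[-1])    # (8, 613, 698)
```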

  5. ‘Breast Cancer Wisconsin (Diagnostic)’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Sep 30, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Breast Cancer Wisconsin (Diagnostic) ’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-breast-cancer-wisconsin-diagnostic-4be8/0af307d3/?iid=022-284&v=presentation
    Dataset updated
    Sep 30, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Breast Cancer Wisconsin (Diagnostic) ’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/faroukbenarous/breast-cancer-wisconsin-diagnostic on 30 September 2021.

    --- No further description of dataset provided by original source ---

    --- Original source retains full ownership of the source dataset ---

  6. Breast Cancer Wisconsin (Original)

    • kaggle.com
    zip
    Updated May 23, 2023
    Cite
    Sony Augustine@123 (2023). Breast Cancer Wisconsin (Original) [Dataset]. https://www.kaggle.com/datasets/sonyaugustine123/breast-cancer-wisconsin-original/suggestions
    Available download formats: zip (5902 bytes)
    Dataset updated
    May 23, 2023
    Authors
    Sony Augustine@123
    Description

    Dataset

    This dataset was created by Sony Augustine@123

  7. Breast Cancer Wisconsin (Original) Data Set

    • kaggle.com
    zip
    Updated Mar 22, 2022
    Cite
    Mario Lisboa (2022). Breast Cancer Wisconsin (Original) Data Set [Dataset]. https://www.kaggle.com/datasets/mariolisboa/breast-cancer-wisconsin-original-data-set/code
    Available download formats: zip (6077 bytes)
    Dataset updated
    Mar 22, 2022
    Authors
    Mario Lisboa
    Description

    Context

    This breast cancer database was obtained from the University of Wisconsin Hospitals, Madison, from Dr. William H. Wolberg.

    Attributes 2 through 10 have been used to represent instances (attribute 1 is the sample ID). Each instance has one of 2 possible classes: benign or malignant.

    Content

    Attribute: Domain
    1. Sample code number: id number
    2. Clump Thickness: 1 - 10
    3. Uniformity of Cell Size: 1 - 10
    4. Uniformity of Cell Shape: 1 - 10
    5. Marginal Adhesion: 1 - 10
    6. Single Epithelial Cell Size: 1 - 10
    7. Bare Nuclei: 1 - 10
    8. Bland Chromatin: 1 - 10
    9. Normal Nucleoli: 1 - 10
    10. Mitoses: 1 - 10
    11. Class: (2 for benign, 4 for malignant)

    Class distribution:

    Benign: 458 (65.5%)
    Malignant: 241 (34.5%)

    https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Original)
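
    A quick Python check of the class distribution above, which also yields the majority-class baseline: a classifier that always predicts benign would already reach about 65.5% accuracy, so reported accuracies are best read against that figure.

```python
from typing import Dict


def class_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Percentage of instances in each class, rounded to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


counts = {"benign": 458, "malignant": 241}
print(class_shares(counts))   # {'benign': 65.5, 'malignant': 34.5}

# Majority-class baseline: always predicting "benign" is correct 458/699 of the time.
baseline_accuracy = max(counts.values()) / sum(counts.values())
print(round(baseline_accuracy, 3))   # 0.655
```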

  8. UCI_Breast Cancer Wisconsin (Original)

    • kaggle.com
    zip
    Updated Jan 29, 2018
    Cite
    Shubham Biswas (2018). UCI_Breast Cancer Wisconsin (Original) [Dataset]. https://www.kaggle.com/zzero0/uci-breast-cancer-wisconsin-original
    Available download formats: zip (5852 bytes)
    Dataset updated
    Jan 29, 2018
    Authors
    Shubham Biswas
    Description

    Dataset

    This dataset was created by Shubham Biswas

  9. Data from: breast-cancer-wisconsin

    • kaggle.com
    zip
    Updated Mar 16, 2020
    Cite
    Salih ACUR (2020). breast-cancer-wisconsin [Dataset]. https://www.kaggle.com/salihacur/breastcancerwisconsin
    Available download formats: zip (5996 bytes)
    Dataset updated
    Mar 16, 2020
    Authors
    Salih ACUR
    Description

    Attribute Information:

    1. Sample code number: id number
    2. Clump Thickness: 1 - 10
    3. Uniformity of Cell Size: 1 - 10
    4. Uniformity of Cell Shape: 1 - 10
    5. Marginal Adhesion: 1 - 10
    6. Single Epithelial Cell Size: 1 - 10
    7. Bare Nuclei: 1 - 10
    8. Bland Chromatin: 1 - 10
    9. Normal Nucleoli: 1 - 10
    10. Mitoses: 1 - 10
    11. Class: (2 for benign, 4 for malignant)

  10. Breast_cancer_wisconsin

    • kaggle.com
    zip
    Updated Feb 18, 2021
    Cite
    sagar bhaskar (2021). Breast_cancer_wisconsin [Dataset]. https://www.kaggle.com/sagarbhaskar/breast-cancer-wisconsin
    Available download formats: zip (6048 bytes)
    Dataset updated
    Feb 18, 2021
    Authors
    sagar bhaskar
    Description

    Dataset

    This dataset was created by sagar bhaskar

  11. Breast Cancer Prognostics

    • kaggle.com
    zip
    Updated Dec 4, 2022
    Cite
    The Devastator (2022). Breast Cancer Prognostics [Dataset]. https://www.kaggle.com/datasets/thedevastator/improve-breast-cancer-prognostics-using-machine
    Available download formats: zip (78356 bytes)
    Dataset updated
    Dec 4, 2022
    Authors
    The Devastator
    Description

    Breast Cancer Prognostics

    Study the Wisconsin Dataset

    By UCI

    About this dataset

    The Breast Cancer Wisconsin (Prognostic) dataset brings together data from hundreds of breast cancer cases, which makes it useful for predictive prognosis. It includes 30 features, such as radius, texture, area, compactness and concavity, computed from a digitized image of a fine needle aspirate (FNA) of the breast mass to describe the cell nuclei present in each case. It also includes the outcome (recurrence or nonrecurrence) and a time field: time to recurrence for cases that relapse, and disease-free time otherwise.

    This dataset was created by Dr. William H. Wolberg of the University of Wisconsin Clinical Sciences Center together with W. Nick Street and Olvi L. Mangasarian of the university's Computer Sciences Dept., who developed decision tree construction methods based on linear programming and used linear programming models to predict disease recurrence.

    The data is freely available through the UW CS ftp server and on Kaggle, giving researchers access to features computed from FNA images of breast masses together with outcomes for both recurrent and nonrecurrent cases. Papers such as 'An inductive learning approach to prognostic prediction' by W.N. Street et al. have used this database extensively, showing how artificial neural networks can be applied to prognostic prediction. Building on this work, the dataset can help researchers understand how predictive models for breast cancer outcomes are built and how such models might improve patient care.

    How to use the dataset

    This dataset can be used to improve breast cancer prognostics with machine learning algorithms. The data consists of nuclear features computed from FNA images, medical parameters such as tumor size, and outcome and time information, which algorithms can use to predict diagnostic and prognostic outcomes. Here are some steps for using this dataset:

    • Pre-process and clean the data: since the dataset contains missing values for some parameters, clean and pre-process the data before applying any machine learning algorithm (MLA). This includes deciding which values need imputation, standardizing or normalizing numerical features, and encoding any categorical variables.

    • Choose an appropriate MLA: depending on your goal (for example, reliable classification results or weighted predictions based on several factors), there are many MLAs to choose from. Examples include logistic regression classifiers, least squares support vector machines (SVMs), neural networks, nonsmooth optimization methods, and extracting M-of-N rule sets from trained neural networks. Read up on each algorithm to determine which best meets your needs before you start experimenting with the dataset.

    • Train the model using your selected MLA: once you have identified the MLA that best fits your goal (or decided to compare several approaches), train models on the data using cross-validation methods such as k-fold splitting. Then test the trained models on validation subsets held out from the original dataset to estimate performance under the parameter combinations relevant to predicting diagnostic and/or prognostic outcomes. Trends revealed during these experiments will inform future model selection when implementing a predictive solution that fits specific user requirements, especially where a particular MLA is not tailored to the purpose...
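
    The cross-validation step above can be sketched in plain Python as follows. This is ordinary unstratified k-fold splitting with a seeded shuffle; for an imbalanced outcome such as recur/nonrecur, a stratified split (for example scikit-learn's `StratifiedKFold`) is usually preferable. The function name and signature are our own.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for plain (unstratified) k-fold CV."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    # The first n % k folds get one extra index, so sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


for train, test in k_fold_indices(10, 3):
    print(len(train), len(test))   # 6 4, then 7 3, then 7 3
```

    Every index lands in exactly one test fold, so each sample is used for evaluation once and for training k - 1 times.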

  12. Breast Cancer Wisconsin - benign or malignant

    • kaggle.com
    zip
    Updated Jul 20, 2020
    Cite
    Ninja Coding (2020). Breast Cancer Wisconsin - benign or malignant [Dataset]. https://www.kaggle.com/datasets/ninjacoding/breast-cancer-wisconsin-benign-or-malignant/versions/3
    Available download formats: zip (5923 bytes)
    Dataset updated
    Jul 20, 2020
    Authors
    Ninja Coding
    Description

    Context

    ML-based applications that draw on real-time patient data from healthcare systems in many countries are becoming common, and they can make new treatment options that were previously unavailable more effective. This data set is about predicting whether cancer cells are benign or malignant.

    Content

    Information about attributes:

    There are 10 integer attributes plus the class label:
    1. Sample code number: id number
    2. Clump Thickness: 1 - 10
    3. Uniformity of Cell Size: 1 - 10
    4. Uniformity of Cell Shape: 1 - 10
    5. Marginal Adhesion: 1 - 10
    6. Single Epithelial Cell Size: 1 - 10
    7. Bare Nuclei: 1 - 10
    8. Bland Chromatin: 1 - 10
    9. Normal Nucleoli: 1 - 10
    10. Mitoses: 1 - 10
    11. Predicted class: 2 for benign and 4 for malignant

    Acknowledgements

    This data set (Original Wisconsin Breast Cancer Database) is taken from the UCI Machine Learning Repository.

    Inspiration

    This is the first data set I have shared on Kaggle. I would be very pleased if you find it useful for developing your own model. I hope this simple data set helps beginners build their own classification models and learn how to improve them.

  13. Breast Cancer Diagnostic Dataset (BCD)

    • kaggle.com
    zip
    Updated Oct 26, 2021
    Cite
    Dev Raikwar (2021). Breast Cancer Diagnostic Dataset (BCD) [Dataset]. https://www.kaggle.com/datasets/devraikwar/breast-cancer-diagnostic
    Available download formats: zip (2081 bytes)
    Dataset updated
    Oct 26, 2021
    Authors
    Dev Raikwar
    Description

    Context

    The resources for this dataset can be found at https://www.openml.org/d/13 and https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29

    Content

    This data set includes 201 instances of one class and 85 instances of another class. The instances are described by 9 attributes, some of which are linear and some are nominal.

    Number of Instances: 286

    Number of Attributes: 9 + the class attribute

    Attribute Information:

    1. Class: no-recurrence-events, recurrence-events
    2. age: 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80-89, 90-99
    3. menopause: lt40, ge40, premeno
    4. tumor-size: 0-4, 5-9, 10-14, 15-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59
    5. inv-nodes: 0-2, 3-5, 6-8, 9-11, 12-14, 15-17, 18-20, 21-23, 24-26, 27-29, 30-32, 33-35, 36-39
    6. node-caps: yes, no
    7. deg-malig: 1, 2, 3
    8. breast: left, right
    9. breast-quad: left-up, left-low, right-up, right-low, central
    10. irradiat: yes, no

    Missing Attribute Values (denoted by “?”): attribute 6 (node-caps) has 8 instances with missing values; attribute 9 (breast-quad) has 1.

    Class Distribution:

    no-recurrence-events: 201 instances
    recurrence-events: 85 instances
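
    Several of these attributes are binned ranges with a natural order (age, tumor-size, inv-nodes, deg-malig), so one option is to encode each value by its rank in the listed order. A Python sketch under that assumption, covering only a subset of attributes; nominal attributes such as breast-quad would need one-hot encoding instead. Per the description, the only missing values (“?”) are in node-caps and breast-quad, so the `None` branch here is shown for completeness. The mapping and function name are hypothetical.

```python
from typing import Dict, List, Optional

# Category orders copied from the attribute list above (subset only).
ORDERED_LEVELS: Dict[str, List[str]] = {
    "age": ["10-19", "20-29", "30-39", "40-49", "50-59",
            "60-69", "70-79", "80-89", "90-99"],
    "tumor-size": ["0-4", "5-9", "10-14", "15-19", "20-24", "25-29",
                   "30-34", "35-39", "40-44", "45-49", "50-54", "55-59"],
    "deg-malig": ["1", "2", "3"],
}


def encode_ordinal(attribute: str, value: str) -> Optional[int]:
    """Map a binned value to its 0-based rank; '?' (missing) maps to None."""
    if value == "?":
        return None
    levels = ORDERED_LEVELS[attribute]
    if value not in levels:
        raise ValueError(f"unknown level {value!r} for {attribute}")
    return levels.index(value)


print(encode_ordinal("age", "40-49"))         # 3
print(encode_ordinal("tumor-size", "20-24"))  # 4
print(encode_ordinal("deg-malig", "?"))       # None
```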

    Acknowledgements

    Original data https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29

    Inspiration

    With the attributes described above, can you predict whether a patient will have a recurrence event?

  14. Breast Cancer Prediction

    • kaggle.com
    zip
    Updated Aug 3, 2020
    Cite
    Adhyan Maji (2020). Breast Cancer Prediction [Dataset]. https://www.kaggle.com/datasets/adhyanmaji31/breast-cancer-prediction/discussion
    Available download formats: zip (5924 bytes)
    Dataset updated
    Aug 3, 2020
    Authors
    Adhyan Maji
    Description

    Dataset Information

    Samples arrive periodically as Dr. Wolberg reports his clinical cases. The database therefore reflects this chronological grouping of the data. This grouping information appears immediately below, having been removed from the data itself:

    Group 1: 367 instances (January 1989)
    Group 2: 70 instances (October 1989)
    Group 3: 31 instances (February 1990)
    Group 4: 17 instances (April 1990)
    Group 5: 48 instances (August 1990)
    Group 6: 49 instances (Updated January 1991)
    Group 7: 31 instances (June 1991)
    Group 8: 86 instances (November 1991)

    Total: 699 points (as of the donated database on 15 July 1992)

    Note that the results summarized above in Past Usage refer to a dataset of size 369, while Group 1 has only 367 instances. This is because it originally contained 369 instances; 2 were removed. The following statements summarize changes to the original Group 1's set of data:

    Group 1 : 367 points: 200B 167M (January 1989)
    Revised Jan 10, 1991: Replaced zero bare nuclei in 1080185 & 1187805
    Revised Nov 22,1991: Removed 765878,4,5,9,7,10,10,10,3,8,1 no record
    : Removed 484201,2,7,8,8,4,3,10,3,4,1 zero epithelial
    : Changed 0 to 1 in field 6 of sample 1219406
    : Changed 0 to 1 in field 8 of following sample:
    : 1182404,2,3,1,1,1,2,0,1,1,1

    Attribute Information

    1. Sample code number: id number
    2. Clump Thickness: 1 - 10
    3. Uniformity of Cell Size: 1 - 10
    4. Uniformity of Cell Shape: 1 - 10
    5. Marginal Adhesion: 1 - 10
    6. Single Epithelial Cell Size: 1 - 10
    7. Bare Nuclei: 1 - 10
    8. Bland Chromatin: 1 - 10
    9. Normal Nucleoli: 1 - 10
    10. Mitoses: 1 - 10
    11. Class: (2 for benign, 4 for malignant)

    Acknowledgements

    Wolberg, W.H., & Mangasarian, O.L. (1990). Multisurface method of pattern separation for medical diagnosis applied to breast cytology. In Proceedings of the National Academy of Sciences, 87, 9193-9196.

    Zhang, J. (1992). Selecting typical instances in instance-based learning. In Proceedings of the Ninth International Machine Learning Conference (pp. 470-479). Aberdeen, Scotland: Morgan Kaufmann.

    Inspiration

    Predict from the dataset whether a breast tumor is benign or malignant.

  15. Wisconsin breast cancer cytology features

    • kaggle.com
    zip
    Updated Mar 19, 2018
    Cite
    Johnson Thomas (2018). Wisconsin breast cancer cytology features [Dataset]. https://www.kaggle.com/johnyquest/wisconsin-breast-cancer-cytology-features
    Available download formats: zip (5987 bytes)
    Dataset updated
    Mar 19, 2018
    Authors
    Johnson Thomas
    License

    CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Wisconsin
    Description

    Context

    Cytology features of breast cancer biopsies, which can be used to predict breast cancer.

    The data was obtained from https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Original)

    Data description can be found at https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.names

    Content

    Data contains cytology features of breast cancer biopsies: clump thickness, uniformity of cell size, uniformity of cell shape, marginal adhesion, single epithelial cell size, bare nuclei, bland chromatin, normal nucleoli and mitoses. The class variable denotes whether the sample was cancer or not: cancer = 1, not cancer = 0.

    Attribute Information:

    1. Sample code number: id number
    2. Clump Thickness: 1 - 10
    3. Uniformity of Cell Size: 1 - 10
    4. Uniformity of Cell Shape: 1 - 10
    5. Marginal Adhesion: 1 - 10
    6. Single Epithelial Cell Size: 1 - 10
    7. Bare Nuclei: 1 - 10
    8. Bland Chromatin: 1 - 10
    9. Normal Nucleoli: 1 - 10
    10. Mitoses: 1 - 10
    11. Class: (0 for benign, 1 for malignant)
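
    This upload recodes the class as 0/1, whereas the original UCI file and several other uploads in this list use 2 (benign) and 4 (malignant). A small Python sketch for harmonizing labels before combining sources, assuming the only valid input codes are 2 and 4; the names are hypothetical.

```python
from typing import Iterable, List

# Original UCI coding (2 = benign, 4 = malignant) -> this upload's 0/1 coding.
UCI_TO_BINARY = {2: 0, 4: 1}


def to_binary_labels(labels: Iterable[int]) -> List[int]:
    """Convert 2/4 class codes to 0/1, rejecting any other code."""
    converted = []
    for label in labels:
        if label not in UCI_TO_BINARY:
            raise ValueError(f"unexpected class code {label!r}")
        converted.append(UCI_TO_BINARY[label])
    return converted


print(to_binary_labels([2, 4, 4, 2]))   # [0, 1, 1, 0]
```

    Raising on unknown codes catches files that were already converted, which would otherwise silently pass through a permissive lookup.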

    Acknowledgements

    Data obtained from the UCI Machine Learning Repository: Dua, D. and Karra Taniskidou, E. (2017). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

  16. Breast Cancer Wisconsin (Prognostic) Data Set

    • kaggle.com
    zip
    Updated Mar 31, 2017
    Cite
    Sarah VCH (2017). Breast Cancer Wisconsin (Prognostic) Data Set [Dataset]. https://www.kaggle.com/sarahvch/breast-cancer-wisconsin-prognostic-data-set
    Available download formats: zip (49800 bytes)
    Dataset updated
    Mar 31, 2017
    Authors
    Sarah VCH
    License

    Open Data Commons Database Contents License (DbCL) v1.0: http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Context

    Data From: UCI Machine Learning Repository http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wpbc.names

    Content

    "Each record represents follow-up data for one breast cancer case. These are consecutive patients seen by Dr. Wolberg since 1984, and include only those cases exhibiting invasive breast cancer and no evidence of distant metastases at the time of diagnosis.

    The first 30 features are computed from a digitized image of a
    fine needle aspirate (FNA) of a breast mass. They describe
    characteristics of the cell nuclei present in the image.
    A few of the images can be found at
    http://www.cs.wisc.edu/~street/images/
    
    The separation described above was obtained using
    Multisurface Method-Tree (MSM-T) [K. P. Bennett, "Decision Tree
    Construction Via Linear Programming." Proceedings of the 4th
    Midwest Artificial Intelligence and Cognitive Science Society,
    pp. 97-101, 1992], a classification method which uses linear
    programming to construct a decision tree. Relevant features
    were selected using an exhaustive search in the space of 1-4
    features and 1-3 separating planes.
    
    The actual linear program used to obtain the separating plane
    in the 3-dimensional space is that described in:
    [K. P. Bennett and O. L. Mangasarian: "Robust Linear
    Programming Discrimination of Two Linearly Inseparable Sets",
    Optimization Methods and Software 1, 1992, 23-34].
    
    The Recurrence Surface Approximation (RSA) method is a linear
    programming model which predicts Time To Recur using both
    recurrent and nonrecurrent cases. See references (i) and (ii)
    above for details of the RSA method. 
    
    This database is also available through the UW CS ftp server:
    
    ftp ftp.cs.wisc.edu
    cd math-prog/cpo-dataset/machine-learn/WPBC/
    

    1) ID number
    2) Outcome (R = recur, N = nonrecur)
    3) Time (recurrence time if field 2 = R, disease-free time if field 2 = N)
    4-33) Ten real-valued features are computed for each cell nucleus:

    a) radius (mean of distances from center to points on the perimeter)
    b) texture (standard deviation of gray-scale values)
    c) perimeter
    d) area
    e) smoothness (local variation in radius lengths)
    f) compactness (perimeter^2 / area - 1.0)
    g) concavity (severity of concave portions of the contour)
    h) concave points (number of concave portions of the contour)
    i) symmetry 
    j) fractal dimension ("coastline approximation" - 1)"
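The shape definitions above (radius, perimeter, area, compactness) can be illustrated with a minimal sketch that computes them from a polygonal nucleus outline. The contour representation, the choice of the vertex centroid as the "center", and the function name `contour_shape_features` are assumptions made for illustration; this is not the procedure that was used to produce the dataset.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def contour_shape_features(contour: List[Point]) -> Dict[str, float]:
    """Radius, perimeter, area, and compactness of a closed polygonal contour.

    Hypothetical helper: the contour is assumed to be an ordered list of
    boundary points, and the "center" is assumed to be the vertex centroid.
    """
    n = len(contour)
    if n < 3:
        raise ValueError("a closed contour needs at least 3 points")

    # Center: mean of the boundary points (an assumption, see docstring)
    cx = sum(x for x, _ in contour) / n
    cy = sum(y for _, y in contour) / n

    # radius: mean of distances from the center to points on the perimeter
    radius = sum(math.hypot(x - cx, y - cy) for x, y in contour) / n

    # perimeter: total edge length, closing the last point back to the first
    perimeter = sum(math.dist(contour[i], contour[(i + 1) % n]) for i in range(n))

    # area: shoelace formula; abs() makes it independent of point orientation
    twice_area = sum(
        contour[i][0] * contour[(i + 1) % n][1] - contour[(i + 1) % n][0] * contour[i][1]
        for i in range(n)
    )
    area = abs(twice_area) / 2.0
    if area == 0:
        raise ValueError("degenerate contour with zero area")

    # compactness as defined in the feature list: perimeter^2 / area - 1.0
    compactness = perimeter ** 2 / area - 1.0

    return {"radius": radius, "perimeter": perimeter, "area": area, "compactness": compactness}


# Usage: a 2 x 2 square centered on the origin
features = contour_shape_features([(1, 1), (-1, 1), (-1, -1), (1, -1)])
print(features["perimeter"], features["area"], features["compactness"])  # 8.0 4.0 15.0
```

Note that compactness grows as the outline becomes more irregular for a given area, which is why it helps distinguish ragged nuclei from smooth, round ones.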
    

    Acknowledgements

    Creators:

    Dr. William H. Wolberg, General Surgery Dept., University of
    Wisconsin, Clinical Sciences Center, Madison, WI 53792
    wolberg@eagle.surgery.wisc.edu
    
    W. Nick Street, Computer Sciences Dept., University of
    Wisconsin, 1210 West Dayton St., Madison, WI 53706
    street@cs.wisc.edu 608-262-6619
    
    Olvi L. Mangasarian, Computer Sciences Dept., University of
    Wisconsin, 1210 West Dayton St., Madison, WI 53706
    olvi@cs.wisc.edu 
    

    Inspiration

    I'm really interested in trying out various machine learning algorithms on real-life science data.

  17. Cancer Data

    • kaggle.com
    Updated Mar 22, 2023
    Erdem Taha (2023). Cancer Data [Dataset]. https://www.kaggle.com/datasets/erdemtaha/cancer-data/code
    Available formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 22, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Erdem Taha
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    🦠 Breast Cancer Data Set

    This dataset contains characteristics of breast tumors from patients examined for cancer. It includes a unique ID for each patient, the diagnosis (tumor type), the visual characteristics of the tumor, and the mean values of these characteristics.

    📚 The main features of the dataset are as follows:

    1. id: Represents a unique ID of each patient.
    2. diagnosis: Indicates the tumor type. This property takes the value "M" (malignant) or "B" (benign).
    3. radius_mean, texture_mean, perimeter_mean, area_mean, smoothness_mean, compactness_mean, concavity_mean, concave points_mean: Represents the mean values of the cancer's visual characteristics.

    There are also several categorical features in which patients are labeled with numerical values; you can examine them in the Chart area.

    Other features give the distribution of the mean values of the tumor image features:

    • radius_mean, texture_mean, perimeter_mean, area_mean, smoothness_mean, compactness_mean, concavity_mean, concave points_mean

    Each of these features is summarized in a table showing how many values fall in each range; you can examine these in the Chart Tables.

    Each sample contains the patient's unique ID, the cancer diagnosis and the average values of the cancer's visual characteristics.

    Such a dataset can be used to train or test models and algorithms for cancer diagnosis. Understanding and analyzing it can help improve how the visual features of tumors are used in diagnosis.

    ✨ Examples of Projects that can be done with the Data Set

    Logistic Regression: This algorithm is effective for binary classification problems. Logistic regression is an appropriate choice for this dataset because it has two classes, "Malignant" and "Benign". It can be used to predict the diagnosis from the visual features in the dataset.
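A minimal from-scratch sketch of the logistic regression idea, assuming the features have already been scaled and the diagnosis column holds the strings "M" and "B". The helper names (`encode_diagnosis`, `fit_logistic`, `predict_proba`), the learning rate, and the epoch count are illustrative choices; in practice a library implementation such as scikit-learn's `LogisticRegression`, with regularization and a proper train/test split, would be the more usual route.

```python
import math
from typing import List, Tuple


def encode_diagnosis(label: str) -> int:
    """Map the diagnosis column to a binary target: "M" -> 1, "B" -> 0."""
    mapping = {"M": 1, "B": 0}
    if label not in mapping:
        raise ValueError(f"unexpected diagnosis value: {label!r}")
    return mapping[label]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: List[List[float]], y: List[int],
                 lr: float = 0.1, epochs: int = 500) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (p - y)
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w: List[float], b: float, x: List[float]) -> float:
    """Estimated probability that x belongs to class 1 (malignant)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy usage with one made-up, already standardized feature
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [encode_diagnosis(d) for d in ["B", "B", "M", "M"]]
w, b = fit_logistic(X, y)
print(predict_proba(w, b, [1.5]) > 0.5)  # True
```

The model outputs a probability rather than just a label, so the 0.5 cut-off can be moved if missing a malignant case is judged more costly than a false alarm.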

    K-Nearest Neighbors (KNN): KNN classifies an example by looking at the k closest examples around it. This algorithm assumes that patients with similar characteristics tend to have similar types of cancer. KNN can be used for cancer diagnosis by taking into account neighborhood relationships in the data set.
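The neighborhood vote described above can be sketched in a few lines. This is a minimal illustration with made-up two-feature points; `knn_predict` is a hypothetical helper name, and Euclidean distance is assumed as the similarity measure.

```python
import math
from collections import Counter
from typing import List, Sequence


def knn_predict(train_X: List[Sequence[float]], train_y: List[str],
                query: Sequence[float], k: int = 3) -> str:
    """Return the majority label among the k training points closest to query."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training points")
    # Euclidean distance from the query to every training example, nearest first
    neighbors = sorted(
        ((math.dist(x, query), label) for x, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    # Majority vote among the labels of the k nearest examples
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Toy usage with two made-up features per patient
train_X = [[1.0, 1.0], [1.0, 2.0], [2.0, 1.0], [8.0, 8.0], [8.0, 9.0], [9.0, 8.0]]
train_y = ["B", "B", "B", "M", "M", "M"]
print(knn_predict(train_X, train_y, [1.5, 1.5]))  # B
print(knn_predict(train_X, train_y, [8.5, 8.5]))  # M
```

Because KNN relies on distances, features on large scales (such as area_mean) would dominate features on small scales (such as smoothness_mean) unless the features are standardized first. An odd k avoids tied votes in a two-class problem.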

    Support Vector Machines (SVM): SVM is effective for classification tasks, especially two-class problems. Because it looks for the boundary that separates the classes with the widest margin, SVM is a powerful algorithm for cancer diagnosis.
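A linear SVM can be sketched with a Pegasos-style stochastic subgradient method on the L2-regularized hinge loss. This is one simple way to train a linear SVM, not the only one (library implementations such as scikit-learn's `SVC` and `LinearSVC` use other solvers). Labels are assumed to be -1 (benign) and +1 (malignant); the function names, `lam`, and the epoch count are illustrative.

```python
import random
from typing import List, Sequence


def train_linear_svm(X: List[Sequence[float]], y: List[int], lam: float = 0.01,
                     epochs: int = 200, seed: int = 0) -> List[float]:
    """Pegasos-style stochastic subgradient descent on the regularized hinge loss.

    A constant 1.0 feature is appended so the last weight acts as a bias
    (which is then also regularized, a simplification).
    """
    if any(label not in (-1, 1) for label in y):
        raise ValueError("labels must be -1 or +1")
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(data)), len(data)):  # shuffled pass
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            # Step on the regularizer: shrink the weights
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Step on the hinge loss, only for points inside the margin
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def svm_predict(w: List[float], x: Sequence[float]) -> int:
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Toy usage with made-up, separable two-feature data
X = [[-2.0, -2.0], [-1.0, -2.0], [-2.0, -1.0], [2.0, 2.0], [1.0, 2.0], [2.0, 1.0]]
y = [-1, -1, -1, 1, 1, 1]
w = train_linear_svm(X, y)
print([svm_predict(w, x) for x in X])
```

Only points inside the margin trigger a hinge update, which is the code-level counterpart of the SVM's focus on the points closest to the separating boundary.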

    Training Notebooks for This Data Set 😊 (Recommended)

    K-NN Project: https://www.kaggle.com/code/erdemtaha/prediction-cancer-data-with-k-nn-95

    Logistic Regression: https://www.kaggle.com/code/erdemtaha/cancer-prediction-96-5-with-logistic-regression

    💖 Acknowledgements and Information

    This is a copy of content that was prepared for educational purposes and republished to reach more people. You can access the original source at the link below; please remember to support the original dataset.

    🔗 https://www.kaggle.com/datasets/uciml/breast-cancer-wisconsin-data

    This database can also be accessed via the UW CS ftp server: 🔗 ftp.cs.wisc.edu, directory math-prog/cpo-dataset/machine-learn/WDBC/

    It can also be found at the UCI Machine Learning Repository: 🔗 https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29

    📩 Personal Information:

    If you have questions about the data or the studies, feel free to contact me through the links below 😊

    LinkedIn: https://www.linkedin.com/in/erdem-taha-sokullu/

    Mail: erdemtahasokullu@gmail.com

    Github: https://github.com/Prometheussx

    Kaggle: https://www.kaggle.com/erdemtaha

    📜 License:

    This data is released under the CC BY-NC-SA 4.0 license. You can review the license terms at the link below.

    License Link: https://creativecommons.org/licenses/by-nc-sa/4.0/

  18. Breast Cancer Diagnosis

    • kaggle.com
    zip
    Updated Dec 4, 2022
    The Devastator (2022). Breast Cancer Diagnosis [Dataset]. https://www.kaggle.com/datasets/thedevastator/uncovering-breast-cancer-diagnosis-with-wisconsi
    Available download formats: zip (78,356 bytes)
    Dataset updated
    Dec 4, 2022
    Authors
    The Devastator
    Description

    Breast Cancer Diagnosis

    Study breast cancer diagnosis

    By UCI [source]

    About this dataset

    This dataset contains data on breast cancer diagnosis, a devastating medical condition that affects thousands of people around the world each year. The data consists of a patient ID, a diagnosis (malignant or benign), and 30 computed features extracted from a digitized image of a fine needle aspirate (FNA) of a breast mass. The features include radius, texture, perimeter, area, smoothness, compactness, concavity, and concave points, as well as symmetry and fractal dimension.

    The dataset was created by researchers in General Surgery and Computer Science at the University of Wisconsin-Madison, led by Dr. William H. Wolberg with contributions from W. Nick Street and Olvi L. Mangasarian, and was used in early research on predicting breast cancer prognosis with linear programming methods. More recently, statistical learning methods such as support vector machines have been used to classify tumor types from this dataset, and pattern recognition techniques such as artificial neural networks (ANNs) have been applied to identify hidden patterns.

    It has also been used in studies of unsupervised tools such as Ant Colony Optimization for discovering meaningful relationships among variables, which can help physicians better understand how certain types of tumors progress over time. For example, type-cardinality analysis allowed researchers to assess a tumor's heterogeneity before deciding on appropriate treatment, potentially improving prognosis. The Wisconsin Breast Cancer Diagnostic dataset is a valuable resource for scientists working to prevent or cure this disease, a goal we all hope to achieve soon.


    How to use the dataset

    Research Ideas

    • Developing a classifier that can accurately predict breast cancer diagnoses based on the provided features.
    • Clustering patient data with similar diagnosis to discover trends or connections between certain symptoms and diagnoses.
    • Optimizing feature selection algorithms to identify the most relevant predictors of breast cancer diagnosis from the given cell nucleus features.
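One simple baseline for the feature-selection idea is a univariate filter that scores each feature by how far apart the class means are, in units of the pooled standard deviation (Cohen's d). This is one possible approach, not the method of any study mentioned here; it assumes labels 0 (benign) and 1 (malignant), uses a hypothetical helper name, `rank_features`, and ignores interactions between features, which wrapper or embedded methods would capture.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def rank_features(X: List[Sequence[float]], y: List[int],
                  names: List[str]) -> List[Tuple[str, float]]:
    """Rank features by |mean_1 - mean_0| / pooled standard deviation (Cohen's d).

    Each class needs at least two samples.
    """
    ranked = []
    for j, name in enumerate(names):
        col0 = [row[j] for row, label in zip(X, y) if label == 0]
        col1 = [row[j] for row, label in zip(X, y) if label == 1]
        n0, n1 = len(col0), len(col1)
        if n0 < 2 or n1 < 2:
            raise ValueError("each class needs at least two samples")
        diff = abs(mean(col1) - mean(col0))
        # Pooled sample variance of the two classes
        pooled_var = ((n0 - 1) * stdev(col0) ** 2 + (n1 - 1) * stdev(col1) ** 2) / (n0 + n1 - 2)
        if pooled_var == 0:
            score = 0.0 if diff == 0 else float("inf")
        else:
            score = diff / pooled_var ** 0.5
        ranked.append((name, score))
    # Largest separation between classes first
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Toy usage with made-up values for two features
X = [[1.0, 5.0], [2.0, 5.5], [8.0, 5.2], [9.0, 5.1]]
y = [0, 0, 1, 1]
print(rank_features(X, y, ["radius_mean", "texture_mean"]))
```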

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    See the dataset description for more information.

    Columns

    File: unformatted-data.csv

    File: wpbc.data.csv

    | Column name | Description |
    |:------------|:------------|
    | 119513 | ID number (Integer) |
    | N | Diagnosis (Binary) |
    | 31 | Radius (Real-valued) |
    | 18.02 | Texture (Real-valued) |
    | 27.6 | Perimeter (Real-valued) |
    | 117.5 | Area (Real-valued) |
    | 1013 | Smoothness (Real-valued) |
    | 0.09489 | Compactness (Real-valued) |
    | 0.1036 | Concavity (Real-valued) |
    | 0.1086 | Symmetry (Real-valued) |
    | 0.07055 | Fractal Dimension (Real-valued) |
    | 0.1865 | Mean Intensity (Real-valued) |
    | 0.06333 | Standard Error (Real-valued) |
    | 0.6249 | Worst Radius (Real-valued) |
    | 1.89 | Worst Texture (Real-valued) |
    | 3.972 | Worst Perimeter (Real-valued) |
    | 71.55 | Worst Area (Real-valued) |
    | 0.004433 | Worst Smoothness (Real-valued) |
    | 0.01421 | Worst Compactness (Real-valued) |
    | 0.03233 | Worst Concavity (Real-valued) |

    File: breast-cancer-wisconsin.data.csv

    | Column name | Description |
    |:------------|:------------|
    | 119513 | ID number (Integer) |
    | 1000025 | ID number (Integer) |
    | 1.1 | Uniformity of Cell Size (Integer) |
    | 1.2 | Uniformity of Cell Shape (Integer) |
    | 1.3 | Single Epithelial Cell Size (Integer) |
    | 1.4 | Bland Chromatin (Integer) |
    | 1.5 | Normal Nucleoli (Integer) |
    | 2.1 | Mitoses (Integer) |

    File: wdbc.data.csv

    | Column name | Description |
    |:------------|:------------|
    | 842302 | Patient ID number (Integer Type) |
    | M | Diagnosis (Binary Type) |
    | **...
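The "column names" in these tables (119513, 1000025, 842302, M, ...) look like values from each file's first record, which suggests the CSV files have no header row and were read as if they did. A minimal sketch of reading such a header-less file with Python's standard `csv` module, assuming the layout of the UCI diagnostic file (ID, diagnosis, then 30 real-valued features); the function name `read_wdbc` and the dictionary keys are hypothetical.

```python
import csv
import io
from typing import Dict, List


def read_wdbc(text: str, n_features: int = 30) -> List[Dict]:
    """Parse header-less rows of the form: ID, diagnosis, n_features real values."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        if len(row) != 2 + n_features:
            raise ValueError(f"expected {2 + n_features} fields, got {len(row)}")
        records.append({
            "id": int(row[0]),
            "diagnosis": row[1].strip(),
            "features": [float(v) for v in row[2:]],
        })
    return records


# Row with made-up feature values, using the assumed 32-field layout
sample = "842302,M," + ",".join(["1.0"] * 30) + "\n"
print(read_wdbc(sample)[0]["diagnosis"])  # M
```

Passing explicit names instead of trusting the first row keeps the first patient record in the data rather than silently turning it into column labels.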

  19. Curated Healthcare and Genomics Datasets

    • kaggle.com
    zip
    Updated Nov 25, 2024
    Fluffy (2024). Curated Healthcare and Genomics Datasets [Dataset]. https://www.kaggle.com/datasets/remyz5/curated-healthcare-and-genomics-datasets
    Available download formats: zip (2,547,082 bytes)
    Dataset updated
    Nov 25, 2024
    Authors
    Fluffy
    Description

    Overview

    This collection consists of curated datasets optimized for machine learning tasks such as binary classification, regression, and survival modeling. The datasets are derived from publicly available sources, cleaned, and preprocessed to support a variety of applications in healthcare and genomics research. Each dataset focuses on a specific domain and task, making it easier for practitioners to build and evaluate models.

    Files and Description

    1. Breast Cancer Data

    File Name: breast_cancer_data.xlsx
    Source Dataset: Breast Cancer Wisconsin Data by uciml

    Description:
    This dataset is designed for binary classification tasks aimed at diagnosing breast cancer. It contains measurements from fine-needle aspirates of breast masses, categorized into benign or malignant tumors. The target variable is the diagnosis outcome (benign = 0, malignant = 1).

    Applications:
    - Cancer diagnosis and prediction.
    - Feature selection to identify critical predictors of malignancy.

    2. Combined Data (Azithromycin)

    File Name: combined_data_Azithromycin.csv
    Source Dataset: Gonorrhea Unitigs by nwheeler443

    Description:
    This dataset is optimized for regression and survival modeling, focusing on genomic markers linked to gonorrhea's resistance to Azithromycin. It includes features derived from unitigs (genomic segments) and metadata related to antibiotic susceptibility.

    Applications:
    - Prediction of drug resistance for Azithromycin.

    3. Combined Data (Ciprofloxacin)

    File Name: combined_data_Ciprofloxacin.csv
    Source Dataset: Gonorrhea Unitigs by nwheeler443

    Description:
    This dataset is also tailored for regression and survival modeling, with a focus on Ciprofloxacin resistance. Similar to the Azithromycin dataset, it includes genomic unitig data and antibiotic susceptibility features, curated to predict resistance.

    Applications:
    - Prediction of drug resistance for Ciprofloxacin.

    4. Dementia Dataset

    File Name: dementia_dataset.csv
    Source Dataset: Dementia Prediction Dataset by shashwatwork

    Description:
    This dataset is optimized for binary classification tasks, focusing on predicting the presence of dementia based on clinical and demographic data. Features include cognitive test results, demographic information, and assessments of patient functionality. The target variable is a binary indicator of dementia diagnosis.

    Applications:
    - Dementia diagnosis prediction.
    - Exploratory data analysis for identifying high-impact predictors.

    Acknowledgements

    These datasets are based on publicly available resources and are credited to their respective original creators:
    1. Shashwatwork (Dementia Dataset).
    2. Nwheeler443 (Gonorrhea Unitigs).
    3. UCI ML Repository (Breast Cancer Wisconsin Dataset).

    The curated versions have been optimized for streamlined machine learning workflows.

  20. Basic datasets

    • kaggle.com
    zip
    Updated Apr 1, 2024
    Pascal (2024). Basic datasets [Dataset]. https://www.kaggle.com/datasets/pyim59/basic-datasets
    Available download formats: zip (2,343,887 bytes)
    Dataset updated
    Apr 1, 2024
    Authors
    Pascal
    Description

    These datasets are used in Pascal Yim's Machine Learning course at Centrale Lille (image generated with ideogram.ai).

    Regression

    "datareg_xxx_yyy.csv"

    Simple regression examples. For instance, "datareg_cos_300.csv" is a set of 300 points following a noisy cosine, with two columns, 'x' and 'y'.
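A minimal sketch of how a file in the style of "datareg_cos_300.csv" could be generated. The x range, the noise model (Gaussian with standard deviation 0.1), and the seed are assumptions; the course files may have been produced differently.

```python
import csv
import io
import math
import random
from typing import List, Tuple


def make_noisy_cosine(n: int = 300, x_min: float = 0.0, x_max: float = 2 * math.pi,
                      noise: float = 0.1, seed: int = 0) -> List[Tuple[float, float]]:
    """Sample n points with x uniform in [x_min, x_max] and y = cos(x) + Gaussian noise."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        x = rng.uniform(x_min, x_max)
        points.append((x, math.cos(x) + rng.gauss(0.0, noise)))
    return points


def to_csv(points: List[Tuple[float, float]]) -> str:
    """Serialize the points with an 'x','y' header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["x", "y"])
    writer.writerows(points)
    return buffer.getvalue()


text = to_csv(make_noisy_cosine())
print(text.splitlines()[0])  # x,y
```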

    "housing.csv"

    Estimating the median house value (MEDV) per neighborhood from several variables:
    • RM: number of rooms
    • LSTAT: a measure of the poverty rate
    • PTRATIO: pupil-teacher ratio in schools

    Simplified version of the original UCI dataset

    Source: https://www.kaggle.com/datasets/schirmerchad/bostonhoustingmlnd

    "kc_house_data.csv"

    House price prediction in the Seattle area (King County)

    Source: https://www.kaggle.com/datasets/harlfoxem/housesalesprediction

    "house_prices.csv"

    House price prediction (Kaggle competition)

    Source: https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques/data?select=train.csv

    Classification

    "geyser.csv"

    Old Faithful is a cone geyser in Yellowstone National Park, United States.

    Measured variables:
    • duration: duration of the eruption
    • waiting: time elapsed since the previous eruption
    • kind: a 'short' or 'long' label for the eruption type

    "iris.csv"

    Dataset for classifying Iris species

    Available information:
    • sepal_length: sepal length (cm)
    • sepal_width: sepal width
    • petal_length: petal length
    • petal_width: petal width
    • species: one of 3 iris species, 'setosa', 'versicolor', or 'virginica'

    Source: UCI (http://archive.ics.uci.edu/)

    "iris_basic.csv"

    A simplified version of the iris dataset, with only the petal measurements and 2 species: versicolor (0) and virginica (1)

    "heart.csv"

    Prediction of a heart attack (output) from various parameters such as age, cholesterol level, etc.

    Source: https://www.kaggle.com/rashikrahmanpritom/heart-attack-analysis-prediction-dataset

    "cancer.csv"

    The goal is to predict whether a tumor is malignant, based on measurements from a biopsy of the tumor.

    Source: https://www.kaggle.com/uciml/breast-cancer-wisconsin-data

    "penguins.csv"

    A dataset comparable to Iris; the goal is to predict the penguin species.

    • species: Adelie, Chinstrap, Gentoo
    • island: Biscoe, Dream, Torgersen
    • bill_length_mm: bill length
    • bill_depth_mm: bill depth
    • flipper_length_mm: flipper length
    • body_mass_g: body mass
    • sex: "male" or "female"

    Source: https://www.kaggle.com/ashkhagan/palmer-penguins-datasetalternative-iris-dataset

    "stars.csv"

    Star classification

    Source: https://www.kaggle.com/datasets/deepu1109/star-dataset

    "mushrooms.csv"

    Predict whether a mushroom is edible

    Source: https://www.kaggle.com/uciml/mushroom-classification

    "titanic.csv"

    The classic dataset on Titanic survivors

    Source: https://www.kaggle.com/c/titanic

    "diabetes.csv"

    The "Pima Indians Diabetes" dataset

    Diabetes prediction for a population of women from the Pima tribe

    Source: https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database

    "churn-small.csv"

    The goal is to predict which Orange Telecom customers will leave for a competitor (a 'churn' or 'attrition' problem).

    A larger version, "churn-big.csv", contains more data.

    Source: https://www.kaggle.com/datasets/mnassrib/telecom-churn-datasets

    "stroke.csv"

    Stroke prediction

    Source: https://www.kaggle.com/datasets/shashwatwork/cerebral-stroke-predictionimbalaced-dataset

    "predictive_maintenance.csv"

    Machine failure prediction (UCI)

    Source: https://www.kaggle.com/datasets/shivamb/machine-predictive-maintenance-classification/code


Saurabh Badole (2024). Breast Cancer Diagnosis Dataset - Wisconsin State [Dataset]. https://www.kaggle.com/datasets/saurabhbadole/breast-cancer-wisconsin-state

Breast Cancer Diagnosis Dataset - Wisconsin State

Analyzing Tumor Characteristics for Cancer Detection

Available download formats: zip (5,844 bytes)
Dataset updated
Mar 31, 2024
Authors
Saurabh Badole
Area covered
Wisconsin
Description

Description:

Explore the field of breast cancer diagnosis with the insightful Wisconsin Breast Cancer dataset (Original). This dataset provides detailed attributes representing tumor characteristics observed in breast tissue samples. By analyzing these attributes, researchers and medical professionals can gain insights into tumor behavior and develop predictive models for cancer detection and prognosis.

Features
1. Sample code number: Unique identifier for each tissue sample.
2. Clump Thickness: Assessment of the thickness of tumor cell clusters (1 - 10).
3. Uniformity of Cell Size: Uniformity in the size of tumor cells (1 - 10).
4. Uniformity of Cell Shape: Uniformity in the shape of tumor cells (1 - 10).
5. Marginal Adhesion: Degree of adhesion of tumor cells to surrounding tissue (1 - 10).
6. Single Epithelial Cell Size: Size of individual tumor cells (1 - 10).
7. Bare Nuclei: Presence of nuclei without surrounding cytoplasm (1 - 10).
8. Bland Chromatin: Assessment of chromatin structure in tumor cells (1 - 10).
9. Normal Nucleoli: Presence of normal-looking nucleoli in tumor cells (1 - 10).
10. Mitoses: Frequency of mitotic cell divisions (1 - 10).
11. Class: Classification of tumor type (2 for benign, 4 for malignant).
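A minimal sketch of parsing one record in the column order listed above, mapping Class to a 0/1 target (2 = benign becomes 0, 4 = malignant becomes 1). In the original UCI release of this data, missing Bare Nuclei values are written as '?'; this copy may or may not follow that convention, so the sketch treats '?' as missing in any column. The snake_case column names and the `parse_record` helper are hypothetical, and the example rows are made up.

```python
from typing import Dict, List, Optional

# Short names for the 11 columns listed above (our own labels, same order)
COLUMNS: List[str] = [
    "sample_code_number", "clump_thickness", "uniformity_cell_size",
    "uniformity_cell_shape", "marginal_adhesion", "single_epithelial_cell_size",
    "bare_nuclei", "bland_chromatin", "normal_nucleoli", "mitoses", "class",
]


def parse_record(line: str) -> Dict[str, Optional[int]]:
    """Parse one comma-separated record; '?' becomes None, Class is mapped to 0/1."""
    values = [v.strip() for v in line.split(",")]
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    record: Dict[str, Optional[int]] = {
        name: (None if raw == "?" else int(raw)) for name, raw in zip(COLUMNS, values)
    }
    if record["class"] not in (2, 4):
        raise ValueError(f"unexpected class value: {record['class']}")
    # 2 = benign -> 0, 4 = malignant -> 1
    record["malignant"] = 1 if record["class"] == 4 else 0
    return record


# Made-up rows: one with a missing Bare Nuclei value, one malignant case
print(parse_record("1000001,5,1,1,1,2,?,3,1,1,2"))
print(parse_record("1000002,8,7,7,3,4,10,5,6,2,4")["malignant"])  # 1
```

Records with missing values can then be dropped or imputed explicitly before training, instead of failing on a non-numeric '?'.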

Usage:

  • Cancer diagnosis: Develop machine learning models to classify tumors as benign or malignant based on their characteristics, aiding in early detection and treatment planning.
  • Feature importance analysis: Identify key attributes contributing to tumor malignancy and understand their biological significance.
  • Clinical decision support: Assist healthcare professionals in interpreting biopsy results and making informed decisions about patient care.

Acknowledgements:

The Breast Cancer Wisconsin dataset is sourced from tissue samples collected for diagnostic purposes, with attributes derived from microscopic examination. The dataset is anonymized and made available for research purposes, contributing to advancements in cancer diagnosis and treatment.
