Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cervical cancer is one of the leading causes of cancer-related deaths among women worldwide. Early detection and accurate prediction of cervical cancer can significantly improve the chances of successful treatment and save lives. This dataset helps develop a predictive model that uses machine learning techniques to identify individuals at high risk of cervical cancer, allowing for timely intervention and medical care.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Breast Cancer Wisconsin Diagnostic Dataset
The following description was retrieved from the breast cancer dataset on the UCI Machine Learning Repository. Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image. A few of the images can be found here. The separating plane described above was obtained using Multisurface Method-Tree (MSM-T), a classification method which uses linear… See the full description on the dataset page: https://huggingface.co/datasets/scikit-learn/breast-cancer-wisconsin.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Iraq-Oncology Teaching Hospital/National Center for Cancer Diseases (IQ-OTH/NCCD) lung cancer dataset was collected in the above-mentioned specialist hospitals over a period of three months in fall 2019. It includes CT scans of patients diagnosed with lung cancer in different stages, as well as healthy subjects. IQ-OTH/NCCD slides were marked by oncologists and radiologists in these two centers. The dataset contains a total of 1190 images representing CT scan slices of 110 cases. These cases are grouped into three classes: normal, benign, and malignant. Of these, 40 cases are diagnosed as malignant, 15 cases are diagnosed as benign, and 55 cases are classified as normal. The CT scans were originally collected in DICOM format. The scanner used is a SOMATOM from Siemens. The CT protocol includes 120 kV and a slice thickness of 1 mm; a window width ranging from 350 to 1200 HU and a window center from 50 to 600 were used for reading, with breath hold at full inspiration. All images were de-identified before analysis. Written consent was waived by the oversight review board. The study was approved by the institutional review board of the participating medical centers. Each scan contains several slices. The number of slices ranges from 80 to 200, and each slice represents an image of the human chest from different sides and angles. The 110 cases vary in gender, age, educational attainment, area of residence and living status. Some of them are employees of the Iraqi ministries of Transport and Oil, others are farmers and gainers. Most of them come from places in the middle region of Iraq, particularly the provinces of Baghdad, Wasit, Diyala, Salahuddin, and Babylon.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset of breast cancer patients was obtained from the 2017 November update of the SEER Program of the NCI, which provides information on population-based cancer statistics. The dataset involved female patients with infiltrating duct and lobular carcinoma breast cancer (SEER primary sites recode NOS histology codes 8522/3) diagnosed in 2006-2010. Patients with unknown tumour size, unknown examined regional LNs, or unknown positive regional LNs, and patients whose survival was less than 1 month, were excluded; thus, 4024 patients were ultimately included.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
📄 Dataset Description: This dataset contains global cancer patient data reported from 2015 to 2024, designed to simulate the key factors influencing cancer diagnosis, treatment, and survival. It includes a variety of features that are commonly studied in the medical field, such as age, gender, cancer type, environmental factors, and lifestyle behaviors. The dataset is perfect for:
Exploratory Data Analysis (EDA)
Multiple Linear Regression and other modeling tasks
Feature Selection and Correlation Analysis
Predictive Modeling for cancer severity, treatment cost, and survival prediction
Data Visualization and creating insightful graphs
Key Features:
Age: Patient's age (20-90 years)
Gender: Male, Female, or Other
Country/Region: Country or region of the patient
Cancer Type: Various types of cancer (e.g., Breast, Lung, Colon)
Cancer Stage: Stage 0 to Stage IV
Risk Factors: Includes genetic risk, air pollution, alcohol use, smoking, obesity, etc.
Treatment Cost: Estimated cost of cancer treatment (in USD)
Survival Years: Years survived since diagnosis
Severity Score: A composite score representing cancer severity
This dataset provides a broad view of global cancer trends, making it an ideal resource for those learning data science, machine learning, and statistical analysis in healthcare.
The United States Cancer Statistics (USCS) online databases in WONDER provide cancer incidence and mortality data for the United States for the years since 1999, by year, state and metropolitan area (MSA), age group, race, ethnicity, sex, childhood cancer classification and cancer site. The databases report case counts, deaths, crude and age-adjusted incidence and death rates, and 95% confidence intervals for rates. The USCS data are the official federal statistics on cancer incidence from registries having high-quality data and cancer mortality statistics for 50 states and the District of Columbia. USCS are produced by the Centers for Disease Control and Prevention (CDC) and the National Cancer Institute (NCI), in collaboration with the North American Association of Central Cancer Registries (NAACCR). Mortality data are provided by the Centers for Disease Control and Prevention (CDC), National Center for Health Statistics (NCHS), National Vital Statistics System (NVSS).
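As a rough illustration of the rate types listed above, the following sketch computes a crude rate and a directly age-standardized rate per 100,000. The age groups, counts, and standard-population weights are invented for the example and are not USCS values; the function names are hypothetical.

```python
def crude_rate(cases, population, per=100_000):
    """Crude rate: total cases divided by total population, scaled."""
    return cases / population * per


def age_adjusted_rate(strata, per=100_000):
    """Direct standardization over age groups.

    strata: list of (cases, population, standard_weight) tuples.
    Each weight is the standard population's share for that age group.
    """
    total_weight = sum(weight for _, _, weight in strata)
    weighted = sum(cases / pop * weight for cases, pop, weight in strata)
    return weighted / total_weight * per


# Hypothetical age groups: (cases, population, standard weight)
example = [
    (10, 50_000, 0.4),
    (40, 30_000, 0.4),
    (90, 20_000, 0.2),
]
total_cases = sum(cases for cases, _, _ in example)
total_pop = sum(pop for _, pop, _ in example)
crude = crude_rate(total_cases, total_pop)
adjusted = age_adjusted_rate(example)
```

The two rates differ here because the standard weights do not match the example population's own age distribution, which is the reason age adjustment is reported alongside crude rates.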
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Brain Tumors MRI Images - 2,000,000+ MRI studies
The dataset consists of MRI scans of human brains with medical reports and is designed for the detection, classification, and segmentation of tumors in cancer patients. The data includes a variety of brain tumors such as gliomas, benign tumors, malignant tumors, and brain metastasis, along with clinical information for each patient. The MRI scans provide detailed medical imaging of different tissues and tumor regions… See the full description on the dataset page: https://huggingface.co/datasets/UniDataPro/brain-cancer-dataset.
https://digital.nhs.uk/about-nhs-digital/terms-and-conditions
Rapid Cancer Registration Data (RCRD) provides a quick, indicative source of cancer data. It is provided to support the planning and provision of cancer services. The data is based on a rapid processing of cancer registration data sources, in particular on Cancer Outcomes and Services Dataset (COSD) information. In comparison, National Cancer Registration Data (NCRD) relies on additional data sources, enhanced follow-up with trusts and expert processing by cancer registration officers. The Rapid Cancer Registration Data (RCRD) may be useful for service improvement projects including healthcare planning and prioritisation. However, it is poorly suited for epidemiological research due to limitations in the data quality and completeness.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Dataset Card for Lung Cancer
Dataset Summary
An effective cancer prediction system helps people learn their cancer risk at low cost and make appropriate decisions based on their risk status. The data was collected from the website of an online lung cancer prediction system.
Dataset Structure… See the full description on the dataset page: https://huggingface.co/datasets/virtual10/lungs_cancer.
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
This dataset consists of CT and PET-CT DICOM images of lung cancer subjects with XML Annotation files that indicate tumor location with bounding boxes. The images were retrospectively acquired from patients with suspicion of lung cancer, and who underwent standard-of-care lung biopsy and PET/CT. Subjects were grouped according to a tissue histopathological diagnosis. Patients with Names/IDs containing the letter 'A' were diagnosed with Adenocarcinoma, 'B' with Small Cell Carcinoma, 'E' with Large Cell Carcinoma, and 'G' with Squamous Cell Carcinoma.
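The letter-to-diagnosis convention above can be sketched as a small lookup. This is only an illustration: the ID format used here (a class letter immediately followed by digits at the end of the ID, as in the made-up `Lung_Dx-A0001`) is an assumption, and `diagnosis_from_id` is a hypothetical helper.

```python
import re

# Letter-to-diagnosis mapping taken from the description above.
DIAGNOSIS_BY_LETTER = {
    "A": "Adenocarcinoma",
    "B": "Small Cell Carcinoma",
    "E": "Large Cell Carcinoma",
    "G": "Squamous Cell Carcinoma",
}


def diagnosis_from_id(subject_id):
    """Return the diagnosis for a subject ID, or None if no class letter is found.

    Assumes the class letter directly precedes the trailing numeric part of the ID.
    """
    match = re.search(r"([ABEG])\d+$", subject_id)
    return DIAGNOSIS_BY_LETTER[match.group(1)] if match else None
```

Anchoring the pattern to the end of the ID avoids matching stray letters elsewhere in the name; adjust the pattern if the real IDs are laid out differently.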
The images were analyzed on the mediastinum (window width, 350 HU; level, 40 HU) and lung (window width, 1,400 HU; level, –700 HU) settings. The reconstructions were made with a 2 mm slice thickness in lung settings. The CT slice interval varies from 0.625 mm to 5 mm. Scanning modes include plain, contrast and 3D reconstruction.
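To make the window settings above concrete, the sketch below converts a window width and level into the range of Hounsfield units that is displayed, and maps a HU value to an 8-bit gray level. It uses a simplified linear mapping; the DICOM standard defines a slightly different formula, and the function names are hypothetical.

```python
def window_bounds(width, level):
    """Return the (low, high) HU range shown by a display window."""
    return level - width / 2, level + width / 2


def apply_window(hu, width, level):
    """Map a HU value to a 0-255 gray level, clipping values outside the window."""
    low, high = window_bounds(width, level)
    clipped = min(max(hu, low), high)
    return round((clipped - low) / (high - low) * 255)


# Window settings quoted in the description above.
MEDIASTINUM = {"width": 350, "level": 40}
LUNG = {"width": 1400, "level": -700}

mediastinum_range = window_bounds(**MEDIASTINUM)
lung_range = window_bounds(**LUNG)
```

With these settings the mediastinum window spans -135 to 215 HU and the lung window spans -1400 to 0 HU; everything outside a window is rendered as pure black or pure white.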
Before the examination, the patient underwent fasting for at least 6 hours, and the blood glucose of each patient was less than 11 mmol/L. Whole-body emission scans were acquired 60 minutes after the intravenous injection of 18F-FDG (4.44MBq/kg, 0.12mCi/kg), with patients in the supine position in the PET scanner. FDG doses and uptake times were 168.72-468.79MBq (295.8±64.8MBq) and 27-171min (70.4±24.9 minutes), respectively. 18F-FDG with a radiochemical purity of 95% was provided. Patients were allowed to breathe normally during PET and CT acquisitions. Attenuation correction of PET images was performed using CT data with the hybrid segmentation method. Attenuation corrections were performed using a CT protocol (180mAs,120kV,1.0pitch). Each study comprised one CT volume, one PET volume and fused PET and CT images: the CT resolution was 512 × 512 pixels at 1mm × 1mm, the PET resolution was 200 × 200 pixels at 4.07mm × 4.07mm, with a slice thickness and an interslice distance of 1mm. Both volumes were reconstructed with the same number of slices. Three-dimensional (3D) emission and transmission scanning were acquired from the base of the skull to mid femur. The PET images were reconstructed via the TrueX TOF method with a slice thickness of 1mm.
The location of each tumor was annotated by five academic thoracic radiologists with expertise in lung cancer to make this dataset a useful tool and resource for developing algorithms for medical diagnosis. Two of the radiologists had more than 15 years of experience and the others had more than 5 years of experience. After one of the radiologists labeled each subject, the other four radiologists performed a verification, resulting in all five radiologists reviewing each annotation file in the dataset. Annotations were captured using LabelImg. The image annotations are saved as XML files in PASCAL VOC format, which can be parsed using the PASCAL Development Toolkit: https://pypi.org/project/pascal-voc-tools/. Python code to visualize the annotation boxes on top of the DICOM images can be downloaded here.
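As an alternative to a dedicated toolkit, PASCAL VOC annotation files can also be read with the Python standard library. The sketch below is a minimal example under that assumption; the sample XML, its file name, and its label are made up for illustration and are not taken from the dataset.

```python
import xml.etree.ElementTree as ET


def parse_voc_boxes(xml_text):
    """Return a list of (label, xmin, ymin, xmax, ymax) from PASCAL VOC XML text."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        bndbox = obj.find("bndbox")
        coords = tuple(
            int(float(bndbox.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((label, *coords))
    return boxes


# Hypothetical annotation, for illustration only.
SAMPLE = """<annotation>
  <filename>example.dcm</filename>
  <size><width>512</width><height>512</height><depth>1</depth></size>
  <object>
    <name>A</name>
    <bndbox><xmin>120</xmin><ymin>200</ymin><xmax>180</xmax><ymax>260</ymax></bndbox>
  </object>
</annotation>"""

boxes = parse_voc_boxes(SAMPLE)
```

For files on disk, `ET.parse(path).getroot()` can replace `ET.fromstring`; the rest of the function stays the same.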
Two deep learning researchers used the images and the corresponding annotation files to train several well-known detection models, which resulted in a mean average precision (mAP) of around 0.87 on the validation set.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
BIT/breast-cancer-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset consists of 1 .xlsx file, 2 .png files, 1 .json file and 1 .zip file:
annotation_details.xlsx: The distribution of annotations in the previously mentioned six classes (mitosis, apoptosis, tumor nuclei, non-tumor nuclei, tubule, and non-tubule) is presented in an Excel spreadsheet.
original.png: The input image.
annotated.png: An example from the dataset. In the annotated image, blue circles indicate the tumor nuclei; pink circles show non-tumor nuclei such as blood cells, stroma nuclei, and lymphocytes; orange and green circles are mitosis and apoptosis, respectively; light blue circles are true lumen for tubules; and yellow circles represent white regions (non-lumen) such as fat, blood vessels, and broken tissues.
data.json: The annotations for the BreCaHAD dataset are provided in JSON (JavaScript Object Notation) format. In the given example, the JSON file (ground truth) contains two mitosis annotations and only one tumor nuclei annotation. Here, x and y are the coordinates of the centroid of the annotated object, and the values are between 0 and 1.
BreCaHAD.zip: An archive file containing the dataset. Three folders are included: images (original images), groundTruth (json files), and groundTruth_display (groundTruth applied on original images).
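Because the centroid coordinates are normalized to the range 0 to 1, they need to be scaled by the image size before being drawn on an image. The sketch below shows one way to do that. The JSON layout (class names as keys, each holding a list of x/y objects) and the 1360 x 1024 image size are assumptions made for the example and should be checked against the actual files.

```python
import json


def to_pixels(annotations, width, height):
    """Convert normalized centroids (0..1) to integer pixel coordinates."""
    return {
        label: [(round(p["x"] * width), round(p["y"] * height)) for p in points]
        for label, points in annotations.items()
    }


# Hypothetical JSON mirroring the example: two mitosis annotations, one tumor annotation.
SAMPLE = (
    '{"mitosis": [{"x": 0.25, "y": 0.5}, {"x": 0.75, "y": 0.1}],'
    ' "tumor": [{"x": 0.5, "y": 0.5}]}'
)

pixels = to_pixels(json.loads(SAMPLE), width=1360, height=1024)
```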
https://creativecommons.org/publicdomain/zero/1.0/
Images of histological sections with and without breast cancer, using the Biglycan biomarker.
Jan 26, 2023
Biomarkers, Breast Cancer
Institutions
Universidade do Vale do Rio dos Sinos, Hospital de Clinicas de Porto Alegre, Universidade Federal do Rio Grande do Sul, Instituto Federal de Educacao Ciencia e Tecnologia de Mato Grosso
Image source: Vitro Vivo Biotech
Dataset Card for "breast-cancer"
Dataset was taken from the MedSAM project and used in this notebook which fine-tunes Meta's SAM model on the dataset. More Information needed
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Data come from two different sources. Population-based lung cancer incidence rates for the period 2010-2014 (most updated data) were abstracted from the National Cancer Institute state cancer profiles (Schwartz et al. 1996). This national county-level database of cancer data is collected by state public health surveillance systems. The domain-specific county-level environmental quality index (EQI) data for the period 2000-2005 were abstracted from the United States Environmental Protection Agency (USEPA) profile. Complete descriptions of the datasets used in the EQI are provided in Lobdell's paper (Lobdell 2011). Data were merged based on the Federal Information Processing Standards (FIPS) code. Out of 3144 counties in the United States, this study has available information for 2602 counties, for the following reasons. Data were not available for four states (Kansas, Michigan, Minnesota and Nevada) due to state legislation and regulations that prohibit the release of county-level data to outside entities. Counties whose lung cancer mortality information is missing were omitted from the data set. Union County, Florida, an outlier in terms of mortality information, was deleted from the data set. In the process of local control analysis, this study encountered two non-informative clusters (clusters 28 and 29); a non-informative cluster is one for which either treatment or control group information is missing. For analysis, the non-informative clusters were deleted from the data set. Three types of variables are used in this study: (i) lung cancer mortality as the outcome variable; (ii) a binary treatment indicator, PM2.5 high (greater than 10.59 µg/m3) vs. low (less than 10.59 µg/m3); and (iii) three potential X confounders for clustering, namely land EQI, sociodemographic EQI and built EQI. For each index, higher values correspond to poorer environmental quality (Jagai et al. 2017).
Because PM2.5 is one of the indicators used to measure air EQI, the air EQI is not considered, in order to avoid confounding effects.
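The merge on FIPS code and the binary PM2.5 treatment indicator described above can be sketched in plain Python as follows. The column names (`fips`, `mortality`, `pm25`, `land_eqi`) and all row values are hypothetical; only the 10.59 threshold comes from the description.

```python
PM25_THRESHOLD = 10.59  # cut-off quoted in the description above


def merge_by_fips(cancer_rows, eqi_rows):
    """Inner-join two lists of dicts on 'fips', dropping rows with a missing outcome."""
    eqi_by_fips = {row["fips"]: row for row in eqi_rows}
    merged = []
    for row in cancer_rows:
        match = eqi_by_fips.get(row["fips"])
        if match is not None and row.get("mortality") is not None:
            merged.append({**match, **row})
    return merged


def add_treatment(rows):
    """Add a binary indicator: 1 if PM2.5 exceeds the threshold, else 0."""
    return [{**row, "high_pm25": int(row["pm25"] > PM25_THRESHOLD)} for row in rows]


# Hypothetical county rows, for illustration only.
cancer = [
    {"fips": "01001", "mortality": 55.2},
    {"fips": "01003", "mortality": None},  # missing outcome, so it is dropped
    {"fips": "01005", "mortality": 61.0},
]
eqi = [
    {"fips": "01001", "pm25": 11.2, "land_eqi": 0.3},
    {"fips": "01003", "pm25": 9.8, "land_eqi": -0.1},
    {"fips": "01005", "pm25": 10.1, "land_eqi": 0.7},
]

result = add_treatment(merge_by_fips(cancer, eqi))
```

FIPS codes are kept as strings here so that leading zeros survive the join; storing them as integers is a common source of silent mismatches.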
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a dataset of 100 patients for implementing machine learning algorithms and interpreting the results. The data set consists of 100 observations and 10 variables (8 numeric variables, one categorical variable, and an ID), which are as follows: Id, 1. Radius, 2. Texture, 3. Perimeter, 4. Area, 5. Smoothness, 6. Compactness, 7. diagnosis_result, 8. Symmetry, 9. Fractal dimension
https://cubig.ai/store/terms-of-service
1) Data Introduction • The Lung Cancer dataset comprises medical imaging data of lung scans, annotated for binary classification indicating the presence (Yes, 1) or absence (No, 0) of lung cancer.
2) Data Utilization (1) Lung Cancer data has the following characteristics: • The dataset includes 1 continuous variable and 15 categorical variables. (2) Lung Cancer data can be used for: • Model Learning: Deep learning models such as convolutional neural networks (CNNs) can be used to analyze lung scan images and develop diagnostic systems that predict lung cancer. • Simulation Diagnostic Training: Using medical imaging data, doctors can perform simulation diagnostic training and improve diagnostic capabilities.
Cancer Rates for Lake County Illinois. Explanation of field attributes: Colorectal Cancer - Cancer that develops in the colon (the longest part of the large intestine) and/or the rectum (the last several inches of the large intestine). This is a rate per 100,000. Lung Cancer – Cancer that forms in tissues of the lung, usually in the cells lining air passages. This is a rate per 100,000. Breast Cancer – Cancer that forms in tissues of the breast. This is a rate per 100,000. Prostate Cancer – Cancer that forms in tissues of the prostate. This is a rate per 100,000. Urinary System Cancer – Cancer that forms in the organs of the body that produce and discharge urine. These include the kidneys, ureters, bladder, and urethra. This is a rate per 100,000. All Cancer – All cancers including, but not limited to: colorectal cancer, lung cancer, breast cancer, prostate cancer, and cancer of the urinary system. This is a rate per 100,000.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Cancer diagnoses and age-standardised incidence rates for all types of cancer by age and sex including breast, prostate, lung and colorectal cancer.
The dataset is maintained by VISION AND IMAGE PROCESSING LAB, University of Waterloo. The images of the dataset were extracted from the public databases DermIS and DermQuest, along with manual segmentations of the lesions.
The dataset was used in the following journal publications.
[1] Glaister, J., A. Wong, and D. A. Clausi, "Automatic segmentation of skin lesions from dermatological photographs using a joint probabilistic texture distinctiveness approach", IEEE Transactions on Biomedical Engineering.
[2] Amelard, R., J. Glaister, A. Wong, and D. A. Clausi, "High-level intuitive features (HLIFs) for intuitive skin lesion description", IEEE Transactions on Biomedical Engineering, vol. 62, issue 3, pp. 820-831, October, 2015.
[3] Glaister, J., R. Amelard, A. Wong, and D. A. Clausi, "MSIM: Multi-Stage Illumination Modeling of Dermatological Photographs for Illumination-Corrected Skin Lesion Analysis", IEEE Transactions on Biomedical Engineering, vol. 60, issue 7, pp. 1873-1883, November, 2013.