18 datasets found
  1. JAKEE Data Dictionary.xlsx

    • figshare.com
    xlsx
    Updated May 19, 2022
    Cite
    Igor Makhlin; Nicholas McAndrew; E. Paul Wileyto; Amy S. Clark; Robin Holmes; Lisa N. Bottalico; Clementina Mesaros; Ian A. Blair; Grace Jeschke; Kevin Fox; Susan M. Domchek; Jennifer Matro; Angela R. Bradbury; Michael Feldman; Elizabeth O. Hexner; Jacqueline F. Bromberg; Angela DeMichele (2022). JAKEE Data Dictionary.xlsx [Dataset]. http://doi.org/10.6084/m9.figshare.19783000.v1
    Available download formats: xlsx
    Dataset updated
    May 19, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Igor Makhlin; Nicholas McAndrew; E. Paul Wileyto; Amy S. Clark; Robin Holmes; Lisa N. Bottalico; Clementina Mesaros; Ian A. Blair; Grace Jeschke; Kevin Fox; Susan M. Domchek; Jennifer Matro; Angela R. Bradbury; Michael Feldman; Elizabeth O. Hexner; Jacqueline F. Bromberg; Angela DeMichele
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data Dictionary for accompanying de-identified JAKEE STATA data set.

  2. SEER Breast Cancer Data

    • ieee-dataport.org
    • data.niaid.nih.gov
    Updated Jul 29, 2025
    Cite
    jing teng (2025). SEER Breast Cancer Data [Dataset]. https://ieee-dataport.org/open-access/seer-breast-cancer-data
    Dataset updated
    Jul 29, 2025
    Authors
    jing teng
    Description

    examined regional LNs

  3. Breast Cancer Prediction

    • kaggle.com
    zip
    Updated Jan 16, 2024
    Cite
    Fatemeh Mehrparvar (2024). Breast Cancer Prediction [Dataset]. https://www.kaggle.com/datasets/fatemehmehrparvar/breast-cancer-prediction/code
    Available download formats: zip (2060 bytes)
    Dataset updated
    Jan 16, 2024
    Authors
    Fatemeh Mehrparvar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description


    Research Hypothesis: This study hypothesizes that there are significant associations between the diagnostic characteristics of patients, including age, menopause status, tumor size, presence of invasive nodes, affected breast, metastasis status, breast quadrant, history of breast conditions, and their breast cancer diagnosis result.

    Data Collection and Description: The dataset of 213 patient observations was obtained from the University of Calabar Teaching Hospital cancer registry over 24 months (January 2019–August 2021). The data includes eleven features: year of diagnosis, age, menopause status, tumor size in cm, number of invasive nodes, breast (left or right) affected, metastasis (yes or no), quadrant of the breast affected, history of breast disease, and diagnosis result (benign or malignant).

    Notable Findings: Upon preliminary examination, the data shows variations in diagnosis results across different patient features. A noticeable trend is the higher prevalence of malignant results among patients with larger tumor sizes and the presence of invasive nodes. Additionally, postmenopausal women seem to have a higher rate of malignant diagnoses.

    Interpretation and Usage: The data can be analyzed using statistical and machine learning techniques to determine the strength and significance of associations between patient characteristics and breast cancer diagnosis. This can contribute to predictive modeling for the early detection and diagnosis of breast cancer. However, the interpretation must consider potential limitations, such as missing data or bias in data collection. Furthermore, the data reflects patients from a single hospital, limiting the generalizability of the findings to wider populations. The data could be valuable for healthcare professionals, researchers, or policymakers interested in understanding breast cancer diagnosis factors and improving healthcare strategies for breast cancer. It could also be used in patient education about risk factors associated with breast cancer.

    About Dataset

    • S/N = Unique identification for each patient.

    • Year=The year diagnosis was conducted.

    • Age = Age of the patient at the time of diagnosis.

    • Menopause = Whether the patient is pre- or postmenopausal at the time of diagnosis. 0 means that the patient has reached menopause, while 1 means that the patient has not reached menopause yet.

    • Tumor size = The size, in centimeters, of the excised tumor.

    • Involved nodes = The number of axillary lymph nodes that contain metastases, coded as a binary value of either present or absent. 1 means present, 0 means absent.

    • Breast = Whether the cancer occurs on the left or right side.

    • Metastatic = Whether the cancer has spread to another part of the body or organ, coded as a binary value. 1 means the cancer has spread, 0 means it has not spread yet.

    • Breast quadrant = The gland is divided into 4 sections with nipple as a central point.

    • History = Whether the patient has any personal or family history of cancer. 1 means there is a history of cancer, 0 means no history.

    • Diagnosis result = The diagnosis outcome for each patient (benign or malignant).
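
    The binary codes in the dictionary above can be turned into readable labels before analysis. The following is a minimal sketch, not part of the dataset: the column names ("Menopause", "Involved nodes", "History") are assumptions taken from the dictionary entries, and the actual file headers may differ.

```python
# Hypothetical column names based on the data dictionary; real headers may differ.
CODEBOOK = {
    "Menopause": {0: "postmenopausal", 1: "premenopausal"},
    "Involved nodes": {0: "absent", 1: "present"},
    "History": {0: "no history", 1: "history of cancer"},
}


def decode_record(record):
    """Return a copy of `record` with coded fields replaced by labels."""
    decoded = dict(record)
    for column, labels in CODEBOOK.items():
        if column in decoded:
            value = decoded[column]
            # Codes that are not in the codebook are left untouched.
            decoded[column] = labels.get(value, value)
    return decoded


# Illustrative record, not taken from the dataset.
example = {"Age": 52, "Menopause": 0, "Involved nodes": 1, "History": 0}
decoded_example = decode_record(example)
```

    Keeping the codebook in one mapping makes it easy to check each code against the dictionary text, which matters here because the menopause coding (0 = reached menopause) is the reverse of what many readers would expect.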

  4. ARCHIVED - Female Breast Cancer

    • data.sandiegocounty.gov
    csv, xlsx, xml
    Updated Dec 17, 2019
    Cite
    County of San Diego (2019). ARCHIVED - Female Breast Cancer [Dataset]. https://data.sandiegocounty.gov/Health/ARCHIVED-Female-Breast-Cancer/c9w2-uiw2
    Available download formats: csv, xlsx, xml
    Dataset updated
    Dec 17, 2019
    Dataset authored and provided by
    County of San Diego
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    Basic Metadata

    *Rates per 100,000 population. Age-adjusted rates per 100,000 2000 US standard population.

    **Blank Cells: Rates not calculated for fewer than 5 events. Rates not calculated in cases where zip code is unknown.

    ***API: Asian/Pacific Islander. ***AIAN: American Indian/Alaska Native.
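
    The rate definitions in the notes above can be illustrated with a short calculation. This is a sketch of direct age standardization under stated assumptions, not the County's actual procedure: the age groups, counts, and standard-population weights below are made up for illustration, and the real 2000 US standard population weights are not reproduced here.

```python
def crude_rate(events, population, per=100_000):
    """Events per `per` population."""
    return events / population * per


def age_adjusted_rate(events_by_age, population_by_age, standard_weights,
                      per=100_000, min_events=5):
    """Directly standardized rate; returns None when too few events.

    `standard_weights` are the standard population's shares per age group.
    `min_events=5` mirrors the note that rates are not calculated for
    fewer than 5 events.
    """
    if sum(events_by_age) < min_events:
        return None  # suppressed, shown as a blank cell in the table
    total_weight = sum(standard_weights)
    weighted = sum(
        (events / population) * (weight / total_weight)
        for events, population, weight
        in zip(events_by_age, population_by_age, standard_weights)
    )
    return weighted * per


# Illustrative numbers only (three hypothetical age groups).
events = [2, 10, 30]
population = [50_000, 40_000, 20_000]
weights = [0.5, 0.3, 0.2]

adjusted = age_adjusted_rate(events, population, weights)
suppressed = age_adjusted_rate([1, 1, 1], population, weights)
```

    Weighting each age-specific rate by a fixed standard population is what allows rates to be compared across areas with different age structures.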

    Prepared by: County of San Diego, Health & Human Services Agency, Public Health Services, Community Health Statistics Unit, 2019.

    Code Source: ICD-9CM - AHRQ HCUP CCS v2015. ICD-10CM - AHRQ HCUP CCS v2018. ICD-10 Mortality - California Department of Public Health, Group Cause of Death Codes 2013; NHCS ICD-10 2e-v1 2017.

    Data Guide, Dictionary, and Codebook: https://www.sandiegocounty.gov/content/dam/sdc/hhsa/programs/phs/CHS/Community%20Profiles/Public%20Health%20Services%20Codebook_Data%20Guide_Metadata_10.2.19.xlsx

  5. Data from: Computational pathology annotation enhances the resolution and...

    • researchdata.se
    Updated Sep 1, 2025
    Cite
    Tianyi Li; Qiao Yang; Balazs Acs; Emmanouil G. Sifakis; Hosein Toosi; Camilla Engblom; Kim Thrane; Qirong Lin; Jeff E. Mold; Wenwen Sun; Ceren Boyaci; Sanna Steen; Jonas Frisén; Jens Lagergren; Joakim Lundeberg; Xinsong Chen; Johan Hartman (2025). Computational pathology annotation enhances the resolution and interpretation of breast cancer spatial transcriptomics data [Dataset]. http://doi.org/10.48723/f4v5-m008
    Available download formats: (1312), (1213)
    Dataset updated
    Sep 1, 2025
    Dataset provided by
    Karolinska Institutet
    Authors
    Tianyi Li; Qiao Yang; Balazs Acs; Emmanouil G. Sifakis; Hosein Toosi; Camilla Engblom; Kim Thrane; Qirong Lin; Jeff E. Mold; Wenwen Sun; Ceren Boyaci; Sanna Steen; Jonas Frisén; Jens Lagergren; Joakim Lundeberg; Xinsong Chen; Johan Hartman
    Area covered
    Sweden
    Description

    The samples in the dataset are connected to a study focusing on studying breast cancer intratumoral heterogeneity using spatial transcriptomic data and computational pathology. The dataset contains 14 samples from 3 patients (one triple negative breast cancer and two HER2-positive breast cancer). Multiple regions of the tumor were collected for analysis. Each sample is one tumor region from one of the patients.

    Libraries for spatial transcriptomics were prepared using Visium spatial gene expression kits (10x genomics). Sequencing was performed using the Illumina NovaSeq 6000 platform at the National Genomics Infrastructure, SciLifeLab in Solna, Sweden.

    The dataset contains 28 FASTQ files, compressed with GNU zip (gzip), from paired-end RNA sequencing (10x Visium spatial transcriptomics). The metadata is described in the SND_metadata.xlsx file. The md5sum.txt file is provided for validation of data integrity. The total size of the dataset is approximately 300 GB.
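
    Integrity validation against a checksum file can be sketched as follows. This assumes md5sum.txt uses the usual output layout of the md5sum tool (one "HASH  FILENAME" pair per line) and sits in the same directory as the data files; the actual layout of this dataset's file has not been checked.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5sum_lines(lines):
    """Parse 'HASH  FILENAME' lines into a {filename: hash} mapping."""
    expected = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        checksum, name = line.split(maxsplit=1)
        # md5sum marks binary-mode entries with a leading '*'.
        expected[name.lstrip("*")] = checksum.lower()
    return expected


def verify(directory, manifest_name="md5sum.txt"):
    """Return {filename: True/False} for every file listed in the manifest."""
    directory = Path(directory)
    manifest = (directory / manifest_name).read_text().splitlines()
    expected = parse_md5sum_lines(manifest)
    return {
        name: md5_of_file(directory / name) == checksum
        for name, checksum in expected.items()
    }
```

    Reading in chunks keeps memory use flat, which matters for files of this size (the dataset totals roughly 300 GB).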

  6. CWDataset

    • kaggle.com
    zip
    Updated Apr 19, 2025
    Cite
    KeshawaUP (2025). CWDataset [Dataset]. https://www.kaggle.com/datasets/keshawaup/cwdataset
    Available download formats: zip (53776 bytes)
    Dataset updated
    Apr 19, 2025
    Authors
    KeshawaUP
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    E) Your Dataset

    This dataset of breast cancer patients was obtained from the November 2017 update of the SEER Program of the NCI, which provides information on population-based cancer statistics. The dataset contains the following attributes:

    Table 1. Data Dictionary

    • Patient ID = Unique identification for each patient.

    • Month of Birth = A patient's month of birth.

    • Age = A patient's age in years.

    • Sex = A patient's genomic sex.

    • Occupation = The field of a patient's job role.

    • T Stage = The T stage in breast cancer refers to the size of the tumour, from T1, T2, T3 and T4.

    • N Stage = Used to indicate if the breast cancer has spread to surrounding lymph nodes (N), with a higher number representing a greater number of lymph nodes impacted, from N1, N2 and N3.

    • 6th Stage = Breast Imaging Reporting and Data System or BI-RADS.

    • Differentiated = How the cancer cells look and are growing compared with normal cells.

    • Grade = Breast Cancer Grades (Nottingham Grading System).

    • A Stage = Breast cancer is staged based on how far it has spread. Regional: The cancer has spread to nearby lymph nodes or tissues. Distant: The cancer has spread to distant parts of the body, such as the lungs, liver, or bones.

    • Tumour Size = Tumour size measured in millimeters.

    • Estrogen Status = Whether or not cancer cells have estrogen hormone receptors.

    • Progesterone Status = Whether or not cancer cells have progesterone hormone receptors.

    • Regional Node Examined = Count of regional lymph nodes examined for cancer spread.

    • Regional Node Positive = Count of examined regional lymph nodes found to contain metastases.

    • Survival Months = Survival months based on date of last contact.

    • Mortality Status = Any patient that dies after the follow-up cut-off date is recoded to alive as of the cut-off date. If date of last contact > study cutoff date, vital status recoded = alive.

    Note: For general knowledge, further information about the collection of patients' data can be found at https://ieee-dataport.org/open-access/seer-breast-cancer-data

    M. A. Aldraimli 5DATA002W.2 2024/2025

    The survival calculations can be found at:
    https://seer.cancer.gov/survivaltime/
    https://seer.cancer.gov/survivaltime/SurvivalTimeCalculation.pdf
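
    The vital-status recoding rule described for Mortality Status can be written as a small function. This is an illustrative sketch only: the function name, the "Alive"/"Dead" labels, and the cutoff date used in the example are assumptions, not values taken from SEER or from this dataset.

```python
from datetime import date


def recode_vital_status(status, date_of_last_contact, study_cutoff):
    """Apply the rule: last contact after the study cutoff => alive.

    `status` is the recorded vital status ("Alive" or "Dead") as of
    `date_of_last_contact`.
    """
    if date_of_last_contact > study_cutoff:
        # A death recorded after the cutoff is not counted; the patient is
        # treated as alive as of the cutoff date.
        return "Alive"
    return status


# Hypothetical cutoff date, for illustration only.
CUTOFF = date(2017, 12, 31)

late_death = recode_vital_status("Dead", date(2018, 3, 1), CUTOFF)
early_death = recode_vital_status("Dead", date(2017, 5, 1), CUTOFF)
```

    Here `late_death` is recoded to "Alive" because the last contact falls after the cutoff, while `early_death` keeps its recorded status.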

  7. Variable dictionaries of Brazil's notification systems

    • figshare.com
    application/x-rar
    Updated Aug 7, 2024
    Cite
    Lincoln Silva (2024). Variable dictionaries of Brazil's notification systems [Dataset]. http://doi.org/10.6084/m9.figshare.26517181.v1
    Available download formats: application/x-rar
    Dataset updated
    Aug 7, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Lincoln Silva
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Brazil
    Description

    The variable dictionaries of Brazil's notification systems are available in PDF format. These documents offer detailed descriptions of the variables used in the notification systems and illustrate how the data are structured, providing a comprehensive understanding of the data collected.

  8. A data-driven interactome of synergistic genes improves network-based cancer...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    docx
    Updated Jun 1, 2023
    Cite
    Amin Allahyar; Joske Ubels; Jeroen de Ridder (2023). A data-driven interactome of synergistic genes improves network-based cancer outcome prediction [Dataset]. http://doi.org/10.1371/journal.pcbi.1006657
    Available download formats: docx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Amin Allahyar; Joske Ubels; Jeroen de Ridder
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Robustly predicting outcome for cancer patients from gene expression is an important challenge on the road to better personalized treatment. Network-based outcome predictors (NOPs), which consider the cellular wiring diagram in the classification, hold much promise to improve performance, stability and interpretability of identified marker genes. Problematically, reports on the efficacy of NOPs are conflicting and for instance suggest that utilizing random networks performs on par with networks that describe biologically relevant interactions. In this paper we turn the prediction problem around: instead of using a given biological network in the NOP, we aim to identify the network of genes that truly improves outcome prediction. To this end, we propose SyNet, a gene network constructed ab initio from synergistic gene pairs derived from survival-labelled gene expression data. To obtain SyNet, we evaluate synergy for all 69 million pairwise combinations of genes, resulting in a network that is specific to the dataset and phenotype under study and can be used in a NOP model. We evaluated SyNet and 11 other networks on a compendium dataset of >4000 survival-labelled breast cancer samples. For this purpose, we used cross-study validation, which more closely emulates real-world application of these outcome predictors. We find that SyNet is the only network that truly improves performance, stability and interpretability in several existing NOPs. We show that SyNet overlaps significantly with existing gene networks, and can be confidently predicted (~85% AUC) from graph-topological descriptions of these networks, in particular the breast tissue-specific network. Due to its data-driven nature, SyNet is not biased to well-studied genes and thus facilitates post-hoc interpretation. We find that SyNet is highly enriched for known breast cancer genes and genes related to e.g. histological grade and tamoxifen resistance, suggestive of a role in determining breast cancer outcome.

  9. Metadata record for the manuscript: Effects of systemic inflammation on...

    • datasetcatalog.nlm.nih.gov
    • springernature.figshare.com
    Updated Dec 7, 2020
    Cite
    Ganguly, Tapan; DeMichele, Angela; McAndrew, Nicholas; Bottalico, Lisa; Mao, Jun J.; Blair, Ian A.; Mesaros, Clementina; Tsao, Patricia Y.; Gimotty, Phyllis A.; Rosado, Jennifer M.; Song, Sarah J. (2020). Metadata record for the manuscript: Effects of systemic inflammation on relapse in early breast cancer [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000551824
    Dataset updated
    Dec 7, 2020
    Authors
    Ganguly, Tapan; DeMichele, Angela; McAndrew, Nicholas; Bottalico, Lisa; Mao, Jun J.; Blair, Ian A.; Mesaros, Clementina; Tsao, Patricia Y.; Gimotty, Phyllis A.; Rosado, Jennifer M.; Song, Sarah J.
    Description

    Summary

    This metadata record provides details of the data supporting the claims of the related manuscript: “Effects of systemic inflammation on relapse in early breast cancer”. The data consist of a single data spreadsheet in Stata and open format, and a supporting data dictionary in .txt format. The related study aimed to assess whether or not elevated serum inflammatory biomarkers (C-Reactive protein [CRP], interleukin-6 [IL-6], and serum amyloid A [SAA]) and/or the presence of a high-risk IL-6 promoter genotype were associated with recurrence of hormone receptor positive (HR+) early breast cancer.

    Data access

    The data generated and/or analysed during the related study is openly available as part of this metadata record. The data are as follows:

    • WABC Case Control Deidentified Dataset.xlsx – De-identified dataset for WABC-II Case Control population with serum and genomic biomarkers as well as outcomes.

    • WABC Case Control Deidentified Dataset.dta – Stata version of the above.

    • WABC_data_dictionary.txt – Data dictionary to support the above files.

    Name of Institutional Review Board or ethics committee that approved the study

    This study was approved by the Institutional Review Board of the University of Pennsylvania and all study procedures were conducted according to the institution’s code of ethics.

  10. Table_1_Multi-Omic Data Interpretation to Repurpose Subtype Specific Drug...

    • frontiersin.figshare.com
    xlsx
    Updated Jun 2, 2023
    Cite
    Beste Turanli; Kubra Karagoz; Gholamreza Bidkhori; Raghu Sinha; Michael L. Gatza; Mathias Uhlen; Adil Mardinoglu; Kazim Yalcin Arga (2023). Table_1_Multi-Omic Data Interpretation to Repurpose Subtype Specific Drug Candidates for Breast Cancer.XLSX [Dataset]. http://doi.org/10.3389/fgene.2019.00420.s003
    Available download formats: xlsx
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Beste Turanli; Kubra Karagoz; Gholamreza Bidkhori; Raghu Sinha; Michael L. Gatza; Mathias Uhlen; Adil Mardinoglu; Kazim Yalcin Arga
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Triple-negative breast cancer (TNBC), which is largely synonymous with the basal-like molecular subtype, is the 5th leading cause of cancer deaths for women in the United States. The overall prognosis for TNBC patients remains poor given that few treatment options exist; including targeted therapies (not FDA approved), and multi-agent chemotherapy as standard-of-care treatment. TNBC like other complex diseases is governed by the perturbations of the complex interaction networks thereby elucidating the underlying molecular mechanisms of this disease in the context of network principles, which have the potential to identify targets for drug development. Here, we present an integrated “omics” approach based on the use of transcriptome and interactome data to identify dynamic/active protein-protein interaction networks (PPINs) in TNBC patients. We have identified three highly connected modules, EED, DHX9, and AURKA, which are extremely activated in TNBC tumors compared to both normal tissues and other breast cancer subtypes. Based on the functional analyses, we propose that these modules are potential drivers of proliferation and, as such, should be considered candidate molecular targets for drug development or drug repositioning in TNBC. Consistent with this argument, we repurposed steroids, anti-inflammatory agents, anti-infective agents, cardiovascular agents for patients with basal-like breast cancer. Finally, we have performed essential metabolite analysis on personalized genome-scale metabolic models and found that metabolites such as sphingosine-1-phosphate and cholesterol-sulfate have utmost importance in TNBC tumor growth.

  11. Metadata supporting data files in the published article: The acute effects...

    • springernature.figshare.com
    xls
    Updated May 30, 2023
    Cite
    Mary E. Sehl; Judith E. Carroll; Steve Horvath; Julienne E. Bower (2023). Metadata supporting data files in the published article: The acute effects of adjuvant radiation and chemotherapy on peripheral blood epigenetic age in early stage breast cancer patients [Dataset]. http://doi.org/10.6084/m9.figshare.11847369.v1
    Available download formats: xls
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Mary E. Sehl; Judith E. Carroll; Steve Horvath; Julienne E. Bower
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    In this study, the authors examined epigenetic changes in peripheral whole blood cells in early stage breast cancer (BC) patients undergoing surgery followed by adjuvant radiotherapy, or surgery followed by adjuvant chemotherapy and radiotherapy.

    Data access: The dataset RISEdataTab1.xlsx supporting Table 1 of the published article is publicly available in the figshare repository as part of this data record. Methylation data supporting Figure 1 and Supplementary Figure 2 of the published article are publicly available in the Gene Expression Omnibus repository, via accession https://identifiers.org/geo:GSE140038. The filenames and types are as follows: NormalizedBetaNoob.csv, GSM4151814-GSM4151957 (in idat format), and Data Dictionary.docx.

    Study approval and patient consent: This study was approved by the UCLA Medical Institutional Review Board. Informed consent was obtained from all participants.

    Study aims and methodology: The aim of this study was to examine the acute effects of adjuvant radiation and chemotherapy on peripheral blood epigenetic age in early stage breast cancer patients.

    Women were eligible for the study if they had been recently diagnosed with Stage 0-IIIA BC and had not yet started adjuvant or neoadjuvant therapy with radiation, chemotherapy, or endocrine therapy. Assessments were conducted before onset of adjuvant therapy, after completion of radiation and/or chemotherapy, and over an 18-month follow-up. The current analysis focuses on a subset of women (n=72) who had blood samples available for epigenetic analyses at baseline and post-treatment. The women selected were treated with radiation alone (n=37), and with chemotherapy followed by radiation (n=35), to evaluate individual and combined effects of those treatment exposures. All women had completed surgery prior to the baseline assessment.

    Four measures of epigenetic age acceleration were examined: intrinsic (IEAA), extrinsic (EEAA), phenotypic (PEAA), and Grim (GEAA), based on weighted averages of methylation levels at 353, 71, 513, and 1,030 CpGs, respectively, with adjustment for chronologic age.
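
    The general idea of an age-adjusted epigenetic measure can be sketched as a weighted combination of CpG methylation levels followed by a regression on chronologic age, keeping the residual. This is a simplified illustration and not the published clocks or the authors' pipeline: the CpG identifiers, weights, and sample values are invented, and the real measures involve additional calibration and covariates that are omitted here.

```python
def methylation_score(betas, weights, intercept=0.0):
    """Weighted combination of methylation levels (beta values) at chosen CpGs."""
    return intercept + sum(weights[cpg] * betas[cpg] for cpg in weights)


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def age_acceleration(scores, ages):
    """Residual of each score after regressing scores on chronologic age."""
    slope, intercept = fit_line(ages, scores)
    return [s - (intercept + slope * a) for s, a in zip(scores, ages)]


# Invented example: two hypothetical CpGs and three samples.
weights = {"cpg_a": 30.0, "cpg_b": -10.0}
one_score = methylation_score({"cpg_a": 0.5, "cpg_b": 0.2}, weights, intercept=40.0)

ages = [40, 50, 60]
scores = [42.0, 55.0, 62.0]
residuals = age_acceleration(scores, ages)
```

    A positive residual means the sample's epigenetic estimate is higher than expected for its chronologic age, which is the sense in which "acceleration" is used.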

    Details of the epigenetic clock, DNA extraction/methylation experiments, and statistical analyses are provided in the supplementary methods.

  12. Data from: Analyzing Original Experimental Data Through Peer Learning:...

    • qubeshub.org
    Updated Dec 19, 2024
    Cite
    Avital Swisa; Ella Tour; Ayelet Arbel-Eden* (2024). Analyzing Original Experimental Data Through Peer Learning: Cancer Research – From Animal Models to Clinical Trials [Dataset]. https://qubeshub.org/publications/5102
    Explore at:
    Dataset updated
    Dec 19, 2024
    Dataset provided by
    QUBES
    Authors
    Avital Swisa; Ella Tour; Ayelet Arbel-Eden*
    Description

    One of the challenges of teaching large-enrollment science courses is guiding students to interpret data presented in scientific articles and developing skills to understand research methods. Small-group learning promotes higher academic achievement in Science, Technology, Engineering, and Math (STEM) courses. The current lesson aims to implement small-group learning activities to facilitate analysis and interpretation of data from the scientific literature. This interactive activity focuses on personalized medicine, emphasizing cancer treatment and research—all the way from animal models to clinical trials. We harnessed these key topics in molecular biology to generate students’ collaborative work dissecting research data on the effect of a drug targeting HER2—a crucial biomarker and therapeutic target in breast cancer diagnostics. This approach can help students develop the skills needed to understand and analyze scientific data, communicate ideas, and ultimately promote higher academic achievement and persistence in STEM courses. In this activity, the class session begins with a task presented by the instructor, followed by group discussions of scientific data guided by open-ended questions. The activity ends with an all-class summary and post-lesson home assignment. Active learning is achieved through group discussions, peer-learning, and oral presentations of the data interpretation to the class. Our observations of the activity and students’ survey responses allow us to conclude that this activity effectively supports students in small and large classes in enhancing the skills of scientific data interpretation.

    Primary Image: Active learning and group discussion in class. Small groups brainstorm data interpretation in the Advanced Molecular Biology course.

  13. Physician Data Query

    • neuinfo.org
    • scicrunch.org
    • +2 more
    Updated Jan 16, 2025
    Cite
    (2025). Physician Data Query [Dataset]. http://identifiers.org/RRID:SCR_006833
    Explore at:
    Dataset updated
    Jan 16, 2025
    Description

    NCI's comprehensive cancer database that contains summaries on a wide range of cancer topics; a registry of 8,000+ open and 19,000+ closed cancer clinical trials from around the world; a directory of professionals who provide genetics services; the NCI Dictionary of Cancer Terms, with definitions for 6,800+ cancer and medical terms; and the NCI Drug Dictionary, which has information on 2,300+ agents used in the treatment of cancer or cancer-related conditions. The PDQ cancer information summaries are peer reviewed and updated monthly by six editorial boards comprised of specialists in adult treatment, pediatric treatment, supportive care, screening and prevention, genetics, and complementary and alternative medicine. The Boards review current literature from more than 70 biomedical journals, evaluate its relevance, and synthesize it into clear summaries. Many of the summaries are also available in Spanish.

  14. ARCHIVED - 2022 Non-Communicable (Chronic) Diseases

    • data.sandiegocounty.gov
    csv, xlsx, xml
    Updated Aug 29, 2025
    + more versions
    Cite
    County of San Diego (2025). ARCHIVED - 2022 Non-Communicable (Chronic) Diseases [Dataset]. https://data.sandiegocounty.gov/w/a6z3-qh6u/by4r-nr9x?cur=viKALNW-jic&from=rU8uUBkc0eO
    Available download formats: xlsx, xml, csv
    Dataset updated
    Aug 29, 2025
    Dataset authored and provided by
    County of San Diego
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    Data by medical encounter for the following conditions by age, race/ethnicity, and sex (gender):

    • Acute Myocardial Infarction (AMI)
    • Asthma
    • Bladder Cancer
    • Brain Cancer
    • Coronary Heart Disease (CHD)
    • Colorectal Cancer
    • Chronic Kidney Disease (CKD)
    • Chronic Obstructive Pulmonary Disease (COPD)/Chronic Lower Respiratory Diseases
    • Diabetes
    • Female Breast Cancer
    • Female Reproductive Cancer
    • Heart Failure
    • Hyperlipidemia (High Blood Cholesterol)
    • Kidney Cancer
    • Leukemia
    • Liver Cancer
    • Lung Cancer
    • Lupus and Connective Tissue Disorders
    • Melanoma of the Skin
    • Non-Hodgkin's Lymphoma
    • Non-melanoma Skin Cancer
    • Overall Cancer
    • Overall Heart Disease
    • Overall Hypertensive Diseases
    • Pancreatic Cancer
    • Prostate Cancer
    • Stroke
    • Thyroid Cancer

    Rates per 100,000 population. Age-adjusted rates per 100,000 2000 US standard population. Blank Cells: Events less than 11 are suppressed. Starting with data year 2022, geographies with less than 20,000 population contain no age-adjusted rates and all rates based on events <20 are suppressed due to statistical instability. Rates not calculated in cases where zip code is unknown. SES: Is the median household income by Subregional Area (SRA) community. Data for SRA only.

    Data sources: California Department of Public Health, Center for Health Statistics, Office of Health Information and Research, Vital Records Business Intelligence System (VRBIS), 2022. California Department of Health Care Access and Information (HCAI), Emergency Department Discharge Database and Patient Discharge Database, 2022. SANDAG Population Estimates, 2022 (v11/23). 2022 population estimates were derived from the 2020 decennial census. Comparison of rates to prior years may not be appropriate. Prepared by: County of San Diego, Health and Human Services Agency, Public Health Services, Community Health Statistics Unit, May 2024.

    2022 Community Profile Data Guide and Data Dictionary Dashboard: https://public.tableau.com/app/profile/chsu/viz/2022COREDataGuideandDataDictionary/Home

  15. Data from: breast ductal carcinoma

    • alliancegenome.org
    Updated Aug 26, 2025
    Cite
    Alliance of Genome Resources (2025). breast ductal carcinoma [Dataset]. http://identifiers.org/DOID:3007
    Explore at:
    Dataset updated
    Aug 26, 2025
    Dataset authored and provided by
    Alliance of Genome Resources
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A breast carcinoma that derives_from the lining of milk ducts. url:http://www.cancer.gov/dictionary?CdrID=45085

  16. Time Walk Bike to Work

    • s.cnmilf.com
    • healthdata.gov
    • +4 more
    Updated Nov 27, 2024
    + more versions
    Cite
    California Department of Public Health (2024). Time Walk Bike to Work [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/time-walk-bike-to-work-b2ed6
    Explore at:
    Dataset updated
    Nov 27, 2024
    Dataset provided by
    California Department of Public Health
    Description

    This table contains data on the percent of population aged 16 years or older whose commute to work is 10 or more minutes/day by walking or biking for California, its regions, counties, and cities/towns. Data is from the U.S. Census Bureau, American Community Survey, and from the U.S. Department of Transportation, Federal Highway Administration, and National Household Travel Survey. The table is part of a series of indicators in the Healthy Communities Data and Indicators Project of the Office of Health Equity. Active modes of transport, bicycling and walking alone and in combination with public transit, offer opportunities to incorporate physical activity into the daily routine. Physical activity is associated with lowering rates of heart disease and stroke, diabetes, colon and breast cancer, dementia and depression. Automobile commuting is associated with health hazards, such as air pollution, motor vehicle crashes, pedestrian injuries and fatalities, and sedentary lifestyles. Consequently the transition from automobile-focused transport to public and active transport offers environmental health benefits, including reductions in air pollution, greenhouse gases and noise pollution, and may lead to greater overall safety in transportation. More information about the data table and a data dictionary can be found in the About/Attachments section.

  17. Preliminary Mitosis Detection Results for TCGA-BRCA Dataset

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    Updated Feb 21, 2024
    Cite
    Jahanifar, Mostafa (2024). Preliminary Mitosis Detection Results for TCGA-BRCA Dataset [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_10245706
    Explore at:
    Dataset updated
    Feb 21, 2024
    Authors
    Jahanifar, Mostafa
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset provides mitosis detection results obtained by applying the "Mitosis Detection, Fast and Slow" (MDFS) algorithm (arXiv:2208.12587, "Mitosis Detection, Fast and Slow: Robust and Efficient Detection of Mitotic Figures") to the TCGA-BRCA dataset.

    The MDFS algorithm exemplifies a robust and efficient two-stage process for mitosis detection. Initially, potential mitotic figures are identified and later refined. The proposed model for the preliminary identification of candidates, the EUNet, stands out for its swift and accurate performance, largely due to its structural design. EUNet operates by outlining candidate areas at a lower resolution, significantly expediting the detection process. In the second phase, the initially identified candidates undergo further refinement using a more intricate classifier network, namely the EfficientNet-B7. The MDFS algorithm was originally developed for the MIDOG challenges.
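The candidate-then-refine pattern described above can be illustrated with a minimal, generic sketch. This is not the MDFS implementation: the function names, the block-averaging step, and the stand-in classifier are all assumptions made for illustration, and neither EUNet nor EfficientNet-B7 is used here.

```python
def propose_candidates(prob_map, factor=4, threshold=0.5):
    """Stage 1 (hypothetical): average factor x factor blocks, keep bright ones."""
    n_rows = len(prob_map) // factor
    n_cols = len(prob_map[0]) // factor
    candidates = []
    for r in range(n_rows):
        for c in range(n_cols):
            block = [prob_map[r * factor + i][c * factor + j]
                     for i in range(factor) for j in range(factor)]
            if sum(block) / len(block) >= threshold:
                # Report the block centre in full-resolution (x, y) coordinates.
                candidates.append((c * factor + factor // 2,
                                   r * factor + factor // 2))
    return candidates


def refine_candidates(candidates, classifier, threshold=0.5):
    """Stage 2 (hypothetical): keep candidates that a slower classifier accepts."""
    scored = [(x, y, float(classifier(x, y))) for x, y in candidates]
    return [item for item in scored if item[2] >= threshold]


# Toy 8x8 map with two bright 4x4 blocks (values invented for illustration).
prob_map = [[0.0] * 8 for _ in range(8)]
for i in range(4):
    for j in range(4):
        prob_map[i][j] = 0.9          # top-left block
        prob_map[4 + i][4 + j] = 0.6  # bottom-right block

candidates = propose_candidates(prob_map)
# Stand-in classifier: reads the full-resolution value at the candidate.
kept = refine_candidates(candidates, lambda x, y: prob_map[y][x], threshold=0.8)
```

The cheap first stage looks at every location but only at reduced resolution, so the expensive second stage runs on a handful of candidates rather than on the whole image.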

    Viewing in QuPath

    The dataset comprises GeoJSON files in two categories: mitosis and proxy (mimickers, that is, candidates that the algorithm considers unlikely to be mitoses). Users can open and visualize each category overlaid on the Whole Slide Image (WSI) using QuPath: simply drag and drop the annotation file onto the opened image in the program. Additionally, users can employ the provided Python snippet to read the annotations into a Python dictionary or a NumPy array.

    Loading in Python

    To load the GeoJSON files in Python, users can use the following code:

    import json

    import numpy as np
    import pandas as pd


    def load_geojson(filename):
        """Return slide properties plus detections as a NumPy array and a DataFrame."""
        # Load the GeoJSON file
        with open(filename, 'r') as f:
            data = json.load(f)

        # Extract the properties and store in a dictionary
        slide_properties = data["properties"]

        # Convert the points to a numpy array with columns (x, y, score);
        # the reshape keeps the (n, 3) shape even when there are no features
        points_np = np.array([
            (feat['geometry']['coordinates'][0],
             feat['geometry']['coordinates'][1],
             feat['properties']['score'])
            for feat in data['features']
        ], dtype=float).reshape(-1, 3)

        # Convert the points to a pandas DataFrame
        points_df = pd.DataFrame(points_np, columns=['x', 'y', 'score'])

        return slide_properties, points_np, points_df


    # Use the function to load mitosis data
    mitosis_properties, mitosis_points_np, mitosis_points_df = load_geojson('mitosis.geojson')

    # Use the function to load mimickers data
    mimickers_properties, mimickers_points_np, mimickers_points_df = load_geojson('mimickers.geojson')
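The loader above implies a particular file layout: a top-level "properties" object and a "features" list whose entries carry point coordinates and a "score". The sketch below builds a small synthetic document with that layout and extracts the same fields using only the standard library. The "type" values, the slide identifier, and all numbers are assumptions invented for illustration; they are not taken from the dataset.

```python
import json

# Synthetic stand-in for one annotation file (all values invented).
example = {
    "type": "FeatureCollection",  # assumed; not confirmed by the dataset description
    "properties": {"slide_id": "EXAMPLE-SLIDE", "mitosis_threshold": 0.5},
    "features": [
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [120.0, 340.0]},
         "properties": {"score": 0.91}},
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [800.5, 64.0]},
         "properties": {"score": 0.12}},
    ],
}


def features_to_rows(geojson):
    """Return one (x, y, score) tuple per feature."""
    return [
        (feat["geometry"]["coordinates"][0],
         feat["geometry"]["coordinates"][1],
         feat["properties"]["score"])
        for feat in geojson["features"]
    ]


# Round-trip through JSON text, as the data would look when read from disk.
parsed = json.loads(json.dumps(example))
slide_properties = parsed["properties"]
rows = features_to_rows(parsed)
```

A file that lacks the top-level "properties" key, or whose features are not points, would need different handling from what is shown here.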

    Properties

    Each WSI in the dataset includes the candidate's centroid, bounding box, hotspot location, hotspot mitotic count, and hotspot mitotic score. The structures of the mitosis and mimicker property dictionaries are as follows:

    Mitosis property dictionary structure:

    mitosis_properties = {
        'slide_id': slide_id,
        'slide_height': img_h,
        'slide_width': img_w,
        'wsi_mitosis_count': num_mitosis,
        'mitosis_threshold': 0.5,
        'hotspot_rect': {'x1': hotspot[0], 'y1': hotspot[1], 'x2': hotspot[2], 'y2': hotspot[3]},
        'hotspot_mitosis_count': mitosis_count,
        'hotspot_mitosis_score': mitosis_score,
    }

    Proxy figure (mimicker) property dictionary structure:

    mimicker_properties = {
        'slide_id': slide_id,
        'slide_height': img_h,
        'slide_width': img_w,
        'wsi_mimicker_count': num_mimicker,
        'mitosis_threshold': 0.5,
    }
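As one possible use of these dictionaries, the sketch below counts the detections that fall inside the hotspot rectangle and meet the stored threshold. It assumes that point coordinates and 'hotspot_rect' share the same pixel frame, that x1 <= x2 and y1 <= y2, and that the bounds are inclusive; none of this is stated in the dataset description, and the result need not match 'hotspot_mitosis_count'. The helper name and sample values are hypothetical.

```python
def count_in_hotspot(points, properties):
    """Count (x, y, score) points inside hotspot_rect with score >= threshold."""
    rect = properties["hotspot_rect"]
    threshold = properties["mitosis_threshold"]
    return sum(
        1
        for x, y, score in points
        if rect["x1"] <= x <= rect["x2"]
        and rect["y1"] <= y <= rect["y2"]
        and score >= threshold
    )


# Invented sample values that follow the documented key names.
properties = {
    "mitosis_threshold": 0.5,
    "hotspot_rect": {"x1": 100, "y1": 100, "x2": 200, "y2": 200},
}
points = [
    (150, 150, 0.9),  # inside, above threshold
    (120, 180, 0.4),  # inside, below threshold
    (300, 150, 0.8),  # outside the rectangle
]
n_hotspot = count_in_hotspot(points, properties)
```

Only the mitosis property dictionary carries 'hotspot_rect', so this helper does not apply to the mimicker files as documented.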

    Disclaimer:

    We did not comprehensively review all mitotic figures within each WSI, and we do not claim that the detections are free of errors. Nonetheless, a pathologist examined the resulting hotspot regions of interest from 757 WSIs within the TCGA-BRCA Mitosis Dataset, and we found a strong correlation between pathologist and MDFS mitotic counts (r = 0.8, p < 0.001). Furthermore, MDFS-derived mitosis scores have been shown to be as prognostic as pathologist-assigned mitosis scores [1]. This examination also aimed to verify the quality of the selected hotspots, ensuring that they were not driven primarily by excessive false detections or artifacts and that they lay in plausible locations in the tumor landscape.

    [1] Ibrahim, Asmaa, et al. "Artificial Intelligence-Based Mitosis Scoring in Breast Cancer: Clinical Application." Modern Pathology 37.3 (2024): 100416.

  18. Summary of phase II/III double-blind, placebo-controlled trials evaluating...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 1, 2023
    Cite
    Roland Morley; Alison Cardenas; Peter Hawkins; Yasuyo Suzuki; Virginia Paton; See-Chun Phan; Mark Merchant; Jessie Hsu; Wei Yu; Qi Xia; Daniel Koralek; Patricia Luhn; Wassim Aldairy (2023). Summary of phase II/III double-blind, placebo-controlled trials evaluating onartuzumab in patients with solid tumors. [Dataset]. http://doi.org/10.1371/journal.pone.0139679.t001
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Roland Morley; Alison Cardenas; Peter Hawkins; Yasuyo Suzuki; Virginia Paton; See-Chun Phan; Mark Merchant; Jessie Hsu; Wei Yu; Qi Xia; Daniel Koralek; Patricia Luhn; Wassim Aldairy
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Bev = bevacizumab, Erl = erlotinib, FOLFOX = oxaliplatin, 5-fluorouracil and folinic acid, GBM = glioblastoma, mCRC = metastatic colorectal cancer, NSCLC = non-small cell lung cancer, Ona = onartuzumab, Pac = paclitaxel, Pbo = placebo, Pem = pemetrexed, Plat = carboplatin or cisplatin, TNBC = triple negative metastatic breast cancer. Data cut-off dates: GO27819: 7 Nov 2013; GO27820: 9 January 2014; GO27821 (Cohort 1): 31 October 2013; GO27821 (Cohort 2): 9 September 2013; GO27827: 6 Feb 2014; OAM4861g: 22 March 2014; OAM4971g: 26 October 2013; YO28252: 29 Jan 2014. Summary of phase II/III double-blind, placebo-controlled trials evaluating onartuzumab in patients with solid tumors.
