CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Breast density is a radiologic feature that reflects fibroglandular tissue content relative to breast area or volume, and it is a breast cancer risk factor.
This study used deep learning approaches to identify histologic correlates in radiologically guided biopsies that may underlie breast density and may distinguish cancer among women with elevated and low breast density.
Data access: Datasets supporting figure 2, tables 2 and 3 and supplementary table 2 of the published article are publicly available in the figshare repository, as part of this data record (https://doi.org/10.6084/m9.figshare.9786152). These datasets are contained in the zip file NPJ FigShare.zip. Datasets supporting figure 3, table 1 and supplementary table 1 of the published article are not publicly available to protect patient privacy, but can be made available on request from Dr. Gretchen L. Gierach, Senior Investigator, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA, email address: gierachg@mail.nih.gov.
Study description and aims: The study aimed to identify tissue correlates of breast density that may be important for distinguishing malignant from benign biopsy diagnoses separately among women with high and low breast density, to help inform cancer risk stratification among women undergoing a biopsy following an abnormal mammogram.
Haematoxylin and eosin (H&E)-stained digitized images from image-guided breast biopsies (n=852 patients) were evaluated. Breast density was assessed as global and localized fibroglandular volume (%). A convolutional neural network characterized H&E tissue composition, and 37 features describing tissue quantities and morphological structure were extracted from the network output. A random forest regression model was trained (n=588) to identify the features most predictive of fibroglandular volume. Correlations between predicted and radiologically quantified fibroglandular volume were assessed in an independent test set of 264 patients. A second random forest classifier was trained to predict diagnosis (invasive vs. benign), and its performance was assessed using the area under the receiver operating characteristic curve (AUC). For more details on the methodology, please see the published article.
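The record does not say which correlation coefficient was used for the 264-patient test set. As a minimal sketch, assuming Pearson's r, agreement between predicted and radiologically quantified fibroglandular volume could be computed as below. The function name `pearson_r` and the sample values are hypothetical and are not study data.

```python
import math
from typing import Sequence


def pearson_r(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Pearson correlation between model-predicted and measured % fibroglandular volume."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    # Sums of cross-products and squares; the 1/n factors cancel in the ratio.
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    ss_m = sum((m - mean_m) ** 2 for m in measured)
    if ss_p == 0 or ss_m == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_p * ss_m)


# Hypothetical held-out values (percent fibroglandular volume), not study data.
predicted_fgv = [12.0, 18.5, 25.1, 9.8, 30.2]
measured_fgv = [10.4, 20.0, 27.3, 8.9, 33.5]
print(round(pearson_r(predicted_fgv, measured_fgv), 3))
```

If the analysis instead used a rank-based coefficient (for example Spearman's), the same function could be applied to the ranks of each sequence.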
Study approval: The Institutional Review Boards at the NCI and the University of Vermont approved the protocol for this project, under either active consent or a waiver of consent, to enrol participants, link data and perform analytical studies.
Dataset descriptions:
Data supporting figure 2: Datasets Figure 2A H&E.jpg, Figure 2A Mammogram.jpg, Figure 2B H&E.jpg and Figure 2B Mammogram.jpg are in .jpg file format and consist of histological whole slide H&E images and corresponding full-field digital mammograms from patients whose biopsies yielded diagnoses of atypical ductal hyperplasia and invasive carcinoma.
Data supporting figure 3: Dataset Figure 3.xls is in .xls file format and contains raw data used to generate the Receiver Operating Characteristic (ROC) curves for the prediction of invasive cancer among women with high percent global fibroglandular volume, low percent global fibroglandular volume, high percent localized fibroglandular volume and low percent localized fibroglandular volume.
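Figure 3.xls is not publicly available (see Data access above), and its column layout is not documented in this record. Assuming it provides, for each patient, a binary outcome (1 = invasive, 0 = benign; an assumed coding) and a classifier score, the ROC points and AUC could be reconstructed roughly as follows. The helpers `roc_curve` and `auc` are hypothetical and are not taken from the study code.

```python
from typing import List, Sequence, Tuple


def roc_curve(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) points, sweeping the
    decision threshold from the highest score to the lowest.
    labels: 1 = invasive cancer, 0 = benign (assumed coding)."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative case")
    # Sort by descending score; tied scores are processed as a single threshold step.
    pairs = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical outcomes and scores, not study data.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(auc(roc_curve(labels, scores)))
```

To mirror the four curves described above, the same two functions would be applied separately within each density stratum (high/low global and high/low localized fibroglandular volume).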
Data supporting table 1: Dataset Table1_analysis.sas7bdat is in SAS file format and contains the characteristics of study participants in the BREAST Stamp Project, who were referred for an image-guided breast biopsy, stratified by the training and testing sets (n = 852).
Data supporting table 2: Datasets Global FGV.xls and Localized FGV.xls are in .xls file format, and each has an accompanying figure in .png file format (Global FGV.png and Localized FGV.png). The data contain the histologic features identified in the random forest model for the prediction of global and localized % fibroglandular volume.
Data supporting table 3: Datasets HighGlobal_feature_importance.xls, HighLocal_feature_importance.xls, LowGlobal_feature_importance.xls and LowLocal_feature_importance.xls are in .xls file format; the accompanying figures generated from the data in these files (HighGlobal_feature_importance.pdf, HighLocal_feature_importance.pdf, LowGlobal_feature_importance.pdf and LowLocal_feature_importance.pdf) are in .pdf file format. These files contain the histologic features identified in the random forest model for the prediction of invasive cancer status among women with high vs. low % fibroglandular volume.
Data supporting supplementary table 1: Datasets trainfeatures.xls and testfeatures.xls are in .xls file format and include the distribution and description of the 37 histologic features extracted from the convolutional neural network output for the H&E-stained whole slide images in the training and testing sets.
Data supporting supplementary table 2: Datasets All_samples_global.xls, All_samples_local.xls, PostMeno_global.xls, PostMeno_local.xls, PreMeno_global.xls and PreMeno_local.xls are in .xls file format; the accompanying figures generated from the data in these files (All_samples_global.png, All_samples_local.png, PostMeno_global.png, PostMeno_local.png, PreMeno_global.png and PreMeno_local.png) are in .png file format. These data include the histologic features identified in the random forest model that included BMI for the prediction of global and localized % fibroglandular volume.
Software needed to access the data: Data files in SAS file format require SAS software to be accessed.
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Note: This version supersedes version 1: https://doi.org/10.15482/USDA.ADC/1522654. In fall 2019, the USDA Food and Nutrition Service (FNS) conducted the third Farm to School Census. The 2019 Census was sent via email to 18,832 school food authorities (SFAs) participating in the National School Lunch Program, including all public, private, and charter SFAs, as well as residential care institutions. The questionnaire collected data on local food purchasing, edible school gardens, other farm to school activities and policies, and evidence of economic and nutritional impacts of participating in farm to school activities. A total of 12,634 SFAs provided usable responses to the 2019 Census. Version 2 adds the weight variable “nrweight”, which is the non-response weight.
Processing methods and equipment used: The 2019 Census was administered solely via the web. The study team cleaned the raw data to make them as correct, complete, and consistent as possible. This process involved examining the data for logical errors, contacting SFAs and consulting official records to update some implausible values, and setting the remaining implausible values to missing. The study team linked the 2019 Census data to information from the National Center for Education Statistics (NCES) Common Core of Data (CCD). Records from the CCD were used to construct a measure of urbanicity, which classifies the area in which schools are located.
Study date(s) and duration: Data collection occurred from September 9 to December 31, 2019. Questions asked about activities prior to, during, and after SY 2018-19. The 2019 Census asked SFAs whether they currently participated in, had ever participated in, or planned to participate in any of 30 farm to school activities. SFAs that participated in any of the defined activities in the 2018-19 school year received further questions.
Study spatial scale (size of replicates and spatial scale of study area): Respondents to the survey included SFAs from all 50 States as well as American Samoa, Guam, the Northern Mariana Islands, Puerto Rico, the U.S. Virgin Islands, and Washington, DC.
Level of true replication: Unknown.
Sampling precision (within-replicate sampling or pseudoreplication): No sampling was involved in the collection of these data.
Level of subsampling (number and repeat or within-replicate sampling): No sampling was involved in the collection of these data.
Study design (before–after, control–impacts, time series, before–after-control–impacts): None (non-experimental).
Description of any data manipulation, modeling, or statistical analysis undertaken: Each entry in the dataset contains SFA-level responses to the Census questionnaire for SFAs that responded. This file includes information only from SFAs that clicked “Submit” on the questionnaire. (The dataset used to create the 2019 Farm to School Census Report includes additional SFAs that answered enough questions for their response to be considered usable.) In addition, the file contains constructed variables used for analytic purposes. The file does not include weights created to produce national estimates for the 2019 Farm to School Census Report. The dataset identifies SFAs, but to protect individual privacy the file does not include any information on the individual who completed the questionnaire.
Description of any gaps in the data or other limiting factors: See the full 2019 Farm to School Census Report [https://www.fns.usda.gov/cfs/farm-school-census-and-comprehensive-review] for a detailed explanation of the study’s limitations.
Outcome measurement methods and equipment used: None.
Resources in this dataset:
Resource Title: 2019 Farm to School Codebook with Weights. File Name: Codebook_Update_02SEP21.xlsx. Resource Description: 2019 Farm to School Codebook with Weights.
Resource Title: 2019 Farm to School Data with Weights CSV. File Name: census2019_public_use_with_weight.csv. Resource Description: 2019 Farm to School Data with Weights CSV.
Resource Title: 2019 Farm to School Data with Weights SAS R Stata and SPSS Datasets. File Name: Farm_to_School_Data_AgDataCommons_SAS_SPSS_R_STATA_with_weight.zip. Resource Description: 2019 Farm to School Data with Weights SAS R Stata and SPSS Datasets.
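As a rough illustration of how the non-response weight might be applied, the sketch below computes a weighted share of SFAs reporting a given activity. Only the variable name nrweight and the CSV file name come from this record; the flag column `participates`, its 1/0 coding, and the treatment of blank cells as missing are assumptions to check against Codebook_Update_02SEP21.xlsx. The record also states that the weights used for the national estimates in the 2019 Farm to School Census Report are not included, so results may not match published figures.

```python
import csv
import io
from typing import Iterable, Mapping


def weighted_share(rows: Iterable[Mapping[str, str]], flag: str, weight: str = "nrweight") -> float:
    """Weighted share of SFAs whose `flag` column equals "1".
    Rows with a blank flag or blank weight are skipped as missing (assumed convention)."""
    num = den = 0.0
    for row in rows:
        value, w = row.get(flag, ""), row.get(weight, "")
        if value == "" or w == "":
            continue
        wt = float(w)
        den += wt
        if value == "1":
            num += wt
    if den == 0:
        raise ValueError("no usable rows")
    return num / den


# Hypothetical in-memory sample; column "participates" is an illustrative name only.
sample = "sfa_id,participates,nrweight\nA,1,1.2\nB,0,1.5\nC,1,0.8\nD,,1.1\n"
print(weighted_share(csv.DictReader(io.StringIO(sample)), flag="participates"))

# With the real file (flag name still hypothetical):
# with open("census2019_public_use_with_weight.csv", newline="") as f:
#     print(weighted_share(csv.DictReader(f), flag="participates"))
```

Because the Census was sent to all SFAs rather than a sample, the weight here only adjusts for non-response; variance estimation is outside the scope of this sketch.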