Companion dataset of the manuscript: Seonyeong Park, Umberto Villa, Fu Li, Refik Mert Cam, Alexander A. Oraevsky, Mark A. Anastasio, "Stochastic three-dimensional numerical phantoms to enable computational studies in quantitative optoacoustic computed tomography of breast cancer," J. Biomed. Opt. 28(6) 066002 (20 June 2023) https://doi.org/10.1117/1.JBO.28.6.066002
This dataset contains 40 sets of three-dimensional (3D) numerical breast phantoms (NBPs) for use in virtual imaging trials (VITs) of optoacoustic tomography (OAT), together with the corresponding simulated multi-wavelength optical fluence distributions, induced initial pressure distributions, and OAT measurement data. The NBPs have natural shapes, and four lesions of different sizes, each composed solely of a viable tumor cell region, were inserted into each of the 40 NBPs. Each NBP corresponds to one of the following four breast density types defined in the Breast Imaging Reporting and Data System (BI-RADS®):
A: Breast is almost entirely fatty;
B: Breast has scattered areas of fibroglandular density;
C: Breast is heterogeneously dense;
D: Breast is extremely dense.
Each NBP set consists of:
A tissue label map (anatomical NBP);
Functional properties and chromophore concentrations maps (functional NBPs);
Optical properties maps (optical absorption and scattering coefficients, scattering anisotropy, and refractive indexes) at multiple illumination wavelengths (optical NBPs);
Acoustic properties (speed of sound, density, and acoustic attenuation) maps (acoustic NBPs);
Simulated multi-wavelength optical fluence distributions;
Simulated multi-wavelength induced initial pressure distributions;
Simulated multi-wavelength acoustic measurements.
The tissue label map (anatomical NBP) was created using our adaptation of tools from the Virtual Imaging Clinical Trials for Regulatory Evaluation (VICTRE) project at the U.S. Food and Drug Administration (FDA) and our Python library, which introduces blood vasculature under the skin layer.
In each NBP, four numerical lesion phantoms (NLPs) of different sizes (<10 mm in diameter) were inserted at locations randomly selected from those predicted by the VICTRE tools based on the duct and terminal duct lobular unit (TDLU) structures, which are well-known sites for lesion formation. The considered tissue types and their unsigned 8-bit integer (uint8) labels are as follows.
Tissue type | uint8 label |
Background | 0 |
Fat | 1 |
Skin | 2 |
Glandular | 29 |
Nipple | 33 |
Ligament | 88 |
Terminal duct lobular unit (TDLU) | 95 |
Duct | 125 |
Artery | 150 |
Vein | 225 |
Peripheral angiogenesis | 190 |
Viable tumor cell | 200 |
Necrotic core | 210 |
The functional, optical, and acoustic properties maps were produced with our Python libraries, which assign the corresponding properties to each breast tissue type. The optical fluence distributions were simulated using the MCX software [Fang2009], [Yu2018]. The induced initial pressure distributions were calculated by voxelwise multiplication of the optical absorption coefficient distributions and the simulated optical fluence distributions, assuming a constant Grüneisen parameter Γ = 1, a common simplification for soft tissues. The acoustic measurements were simulated using the k-Wave software [Treeby2010]. Further details on 1) the modifications and adaptations of the VICTRE NBPs for use in VITs of OAT and 2) the virtual OAT imaging system and data acquisition are given in the accompanying paper [Park2023].
The naming convention of the files in this dataset is {breast type}{seed number}{lesion presence}_{contained data}.mat, where:
Breast type: A, B, C, or D;
Seed number: a number randomly generated when producing each tissue label map;
Lesion presence: absent (healthy, h) or present (l);
Contained data: tissue label map (label), functional properties and chromophore concentrations maps (func), optical properties maps (opt), acoustic properties maps (acou), simulated optical fluence distributions (phi), simulated induced initial pressure distributions (p0), or simulated acoustic measurements (p).
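The naming convention can be checked mechanically. The following is a minimal sketch, not part of the dataset's tooling: the function name `parse_phantom_filename` and the `PhantomFile` record are hypothetical, and the seed number is assumed to consist of decimal digits only.

```python
import re
from typing import NamedTuple

# Content suffixes listed in the naming convention of this dataset.
_CONTENT = ("label", "func", "opt", "acou", "phi", "p0", "p")
_PATTERN = re.compile(
    r"^(?P<breast_type>[ABCD])(?P<seed>\d+)(?P<lesion>[hl])_(?P<content>"
    + "|".join(_CONTENT)
    + r")\.mat$"
)


class PhantomFile(NamedTuple):
    """Fields encoded in a file name (hypothetical helper type)."""
    breast_type: str   # "A", "B", "C", or "D"
    seed: int          # seed number (assumed to be decimal digits)
    has_lesion: bool   # True for "l", False for "h" (healthy)
    content: str       # one of the suffixes in _CONTENT


def parse_phantom_filename(name: str) -> PhantomFile:
    """Split a file name into the fields of the naming convention."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a valid phantom file name: {name!r}")
    return PhantomFile(
        breast_type=match["breast_type"],
        seed=int(match["seed"]),
        has_lesion=match["lesion"] == "l",
        content=match["content"],
    )
```

For instance, `parse_phantom_filename("A12345678h_label.mat")` yields breast type A, seed 12345678, no lesion, and content `label`; names that do not follow the convention raise `ValueError`.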
For example, the file name of a tissue label map of the type A breast with no lesion inserted (healthy breast) that was created using the seed number 12345678 is A12345678h_label.mat. The data and metadata contained in each file are listed below.
{breast type}{seed number}{lesion presence}_label.mat
Data
label: 1360 x 1360 x 680 uint8 tissue label map that includes healthy tissues, viable tumor cell region, and necrotic core
Metadata
origin: 3 x 1 float32 values that specify x, y, and z coordinates of origin, (-85, -85, -85) mm
voxel_size: float32 value that specifies voxel size, 0.125 mm
tissue_type_label: 13 x 2 cell that specifies tissue type names and the corresponding uint8 values in label
breast_type: Breast type, A: Breast is almost entirely fatty; B: Breast has scattered areas of fibroglandular density; C: Breast is heterogeneously dense; or D: Breast is extremely dense
lesion_presence: Lesion presence, h: Lesion-absent numerical breast phantom (healthy) or l: Lesion-present...
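The origin and voxel_size metadata are enough to relate voxel indices to physical coordinates. The sketch below is illustrative only: it assumes that the array axes are ordered (x, y, z) and that origin marks the lower corner of voxel (0, 0, 0), neither of which is stated in the description; the function names are hypothetical.

```python
# Values taken from the metadata of the *_label.mat files described above.
ORIGIN_MM = (-85.0, -85.0, -85.0)
VOXEL_SIZE_MM = 0.125
LABEL_SHAPE = (1360, 1360, 680)


def voxel_to_mm(index, origin=ORIGIN_MM, voxel_size=VOXEL_SIZE_MM):
    """Physical coordinate (mm) of the lower corner of voxel `index`.

    Assumes axes ordered (x, y, z) and an origin at the corner of voxel (0, 0, 0).
    """
    return tuple(o + i * voxel_size for o, i in zip(origin, index))


def extent_mm(shape=LABEL_SHAPE, voxel_size=VOXEL_SIZE_MM):
    """Physical size (mm) of the whole volume along each axis."""
    return tuple(n * voxel_size for n in shape)
```

Under these assumptions the volume measures 170 mm x 170 mm x 85 mm, spanning -85 mm to 85 mm in x and y and -85 mm to 0 mm in z.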
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains a collection of ultrafast ultrasound acquisitions from nine volunteers and the CIRS 054G phantom. For a comprehensive understanding of the dataset, please refer to the paper: Viñals, R.; Thiran, J.-P. A KL Divergence-Based Loss for In Vivo Ultrafast Ultrasound Image Enhancement with Deep Learning. J. Imaging 2023, 9, 256. https://doi.org/10.3390/jimaging9120256. Please cite the original paper when using this dataset.
Due to data size restrictions, the dataset has been divided into six subdatasets, each published as a separate entry on Zenodo. This repository contains subdataset 3.
Number of Acquisitions: 20,000
Volunteers: Nine volunteers
File Structure: Each volunteer's data is compressed in a separate zip file.
Regions :
File Naming Convention: Incremental IDs from acquisition_00000 to acquisition_19999.
Two CSV files are provided:
invivo_dataset.csv :
invitro_dataset.csv :
The dataset has been divided into six subdatasets, each one published in a separate entry on Zenodo. The following table indicates, for each file or compressed folder, the Zenodo dataset split where it has been uploaded along with its size. Each dataset split is named "A KL Divergence-Based Loss for In Vivo Ultrafast Ultrasound Image Enhancement with Deep Learning: Dataset (ii/6)", where ii represents the split number. This repository contains the 3rd split.
File name | Size | Zenodo subdataset number |
invivo_dataset.csv | 995.9 kB | 1 |
invitro_dataset.csv | 1.1 kB | 1 |
cirs-phantom.zip | 418.2 MB | 1 |
volunteer-1-lowerLimbs.zip | 29.7 GB | 1 |
volunteer-1-carotids.zip | 8.8 GB | 1 |
volunteer-1-back.zip | 7.1 GB | 1 |
volunteer-1-abdomen.zip | 34.0 GB | 2 |
volunteer-1-breast.zip | 15.7 GB | 2 |
volunteer-1-upperLimbs.zip | 25.0 GB | 3 |
volunteer-2.zip | 26.5 GB | 4 |
volunteer-3.zip | 20.3 GB | 3 |
volunteer-4.zip | 24.1 GB | 5 |
volunteer-5.zip | 6.5 GB | 5 |
volunteer-6.zip | 11.5 GB | 5 |
volunteer-7.zip | 11.1 GB | 6 |
volunteer-8.zip | 21.2 GB | 6 |
volunteer-9.zip | 23.2 GB | 4 |
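When planning downloads, it can help to total the archive sizes per Zenodo subdataset. The sketch below simply re-enters the table above as a Python list; the names `FILES` and `size_per_split` are hypothetical, and decimal units (1 GB = 1000 MB) are assumed.

```python
from collections import defaultdict

# (file name, size, unit, Zenodo subdataset number), copied from the table above.
FILES = [
    ("invivo_dataset.csv", 995.9, "kB", 1),
    ("invitro_dataset.csv", 1.1, "kB", 1),
    ("cirs-phantom.zip", 418.2, "MB", 1),
    ("volunteer-1-lowerLimbs.zip", 29.7, "GB", 1),
    ("volunteer-1-carotids.zip", 8.8, "GB", 1),
    ("volunteer-1-back.zip", 7.1, "GB", 1),
    ("volunteer-1-abdomen.zip", 34.0, "GB", 2),
    ("volunteer-1-breast.zip", 15.7, "GB", 2),
    ("volunteer-1-upperLimbs.zip", 25.0, "GB", 3),
    ("volunteer-2.zip", 26.5, "GB", 4),
    ("volunteer-3.zip", 20.3, "GB", 3),
    ("volunteer-4.zip", 24.1, "GB", 5),
    ("volunteer-5.zip", 6.5, "GB", 5),
    ("volunteer-6.zip", 11.5, "GB", 5),
    ("volunteer-7.zip", 11.1, "GB", 6),
    ("volunteer-8.zip", 21.2, "GB", 6),
    ("volunteer-9.zip", 23.2, "GB", 4),
]

# Decimal units are an assumption; the table does not say whether GB or GiB is meant.
UNIT_TO_GB = {"kB": 1e-6, "MB": 1e-3, "GB": 1.0}


def size_per_split(files=FILES):
    """Total size in GB of the files stored in each Zenodo subdataset."""
    totals = defaultdict(float)
    for _name, size, unit, split in files:
        totals[split] += size * UNIT_TO_GB[unit]
    return dict(totals)
```

With the figures in the table, subdataset 3 (this repository) totals about 45.3 GB, made up of volunteer-1-upperLimbs.zip and volunteer-3.zip.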
Beamforming:
Depth from 1 mm to 55 mm
Width spanning the probe aperture
Grid: 𝜆/8 × 𝜆/8
Resulting images shape: 1483 × 1189
Two beamformed RF images from each acquisition:
Normalization:
To display the images:
File Format: Saved in npy format, loadable in Python with numpy.load(file).
For the volunteer-based split used in the paper:
This dataset is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).
Please cite the original paper when using this dataset:
Viñals, R.; Thiran, J.-P. A KL Divergence-Based Loss for In Vivo Ultrafast Ultrasound Image Enhancement with Deep Learning. J. Imaging 2023, 9, 256. DOI: 10.3390/jimaging9120256
For inquiries or issues related to this dataset, please contact:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The clinical data, radiomics, and so on.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
List of 15 image features that were selected in leave-one-out training and testing cycles to test 36 cases in the validation dataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
References
- Li, Fu, Umberto Villa, Seonyeong Park, and Mark A. Anastasio. "3-D stochastic numerical breast phantoms for enabling virtual imaging trials of ultrasound computed tomography." IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control 69, no. 1 (2021): 135-146. DOI: 10.1109/TUFFC.2021.3112544
- Li, Fu; Villa, Umberto; Park, Seonyeong; Anastasio, Mark, 2021, "2D Acoustic Numerical Breast Phantoms and USCT Measurement Data", https://doi.org/10.7910/DVN/CUFVKE, Harvard Dataverse, V1
Overview
- This dataset includes 1,089 two-dimensional slices extracted from 3D numerical breast phantoms (NBPs) for ultrasound computed tomography (USCT) studies. The anatomical structures of these NBPs were obtained using tools from the Virtual Imaging Clinical Trial for Regulatory Evaluation (VICTRE) project. The methods used to modify and extend the VICTRE NBPs for use in USCT studies are described in the publication cited above.
- The NBPs in this dataset represent the following four ACR BI-RADS breast composition categories:
> Type A - The breast is almost entirely fatty
> Type B - There are scattered areas of fibroglandular density in the breast
> Type C - The breast is heterogeneously dense
> Type D - The breast is extremely dense
- Each 2D slice is taken from a different 3D NBP, so no more than one slice comes from any single phantom.
File Name Format
- Each data file is stored as an HDF5 .mat file. The filenames follow this format: {type}{subject_id}.mat, where {type} indicates the breast type (A, B, C, or D), and {subject_id} is a unique identifier assigned to each sample. For example, in the filename D510022534.mat, "D" represents the breast type, and "510022534" is the sample ID.
File Contents
- Each file contains the following variables:
> "type": Breast type
> "sos": Speed-of-sound map [mm/μs]
> "den": Ambient density map [kg/mm³]
> "att": Acoustic attenuation (power-law prefactor) map [dB/(MHzʸ mm)]
> "y": Power-law exponent
> "label": Tissue label map. Tissue types are denoted using the following labels: water (0), fat (1), skin (2), glandular tissue (29), ligament (88), lesion (200).
- All spatial maps ("sos", "den", "att", and "label") have the same spatial dimensions of 2560 x 2560 pixels, with a pixel size of 0.1 mm x 0.1 mm.
- "sos", "den", and "att" are float32 arrays, and "label" is an 8-bit unsigned integer array.
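The maps use millimetre-based units, so a conversion step is often needed before passing them to a solver that expects SI units. The following is a minimal sketch of that arithmetic; the function names are hypothetical, and the example values for water are illustrative rather than taken from the files.

```python
# Grid parameters stated in the file description.
PIXEL_SIZE_MM = 0.1
GRID_SHAPE = (2560, 2560)

# Tissue labels used in the "label" map of this dataset.
TISSUE_LABELS = {0: "water", 1: "fat", 2: "skin", 29: "glandular tissue",
                 88: "ligament", 200: "lesion"}


def sos_to_m_per_s(sos_mm_per_us):
    """1 mm/us = 1e-3 m / 1e-6 s = 1000 m/s."""
    return sos_mm_per_us * 1e3


def density_to_kg_per_m3(den_kg_per_mm3):
    """1 kg/mm^3 = 1e9 kg/m^3."""
    return den_kg_per_mm3 * 1e9


def attenuation_to_db_per_cm(att_db_per_mm):
    """dB/(MHz^y mm) to dB/(MHz^y cm): 10 mm per cm."""
    return att_db_per_mm * 10.0


def field_of_view_mm(shape=GRID_SHAPE, pixel_size=PIXEL_SIZE_MM):
    """Physical extent (mm) of a slice along each axis."""
    return tuple(n * pixel_size for n in shape)
```

For example, a speed of sound of 1.5 mm/μs corresponds to 1500 m/s, a density of 1e-6 kg/mm³ to 1000 kg/m³, and each slice covers 256 mm x 256 mm. The same functions work elementwise on NumPy arrays.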
Dataset Summary
This is the companion dataset for the manuscript by Fu Li, Umberto Villa, Neb Duric, and Mark A. Anastasio titled "A forward model incorporating elevation-focused transducer properties for 3D full-waveform inversion in ultrasound computed tomography". IEEE UFFC, link (2023).
The dataset comprises two types of thin-slab 3D numerical breast phantoms (NBPs) and their corresponding simulated ring-array ultrasound computed tomography (USCT) measurement data. Two representative ACR BI-RADS breast composition types are selected:
B: Scattered areas of fibroglandular density
C: Heterogeneously dense breast
Each NBP contains maps of acoustic properties (speed of sound, acoustic attenuation, and density). The corresponding ring-array USCT measurement data were simulated using the elevation-focused transducer described in the accompanying paper. Wave propagation simulations were conducted in lossy heterogeneous media using k-Wave.
Dataset Structure
The dataset is organized into three folders:
Dataset folder for phantom type B
Dataset folder for phantom type C
The imaging_system folder, which contains details about the 3D imaging system, including the excitation source, transducer coordinates, and lens thickness.
B:/C:
This dataset folder houses three sets of ring-array data, corresponding to acquisitions at three distinct elevation locations. As described in the manuscript, these locations are: ring -1 at z=-1.8mm, ring 0 at z=0mm, and ring 1 at z=1.8mm. The employed NBP and the associated simulated measurement data for each ring are provided. Specifically, the data files include:
phantom.mat: This .mat file contains three acoustic property maps:
sos: Speed of sound map (m/s) with dimensions 1280,1280,194 and a 0.2 mm pixel size. Data type is float32.
aa: Acoustic attenuation map (Np/m/MHz) with dimensions 1280,1280,194 and a 0.2 mm pixel size. Data type is float32.
density: Density map (kg/mm³) with dimensions 1280,1280,194 and a 0.2 mm pixel size. Data type is float32.
Note: These data are used to generate the corresponding acoustic media for each ring simulation by executing the Matlab script generate_slab.m. See the description of generate_slab.m for more details.
data_ring_{i}.h5: Measurement data of the i-th ring (for i = -1, 0, 1). Each measurement is stored under an hdf5 key that matches the transducer index. It is a two-dimensional array of size [1024,4200], where rows denote the receiver index and columns indicate the time sample. The sampling frequency is 25MHz.
imaging_system:
source300.mat: Time profile of the excitation pulse. Comprises 300 time samples with a sampling frequency of 25MHz.
receiver_locations_1024.mat: XY coordinates (in mm) of the location for each receiver transducer in the imaging plane. Data type is float32 with an array size of [2x1024].
emitter_locations_128.mat: XY coordinates (in mm) of the location for each emitter transducer in the imaging plane. Data type is float32 with an array size of [2x128].
lens_thickness.mat: The thickness (in mm) of the lens model with size [1x90]. Each element denotes the thickness of each segment (voxel) of the line aperture, from top to bottom.
Additionally, the dataset includes helper MATLAB scripts:
read_data.m: A function to aid in loading and visualizing the excitation source, transducer locations, and hdf5 measurement data files.
generate_slab.m: A script to extract a thin slab from phantom.mat; the extracted medium is then used as the computational domain for each ring-array simulation (for ring -1, 0, 1). Specifically, acoustic media were extracted from the shared phantom at different elevation locations. Two adjacent slabs are shifted vertically by 1.8 mm. The imaging plane for ring i corresponds to the central z-slice of each computational domain. The positions of each ring-array thin slab are illustrated in the following figure. This function will generate data files named phantom_ring_{i}.mat, where i=-1, 0, 1. These correspond to the different ring-array simulation grids used to produce the three shared ring-array measurement data sets.
For more details on the NBP generation, please refer to li2021uffc (2021). The anatomical structures of these NBPs were derived using tools from the Virtual Imaging Clinical Trial for Regulatory Evaluation (VICTRE) project.
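The sampling and geometry figures quoted above translate into a time axis and per-ring elevation offsets with a little arithmetic. The sketch below is illustrative and does not reproduce generate_slab.m: it assumes that the first time sample corresponds to t = 0, and the function names are hypothetical.

```python
# Values stated in the dataset description.
SAMPLING_FREQUENCY_HZ = 25e6   # measurement sampling frequency
NUM_TIME_SAMPLES = 4200        # columns of each [1024, 4200] measurement array
VOXEL_SIZE_MM = 0.2            # voxel size of phantom.mat
RING_SPACING_MM = 1.8          # vertical shift between adjacent rings


def sample_times_us(num_samples=NUM_TIME_SAMPLES, fs_hz=SAMPLING_FREQUENCY_HZ):
    """Time of each sample in microseconds, assuming the first sample is at t = 0."""
    return [k / fs_hz * 1e6 for k in range(num_samples)]


def ring_offset(ring_index, spacing_mm=RING_SPACING_MM, voxel_mm=VOXEL_SIZE_MM):
    """Elevation of ring i in mm and the same offset in whole voxels."""
    z_mm = ring_index * spacing_mm
    return z_mm, round(z_mm / voxel_mm)
```

With these numbers the sampling interval is 0.04 μs, each trace lasts about 168 μs, and the 1.8 mm shift between adjacent rings equals 9 voxels of the 0.2 mm grid.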
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Transmission ultrasound data simulated using the k-Wave toolbox as a benchmark for biomedical quantitative ultrasound tomography using a ray approximation to Green's function
The folder ‘’simulation’’ includes the transmission ultrasound data sets used in the project: https://github.com/Ash1362/ray-based-quantitative-ultrasound-tomography. In the GitHub repository, the associated project can be found in the master branch, in the folder r-Wave #V1.1. (The folder ‘’data_ust_kWave_transmission.zip’’ is deprecated.)
...........................................................................................
The ultrasound data were simulated using the k-Wave toolbox (version 1.3) [5] and a digital breast phantom [4]. No changes affecting these simulations have been reported for k-Wave version 1.4. The simulations assumed isotropic point sources.
The folder ‘’simulation’’ must be added to the path:
''…r-Wave/data/simulation/…''
To run the MATLAB example scripts in the GitHub project, the user has two choices:
Simulate the k-Wave ultrasound data by setting data_sim=true; in the examples in the project.
Download the already simulated k-Wave ultrasound data according to the description below and load them by setting data_sim=false; in the examples in the project.
Please read the description in the example scripts!
…………………………………………………………………………………
The folder simulation includes 2 subfolders, ‘’phantom’’ and ‘’data_ust_kWave_transmission’’.
1) The subfolder ‘’simulation/phantom’’ includes ‘’OA-BREAST’’.
From the project https://anastasio.bioengineering.illinois.edu/downloadable-content/oa-breast-database/,
the user must download the folder ‘’Neg_47_Left’’ and add it as ‘’r-wave/data/simulation/phantom/OA-BREAST/Neg_47_Left/’’.
.......................................................................................................................................................................
2) The subfolder ‘’simulation/data_ust_kWave_transmission’’ includes 2 subfolders, ‘’2D’’ and ‘’3D’’ .
The subfolder ‘’2D’’ includes:
data_ust_kWave_transmission/2D/PulsePammoth_1_dx4_cfl1_Nr256_Ne64_Interpoffgrid_Transgeompoint_Absorption1_CodeMatlab/data4_sphere_nonsmooth.mat
Two transmission ultrasound data sets were simulated using k-Wave, one for water only and one for the breast in water, according to section ‘’6.1. data simulation’’ in [1]. 64 emitters and 256 receivers are simulated as off-grid points placed on a 2D circular ring. (The characters ‘’_sphere_’’ are added to indicate that the transducers are placed on a ring.) To simulate the data, each emitter was individually driven by an excitation pulse, and the induced acoustic pressure time series were recorded on all the receivers. The k-Wave simulation was performed on a grid with grid spacing 0.4 mm, and the time spacing was set using a CFL number of 0.1. Acoustic absorption and dispersion were accounted for based on the frequency power law. This data set is intended for image reconstruction, and therefore the sound speed and absorption coefficient maps are not smoothed, i.e., the original maps are used for the simulations. This data set can be used for image reconstruction using the time-of-flight-based approach and then the Green's approach.
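For orientation, the CFL number ties the time step to the grid spacing and the sound speed through the usual relation dt = CFL · dx / c_max. The sketch below evaluates it for the values quoted above; the maximum sound speed of 1500 m/s is an illustrative assumption (the actual maximum in the phantom is not stated here), and the function name is hypothetical.

```python
def time_step_s(cfl, dx_m, c_max_m_per_s):
    """Time step from the CFL relation dt = CFL * dx / c_max."""
    if cfl <= 0 or dx_m <= 0 or c_max_m_per_s <= 0:
        raise ValueError("cfl, dx_m and c_max_m_per_s must be positive")
    return cfl * dx_m / c_max_m_per_s


# Grid spacing 0.4 mm and CFL 0.1 as stated above; c_max = 1500 m/s is assumed.
dt = time_step_s(cfl=0.1, dx_m=0.4e-3, c_max_m_per_s=1500.0)
```

Under that assumption the time step is roughly 27 ns; a higher maximum sound speed in the medium gives a proportionally smaller step.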
data_ust_kWave_transmission/2D/PulsePammoth_1_dx4_cfl1_Nr256_Ne64_Interpoffgrid_Transgeompoint_Absorption1_CodeMatlab/data4_plane_nonsmooth.mat
Two transmission ultrasound data sets were simulated using k-Wave, one for water only and one for the breast in water. 64 emitters and 256 receivers are simulated as off-grid points placed on 16 planar arrays, all aligned with a circle. Each planar array includes 4 emitters and 16 receivers. Therefore, in contrast with the data set described above, ray linking is performed using the line equations defining the 2D geometry of the linear arrays. (The characters ‘’_plane_’’ are added to indicate that the transducers are placed on lines.) To simulate the data, each emitter was individually driven by an excitation pulse, and the induced acoustic pressure time series were recorded on all the receivers. The k-Wave simulation was performed on a grid with grid spacing 0.4 mm, and the time spacing was set using a CFL number of 0.1. Acoustic absorption and dispersion were accounted for based on the frequency power law. This data set is intended for image reconstruction, and therefore the sound speed and absorption coefficient maps are not smoothed, i.e., the original maps are used for the simulations. This data set can be used for image reconstruction using the time-of-flight-based approach, but has not yet been extended to the Green's approach. Image reconstruction is expected to be slower than for the circular array: for the circular array, the ray linking problem is solved once per emitter for all receivers using the equation of the circle, whereas for this data set it is solved for each emitter and each receiver array separately, because the receiver arrays are defined by different line equations.
data_ust_kWave_transmission/2D/PulsePammoth_1_dx4_cfl1_Nr256_Ne64_Interpoffgrid_Transgeompoint_Absorption1_CodeMatlab/data4_sphere_smooth_17_1.mat
Two transmission ultrasound data sets were simulated using the k-Wave for only water and breast in water as the benchmark for validation of ray approximation to Green’s function in homogeneous and heterogenous media, respectively. The simulation was performed according to section ‘’6.2. Numerical validation of the ray approximation to the Green’s function’’ in [1].
64 emitters and 256 receivers are simulated as off-grid points which are placed on a 2D circular ring. (The characters ‘’_sphere_’’ are added to indicate that the transducers are placed on a ring.) The pressure field was produced by emitter 1 (of the 64 emitters) and was recorded in time on all 256 receivers. The k-Wave simulation was performed on a grid with grid spacing 0.4 mm, and the time spacing was set using a CFL number 0.1. The acoustic absorption and dispersion were accounted for based on the frequency power law. The sound speed and absorption coefficient maps were smoothed by an averaging window of size 17 grid points. This data set is used as the benchmark for measuring accuracy of ray approximation to Green’s function for computing phase and amplitude of the pressure field on the receivers.
data_ust_kWave_transmission/2D/PulsePammoth_1_dx4_cfl1_Nr256_Ne64_Interpoffgrid_Transgeompoint_Absorption1_CodeMatlab/data4_sphere_smooth_17_20.mat
This data set is the same as data4_sphere_smooth_17_1.mat, except that the pressure field is produced by emitter 20.
………………………………………………………………………………………………………………….
The subfolder ‘’3D’’ includes:
data_ust_kWave_transmission/3D/PulsePammoth_1_dx5_cfl1_Nr4096_Ne1024_Interpnearest_Transgeompoint_Absorption0_CodeCUDA/data5_sphere_nonsmooth_tof_singram.mat
The discrepancy between the time-of-flight data of two transmission ultrasound data sets simulated with k-Wave, one for the breast in water and one for water only, according to section 5.2 in [3]. The pressure fields were produced by 1024 emitters separately and were recorded on 4096 receivers. The emitters and receivers were simulated as points placed on a 3D hemispherical surface and were interpolated onto the grid using nearest-neighbor interpolation. The k-Wave simulations were performed on a grid with grid spacing 0.5 mm, and the time spacing was set using a CFL number of 0.1. The time-of-flight data were computed for use in a refraction-corrected image reconstruction of the sound speed based on the inversion approach proposed in [3].
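The subfolder names appear to encode the simulation settings described in the text (grid spacing, CFL number, numbers of receivers and emitters, interpolation, absorption, and code backend). The sketch below parses them under that reading; the interpretation of each token, in particular dx4 as 0.4 mm and cfl1 as 0.1, is inferred from the descriptions above rather than documented, and the function name and dictionary keys are hypothetical.

```python
import re

_PATTERN = re.compile(
    r"^Pulse(?P<pulse>.+)_dx(?P<dx>\d+)_cfl(?P<cfl>\d+)"
    r"_Nr(?P<nr>\d+)_Ne(?P<ne>\d+)"
    r"_Interp(?P<interp>[^_]+)_Transgeom(?P<geom>[^_]+)"
    r"_Absorption(?P<absorption>[01])_Code(?P<code>[^_]+)$"
)


def parse_setup_folder(name: str) -> dict:
    """Decode a setup folder name into simulation settings (inferred meanings)."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognised setup folder name: {name!r}")
    return {
        "pulse": match["pulse"],
        "grid_spacing_mm": int(match["dx"]) / 10,   # assumed: tenths of a millimetre
        "cfl": int(match["cfl"]) / 10,              # assumed: tenths
        "num_receivers": int(match["nr"]),
        "num_emitters": int(match["ne"]),
        "interpolation": match["interp"],
        "transducer_geometry": match["geom"],
        "absorption": match["absorption"] == "1",
        "code": match["code"],
    }
```

Applied to the 3D folder name above, this gives 4096 receivers, 1024 emitters, a 0.5 mm grid, nearest interpolation, no absorption, and the CUDA backend, which agrees with the description of that data set.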
References
1 - A. Javaherian, "Hessian-inversion-free ray-born inversion for high-resolution quantitative ultrasound tomography", 2022. https://arxiv.org/abs/2211.00316/
2 - A. Javaherian and B. Cox, "Ray-based inversion accounting for scattering for biomedical ultrasound tomography", Inverse Problems, vol. 37, no. 11, 115003, 2021. https://iopscience.iop.org/article/10.1088/1361-6420/ac28ed/
3 - A. Javaherian, F. Lucka and B. T. Cox, "Refraction-corrected ray-based inversion for three-dimensional ultrasound tomography of the breast", Inverse Problems, 36 125010. https://iopscience.iop.org/article/10.1088/1361-6420/abc0fc/
4 - Y. Lou, W. Zhou, T. P. Matthews, C. M. Appleton and M. A. Anastasio, "Generation of anatomically realistic numerical phantoms for photoacoustic and ultrasonic breast imaging", J. Biomed. Opt., vol. 22, no. 4, 041015, 2017. https://anastasio.bioengineering.illinois.edu/downloadable-content/oa-breast-database/
5 - B. E. Treeby and B. T. Cox, "k-Wave: MATLAB toolbox for the simulation and reconstruction of photoacoustic wave fields", J. Biomed. Opt., vol. 15, no. 2, 021314, 2010. http://www.k-wave.org/
https://spdx.org/licenses/CC0-1.0.html
The aim of this study was to assess the acceptability and feasibility of offering risk-based breast cancer screening and its integration into regular clinical practice. A single-arm proof-of-concept trial was conducted with a sample of 387 women aged 40–50 years residing in the city of Lleida (Spain). The study intervention consisted of breast cancer risk estimation, risk communication and screening recommendations, and a follow-up. A polygenic risk score with 83 single nucleotide polymorphisms was used to update the Breast Cancer Surveillance Consortium risk model and estimate the 5-year absolute risk of breast cancer. The women expressed a positive attitude towards varying the frequency of breast screening according to individual risk and, especially, towards inviting women at higher-than-average risk more frequently. Lower-intensity screening for women at lower risk was not as welcome, although half of the participants would accept it. Knowledge of the benefits and harms of breast screening was low, especially with regard to false positives and overdiagnosis. The women expressed a high understanding of individual risk and screening recommendations. The participants’ intention to participate in risk-based screening and their satisfaction at 1 year were very high.
Methods
From January 2019 to February 2021, 387 women aged 40 to 50 years were enrolled in the study. Potential participants were the 2038 women living in the “Primer de Maig” Basic Health Area in Lleida, Catalonia, on 31 December 2018, who would have turned between 40 and 50 years of age during the following 1.5 years. Accrual was suspended because of the COVID-19 pandemic in March 2020, when 252 women had been included, and resumed in October 2020. All women who turned 50 during the study period would have received the first invitation to participate in the population-based Breast Cancer Early Detection Program. Instead, they were invited to participate in our study.
Women that declined were invited by the early detection program. From women that turned 40 to 49 years during the study period, random samples of 20 to 50 women were selected from the potential participants on a monthly basis, and the women were invited to participate until the accrual goal was achieved. Exclusion criteria included having a previous diagnosis of breast cancer, undergoing a current breast study, or fulfilling clinical criteria for cancer-related genetic counseling. We also excluded women not understanding or speaking Catalan or Spanish or those with a physical or cognitive disability that prevented breast screening or the main outcome’s assessment. The study intervention consisted of a baseline visit, the breast cancer risk estimation, a visit for risk communication and screening recommendations, the administration of a follow-up questionnaire, and a phone call to assess satisfaction after one year. The baseline visit was held at the Primary Care center, where the healthcare professional provided information about the study objectives; facilitated an informative brochure about the benefits and adverse effects of breast cancer screening; obtained information on sociodemographic variables, risk factors, previous screening experience, perceived personal risk of breast cancer, and general screening knowledge, attitudes, and intentions; obtained a saliva sample to determine the genomic profile; and scheduled a screening mammogram with breast density measurement. For women that had a mammogram during the year before the first visit, breast density and presence/absence of benign lesions were obtained from that mammogram and the radiologist’s report. Breast density was classified according to the Breast Imaging Reporting and Data System (BI-RADS), 5th edition, scoring system: almost entirely fatty (a), scattered areas of fibroglandular density (b), heterogeneously dense (c), and extremely dense (d). 
Mammographic findings were coded from 0 (incomplete: additional imaging needed) to 6 (known biopsy-proven malignancy). In the case of abnormal results, additional tests were requested. Collection, conservation, and delivery of saliva samples were completed following the saliva collection protocol provided by the University of Lleida’s Proteomics and Genomics Service. Details about the genotyping process can be found in the protocol. The PRS was obtained using the 83 SNPs associated with breast cancer, based on Shieh et al.’s or Mavaddat et al.’s studies, as a composite likelihood ratio representing the individual effects of each SNP. The primary outcome measures were attitude towards, intention to participate in, and satisfaction with personalized breast cancer screening among participating women. Attitude was measured with a three-item scale, each item ranging from 1 to 5, with higher scores indicating more positive attitudes. A “positive attitude” was defined as a total score greater than or equal to 12. Intention to participate was measured with a 5-point Likert scale from definitely will (1) to definitely will not (5). The variable was also dichotomized as intending to participate (definitely or likely) or not. Satisfaction was assessed one year after recruitment and was measured on a 5-point Likert scale from very unsatisfied (1) to very satisfied (5). Secondary outcomes (e.g., attitude towards screening mammography, attitude towards measuring breast cancer risk, emotional impact of the measure of breast cancer risk, preference with regard to the current screening, knowledge, decisional conflict, confidence, and participation) have been detailed in full in the study protocol. The R programming language and the RStudio environment were used for the data analysis. The Likert function of the HH package was used to obtain the graphical representation of the primary outcomes measured as Likert scales.
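The scoring rules for the primary outcomes can be written down compactly. The sketch below is illustrative only and is written in Python rather than the R used for the study's analysis; the function names are hypothetical, and treating response 2 as "likely will" is an assumption, since only the endpoints of the intention scale are given.

```python
def attitude_is_positive(items):
    """Three items scored 1 to 5; a total of 12 or more counts as a positive attitude."""
    if len(items) != 3 or any(not 1 <= score <= 5 for score in items):
        raise ValueError("expected three item scores between 1 and 5")
    return sum(items) >= 12


def intends_to_participate(response):
    """5-point scale from 1 (definitely will) to 5 (definitely will not).

    Responses 1 and 2 are treated as intending to participate; mapping
    2 to "likely will" is an assumption.
    """
    if not 1 <= response <= 5:
        raise ValueError("expected a response between 1 and 5")
    return response in (1, 2)
```

For example, item scores of 4, 4, and 4 reach the threshold of 12, whereas 4, 4, and 3 do not.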
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset and codes for "Imaging breast malignancies with the Twente Photoacoustic Mammoscope 2" doi: 10.1371/journal.pone.0281434.
Archive containing:
- Photoacoustic reconstructions for each case presented in the paper, in .mat format
- Anonymized clinical magnetic resonance images for cases 1, 2, and 4 presented in the paper
- MATLAB codes for image processing and comparison with magnetic resonance images
For any question, email: b.desanti@utwente.nl
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains a collection of ultrafast ultrasound acquisitions from nine volunteers and the CIRS 054G phantom. For a comprehensive understanding of the dataset, please refer to the paper: Viñals, R.; Thiran, J.-P. A KL Divergence-Based Loss for In Vivo Ultrafast Ultrasound Image Enhancement with Deep Learning. J. Imaging 2023, 9, 256. https://doi.org/10.3390/jimaging9120256. Please cite the original paper when using this dataset.
Due to data size restrictions, the dataset has been divided into six subdatasets, each published as a separate entry on Zenodo. This repository contains subdataset 4.
Number of Acquisitions: 20,000
Volunteers: Nine volunteers
File Structure: Each volunteer's data is compressed in a separate zip file.
Regions :
File Naming Convention: Incremental IDs from acquisition_00000 to acquisition_19999.
Two CSV files are provided:
invivo_dataset.csv :
invitro_dataset.csv :
The dataset has been divided into six subdatasets, each one published in a separate entry on Zenodo. The following table indicates, for each file or compressed folder, the Zenodo dataset split where it has been uploaded along with its size. Each dataset split is named "A KL Divergence-Based Loss for In Vivo Ultrafast Ultrasound Image Enhancement with Deep Learning: Dataset (ii/6)", where ii represents the split number. This repository contains the 4th split.
File name | Size | Zenodo subdataset number |
invivo_dataset.csv | 995.9 kB | 1 |
invitro_dataset.csv | 1.1 kB | 1 |
cirs-phantom.zip | 418.2 MB | 1 |
volunteer-1-lowerLimbs.zip | 29.7 GB | 1 |
volunteer-1-carotids.zip | 8.8 GB | 1 |
volunteer-1-back.zip | 7.1 GB | 1 |
volunteer-1-abdomen.zip | 34.0 GB | 2 |
volunteer-1-breast.zip | 15.7 GB | 2 |
volunteer-1-upperLimbs.zip | 25.0 GB | 3 |
volunteer-2.zip | 26.5 GB | 4 |
volunteer-3.zip | 20.3 GB | 3 |
volunteer-4.zip | 24.1 GB | 5 |
volunteer-5.zip | 6.5 GB | 5 |
volunteer-6.zip | 11.5 GB | 5 |
volunteer-7.zip | 11.1 GB | 6 |
volunteer-8.zip | 21.2 GB | 6 |
volunteer-9.zip | 23.2 GB | 4 |
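To plan downloads, the sizes in the table above can be summed per Zenodo subdataset. The following is a minimal sketch that copies the table rows by hand; the helper names (`size_in_gb`, `total_per_split`) are illustrative, and decimal units (1 GB = 1000 MB) are an assumption.

```python
from collections import defaultdict

# Rows copied from the table above: (file name, size, Zenodo subdataset number).
FILES = [
    ("invivo_dataset.csv", "995.9 kB", 1),
    ("invitro_dataset.csv", "1.1 kB", 1),
    ("cirs-phantom.zip", "418.2 MB", 1),
    ("volunteer-1-lowerLimbs.zip", "29.7 GB", 1),
    ("volunteer-1-carotids.zip", "8.8 GB", 1),
    ("volunteer-1-back.zip", "7.1 GB", 1),
    ("volunteer-1-abdomen.zip", "34.0 GB", 2),
    ("volunteer-1-breast.zip", "15.7 GB", 2),
    ("volunteer-1-upperLimbs.zip", "25.0 GB", 3),
    ("volunteer-2.zip", "26.5 GB", 4),
    ("volunteer-3.zip", "20.3 GB", 3),
    ("volunteer-4.zip", "24.1 GB", 5),
    ("volunteer-5.zip", "6.5 GB", 5),
    ("volunteer-6.zip", "11.5 GB", 5),
    ("volunteer-7.zip", "11.1 GB", 6),
    ("volunteer-8.zip", "21.2 GB", 6),
    ("volunteer-9.zip", "23.2 GB", 4),
]

# Assumption: sizes are reported in decimal units.
UNIT_TO_GB = {"kB": 1e-6, "MB": 1e-3, "GB": 1.0}


def size_in_gb(text):
    """Convert a size string such as '418.2 MB' to gigabytes."""
    value, unit = text.split()
    return float(value) * UNIT_TO_GB[unit]


def total_per_split(files):
    """Return {subdataset number: total size in GB}."""
    totals = defaultdict(float)
    for _name, size, split in files:
        totals[split] += size_in_gb(size)
    return dict(totals)


if __name__ == "__main__":
    for split, total in sorted(total_per_split(FILES).items()):
        print(f"subdataset {split}: {total:.1f} GB")
```

For the split hosted in this repository (number 4), the two listed archives add up to 26.5 GB + 23.2 GB.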
Beamforming:
Depth from 1 mm to 55 mm
Width spanning the probe aperture
Grid: 𝜆/8 × 𝜆/8
Resulting images shape: 1483 × 1189
Two beamformed RF images from each acquisition:
Normalization:
To display the images:
File Format: Saved in npy format, loadable in Python with numpy.load(file).
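As a hedged illustration of the file format and grid description above, the sketch below loads one beamformed image with `numpy.load` and checks it against the stated 1483 × 1189 shape. The file path is supplied by the caller, since the per-acquisition file names are not given here. Treating the first axis as depth and the 1 mm and 55 mm limits as inclusive endpoints are assumptions, so the derived pixel pitch is only an estimate.

```python
import numpy as np

# Values taken from the beamforming description above.
DEPTH_MIN_MM, DEPTH_MAX_MM = 1.0, 55.0
EXPECTED_SHAPE = (1483, 1189)  # assumed order: (depth samples, lateral samples)


def load_rf_image(path):
    """Load one beamformed RF image and verify its shape."""
    image = np.load(path)
    if image.shape != EXPECTED_SHAPE:
        raise ValueError(f"unexpected shape {image.shape}, expected {EXPECTED_SHAPE}")
    return image


def axial_pixel_pitch_mm(n_depth=EXPECTED_SHAPE[0]):
    """Estimate the axial pixel pitch, assuming both depth limits are sampled."""
    return (DEPTH_MAX_MM - DEPTH_MIN_MM) / (n_depth - 1)
```

Under these assumptions the pitch is 54 mm / 1482 intervals, roughly 0.036 mm, which would correspond to 𝜆/8 on the stated grid.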
For the volunteer-based split used in the paper:
This dataset is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).
Please cite the original paper when using this dataset :
Viñals, R.; Thiran, J.-P. A KL Divergence-Based Loss for In Vivo Ultrafast Ultrasound Image Enhancement with Deep Learning. J. Imaging 2023, 9, 256. DOI: 10.3390/jimaging9120256
For inquiries or issues related to this dataset, please contact:
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Characteristic | Value (N = 385) |
---|---|
Age (years) | Mean ± SD: 48.8 ± 11 Median (IQR): 49 (41-56) Range: 23-77 |
Sex | Female: 386 (100%) |
Race | White: 282 (73%) |
Ethnicity | Hispanic: 25 (6%) |
The American College of Radiology Imaging Network (ACRIN) trial 6698 (NCT01564368) was a multi-center study to evaluate the effectiveness of quantitative diffusion weighted imaging (DWI) for assessing breast cancer response to neoadjuvant chemotherapy (NAC). ACRIN 6698 was performed as a sub-study of the ongoing I-SPY 2 TRIAL (Investigation of Serial studies to Predict Your Therapeutic Response with Imaging And moLecular Analysis 2), an adaptive, multi-agent phase II trial designed to quickly identify new agents for breast cancer.
Patients recruited for the I-SPY 2 TRIAL and enrolling at sites meeting DWI qualification requirements were eligible for the 6698 trial. 406 women with invasive breast cancer were prospectively enrolled in ACRIN 6698 at ten institutions between August 2012 and January 2015, and 272 were randomized to I‑SPY 2 experimental treatment or control arms. Patients underwent breast DWI using a 4-b value protocol, as well as standard T2-weighted and dynamic contrast enhanced (DCE) scans. MRI studies were conducted at 4 timepoints over the course of NAC: pre-treatment (T0), early-treatment after 3 cycles of paclitaxel (T1), mid-treatment between paclitaxel and AC (T2), and post-treatment (T3). Of the 272 treated patients, 242 comprised the primary analysis cohort (30 were excluded for missing or non-evaluable DWI exams). This TCIA collection includes all MRI studies received by the UCSF image analysis lab, excluding studies with no analyzable acquisitions. A separate download option is provided to access only those studies in the primary analysis cohort. In addition, DWI test/retest data acquired at baseline or early-treatment in a subset of patients are included in the full collection and are available as a separate download option (N=89 subjects consented and imaged, 71 analyzable test/retest acquisition pairs).
The ACRIN 6698 image data set is currently a unique collection for investigating the utility of DWI for monitoring of response to neoadjuvant breast cancer treatment. While many smaller and/or single-site DWI studies have been published, the multi-center and quality-control aspects of this data allow investigators true evaluation of analysis techniques in the clinical trial environment. The multi-b value protocol also allows evaluation of higher order diffusion models and evaluation at two different clinically relevant b values (600 and 800 s/mm²). Furthermore, the embedded test-retest arm of the study will allow evaluation of repeatability and reproducibility of new DWI metrics and analysis techniques.
In addition to the original DWI data the collection includes derived ADC maps with manually delimited tumor segmentations from the primary study analysis (for all studies rated as analyzable in the QC evaluation), plus T2-weighted images, DCE images with derived enhancement maps, clinical data and outcome data (pathologic complete response [pCR] at surgery). Additional information about the trial is available in the Study Protocol and Case Report Forms.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Aim: The aim of this study was to investigate and evaluate the role of magnetic resonance (MR) diffusion kurtosis imaging (DKI) in characterizing breast lesions. Materials and Methods: One hundred and twenty-four lesions in 103 patients (mean age: 57±14 years) were evaluated by MR DKI performed with 7 b-values of 0, 250, 500, 750, 1,000, 1,500, and 2,000 s/mm² and dynamic contrast-enhanced (DCE) MR imaging. Breast lesions were histologically characterized and the DKI-related parameters, mean diffusivity (MD) and mean kurtosis (MK), were measured. The MD and MK in normal fibroglandular breast tissue, benign lesions, and malignant lesions were compared by one-way analysis of variance (ANOVA) with Tukey's multiple comparison test. Receiver operating characteristic (ROC) analysis was performed to assess the sensitivity and specificity of MD and MK in the diagnosis of breast lesions. Results: The benign lesions (n = 42) and malignant lesions (n = 82) had mean diameters of 11.4±3.4 mm and 35.8±20.1 mm, respectively. The MK for malignant lesions (0.88±0.17) was significantly higher than that for benign lesions (0.47±0.14) (P<0.001), and, in contrast, MD for benign lesions (1.97±0.35 ×10⁻³ mm²/s) was higher than that for malignant lesions (1.20±0.31 ×10⁻³ mm²/s) (P<0.001). At cutoff values of 1.58 ×10⁻³ mm²/s for MD and 0.69 for MK, the sensitivity and specificity of MD/MK for the diagnosis of malignant lesions were 79.3%/84.2% and 92.9%/92.9%, respectively. The area under the curve (AUC) was 0.86/0.92 for MD/MK. Conclusions: DKI could provide valuable information on the diffusion properties related to tumor microenvironment and increase diagnostic confidence of breast tumors.
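The abstract does not say how MD and MK were estimated. As a hedged illustration only, the sketch below uses the standard DKI signal model, S(b) = S0 · exp(−b·MD + b²·MD²·MK/6), and recovers the parameters with a quadratic least-squares fit to the log-signal at the seven b-values listed above. The function names and the log-linear fitting choice are assumptions, not the authors' pipeline.

```python
import numpy as np

# b-values from the abstract, in s/mm^2.
B_VALUES = np.array([0, 250, 500, 750, 1000, 1500, 2000], dtype=float)


def dki_signal(b, s0, md, mk):
    """Standard DKI model: S(b) = S0 * exp(-b*MD + b^2 * MD^2 * MK / 6)."""
    return s0 * np.exp(-b * md + (b ** 2) * (md ** 2) * mk / 6.0)


def fit_dki(b, signal):
    """Fit ln S = c0 + c1*b + c2*b^2 and map the coefficients to (S0, MD, MK)."""
    c2, c1, c0 = np.polyfit(b, np.log(signal), 2)
    md = -c1                   # mm^2/s
    mk = 6.0 * c2 / md ** 2    # dimensionless
    return np.exp(c0), md, mk


if __name__ == "__main__":
    # Synthetic, noise-free signal using the reported malignant-lesion means.
    signal = dki_signal(B_VALUES, 100.0, 1.20e-3, 0.88)
    s0, md, mk = fit_dki(B_VALUES, signal)
    print(f"MD = {md * 1e3:.2f} x10^-3 mm^2/s, MK = {mk:.2f}")
```

With real, noisy data a constrained nonlinear fit is often preferred; the log-linear form is used here only because it needs nothing beyond NumPy.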
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
FISH, fluorescence in situ hybridization. a: ratio ≥ 2.2. b: ratio = 1.8–2.2. c: ratio
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction: The study aimed to assess the feasibility of calculating the European Society of Breast Cancer Specialists (EUSOMA) quality indicators (QIs) using Belgian cancer registry data coupled to administrative health data, and to provide national results. Methods: Women diagnosed with ductal carcinoma in situ (DCIS) or invasive breast cancer (IBC) in 2014-2018 were selected from the cancer registry. Fourteen EUSOMA QIs were chosen to assess the quality of care. Results: Overall, 46,035 patients with IBC and 3,973 patients with DCIS were included. Most QIs had to be rephrased so that they could be calculated with the available data. None of the selected QIs on systemic treatment could be calculated due to a lack of reliable receptor status information. For some QIs there is ample room for improvement in Belgian clinical practice: cTNM stage reporting, multidisciplinary team meetings, sentinel lymph-node biopsy only in IBC with clinically negative lymph nodes. The result was 1-5% lower than the target for mammography and breast ultrasound, pTNM reporting, start of treatment, single breast surgery in DCIS, and no axillary clearance in DCIS. For histological or cytological assessment, receptor status assessment, start of radiotherapy, and single breast surgery in IBC, the results were at or above the target. Conclusion: Several EUSOMA QIs can be calculated with routinely collected data, while for several important aspects of care additional data collection is indicated so that its quality can be assessed. The validity of the obtained results depends on the reporting accuracy.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Summary
This metadata record provides details of the data supporting the claims of the related manuscript: “Unmasking the immune microecology of ductal carcinoma in situ with deep learning”.
The data consist of immunohistochemistry (IHC) haematoxylin and eosin (H&E) staining images of grade 2-3 pure ductal carcinoma in situ (DCIS) and DCIS adjacent to invasive cancer (adjacent DCIS) samples.
The related study aimed to characterise tissue spatial architecture and the microenvironment of DCIS via design and validation of a new deep learning pipeline.
Data access
All training data, including the fully anonymised raw H&E image tiles and pathological annotations as binary marks, as well as Python code, are available in the corresponding author’s GitHub: https://github.com/pathdata/HE_Tissue_Segmentation. Requests for data access for the Duke samples can be submitted to E. Shelley Hwang (shelley.hwang@duke.edu) and Yinyin Yuan (yinyin.yuan@icr.ac.uk). Data underlying Figures 4 and 6 are in the files “Ext_validData_DCIS_DAVE_Fig4_data.csv” and “Ext_validData_DCIS_DAVE_Fig6_data.csv”, included with this metadata record. The images used as representative examples in Figure 8 are listed in the file “Figure 8 image details.xlsx”, included with this metadata record.
Name of Institutional Review Board or ethics committee that approved the study
The study was approved by the institutional review board of Duke with a waiver of the requirement to obtain informed consent.
Companion dataset of the manuscript: Seonyeong Park, Umberto Villa, Fu Li, Refik Mert Cam, Alexander A. Oraevsky, Mark A. Anastasio, "Stochastic three-dimensional numerical phantoms to enable computational studies in quantitative optoacoustic computed tomography of breast cancer," J. Biomed. Opt. 28(6) 066002 (20 June 2023) https://doi.org/10.1117/1.JBO.28.6.066002
This dataset contains 40 sets of three-dimensional (3D) numerical breast phantoms (NBPs) for use in virtual imaging trials (VITs) of optoacoustic tomography (OAT) and the corresponding simulated multi-wavelength optical fluence distributions, induced initial pressure distributions, and OAT measurement data. The NBPs have natural shapes, and four lesions of different sizes, each composed solely of a viable tumor cell region, were inserted into each of the 40 NBPs.
Each NBP corresponds to one of the following four breast density types defined in the Breast Imaging Reporting and Data System (BI-RADS®):
A: Breast is almost entirely fatty
B: Breast has scattered areas of fibroglandular density
C: Breast is heterogeneously dense
D: Breast is extremely dense
Each NBP set consists of:
A tissue label map (anatomical NBP)
Functional properties and chromophore concentrations maps (functional NBPs)
Optical properties maps (optical absorption and scattering coefficients, scattering anisotropy, and refractive indexes) at multiple illumination wavelengths (optical NBPs)
Acoustic properties (speed of sound, density, and acoustic attenuation) maps (acoustic NBPs)
Simulated multi-wavelength optical fluence distributions
Simulated multi-wavelength induced initial pressure distributions
Simulated multi-wavelength acoustic measurements
The tissue label map (anatomical NBP) was created using our adaptation of tools from the Virtual Imaging Clinical Trials for Regulatory Evaluation (VICTRE) project at the U.S. Food and Drug Administration (FDA) and our Python library to introduce blood vasculature under the skin layer.
In each NBP, four numerical lesion phantoms (NLPs) of different sizes (<10 mm in diameter) were inserted at locations randomly selected from those predicted by the VICTRE tools based on the duct and terminal duct lobular unit (TDLU) structures, which are well-known sites for lesion formation. The considered tissue types and their unsigned 8-bit integer (uint8) labels are as follows.
Background: 0
Fat: 1
Skin: 2
Glandular: 29
Nipple: 33
Ligament: 88
Terminal duct lobular unit (TDLU): 95
Duct: 125
Artery: 150
Vein: 225
Peripheral angiogenesis: 190
Viable tumor cell: 200
Necrotic core: 210
The functional, optical, and acoustic properties maps were produced using our Python libraries to assign the corresponding properties to each breast tissue type. The optical fluence distributions were simulated using the MCX software [Fang2009], [Yu2018]. The induced initial pressure distributions were calculated via voxelwise multiplication of the optical absorption coefficient distributions and the simulated optical fluence distributions, assuming a constant Grüneisen parameter Γ = 1, as is commonly done for soft tissues. The acoustic measurements were simulated using the k-Wave software [Treeby2010]. Further details on 1) modifications and adaptations of the VICTRE NBPs for use in VITs of OAT and 2) the virtual OAT imaging system and data acquisition are in the accompanying paper [Park2023].
The naming convention for files in this dataset is {breast type}{seed number}{lesion presence}_{contained data}.mat, where:
Breast type: A, B, C, or D
Seed number: a number randomly generated when producing each tissue label map
Lesion presence: absent (healthy, h) or present (l)
Contained data: tissue label map (label), functional properties and chromophore concentrations maps (func), optical properties maps (opt), acoustic properties maps (acou), simulated optical fluence distributions (phi), simulated induced initial pressure distributions (p0), or simulated acoustic measurements (p)
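The naming convention above can be checked mechanically. The following is a minimal sketch of a parser for it; the `PhantomFile` type and `parse_phantom_filename` function are hypothetical helpers, not part of the dataset's own tooling, and the seed number is assumed to be a plain run of digits.

```python
import re
from typing import NamedTuple

# Pattern: {breast type}{seed number}{lesion presence}_{contained data}.mat
_PATTERN = re.compile(
    r"^(?P<breast>[ABCD])(?P<seed>\d+)(?P<lesion>[hl])"
    r"_(?P<content>label|func|opt|acou|phi|p0|p)\.mat$"
)


class PhantomFile(NamedTuple):
    breast_type: str      # 'A', 'B', 'C', or 'D'
    seed: int             # seed used to generate the tissue label map
    lesion_present: bool  # False for 'h' (healthy), True for 'l'
    content: str          # 'label', 'func', 'opt', 'acou', 'phi', 'p0', or 'p'


def parse_phantom_filename(name):
    """Split a dataset file name into its documented components."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a valid phantom file name: {name!r}")
    return PhantomFile(
        breast_type=match["breast"],
        seed=int(match["seed"]),
        lesion_present=(match["lesion"] == "l"),
        content=match["content"],
    )
```

A parser like this makes it straightforward to group the seven file types that belong to one phantom by their shared breast type, seed, and lesion flag.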
For example, the file name of a tissue label map of the type A breast with no lesion inserted (healthy breast) that was created using the seed number 12345678 is A12345678h_label.mat. The actual data and metadata contained in each file are as follows.
{breast type}{seed number}{lesion presence}_label.mat
Data
label: 1360 x 1360 x 680 uint8 tissue label map that includes healthy tissues, viable tumor cell region, and necrotic core
Metadata
origin: 3 x 1 float32 values that specify x, y, and z coordinates of origin, (-85, -85, -85) mm
voxel_size: float32 value that specifies voxel size, 0.125 mm
tissue_type_label: 13 x 2 cell that specifies tissue type names and the corresponding uint8 values in label
breast_type: Breast type, A: Breast is almost entirely fatty; B: Breast has scattered areas of fibroglandular density; C: Breast is heterogeneously dense; or D: Breast is extremely dense
lesion_presence: Lesion presence, h: Lesion-absent numerical breast phantom (healthy) or l: Lesion-present...
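The label-map metadata listed above (origin, voxel size, array shape, and the 13 tissue labels) fixes the geometry of the grid. The sketch below works through that arithmetic. It assumes that `origin` gives the position of voxel index (0, 0, 0) and that the axes are ordered x, y, z; the helper names are illustrative only.

```python
import numpy as np

# Values from the label-file description.
ORIGIN_MM = np.array([-85.0, -85.0, -85.0])
VOXEL_SIZE_MM = 0.125
SHAPE = (1360, 1360, 680)

# Tissue names and uint8 labels as listed in the dataset description.
TISSUE_LABELS = {
    "Background": 0, "Fat": 1, "Skin": 2, "Glandular": 29, "Nipple": 33,
    "Ligament": 88, "Terminal duct lobular unit (TDLU)": 95, "Duct": 125,
    "Artery": 150, "Vein": 225, "Peripheral angiogenesis": 190,
    "Viable tumor cell": 200, "Necrotic core": 210,
}


def voxel_to_mm(index):
    """Map a voxel index (i, j, k) to coordinates in mm (origin at index 0 assumed)."""
    return ORIGIN_MM + VOXEL_SIZE_MM * np.asarray(index, dtype=float)


def grid_extent_mm():
    """Physical size of the grid along each axis, in mm."""
    return VOXEL_SIZE_MM * np.array(SHAPE, dtype=float)


def label_volume_bytes():
    """Memory needed to hold one uint8 label map."""
    return int(np.prod(SHAPE, dtype=np.int64))
```

Under these assumptions the grid spans 170 mm × 170 mm × 85 mm, and a single uncompressed label map occupies about 1.26 GB, which is worth knowing before loading several phantoms at once.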