Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cancer-Net PCa-Data is an open access benchmark dataset of volumetric correlated diffusion imaging (CDIs) data acquisitions of prostate cancer patients. Cancer-Net PCa-Data is a part of the Cancer-Net open source initiative dedicated to advancement in machine learning and imaging research to aid clinicians in the global fight against cancer.
The volumetric CDIs data acquisitions in the Cancer-Net PCa-Data dataset were generated from a cohort of 200 patient cases acquired at Radboud University Medical Centre (Radboudumc) in the Prostate MRI Reference Center in Nijmegen, The Netherlands, and made available as part of the SPIE-AAPM-NCI PROSTATEx Challenges. Masks derived from the PROSTATEx_masks repository are also provided; they label regions of healthy prostate tissue, clinically significant prostate cancer (csPCa), and clinically insignificant prostate cancer (insPCa).
This dataset was used to investigate the relationship between PCa presence and CDIs hyperintensity.
Cancer-Net PCa-Data is released under a CC BY 4.0 license.
Example T2-weighted images of prostates with CDIs overlaid are shown below.
[Image: Grid of T2-weighted MRI images of the prostate with CDIs images overlaid. Source: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F4364336%2Fc312a93e80813c9f4e5e418f1220d4e4%2FPROSTATEx-grid-top100.png?generation=1684256503310308&alt=media]
If you find our work useful for your research, please cite:
@article{Wong2022,
author={Alexander Wong and Hayden Gunraj and Vignesh Sivan and Masoom A. Haider},
title={Synthetic correlated diffusion imaging hyperintensity delineates clinically significant prostate cancer},
journal ={Scientific Reports},
volume={12},
year={2022},
number={3376},
doi={10.1038/s41598-022-06872-7}
}
and
@article{Gunraj2023,
author={Hayden Gunraj and Chi-en Amy Tai and Alexander Wong},
title={Cancer-Net PCa-Data: An Open-Source Benchmark Dataset for Prostate Cancer Clinical Decision Support using Synthetic Correlated Diffusion Imaging Data},
journal ={NeurIPS Workshops},
year={2023}
}
Additionally, please cite the SPIE-AAPM-NCI PROSTATEx Challenges, PROSTATEx_masks, and The Cancer Imaging Archive (TCIA):
@misc{Litjens2017,
author={Geert Litjens and Oscar Debats and Jelle Barentsz and Nico Karssemeijer and Henkjan Huisman},
title={ProstateX Challenge data [data set]},
journal={The Cancer Imaging Archive},
year={2017},
doi={10.7937/K9TCIA.2017.MURS5CL}
}
@article{Litjens2014,
author={Geert Litjens and Oscar Debats and Jelle Barentsz and Nico Karssemeijer and Henkjan Huisman},
title={Computer-Aided Detection of Prostate Cancer in MRI},
journal={IEEE Transactions on Medical Imaging},
year={2014},
volume={33},
number={5},
pages={1083-1092},
doi={10.1109/TMI.2014.2303821}
}
@article{Cuocolo2021,
author={Renato Cuocolo and Arnaldo Stanzione and Anna Castaldo and Davide Raffaele {De Lucia} and Massimo Imbriaco},
title={Quality control and whole-gland, zonal and lesion annotations for the PROSTATEx challenge public dataset},
journal={European Journal of Radiology},
volume={138},
pages={109647},
year={2021},
doi={10.1016/j.ejrad.2021.109647}
}
@article{Clark2013,
author={Kenneth Clark and Bruce Vendt and Kirk Smith and John Freymann and Justin Kirby and Paul Koppel and Stephen Moore and Stanley Phillips and David Maffitt and Michael Pringle and Lawrence Tarbox and Fred Prior},
title={The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository},
journal={Journal of Digital Imaging},
year={2013},
volume={26},
number={6},
pages={1045-1057},
}
https://digital.nhs.uk/about-nhs-digital/terms-and-conditions
Prescription Cost Analysis (PCA) provides details of the number of items and the net ingredient cost (NIC) of all prescriptions dispensed in the community in England. The drugs dispensed are listed by British National Formulary (BNF) therapeutic class. This publication includes data sheets for items dispensed and NIC from 2007 to 2017 at individual presentation level. The Prescribing by Dentists report is no longer published separately in April; however, the data is already included in this publication and is also provided as a separate CSV file containing only the dental data. Please note that the two CSV files (data and Dental data) should not be added together, as the PCA data already includes the Dental data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Some drug names found in the PCA data do not exactly match any drug name in the current British National Formulary (BNF), for example formulation variants that are no longer available. A similar BNF presentation name could sometimes be found by using the “fuzzy” lookup add-on for Excel. These matches were validated manually by a pharmacist.
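The fuzzy lookup step can be illustrated with a minimal sketch. It uses Python's standard `difflib` module rather than the Excel add-on named above, so the similarity scores will not be the same as the ones the authors saw. The function name `suggest_bnf_names` and the drug names are hypothetical, chosen only for illustration. As in the original workflow, the suggestions are candidates for manual review, not confirmed matches.

```python
import difflib

def suggest_bnf_names(pca_name, bnf_names, n=3, cutoff=0.6):
    """Return up to n BNF names most similar to pca_name, best match first."""
    # Compare case-insensitively, but return the original spelling.
    lowered = {name.lower(): name for name in bnf_names}
    hits = difflib.get_close_matches(
        pca_name.lower(), list(lowered), n=n, cutoff=cutoff
    )
    return [lowered[h] for h in hits]

# Hypothetical names, for illustration only.
bnf = [
    "Paracetamol 500mg tablets",
    "Paracetamol 500mg capsules",
    "Ibuprofen 200mg tablets",
]
candidates = suggest_bnf_names("Paracetamol 500mg tabs", bnf)
print(candidates)
```

Raising `cutoff` returns fewer but closer candidates; lowering it widens the list a reviewer has to check.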
This dataset was created by Alex Wolski
Principal component analysis (PCA) of behavioural data across the life stages.
A dataset established in 2017 containing environmental data from the CityScapeLabs research platform in Berlin, Germany. This dataset is a subset of the main CityScapeLabs dataset which was used in the paper titled "Soil physico-chemical properties change across an urbanity gradient in Berlin", currently in review. Data was collected by Lena Fiechter, Moritz von der Lippe and Anne Hiller, with one additional parameter added by James Whitehead. For more details on the CityScapeLab research platform please see: von der Lippe, M.; Buchholz, S.; Hiller, A.; Seitz, B.; Kowarik, I. CityScapeLab Berlin: A Research Platform for Untangling Urbanization Effects on Biodiversity. Sustainability 2020, 12, 2565. https://doi.org/10.3390/su12062565
Eight complexity surfaces (mean depth, standard deviation of depth, curvature, plan curvature, profile curvature, rugosity, slope, and slope of slope) were stacked and exported to create one multi-band image (each band representing a specific metric). This image was transformed into its first three principal components using the "Principal Components Analysis" (PCA) function in ENVI 4.6. The transformation reduced the dimensionality of the dataset by removing information that was redundant among the different bands. The resulting 2 x 2 m resolution, three-band PCA image contains only information that uniquely describes the complexity and structure of the seafloor. Coral reef habitat types were delineated and classified from this PCA image.
Acoustic imagery was acquired for the VICRNM on two separate missions onboard the NOAA ship Nancy Foster. The first mission took place from 2/18/04 to 3/5/04. The second mission took place from 2/1/05 to 2/12/05. On both missions, seafloor depths between 14 and 55 m were mapped using a RESON SeaBat 8101 ER (240 kHz) MBES sensor. This pole-mounted system measured water depths across a 150 degree swath consisting of 101 individual 1.5 degree x 1.5 degree beams. The beams to the port and starboard of nadir (i.e., directly underneath the ship) overlapped adjacent survey lines by approximately 10 m. The vessel survey speed was between 5 and 8 kn. In 2004, the ship's location was determined by a Trimble DSM 132 DGPS system, which provided a RTCM differential data stream from the U.S. Coast Guard Continually Operating Reference Station (CORS) at Port Isabel, Puerto Rico. Gyro, heave, pitch and roll correctors were acquired using an Ixsea Octans gyrocompass. In 2005, the ship's positioning and orientation were determined by the Applanix POS/MV 320 V4, which is a GPS aided Inertial Motion Unit (IMU) providing measurements of roll, pitch and heading.
The POS/MV obtained its positions from two dual frequency Trimble Zephyr GPS antennae. An auxiliary Trimble DSM 132 DGPS system provided a RTCM differential data stream from the U.S. Coast Guard CORS at Port Isabel, Puerto Rico. For both years, CTD (conductivity, temperature and depth) measurements were taken approximately every 4 hours using a Seabird Electronics SBE-19 to correct for the changing sound velocities in the water column. In 2004, raw data were logged in .xtf (extended triton format) using Triton ISIS software 6.2. In 2005, raw data were logged in .gsf (generic sensor format) using SAIC ISS 2000 software. Data from 2004 were referenced to the WGS84 UTM 20 N horizontal coordinate system, and data from 2005 were referenced to the NAD83 UTM 20 N horizontal coordinate system. Data from both projects were referenced to the Mean Lower Low Water (MLLW) vertical tidal coordinate system. The 2004 and 2005 MBES bathymetric data were both corrected for sensor offsets, latency, roll, pitch, yaw, static draft, the changing speed of sound in the water column and the influence of tides in CARIS Hips & Sips 5.3 and 5.4, respectively. The 2004 data were then binned to create a 1 x 1 m raster surface, and the 2005 data were binned to create a 2 x 2 m raster surface. After these final surfaces were created, the datum for the 2004 bathymetric surfaces was transformed from WGS84 to NAD83 using the "Project Raster" function in ArcGIS 9.1. The 2004 surface was transformed so that it would have the same datum as the 2005 surface. The 2004 bathymetric surface was then downsampled from 1 x 1 m to 2 x 2 m using the "Resample" function in ArcGIS 9.1. The 2004 surface was resampled so that it would have the same spatial resolution as the 2005 surface.
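The change of cell size from 1 x 1 m to 2 x 2 m can be sketched as follows. The text does not state which resampling technique was selected in ArcGIS, so this example assumes simple block averaging, which may differ from what the authors used. The function name `downsample_2x2`, the nodata value, and the depth values are hypothetical.

```python
def downsample_2x2(grid, nodata=None):
    """Aggregate a 1 m raster (list of rows) into 2 m cells.

    Each output cell is the mean of the valid values in one 2x2 block.
    A trailing odd row or column is dropped.
    """
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(0, rows - 1, 2):
        out_row = []
        for c in range(0, cols - 1, 2):
            block = [grid[r][c], grid[r][c + 1],
                     grid[r + 1][c], grid[r + 1][c + 1]]
            valid = [v for v in block if v != nodata]
            # A block with no valid soundings stays nodata.
            out_row.append(sum(valid) / len(valid) if valid else nodata)
        out.append(out_row)
    return out

# Hypothetical depths in metres (negative down), with one nodata cell.
depth_1m = [
    [-14.0, -14.2, -15.0, -15.4],
    [-14.4, -14.6, -15.2, -15.6],
    [-20.0, -20.0, -21.0, -9999.0],
    [-20.0, -20.0, -21.0, -21.0],
]
depth_2m = downsample_2x2(depth_1m, nodata=-9999.0)
print(depth_2m)
```

Averaging smooths small features; a nearest-neighbour choice would instead keep original sounding values at the cost of discarding three of every four cells.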
Having the same coordinate systems and spatial resolutions, the final 2004 and 2005 bathymetry rasters were then merged using the Raster Calculator function "Merge" in ArcGIS's Spatial Analyst Extension to create a seamless bathymetry surface for the entire VICRNM area south of St. John. For a complete description of the data acquisition and processing parameters, please see the data acquisition and processing reports (DAPRs) for projects: NF-04-06-VI and NF-05-05-VI (Monaco & Rooney, 2004; Battista & Lazar, 2005).
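The principal component step described at the start of this entry (a stack of eight bands reduced to three) can be sketched with NumPy. This is not the ENVI implementation: it assumes a covariance-based PCA without band standardization, and the synthetic `stack` stands in for the real complexity surfaces. The function name `pca_bands` is hypothetical.

```python
import numpy as np

def pca_bands(stack, n_components=3):
    """PCA over the band axis of a (bands, rows, cols) array.

    Returns the component image (n_components, rows, cols) and the
    fraction of total variance carried by each kept component.
    """
    bands, rows, cols = stack.shape
    pixels = stack.reshape(bands, -1).T.astype(float)  # (n_pixels, bands)
    centered = pixels - pixels.mean(axis=0)
    cov = np.cov(centered, rowvar=False)               # (bands, bands)
    eigvals, eigvecs = np.linalg.eigh(cov)             # ascending order
    order = np.argsort(eigvals)[::-1][:n_components]   # largest first
    scores = centered @ eigvecs[:, order]
    pc_image = scores.T.reshape(n_components, rows, cols)
    explained = eigvals[order] / eigvals.sum()
    return pc_image, explained

# Synthetic data: eight bands mixed from three latent surfaces plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(3, 20, 20))
mixing = rng.normal(size=(8, 3))
stack = np.tensordot(mixing, latent, axes=1)
stack = stack + 0.01 * rng.normal(size=(8, 20, 20))

pc_image, explained = pca_bands(stack)
print(pc_image.shape, explained.sum())
```

Because the bands here have very different units (depth, slope, curvature), standardizing each band before the decomposition is a common alternative; whether ENVI was configured that way is not stated in the text.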
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
413 global import shipment records of PCA with prices, volumes, and current buyer-supplier relationships, based on an actual global export trade database.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Data analyzed in the manuscript "The Association between Frequent Alcohol Drinking and Opioid Consumption after Abdominal Surgery: A Retrospective Analysis"
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comprehensive dataset containing 28 verified PCA locations in the United States with complete contact information, ratings, reviews, and location data.
https://darus.uni-stuttgart.de/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.18419/DARUS-2321
This dataset contains the source code for uncertainty-aware principal component analysis (UA-PCA) and a series of images that show dimensionality reduction plots created with UA-PCA. The software is a JavaScript library for performing principal component analysis and dimensionality reduction on datasets consisting of multivariate probability distributions. Each plot of the image series used UA-PCA to project a dataset consisting of multivariate normal distributions. The covariance matrices of the dataset instances were scaled with different factors resulting in different UA-PCA projections. The projected probability distributions are displayed using isolines of their probability density functions. As the scaling value increases, the projection changes, showing the sensitivity of UA-PCA to changes in variance.
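The sensitivity to covariance scaling described above can be illustrated with a small two-dimensional sketch. It is not the JavaScript library's API. It assumes the standard identity that the covariance of a set of distributions equals the covariance of their means plus the average of their individual covariances, with the latter multiplied by a scaling factor. The function names and the example distributions are hypothetical.

```python
import math

def ua_covariance(means, covs, scale=1.0):
    """Covariance of a set of 2-D distributions.

    Spread of the means (population form) plus the scaled average of
    the per-instance covariance matrices.
    """
    n = len(means)
    mx = sum(m[0] for m in means) / n
    my = sum(m[1] for m in means) / n
    a = sum((m[0] - mx) ** 2 for m in means) / n
    b = sum((m[0] - mx) * (m[1] - my) for m in means) / n
    d = sum((m[1] - my) ** 2 for m in means) / n
    a += scale * sum(c[0][0] for c in covs) / n
    b += scale * sum(c[0][1] for c in covs) / n
    d += scale * sum(c[1][1] for c in covs) / n
    return [[a, b], [b, d]]

def principal_angle(k):
    """Angle in radians of the leading eigenvector of a symmetric 2x2 matrix."""
    return 0.5 * math.atan2(2 * k[0][1], k[0][0] - k[1][1])

# Means spread along x; each instance is uncertain mostly along y.
means = [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0)]
covs = [[[0.1, 0.0], [0.0, 1.0]]] * 3

for s in (0.0, 0.5, 2.0):
    angle = principal_angle(ua_covariance(means, covs, scale=s))
    print(s, round(math.degrees(angle), 1))
```

With small scaling the leading axis follows the spread of the means (the x axis); once the scaled uncertainty dominates, the leading axis turns to the y axis, which is the kind of change in projection the image series shows.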
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
The Cancer Moonshot Biobank is a National Cancer Institute initiative to support current and future investigations into drug resistance and sensitivity and other NCI-sponsored cancer research initiatives, with an aim of improving researchers' understanding of cancer and how to intervene in cancer initiation and progression. During the course of this study, biospecimens (blood and tissue removed during medical procedures) and associated data will be collected longitudinally from at least 1000 patients across at least 10 cancer types, who are receiving standard of care cancer treatment at multiple NCI Community Oncology Research Program (NCORP) sites.
This collection contains de-identified radiology and histopathology imaging procured from subjects in NCI’s Cancer Moonshot Biobank - Prostate Cancer (CMB-PCA) cohort. Associated genomic, phenotypic and clinical data will be hosted by The Database of Genotypes and Phenotypes (dbGaP) and other NCI databases. A summary of Cancer Moonshot Biobank imaging efforts can be found on the Cancer Moonshot Biobank Imaging page.
Previous research shows that each type of cancer can be divided into multiple subtypes, which is one of the key reasons that cancer is difficult to cure. Under these circumstances, finding new target genes for cancer subtypes is of great significance for developing new anti-cancer drugs and personalized treatment. Because gene expression data sets of cancer are usually high-dimensional and noisy and contain information on multiple potential subtypes, many sparse principal component analysis (sparse PCA) methods have been used to identify cancer subtype biomarkers and subtype clusters. However, the existing sparse PCA methods have not used the known cancer subtype information as prior knowledge, and their results are greatly affected by the quality of the samples. Therefore, we propose the Dynamic Metadata Edge-group Sparse PCA (DM-ESPCA) model, which combines the idea of meta-learning to address the problem of sample quality and uses the known cancer subtype information as prior knowledge to capture gene modules with better biological interpretations. The experimental results on three biological data sets showed that the DM-ESPCA model can find potential target gene probes with richer biological information about the cancer subtypes. Moreover, the accuracy of clustering and machine learning classification models based on the target genes screened by the DM-ESPCA model can be improved by up to 22–23% compared with the existing sparse PCA methods. We also showed that the result of the DM-ESPCA model is better than those of four classic supervised machine learning models in the task of classifying cancer subtypes.
LiuChong/INF-PCA-Data dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
108 global export and import shipment records of sodium PCA with prices, volumes, and current buyer-supplier relationships, based on an actual global export trade database.
Background: In microarray data analysis, the comparison of gene-expression profiles with respect to different conditions and the selection of biologically interesting genes are crucial tasks. Multivariate statistical methods have been applied to analyze these large datasets. Less work has been published concerning the assessment of the reliability of gene-selection procedures. Here we describe a method to assess reliability in multivariate microarray data analysis using permutation-validated principal components analysis (PCA). The approach is designed for microarray data with a group structure.
Results: We used PCA to detect the major sources of variance underlying the hybridization conditions followed by gene selection based on PCA-derived and permutation-based test statistics. We validated our method by applying it to well characterized yeast cell-cycle data and to two datasets from our laboratory. We could describe the major sources of variance, select informative genes and visualize the relationship of genes and arrays. We observed differences in the level of the explained variance and the interpretability of the selected genes.
Conclusions: Combining data visualization and permutation-based gene selection, permutation-validated PCA enables one to illustrate gene-expression variance between several conditions and to select genes by taking into account the relationship of between-group to within-group variance of genes. The method can be used to extract the leading sources of variance from microarray data, to visualize relationships between genes and hybridizations and to select informative genes in a statistically reliable manner. This selection accounts for the level of reproducibility of replicates or group structure as well as gene-specific scatter. Visualization of the data can support a straightforward biological interpretation.
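The permutation idea can be sketched for a single gene. This is only an illustration of label permutation with a between-group to within-group variance ratio; the PCA-derived statistics the abstract refers to are not reproduced here, and the function names, statistic, and expression values are assumptions made for the example.

```python
import random

def between_within_ratio(values, labels):
    """Ratio of between-group to within-group sum of squares."""
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    grand = sum(values) / len(values)
    between = sum(len(vs) * (sum(vs) / len(vs) - grand) ** 2
                  for vs in groups.values())
    within = sum((v - sum(vs) / len(vs)) ** 2
                 for vs in groups.values() for v in vs)
    return between / within if within > 0 else float("inf")

def permutation_p_value(values, labels, n_perm=2000, seed=0):
    """Share of label shufflings whose ratio reaches the observed one."""
    rng = random.Random(seed)
    observed = between_within_ratio(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if between_within_ratio(values, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical expression values for two genes across two conditions.
labels = ["a"] * 4 + ["b"] * 4
gene_separated = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
gene_mixed = [1.0, 3.0, 1.2, 3.2, 0.9, 2.9, 1.1, 3.1]

p_separated = permutation_p_value(gene_separated, labels)
p_mixed = permutation_p_value(gene_mixed, labels)
print(p_separated, p_mixed)
```

A gene whose values follow the group structure gets a small p-value, while a gene with the same overall spread but no group signal does not, which is how replicate reproducibility and gene-specific scatter enter the selection.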
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
A phenomenological study of solubility has been conducted using a combination of quantitative structure−property relationship (QSPR) and principal component analysis (PCA). A solubility database of 4540 experimental data points was used, arranged into a matrix of 154 solvents by 397 solutes. A methodology combining QSPR and PCA was developed to predict the missing values and to fill the data matrix. PCA on the resulting filled matrix, where solutes are observations and solvents are variables, shows 92.55% of coverage with three principal components. The corresponding transposed matrix, in which solvents are observations and solutes are variables, showed 62.96% of coverage with four principal components.
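The "coverage" figures above are read here as cumulative explained variance, which is an interpretation rather than something the abstract defines. Under that assumption, coverage for the first k components is the sum of the k largest eigenvalues divided by the sum of all eigenvalues. The sketch below uses hypothetical eigenvalues, not values from the study.

```python
def cumulative_coverage(eigenvalues):
    """Cumulative share of total variance, largest component first."""
    total = sum(eigenvalues)
    running = 0.0
    coverage = []
    for ev in sorted(eigenvalues, reverse=True):
        running += ev
        coverage.append(running / total)
    return coverage

def components_for(eigenvalues, target):
    """Smallest number of components whose coverage reaches target."""
    for k, cov in enumerate(cumulative_coverage(eigenvalues), start=1):
        if cov >= target:
            return k
    return len(eigenvalues)

# Hypothetical eigenvalues, for illustration only.
eigs = [6.0, 2.0, 1.0, 0.5, 0.5]
print(cumulative_coverage(eigs))
print(components_for(eigs, 0.85))
```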
Explore Indian PCA export data with HS codes, pricing, ports, and a verified list of PCA exporters and suppliers from India with complete shipment insights.