90 datasets found

NIH Common Data Elements Repository
catalog.data.gov
datadiscovery.nlm.nih.gov
Updated Jun 19, 2025
Cite
National Library of Medicine (2025). NIH Common Data Elements Repository [Dataset]. https://catalog.data.gov/dataset/nih-common-data-elements-repository-f6b3a
Dataset updated
Jun 19, 2025
Dataset provided by
National Library of Medicine
Description
The NIH Common Data Elements (CDE) Repository has been designed to provide access to structured human and machine-readable definitions of data elements that have been recommended or required by NIH Institutes and Centers and other organizations for use in research and for other purposes. Visit the NIH CDE Resource Portal for contextual information about the repository.
Therapeutics Data Commons (https://tdcommons.ai)
datasetcatalog.nlm.nih.gov
dataverse.harvard.edu
Updated Oct 14, 2020
Cite
Kexin Huang, Wenhao Gao, Yue Zhao, Yusuf Roohani, Jure Leskovec, Connor W. Coley, Cao Xiao, Jimeng Sun, Marinka Zitnik, Tianfan Fu (2020). Therapeutics Data Commons (https://tdcommons.ai) [Dataset]. http://doi.org/10.7910/DVN/21LKWG
Unique identifier
https://doi.org/10.7910/DVN/21LKWG
Dataset updated
Oct 14, 2020
Authors
Kexin Huang, Wenhao Gao, Yue Zhao, Yusuf Roohani, Jure Leskovec, Connor W. Coley, Cao Xiao, Jimeng Sun, Marinka Zitnik, Tianfan Fu
Description
Therapeutics Data Commons (TDC) is an open-science initiative started at Harvard with AI/ML-ready datasets and ML tasks for therapeutics. It provides an ecosystem of tools, leaderboards, and community resources, including data functions, model benchmarking and comparison strategies, meaningful data splits, data processors, public leaderboards, and molecule generation oracles. All resources are integrated and accessible via an open Python library. TDC is available at https://tdcommons.ai.
Historical NCI Genomic Data Commons data (09-14-2017)
zenodo.org
data.niaid.nih.gov
tsv
Updated Jan 24, 2020
Cite
Inge Seim (2020). Historical NCI Genomic Data Commons data (09-14-2017) [Dataset]. http://doi.org/10.5281/zenodo.1186945
Available download formats: tsv
Unique identifier
https://doi.org/10.5281/zenodo.1186945
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo http://zenodo.org/
Authors
Inge Seim
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Historical NCI Genomic Data Commons data (v09-14-2017). Clinical ('phenotype') and gene expression (HTSeq FPKM-UQ).

TCGA-COAD.GDC_phenotype.tsv

dataset: phenotype - Phenotype

cohort: GDC TCGA Colon Cancer (COAD)
dataset ID: TCGA-COAD/Xena_Matrices/TCGA-COAD.GDC_phenotype.tsv
download: https://gdc.xenahubs.net/download/TCGA-COAD/Xena_Matrices/TCGA-COAD.GDC_phenotype.tsv.gz
samples: 570
version: 11-27-2017
hub: https://gdc.xenahubs.net
type of data: phenotype
author: Genomic Data Commons
raw data: https://docs.gdc.cancer.gov/Data/Release_Notes/Data_Release_Notes/#data-release-90
raw data: https://api.gdc.cancer.gov/data/
input data format: ROWs (samples) x COLUMNs (identifiers) (i.e. clinicalMatrix)
570 samples x 151 identifiers
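As a rough illustration of the clinicalMatrix layout described above (one row per sample, one column per identifier), the sketch below reads a gzipped, tab-separated file from a local path. The function name `read_clinical_matrix` and any column names used when calling it are hypothetical; the real file's header is not shown in this listing, so the code assumes only that the first column holds the sample ID.

```python
import csv
import gzip


def read_clinical_matrix(path):
    """Read a gzipped TSV laid out as rows = samples, columns = identifiers.

    Assumption: the first column of the header names the sample-ID column.
    Returns (identifier_names, {sample_id: {identifier: value}}).
    """
    with gzip.open(path, mode="rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        identifiers = header[1:]
        # Map every sample ID to its identifier/value pairs; skip empty rows.
        samples = {
            row[0]: dict(zip(identifiers, row[1:]))
            for row in reader
            if row
        }
    return identifiers, samples
```

For the phenotype file above, one would expect `len(samples)` to match the listed sample count, which makes a quick sanity check after download.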

TCGA-COAD.htseq_fpkm-uq.tsv

dataset: gene expression RNAseq - HTSeq - FPKM-UQ

cohort: GDC TCGA Colon Cancer (COAD)
dataset ID: TCGA-COAD/Xena_Matrices/TCGA-COAD.htseq_fpkm-uq.tsv
download: https://gdc.xenahubs.net/download/TCGA-COAD/Xena_Matrices/TCGA-COAD.htseq_fpkm-uq.tsv.gz
samples: 512
version: 09-14-2017
hub: https://gdc.xenahubs.net
type of data: gene expression RNAseq
unit: log2(fpkm-uq+1)
platform: Illumina
ID/Gene Mapping: https://gdc.xenahubs.net/download/probeMaps/gencode.v22.annotation.gene.probeMap.gz
author: Genomic Data Commons
raw data: https://docs.gdc.cancer.gov/Data/Release_Notes/Data_Release_Notes/#data-release-80
raw data: https://api.gdc.cancer.gov/data/
wrangling: Data from the same sample but from different vials/portions/analytes/aliquots is averaged; data from different samples is combined into genomicMatrix; all data is then log2(x+1) transformed.
input data format: ROWs (identifiers) x COLUMNs (samples) (i.e. genomicMatrix)
60,484 identifiers x 512 samples
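The wrangling note above describes two steps: replicate measurements for one sample are averaged, and the result is log2(x+1) transformed. A minimal sketch of that arithmetic follows, assuming plain lists of FPKM-UQ values. The function names are illustrative and not taken from the Xena or GDC pipelines.

```python
import math


def average_then_log2(aliquot_values):
    """Average replicate values for one sample, then apply log2(x + 1)."""
    mean = sum(aliquot_values) / len(aliquot_values)
    return math.log2(mean + 1)


def to_linear(log_value):
    """Invert log2(x + 1) to recover the linear-scale value."""
    return 2 ** log_value - 1


# Two hypothetical aliquots of one sample: mean is 3.0, log2(3.0 + 1) is 2.0.
transformed = average_then_log2([1.0, 5.0])
```

The inverse is handy when a downstream method expects linear-scale FPKM-UQ rather than the log-scale values stored in the matrix.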
Common Metadata Elements for Cataloging Biomedical Datasets
figshare.com
xlsx
Updated Jan 20, 2016
Cite
Kevin Read (2016). Common Metadata Elements for Cataloging Biomedical Datasets [Dataset]. http://doi.org/10.6084/m9.figshare.1496573.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.1496573.v1
Dataset updated
Jan 20, 2016
Dataset provided by
Figshare http://figshare.com/
Authors
Kevin Read
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset outlines a proposed set of core, minimal metadata elements that can be used to describe biomedical datasets, such as those resulting from research funded by the National Institutes of Health. It can inform efforts to better catalog or index such data to improve discoverability. The proposed metadata elements are based on an analysis of the metadata schemas used in a set of NIH-supported data sharing repositories. Common elements from these data repositories were identified, mapped to existing data-specific metadata standards from existing multidisciplinary data repositories (DataCite and Dryad), and compared with metadata used in MEDLINE records to establish a sustainable and integrated metadata schema. From the mappings, we developed a preliminary set of minimal metadata elements that can be used to describe NIH-funded datasets. Please see the readme file for more details about the individual sheets within the spreadsheet.
NIH Common Data Elements Repository - ic3x-2s7m - Archive Repository
healthdata.gov
csv, xlsx, xml
Updated Jul 16, 2025
Cite
(2025). NIH Common Data Elements Repository - ic3x-2s7m - Archive Repository [Dataset]. https://healthdata.gov/w/9rjf-x4nc/default?cur=wG4qu23M_S7&from=2KYT7QcwQ96
Available download formats: xml, csv, xlsx
Dataset updated
Jul 16, 2025
Description
This dataset tracks the updates made on the dataset "NIH Common Data Elements Repository" as a repository for previous versions of the data and metadata.
Data from: Uncommon Commons? Creative Commons licencing in Horizon 2020 Data...
data-staging.niaid.nih.gov
data.niaid.nih.gov
Updated Jun 22, 2022
Cite
Daniel Spichtinger (2022). Uncommon Commons? Creative Commons licencing in Horizon 2020 Data Management Plans [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_6685130
Dataset updated
Jun 22, 2022
Dataset provided by
independent researcher / Ludwig Boltzmann Gesellschaft
Authors
Daniel Spichtinger
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
As policies, good practices and funder mandates on research data management evolve, more emphasis has been put on the licencing of data. Licencing information allows potential re-users to quickly identify what they can do with the data in question and is therefore an important component in ensuring the reusability of research.

In my research I analyse a pre-existing collection of 840 Horizon 2020 public data management plans (DMPs) available on the repository of the University of Vienna, Phaidra, to determine which ones mention Creative Commons licences and, among those that do, which licences are being used.

This Excel file contains the data underlying the publication "Uncommon Commons? Creative Commons licencing in Horizon 2020 Data Management Plans".

Sheet 1 contains the data collected in the previous "Data Re-Use" project: 840 DMPs downloaded from CORDIS and vetted to ensure they are public documents and not copyrighted

Sheet 2 contains the same data as sheet 1, with columns D to Q not visible (for better reading) but an added column R which now contains the CC licencing information (where available)

Sheet 3 is filtered so that only the projects containing CC BY relevant licencing are shown

Sheet 4 is filtered so that only the projects containing CC-BY-SA relevant licencing are shown

Sheet 5 is filtered so that only the projects containing CC-BY-NC relevant licencing are shown

Sheet 6 is filtered so that only the projects containing CC-BY-ND relevant licencing are shown

Sheet 7 is filtered so that only the projects containing CC-BY-NC-ND relevant licencing are shown

Sheet 8 is filtered so that only the projects containing CC-BY-NC-SA relevant licencing are shown

Sheet 9 is filtered so that only the projects containing CC0 relevant information are shown

Sheet 10 provides an overview table of the relevant licences (manual entry)

Sheets 11 and 12 contain graphic visualisations of the data as used in the article
Data from: Microplitis demolitor Official Gene Set micdem_OGSv1.0
agdatacommons.nal.usda.gov
application/x-gzip
Updated Nov 21, 2025
Cite
Kelly Tims; Gaelen Burke (2025). Microplitis demolitor Official Gene Set micdem_OGSv1.0 [Dataset]. http://doi.org/10.15482/USDA.ADC/1521095
Available download formats: application/x-gzip
Unique identifier
https://doi.org/10.15482/USDA.ADC/1521095
Dataset updated
Nov 21, 2025
Dataset provided by
Ag Data Commons
Authors
Kelly Tims; Gaelen Burke
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Microplitis demolitor (Hymenoptera: Braconidae) is a parasitoid used as a biological control agent to control larval-stage Lepidoptera and serves as a model for studying the function and evolution of symbiotic viruses in the genus Bracovirus. This dataset presents the Microplitis demolitor Official Gene Set (OGS) v1.0. The OGS integrates automatic gene predictions from the NCBI RefSeq gene set, NCBI Microplitis demolitor Annotation Release 101 (https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Microplitis_demolitor/101/), with manual annotations by the research community, performed via the Apollo manual curation software (https://zenodo.org/record/1295754#.YDgLyJNKivg). Manual annotations were QC'd via the GFF3toolkit (https://github.com/NAL-i5K/gff3toolkit) and NCBI's table2asn_GFF software (https://ftp.ncbi.nih.gov/toolbox/ncbi_tools/converters/by_program/table2asn_GFF/), and merged with NCBI Microplitis demolitor Annotation Release 101 via the GFF3toolkit (https://github.com/NAL-i5K/gff3toolkit).

Resources in this dataset:

Resource Title: Microplitis demolitor Official Gene Set micdem_OGSv1.0.
File Name: micdem_OGSv1.0.tar.gz
Resource Description: This directory contains files for the Official Gene Set 1.0 for Microplitis demolitor (micdem_OGSv1.0). The general procedure for generating this OGS is outlined here: https://github.com/NAL-i5K/GFF3toolkit/. QC of community-curated models from the Apollo software was performed by NAL staff using the GFF3toolkit function gff3_QC, and errors were fixed using gff3_fix. OGSv1.0 was generated by merging NCBI-RefSeq's gene set NCBI Microplitis demolitor Annotation Release 101 (https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Microplitis_demolitor/101/) with the QC'd and error-corrected community-curated models, and generating i5k Workspace IDs for all manually annotated features.

1) Fasta files
- Protein Sequences: micdem_OGSv1.0_pep.fa
- Coding Sequences (CDS): micdem_OGSv1.0_CDS.fa
- Transcript Sequences (includes non-coding sequence): micdem_OGSv1.0_trans.fa

2) Gff3 file: micdem_OGSv1.0.gff

3) Mapping file between Gene set NCBI Microplitis demolitor Annotation Release 101 and OGSv1.0: ID_map_report.txt
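For readers who want to inspect the Gff3 file listed above, the sketch below parses a single GFF3 record into its nine standard tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes). It is a minimal illustration and not part of the GFF3toolkit; the names `Gff3Feature` and `parse_gff3_line` are hypothetical, and it does not decode percent-escaped characters in attribute values.

```python
from typing import NamedTuple, Optional


class Gff3Feature(NamedTuple):
    seqid: str
    source: str
    type: str
    start: int
    end: int
    score: str
    strand: str
    phase: str
    attributes: dict


def parse_gff3_line(line: str) -> Optional[Gff3Feature]:
    """Parse one GFF3 line; return None for blank lines and '#' lines."""
    if not line.strip() or line.startswith("#"):
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    # Column 9 holds semicolon-separated key=value pairs.
    attributes = dict(
        pair.split("=", 1) for pair in cols[8].split(";") if "=" in pair
    )
    return Gff3Feature(
        cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
        cols[5], cols[6], cols[7], attributes,
    )
```

Iterating over the file line by line and keeping only records whose `type` is `gene` would give a quick count of annotated genes to compare against the mapping file.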
Blog | Common Credits Model
catalog.data.gov
datasets.ai
Updated Mar 26, 2025
Cite
National Institutes of Health (2025). Blog | Common Credits Model [Dataset]. https://catalog.data.gov/dataset/blog-common-credits-model
Dataset updated
Mar 26, 2025
Dataset provided by
National Institutes of Health
Description
This blog post was published on November 13, 2015, and was written by George Komatsoulis. It is a cross-post from the NIH's Data Science blog: https://datascience.nih.gov/blog.
Research Software Communities Global South
data.niaid.nih.gov
data-staging.niaid.nih.gov
Updated Oct 11, 2022
Cite
Martinez, Paula Andrea (2022). Research Software Communities Global South [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_7179806
Dataset updated
Oct 11, 2022
Dataset provided by
Australian Research Data Commons
Authors
Martinez, Paula Andrea
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Research Software Alliance's (ReSA) mission is to bring research software communities together to collaborate on the advancement of research software. Given the ReSA mission, it is important to understand the landscape of communities involved with research software. In 2020, ReSA completed an initial exercise to scope the international research software community landscape. This work was reported by ReSA's Software Landscape Analysis task force via a blog post. The majority of the communities in the previous analysis represented the global north. To improve the extent of this landscape analysis, ReSA announced a paid opportunity for short-term contractors located in the global south to collect data on communities and funders in their region in early 2022. This document describes how the work was undertaken, a summary of findings, the gaps and opportunities perceived by the data collectors and some highlights. This work identified 126 organisations and communities and 62 funder bodies that support research software in the global south. Their main activities are connecting people, training, and networking, and support through research grants.

To add to this list of communities, please fill in the following form: https://forms.gle/KJE9vkBnM6vhh7cEA
Data from: Bacterial communities and prevalence of antibiotic resistance...
agdatacommons.nal.usda.gov
datasets.ai
xlsx
Updated Nov 21, 2025
Cite
Saraswoti Neupane; Justin L. Talley; David B. Taylor; Dana Nayduch (2025). Data from: Bacterial communities and prevalence of antibiotic resistance genes carried within house flies (Diptera: Muscidae) associated with beef and dairy cattle farms [Dataset]. http://doi.org/10.15482/USDA.ADC/1529546
Available download formats: xlsx
Unique identifier
https://doi.org/10.15482/USDA.ADC/1529546
Dataset updated
Nov 21, 2025
Dataset provided by
Ag Data Commons
Authors
Saraswoti Neupane; Justin L. Talley; David B. Taylor; Dana Nayduch
License
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
Description
House flies (Musca domestica L.) are vectors of human and animal pathogens at livestock operations. Microbial communities in flies are acquired from, and correlate with, their local environment. However, variation among microbial communities carried by flies from farms in different geographical areas is not well understood. We characterized bacterial communities of female house flies collected from beef and dairy farms in Oklahoma, Kansas, and Nebraska and further evaluated the prevalence of antibiotic resistance genes in bacteria within flies. We evaluated the influence of farm type and farm location on bacterial communities, diversity, pathogenic bacteria strains and prevalence of antibiotic resistance genes. These data can be used for better understanding of abundance and prevalence of bacterial communities in house flies associated with livestock operations. These data were collected in September 2019. Abbreviations used include Operational Taxonomic Units (OTUs), Canonical Correspondence Analysis (CCA), Infectious Bovine Keratoconjunctivitis (IBK), Antimicrobial Resistance (AMR), and Antibiotic Resistance Genes (ARGs).
The raw Illumina MiSeq sequence data for this project can be found here: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA863664

Resources in this dataset:

Resource title: Metadata for Microbiome of House Fly Associated with Cattle Farms
File name: Metadata for Microbiome of House Fly Associated with Cattle Farms.xlsx
Resource description: This spreadsheet links the raw sequence reads on NCBI with data on farm type, farm location and sample type.
Data from: Global scientific research commons under the Nagoya Protocol:...
data.niaid.nih.gov
data-staging.niaid.nih.gov
Updated Aug 4, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dedeurwaerdere, Tom (2024). Global scientific research commons under the Nagoya Protocol: Towards a collaborative economy model for the sharing of basic research assets [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_47397
Dataset updated
Aug 4, 2024
Dataset provided by
Université catholique de Louvain
Authors
Dedeurwaerdere, Tom
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This paper aims to get a better understanding of the motivational and transaction cost features of building global scientific research commons, with a view to contributing to the debate on the design of appropriate policy measures under the recently adopted Nagoya Protocol. For this purpose, the paper analyses the results of a world-wide survey of managers and users of microbial culture collections, which focused on the role of social and internalized motivations, organizational networks and external incentives in promoting the public availability of upstream research assets. Overall, the study confirms the hypotheses of the social production model of information and shareable goods, but it also shows the need to complete this model. For the sharing of materials, the underlying collaborative economy in excess capacity plays a key role in addition to the social production, while for data, competitive pressures amongst scientists tend to play a bigger role.
Common data operations expressed as MLMs.
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Dec 2, 2019
Cite
Nehorai, Arye; La Rosa, Patricio S.; Cawi, Eric (2019). Common data operations expressed as MLMs. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000127140
Dataset updated
Dec 2, 2019
Authors
Nehorai, Arye; La Rosa, Patricio S.; Cawi, Eric
Description
Common data operations expressed as MLMs.
Industrial Ecology Data Commons (iedc) December 2024 update
zenodo.org
data.niaid.nih.gov
Updated Nov 26, 2024
Cite
Stefan Pauliuk (2024). Industrial Ecology Data Commons (iedc) December 2024 update [Dataset]. http://doi.org/10.5281/zenodo.14217322
Unique identifier
https://doi.org/10.5281/zenodo.14217322
Dataset updated
Nov 26, 2024
Dataset provided by
Zenodo http://zenodo.org/
Authors
Stefan Pauliuk
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Nov 25, 2024
Description
The Industrial Ecology Data Commons (iedc) is a database that contains more than 200 IE-related datasets from the literature, including stocks, flows, process descriptions, IO tables, material composition of products, and many more. Launched in 2018, the iedc is continuously improved and expanded.

The homepage of the project is https://www.database.industrialecology.uni-freiburg.de/

This Zenodo backup contains a .zip file with 156 parameter templates (xlsx), which were all uploaded to the iedc (SQL database) and are available online.

This backup is for archiving the intermediate step between raw data and uploaded data.

It contains all data that were gathered up to and including November 2024, except for those data that were uploaded directly via Python scripts from other sources (like .csv) and not via the xlsx templates.
GTEx: DICOM converted whole slide hematoxylin and eosin stained images from...
zenodo.org
bin
Updated Sep 19, 2025
Cite
David Clunie; William Clifford; David Pot; Ulrike Wagner; Erika Kim; Granger Sutton; Andrey Fedorov (2025). GTEx: DICOM converted whole slide hematoxylin and eosin stained images from the Genotype-Tissue Expression (GTEx) Project [Dataset]. http://doi.org/10.5281/zenodo.11099100
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.11099100
Dataset updated
Sep 19, 2025
Dataset provided by
Zenodo http://zenodo.org/
Authors
David Clunie; William Clifford; David Pot; Ulrike Wagner; Erika Kim; Granger Sutton; Andrey Fedorov
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset corresponds to a collection of images and/or image-derived data available from National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using IDC Portal here: GTEx. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below.

Collection description

The Genotype-Tissue Expression (GTEx) Project established a data resource and tissue bank to study the relationship between genetic variants and gene expression in multiple human tissues and across individuals. The project included contributions from numerous groups with diverse expertise in biospecimen collection and processing, pathology review, molecular analysis, and data management. The contributors are collectively called the GTEx Consortium.

GTEx collected a total of 26,468 unique tissue samples from 50+ different tissue types, from 956 healthy postmortem donors. The standardized biospecimen collection and analysis practices applied during the study served to minimize preanalytical variability associated with specimen-related factors and their potential impact on analytic endpoints. Each GTEx tissue was divided into two tissue blocks, one for histology and one for molecular analysis; both tissue blocks were preserved in PAXgene Tissue Fixative (Qiagen) solution for 6 to 24 hours, followed by PAXgene Tissue Stabilizer (Qiagen) as specified in the project-specific standard operating procedures. Tissue blocks were processed and embedded in paraffin at the GTEx central repository at the Van Andel Institute (MI) and hematoxylin and eosin–stained slides were generated from all GTEx donors. Digitally scanned whole slide images of PAXgene-fixed/stabilized, paraffin-embedded tissue sections were created using Aperio Scanscope software (Leica Biosystems). The digital images were then reviewed and annotated by one of four board-certified pathologists assigned to the GTEx study. There are a total of 25,503 digital histology images in the GTEx collection.

GTEx was supported by the NIH Common Fund (2010 – 2019). Additional resources include the GTEx Biobank, the GTEx Portal, and the full dataset at dbGaP (accession number phs000424).

Please refer to the listed GTEx publications below for more details [2-7].

Files included

A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.

gtex-idc_v19-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets

gtex-idc_v19-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets

gtex-idc_v19-dcf.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)

Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while -gcs.s5cmd reference files in Google Cloud Storage. The actual files are identical and are mirrored between AWS and GCP.
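The naming convention described above can be checked mechanically. The sketch below splits a manifest file name into collection ID, IDC release number, and storage kind. The regular expression is inferred only from the three file names listed in this record, so treat it as an assumption, not an official IDC specification; `parse_manifest_name` is a hypothetical helper.

```python
import re

# Pattern inferred from names such as "gtex-idc_v19-aws.s5cmd" and
# "gtex-idc_v19-dcf.dcf"; an assumption, not an official IDC rule.
MANIFEST_RE = re.compile(
    r"^(?P<collection>.+)-idc_v(?P<release>\d+)-(?P<kind>aws|gcs|dcf)"
    r"\.(?:s5cmd|dcf)$"
)


def parse_manifest_name(name):
    """Split a manifest file name into collection, release, and kind."""
    match = MANIFEST_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised manifest name: {name!r}")
    return {
        "collection": match["collection"],
        "release": int(match["release"]),
        "kind": match["kind"],
    }


info = parse_manifest_name("gtex-idc_v19-aws.s5cmd")
# info["release"] is 19 and info["kind"] is "aws"
```

Because the AWS and GCS copies are described as identical mirrors, the `kind` field only needs to influence which bucket is used, not which files are expected.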

Download instructions

Each of the manifests includes instructions in the header on how to download the included files.

To download the files using .s5cmd manifests:

1) Install the idc-index package: pip install --upgrade idc-index

2) Download the files referenced by the manifests included in this dataset by passing the .s5cmd manifest file: idc download manifest.s5cmd

To download the files using .dcf manifest, see manifest header.
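The two .s5cmd steps above can also be scripted. The sketch below only assembles the two commands quoted in this record (`pip install --upgrade idc-index` and `idc download <manifest>`) and prints them unless asked to execute. Invoking pip through `sys.executable -m pip` is a substitution made here so the package lands in the running interpreter's environment; the helper names are hypothetical.

```python
import subprocess
import sys


def build_commands(manifest_path):
    """Return the install and download commands as argument lists."""
    return [
        # Equivalent to "pip install --upgrade idc-index" for this interpreter.
        [sys.executable, "-m", "pip", "install", "--upgrade", "idc-index"],
        # The download command as given in the instructions above.
        ["idc", "download", manifest_path],
    ]


def run(manifest_path, dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    for command in build_commands(manifest_path):
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)
```

Calling `run("gtex-idc_v19-aws.s5cmd")` prints both commands for review; passing `dry_run=False` runs them and stops at the first failure because of `check=True`.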

Acknowledgments

Please acknowledge the GTEx Consortium in any published work that includes the images. A sample statement for the acknowledgment of the Genotype-Tissue Expression (GTEx) Project dataset(s) follows.

The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health (commonfund.nih.gov/GTEx). Additional funds were provided by the NCI, NHGRI, NHLBI, NIDA, NIMH, and NINDS. Donors were enrolled at Biospecimen Source Sites funded by NCI/Leidos Biomedical Research, Inc. subcontracts to the National Disease Research Interchange (10XS170), Roswell Park Cancer Institute (10XS171), and Science Care, Inc. (X10S172). The Laboratory, Data Analysis, and Coordinating Center (LDACC) was funded through a contract (HHSN268201000029C) to the Broad Institute of MIT and Harvard. Biorepository operations were funded through a Leidos Biomedical Research, Inc. subcontract to Van Andel Research Institute (10ST1035). Additional data repository and project management were provided by Leidos Biomedical Research, Inc. (HHSN261200800001E). The Brain Bank was supported with supplements to University of Miami grant DA006227. Statistical Methods development grants were made to the University of Geneva (MH090941 & MH101814), the University of Chicago (MH090951, MH090937, MH101825, & MH101820), the University of North Carolina - Chapel Hill (MH090936), North Carolina State University (MH101819), Harvard University (MH090948), Stanford University (MH101782), Washington University (MH101810), and to the University of Pennsylvania (MH101822).

Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.

References

[1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W. L., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National cancer institute imaging data commons: Toward transparency, reproducibility, and scalability in imaging artificial intelligence. Radiographics 43, (2023).

[2] Sobin, L., Barcus, M., Branton, P. A., Engel, K. B., Keen, J., Tabor, D., Ardlie, K. G., Greytak, S. R., Roche, N., Luke, B., Vaught, J., Guan, P. & Moore, H. M. Histologic and quality assessment of Genotype-Tissue Expression (GTEx) research samples: A large postmortem tissue collection. Arch. Pathol. Lab. Med. (2024). doi:10.5858/arpa.2023-0467-OA

[3] GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 45, 580–585 (2013).

[4] GTEx Consortium. Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348, 648–660 (2015).

[5] GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).

[6] Carithers, L. J., Ardlie, K., Barcus, M., Branton, P. A., Britton, A., Buia, S. A., Compton, C. C., DeLuca, D. S., Peter-Demchok, J., Gelfand, E. T., Guan, P., Korzeniewski, G. E., Lockhart, N. C., Rabiner, C. A., Rao, A. K., Robinson, K. L., Roche, N. V., Sawyer, S. J., Segrè, A. V., Shive, C. E., Smith, A. M., Sobin, L. H., Undale, A. H., Valentino, K. M., Vaught, J., Young, T. R., Moore, H. M. & GTEx Consortium. A novel approach to high-quality postmortem tissue procurement: The GTEx project. Biopreserv. Biobank. 13, 311–319 (2015).

[7] Branton, P. A., Sobin, L., Barcus, M., Engel, K. B., Greytak, S. R., Guan, P., Vaught, J. & Moore, H. M. Notable histologic findings in a ‘normal’ cohort: The National Institutes of Health Genotype-Tissue Expression (GTEx) project. Arch. Pathol. Lab. Med. (2024). doi:10.5858/arpa.2023-0468-OA
Data From: Habitat type and host grazing regimen influence the soil...
agdatacommons.nal.usda.gov
datasetcatalog.nlm.nih.gov
xlsx
Updated Nov 21, 2025
Cite
Saraswoti Neupane; Travis Davis; Dana Nayduch; Bethany Mcgregor (2025). Data From: Habitat type and host grazing regimen influence the soil microbial diversity and communities within potential biting midge larval habitats [Dataset]. http://doi.org/10.15482/USDA.ADC/1528782
Available download formats: xlsx
Unique identifier
https://doi.org/10.15482/USDA.ADC/1528782
Dataset updated
Nov 21, 2025
Dataset provided by
Ag Data Commons
Authors
Saraswoti Neupane; Travis Davis; Dana Nayduch; Bethany Mcgregor
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Culicoides biting midges are important vectors of diverse microbes such as viruses, protozoa, and nematodes that cause diseases in wild and domestic animals. However, little is known about the role of microbial communities in midge larval habitat utilization in the wild. In this study, we characterized microbial communities (bacterial, protistan, fungal and metazoan) in soils from disturbed (bison and cattle grazed) and undisturbed (non-grazed) pond and spring potential midge larval habitats. We evaluated the influence of habitat and grazing disturbance and their interaction on microbial communities, diversity, presence of midges, and soil properties. These data can be used to better understand environmental microbial communities in tallgrass prairie ecosystems associated with grazed versus ungrazed pond and spring habitats and to draw inferences on the interactions of these communities and soil properties with the presence of biting midge larvae. These data should not be used to make inferences for ecosystems other than tallgrass prairie, for animal management methods other than open cow-calf or bison grazing (such as feedlots, dairies, or stockyards), or for other grazing mammals (such as sheep or goats). These data were collected between the months of September and December and therefore are not representative of microbial communities present from January through August. Abbreviations used include Total Carbon (TC), Total Nitrogen (TN), Organic Matter (OM), Konza Prairie Biological Station (KPBS), Operational Taxonomic Unit (OTU), Principal Coordinates Analysis (PCoA), ribosomal RNA (rRNA), and vesicular stomatitis virus (VSV). The raw Illumina MiSeq sequence data for this project can be found here: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA862140 Resources in this dataset:

Resource Title: Metadata for Midge Larval Habitat Soil Microbiome
File Name: Metadata for NCBI Accession PRJNA862140.xlsx
Resource Description: This spreadsheet links the raw sequence reads on NCBI with data on the presence/absence of Culicoides midges and soil chemistry data (% total soil nitrogen, % total soil carbon, and % organic matter).
DICOM converted Slide Microscopy images for the TCGA-READ collection
zenodo.org
bin
Updated Aug 20, 2024
+ more versions
David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov (2024). DICOM converted Slide Microscopy images for the TCGA-READ collection [Dataset]. http://doi.org/10.5281/zenodo.12689999
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.12689999
Dataset updated
Aug 20, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov
License
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Description
This dataset corresponds to a collection of images and/or image-derived data available from National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using IDC Portal here: TCGA-READ. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below.

Collection description

The Cancer Genome Atlas-Rectum Adenocarcinoma (TCGA-READ) data collection is part of a larger effort to enhance the TCGA http://cancergenome.nih.gov/ data set with characterized radiological images. The Cancer Imaging Program (CIP), with the cooperation of several TCGA tissue-contributing institutions, has archived a large portion of the radiological images of the genetically-analyzed READ cases.

Please see the TCGA-READ wiki page to learn more about the images and to obtain any supporting metadata for this collection.

Files included

A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.

tcga_read-idc_v8-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets

tcga_read-idc_v8-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets

tcga_read-idc_v8-dcf.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)

Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while those that end in -gcs.s5cmd reference files stored in Google Cloud Storage buckets. The actual files are identical and are mirrored between AWS and GCP.
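
The naming convention described above can be checked programmatically. The following is a minimal sketch, not part of the IDC tooling: the helper name parse_manifest_name, the ManifestInfo type, and the regular expression are illustrative assumptions based only on the file names listed in this record.

```python
import re
from typing import NamedTuple

# Assumed pattern, inferred from the names above:
#   <collection_id>-idc_v<release>-<aws|gcs>.s5cmd
MANIFEST_RE = re.compile(
    r"^(?P<collection>.+)-idc_v(?P<release>\d+)-(?P<cloud>aws|gcs)\.s5cmd$"
)


class ManifestInfo(NamedTuple):
    collection_id: str  # e.g. "tcga_read"
    idc_release: int    # IDC data release in which this version first appeared
    cloud: str          # "aws" or "gcs"


def parse_manifest_name(filename: str) -> ManifestInfo:
    """Split an .s5cmd manifest file name into its three components."""
    match = MANIFEST_RE.match(filename)
    if match is None:
        raise ValueError(f"not an .s5cmd manifest name: {filename!r}")
    return ManifestInfo(
        match["collection"], int(match["release"]), match["cloud"]
    )


if __name__ == "__main__":
    print(parse_manifest_name("tcga_read-idc_v8-aws.s5cmd"))
```

The .dcf Gen3 manifest does not follow the .s5cmd pattern, so this sketch rejects it with a ValueError rather than guessing at its structure.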

Download instructions

Each of the manifests includes instructions in the header on how to download the included files.

To download the files using .s5cmd manifests:

install idc-index package: pip install --upgrade idc-index

download the files referenced by manifests included in this dataset by passing the .s5cmd manifest file: idc download manifest.s5cmd.

To download the files using .dcf manifest, see manifest header.

Acknowledgments

Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.

References

[1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence. RadioGraphics (2023). https://doi.org/10.1148/rg.230180
CWL run of Alignment Workflow (CWLProv 0.6.0 Research Object)
data.mendeley.com
data.niaid.nih.gov
+3more
Updated Dec 4, 2018
Farah Zaib Khan (2018). CWL run of Alignment Workflow (CWLProv 0.6.0 Research Object) [Dataset]. http://doi.org/10.17632/6wtpgr3kbj.1
Explore at:
Unique identifier
https://doi.org/10.17632/6wtpgr3kbj.1
Dataset updated
Dec 4, 2018
Authors
Farah Zaib Khan
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The CWL alignment workflow included in this case study is designed by Data Biosphere. It adapts the alignment pipeline originally developed at Abecasis Lab, The University of Michigan. This workflow is part of the NIH Data Commons initiative and comprises four stages. The first step, "Pre-align", accepts a Compressed Alignment Map (CRAM) file (a compressed format for BAM files developed by the European Bioinformatics Institute (EBI)) and the human genome reference sequence as input and, using SAMtools utilities such as view, sort and fixmate, returns a list of fastq files that can be used as input for the next step. The next step, "Align", also accepts the human reference genome as input, along with the output files from "Pre-align", and uses BWA-mem to generate aligned reads as BAM files. SAMBLASTER is used to mark duplicate reads and SAMtools view to convert read files from SAM to BAM format. The BAM files generated after "Align" are sorted with SAMtools sort. Finally, these sorted alignment files are merged to produce a single sorted BAM file using SAMtools merge in the "Post-align" step.

This dataset folder is a CWLProv Research Object that captures the Common Workflow Language execution provenance, see https://w3id.org/cwl/prov/0.6.0 or use https://pypi.org/project/cwlprov/ to explore
COVID information commons archive
search.dataone.org
data.niaid.nih.gov
+1more
Updated Aug 15, 2024
Florence Hudson; Ryan Scherle; Lauren Close; Varalika Mahajan; Benjamin Sango; Helen Yang; Haleigh Stewart; Sven Johnson; Karl Ragnauth; Katie Naum; Rene Baston (2024). COVID information commons archive [Dataset]. http://doi.org/10.5061/dryad.37pvmcvqp
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.37pvmcvqp
Dataset updated
Aug 15, 2024
Dataset provided by
Dryad Digital Repository
Authors
Florence Hudson; Ryan Scherle; Lauren Close; Varalika Mahajan; Benjamin Sango; Helen Yang; Haleigh Stewart; Sven Johnson; Karl Ragnauth; Katie Naum; Rene Baston
Time period covered
Jan 1, 2023
Description
The COVID Information Commons (CIC) is an open website portal and community to facilitate knowledge-sharing and collaboration across various COVID research efforts, funded by the NSF Convergence Accelerator and the NSF Technology, Innovation and Partnerships Directorate. The CIC serves as an open resource for researchers, students, and decision-makers from academia, government, not-for-profits and industry to identify collaboration opportunities, to leverage each other's research findings, and to accelerate the most promising research to mitigate the broad societal impacts of the COVID-19 pandemic. The CIC was developed as a collaborative proposal led by the Northeast Big Data Innovation Hub, hosted by Columbia University, in collaboration with the Midwest Big Data Innovation Hub, South Big Data Innovation Hub, and West Big Data Innovation Hub. It was funded by the NSF Convergence Accelerator (NSF #2028999) in May 2020 and launched in July 2020. The initial focus of the CIC website ...

The NSF and NIH funded COVID related awards corpus in the CIC was collected primarily from NSF and NIH via APIs. Further information has been collected directly from researchers, who filled out an online form to enhance the descriptions. The dataset has been cleaned and enhanced by automated processing, using custom scripts to remove invalid characters, and standardize names of funding agency divisions.

COVID Information Commons Archive

This archive is a snapshot of the COVID Information Commons (CIC). The CIC is a live database that records information about COVID-19 researchers and their projects.

Description of the data and file structure

The snapshot of the CIC contains the following files, each listed with a description of the fields it contains:

cic_people_export.json -- Researchers who have studied aspects of COVID-19. All information known about the researchers in CIC, except email addresses, which have been filtered out for privacy purposes. Some researchers have minimal information, as CIC may only know their name via a reference in a grant description. Other people have more complete records, if they have provided additional information to the CIC.

affiliations -- organizational affiliations of the researcher (as described for cic_orgs_export.json)

first_name -- researcher's first name

last_name -- researcher's last name

orcid -- researchers i...
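
As an illustration of how the fields listed above might be read, here is a hedged sketch. It assumes that cic_people_export.json parses to a list of objects, one per researcher, which this description does not confirm. The helper name researcher_names and the sample records are hypothetical; only the field names first_name, last_name and orcid come from the listing.

```python
import json
from typing import Any, Dict, List


def researcher_names(records: List[Dict[str, Any]]) -> List[str]:
    """Return display names, tolerating records with missing name fields."""
    names = []
    for record in records:
        first = record.get("first_name") or ""
        last = record.get("last_name") or ""
        full = f"{first} {last}".strip()
        if full:
            names.append(full)
    return names


# Hypothetical sample in the assumed layout (not real CIC data).
sample = json.loads(
    '[{"first_name": "Ada", "last_name": "Example", "orcid": null},'
    ' {"last_name": "Onlylast"}]'
)

if __name__ == "__main__":
    print(researcher_names(sample))
```

If the assumption about the top-level structure holds, the same function could be applied to the result of json.load on the exported file.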
The Cancer Genome Atlas Rectum Adenocarcinoma Collection
cancerimagingarchive.net
stage.cancerimagingarchive.net
dicom, n/a
Updated Jan 5, 2016
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Cancer Imaging Archive (2016). The Cancer Genome Atlas Rectum Adenocarcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.F7PPNPNU
Explore at:
Available download formats: dicom, n/a
Unique identifier
https://doi.org/10.7937/K9/TCIA.2016.F7PPNPNU
Dataset updated
Jan 5, 2016
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
May 29, 2020
Dataset funded by
National Cancer Institute (http://www.cancer.gov/)
Description
The Cancer Genome Atlas Rectum Adenocarcinoma (TCGA-READ) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data reside in the Genomic Data Commons (GDC) Data Portal, while the radiological data are stored on The Cancer Imaging Archive (TCIA).
Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.
CIP TCGA Radiology Initiative
Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the CIP TCGA Radiology Initiative.
Pancrease CT Segmenatation
kaggle.com
zip
Updated Mar 18, 2025
Nandeesh H U (2025). Pancrease CT Segmenatation [Dataset]. https://www.kaggle.com/datasets/nandeeshhu/pancrease-ct-segmenatation
Explore at:
Available download formats: zip (1796888062 bytes)
Dataset updated
Mar 18, 2025
Authors
Nandeesh H U
Description
This dataset contains 2D image slices extracted from the publicly available Pancreas-CT-SEG dataset, which provides manually segmented pancreas annotations for contrast-enhanced 3D abdominal CT scans. The original dataset was curated by the National Institutes of Health Clinical Center (NIH) and was made available through the NCI Imaging Data Commons (IDC). The dataset consists of 82 CT scans from 53 male and 27 female subjects, converted into 2D slices for segmentation tasks.

Dataset Details:

Modality: Contrast-enhanced CT (portal-venous phase, ~70s post-injection)

Number of Scans: 82 (from 53 male and 27 female subjects)

Age Range: 18 to 76 years (Mean: 46.8 ± 16.7 years)

Scan Resolution: 512 × 512 pixels per slice

Slice Thickness: Varies between 1.5 mm and 2.5 mm

Scanners Used: Philips and Siemens MDCT scanners (120 kVp tube voltage)

Segmentation: Manually performed by a medical student and verified by an expert radiologist

Data Format: Converted from 3D DICOM/NIfTI to 2D PNG/JPEG slices for segmentation tasks

Total Dataset Size: ~1.85 GB

Category: Non-cancerous healthy controls (No pancreatic cancer lesions or major abdominal pathologies)

Preprocessing and Conversion:

The original 3D CT scans and corresponding pancreas segmentation masks (available in NIfTI format) were converted into 2D slices to facilitate 2D medical image segmentation tasks. The conversion steps include:

Extracting axial slices from each 3D CT scan.

Normalizing pixel intensities for consistency.

Saving images in PNG/JPEG format for compatibility with deep learning frameworks.

Generating corresponding binary segmentation masks where the pancreas region is labeled.
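
The conversion steps above can be sketched as follows. This is an illustrative outline only, not the dataset author's code: it assumes the volume and mask are already loaded as nested lists indexed [slice][row][column] with the axial axis first, uses min-max scaling to 0..255 as a stand-in for the unspecified normalization, treats any positive mask value as pancreas, and leaves out writing PNG/JPEG files, which would need an imaging library. All function names are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple

Slice2D = Sequence[Sequence[float]]


def normalize_slice(ct_slice: Slice2D) -> List[List[int]]:
    """Min-max scale one slice to integers in 0..255 (assumed scheme)."""
    flat = [value for row in ct_slice for value in row]
    lo, hi = min(flat), max(flat)
    # A constant slice has no contrast; map it to all zeros.
    scale = 255.0 / (hi - lo) if hi > lo else 0.0
    return [[int(round((value - lo) * scale)) for value in row] for row in ct_slice]


def binarize_mask(mask_slice: Slice2D) -> List[List[int]]:
    """Label pancreas pixels 1 and background 0 (assumes label > 0 is pancreas)."""
    return [[1 if value > 0 else 0 for value in row] for row in mask_slice]


def axial_pairs(
    volume: Sequence[Slice2D], mask: Sequence[Slice2D]
) -> Iterator[Tuple[int, List[List[int]], List[List[int]]]]:
    """Yield (slice index, normalized image, binary mask) for each axial slice."""
    if len(volume) != len(mask):
        raise ValueError("volume and mask must have the same number of slices")
    for index, (ct_slice, mask_slice) in enumerate(zip(volume, mask)):
        yield index, normalize_slice(ct_slice), binarize_mask(mask_slice)
```

Normalizing each slice independently is one possible choice; scaling with a fixed intensity window across the whole scan is another, and the description does not say which was used.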

Dataset Structure:

Applications

This dataset is ideal for medical image segmentation tasks such as:

Deep learning-based pancreas segmentation (e.g., using U-Net, DeepLabV3+)

Automated organ detection and localization

AI-assisted diagnosis and analysis of abdominal CT scans

Acknowledgments & References

This dataset is derived from:

National Cancer Institute Imaging Data Commons (IDC) [1]

The Cancer Imaging Archive (TCIA) [2]

Original dataset DOI: https://doi.org/10.7937/K9/TCIA.2016.tNB1kqBU

Citations: If you use this dataset, please cite the following:

Roth, H., Farag, A., Turkbey, E. B., Lu, L., Liu, J., & Summers, R. M. (2016). Data From Pancreas-CT (Version 2). The Cancer Imaging Archive. DOI: 10.7937/K9/TCIA.2016.tNB1kqBU

Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., et al. (2023). National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence. Radiographics 43.

License: This dataset is provided under the Creative Commons Attribution 4.0 International (CC-BY-4.0) license. Users must abide by the TCIA Data Usage Policy and Restrictions.

Additional Resources: Imaging Data Commons (IDC) Portal: https://portal.imaging.datacommons.cancer.gov/explore/

OHIF DICOM Viewer: https://viewer.ohif.org/

This dataset provides a high-quality, well-annotated resource for researchers and developers working on medical image analysis, segmentation, and AI-based pancreas detection.
