90 datasets found
  1. NIH Common Data Elements Repository

    • catalog.data.gov
    • datadiscovery.nlm.nih.gov
    • +4 more
    Updated Jun 19, 2025
    Cite
    National Library of Medicine (2025). NIH Common Data Elements Repository [Dataset]. https://catalog.data.gov/dataset/nih-common-data-elements-repository-f6b3a
    Dataset updated
    Jun 19, 2025
    Dataset provided by
    National Library of Medicine
    Description

    The NIH Common Data Elements (CDE) Repository has been designed to provide access to structured human and machine-readable definitions of data elements that have been recommended or required by NIH Institutes and Centers and other organizations for use in research and for other purposes. Visit the NIH CDE Resource Portal for contextual information about the repository.

  2. Therapeutics Data Commons (https://tdcommons.ai)

    • datasetcatalog.nlm.nih.gov
    • dataverse.harvard.edu
    • +1 more
    Updated Oct 14, 2020
    Cite
    Kexin Huang, Wenhao Gao, Yue Zhao, Yusuf Roohani, Jure Leskovec, Connor W. Coley, Cao Xiao, Jimeng Sun, Marinka Zitnik, Tianfan Fu (2020). Therapeutics Data Commons (https://tdcommons.ai) [Dataset]. http://doi.org/10.7910/DVN/21LKWG
    Dataset updated
    Oct 14, 2020
    Authors
    Kexin Huang, Wenhao Gao, Yue Zhao, Yusuf Roohani, Jure Leskovec, Connor W. Coley, Cao Xiao, Jimeng Sun, Marinka Zitnik, Tianfan Fu
    Description

    Therapeutics Data Commons (TDC) is an open-science initiative started at Harvard with AI/ML-ready datasets and ML tasks for therapeutics. It provides an ecosystem of tools, leaderboards, and community resources, including data functions, model benchmarking and comparison strategies, meaningful data splits, data processors, public leaderboards, and molecule generation oracles. All resources are integrated and accessible via an open Python library. TDC is available at https://tdcommons.ai.

  3. Historical NCI Genomic Data Commons data (09-14-2017)

    • zenodo.org
    • data.niaid.nih.gov
    tsv
    Updated Jan 24, 2020
    Cite
    Inge Seim (2020). Historical NCI Genomic Data Commons data (09-14-2017) [Dataset]. http://doi.org/10.5281/zenodo.1186945
    Available download formats: tsv
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Inge Seim
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Historical NCI Genomic Data Commons data (v09-14-2017). Clinical ('phenotype') and gene expression (HTSeq FPKM-UQ).

    TCGA-COAD.GDC_phenotype.tsv

    dataset: phenotype - Phenotype

    cohort: GDC TCGA Colon Cancer (COAD)
    dataset ID: TCGA-COAD/Xena_Matrices/TCGA-COAD.GDC_phenotype.tsv
    download: https://gdc.xenahubs.net/download/TCGA-COAD/Xena_Matrices/TCGA-COAD.GDC_phenotype.tsv.gz; Full metadata
    samples: 570
    version: 11-27-2017
    hub: https://gdc.xenahubs.net
    type of data: phenotype
    author: Genomic Data Commons
    raw data: https://docs.gdc.cancer.gov/Data/Release_Notes/Data_Release_Notes/#data-release-90
    raw data: https://api.gdc.cancer.gov/data/
    input data format: ROWs (samples) x COLUMNs (identifiers) (i.e. clinicalMatrix)
    570 samples X 151 identifiers

    TCGA-COAD.htseq_fpkm-uq.tsv

    dataset: gene expression RNAseq - HTSeq - FPKM-UQ

    cohort: GDC TCGA Colon Cancer (COAD)
    dataset ID: TCGA-COAD/Xena_Matrices/TCGA-COAD.htseq_fpkm-uq.tsv
    download: https://gdc.xenahubs.net/download/TCGA-COAD/Xena_Matrices/TCGA-COAD.htseq_fpkm-uq.tsv.gz; Full metadata
    samples: 512
    version: 09-14-2017
    hub: https://gdc.xenahubs.net
    type of data: gene expression RNAseq
    unit: log2(fpkm-uq+1)
    platform: Illumina
    ID/Gene Mapping: https://gdc.xenahubs.net/download/probeMaps/gencode.v22.annotation.gene.probeMap.gz; Full metadata
    author: Genomic Data Commons
    raw data: https://docs.gdc.cancer.gov/Data/Release_Notes/Data_Release_Notes/#data-release-80
    raw data: https://api.gdc.cancer.gov/data/
    wrangling: Data from the same sample but from different vials/portions/analytes/aliquots is averaged; data from different samples is combined into genomicMatrix; all data is then log2(x+1) transformed.
    input data format: ROWs (identifiers) x COLUMNs (samples) (i.e. genomicMatrix)
    60,484 identifiers X 512 samples
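
    The wrangling steps described for the gene expression matrix (average values from the same sample, combine samples into an identifiers-by-samples matrix, then apply log2(x+1)) can be sketched roughly as follows. This is an illustration only, not the Xena pipeline itself: the function names, the record layout, and the sample and gene labels are hypothetical.

```python
import math
from collections import defaultdict


def build_genomic_matrix(records):
    """Build {identifier: {sample: log2(mean + 1)}} from raw measurements.

    `records` is an iterable of (sample_id, identifier, raw_value) tuples.
    Several records may share a sample and identifier (for example,
    different aliquots of one sample); those values are averaged first.
    The tuple layout is an assumption made for this sketch.
    """
    grouped = defaultdict(list)
    for sample, identifier, value in records:
        grouped[(identifier, sample)].append(value)

    matrix = defaultdict(dict)
    for (identifier, sample), values in grouped.items():
        mean = sum(values) / len(values)                 # average within a sample
        matrix[identifier][sample] = math.log2(mean + 1)  # log2(x+1) transform
    return {identifier: dict(row) for identifier, row in matrix.items()}


def to_raw(log_value):
    """Invert the log2(x+1) transform to recover the averaged raw value."""
    return 2 ** log_value - 1


# Hypothetical input: two aliquots of sample S1 and one of sample S2.
example = [("S1", "G1", 3.0), ("S1", "G1", 5.0), ("S2", "G1", 0.0)]
matrix = build_genomic_matrix(example)
```

    Averaging before the transform follows the order given in the wrangling note; taking the logarithm first and averaging afterwards would give a different result.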

  4. Common Metadata Elements for Cataloging Biomedical Datasets

    • figshare.com
    xlsx
    Updated Jan 20, 2016
    Cite
    Kevin Read (2016). Common Metadata Elements for Cataloging Biomedical Datasets [Dataset]. http://doi.org/10.6084/m9.figshare.1496573.v1
    Available download formats: xlsx
    Dataset updated
    Jan 20, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Kevin Read
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset outlines a proposed set of core, minimal metadata elements that can be used to describe biomedical datasets, such as those resulting from research funded by the National Institutes of Health. It can inform efforts to better catalog or index such data to improve discoverability. The proposed metadata elements are based on an analysis of the metadata schemas used in a set of NIH-supported data sharing repositories. Common elements from these data repositories were identified, mapped to existing data-specific metadata standards from existing multidisciplinary data repositories (DataCite and Dryad), and compared with metadata used in MEDLINE records to establish a sustainable and integrated metadata schema. From the mappings, we developed a preliminary set of minimal metadata elements that can be used to describe NIH-funded datasets. Please see the readme file for more details about the individual sheets within the spreadsheet.

  5. NIH Common Data Elements Repository - ic3x-2s7m - Archive Repository

    • healthdata.gov
    csv, xlsx, xml
    Updated Jul 16, 2025
    Cite
    (2025). NIH Common Data Elements Repository - ic3x-2s7m - Archive Repository [Dataset]. https://healthdata.gov/w/9rjf-x4nc/default?cur=wG4qu23M_S7&from=2KYT7QcwQ96
    Available download formats: xml, csv, xlsx
    Dataset updated
    Jul 16, 2025
    Description

    This dataset tracks the updates made on the dataset "NIH Common Data Elements Repository" as a repository for previous versions of the data and metadata.

  6. Data from: Uncommon Commons? Creative Commons licencing in Horizon 2020 Data...

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    Updated Jun 22, 2022
    Cite
    Daniel Spichtinger (2022). Uncommon Commons? Creative Commons licencing in Horizon 2020 Data Management Plans [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_6685130
    Dataset updated
    Jun 22, 2022
    Dataset provided by
    independent researcher / Ludwig Boltzmann Gesellschaft
    Authors
    Daniel Spichtinger
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    As policies, good practices and funder mandates on research data management evolve, more emphasis has been put on the licencing of data. Licencing information allows potential re-users to quickly identify what they can do with the data in question and is therefore an important component in ensuring the reusability of research.

    In my research I analyse a pre-existing collection of 840 Horizon 2020 public data management plans (DMPs), available on Phaidra, the repository of the University of Vienna, to determine which ones mention Creative Commons licences and, among those that do, which licences are being used.

    This Excel file contains the data underlying the publication "Uncommon Commons? Creative Commons licencing in Horizon 2020 Data Management Plans".

    Sheet 1 contains the data collected in the previous "Data Re-Use" project: 840 DMPs downloaded from CORDIS and vetted to ensure they are public documents and not copyrighted

    Sheet 2 contains the same data as sheet 1, with columns D to Q not visible (for better reading) but an added column R which now contains the CC licencing information (where available)

    Sheet 3 is filtered so that only the projects containing CC BY relevant licencing are shown

    Sheet 4 is filtered so that only the projects containing CC-BY-SA relevant licencing are shown

    Sheet 5 is filtered so that only the projects containing CC-BY-NC relevant licencing are shown

    Sheet 6 is filtered so that only the projects containing CC-BY-ND relevant licencing are shown

    Sheet 7 is filtered so that only the projects containing CC-BY-NC-ND relevant licencing are shown

    Sheet 8 is filtered so that only the projects containing CC-BY-NC-SA relevant licencing are shown

    Sheet 9 is filtered so that only the projects containing CC0 relevant information are shown

    Sheet 10 provides an overview table of the relevant licences (manual entry)

    Sheets 11 and 12 contain graphic visualisations of the data as used in the article
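
    The per-licence filtering behind sheets 3 to 9 could be reproduced along the following lines. This is a rough sketch, not the author's method: it assumes the licence information in column R is free text, and the function name, the matching rules, and the sample cell values are all hypothetical.

```python
import re
from collections import Counter

# Canonical labels, ordered so that longer variants are tested before "CC BY".
LICENCES = [
    "CC BY-NC-ND",
    "CC BY-NC-SA",
    "CC BY-NC",
    "CC BY-ND",
    "CC BY-SA",
    "CC BY",
    "CC0",
]


def normalise_licence(cell_text):
    """Return the canonical licence named in a free-text cell, or None."""
    # Upper-case and collapse hyphens, underscores and whitespace so that
    # "Cc-BY-NC-ND" and "cc by nc nd" compare equal.
    cleaned = re.sub(r"[\s\-_]+", " ", cell_text.upper()).strip()
    for licence in LICENCES:
        if licence.replace("-", " ") in cleaned:
            return licence
    return None


# Hypothetical cell contents standing in for column R of the spreadsheet.
cells = ["CC-BY-SA 4.0", "Cc-BY-NC-ND", "cc by 4.0", "no licence stated", "CC0 1.0"]
counts = Counter(
    licence for licence in map(normalise_licence, cells) if licence is not None
)
```

    Testing the longest licence names first matters here, because every variant contains the substring "CC BY" and would otherwise be counted as plain CC BY.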

  7. Data from: Microplitis demolitor Official Gene Set micdem_OGSv1.0

    • agdatacommons.nal.usda.gov
    application/x-gzip
    Updated Nov 21, 2025
    Cite
    Kelly Tims; Gaelen Burke (2025). Microplitis demolitor Official Gene Set micdem_OGSv1.0 [Dataset]. http://doi.org/10.15482/USDA.ADC/1521095
    Available download formats: application/x-gzip
    Dataset updated
    Nov 21, 2025
    Dataset provided by
    Ag Data Commons
    Authors
    Kelly Tims; Gaelen Burke
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Microplitis demolitor (Hymenoptera: Braconidae) is a parasitoid used as a biological control agent to control larval-stage Lepidoptera and serves as a model for studying the function and evolution of symbiotic viruses in the genus Bracovirus. This dataset presents the Microplitis demolitor Official Gene Set (OGS) v1.0. The OGS is an integration of automatic gene predictions from NCBI-RefSeq's gene set, NCBI Microplitis demolitor Annotation Release 101 (https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Microplitis_demolitor/101/), with manual annotations by the research community, performed via the Apollo manual curation software (https://zenodo.org/record/1295754#.YDgLyJNKivg). Manual annotations were QC'd via the GFF3toolkit (https://github.com/NAL-i5K/gff3toolkit) and NCBI's table2asn_GFF software (https://ftp.ncbi.nih.gov/toolbox/ncbi_tools/converters/by_program/table2asn_GFF/), and merged with NCBI Microplitis demolitor Annotation Release 101 via the GFF3toolkit (https://github.com/NAL-i5K/gff3toolkit).

    Resources in this dataset:

    Resource Title: Microplitis demolitor Official Gene Set micdem_OGSv1.0.

    File Name: micdem_OGSv1.0.tar.gz

    Resource Description: This directory contains files for the Official Gene Set 1.0 for Microplitis demolitor (micdem_OGSv1.0). The general procedure for generating this OGS is outlined here: https://github.com/NAL-i5K/GFF3toolkit/. QC of community-curated models from the Apollo software was performed by NAL staff using the GFF3toolkit function gff3_QC, and errors were fixed using gff3_fix. OGSv1.0 was generated by merging NCBI-RefSeq's gene set NCBI Microplitis demolitor Annotation Release 101 (https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Microplitis_demolitor/101/) with the QC'd and error-corrected community-curated models, and generating i5k Workspace IDs for all manually annotated features.

    1) Fasta files
       - Protein Sequences: micdem_OGSv1.0_pep.fa
       - Coding Sequences (CDS): micdem_OGSv1.0_CDS.fa
       - Transcript Sequences (includes non-coding sequence): micdem_OGSv1.0_trans.fa

    2) Gff3 file: micdem_OGSv1.0.gff

    3) Mapping file between Gene set NCBI Microplitis demolitor Annotation Release 101 and OGSv1.0: ID_map_report.txt
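
    As a quick sanity check after unpacking the archive, the FASTA files listed above can be read with a few lines of standard-library Python. The sketch below is a generic FASTA reader and is not part of the GFF3toolkit; the function name and the record IDs in the example are hypothetical.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into {record_id: sequence}.

    The record ID is taken as the first whitespace-delimited token after
    the '>' of each header line; sequence lines are concatenated.
    """
    records = {}
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


# Hypothetical records standing in for entries of micdem_OGSv1.0_pep.fa.
example = [
    ">protein_A some description",
    "MKTAY",
    "IAKQR",
    ">protein_B",
    "MSD",
]
proteins = read_fasta(example)
lengths = {name: len(seq) for name, seq in proteins.items()}
```

    For a real file, pass an open file handle, for example `read_fasta(open("micdem_OGSv1.0_pep.fa"))`, since the function only needs an iterable of lines.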

  8. Blog | Common Credits Model

    • catalog.data.gov
    • datasets.ai
    Updated Mar 26, 2025
    Cite
    National Institutes of Health (2025). Blog | Common Credits Model [Dataset]. https://catalog.data.gov/dataset/blog-common-credits-model
    Dataset updated
    Mar 26, 2025
    Dataset provided by
    National Institutes of Health
    Description

    This blog post was published on November 13, 2015, and was written by George Komatsoulis. It is a cross-post from the NIH's Data Science blog - https://datascience.nih.gov/blog.

  9. Research Software Communities Global South

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    Updated Oct 11, 2022
    Cite
    Martinez, Paula Andrea (2022). Research Software Communities Global South [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_7179806
    Dataset updated
    Oct 11, 2022
    Dataset provided by
    Australian Research Data Commons
    Authors
    Martinez, Paula Andrea
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Research Software Alliance's (ReSA) mission is to bring research software communities together to collaborate on the advancement of research software. Given the ReSA mission, it is important to understand the landscape of communities involved with research software. In 2020, ReSA completed an initial exercise to scope the international research software community landscape. This work was reported by ReSA's Software Landscape Analysis task force via a blog post. The majority of the communities in the previous analysis represented the global north. To improve the extent of this landscape analysis, ReSA announced a paid opportunity for short-term contractors located in the global south to collect data on communities and funders in their region in early 2022. This document describes how the work was undertaken, a summary of findings, the gaps and opportunities perceived by the data collectors and some highlights. This work identified 126 organisations and communities and 62 funder bodies that support research software in the global south. Their main activities are connecting people, training, and networking, and support through research grants.

    To add to this communities list please fill in the following form https://forms.gle/KJE9vkBnM6vhh7cEA

  10. Data from: Bacterial communities and prevalence of antibiotic resistance...

    • agdatacommons.nal.usda.gov
    • datasets.ai
    • +2 more
    xlsx
    Updated Nov 21, 2025
    Cite
    Saraswoti Neupane; Justin L. Talley; David B. Taylor; Dana Nayduch (2025). Data from: Bacterial communities and prevalence of antibiotic resistance genes carried within house flies (Diptera: Muscidae) associated with beef and dairy cattle farms [Dataset]. http://doi.org/10.15482/USDA.ADC/1529546
    Available download formats: xlsx
    Dataset updated
    Nov 21, 2025
    Dataset provided by
    Ag Data Commons
    Authors
    Saraswoti Neupane; Justin L. Talley; David B. Taylor; Dana Nayduch
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    House flies (Musca domestica L.) are vectors of human and animal pathogens at livestock operations. Microbial communities in flies are acquired from, and correlate with, their local environment. However, variation among microbial communities carried by flies from farms in different geographical areas is not well understood. We characterized bacterial communities of female house flies collected from beef and dairy farms in Oklahoma, Kansas, and Nebraska and further evaluated the prevalence of antibiotic resistance genes in bacteria within flies. We evaluated the influence of farm type and farm location on bacterial communities, diversity, pathogenic bacteria strains and prevalence of antibiotic resistance genes. These data can be used for better understanding of abundance and prevalence of bacterial communities in house flies associated with livestock operations. These data were collected in September 2019. Abbreviations used include Operational Taxonomic Units (OTUs), Canonical Correspondence Analysis (CCA), Infectious Bovine Keratoconjunctivitis (IBK), Antimicrobial Resistance (AMR), and Antibiotic Resistance Genes (ARGs).

    The raw Illumina MiSeq sequence data for this project can be found here: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA863664

    Resources in this dataset:

    Resource title: Metadata for Microbiome of House Fly Associated with Cattle Farms

    File name: Metadata for Microbiome of House Fly Associated with Cattle Farms.xlsx

    Resource description: This spreadsheet links the raw sequence reads on NCBI with data on farm type, farm location and sample type.

  11. Data from: Global scientific research commons under the Nagoya Protocol:...

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    • +1 more
    Updated Aug 4, 2024
    Cite
    Dedeurwaerdere, Tom (2024). Global scientific research commons under the Nagoya Protocol: Towards a collaborative economy model for the sharing of basic research assets [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_47397
    Dataset updated
    Aug 4, 2024
    Dataset provided by
    Université catholique de Louvain
    Authors
    Dedeurwaerdere, Tom
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This paper aims to get a better understanding of the motivational and transaction cost features of building global scientific research commons, with a view to contributing to the debate on the design of appropriate policy measures under the recently adopted Nagoya Protocol. For this purpose, the paper analyses the results of a world-wide survey of managers and users of microbial culture collections, which focused on the role of social and internalized motivations, organizational networks and external incentives in promoting the public availability of upstream research assets. Overall, the study confirms the hypotheses of the social production model of information and shareable goods, but it also shows the need to complete this model. For the sharing of materials, the underlying collaborative economy in excess capacity plays a key role in addition to the social production, while for data, competitive pressures amongst scientists tend to play a bigger role.

  12. Common data operations expressed as MLMs.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Dec 2, 2019
    Cite
    Nehorai, Arye; La Rosa, Patricio S.; Cawi, Eric (2019). Common data operations expressed as MLMs. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000127140
    Dataset updated
    Dec 2, 2019
    Authors
    Nehorai, Arye; La Rosa, Patricio S.; Cawi, Eric
    Description

    Common data operations expressed as MLMs.

  13. Industrial Ecology Data Commons (iedc) December 2024 update

    • zenodo.org
    • data.niaid.nih.gov
    Updated Nov 26, 2024
    Cite
    Stefan Pauliuk (2024). Industrial Ecology Data Commons (iedc) December 2024 update [Dataset]. http://doi.org/10.5281/zenodo.14217322
    Dataset updated
    Nov 26, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Stefan Pauliuk
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Nov 25, 2024
    Description

    The Industrial Ecology Data Commons (iedc) is a database that contains more than 200 IE-related datasets from the literature, including stocks, flows, process descriptions, IO tables, material composition of products, and many more. Launched in 2018, the iedc is continuously improved and expanded.

    The homepage of the project is https://www.database.industrialecology.uni-freiburg.de/

    This Zenodo backup contains a .zip file with 156 parameter templates (xlsx), which were all uploaded to the iedc (SQL database) and are available online.

    This backup is for archiving the intermediate step between raw data and uploaded data.

    It contains all data that were gathered up to and including November 2024, except for those data that were uploaded directly via Python scripts from other sources (like .csv) and not via the xlsx templates.

  14. GTEx: DICOM converted whole slide hematoxylin and eosin stained images from...

    • zenodo.org
    bin
    Updated Sep 19, 2025
    Cite
    David Clunie; William Clifford; David Pot; Ulrike Wagner; Erika Kim; Granger Sutton; Andrey Fedorov (2025). GTEx: DICOM converted whole slide hematoxylin and eosin stained images from the Genotype-Tissue Expression (GTEx) Project [Dataset]. http://doi.org/10.5281/zenodo.11099100
    Available download formats: bin
    Dataset updated
    Sep 19, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    David Clunie; William Clifford; David Pot; Ulrike Wagner; Erika Kim; Granger Sutton; Andrey Fedorov
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset corresponds to a collection of images and/or image-derived data available from National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using IDC Portal here: GTEx. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below.

    Collection description

    The Genotype-Tissue Expression (GTEx) Project established a data resource and tissue bank to study the relationship between genetic variants and gene expression in multiple human tissues and across individuals. The project included contributions from numerous groups with diverse expertise in biospecimen collection and processing, pathology review, molecular analysis, and data management. The contributors are collectively called the GTEx Consortium.

    GTEx collected a total of 26,468 unique tissue samples from 50+ different tissue types, from 956 healthy postmortem donors. The standardized biospecimen collection and analysis practices applied during the study served to minimize preanalytical variability associated with specimen-related factors and their potential impact on analytic endpoints. Each GTEx tissue was divided into two tissue blocks, one for histology and one for molecular analysis; both tissue blocks were preserved in PAXgene Tissue Fixative (Qiagen) solution for 6 to 24 hours, followed by PAXgene Tissue Stabilizer (Qiagen) as specified in the project-specific standard operating procedures. Tissue blocks were processed and embedded in paraffin at the GTEx central repository at the Van Andel Institute (MI) and hematoxylin and eosin–stained slides were generated from all GTEx donors. Digitally scanned whole slide images of PAXgene-fixed/stabilized, paraffin-embedded tissue sections were created using Aperio Scanscope software (Leica Biosystems). The digital images were then reviewed and annotated by one of four board-certified pathologists assigned to the GTEx study. There are a total of 25,503 digital histology images in the GTEx collection.

    GTEx was supported by the NIH Common Fund (2010 – 2019). Additional resources include the GTEx Biobank, the GTEx Portal, and the full dataset at dbGaP (accession number phs000424).

    Please refer to the listed GTEx publications below for more details [2-7].

    Files included

    A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.

    1. gtex-idc_v19-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets
    2. gtex-idc_v19-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets
    3. gtex-idc_v19-dcf.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)

    Note that manifest files ending in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while those ending in -gcs.s5cmd reference files in Google Cloud Storage (GCS). The actual files are identical and are mirrored between AWS and GCS.
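
    The naming convention for manifest files (collection ID, then the IDC release in which that version first appeared, then the storage target) can be unpacked programmatically. The following is a minimal sketch based only on the file names listed in this record; the regular expression, the `ManifestName` type, and the function name are assumptions and are not part of the idc-index package.

```python
import re
from typing import NamedTuple, Optional


class ManifestName(NamedTuple):
    collection_id: str
    idc_release: int
    target: str  # "aws", "gcs" or "dcf"


# Assumed pattern: <collection_id>-idc_v<release>-<target>.<extension>
_PATTERN = re.compile(
    r"^(?P<collection>.+)-idc_v(?P<release>\d+)-(?P<target>aws|gcs|dcf)\.(?:s5cmd|dcf)$"
)


def parse_manifest_name(filename: str) -> Optional[ManifestName]:
    """Split a manifest file name into its parts, or return None if it does not match."""
    match = _PATTERN.match(filename)
    if match is None:
        return None
    return ManifestName(
        collection_id=match["collection"],
        idc_release=int(match["release"]),
        target=match["target"],
    )


aws_manifest = parse_manifest_name("gtex-idc_v19-aws.s5cmd")
gen3_manifest = parse_manifest_name("gtex-idc_v19-dcf.dcf")
```

    Returning None for names that do not match keeps the caller in control of how to handle unexpected files in the record.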

    Download instructions

    Each of the manifests includes instructions in the header on how to download the included files.

    To download the files using .s5cmd manifests:

    1. install idc-index package: pip install --upgrade idc-index
    2. download the files referenced by manifests included in this dataset by passing the .s5cmd manifest file: idc download manifest.s5cmd

    To download the files using .dcf manifest, see manifest header.

    Acknowledgments

    Please acknowledge the GTEx Consortium in any published work that includes the images. A sample statement for the acknowledgment of the Genotype-Tissue Expression (GTEx) Project dataset(s) follows.

    The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health (commonfund.nih.gov/GTEx). Additional funds were provided by the NCI, NHGRI, NHLBI, NIDA, NIMH, and NINDS. Donors were enrolled at Biospecimen Source Sites funded by NCI/Leidos Biomedical Research, Inc. subcontracts to the National Disease Research Interchange (10XS170), Roswell Park Cancer Institute (10XS171), and Science Care, Inc. (X10S172). The Laboratory, Data Analysis, and Coordinating Center (LDACC) was funded through a contract (HHSN268201000029C) to the Broad Institute of MIT and Harvard. Biorepository operations were funded through a Leidos Biomedical Research, Inc. subcontract to Van Andel Research Institute (10ST1035). Additional data repository and project management were provided by Leidos Biomedical Research, Inc. (HHSN261200800001E). The Brain Bank was supported with supplements to University of Miami grant DA006227. Statistical Methods development grants were made to the University of Geneva (MH090941 & MH101814), the University of Chicago (MH090951, MH090937, MH101825, & MH101820), the University of North Carolina - Chapel Hill (MH090936), North Carolina State University (MH101819), Harvard University (MH090948), Stanford University (MH101782), Washington University (MH101810), and to the University of Pennsylvania (MH101822).

    The Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.

    References

    [1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W. L., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National cancer institute imaging data commons: Toward transparency, reproducibility, and scalability in imaging artificial intelligence. Radiographics 43, (2023).

    [2] Sobin, L., Barcus, M., Branton, P. A., Engel, K. B., Keen, J., Tabor, D., Ardlie, K. G., Greytak, S. R., Roche, N., Luke, B., Vaught, J., Guan, P. & Moore, H. M. Histologic and quality assessment of Genotype-Tissue Expression (GTEx) research samples: A large postmortem tissue collection. Arch. Pathol. Lab. Med. (2024). doi:10.5858/arpa.2023-0467-OA

    [3] GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 45, 580–585 (2013).

    [4] GTEx Consortium. Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348, 648–660 (2015).

    [5] GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).

    [6] Carithers, L. J., Ardlie, K., Barcus, M., Branton, P. A., Britton, A., Buia, S. A., Compton, C. C., DeLuca, D. S., Peter-Demchok, J., Gelfand, E. T., Guan, P., Korzeniewski, G. E., Lockhart, N. C., Rabiner, C. A., Rao, A. K., Robinson, K. L., Roche, N. V., Sawyer, S. J., Segrè, A. V., Shive, C. E., Smith, A. M., Sobin, L. H., Undale, A. H., Valentino, K. M., Vaught, J., Young, T. R., Moore, H. M. & GTEx Consortium. A novel approach to high-quality postmortem tissue procurement: The GTEx project. Biopreserv. Biobank. 13, 311–319 (2015).

    [7] Branton, P. A., Sobin, L., Barcus, M., Engel, K. B., Greytak, S. R., Guan, P., Vaught, J. & Moore, H. M. Notable histologic findings in a ‘normal’ cohort: The National Institutes of Health Genotype-Tissue Expression (GTEx) project. Arch. Pathol. Lab. Med. (2024). doi:10.5858/arpa.2023-0468-OA

  15. Data From: Habitat type and host grazing regimen influence the soil...

    • agdatacommons.nal.usda.gov
    • datasetcatalog.nlm.nih.gov
    • +2 more
    xlsx
    Updated Nov 21, 2025
    Cite
    Saraswoti Neupane; Travis Davis; Dana Nayduch; Bethany Mcgregor (2025). Data From: Habitat type and host grazing regimen influence the soil microbial diversity and communities within potential biting midge larval habitats [Dataset]. http://doi.org/10.15482/USDA.ADC/1528782
    Available download formats: xlsx
    Dataset updated
    Nov 21, 2025
    Dataset provided by
    Ag Data Commons
    Authors
    Saraswoti Neupane; Travis Davis; Dana Nayduch; Bethany Mcgregor
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Culicoides biting midges are important vectors of diverse microbes such as viruses, protozoa, and nematodes that cause diseases in wild and domestic animals. However, little is known about the role of microbial communities in midge larval habitat utilization in the wild. In this study, we characterized microbial communities (bacterial, protistan, fungal and metazoan) in soils from disturbed (bison and cattle grazed) and undisturbed (non-grazed) pond and spring potential midge larval habitats. We evaluated the influence of habitat and grazing disturbance and their interaction on microbial communities, diversity, presence of midges, and soil properties.

    These data can be used to better understand environmental microbial communities in tallgrass prairie ecosystems associated with grazed versus ungrazed pond and spring habitats and to draw inferences on the interactions of these communities and soil properties with the presence of biting midge larvae. These data should not be used to make inferences for ecosystems other than tallgrass prairie, for animal management methods other than open cow-calf or bison grazing (such as feedlots, dairies, or stockyards), or for other grazing mammals (such as sheep or goats). These data were collected between the months of September and December and therefore are not representative of microbial communities present from January through August.

    Abbreviations used include Total Carbon (TC), Total Nitrogen (TN), Organic Matter (OM), Konza Prairie Biological Station (KPBS), Operational Taxonomic Unit (OTU), Principal Coordinates Analysis (PCoA), ribosomal RNA (rRNA), and vesicular stomatitis virus (VSV).

    The raw Illumina MiSeq sequence data for this project can be found here: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA862140

    Resources in this dataset:

    Resource Title: Metadata for Midge Larval Habitat Soil Microbiome
    File Name: Metadata for NCBI Accession PRJNA862140.xlsx
    Resource Description: This spreadsheet links the raw sequence reads on NCBI with data on the presence/absence of Culicoides midges and soil chemistry data (% total soil nitrogen, % total soil carbon, and % organic matter).

  16. DICOM converted Slide Microscopy images for the TCGA-READ collection

    • zenodo.org
    bin
    Updated Aug 20, 2024
    + more versions
    Cite
    David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov (2024). DICOM converted Slide Microscopy images for the TCGA-READ collection [Dataset]. http://doi.org/10.5281/zenodo.12689999
    Available download formats: bin
    Dataset updated
    Aug 20, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    This dataset corresponds to a collection of images and/or image-derived data available from National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using IDC Portal here: TCGA-READ. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below.

    Collection description

    The Cancer Genome Atlas-Rectum Adenocarcinoma (TCGA-READ) data collection is part of a larger effort to enhance the TCGA http://cancergenome.nih.gov/ data set with characterized radiological images. The Cancer Imaging Program (CIP), with the cooperation of several TCGA tissue-contributing institutions, has archived a large portion of the radiological images of the genetically-analyzed READ cases.


    Please see the TCGA-READ wiki page to learn more about the images and to obtain any supporting metadata for this collection.

    Files included

    A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.

    1. tcga_read-idc_v8-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets
    2. tcga_read-idc_v8-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets
    3. tcga_read-idc_v8-dcf.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)

    Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while those that end in -gcs.s5cmd reference files stored in Google Cloud Storage (GCS) buckets. The actual files are identical and are mirrored between AWS and GCP.
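
    The naming convention described above can be illustrated with a small Python sketch. It assumes that every manifest name follows the pattern shown in the file list (collection id, then idc_v plus the release number, then aws, gcs, or dcf). The function and class names are hypothetical and are not part of any IDC tooling.

```python
import re
from typing import NamedTuple


class ManifestName(NamedTuple):
    """Parts of a manifest file name (hypothetical helper type)."""
    collection_id: str
    idc_release: int
    target: str  # "aws", "gcs", or "dcf"


# Assumed pattern: <collection_id>-idc_v<release>-<target>.<extension>
_MANIFEST_RE = re.compile(
    r"^(?P<collection>.+)-idc_v(?P<release>\d+)-(?P<target>aws|gcs|dcf)\.(?:s5cmd|dcf)$"
)


def parse_manifest_name(filename: str) -> ManifestName:
    """Split a manifest file name into collection id, IDC release, and target."""
    match = _MANIFEST_RE.match(filename)
    if match is None:
        raise ValueError(f"not a recognized manifest name: {filename!r}")
    return ManifestName(
        collection_id=match["collection"],
        idc_release=int(match["release"]),
        target=match["target"],
    )


if __name__ == "__main__":
    print(parse_manifest_name("tcga_read-idc_v8-aws.s5cmd"))
```

    A name that does not fit the assumed pattern raises ValueError rather than being guessed at.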

    Download instructions

    Each of the manifests includes instructions in the header on how to download the included files.

    To download the files using .s5cmd manifests:

    1. install idc-index package: pip install --upgrade idc-index
    2. download the files referenced by manifests included in this dataset by passing the .s5cmd manifest file: idc download manifest.s5cmd.
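
    For scripted use, the two steps above could be wrapped as in the following Python sketch. It only assembles the idc download command quoted in step 2 and checks the manifest suffix; the helper name is hypothetical, and actually running the command assumes that idc-index is installed and that network access is available.

```python
import shlex
from pathlib import Path


def build_download_command(manifest_path: str) -> list[str]:
    """Return the argv for `idc download <manifest>` (hypothetical helper)."""
    path = Path(manifest_path)
    if path.suffix != ".s5cmd":
        # The .dcf manifest is handled differently; see its header.
        raise ValueError(f"expected a .s5cmd manifest, got {manifest_path!r}")
    return ["idc", "download", str(path)]


if __name__ == "__main__":
    cmd = build_download_command("tcga_read-idc_v8-aws.s5cmd")
    print(shlex.join(cmd))
    # To run it for real (requires `pip install --upgrade idc-index`):
    # import subprocess
    # subprocess.run(cmd, check=True)
```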

    To download the files using .dcf manifest, see manifest header.

    Acknowledgments

    Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.

    References

    [1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence. RadioGraphics (2023). https://doi.org/10.1148/rg.230180

  17. CWL run of Alignment Workflow (CWLProv 0.6.0 Research Object)

    • data.mendeley.com
    • data.niaid.nih.gov
    • +3 more
    Updated Dec 4, 2018
    Cite
    Farah Zaib Khan (2018). CWL run of Alignment Workflow (CWLProv 0.6.0 Research Object) [Dataset]. http://doi.org/10.17632/6wtpgr3kbj.1
    Dataset updated
    Dec 4, 2018
    Authors
    Farah Zaib Khan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The CWL alignment workflow included in this case study was designed by Data Biosphere. It adapts the alignment pipeline originally developed at the Abecasis Lab, University of Michigan. This workflow is part of the NIH Data Commons initiative and comprises four stages. The first stage, "Pre-align", accepts a Compressed Alignment Map (CRAM) file (a compressed format for BAM files developed by the European Bioinformatics Institute (EBI)) and a human genome reference sequence as input and, using the SAMtools utilities view, sort and fixmate, returns a list of FASTQ files that serve as input for the next stage. The next stage, "Align", also accepts the human reference genome as input, along with the output files from "Pre-align", and uses BWA-mem to generate aligned reads as BAM files. SAMBLASTER is used to mark duplicate reads and SAMtools view to convert read files from SAM to BAM format. The BAM files generated by "Align" are then sorted with SAMtools sort. Finally, in the "Post-align" stage, these sorted alignment files are merged into a single sorted BAM file using SAMtools merge.
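
    The stage order described above can be summarized as a small data structure. The Python sketch below only restates the description; it is not the CWL definition itself. The stage name "Sort" is an assumed label, since the description does not name the sorting stage, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One stage of the alignment workflow as described in the text."""
    name: str
    tools: tuple[str, ...]
    output: str


# Order and tools follow the description; "Sort" is an assumed label.
PIPELINE = (
    Stage("Pre-align", ("SAMtools view", "SAMtools sort", "SAMtools fixmate"), "FASTQ files"),
    Stage("Align", ("BWA-mem", "SAMBLASTER", "SAMtools view"), "BAM files"),
    Stage("Sort", ("SAMtools sort",), "sorted BAM files"),
    Stage("Post-align", ("SAMtools merge",), "single sorted BAM file"),
)


def describe(pipeline: tuple[Stage, ...]) -> str:
    """Render the stages as a one-line summary."""
    return " -> ".join(f"{s.name} [{', '.join(s.tools)}]" for s in pipeline)


if __name__ == "__main__":
    print(describe(PIPELINE))
```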

    This dataset folder is a CWLProv Research Object that captures the Common Workflow Language execution provenance, see https://w3id.org/cwl/prov/0.6.0 or use https://pypi.org/project/cwlprov/ to explore

  18. COVID information commons archive

    • search.dataone.org
    • data.niaid.nih.gov
    • +1 more
    Updated Aug 15, 2024
    Cite
    Florence Hudson; Ryan Scherle; Lauren Close; Varalika Mahajan; Benjamin Sango; Helen Yang; Haleigh Stewart; Sven Johnson; Karl Ragnauth; Katie Naum; Rene Baston (2024). COVID information commons archive [Dataset]. http://doi.org/10.5061/dryad.37pvmcvqp
    Dataset updated
    Aug 15, 2024
    Dataset provided by
    Dryad Digital Repository
    Authors
    Florence Hudson; Ryan Scherle; Lauren Close; Varalika Mahajan; Benjamin Sango; Helen Yang; Haleigh Stewart; Sven Johnson; Karl Ragnauth; Katie Naum; Rene Baston
    Time period covered
    Jan 1, 2023
    Description

    The COVID Information Commons (CIC) is an open website portal and community to facilitate knowledge-sharing and collaboration across various COVID research efforts, funded by the NSF Convergence Accelerator and the NSF Technology, Innovation and Partnerships Directorate. The CIC serves as an open resource for researchers, students, and decision-makers from academia, government, not-for-profits and industry to identify collaboration opportunities, to leverage each other's research findings, and to accelerate the most promising research to mitigate the broad societal impacts of the COVID-19 pandemic. The CIC was developed as a collaborative proposal led by the Northeast Big Data Innovation Hub, hosted by Columbia University, in collaboration with the Midwest Big Data Innovation Hub, South Big Data Innovation Hub, and West Big Data Innovation Hub. It was funded by the NSF Convergence Accelerator (NSF #2028999) in May 2020 and launched in July 2020. The initial focus of the CIC website ...

    The NSF and NIH funded COVID related awards corpus in the CIC was collected primarily from NSF and NIH via APIs. Further information has been collected directly from researchers, who filled out an online form to enhance the descriptions. The dataset has been cleaned and enhanced by automated processing, using custom scripts to remove invalid characters, and standardize names of funding agency divisions.

    COVID Information Commons Archive

    This archive is a snapshot of the COVID Information Commons (CIC). The CIC is a live database that records information about COVID-19 researchers and their projects.

    Description of the data and file structure

    The snapshot of the CIC contains the following files, each listed with a description of the fields it contains:

    cic_people_export.json -- Researchers who have studied aspects of COVID-19. Includes all information known about the researchers in CIC except email addresses, which have been filtered out for privacy. Some researchers have minimal information, as CIC may only know their name via a reference in a grant description. Others have more complete records if they have provided additional information to the CIC.

    • affiliations -- organizational affiliations of the researcher (as described for cic_orgs_export.json)
    • first_name -- researcher's first name
    • last_name -- researcher's last name
    • orcid -- researchers i...
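
    As a rough illustration of how the fields listed above might be read, the Python sketch below parses a people export and builds display names. It assumes the file is a JSON array of objects keyed by the field names above; the actual top-level layout is not stated here. The sample record is invented for the example, and the function name is hypothetical.

```python
import json

# Invented sample record; the real export may be structured differently.
SAMPLE = """
[
  {"first_name": "Ada", "last_name": "Example",
   "orcid": "0000-0000-0000-0000", "affiliations": []}
]
"""


def full_names(raw_json: str) -> list[str]:
    """Return 'first last' for each record, tolerating missing fields."""
    people = json.loads(raw_json)  # assumed: a list of dicts
    names = []
    for person in people:
        first = person.get("first_name") or ""
        last = person.get("last_name") or ""
        names.append(f"{first} {last}".strip())
    return names


if __name__ == "__main__":
    print(full_names(SAMPLE))
```

    Using .get with a fallback reflects the note above that some researchers have only minimal information.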
  19. The Cancer Genome Atlas Rectum Adenocarcinoma Collection

    • cancerimagingarchive.net
    • stage.cancerimagingarchive.net
    dicom, n/a
    Updated Jan 5, 2016
    Cite
    The Cancer Imaging Archive (2016). The Cancer Genome Atlas Rectum Adenocarcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.F7PPNPNU
    Available download formats: dicom, n/a
    Dataset updated
    Jan 5, 2016
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    May 29, 2020
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    The Cancer Genome Atlas Rectum Adenocarcinoma (TCGA-READ) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data reside in the Genomic Data Commons (GDC) Data Portal, while the radiological data are stored on The Cancer Imaging Archive (TCIA).

    Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.

    CIP TCGA Radiology Initiative

    Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the CIP TCGA Radiology Initiative.

  20. Pancrease CT Segmenatation

    • kaggle.com
    zip
    Updated Mar 18, 2025
    Cite
    Nandeesh H U (2025). Pancrease CT Segmenatation [Dataset]. https://www.kaggle.com/datasets/nandeeshhu/pancrease-ct-segmenatation
    Available download formats: zip (1796888062 bytes)
    Dataset updated
    Mar 18, 2025
    Authors
    Nandeesh H U
    Description

    This dataset contains 2D image slices extracted from the publicly available Pancreas-CT-SEG dataset, which provides manually segmented pancreas annotations for contrast-enhanced 3D abdominal CT scans. The original dataset was curated by the National Institutes of Health Clinical Center (NIH) and was made available through the NCI Imaging Data Commons (IDC). The dataset consists of 82 CT scans from 53 male and 27 female subjects, converted into 2D slices for segmentation tasks.

    Dataset Details:

    Modality: Contrast-enhanced CT (portal-venous phase, ~70s post-injection)

    Number of Scans: 82 (from 80 subjects: 53 male, 27 female)

    Age Range: 18 to 76 years (Mean: 46.8 ± 16.7 years)

    Scan Resolution: 512 × 512 pixels per slice

    Slice Thickness: Varies between 1.5 mm and 2.5 mm

    Scanners Used: Philips and Siemens MDCT scanners (120 kVp tube voltage)

    Segmentation: Manually performed by a medical student and verified by an expert radiologist

    Data Format: Converted from 3D DICOM/NIfTI to 2D PNG/JPEG slices for segmentation tasks

    Total Dataset Size: ~1.85 GB

    Category: Non-cancerous healthy controls (No pancreatic cancer lesions or major abdominal pathologies)

    Preprocessing and Conversion:

    The original 3D CT scans and corresponding pancreas segmentation masks (available in NIfTI format) were converted into 2D slices to facilitate 2D medical image segmentation tasks. The conversion steps include:

    Extracting axial slices from each 3D CT scan.

    Normalizing pixel intensities for consistency.

    Saving images in PNG/JPEG format for compatibility with deep learning frameworks.

    Generating corresponding binary segmentation masks where the pancreas region is labeled.
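
    The conversion steps above can be sketched in plain Python as follows. This is a minimal illustration, not the pipeline used to build the dataset: it assumes the volume is already loaded as nested lists indexed volume[z][y][x], uses min-max scaling to 0..255 as one possible normalization (the method is not specified above), treats any nonzero label as pancreas, and leaves out reading NIfTI and writing PNG/JPEG files, which need an imaging library. All function names are hypothetical.

```python
def normalize_slice(slice_2d):
    """Min-max scale one 2D slice to integers in 0..255 (assumed method)."""
    values = [v for row in slice_2d for v in row]
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant slice: nothing to scale, return all zeros.
        return [[0 for _ in row] for row in slice_2d]
    return [[int(round((v - lo) * 255 / (hi - lo))) for v in row] for row in slice_2d]


def binarize_mask(mask_2d):
    """Map any nonzero label to 1 (pancreas) and everything else to 0."""
    return [[1 if v > 0 else 0 for v in row] for row in mask_2d]


def volume_to_slices(volume, labels):
    """Pair each axial slice with its binary mask.

    Returns a list of (z_index, normalized_image, binary_mask) tuples.
    """
    if len(volume) != len(labels):
        raise ValueError("volume and labels must have the same number of slices")
    return [
        (z, normalize_slice(image), binarize_mask(mask))
        for z, (image, mask) in enumerate(zip(volume, labels))
    ]


if __name__ == "__main__":
    toy_volume = [[[0, 20], [5, 20]]]  # one 2x2 slice
    toy_labels = [[[0, 3], [0, 0]]]
    for z, image, mask in volume_to_slices(toy_volume, toy_labels):
        print(z, image, mask)
```

    Normalizing each slice independently is a simplification; scaling with a fixed intensity window across the whole scan would be an equally plausible reading of the step.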

    Dataset Structure:

    Applications

    This dataset is ideal for medical image segmentation tasks such as:

    Deep learning-based pancreas segmentation (e.g., using U-Net, DeepLabV3+)

    Automated organ detection and localization

    AI-assisted diagnosis and analysis of abdominal CT scans

    Acknowledgments & References

    This dataset is derived from:

    National Cancer Institute Imaging Data Commons (IDC) [1]

    The Cancer Imaging Archive (TCIA) [2]

    Original dataset DOI: https://doi.org/10.7937/K9/TCIA.2016.tNB1kqBU

    Citations: If you use this dataset, please cite the following:

    Roth, H., Farag, A., Turkbey, E. B., Lu, L., Liu, J., & Summers, R. M. (2016). Data From Pancreas-CT (Version 2). The Cancer Imaging Archive. DOI: 10.7937/K9/TCIA.2016.tNB1kqBU

    Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., et al. (2023). National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence. Radiographics 43.

    License: This dataset is provided under the Creative Commons Attribution 4.0 International (CC-BY-4.0) license. Users must abide by the TCIA Data Usage Policy and Restrictions.

    Additional Resources: Imaging Data Commons (IDC) Portal: https://portal.imaging.datacommons.cancer.gov/explore/

    OHIF DICOM Viewer: https://viewer.ohif.org/

    This dataset provides a high-quality, well-annotated resource for researchers and developers working on medical image analysis, segmentation, and AI-based pancreas detection.
