100+ datasets found
  1. Processed GTEx v8 data

    • figshare.com
    zip
    Updated Apr 18, 2023
    Cite
    Ramon Vinas Torne (2023). Processed GTEx v8 data [Dataset]. http://doi.org/10.6084/m9.figshare.22650763.v1
    Available download formats: zip
    Dataset updated
    Apr 18, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Ramon Vinas Torne
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The GTEx dataset is a public resource comprising a broad collection of gene expression data from a diverse set of human tissues. Here we share the processed GTEx data used in "Hypergraph factorisation for multi-tissue gene expression imputation" (Vinas Torne et al., 2023). We processed the data following the GTEx eQTL discovery pipeline.

    If you use this data for your research, please cite the GTEx consortium paper: GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. DOI: 10.1126/science.aaz1776

  2. GTEx eQTL Browser

    • rrid.site
    • scicrunch.org
    • +2more
    Updated Aug 17, 2013
    Cite
    (2013). GTEx eQTL Browser [Dataset]. http://identifiers.org/RRID:SCR_001618
    Dataset updated
    Aug 17, 2013
    Description

    Database and browser that provides a central resource to archive and display associations between genetic variation and high-throughput molecular-level phenotypes. This effort originated with the NIH GTEx roadmap project; however, the scope of this resource will be extended to include any available genotype/molecular phenotype datasets.

  3. Genotype-Tissue Expression

    • rrid.site
    • neuinfo.org
    Cite
    Genotype-Tissue Expression [Dataset]. http://identifiers.org/RRID:SCR_013042
    Description

    Project to study human gene expression and regulation in multiple tissues, providing valuable insights into mechanisms of gene regulation and its disease-related perturbations. Genetic variation between individuals will be examined for correlation with differences in gene expression level to identify regions of the genome that influence whether and how much a gene is expressed. Includes initiatives: Novel Statistical Methods for Human Gene Expression Quantitative Trait Loci (eQTL) Analysis; Laboratory, Data Analysis, and Coordinating Center (LDACC); caHUB Acquisition of Normal Tissues in Support of GTEx Project.

  4. GTEx: DICOM converted whole slide hematoxylin and eosin stained images from...

    • zenodo.org
    bin
    Updated Sep 19, 2025
    Cite
    David Clunie; William Clifford; David Pot; Ulrike Wagner; Erika Kim; Granger Sutton; Andrey Fedorov (2025). GTEx: DICOM converted whole slide hematoxylin and eosin stained images from the Genotype-Tissue Expression (GTEx) Project [Dataset]. http://doi.org/10.5281/zenodo.11099100
    Available download formats: bin
    Dataset updated
    Sep 19, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    David Clunie; William Clifford; David Pot; Ulrike Wagner; Erika Kim; Granger Sutton; Andrey Fedorov
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset corresponds to a collection of images and/or image-derived data available from the National Cancer Institute Imaging Data Commons (IDC) [1]. The dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images in the IDC Portal under the GTEx collection. To download the content of the collection, use the manifests included in this Zenodo record and follow the download instructions below.

    Collection description

    The Genotype-Tissue Expression (GTEx) Project established a data resource and tissue bank to study the relationship between genetic variants and gene expression in multiple human tissues and across individuals. The project included contributions from numerous groups with diverse expertise in biospecimen collection and processing, pathology review, molecular analysis, and data management. The contributors are collectively called the GTEx Consortium.

    GTEx collected a total of 26,468 unique tissue samples from 50+ different tissue types, from 956 healthy postmortem donors. The standardized biospecimen collection and analysis practices applied during the study served to minimize preanalytical variability associated with specimen-related factors and their potential impact on analytic endpoints. Each GTEx tissue was divided into two tissue blocks, one for histology and one for molecular analysis; both tissue blocks were preserved in PAXgene Tissue Fixative (Qiagen) solution for 6 to 24 hours, followed by PAXgene Tissue Stabilizer (Qiagen) as specified in the project-specific standard operating procedures. Tissue blocks were processed and embedded in paraffin at the GTEx central repository at the Van Andel Institute (MI) and hematoxylin and eosin–stained slides were generated from all GTEx donors. Digitally scanned whole slide images of PAXgene-fixed/stabilized, paraffin-embedded tissue sections were created using Aperio Scanscope software (Leica Biosystems). The digital images were then reviewed and annotated by one of four board-certified pathologists assigned to the GTEx study. There are a total of 25,503 digital histology images in the GTEx collection.

    GTEx was supported by the NIH Common Fund (2010 – 2019). Additional resources include the GTEx Biobank, the GTEx Portal, and the full dataset at dbGaP (accession number phs000424).

    Please refer to the listed GTEx publications below for more details [2-7].

    Files included

    A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.

    1. gtex-idc_v19-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets
    2. gtex-idc_v19-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets
    3. gtex-idc_v19-dcf.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)

    Note that manifest files ending in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while those ending in -gcs.s5cmd reference files in Google Cloud Storage (GCS). The actual files are identical and are mirrored between AWS and GCS.
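
    The naming convention described above can be unpacked programmatically. The following is a minimal sketch, assuming every manifest name follows the pattern collection_id-idc_vN-storage.extension shown in the examples; the function and class names are hypothetical and are not part of any IDC tool.

```python
import re
from typing import NamedTuple


class ManifestName(NamedTuple):
    """Parts of a manifest file name (hypothetical helper type)."""
    collection_id: str
    idc_release: int
    storage: str  # "aws", "gcs", or "dcf"


# Assumed pattern, inferred only from the file names listed above.
_PATTERN = re.compile(
    r"^(?P<collection>.+)-idc_v(?P<release>\d+)-(?P<storage>aws|gcs|dcf)\.(?:s5cmd|dcf)$"
)


def parse_manifest_name(filename: str) -> ManifestName:
    """Split a manifest file name into collection, IDC release, and storage."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unrecognized manifest name: {filename!r}")
    return ManifestName(
        match["collection"], int(match["release"]), match["storage"]
    )
```

    For example, parse_manifest_name("gtex-idc_v19-aws.s5cmd") yields the collection "gtex", release 19, and storage "aws".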

    Download instructions

    Each of the manifests includes instructions in its header on how to download the included files.

    To download the files using .s5cmd manifests:

    1. install idc-index package: pip install --upgrade idc-index
    2. download the files referenced by manifests included in this dataset by passing the .s5cmd manifest file: idc download manifest.s5cmd

    To download the files using .dcf manifest, see manifest header.
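
    The two-step .s5cmd procedure above can also be driven from a script. This is a sketch only: it wraps the documented command line (idc download followed by the manifest path) with the standard library, and the helper names are hypothetical. It assumes the idc executable installed by idc-index is on the PATH.

```python
import shutil
import subprocess
from pathlib import Path


def build_download_command(manifest: str) -> list[str]:
    """Return the documented command: idc download <manifest>."""
    return ["idc", "download", manifest]


def download_from_manifest(manifest: str) -> None:
    """Run the download, failing early if a prerequisite is missing."""
    if shutil.which("idc") is None:
        raise RuntimeError(
            "idc not found; install it with: pip install --upgrade idc-index"
        )
    if not Path(manifest).is_file():
        raise FileNotFoundError(manifest)
    # check=True raises CalledProcessError if the download command fails.
    subprocess.run(build_download_command(manifest), check=True)
```

    Calling download_from_manifest("gtex-idc_v19-aws.s5cmd") would start the transfer for the AWS manifest, provided the file is in the working directory.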

    Acknowledgments

    Please acknowledge the GTEx Consortium in any published work that includes the images. A sample statement for the acknowledgment of the Genotype-Tissue Expression (GTEx) Project dataset(s) follows.

    The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health (commonfund.nih.gov/GTEx). Additional funds were provided by the NCI, NHGRI, NHLBI, NIDA, NIMH, and NINDS. Donors were enrolled at Biospecimen Source Sites funded by NCI/Leidos Biomedical Research, Inc. subcontracts to the National Disease Research Interchange (10XS170), Roswell Park Cancer Institute (10XS171), and Science Care, Inc. (X10S172). The Laboratory, Data Analysis, and Coordinating Center (LDACC) was funded through a contract (HHSN268201000029C) to the Broad Institute of MIT and Harvard. Biorepository operations were funded through a Leidos Biomedical Research, Inc. subcontract to Van Andel Research Institute (10ST1035). Additional data repository and project management were provided by Leidos Biomedical Research, Inc. (HHSN261200800001E). The Brain Bank was supported with supplements to University of Miami grant DA006227. Statistical Methods development grants were made to the University of Geneva (MH090941& MH101814), the University of Chicago (MH090951, MH090937, MH101825, & MH101820), the University of North Carolina - Chapel Hill (MH090936), North Carolina State University (MH101819), Harvard University (MH090948), Stanford University (MH101782), Washington University (MH101810), and to the University of Pennsylvania (MH101822).

    The Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.

    References

    [1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W. L., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National cancer institute imaging data commons: Toward transparency, reproducibility, and scalability in imaging artificial intelligence. Radiographics 43, (2023).

    [2] Sobin, L., Barcus, M., Branton, P. A., Engel, K. B., Keen, J., Tabor, D., Ardlie, K. G., Greytak, S. R., Roche, N., Luke, B., Vaught, J., Guan, P. & Moore, H. M. Histologic and quality assessment of Genotype-Tissue Expression (GTEx) research samples: A large postmortem tissue collection. Arch. Pathol. Lab. Med. (2024). doi:10.5858/arpa.2023-0467-OA

    [3] GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 45, 580–585 (2013).

    [4] GTEx Consortium. Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348, 648–660 (2015).

    [5] GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).

    [6] Carithers, L. J., Ardlie, K., Barcus, M., Branton, P. A., Britton, A., Buia, S. A., Compton, C. C., DeLuca, D. S., Peter-Demchok, J., Gelfand, E. T., Guan, P., Korzeniewski, G. E., Lockhart, N. C., Rabiner, C. A., Rao, A. K., Robinson, K. L., Roche, N. V., Sawyer, S. J., Segrè, A. V., Shive, C. E., Smith, A. M., Sobin, L. H., Undale, A. H., Valentino, K. M., Vaught, J., Young, T. R., Moore, H. M. & GTEx Consortium. A novel approach to high-quality postmortem tissue procurement: The GTEx project. Biopreserv. Biobank. 13, 311–319 (2015).

    [7] Branton, P. A., Sobin, L., Barcus, M., Engel, K. B., Greytak, S. R., Guan, P., Vaught, J. & Moore, H. M. Notable histologic findings in a ‘normal’ cohort: The National Institutes of Health Genotype-Tissue Expression (GTEx) project. Arch. Pathol. Lab. Med. (2024). doi:10.5858/arpa.2023-0468-OA

  5. Gene expression and splicing counts from 49 tissues from GTEx v6p genome...

    • zenodo.org
    application/gzip
    Updated Apr 29, 2024
    Cite
    Vicente A. Yepez; Christian Mertes; Julien Gagneur (2024). Gene expression and splicing counts from 49 tissues from GTEx v6p genome build hg19 - non-strand specific [Dataset]. http://doi.org/10.5281/zenodo.5638707
    Available download formats: application/gzip
    Dataset updated
    Apr 29, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Vicente A. Yepez; Christian Mertes; Julien Gagneur
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset description:

    49 folders, each corresponding to one tissue from GTEx v6p and containing the following files:

    1. geneCounts: gene-level counts

    2. k_j: split counts spanning from one exon to another.

    3. k_theta: non-split counts covering a splice site

    4. n_psi3: total split counts from a given acceptor site

    5. n_psi5: total split counts from a given donor site

    6. n_theta: total split and non-split counts for a given splice site

    7. Sample annotation describing each sample from the dataset

    8. Description file with global information from the dataset
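
    As an illustration of how the split-count files relate to each other, the sketch below computes a percent-spliced-in ratio from one junction count and the matching site total. Treating k_j / n_psi5 (or k_j / n_psi3) as the ratio of interest follows the usual convention for these count names and is an assumption here, not something the dataset description states; the function name and the numbers are invented for the example.

```python
from typing import Optional


def splice_ratio(k_j: int, n_site: int) -> Optional[float]:
    """Fraction of split reads at a site that use one particular junction.

    k_j    -- split counts for the junction (one entry of k_j)
    n_site -- total split counts at the junction's donor site (n_psi5)
              or acceptor site (n_psi3)
    Returns None when the site has no split reads, since the ratio is undefined.
    """
    if n_site == 0:
        return None
    return k_j / n_site


# Toy numbers: 30 of the 120 split reads at a donor site use this junction.
ratio = splice_ratio(30, 120)  # 0.25
```

    Returning None for sites without coverage keeps undefined ratios distinct from a true ratio of zero.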

    The gene counts were generated using the GTF file from release 29 of GENCODE, and the split and non-split counts contain only the annotated junctions from the same release. Statistics are reported only for GENCODE-annotated introns and splice sites, in compliance with the regulations of the GTEx consortium. For a description of the samples, methods, and protocols, see the GTEx publication specified below.

    Use: The count matrices are intended for researchers who want to use RNA-Seq data for diagnostics. Researchers can merge their own dataset with the downloaded ones, provided the tissue, genome build, strand, and paired-end specifications match. Afterwards, the Detection of RNA outliers Pipeline (DROP) can be used to compute gene expression and splicing outliers.
    Organism: Homo sapiens
    Genome assembly: hg19
    Gene annotation: gencode29
    Strand specific: FALSE
    Paired end: TRUE
    Protocol: poly(A) enrichment

    Contact: Vicente A. Yepez, yepez at in.tum.de; Christian Mertes, mertes at in.tum.de; Julien Gagneur, gagneur at in.tum.de

    Citation: Write the following in the "Data availability" section of the manuscript or similar replacing the three citations by the ones from the References section below:

    The count matrices for the GTEx samples were downloaded from Zenodo (doi: 10.5281/zenodo.5596755) and were generated through DROP using release 29 of the GENCODE annotation.

    Also, write the following in the Acknowledgements section:

    The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health, and by NCI, NHGRI, NHLBI, NIDA, NIMH, and NINDS. The raw data used for the analyses described in this manuscript were obtained from the GTEx Portal on June 12, 2017, under accession number dbGaP phs000424.v6.p1.


  6. GTEx v8 fine mapping on eQTL and sQTL

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Cite
    Barbeira, Alvaro Numa; Bonazzola, Rodrigo; Gamazon, Eric R; Liang, Yanyu; Park, YoSon; Kim-Hellmuth, Sarah; Wang, Gao; Jiang, Zhuoxun; Zhou, Dan; Hormozdiari, Farhad; Liu, Boxiang; Rao, Abhiram; Hamel, Andrew R; Pividori, Milton D; Aguet, François; Bastarache, Lisa; Jordan, Daniel M; Verbanck, Marie; Do, Ron; Stephens, Matthew; Montgomery, Stephen B; Segré, Ayellet V; Brown, Christopher D; Lappalainen, Tuuli; Wen, Xiaoquan; Im, Hae Kyung (2020). GTEx v8 fine mapping on eQTL and sQTL [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3517188
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    New York Genome Center (https://www.nygenome.org/)
    Harvard University
    Stanford University
    University of Michigan
    Icahn School of Medicine at Mount Sinai
    Université de Paris
    Vanderbilt University
    The University of Chicago
    University of Pennsylvania
    Authors
    Barbeira, Alvaro Numa; Bonazzola, Rodrigo; Gamazon, Eric R; Liang, Yanyu; Park, YoSon; Kim-Hellmuth, Sarah; Wang, Gao; Jiang, Zhuoxun; Zhou, Dan; Hormozdiari, Farhad; Liu, Boxiang; Rao, Abhiram; Hamel, Andrew R; Pividori, Milton D; Aguet, François; Bastarache, Lisa; Jordan, Daniel M; Verbanck, Marie; Do, Ron; Stephens, Matthew; Montgomery, Stephen B; Segré, Ayellet V; Brown, Christopher D; Lappalainen, Tuuli; Wen, Xiaoquan; Im, Hae Kyung
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data usage policy

    When using this data, you must acknowledge the source by citing the publication "Widespread dose-dependent effects of RNA expression and splicing on complex diseases and traits" (https://doi.org/10.1101/814350).

    GTEx-GWAS integration: Finemapping

    This package contains DAP-G results on GTEx v8 eQTL and sQTL data. See (DAP-G software) for details. We used only European individuals and variants with MAF>0.01, on genes that are annotated as protein_coding or lncRNA. DAP-G ld_control parameter was 0.75.

    The results were analyzed in the preprint cited above.

    Contents

    finemapping/
    |-- README_finemapping.md
    |-- dapg_eqtl.tar
    `-- dapg_sqtl.tar
    

    Unpack each tarball with a command like tar -xvpf dapg_sqtl.tar

    For every tissue:

    • {tissue}.variants_pip.txt.gz contains the variants' posterior inclusion probabilities of being causal for every gene.
      • gene: gene id (or intron id)
      • rank: ranking of the variant according to its PIP (see below)
      • variant_id: gtex variant id
      • pip: posterior inclusion probability of the variant in the causal models
      • log10_abf: approximate Bayes factor (-log10)
      • cluster_id: id of cluster to which the variant belongs
    • {tissue}.models_variants.txt.gz contains, for every model contemplated by DAP-G, the list of variants involved. Most models have a single variant.
    • {tissue}.model_summary.txt.gz contains, for every analyzed gene, a summary of the models, such as the expected number of causal variants
      • gene: gene id (or intron id)
      • pes: posterior expected model size (i.e. number of causal variants)
      • pse_se: standard error of the above
      • log_nc: dapg undocumented statistic
      • log10_nc: dapg undocumented statistic
    • {tissue}.models.txt.gz for every analyzed gene:
      • gene: gene id (or intron id)
      • model: number (serving as a model name)
      • n: number of variants (0 for null model)
      • pp: posterior inclusion probability of the model
      • ps: posterior score
    • {tissue}.clusters.txt.gz for every analyzed gene:
      • gene: gene id (or intron id)
      • cluster: number (serving as cluster name)
      • n_snps: number of variants in the cluster
      • pip: posterior inclusion probability
      • average_r2: average correlation within the cluster
    • {tissue}.cluster_correlations.txt.gz: upper triangular matrix of correlations among clusters
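
    To show how the per-tissue files might be consumed, here is a minimal sketch that reads a {tissue}.variants_pip.txt.gz file and keeps variants above a PIP threshold. It assumes the file is tab-separated with a header row using the column names listed above; the description does not state the delimiter, so check README_finemapping.md first. The function name and the 0.5 default are hypothetical choices.

```python
import csv
import gzip


def high_pip_variants(path: str, min_pip: float = 0.5):
    """Return (gene, variant_id, pip) tuples with pip >= min_pip, highest first.

    Assumes a gzip-compressed, tab-separated file with a header row that
    includes the columns gene, variant_id, and pip.
    """
    hits = []
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            pip = float(row["pip"])
            if pip >= min_pip:
                hits.append((row["gene"], row["variant_id"], pip))
    # Sort by posterior inclusion probability, largest first.
    hits.sort(key=lambda hit: hit[2], reverse=True)
    return hits
```

    Reading row by row avoids loading a whole tissue file into memory, which may matter for the larger tissues.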

    Disclaimer

    The data is provided "as is", and the authors assume no responsibility for errors or omissions.
    The User assumes the entire risk associated with its use of these data.
    The authors shall not be held liable for any use or misuse of the data described and/or contained herein.
    The User bears all responsibility in determining whether these data are fit for the User's intended use.

    The information contained in these data is no better than the original sources from which they were derived, and both scale and accuracy may vary across the data set.
    These data may not have the accuracy, resolution, completeness, timeliness, or other characteristics appropriate for applications that potential users of the data may contemplate.

    The user is responsible to comply with any data usage policy from the original GWAS studies; refer to the list of traits described here to identify their respective Consortia's requirements.

    THE DATA IS PROVIDED WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE DATA OR THE USE OR OTHER DEALINGS IN THE DATA.

  7. GTEX BioBombe Results - Randomly Permuted Data

    • data.niaid.nih.gov
    Updated Jan 21, 2020
    Cite
    Way, Gregory (2020). GTEX BioBombe Results - Randomly Permuted Data [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_2386815
    Dataset updated
    Jan 21, 2020
    Dataset provided by
    University of Pennsylvania
    Authors
    Way, Gregory
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    BioBombe analysis applied to randomly permuted gene expression data from The Genotype-Tissue Expression (GTEx) project. Method and results described in https://github.com/greenelab/BioBombe

  8. GTEx data from UCSC xena (TCGA TARGET GTEx) - part 2

    • zenodo.org
    application/gzip, bin
    Updated Apr 1, 2025
    Cite
    Amirhossein Naghsh Nilchi; Valentin Hildemann (2025). GTEx data from UCSC xena (TCGA TARGET GTEx) - part 2 [Dataset]. http://doi.org/10.5281/zenodo.15120785
    Available download formats: application/gzip, bin
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Amirhossein Naghsh Nilchi; Valentin Hildemann
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains cleaned GTEx data from the dataset "TCGA TARGET GTEx" of UCSC Xena.

    All samples have survival and expression data. The patient ID matches the expression, survival, and phenotype data.

    The script for data cleaning is also included.

  9. gtex-single-cell-rnaseq

    • huggingface.co
    Updated Nov 22, 2025
    Cite
    Lviv Polytechnic National University – Department of Artificial Intelligence Systems (2025). gtex-single-cell-rnaseq [Dataset]. https://huggingface.co/datasets/ai-department-lpnu/gtex-single-cell-rnaseq
    Dataset updated
    Nov 22, 2025
    Dataset authored and provided by
    Lviv Polytechnic National University – Department of Artificial Intelligence Systems
    Description

    GTEx Single-Cell RNA-seq Dataset

    This repository provides tools to create a Hugging Face dataset from GTEx single-nucleus RNA-seq data, transforming the hierarchical H5AD format into a flat, ML-ready structure.

    Data Source

    The data comes from GTEx's snRNA-seq atlas:

    Source: GTEx Portal
    Publication: Eraslan et al., Science 2022 - "Single-nucleus cross-tissue molecular reference maps toward understanding disease gene function"
    Content: 209,126…

    See the full description on the dataset page: https://huggingface.co/datasets/ai-department-lpnu/gtex-single-cell-rnaseq.

  10. RNAseq data intron3 freo GTEx

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    Updated Oct 19, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    García-Escudero, Ramón; Avila, Jesus; García-Escudero, Vega; Ruiz-Gabarre, Daniel (2023). RNAseq data intron3 freo GTEx [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001086758
    Dataset updated
    Oct 19, 2023
    Authors
    García-Escudero, Ramón; Avila, Jesus; García-Escudero, Vega; Ruiz-Gabarre, Daniel
    Description

    RNA-seq raw data were obtained from the Genotype-Tissue Expression project (GTEx). A total of 363 samples of frontal cortex, dorsolateral prefrontal cortex, and hippocampus from 180 non-demented human brain donors were analysed. For donors with more than one sample in the same brain region, only the one with the highest level of MAPT was analysed. FASTQ files were obtained from the SRA files and reads were re-mapped to human genome GRCh38 by means of STAR 2.5.2a. Gene expression was quantified using RSEM 1.3.1, as Transcripts per Million (TPM). The annotation file was obtained from GENCODE v23 and was modified to include the TIR12-MAPT gene (coordinates chr17:45894382–46018851), which contains part of intron 12 (coordinates chr17:46018731–46018851) as the 3’ end of the gene.
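
    The sample-selection rule in this description (one sample per donor and brain region, chosen by highest MAPT level) can be sketched as follows. This is an illustration of the stated rule, not the authors' code; the record layout, field names, and values are hypothetical.

```python
def pick_highest_mapt(samples):
    """Keep one sample per (donor, region): the one with the highest MAPT TPM.

    Each sample is a dict with the hypothetical keys
    "sample", "donor", "region", and "mapt_tpm".
    """
    best = {}
    for sample in samples:
        key = (sample["donor"], sample["region"])
        # Replace the stored sample only if this one has strictly higher MAPT.
        if key not in best or sample["mapt_tpm"] > best[key]["mapt_tpm"]:
            best[key] = sample
    return list(best.values())


# Invented toy records: donor D1 has two hippocampus samples.
toy = [
    {"sample": "S1", "donor": "D1", "region": "hippocampus", "mapt_tpm": 10.0},
    {"sample": "S2", "donor": "D1", "region": "hippocampus", "mapt_tpm": 25.0},
    {"sample": "S3", "donor": "D1", "region": "frontal cortex", "mapt_tpm": 5.0},
]
selected = pick_highest_mapt(toy)  # keeps S2 and S3
```

    Ties are resolved in favour of the first sample seen, a choice the description does not specify.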

  11. GTEX BioBombe Results

    • data.niaid.nih.gov
    Updated Jan 24, 2020
    Cite
    Way, Gregory (2020). GTEX BioBombe Results [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_2300615
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    University of Pennsylvania
    Authors
    Way, Gregory
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    BioBombe analysis applied to gene expression data from The Genotype-Tissue Expression (GTEx) project. Method and results described in https://github.com/greenelab/BioBombe

  12. Expression data for GTEX with R code to produce coexpression networks

    • figshare.com
    application/gzip
    Updated Apr 24, 2017
    Cite
    Alejandro Caceres (2017). Expression data for GTEX with R code to produce coexpression networks [Dataset]. http://doi.org/10.6084/m9.figshare.4793605.v1
    Available download formats: application/gzip
    Dataset updated
    Apr 24, 2017
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Alejandro Caceres
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
  13. Gene expression and splicing counts from 49 tissues from GTEx v8 genome...

    • zenodo.org
    application/gzip
    Updated Feb 24, 2022
    Cite
    Vicente A. Yépez; Nicholas H. Smith; Christian Mertes; Julien Gagneur (2022). Gene expression and splicing counts from 49 tissues from GTEx v8 genome build hg38 - non-strand specific [Dataset]. http://doi.org/10.5281/zenodo.6078397
    Available download formats: application/gzip
    Dataset updated
    Feb 24, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Vicente A. Yépez; Nicholas H. Smith; Christian Mertes; Julien Gagneur
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset description:

    49 folders, each corresponding to one tissue from GTEx v8 and containing the following files:

    1. geneCounts: gene-level counts

    2. k_j: split counts spanning from one exon to another.

    3. k_theta: non-split counts covering a splice site

    4. n_psi3: total split counts from a given acceptor site

    5. n_psi5: total split counts from a given donor site

    6. n_theta: total split and non-split counts for a given splice site

    7. Sample annotation describing each sample from the dataset

    8. Description file with global information from the dataset

    The gene counts were generated using the GTF file from release 29 of GENCODE, and the split and non-split counts contain only the annotated junctions from the same release. Statistics are reported only for GENCODE-annotated introns and splice sites, in compliance with the regulations of the GTEx consortium. For a description of the samples, methods, and protocols, see the GTEx publication specified below.

    Use: The count matrices are intended for researchers who want to use RNA-Seq data for diagnostics. Researchers can merge their own dataset with the downloaded ones, provided the tissue, genome build, strand, and paired-end specifications match. Afterwards, the Detection of RNA outliers Pipeline (DROP) can be used to compute gene expression and splicing outliers.
    Organism: Homo sapiens
    Genome assembly: hg38
    Gene annotation: gencode29
    Strand specific: FALSE
    Paired end: TRUE
    Protocol: poly(A) enrichment
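
    The merge condition stated under "Use" (tissue, genome build, strand, and paired-end specifications must match) can be checked before combining count matrices. The sketch below is one hypothetical way to do so; the class, the function, and the tissue label are illustrative and are not part of DROP.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SeqSpec:
    """Specifications that must agree before merging (hypothetical type)."""
    tissue: str
    genome_build: str
    strand_specific: bool
    paired_end: bool


def mismatched_fields(own: SeqSpec, reference: SeqSpec) -> list[str]:
    """Return the names of the specifications on which the datasets differ."""
    return [
        field.name
        for field in fields(SeqSpec)
        if getattr(own, field.name) != getattr(reference, field.name)
    ]


# Values for this record, taken from the metadata above; the tissue label
# is an invented placeholder.
reference = SeqSpec("Whole_Blood", "hg38", strand_specific=False, paired_end=True)
own = SeqSpec("Whole_Blood", "hg19", strand_specific=False, paired_end=True)
problems = mismatched_fields(own, reference)  # ["genome_build"]
```

    An empty list means the two datasets agree on all four specifications and may be merged.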

    Contact: Vicente A. Yepez, yepez at in.tum.de; Christian Mertes, mertes at in.tum.de; Julien Gagneur, gagneur at in.tum.de

    Citation: Write the following in the "Data availability" section of the manuscript or similar replacing the three citations by the ones from the References section below:

    The count matrices for the GTEx samples were downloaded from Zenodo (doi: 10.5281/zenodo.6078397) and were generated through DROP using release 29 of the GENCODE annotation.

    Also, write the following in the Acknowledgements section:

    The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health, and by NCI, NHGRI, NHLBI, NIDA, NIMH, and NINDS. The raw data used for the analyses described in this manuscript were obtained from the GTEx Portal on June 12, 2017, under accession number dbGaP phs000424.v8.p2.

  14. GTEx (Genotype-Tissue Expression) data normalized

    • figshare.com
    zip
    Updated Jun 2, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Erdogan Taskesen (2023). GTEx (Genotype-Tissue Expression) data normalized [Dataset]. http://doi.org/10.4121/uuid:ec5bfa66-5531-482a-904f-b693aa999e8b
    Available download formats: zip
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    4TU.ResearchData
    Authors
    Erdogan Taskesen
    License

    https://doi.org/10.4121/resource:terms_of_use

    Description

    This is a normalized dataset derived from the original RNA-seq dataset downloaded from the Genotype-Tissue Expression (GTEx) project (www.gtexportal.org): RNA-SeQCv1.1.8 gene rpkm Pilot V3 patch1. The data was used to analyze how tissue samples relate to each other in terms of gene expression. It can also be used to gain insight into how gene expression levels behave in the different human tissues.

  15. Data from: On the cross-population generalizability of gene expression...

    • search.dataone.org
    • datadryad.org
    Updated Jun 29, 2025
    Cite
    Kevin L. Keys; Angel C.Y. Mak; Marquitta J. White; Walter L. Eckalbar; Andrew W. Dahl; Joel Mefford; Anna V. Mikhaylova; María G. Contreras; Jennifer R. Elhawary; Celeste Eng; Donglei Hu; Scott Huntsman; Sam S. Oh; Sandra Salazar; Michael A. Lenoir; Jimmie Chun Ye; Timothy A. Thornton; Noah Zaitlen; Esteban G. Burchard; Christopher R. Gignoux (2025). On the cross-population generalizability of gene expression prediction models [Dataset]. http://doi.org/10.7272/Q6RN362Z
    Explore at:
    Dataset updated
    Jun 29, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Kevin L. Keys; Angel C.Y. Mak; Marquitta J. White; Walter L. Eckalbar; Andrew W. Dahl; Joel Mefford; Anna V. Mikhaylova; María G. Contreras; Jennifer R. Elhawary; Celeste Eng; Donglei Hu; Scott Huntsman; Sam S. Oh; Sandra Salazar; Michael A. Lenoir; Jimmie Chun Ye; Timothy A. Thornton; Noah Zaitlen; Esteban G. Burchard; Christopher R. Gignoux
    Time period covered
    Jan 1, 2020
    Description

    The genetic control of gene expression is a core component of human physiology. For the past several years, transcriptome-wide association studies have leveraged large datasets of linked genotype and RNA sequencing information to create a powerful gene-based test of association that has been used in dozens of studies. While numerous discoveries have been made, the populations in the training data are overwhelmingly of European descent, and little is known about the generalizability of these models to other populations. Here, we test for cross-population generalizability of gene expression prediction models using a dataset of African American individuals with RNA-Seq data in whole blood. We find that the default models trained in large datasets such as GTEx and DGN fare poorly in African Americans, with a notable reduction in prediction accuracy when compared to European Americans. We replicate these limitations in cross-population generalizability using the five populations in the GEUVA...

  16. GWAS and GTEx QTL integration

    • zenodo.org
    application/gzip, bin +2
    Updated Jan 24, 2020
    Cite
    Alvaro Numa Barbeira; Rodrigo Bonazzola; Eric R Gamazon; Yanyu Liang; YoSon Park; Sarah Kim-Hellmuth; Gao Wang; Zhuoxun Jiang; Dan Zhou; Farhad Hormozdiari; Boxiang Liu; Abhiram Rao; Andrew R Hamel; Milton D Pividori; François Aguet; Lisa Bastarache; Daniel M Jordan; Marie Verbanck; Ron Do; Stephen B Montgomery; Kristin Ardlie; Christopher D Brown; Ayellet V Segré; Tuuli Lappalainen; Xiaoquan Wen; Hae Kyung Im (2020). GWAS and GTEx QTL integration [Dataset]. http://doi.org/10.5281/zenodo.3518299
    Explore at:
    Available download formats: application/gzip, tar, txt, bin
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alvaro Numa Barbeira; Rodrigo Bonazzola; Eric R Gamazon; Yanyu Liang; YoSon Park; Sarah Kim-Hellmuth; Gao Wang; Zhuoxun Jiang; Dan Zhou; Farhad Hormozdiari; Boxiang Liu; Abhiram Rao; Andrew R Hamel; Milton D Pividori; François Aguet; Lisa Bastarache; Daniel M Jordan; Marie Verbanck; Ron Do; Stephen B Montgomery; Kristin Ardlie; Christopher D Brown; Ayellet V Segré; Tuuli Lappalainen; Xiaoquan Wen; Hae Kyung Im
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    # Data usage policy

    When using this data, you must acknowledge the source by citing the publication "Widespread dose-dependent effects of RNA expression and splicing on complex diseases and traits" (https://doi.org/10.1101/814350).

    # GTEx GWAS integration
    
    This package contains the application of several GWAS-QTL integration methods.
    The results were analyzed in [this preprint](https://www.biorxiv.org/content/10.1101/814350v1)
    about GTEx v8 application to several GWAS traits.
     
    ``` 
    .
    |-- colocalization
    |  |-- coloc
    |  |  `-- coloc_enloc_priors_eqtl.tar.gz
    |  |-- enloc
    |  |  |-- enloc_eqtl_eur.tar.gz
    |  |  `-- enloc_sqtl_eur.tar.gz
    |  `-- eur_ld.bed.gz
    |-- prediction_models
    |  |-- gtex_v8_expression_mashr_snp_smultixcan_covariance.txt.gz
    |  |-- gtex_v8_splicing_mashr_snp_smultixcan_covariance.txt.gz
    |  |-- mashr_eqtl.tar
    |  `-- mashr_sqtl.tar
    |-- smr
    |  |-- SMR_gtex_v8_README.txt
    |  `-- SMRresults_GTEx_v8_peQTL5e-08.tar.gz
    |-- smultixcan
    |  |-- smultixcan_eqtl.tar.gz
    |  `-- smultixcan_sqtl.tar.gz
    `-- spredixcan
      |-- spredixcan_eqtl.tar.gz
      `-- spredixcan_sqtl.tar.gz
    
     ```
     
    You can uncompress gzipped tarball packages `*.tar.gz` in a UNIX command line with an instruction such as:
    ```bash
    tar -xzvpf smultixcan_eqtl.tar.gz
    ```
    Tar packages (`*.tar`) can be unpacked with an analogous instruction:
    ```bash
    tar -xvpf mashr_eqtl.tar
    ```
    
    
    ## Preliminaries
    
    **Finemapping** results are contained in a separate release due to size constraints.
    
    GWAS summary statistics for 114 traits were harmonized and imputed to GTEx v8 variants with MAF>0.01 using only European samples.
    (summary imputation software [here](https://github.com/hakyimlab/summary-gwas-imputation)). 
    Some of the following analyses used the full set of 114 traits,
    while some focused only on 87 traits whose imputed associations showed no deflation
    (the imputation algorithm is conservative, and studies with too few available variants have a depleted distribution of association p-values after imputation).
    
    The harmonized and imputed GWAS summary statistics are contained in a separate release due to size constraints. 
    For completeness' sake, the imputed summary statistics look like:
    ```
    variant_id panel_variant_id  chromosome position  effect_allele non_effect_allele current_build frequency sample_size  zscore pvalue effect_size  standard_error imputation_status n_cases
    rs554008981  chr1_13550_G_A_b38 chr1  13550 A G hg38  0.017316017316017316  336474 -2.2919929353647097  0.021906050841240293  NA NA imputed  NA
    rs201055865  chr1_14671_G_C_b38 chr1  14671 C G hg38  0.012987012987012988  336474 -0.9559192804440632  0.33911301727494103  NA NA imputed  NA
    ...
    ```
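
For readers who want to load this format programmatically, the following is a minimal Python sketch and not part of the original release. It assumes the files are whitespace-delimited with `NA` marking missing values, as in the excerpt above; the function name `parse_sumstats` and the choice of which columns to convert to numbers are illustrative.

```python
import io

# Header and first data row copied from the excerpt above.
EXAMPLE = (
    "variant_id panel_variant_id chromosome position effect_allele "
    "non_effect_allele current_build frequency sample_size zscore pvalue "
    "effect_size standard_error imputation_status n_cases\n"
    "rs554008981 chr1_13550_G_A_b38 chr1 13550 A G hg38 "
    "0.017316017316017316 336474 -2.2919929353647097 "
    "0.021906050841240293 NA NA imputed NA\n"
)

# Columns converted to numbers; every other column stays a string.
NUMERIC = {
    "position": int, "sample_size": int, "frequency": float, "zscore": float,
    "pvalue": float, "effect_size": float, "standard_error": float,
}

def parse_sumstats(handle):
    """Yield one dict per row; 'NA' in a numeric column becomes None."""
    header = handle.readline().split()
    for line in handle:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip blank or truncated lines
        row = dict(zip(header, fields))
        for name, cast in NUMERIC.items():
            row[name] = None if row[name] == "NA" else cast(row[name])
        yield row

rows = list(parse_sumstats(io.StringIO(EXAMPLE)))
print(rows[0]["panel_variant_id"], rows[0]["zscore"], rows[0]["effect_size"])
```

A gzipped file could be passed as `gzip.open(path, "rt")` in place of the in-memory example.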
    
    The GWAS were split into approximately independent LD regions (Berisa-Pickrell).
    GWAS regions are defined in `eur_ld.bed.gz` (note that a few of them are ill-defined in hg38 and were ignored; only completely defined regions were used). 
    
    ## Colocalization
    
    ### Enloc
    
    ENLOC ([see software here](https://github.com/xqwen/integrative))
    was run for sQTLs and eQTLs using individuals of European ancestry and DAP-G QTL enrichment results on 87 traits.
    Result files are included in `enloc_eqtl_eur.tar.gz` and `enloc_sqtl_eur.tar.gz`.
    Each file contains a particular tissue-trait combination.
    Each row details colocalization between a GWAS region (Berisa-Pickrell) and a gene's or intron's cis-window.
    
    A region might overlap multiple genes/introns or vice versa.
    Each ENLOC file contains the following columns:
    
    * gwas_locus: GWAS LD region
    * molecular_qtl_trait: gene or intron
    * locus_gwas_pip: posterior inclusion probability of variants in the GWAS LD region
    * locus_rcp: regional colocalization probability (main colocalization measure)
    * lead_coloc_SNP: snp with highest RCP
    * lead_snp_rcp: rcp of the lead coloc snp
    
    
    ### Coloc
    
    Coloc ([see software here](https://cran.r-project.org/web/packages/coloc/index.html))
    was run using prior probabilities estimated from QTL enrichment of GWAS variants (computed via ENLOC).
    Results for eQTL are available in `coloc_enloc_priors_eqtl.tar.gz`. 
    Each file contains results for a trait-tissue combination. Columns are:
    * gene_id: gene or intron id
    * p0: probability that neither QTL nor GWAS contain a causal variant
    * p1: probability that only GWAS contains a causal variant
    * p2: probability that only QTL has a causal variant
    * p3: probability that GWAS and QTL each have a causal variant and the two variants are distinct
    * p4: probability that GWAS and QTL share the same causal variant (main colocalization measure)
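
As an illustration of how these columns might be used, here is a small Python sketch. The rows are invented for the example, the 0.5 cutoff on `p4` is an arbitrary assumption rather than a threshold recommended by the authors, and the helper names are hypothetical.

```python
HYPOTHESES = ("p0", "p1", "p2", "p3", "p4")

def best_hypothesis(row):
    """Return the column name (p0..p4) with the largest posterior."""
    return max(HYPOTHESES, key=lambda name: row[name])

def colocalized(rows, threshold=0.5):
    """Keep gene_ids whose shared-variant posterior p4 meets the threshold."""
    return [row["gene_id"] for row in rows if row["p4"] >= threshold]

# Invented values for illustration only; not taken from the release.
rows = [
    {"gene_id": "geneA", "p0": 0.01, "p1": 0.04, "p2": 0.05, "p3": 0.10, "p4": 0.80},
    {"gene_id": "geneB", "p0": 0.02, "p1": 0.03, "p2": 0.05, "p3": 0.70, "p4": 0.20},
]

print(colocalized(rows))
print(best_hypothesis(rows[1]))
```

Here `geneA` is reported as colocalized, while for `geneB` the distinct-variant hypothesis `p3` dominates.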
    
    ## PrediXcan
    
    `mashr_eqtl.tar` and `mashr_sqtl.tar` contain prediction models 
    (trained on expression or splicing data respectively, for 49 GTEx tissues) and LD compilations 
    to be used with PrediXcan, S-PrediXcan, MultiXcan and S-MultiXcan.
    
    For every tissue, the `mashr_{tissue}.db` file is a SQLite file with the prediction model definitions.
    `mashr_{tissue}.txt.gz` is a gzipped-text file with the upper triangular matrices of covariance between snps
    within a gene/intron prediction model.
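
Because the `.db` files are SQLite, their layout can be inspected with Python's standard `sqlite3` module before any tool is run. The sketch below lists tables and columns without assuming a schema; the toy table `demo_weights` and its columns are placeholders created only so the example runs, not a description of the real files.

```python
import sqlite3

def describe_tables(conn):
    """Map each table name in the database to its list of column names."""
    names = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    schema = {}
    for name in names:
        # PRAGMA table_info returns one row per column; index 1 is the column name.
        cols = conn.execute(f'PRAGMA table_info("{name}")').fetchall()
        schema[name] = [c[1] for c in cols]
    return schema

# Toy in-memory database; table and column names are placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_weights (gene TEXT, varID TEXT, weight REAL)")

print(describe_tables(conn))
```

To inspect a real model, `":memory:"` would be replaced by the path to an extracted `mashr_{tissue}.db` file and the `CREATE TABLE` line dropped.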
    
    Many variants in these models don't have an rsid. To fully leverage the information in these models, 
    it is advised to at least harmonize to GTEx variants, and if possible impute as we did [here](https://github.com/hakyimlab/summary-gwas-imputation).
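
When matching variants without an rsid, it can help to split identifiers such as `chr1_13550_G_A_b38` (taken from the summary-statistics excerpt earlier in this README) into their parts. The sketch below assumes the five underscore-separated fields are chromosome, position, reference allele, alternate allele, and genome build; the `Variant` type and `parse_variant_id` function are illustrative names.

```python
from typing import NamedTuple

class Variant(NamedTuple):
    chromosome: str
    position: int
    ref: str    # assumed: reference allele
    alt: str    # assumed: alternate allele
    build: str

def parse_variant_id(variant_id):
    """Split an identifier like 'chr1_13550_G_A_b38' into its five fields."""
    parts = variant_id.split("_")
    if len(parts) != 5:
        raise ValueError(f"unexpected variant id: {variant_id!r}")
    chromosome, position, ref, alt, build = parts
    return Variant(chromosome, int(position), ref, alt, build)

v = parse_variant_id("chr1_13550_G_A_b38")
print(v.chromosome, v.position, v.ref, v.alt, v.build)
```

Identifiers that do not have five fields, such as a bare rsid, raise `ValueError` rather than being silently accepted.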
    
    ### S-PrediXcan
    
    S-PrediXcan was run for the 114 harmonized and imputed traits, on eQTL and sQTL mashr prediction models.
    All of the GWAS traits had the same format, so that the following format parameters were used with S-PrediXcan:
    
    ```
    --snp_column panel_variant_id --effect_allele_column effect_allele --non_effect_allele_column non_effect_allele --zscore_column zscore \
    --keep_non_rsid --additional_output --model_db_snp_key varID \
    ```
    
    Each file is a CSV, with each row containing a gene/intron association at a given trait-tissue combination:
    * gene: Ensembl ID or intron id
    * gene_name: HUGO name or intron id
    * zscore: predicted association z-score
    * effect_size: estimated effect size
    * pvalue: association p-value
    * var_g: estimated variance of predicted expression or splicing
    * pred_perf_r2: prediction model cross-validated performance (R2)
    * pred_perf_pval: p-value of the prediction model cross-validated performance
    * pred_perf_qval: deprecated, empty field left for compatibility
    * n_snps_used: number of snps in the intersection of GWAS and model
    * n_snps_in_cov: number of snps in the LD compilation
    * n_snps_in_model: number of snps in the model
    * best_gwas_p: smallest p-value across GWAS snps used in this model
    * largest_weight: largest prediction model weight
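
A result file with these columns can be screened with the standard `csv` module. The following sketch uses invented rows with a subset of the columns, assumes missing associations are written as `NA` or left empty, and applies a Bonferroni cutoff as one common choice; none of this is claimed to be the authors' own filtering procedure.

```python
import csv
import io

# Invented rows with a subset of the columns listed above.
EXAMPLE = """gene,gene_name,zscore,pvalue
geneA,NAME_A,5.2,2.0e-07
geneB,NAME_B,-1.1,0.27
geneC,NAME_C,NA,NA
"""

def bonferroni_hits(handle, alpha=0.05):
    """Return (gene, pvalue) pairs passing alpha / number of tested rows."""
    tested = []
    for row in csv.DictReader(handle):
        if row["pvalue"] in ("", "NA"):
            continue  # association could not be computed
        tested.append((row["gene"], float(row["pvalue"])))
    if not tested:
        return []
    cutoff = alpha / len(tested)
    # Most significant association first.
    return sorted((t for t in tested if t[1] < cutoff), key=lambda t: t[1])

hits = bonferroni_hits(io.StringIO(EXAMPLE))
print(hits)
```

In this toy input two rows are testable, so the cutoff is 0.025 and only `geneA` passes.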
    
    ### S-Multixcan
    
    S-MultiXcan results were generated from the above S-PrediXcan results. Each file contains multi-tissue associations for a given trait:
    
    
    * gene: Ensembl ID or intron id
    * gene_name: HUGO name or intron id
    * pvalue: multi-tissue association p-value
    * n: number of models available for this gene/intron
    * n_indep: number of independent components of variation in predicted expression/splicing (surviving principal components) 
    * p_i_best: best (smallest) single-tissue p-value (S-PrediXcan) 
    * t_i_best: tissue with the best p-value
    * p_i_worst: worst (largest) single-tissue p-value (S-PrediXcan)
    * t_i_worst: tissue with the worst p-value
    * eigen_max: maximum eigenvalue of SVD
    * eigen_min: minimum eigenvalue of SVD
    * eigen_min_kept: smallest eigenvalue retained after discarding smallest variations
    * z_min: minimum single-tissue z-score
    * z_max: maximum single-tissue z-score
    * z_mean: mean single-tissue z-score
    * z_sd: standard deviation of the single-tissue z-scores
    * tmi: trace of M * M_i where M is predicted expression/splicing covariance across tissues for a gene, and M_i is its SVD pseudo-inverse
    * status: computation status, 0 if no errors
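
The `tmi` column can be related to `n_indep` with a short derivation. This is a sketch under assumptions not stated in the list above: that M is symmetric with eigenvalues \lambda_j and orthonormal eigenvectors u_j, and that M_i is the pseudo-inverse built from the k largest eigenvalues kept after discarding the smallest ones. The symbols u_j, \lambda_j, n, and k are introduced here for illustration only.

```latex
M = \sum_{j=1}^{n} \lambda_j \, u_j u_j^{\top}, \qquad
M_i = \sum_{j=1}^{k} \lambda_j^{-1} \, u_j u_j^{\top}
\;\Longrightarrow\;
\operatorname{tr}(M M_i)
  = \sum_{j=1}^{k} \operatorname{tr}\!\left(u_j u_j^{\top}\right)
  = k .
```

Under these assumptions, `tmi` would be expected to be close to the number of retained components, up to numerical error.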
    
    ## SMR
    
    See `SMR_gtex_v8_README.txt` for details.

    # Disclaimer

    The data is provided "as is", and the authors assume no responsibility for errors or omissions.
    The User assumes the entire risk associated with its use of these data.
    The authors shall not be held liable for any use or misuse of the data described and/or contained herein.
    The User bears all responsibility in determining whether these data are fit for the User's intended use.

    The information contained in these data is not better than the original sources from which they were derived,
    and both scale and accuracy may vary across the data set.
    These data may not have the accuracy, resolution, completeness, timeliness, or other characteristics
    appropriate for applications that potential users of the data may contemplate.

  17. Aberrant gene expression prediction benchmark based on GTEx v8

    • data.niaid.nih.gov
    Updated Oct 18, 2024
    + more versions
    Cite
    Hölzlwimmer, Florian Rupert (2024). Aberrant gene expression prediction benchmark based on GTEx v8 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8427311
    Explore at:
    Dataset updated
    Oct 18, 2024
    Dataset provided by
    Technical University Munich
    Authors
    Hölzlwimmer, Florian Rupert
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains the aberrant gene expression prediction benchmark data as well as the necessary expected gene expression across tissues and tissue-specific isoform contribution scores for AbExp prediction.

    The aberrant gene expression prediction benchmark data (aberrant_expression_prediction_benchmark.parquet) contains the following columns:

    individual: GTEx individual

    gene: Ensembl gene identifier

    tissue: GTEx tissue

    tissue_type: GTEx tissue type

    mu: OUTRIDER-estimated expected gene expression

    theta: OUTRIDER-estimated gene dispersion

    counts: Raw gene expression count

    normalized_counts: OUTRIDER-normalized gene expression count

    l2fc: log2 fold change between observed and expected gene expression count

    zscore: z-score of gene expression, obtained by quantile-mapping the OUTRIDER-estimated distribution to the standard normal distribution

    nominal_pvalue: OUTRIDER-estimated p-value of being an expression outlier

    FDR: FDR-adjusted p-value of being an expression outlier

    is_in_benchmark: Whether this observation is part of the aberrant gene expression prediction benchmark

    is_underexpressed_outlier: Whether this observation is an underexpression outlier at FDR < 5%. This is the benchmark prediction label.
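
As a hedged illustration of how the benchmark columns fit together, the Python sketch below counts benchmark observations and positive labels. The rows are invented, only three of the listed columns are used, and `label_summary` is a hypothetical helper name.

```python
def label_summary(rows):
    """Count benchmark observations and underexpression outliers among them."""
    in_benchmark = [r for r in rows if r["is_in_benchmark"]]
    positives = sum(1 for r in in_benchmark if r["is_underexpressed_outlier"])
    fraction = positives / len(in_benchmark) if in_benchmark else 0.0
    return {"n": len(in_benchmark), "positives": positives, "fraction": fraction}

# Invented observations for illustration only; not taken from the dataset.
rows = [
    {"gene": "geneA", "is_in_benchmark": True, "is_underexpressed_outlier": True},
    {"gene": "geneB", "is_in_benchmark": True, "is_underexpressed_outlier": False},
    {"gene": "geneC", "is_in_benchmark": True, "is_underexpressed_outlier": False},
    {"gene": "geneD", "is_in_benchmark": False, "is_underexpressed_outlier": False},
]

summary = label_summary(rows)
print(summary)
```

The real table is stored as Parquet, so loading it would need a Parquet reader such as `pandas.read_parquet`; the counting logic stays the same.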

    The isoform proportions table (gtex_v8_isoform_proportions.tsv) contains the following columns:

    gene: Ensembl gene identifier

    tissue_type: GTEx tissue type

    tissue: GTEx tissue

    transcript: Ensembl transcript identifier

    mean_transcript_proportions: mean transcript proportions across individuals in GTEx v8

    median_transcript_proportions: median transcript proportions across individuals in GTEx v8

    sd_transcript_proportions: standard deviation of transcript proportions across individuals in GTEx v8

    The expected gene expression table (gtex_v8_expected_expression.tsv) contains the following columns:

    gene: Ensembl gene identifier

    tissue_type: GTEx tissue type

    tissue: GTEx tissue

    gene_is_expressed: Whether the gene is expressed in the tissue

    median_expression: median OUTRIDER-estimated expected gene expression (mu) across individuals

    expression_dispersion: OUTRIDER-estimated gene dispersion (theta)

  18. isoTWAS models using 48 GTEx tissues and PsychENCODE data (07.04.22)

    • data.niaid.nih.gov
    Updated Jul 5, 2022
    Cite
    Bhattacharya, Arjun (2022). isoTWAS models using 48 GTEx tissues and PsychENCODE data (07.04.22) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6795946
    Explore at:
    Dataset updated
    Jul 5, 2022
    Dataset provided by
    UCLA
    Authors
    Bhattacharya, Arjun
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    isoTWAS models for 48 GTEx tissues, adult frontal cortex tissue from the CommonMind Consortium (subset of PsychENCODE project; Gandal et al 2018, Science), and fetal frontal cortex from Walker et al 2019, Cell.

    Each folder corresponds to a separate tissue and contains 1 .tsv.gz file per gene that contains the isoTWAS model. Refer to https://bhattacharya-a-bt.github.io/isotwas/ on how to use these models.

  19. GTEx data from UCSC xena (TCGA TARGET GTEx) - part 1

    • zenodo.org
    application/gzip, bin
    Updated Apr 1, 2025
    Cite
    Amirhossein Naghsh Nilchi; Valentin Hildemann (2025). GTEx data from UCSC xena (TCGA TARGET GTEx) - part 1 [Dataset]. http://doi.org/10.5281/zenodo.15122336
    Explore at:
    Available download formats: application/gzip, bin
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Amirhossein Naghsh Nilchi; Valentin Hildemann
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains cleaned GTEx data from the dataset "TCGA TARGET GTEx" of UCSC Xena.

    All samples have survival and expression data. The patient ID matches the expression, survival, and phenotype data.

    The script for data cleaning is also included.

  20. A massive proteogenomic screen identifies thousands of novel human protein...

    • zenodo.org
    • data.niaid.nih.gov
    • +1more
    application/gzip
    Updated Aug 29, 2023
    Cite
    Xiaolong Cao; Siqi Sun; Jinchuan Xing (2023). A massive proteogenomic screen identifies thousands of novel human protein coding sequences [Dataset]. http://doi.org/10.5281/zenodo.7014020
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Aug 29, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Xiaolong Cao; Siqi Sun; Jinchuan Xing
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Accurate annotation of genes in the human genome is fundamental for biomedical research and genomic data interpretation. The Ensembl, RefSeq, and GENCODE consortiums continuously update the human genome annotations based on new computational and experimental evidence, and new proteins are identified constantly. The Genotype-Tissue Expression (GTEx) project has generated more than 15,000 RNA sequencing datasets from multiple tissues of more than 800 donors, which allows modeling of almost all transcripts and proteins in the human genome. Using proteins translated from the GTEx transcript model, more than 21 million in-silico trypsin-digested peptides were generated. To identify high-confidence novel proteins with proteomic support, we screened more than 2,000 proteomic projects in the PRIDE database and selected more than 50,000 mass spectrometry (MS) runs from 923 projects. These MS data were used to validate the predicted novel peptides. With a stringent standard, we identified almost 20,000 novel peptides.

    This dataset includes files used in the above analysis. More details can be found in the GitHub page (https://github.com/ATPs/human_novo_protein_2022).

Cite
Ramon Vinas Torne (2023). Processed GTEx v8 data [Dataset]. http://doi.org/10.6084/m9.figshare.22650763.v1

Processed GTEx v8 data

Explore at:
2 scholarly articles cite this dataset (View in Google Scholar)
Available download formats: zip
Dataset updated
Apr 18, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Ramon Vinas Torne
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

The GTEx dataset is a public resource that has generated a broad collection of gene expression data collected from a diverse set of human tissues. Here we share the processed GTEx data used in Hypergraph factorisation for multi-tissue gene expression imputation (Vinas Torne et al., 2023). We processed the data following the GTEx eQTL discovery pipeline.

If you use this data for your research, please cite the GTEx consortium paper: GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. DOI: 10.1126/science.aaz1776
