2 datasets found
  1. GEO gene expression dataset recompute for selected tumor samples

    • data.niaid.nih.gov
    Updated May 13, 2024
    Cite
    Visentin, Luca (2024). GEO gene expression dataset recompute for selected tumor samples [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10817923
    Dataset updated
    May 13, 2024
    Dataset authored and provided by
    Visentin, Luca
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We aligned and quantified RNA-Seq data present in GEO with a standardized pipeline to homogenize data preprocessing for downstream applications.

    All uploaded files are UTF-8, .csv-formatted matrices. The *_expected_count.csv.gz files hold unlogged, raw expression counts as reported by rsem-quantify-expression (see details below). The associated *_metadata.csv.gz files contain metadata pertinent to each column of the corresponding expression matrix. Some metadata files may have more rows than the corresponding expression matrix has sample columns. This happens for series that were only partially RNA-Seq based (e.g. combined RNA-Seq plus miRNA-Seq samples under the same GEO accession ID).

    Metadata columns are derived from GEO series files, and follow their definitions. See each GEO entry directly to determine metadata meaning.

    Each recomputed expression matrix has at least the gene_id column, which holds Ensembl gene IDs. The remaining columns are named by the ENA run accession IDs of the recomputed samples. Each associated metadata file has at least the following columns:

    geo_accession: The GEO sample ID of the sample.

    ena_sample: The ENA sample ID of the sample.

    ena_run: The ENA run accession ID of the sample, to be cross-referenced with the expression matrices.

    The remaining columns are derived from GEO metadata files and other ENA-provided data. Please refer to the x.FASTQ package for more information.
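The cross-referencing described above can be sketched as follows. This is a minimal illustration using only the Python standard library. The in-memory files are tiny made-up stand-ins (all run, sample, and GEO IDs and all counts are hypothetical), and the helper names `read_csv_gz` and `match_metadata` are assumptions for illustration, not part of x.FASTQ.

```python
import csv
import gzip
import io


def read_csv_gz(raw: bytes) -> list[dict[str, str]]:
    """Decode a gzip-compressed, UTF-8 CSV into a list of row dicts."""
    text = gzip.decompress(raw).decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


def match_metadata(counts, metadata, run_column="ena_run"):
    """Return metadata rows keyed by run ID, restricted to quantified runs."""
    # Every column except gene_id is a sample, named by its ENA run accession.
    sample_ids = [name for name in counts[0] if name != "gene_id"]
    by_run = {row[run_column]: row for row in metadata}
    return {run: by_run[run] for run in sample_ids if run in by_run}


# Hypothetical stand-in for an *_expected_count.csv.gz file.
counts_raw = gzip.compress(
    b"gene_id,ERR0000001,ERR0000002\n"
    b"ENSG00000000003,10.00,3.50\n"
)
# Hypothetical stand-in for the matching *_metadata.csv.gz file. The third
# row mimics a sample from the same series that was not recomputed.
metadata_raw = gzip.compress(
    b"geo_accession,ena_sample,ena_run\n"
    b"GSM0000001,SAMEA0000001,ERR0000001\n"
    b"GSM0000002,SAMEA0000002,ERR0000002\n"
    b"GSM0000003,SAMEA0000003,ERR0000003\n"
)

counts = read_csv_gz(counts_raw)
metadata = read_csv_gz(metadata_raw)
matched = match_metadata(counts, metadata)
print(sorted(matched))
```

Metadata rows without a matching matrix column are simply dropped, which covers the case where a series was only partially RNA-Seq based.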

    Pipeline Details

    Alignment and quantification were performed with the x.FASTQ tool, available on GitHub, at commit 3a93dd77a70df59c74f7b15216c26f12cd918e81. The tool was installed locally on an Arch Linux machine running the Linux 6.7.8-zen1-1-zen kernel, with an 11th Gen Intel i7-1185G7 (8) CPU and an Intel TigerLake-LP GT2 [Iris Xe Graphics] GPU. Note that no samples were filtered out or omitted based on sample quality or sequencing depth. However, trimming (e.g. of low-quality bases and common adapters) was performed on all samples.

    The reference genome was downloaded from Ensembl, version hg38. STAR was used to build the genome index, with the overhang set to 149.

  2. GEO gene expression dataset recompute for selected tumor samples

    • zenodo.org
    application/gzip
    Updated Mar 15, 2024
    Cite
    Luca Visentin (2024). GEO gene expression dataset recompute for selected tumor samples [Dataset]. http://doi.org/10.5281/zenodo.10817924
    Available download formats: application/gzip
    Dataset updated
    Mar 15, 2024
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Luca Visentin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We aligned and quantified RNA-Seq data present in GEO with a standardized pipeline to homogenize data preprocessing for downstream applications.

    All uploaded files are UTF-8, `.csv`-formatted matrices. The `*_expected_count.csv.gz` files are unlogged, raw expression counts as reported by `rsem-quantify-expression` (see details below). The associated `*_metadata.csv.gz` files contain metadata pertinent to each column of the corresponding expression matrix.
    Some metadata files may have more rows than the corresponding expression matrix has sample columns. This happens for series that were only partially RNA-Seq based (e.g. combined RNA-Seq plus miRNA-Seq samples under the same GEO accession ID).
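
Because the counts are stored unlogged, downstream code usually applies its own transformation. The following is a minimal sketch, assuming a log2(x + 1) transform is wanted; that choice is an assumption made here, not something the dataset prescribes. Values are parsed as floats because the sample values below, like the counts in real files, need not be integers. The in-memory file and its values are hypothetical stand-ins for a real `*_expected_count.csv.gz` file, and `load_log2_counts` is a hypothetical helper name.

```python
import csv
import gzip
import io
import math


def load_log2_counts(raw: bytes) -> dict[str, dict[str, float]]:
    """Parse a gzipped count matrix into {gene_id: {run_id: log2(count + 1)}}."""
    text = gzip.decompress(raw).decode("utf-8")
    table = {}
    for row in csv.DictReader(io.StringIO(text)):
        gene = row.pop("gene_id")
        table[gene] = {
            run: math.log2(float(value) + 1.0) for run, value in row.items()
        }
    return table


# Hypothetical stand-in data: two genes, two runs.
example = gzip.compress(
    b"gene_id,ERR0000001,ERR0000002\n"
    b"ENSG00000000003,7.00,0.00\n"
    b"ENSG00000000005,1.00,3.00\n"
)
logged = load_log2_counts(example)
print(logged["ENSG00000000003"]["ERR0000001"])  # 3.0, since log2(7 + 1) = 3
```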

    Metadata columns are derived from GEO series files, and follow their definitions. See each GEO entry directly to determine metadata meaning.

    Each recomputed expression matrix has at least the `gene_id` column, which holds Ensembl gene IDs. The remaining columns are named by the ENA run accession IDs of the recomputed samples.
    Each associated metadata file has at least the following columns:
    - `geo_accession`: The GEO sample ID of the sample.
    - `sample_accession`: The ENA sample ID of the sample.
    - `run_accession`: The ENA run accession ID of the sample, to be cross-referenced with the expression matrices.
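
The column names listed here (`sample_accession`, `run_accession`) differ from those given in the data.niaid.nih.gov copy of this description (`ena_sample`, `ena_run`). A defensive loader can accept either spelling. The sketch below uses a hypothetical helper, `find_run_column`, which is not part of x.FASTQ. Which naming a given file actually uses should be checked against the file itself.

```python
# Candidate names for the run-ID column, taken from the two descriptions.
RUN_COLUMN_CANDIDATES = ("run_accession", "ena_run")


def find_run_column(header: list[str]) -> str:
    """Return whichever known run-ID column name appears in a metadata header."""
    for name in RUN_COLUMN_CANDIDATES:
        if name in header:
            return name
    raise KeyError(f"none of {RUN_COLUMN_CANDIDATES} found in metadata header")


print(find_run_column(["geo_accession", "sample_accession", "run_accession"]))
print(find_run_column(["geo_accession", "ena_sample", "ena_run"]))
```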

    ## Pipeline Details

    Alignment and quantification were performed with the `x.FASTQ` tool, available [on GitHub](https://github.com/TCP-Lab/x.FASTQ). The tool was installed locally on an Arch Linux machine running the Linux `6.7.8-zen1-1-zen` kernel, with an `11th Gen Intel i7-1185G7 (8)` CPU and an `Intel TigerLake-LP GT2 [Iris Xe Graphics]` GPU.
