Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We aligned and quantified RNA-Seq data present in GEO with a standardized pipeline to homogenize data preprocessing for downstream applications.
All uploaded files are UTF-8 encoded, CSV-formatted matrices. The *_expected_count.csv.gz files hold raw (not log-transformed) expected counts as reported by rsem-calculate-expression (see details below). The associated *_metadata.csv.gz files contain metadata pertinent to each column of the corresponding expression matrix.
Some metadata files may have more rows than the corresponding expression matrix has sample columns. This happens for series that were only partially RNA-Seq based (e.g. combined RNA-Seq plus miRNA-Seq samples under the same GEO accession ID).
Metadata columns are derived from GEO series files, and follow their definitions. See each GEO entry directly to determine metadata meaning.
Each recomputed expression matrix has at least the gene_id column, which holds Ensembl gene IDs. The remaining columns are named after the ENA run accession IDs of the recomputed samples.
Each associated metadata file has at least the following columns:
geo_accession: The GEO sample ID of the sample.
ena_sample: The ENA sample ID of the sample.
ena_run: The ENA run accession ID of the sample, to be cross-referenced with the expression matrices.
The remaining columns are derived from GEO metadata files and other ENA-provided data. Please refer to the x.FASTQ package for more information.
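The cross-referencing described above can be sketched with the Python standard library alone. This is a minimal illustration, not part of x.FASTQ: the function names are hypothetical, and it assumes that every non-gene_id column in the matrix is a run accession, that all count cells are numeric, and that the run column in the metadata is called ena_run as listed above.

```python
import csv
import gzip


def load_counts(path):
    """Read a *_expected_count.csv.gz matrix.

    Returns (run_ids, counts): run_ids lists the sample columns in file
    order, counts maps each gene_id to a list of floats in that same order.
    Assumes every column other than gene_id is an ENA run accession.
    """
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        gene_col = header.index("gene_id")
        run_ids = [name for i, name in enumerate(header) if i != gene_col]
        counts = {}
        for row in reader:
            # RSEM expected counts can be fractional, so parse as float.
            counts[row[gene_col]] = [
                float(value) for i, value in enumerate(row) if i != gene_col
            ]
    return run_ids, counts


def load_metadata(path, run_ids, run_column="ena_run"):
    """Read a *_metadata.csv.gz file, keeping only rows for quantified runs.

    Metadata may describe more samples than the matrix contains (e.g. the
    miRNA-Seq part of a mixed series), so rows without a matching matrix
    column are dropped. Returns a dict keyed by run accession.
    """
    wanted = set(run_ids)
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        return {
            row[run_column]: row
            for row in csv.DictReader(handle)
            if row.get(run_column) in wanted
        }
```

Calling load_counts on a matrix and passing the returned run_ids to load_metadata yields one metadata record per matrix column, so a value such as geo_accession can be looked up for any run.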
Pipeline Details
Alignment and quantification were performed with the x.FASTQ tool, available on GitHub (https://github.com/TCP-Lab/x.FASTQ), at commit 3a93dd77a70df59c74f7b15216c26f12cd918e81. The tool was installed locally on an Arch Linux machine running the Linux 6.7.8-zen1-1-zen kernel, with an 11th Gen Intel i7-1185G7 (8) CPU and an Intel TigerLake-LP GT2 [Iris Xe Graphics] GPU. Please note that no samples were filtered out or omitted based on sample quality or sequencing depth. However, trimming of low-quality bases and common adapters was performed on all samples.
The reference genome (version hg38) was downloaded from Ensembl. STAR was used to build the genome index, with the overhang set to 149.
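For readers unfamiliar with the overhang setting: STAR's documentation recommends setting --sjdbOverhang to the maximum read length minus one, so a value of 149 corresponds to reads of up to 150 bases. The sketch below shows that arithmetic and how a stand-alone STAR index command could be assembled. It is an illustration only and not necessarily how x.FASTQ invokes STAR: the function names, all paths, the thread count, and the use of a GTF annotation are assumptions.

```python
def sjdb_overhang(max_read_length):
    """Return the overhang STAR recommends: max read length minus one."""
    if max_read_length < 2:
        raise ValueError("read length must be at least 2 bases")
    return max_read_length - 1


def star_index_command(genome_dir, fasta, gtf, overhang=149, threads=8):
    """Build the argument list for a STAR genome-index run.

    All paths are placeholders, and the thread count is an assumed default.
    The list can be passed to subprocess.run on a machine with STAR installed.
    """
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", fasta,
        "--sjdbGTFfile", gtf,  # assumed: the annotation file is not named in the text
        "--sjdbOverhang", str(overhang),
        "--runThreadN", str(threads),
    ]
```

Building the command as a list rather than a single string avoids shell quoting issues when paths contain spaces.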