100+ datasets found

Metadata in fecundity gene polymorphism for Ethiopian sheep
data.mendeley.com
Updated Feb 9, 2023
+ more versions
Cite
Helen Nigussie (2023). Metadata in fecundity gene polymorphism for Ethiopian sheep [Dataset]. http://doi.org/10.17632/39bb2vh37n.3
Unique identifier
https://doi.org/10.17632/39bb2vh37n.3
Dataset updated
Feb 9, 2023
Authors
Helen Nigussie
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Ethiopia
Description
This study was carried out to determine whether fecundity gene polymorphism exists in indigenous Ethiopian sheep and whether it is associated with litter size. The dataset has three parts. Metadata file 1 comprises genotype data generated from five loci linked to fecundity gene mutations in Ethiopian indigenous sheep, for polymorphism analysis. Metadata file 2 comprises genotype data and litter size data in parity 1 and parity 2, for association analysis. High genetic diversity and a strong association with litter size were observed in the study, which will be used as baseline information to design a cost-effective and sustainable genetic improvement program for commercialization. The information is intended for those working in animal genetics, breeding, and animal science who wish to repeat the study in other species in the same country or location, or in the same species in a different location. The data could also be integrated with other related genotype data for comparative analysis among different breeds and species of livestock.
Data from: Simultaneous estimation of gene regulatory network structure and...
data.niaid.nih.gov
zenodo.org
Updated Sep 23, 2023
Cite
Chris Jackson (2023). Simultaneous estimation of gene regulatory network structure and RNA kinetics from single cell gene expression [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8371194
Dataset updated
Sep 23, 2023
Dataset authored and provided by
Chris Jackson
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Supplemental Data 1 is single-cell response to rapamycin count data first sequenced in this work and deposited in GEO with accession GSE242556. It is a 173348 rows × 5847 columns TSV.GZ file where the first row is a header, the first 5843 columns are integer gene counts, and the final 4 columns ('Gene', 'Replicate', 'Pool', and 'Experiment') are cell-specific metadata.

Supplemental Data 2 is bulk response to rapamycin count data first sequenced in this work. It is a 33 rows × 5847 columns TSV.GZ file where the first row is a header, the first 5843 columns are integer gene counts, and the final 4 columns ('Oligo', 'Time', 'Replicate', and 'Sample_barcode') are sample-specific metadata.

Supplemental Data 3 is single-cell count data published as GSE125162 and re-analyzed with the pipeline used for single-cell quantification in this work. It is a 65068 rows × 5850 columns TSV.GZ file where the first row is a header, the first 5843 columns are integer gene counts, and the final 7 columns ('Condition', 'Sample', 'Genotype_Group', 'Genotype_Individual', 'Genotype', 'Replicate', 'Cell_Barcode') are cell-specific metadata.

Supplemental Data 4 contains the four deep learning models trained in this work. It is a TAR.GZ file containing the final biophysical transcription/decay model, the pre-trained decay model, the velocity prediction model, and the count prediction model. Each model file is an h5 file containing a PyTorch model that can be loaded with supirfactor_dynamical.read().

Supplemental Data 5 is the prior knowledge network used to constrain the models for TF interpretability. It is a 1574 rows × 204 columns [Genes x TFs] TSV.GZ file where the first row is a header with TF names, the first column is an index of gene names, and TF-gene interactions are indicated by non-zero values in the matrix. There are 2799 TF-gene interactions.

Supplemental Table 6 is the oligonucleotide sequences used in this work. It is a TSV file with a header row.

Supplemental Table 7 is the yeast strains used in this work. It is a TSV file with a header row.

Supplemental Table 8 is gene metadata used in this work (e.g. Ribosomal Protein gene labels, etc). It is a TSV file with a header row.

Supplemental Table 9 is FY4/5 growth curve data generated in this work. It is a 20 rows × 7 columns TSV file where the first row is a header with replicate IDs, the first column is an index of times in minutes, and values are cell densities in YPD culture, in units of 10$^6$ cells / mL.

Supplemental Data 10 is a TAR.GZ file containing the yeast SacCer3 genome, modified to add UTR sequences, that was used to generate transcripts for kallisto pseudoalignment in this work.
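
The count tables described above (Supplemental Data 1 to 3) share one layout: a header row, integer gene counts first, and a fixed number of metadata columns last. A minimal Python sketch of splitting such a table follows, using only the standard library. The function name `split_counts_and_metadata` and the toy two-gene table are illustrative assumptions, not part of the deposited data; the real files are far larger and would more practically be read with a dataframe library.

```python
import csv
import gzip
import io


def split_counts_and_metadata(raw_gz_bytes, n_meta_cols):
    """Split a gzipped TSV into gene names, integer count rows, and metadata rows.

    Assumes the layout described in the text: first row is a header,
    the last `n_meta_cols` columns are metadata, everything before is counts.
    """
    with gzip.open(io.BytesIO(raw_gz_bytes), mode="rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        gene_names = header[:-n_meta_cols]
        meta_names = header[-n_meta_cols:]
        counts, metadata = [], []
        for row in reader:
            counts.append([int(value) for value in row[:-n_meta_cols]])
            metadata.append(dict(zip(meta_names, row[-n_meta_cols:])))
    return gene_names, counts, metadata


# Toy stand-in for a real file: two hypothetical genes plus the four
# metadata columns named for Supplemental Data 1.
toy_table = (
    "GENE_A\tGENE_B\tGene\tReplicate\tPool\tExperiment\n"
    "3\t0\tWT\t1\t2\t1\n"
    "1\t5\tWT\t2\t2\t1\n"
)
toy_gz = gzip.compress(toy_table.encode("utf-8"))

gene_names, counts, metadata = split_counts_and_metadata(toy_gz, n_meta_cols=4)
```

For Supplemental Data 3, the same call would use `n_meta_cols=7`, matching the seven metadata columns listed for that file.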

Gene expression count data from human post-mortem spinal cord

zenodo.org
data.niaid.nih.gov

application/gzip

Updated Mar 26, 2022

Cite

Jack Humphrey (2022). Gene expression count data from human post-mortem spinal cord [Dataset]. http://doi.org/10.5281/zenodo.6385747

Available download formats

application/gzip

Unique identifier

https://doi.org/10.5281/zenodo.6385747

Dataset updated

Mar 26, 2022

Dataset provided by

Zenodo http://zenodo.org/

Authors

Jack Humphrey

License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Gene expression data from human post-mortem tissue for three spinal cord sections (cervical, thoracic and lumbar) from amyotrophic lateral sclerosis (ALS) patients and non-neurological disease controls. RNA sequencing performed as part of the New York Genome Center ALS Consortium.

Analysis workbooks: https://jackhump.github.io/ALS_SpinalCord_QTLs/

Preprint describing results: https://www.medrxiv.org/content/10.1101/2021.08.31.21262682v1

Sample sizes:

Region	Control	ALS
Cervical	35	139
Thoracic	10	42
Lumbar	32	122

Library preparation

RNA was extracted from flash-frozen postmortem tissue using TRIzol (Thermo Fisher Scientific) and chloroform, followed by column purification (RNeasy Minikit, QIAGEN). RNA integrity number (RIN) was assessed on a Bioanalyzer (Agilent Technologies). RNA-Seq libraries were prepared from 500 ng total RNA using the KAPA Stranded RNA-Seq Kit with RiboErase (KAPA Biosystems) for rRNA depletion and Illumina-compatible indexes (NEXTflex RNA-Seq Barcodes, NOVA-512915, PerkinElmer, and IDT for Illumina TruSeq UD Indexes, 20022370). Pooled libraries (average insert size: 375 bp) passing the quality criteria were sequenced on either an Illumina HiSeq 2500 (125 bp paired-end) or an Illumina NovaSeq (100 bp paired-end). The samples had a median sequencing depth of 42 million read pairs, with a range of 16 to 167 million read pairs.

Data processing

Samples were uniformly processed using RAPiD-nf, an efficient RNA-Seq processing pipeline implemented in the Nextflow framework. Following adapter trimming with Trimmomatic (version 0.36), all samples were aligned to the hg38 build (GRCh38.primary_assembly) of the human reference genome using STAR (2.7.2a), with indexes created from GENCODE version 30. Gene expression was quantified with RSEM (1.3.1) using GENCODE v30. Quality control was performed using SAMtools and Picard, and the results were collated using MultiQC. Various technical metrics for sequencing quality control are provided in the metadata. Estimated read count and normalised transcripts per million (TPM) matrices are provided for each tissue.

Provided data:

gencode.v30.gene_meta.tsv.gz - tab-separated table with columns "genename" (the HGNC gene symbol) and "geneid" (the Ensembl ID), as set in the GENCODE v30 comprehensive annotation.

For {tissue} in Cervical_Spinal_Cord, Thoracic_Spinal_Cord, Lumbar_Spinal_Cord:

{tissue}_metadata.tsv.gz - metadata describing each sample. Each row describes a sample. Descriptions of each column below.

{tissue}_gene_tpm.tsv.gz - the normalised TPM values from RSEM for all 58,884 genes in GENCODE v30. Each row describes a gene and each column describes a sample.

{tissue}_gene_counts.tsv.gz - the estimated read counts from RSEM for all 58,884 genes in GENCODE v30. Each row describes a gene and each column describes a sample.

Metadata Column Description

rna_id - de-identified sample ID for each unique RNA-seq sample

dna_id - de-identified donor ID for each patient enrolled in the study

site_id - de-identified site name for each contributing site

tissue - name of tissue/region

age_rounded - age at death, rounded to nearest decade

sex - biological sex of donor

subject_group - long form disease group

disease - short form disease group

site_of_motor_onset - for ALS donors, where did symptoms start?

disease_duration - for ALS donors, how long did donor live with disease?

mutations - any known ALS gene mutations

library_prep - type of library preparation method used

seq_platform - sequencing platform used for sequencing

rin - RNA integrity number, 0-10

c9orf72_repeat_size - estimated C9orf72 repeat expansion size

gPC1 - gPC5 - principal components of genetic ancestry from whole genome sequencing

Remaining metadata columns are from Picard - see here: http://broadinstitute.github.io/picard/picard-metric-definitions.html#RnaSeqMetrics
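
Because each expression matrix has one column per sample and each metadata table has one row per sample, the two can be linked by sample ID. The sketch below is a minimal illustration in Python, under stated assumptions: that the matrix column headers are the `rna_id` values, that the first matrix column holds the gene ID, and that the `disease` column uses labels such as "ALS" and "Control". None of these are confirmed by the description; the toy tables and the function name `samples_by_group` are hypothetical.

```python
import csv
import io


def samples_by_group(matrix_tsv, metadata_tsv, group_col="disease"):
    """Group the sample columns of a gene-by-sample matrix by a metadata column.

    Assumption: matrix column headers (after the first) match `rna_id`
    values in the metadata table.
    """
    meta_reader = csv.DictReader(io.StringIO(metadata_tsv), delimiter="\t")
    group_of = {row["rna_id"]: row[group_col] for row in meta_reader}

    header = next(csv.reader(io.StringIO(matrix_tsv), delimiter="\t"))
    groups = {}
    for sample in header[1:]:  # skip the gene ID column
        groups.setdefault(group_of[sample], []).append(sample)
    return groups


# Toy tables with hypothetical sample IDs and values.
toy_matrix = "geneid\tS1\tS2\tS3\nENSG_TOY\t1.5\t0.0\t2.0\n"
toy_metadata = (
    "rna_id\ttissue\tdisease\n"
    "S1\tCervical\tALS\n"
    "S2\tCervical\tControl\n"
    "S3\tCervical\tALS\n"
)

groups = samples_by_group(toy_matrix, toy_metadata)
```

The real files are gzip-compressed, so they would first need to be opened with `gzip.open(path, "rt")` rather than passed as strings.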

Data_Sheet_1_BioVDB: biological vector database for high-throughput gene...
figshare.com
frontiersin.figshare.com
pdf
Updated Mar 8, 2024
Cite
Michał J. Winnicki; Chase A. Brown; Hunter L. Porter; Cory B. Giles; Jonathan D. Wren (2024). Data_Sheet_1_BioVDB: biological vector database for high-throughput gene expression meta-analysis.PDF [Dataset]. http://doi.org/10.3389/frai.2024.1366273.s001
Available download formats
pdf
Unique identifier
https://doi.org/10.3389/frai.2024.1366273.s001
Dataset updated
Mar 8, 2024
Dataset provided by
Frontiers
Authors
Michał J. Winnicki; Chase A. Brown; Hunter L. Porter; Cory B. Giles; Jonathan D. Wren
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
High-throughput sequencing has created an exponential increase in the amount of gene expression data, much of which is freely, publicly available in repositories such as NCBI's Gene Expression Omnibus (GEO). Querying this data for patterns such as similarity and distance, however, becomes increasingly challenging as the total amount of data increases. Furthermore, vectorization of the data is commonly required in Artificial Intelligence and Machine Learning (AI/ML) approaches. We present BioVDB, a vector database for storage and analysis of gene expression data, which enhances the potential for integrating biological studies with AI/ML tools. We used a previously developed approach called Automatic Label Extraction (ALE) to extract sample labels from metadata, including age, sex, and tissue/cell-line. BioVDB stores 438,562 samples from eight microarray GEO platforms. We show that it allows for efficient querying of data using similarity search, which can also be useful for identifying and inferring missing labels of samples, and for rapid similarity analysis.
Paired differential gene expression and splicing analyses results of 199...
zenodo.org
data.niaid.nih.gov
Updated Jul 19, 2023
Cite
Søren Helweg Dam; Lars Rønn Olsen; Kristoffer Vitting-Seerup (2023). Paired differential gene expression and splicing analyses results of 199 baseline vs. case comparisons across 100 datasets (Limma) [Dataset]. http://doi.org/10.5281/zenodo.7866420
Unique identifier
https://doi.org/10.5281/zenodo.7866420
Dataset updated
Jul 19, 2023
Dataset provided by
Zenodo http://zenodo.org/
Authors
Søren Helweg Dam; Lars Rønn Olsen; Kristoffer Vitting-Seerup
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Note: these are the limma results of the analysis. See https://doi.org/10.5281/zenodo.7032090 for the DESeq2/DEXSeq results.

This dataset contains results from paired differential expression and differential splicing analyses, as well as gene-set over-representation analysis results, for 199 baseline vs. case comparisons across 100 randomly curated datasets with accompanying metadata (preprint).
All results were computed using the R package pairedGSEA, which used limma (Ritchie et al., 2015) and fgsea (Korotkevich et al., 2019).

Each .RDS file contains a list with four objects: A 'metadata' object with the metadata of the respective raw data, a 'genes' object with gene-level differential splicing and expression results, a 'gene_set' object with over-representation results, and 'experiment' with the experiment title.

The filenames follow this pattern: "[dataset ID]_[GEO accession number]_[Manually assigned comparison title].RDS".

All datasets were obtained from a local copy of the ARCHS4 v11 database of transcript counts (Lachmann et al., 2018).
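
The filename pattern given above can be unpacked programmatically when sorting through the result files. The following is a minimal Python sketch; the function name `parse_result_filename` and the example filename are hypothetical. It assumes the dataset ID and the GEO accession contain no underscores, so that any further underscores belong to the comparison title.

```python
def parse_result_filename(filename):
    """Split '[dataset ID]_[GEO accession]_[comparison title].RDS' into its parts.

    Raises ValueError if the extension is not .RDS or fewer than
    three underscore-separated fields are present.
    """
    stem, dot, extension = filename.rpartition(".")
    if not dot or extension.upper() != "RDS":
        raise ValueError(f"not an RDS file: {filename!r}")
    # Split at most twice so underscores inside the title are preserved.
    dataset_id, geo_accession, title = stem.split("_", 2)
    return {
        "dataset_id": dataset_id,
        "geo_accession": geo_accession,
        "comparison_title": title,
    }


# Hypothetical filename following the stated pattern.
parsed = parse_result_filename("12_GSE12345_Treated_vs_control.RDS")
```

The .RDS files themselves are R objects and are read in R with `readRDS()`; this sketch only handles the file names.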
AUROC values from different algorithms for variables in microarray data.
figshare.com
plos.figshare.com
xls
Updated Jun 13, 2023
Cite
Pei-Yau Lung; Dongrui Zhong; Xiaodong Pang; Yan Li; Jinfeng Zhang (2023). AUROC values from different algorithms for variables in microarray data. [Dataset]. http://doi.org/10.1371/journal.pcbi.1007450.t002
Available download formats
xls
Unique identifier
https://doi.org/10.1371/journal.pcbi.1007450.t002
Dataset updated
Jun 13, 2023
Dataset provided by
PLOS Computational Biology
Authors
Pei-Yau Lung; Dongrui Zhong; Xiaodong Pang; Yan Li; Jinfeng Zhang
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
AUROC values from different algorithms for variables in microarray data.
Data from: Discrete regulatory modules instruct hematopoietic lineage...
data.niaid.nih.gov
zenodo.org
Updated Aug 28, 2021
Cite
Vierstra, Jeff (2021). Discrete regulatory modules instruct hematopoietic lineage commitment and differentiation [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5291736
Dataset updated
Aug 28, 2021
Dataset provided by
Georgolopoulos, Grigorios
Som, Tannishtha
Yiangou, Minas
Stamatoyannopoulos, John A
Psatha, Nikoletta
Nishida, Andrew
Vierstra, Jeff
Iwata, Mineo
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
bioRxiv preprint: https://www.biorxiv.org/content/10.1101/2020.04.02.022566v4

Contact: Grigorios Georgolopoulos (ggeorgol@altius.org); Jeff Vierstra (jvierstra@altius.org)

Lineage commitment and differentiation are driven by the concerted action of master transcriptional regulators at their target chromatin sites. Multiple efforts have characterized the key transcription factors (TFs) that determine the various hematopoietic lineages. However, the temporal interactions between individual TFs and their chromatin targets during differentiation, and how these interactions dictate lineage commitment, remain poorly understood. Here we delineate the temporal interplay between the cis- and the trans-regulatory landscape in establishing lineage commitment and differentiation in human hematopoiesis by performing a dense timecourse of chromatin accessibility (DNase I-seq) and gene expression (total and single cell RNA-seq).

All data uploaded correspond to human genome build version GRCh38.

Contents

DNase I Hotspot (DHS) metadata: Supplementary_Data_1.txt

DNase I Hotspot quantile-normalized counts: A tab-separated matrix with quantile-normalized DNase I density counts from 79,085 FDR 5% hotspots, across 12 erythroid differentiation timepoints from 3 donors, present in at least n=2 samples. Rows correspond to DHS information in Supplementary_Data_1.txt (hotspots.fdr.0.05.qnorm.counts.tsv.gz)

Column information for DNase I Hotspot quantile-normalized counts: hotspots.fdr.0.05.qnorm.counts.info.tsv

Developmentally regulated gene metadata (erythroid): Supplementary_Data_2.csv

Gene matrix of quantile-normalized FPKM values (erythroid): A tab-separated matrix with the quantile-normalized FPKM values of all detected genes, across 13 erythroid differentiation timepoints from 3 donors. (fpkm_erythroid_qnorm.tsv.gz)

Column information for the quantile-normalized FPKM gene matrix (erythroid): A tab-separated table (fpkm_erythroid_qnorm.info.tsv)

CD34+ HSPC TADs at 10kb resolution: Supplementary_Data_3.bed

Day 11 ex vivo erythroid progenitor TADs at 10kb resolution: Supplementary_Data_4.bed

Transcription factor motif enrichment per DHS cluster: Supplementary_Data_5.csv

Correlation information (links) between developmentally regulated DHS and target genes: Supplementary_Data_6.csv

Chromatin anchor loops called from 10kb resolution Hi-C data: Supplementary_Data_7.bedgraph

Developmentally regulated gene metadata (megakaryocytic): Supplementary_Data_8.csv

Gene matrix of quantile-normalized FPKM values (megakaryocytic): A tab-separated matrix with the quantile-normalized FPKM values of all detected genes, across 13 megakaryocytic differentiation timepoints from 3 donors. (fpkm_megakaryocyte_qnorm.tsv.gz)

Column information for the quantile-normalized FPKM gene matrix (megakaryocytic): A tab-separated table (fpkm_megakaryocyte_qnorm.info.tsv)

Marker (differentially expressed) genes per single cell population: Supplementary_Data_9.csv

A SCANPY h5ad AnnData object: an annotated data object in h5ad format including the gene-by-cell count matrix and a Velocyto splicing kinetics (RNA velocity) information layer, along with obs, obsm, var, varm, and uns. (SCANPY_anndata_object.h5ad)
Sequenced genes (ureC gene) and a metagenome from Archaea in Arctic and...
obis.org
gbif.org
Updated Mar 19, 2019
Cite
Koninklijk Belgisch Instituut voor Natuurwetenschappen (2019). Sequenced genes (ureC gene) and a metagenome from Archaea in Arctic and Antarctic marine environments [Dataset]. https://obis.org/dataset/50a9fc90-2fbc-47e4-8891-40502c347845
Dataset updated
Mar 19, 2019
Dataset authored and provided by
Koninklijk Belgisch Instituut voor Natuurwetenschappen
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Antarctica, Arctic
Description
Microbial dataset containing sequenced genes (ureC gene) from Thaumarchaeota from the Beaufort Sea (Arctic) and the Amundsen Sea (Antarctica), as well as a metagenome (454 pyrosequencing) from the Beaufort Sea.
Acute Respiratory Distress Syndrome-Database of Genes (ARDS-DB)
zenodo.org
bin
Updated Sep 17, 2020
Cite
Erick Quintanilla; Fajar Adnan; Kimberly Diwa; Ashley Nguyen; Lavang Vu; Mary Claryl Truz; Inimary Toby (2020). Acute Respiratory Distress Syndrome-Database of Genes (ARDS-DB) [Dataset]. http://doi.org/10.5281/zenodo.4033491
Available download formats
bin
Unique identifier
https://doi.org/10.5281/zenodo.4033491
Dataset updated
Sep 17, 2020
Dataset provided by
Zenodo http://zenodo.org/
Authors
Erick Quintanilla; Fajar Adnan; Kimberly Diwa; Ashley Nguyen; Lavang Vu; Mary Claryl Truz; Inimary Toby
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
To better understand the gene-level associations that are most relevant to Acute Respiratory Distress Syndrome (ARDS), a comprehensive resource is needed. There is currently no freely available database dedicated to ARDS that provides comprehensive gene lists from experimentally verifiable studies, gene function, gene location, and additional metadata for tracking related link-out resources. The need for such a database is only accentuated by the steep rise in ARDS cases due to the 2020 Coronavirus pandemic, in which infected patients admitted to the ICU develop ARDS at a rate of 67% to 85%, calling for an increase in ARDS research.

Our goal was to develop such a resource for use by the scientific community to enhance our studies of ARDS and associated genes. Our first step was to perform data mining and curation of scientific literature through a robust review process. Subsequent steps enabled us to refine our data by capturing specific metadata and incorporating these into our database. Version 1 of the database will provide users with access to the database flat file with current genes, gene location, chromosomal information, and more in a freely accessible and downloadable format. Future project goals are to develop a standalone web portal that will integrate the gene-level information with network analysis and other visualizations for users.
Gene Expression Dataset 4
search.dataone.org
dataverse.harvard.edu
Updated Nov 8, 2023
Cite
Verma, Ghanshyam (2023). Gene Expression Dataset 4 [Dataset]. https://search.dataone.org/view/sha256%3A012ed856972b9937db5977c20978dc16c56ada2ed357b2936a42191dc9bd8172
Unique identifier
https://doi.org/10.7910/DVN/PD1K5Y
Dataset updated
Nov 8, 2023
Dataset provided by
Harvard Dataverse
Authors
Verma, Ghanshyam
Description
Gene Expression Dataset 4. Visit https://dataone.org/datasets/sha256%3A012ed856972b9937db5977c20978dc16c56ada2ed357b2936a42191dc9bd8172 for complete metadata about this dataset.
Acute Respiratory Distress Syndrome-Database of Genes (ARDS-DB)
data.niaid.nih.gov
Updated Sep 17, 2020
Cite
Adnan, Fajar (2020). Acute Respiratory Distress Syndrome-Database of Genes (ARDS-DB) [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_4015738
Dataset updated
Sep 17, 2020
Dataset provided by
Adnan, Fajar
Nguyen, Ashley
Quintanilla, Erick
Truz, Mary Claryl
Toby, Inimary
Diwa, Kimberly
Vu, Lavang
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
To better understand the gene-level associations that are most relevant to Acute Respiratory Distress Syndrome (ARDS), a comprehensive resource is needed. There is currently no freely available database dedicated to ARDS that provides comprehensive gene lists from experimentally verifiable studies, gene function, gene location, and additional metadata for tracking related link-out resources. The need for such a database is only accentuated by the steep rise in ARDS cases due to the 2019 Coronavirus pandemic, in which infected patients admitted to the ICU develop ARDS at a rate of 67% to 85%, calling for an increase in ARDS research.

Our goal was to develop such a resource for use by the scientific community to enhance our studies of ARDS and associated genes. Our first step was to perform data mining and curation of scientific literature through a robust review process. Subsequent steps enabled us to refine our data by capturing specific metadata and incorporating these into our database. Version 1 of the database will provide users with access to the database flat file with current genes, gene location, chromosomal information, and more in a freely accessible and downloadable format. Future project goals are to develop a standalone web portal that will integrate the gene-level information with network analysis and other visualizations for users.
Mouse list of microarray datasets before and after hypoxic stress
figshare.com
txt
Updated Jan 23, 2018
+ more versions
Cite
Hidemasa Bono (2018). Mouse list of microarray datasets before and after hypoxic stress [Dataset]. http://doi.org/10.6084/m9.figshare.5811735.v1
Available download formats
txt
Unique identifier
https://doi.org/10.6084/m9.figshare.5811735.v1
Dataset updated
Jan 23, 2018
Dataset provided by
figshare
Authors
Hidemasa Bono
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
List of NCBI Gene Expression Omnibus (GEO) or EBI ArrayExpress IDs before and after hypoxic stress in mouse.
Data from: Meta-Analysis of Public RNA Sequencing Data of Abscisic...
figshare.com
bin
Updated Feb 17, 2024
+ more versions
Cite
Mitsuo Shintani (2024). Meta-Analysis of Public RNA Sequencing Data of Abscisic Acid-Related Abiotic Stresses in Arabidopsis thaliana [Dataset]. http://doi.org/10.6084/m9.figshare.22566583.v3
Available download formats
bin
Unique identifier
https://doi.org/10.6084/m9.figshare.22566583.v3
Dataset updated
Feb 17, 2024
Dataset provided by
figshare
Authors
Mitsuo Shintani
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
File 1 - Metadata for Curated Datasets
This file contains the metadata for the curated datasets used in the meta-analysis, including Sequence Read Archive (SRA) study ID, run ID, sample tissue, treatment type, treatment time, and sequence library type.

File 2 - TPM Data for Gene Expression under Stress Conditions
This file contains the transcripts per million (TPM) data for five different treatment types (ABA, Salt, Dehydration, Mannitol, and Cold).

File 3 - TN-Ratio Data for Gene Expression under Stress Conditions
This file contains the TN-ratio data, which represents the ratio of gene expression between stress-treated (T) and non-treated (N) samples.

File 4 - TN-Score Data for Gene Expression under Stress Conditions
This file contains the TN-score data, calculated by subtracting the number of downregulated experiments from the number of upregulated experiments. The TN-score was used to assess changes in gene expression under stress conditions across experiments.

File 5a - Lists of Upregulated Genes for Each of the Five Stress Treatment Types
This file contains the lists of upregulated genes identified in the meta-analysis for each of the five stress treatment types.

File 5b - Lists of Downregulated Genes for Each of the Five Stress Treatment Types
This file contains the lists of downregulated genes identified in the meta-analysis for each of the five stress treatment types.

File 6 - Enrichment Analysis of Differentially Expressed Genes for Five Stress Treatment Types
Gene set enrichment analysis of the genes regulated under the five treatments is shown in A–J, indicating upregulated and downregulated genes in the ABA (A, B), salt (C, D), dehydration (E, F), mannitol (G, H), and cold (I, J) treatments, respectively.

File 7a - Overlap of Commonly Regulated Genes across ABA, Salt, and Dehydration Treatments
This file contains the lists of commonly regulated genes across three stress treatments: ABA, Salt, and Dehydration.

File 7b - The Results of Enrichment Analysis for Commonly Regulated Genes across ABA, Salt, and Dehydration Treatments
This file contains the results of the enrichment analysis focusing on 166 upregulated and 66 downregulated genes that are commonly regulated across three different stress treatments: ABA, Salt, and Dehydration.

File 8a - Overlap of Commonly Upregulated Genes across ABA, Salt, Dehydration, Mannitol, and Cold Treatments
This file contains the lists of commonly upregulated genes across five stress treatments: ABA, Salt, Dehydration, Mannitol, and Cold.

File 8b - Overlap of Commonly Downregulated Genes across ABA, Salt, Dehydration, Mannitol, and Cold Treatments
This file contains the lists of commonly downregulated genes across five stress treatments: ABA, Salt, Dehydration, Mannitol, and Cold.

File 9a - Overlap of Commonly Upregulated Genes across ABA, Salt, Dehydration, Mannitol, Cold, and Hypoxia Treatments
This file contains the lists of commonly upregulated genes across six stress treatments: ABA, Salt, Dehydration, Mannitol, Cold, and Hypoxia.

File 9b - Overlap of Commonly Downregulated Genes across ABA, Salt, Dehydration, Mannitol, Cold, and Hypoxia Treatments
This file contains the lists of commonly downregulated genes across six stress treatments: ABA, Salt, Dehydration, Mannitol, Cold, and Hypoxia.
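
The TN-score described for File 4 is a simple tally: the number of experiments in which a gene is upregulated minus the number in which it is downregulated. A minimal Python sketch of that arithmetic follows. The description does not state how an experiment is called up- or downregulated from its TN-ratio, so the two-fold thresholds used here (`up_threshold=2.0`, `down_threshold=0.5`) are assumptions for illustration only, as is the function name `tn_score`.

```python
def tn_score(tn_ratios, up_threshold=2.0, down_threshold=0.5):
    """Return (#upregulated experiments) - (#downregulated experiments) for one gene.

    `tn_ratios` holds one treated/non-treated expression ratio per experiment.
    The thresholds are illustrative assumptions, not values from the dataset.
    """
    upregulated = sum(1 for ratio in tn_ratios if ratio >= up_threshold)
    downregulated = sum(1 for ratio in tn_ratios if ratio <= down_threshold)
    return upregulated - downregulated


# Hypothetical gene measured in four experiments:
# two upregulated (4.0, 2.5), one downregulated (0.3), one unchanged (1.0).
score = tn_score([4.0, 2.5, 0.3, 1.0])
```

A positive score indicates that upregulation dominates across experiments, a negative score that downregulation does.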
GEO gene expression dataset recompute for selected tumor samples
data.niaid.nih.gov
Updated May 13, 2024
+ more versions
Cite
Visentin, Luca (2024). GEO gene expression dataset recompute for selected tumor samples [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10817923
Dataset updated
May 13, 2024
Dataset authored and provided by
Visentin, Luca
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We aligned and quantified RNA-Seq data present in GEO with a standardized pipeline to homogenize data preprocessing for downstream applications.

All uploaded files are UTF-8, .csv-formatted matrices. The *_expected_count.csv.gz files are unlogged, raw expression counts as reported by rsem-quantify-expression (see details below). The associated *_metadata.csv.gz files contain metadata pertinent to each column of the corresponding expression matrix. Some metadata files may have more rows than the associated matrix has columns. This happens for series that were only partially RNA-Seq based (e.g. combined RNA-Seq plus miRNA-Seq samples in the same GEO accession ID).

Metadata columns are derived from GEO series files, and follow their definitions. See each GEO entry directly to determine metadata meaning.

Each recompute has at least the gene_id column holding Ensembl Gene IDs. The remaining columns are ENA run accession IDs of the specific recomputed samples. Each associated metadata file has at least the following columns:

geo_accession: The GEO sample ID of the sample.

ena_sample: The ENA sample ID of the sample.

ena_run: The ENA run accession ID of the sample, to be cross-referenced with the expression matrices.

The remaining columns are derived from GEO metadata files and other ENA-provided data. Please refer to the x.FASTQ package for more information.
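
Since the expression matrix columns are ENA run accession IDs and the metadata carries the same IDs in `ena_run`, the two files can be cross-referenced directly, and metadata rows with no matching column (for example, non-RNA-Seq samples in a mixed series) can be set aside. The sketch below is a minimal Python illustration using the standard library. The function name `match_metadata_to_matrix`, the accession values, and the in-memory CSV strings are hypothetical stand-ins for the real .csv.gz files.

```python
import csv
import io


def match_metadata_to_matrix(expression_csv, metadata_csv):
    """Pair each expression-matrix column with its metadata row via `ena_run`.

    Returns (matched metadata rows in column order,
             sorted ena_run IDs present only in the metadata).
    """
    header = next(csv.reader(io.StringIO(expression_csv)))
    run_ids = [column for column in header if column != "gene_id"]

    meta_reader = csv.DictReader(io.StringIO(metadata_csv))
    by_run = {row["ena_run"]: row for row in meta_reader}

    matched = [by_run[run] for run in run_ids if run in by_run]
    metadata_only = sorted(set(by_run) - set(run_ids))
    return matched, metadata_only


# Toy files: two recomputed runs, three metadata rows (one without counts).
toy_expression = "gene_id,ERR001,ERR002\nENSG00000000003,10,0\n"
toy_metadata = (
    "geo_accession,ena_sample,ena_run\n"
    "GSM1,SAMEA1,ERR001\n"
    "GSM2,SAMEA2,ERR002\n"
    "GSM3,SAMEA3,ERR003\n"
)

matched, metadata_only = match_metadata_to_matrix(toy_expression, toy_metadata)
```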

Pipeline Details

The alignment and quantification were performed with the x.FASTQ tool, available on GitHub, installed locally on an Arch Linux machine at commit 3a93dd77a70df59c74f7b15216c26f12cd918e81, running the Linux 6.7.8-zen1-1-zen kernel with an 11th Gen Intel i7-1185G7 (8) CPU and an Intel TigerLake-LP GT2 [Iris Xe Graphics] GPU. Please note that no samples were filtered or omitted based on sample quality or sequencing depth. However, sensible trimming (e.g. of low-quality bases and common adapters) was performed on all the samples.

The reference genome was downloaded from Ensembl, version hg38. STAR was used to create the genome index with overhang set to 149.
Metadata for: ‘Long-read sequencing identifies copy-specific markers of SMN...
dataverse.nl
txt, xlsx
Updated Feb 27, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ewout Groen (2025). Metadata for: ‘Long-read sequencing identifies copy-specific markers of SMN gene conversion in spinal muscular atrophy’ [Dataset]. http://doi.org/10.34894/G7YG0V
Explore at:
Available download formats: xlsx (17140), txt (2141)
Unique identifier
https://doi.org/10.34894/G7YG0V
Dataset updated
Feb 27, 2025
Dataset provided by
DataverseNL
Authors
Ewout Groen
License
https://dataverse.nl/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.34894/G7YG0V
Description
This DataverseNL item contains the metadata of the Nanopore sequencing dataset and limited clinical data used in ‘Long-read sequencing identifies copy-specific markers of SMN gene conversion in spinal muscular atrophy’. Access to this data is restricted due to privacy regulations; conditions and instructions for access are listed below.

Abstract

Background: The complex 2 Mb survival motor neuron (SMN) locus on chromosome 5q13, including the spinal muscular atrophy (SMA)-causing gene SMN1 and modifier SMN2, remains incompletely resolved due to numerous segmental duplications. Variation in SMN2 copy number, presumably influenced by SMN1 to SMN2 gene conversion, affects disease severity, though SMN2 copy number alone has insufficient prognostic value due to limited genotype-phenotype correlations. With advancements in newborn screening and SMN-targeted therapies, identifying genetic markers to predict disease progression and treatment response is crucial. Progress has thus far been limited by methodological constraints.

Methods: To address this, we developed HapSMA, a method to perform polyploid phasing of the SMN locus to enable copy-specific analysis of SMN and its surrounding genes. We used HapSMA on publicly available Oxford Nanopore Technologies (ONT) sequencing data of 29 healthy controls and performed long-read, targeted ONT sequencing of the SMN locus of 31 patients with SMA.

Results: In healthy controls, we identified single nucleotide variants (SNVs) specific to SMN1 and SMN2 haplotypes that could serve as gene conversion markers. Broad phasing including the NAIP gene allowed for a more complete view of SMN locus variation. Genetic variation in SMN2 haplotypes was larger in SMA patients. 42% of SMN2 haplotypes of SMA patients showed varying SMN1 to SMN2 gene conversion breakpoints, serving as direct evidence of gene conversion as a common genetic characteristic in SMA and highlighting the importance of inclusion of SMA patients when investigating the SMN locus.

Conclusions: Our findings illustrate that both methodological advances and the analysis of patient samples are required to advance our understanding of complex genetic loci and address critical clinical challenges.

Github

The code for HapSMA is available at: https://github.com/UMCUGenetics/HapSMA (v1.0.0 was used for analyses in this study, v1.1.0 contains extra support for different types of data input). The code for analyses subsequent to HapSMA and input files used in these analyses are available at: https://github.com/UMCUGenetics/ManuscriptSMNGeneConversion.

IRB approval

The study protocol (09307/NL29692.041.09) was approved by the Medical Ethical Committee of the University Medical Center Utrecht and registered at the Dutch registry for clinical studies and trials (https://www.ccmo.nl/). Written informed consent was obtained from all adult patients, and from patients and/or parents additionally in case of children younger than 18 years old.

Contact information

Requests for data can be made by contacting the principal investigators of this study, Ludo van der Pol (w.l.vanderPol@umcutrecht.nl), Gijs van Haaften (G.vanHaaften@umcutrecht.nl) or Ewout Groen (e.j.n.groen-3@umcutrecht.nl) at University Medical Center Utrecht, UMC Utrecht Brain Center, Heidelberglaan 100, 3584 CX Utrecht, The Netherlands. Expected response time for processing a data sharing agreement is 4 to 6 weeks.
KG for heart failure gene expression data
data.niaid.nih.gov
zenodo.org
Updated Apr 2, 2023
Luca Farinola (2023). KG for heart failure gene expression data [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7790930
Explore at:
Dataset updated
Apr 2, 2023
Dataset authored and provided by
Luca Farinola
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Preprocessed gene expression data for different types of heart failure. Includes a count table, gene and patient metadata, and gene lengths.
NCBI accession metadata for 18S rRNA gene tag sequences from DNA and RNA...
darchive.mblwhoilibrary.org
bco-dmo.org
+1more
pdf, text/tsv, txt +2
Updated Jul 24, 2019
Sarah K Hu; David Caron (2019). NCBI accession metadata for 18S rRNA gene tag sequences from DNA and RNA from samples collected in coastal California in 2013 and 2014 [Dataset]. https://darchive.mblwhoilibrary.org/entities/publication/438f7d51-f9e5-5c8d-b797-b10f4b04156a
Explore at:
Available download formats: pdf, xml, text/tsv, zip, txt
Dataset updated
Jul 24, 2019
Dataset provided by
Biological and Chemical Oceanography Data Management Office (BCO-DMO). Contact: bco-dmo-data@whoi.edu
Authors
Sarah K Hu; David Caron
Description
NSF Division of Ocean Sciences (NSF OCE) OCE-1737409
Historical NCI Genomic Data Commons data (09-14-2017)
data.niaid.nih.gov
zenodo.org
Updated Jan 24, 2020
Seim, Inge (2020). Historical NCI Genomic Data Commons data (09-14-2017) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1186944
Explore at:
Dataset updated
Jan 24, 2020
Dataset authored and provided by
Seim, Inge
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Historical NCI Genomic Data Commons data (v09-14-2017). Clinical ('phenotype') and gene expression (HTSeq FPKM-UQ).

TCGA-COAD.GDC_phenotype.tsv

dataset: phenotype - Phenotype

cohort: GDC TCGA Colon Cancer (COAD)
dataset ID: TCGA-COAD/Xena_Matrices/TCGA-COAD.GDC_phenotype.tsv
download: https://gdc.xenahubs.net/download/TCGA-COAD/Xena_Matrices/TCGA-COAD.GDC_phenotype.tsv.gz
samples: 570
version: 11-27-2017
hub: https://gdc.xenahubs.net
type of data: phenotype
author: Genomic Data Commons
raw data: https://docs.gdc.cancer.gov/Data/Release_Notes/Data_Release_Notes/#data-release-90
raw data: https://api.gdc.cancer.gov/data/
input data format: ROWs (samples) x COLUMNs (identifiers) (i.e. clinicalMatrix)
570 samples X 151 identifiers

TCGA-COAD.htseq_fpkm-uq.tsv

dataset: gene expression RNAseq - HTSeq - FPKM-UQ

cohort: GDC TCGA Colon Cancer (COAD)
dataset ID: TCGA-COAD/Xena_Matrices/TCGA-COAD.htseq_fpkm-uq.tsv
download: https://gdc.xenahubs.net/download/TCGA-COAD/Xena_Matrices/TCGA-COAD.htseq_fpkm-uq.tsv.gz
samples: 512
version: 09-14-2017
hub: https://gdc.xenahubs.net
type of data: gene expression RNAseq
unit: log2(fpkm-uq+1)
platform: Illumina
ID/Gene Mapping: https://gdc.xenahubs.net/download/probeMaps/gencode.v22.annotation.gene.probeMap.gz
author: Genomic Data Commons
raw data: https://docs.gdc.cancer.gov/Data/Release_Notes/Data_Release_Notes/#data-release-80
raw data: https://api.gdc.cancer.gov/data/
wrangling: Data from the same sample but from different vials/portions/analytes/aliquots is averaged; data from different samples is combined into genomicMatrix; all data is then log2(x+1) transformed.
input data format: ROWs (identifiers) x COLUMNs (samples) (i.e. genomicMatrix)
60,484 identifiers X 512 samples
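
The wrangling described for the FPKM-UQ matrix (average the values belonging to one sample, then apply log2(x+1)) can be sketched as follows. This is a minimal illustration with made-up numbers, not the Xena pipeline itself; the function names are hypothetical, and the inverse transform is included only because users of the matrix often need values back on the linear scale.

```python
import math


def log2_plus_one(value):
    """Forward transform used for the matrix unit: log2(x + 1)."""
    return math.log2(value + 1.0)


def inverse_log2_plus_one(value):
    """Undo log2(x + 1) to recover the linear-scale value."""
    return 2.0 ** value - 1.0


def wrangle_sample(aliquot_values):
    """Average the values from one sample's aliquots, then transform."""
    mean_value = sum(aliquot_values) / len(aliquot_values)
    return log2_plus_one(mean_value)


# Made-up FPKM-UQ values for two aliquots of the same sample.
transformed = wrangle_sample([1.0, 5.0])
recovered = inverse_log2_plus_one(transformed)
print(transformed, recovered)
```

Because the averaging happens before the transform, back-transforming a matrix entry yields the mean linear-scale value for that sample rather than any individual aliquot's value.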
TWIS meta-analyzed summary statistics
search.dataone.org
datadryad.org
Updated Nov 29, 2023
Luke Evans (2023). TWIS meta-analyzed summary statistics [Dataset]. http://doi.org/10.5061/dryad.866t1g1tw
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.866t1g1tw
Dataset updated
Nov 29, 2023
Dataset provided by
Dryad Digital Repository
Authors
Luke Evans
Time period covered
Dec 9, 2022
Description
It remains unknown to what extent gene-gene interactions contribute to complex traits. Here, we introduce a new approach using predicted gene expression to perform exhaustive transcriptome-wide interaction studies (TWISs) for multiple traits across all pairs of genes expressed in several tissue types. Using imputed transcriptomes, we simultaneously reduce the computational challenge and improve interpretability and statistical power. We discover and replicate several interaction associations and find several hub genes with numerous interactions. We also demonstrate that TWIS can identify novel associated genes because genes with many or strong interactions have smaller single-locus model effect sizes. Finally, we develop a method to test gene set enrichment of TWIS associations (E-TWIS), finding numerous pathways and networks enriched in interaction associations. Epistasis is likely widespread, and our procedure represents a tractable framework for beginning to explore gene interactions...
Data from: The new bioinformatics: integrating ecological data from the gene...
zenodo.org
data.niaid.nih.gov
+1more
csv
Updated May 30, 2022
Matthew B. Jones; Mark P. Schildhauer; O. J. Reichman; Shawn Bowers (2022). Data from: The new bioinformatics: integrating ecological data from the gene to the biosphere [Dataset]. http://doi.org/10.5061/dryad.qb0d6
Explore at:
Available download formats: csv
Unique identifier
https://doi.org/10.5061/dryad.qb0d6
Dataset updated
May 30, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Matthew B. Jones; Mark P. Schildhauer; O. J. Reichman; Shawn Bowers
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Bioinformatics, the application of computational tools to the management and analysis of biological data, has stimulated rapid research advances in genomics through the development of data archives such as GenBank, and similar progress is just beginning within ecology. One reason for the belated adoption of informatics approaches in ecology is the breadth of ecologically pertinent data (from genes to the biosphere) and its highly heterogeneous nature. The variety of formats, logical structures, and sampling methods in ecology create significant challenges. Cultural barriers further impede progress, especially for the creation and adoption of data standards. Here we describe informatics frameworks for ecology, from subject-specific data warehouses, to generic data collections that use detailed metadata descriptions and formal ontologies to catalog and cross-reference information. Combining these approaches with automated data integration techniques and scientific workflow systems will maximize the value of data and open new frontiers for research in ecology.
