100+ datasets found
  1. Metadata in fecundity gene polymorphism for Ethiopian sheep

    • data.mendeley.com
    Updated Feb 9, 2023
    + more versions
    Cite
    Helen Nigussie (2023). Metadata in fecundity gene polymorphism for Ethiopian sheep [Dataset]. http://doi.org/10.17632/39bb2vh37n.3
    Dataset updated
    Feb 9, 2023
    Authors
    Helen Nigussie
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Ethiopia
    Description

    The current study was done to identify whether fecundity gene polymorphism exists in indigenous sheep and whether it is associated with litter size. The dataset has three parts. Metadata file 1 comprises genotype data generated from five loci linked to fecundity gene mutations in Ethiopian indigenous sheep, for polymorphism analysis. Metadata file 2 comprises genotype data and litter size data in parity 1 and parity 2, for association analysis. High genetic diversity and a strong association with litter size were observed in the current study, which will be used as baseline information to design a cost-effective and sustainable genetic improvement program for commercialization. The information can be used by those working in animal genetics and breeding and in animal science to repeat the study in other species in the same country or location, or in the same species in a different location. The data could also be integrated with other related genotype data for comparative analysis among different breeds and species of livestock.

  2. Data from: Simultaneous estimation of gene regulatory network structure and...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Sep 23, 2023
    Cite
    Chris Jackson (2023). Simultaneous estimation of gene regulatory network structure and RNA kinetics from single cell gene expression [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8371194
    Dataset updated
    Sep 23, 2023
    Dataset authored and provided by
    Chris Jackson
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Supplemental Data 1 is single-cell response to rapamycin count data first sequenced in this work and deposited in GEO with accession GSE242556. It is a 173348 rows × 5847 columns TSV.GZ file where the first row is a header, the first 5843 columns are integer gene counts, and the final 4 columns ('Gene', 'Replicate', 'Pool', and 'Experiment') are cell-specific metadata.

    Supplemental Data 2 is bulk response to rapamycin count data first sequenced in this work. It is a 33 rows × 5847 columns TSV.GZ file where the first row is a header, the first 5843 columns are integer gene counts, and the final 4 columns ('Oligo', 'Time', 'Replicate', and 'Sample_barcode') are sample-specific metadata.

    Supplemental Data 3 is single-cell count data published as GSE125162 and re-analyzed with the pipeline used for single-cell quantification in this work. It is a 65068 rows × 5850 columns TSV.GZ file where the first row is a header, the first 5843 columns are integer gene counts, and the final 7 columns ('Condition', 'Sample', 'Genotype_Group', 'Genotype_Individual', 'Genotype', 'Replicate', 'Cell_Barcode') are cell-specific metadata.

    Supplemental Data 4 is the four deep learning models trained in this work. It is a TAR.GZ file containing the final biophysical transcription/decay model, the pre-trained decay model, the velocity prediction model, and the count prediction model. Each model file is an h5 file containing a PyTorch model that can be loaded with supirfactor_dynamical.read().

    Supplemental Data 5 is the prior knowledge network used to constrain the models for TF interpretability. It is a 1574 rows × 204 columns [Genes x TFs] TSV.GZ file where the first row is a header with TF names, the first column is an index of gene names, and TF-gene interactions are indicated by non-zero values in the matrix. There are 2799 TF-gene interactions.

    Supplemental Table 6 is the oligonucleotide sequences used in this work. It is a TSV file with a header row.

    Supplemental Table 7 is the yeast strains used in this work. It is a TSV file with a header row.

    Supplemental Table 8 is gene metadata used in this work (e.g. Ribosomal Protein gene labels, etc). It is a TSV file with a header row.

    Supplemental Table 9 is FY4/5 growth curve data generated in this work. It is a 20 rows × 7 columns TSV file where the first row is a header with replicate IDs, the first column is an index of times in minutes, and values are cell densities in YPD culture, in units of 10^6 cells/mL.

    Supplemental Data 10 is a TAR.GZ file containing the yeast SacCer3 genome, modified to add UTR sequences, that was used to generate transcripts for kallisto pseudoalignment in this work.
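
The layout described for Supplemental Data 1 to 3 (integer gene count columns first, a fixed number of trailing metadata columns) could be split as in the minimal sketch below. It runs on a tiny in-memory stand-in rather than the real files; the helper name `split_counts_and_metadata` and the toy gene names and values are hypothetical and not part of the dataset.

```python
import csv
import gzip
import io


def split_counts_and_metadata(handle, n_meta):
    """Split a tab-separated table into integer counts and trailing metadata.

    The first row is a header; the final `n_meta` columns are metadata and
    every column before them holds integer gene counts.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    gene_names, meta_names = header[:-n_meta], header[-n_meta:]
    counts, metadata = [], []
    for row in reader:
        counts.append([int(value) for value in row[:-n_meta]])
        metadata.append(dict(zip(meta_names, row[-n_meta:])))
    return gene_names, counts, metadata


# Toy stand-in for a TSV.GZ file: 2 gene columns + 4 metadata columns
# (hypothetical values; the real Supplemental Data 1 has 5843 gene columns).
toy_tsv = (
    "GENE_A\tGENE_B\tGene\tReplicate\tPool\tExperiment\n"
    "3\t0\tWT\t1\t1\t1\n"
    "7\t2\tWT\t2\t1\t1\n"
)
toy_gz = gzip.compress(toy_tsv.encode("utf-8"))

with gzip.open(io.BytesIO(toy_gz), mode="rt", newline="") as handle:
    genes, counts, metadata = split_counts_and_metadata(handle, n_meta=4)
```

For the real files, the same function would presumably be called on `gzip.open(path, "rt")` with `n_meta=4` for Supplemental Data 1 and 2 and `n_meta=7` for Supplemental Data 3, following the column counts given above.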

  3. Gene expression count data from human post-mortem spinal cord

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated Mar 26, 2022
    Cite
    Jack Humphrey (2022). Gene expression count data from human post-mortem spinal cord [Dataset]. http://doi.org/10.5281/zenodo.6385747
    Available download formats: application/gzip
    Dataset updated
    Mar 26, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jack Humphrey
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Gene expression data from human post-mortem tissue for three spinal cord sections (cervical, thoracic and lumbar) from amyotrophic lateral sclerosis (ALS) patients and non-neurological disease controls. RNA sequencing performed as part of the New York Genome Center ALS Consortium.

    Analysis workbooks: https://jackhump.github.io/ALS_SpinalCord_QTLs/

    Preprint describing results: https://www.medrxiv.org/content/10.1101/2021.08.31.21262682v1

    Sample sizes:

    Region     Control   ALS
    Cervical   35        139
    Thoracic   10        42
    Lumbar     32        122

    Library preparation

    RNA was extracted from flash-frozen postmortem tissue using TRIzol (Thermo Fisher Scientific) and chloroform, followed by column purification (RNeasy Minikit, QIAGEN). RNA integrity number (RIN) was assessed on a Bioanalyzer (Agilent Technologies). RNA-Seq libraries were prepared from 500 ng total RNA using the KAPA Stranded RNA-Seq Kit with RiboErase (KAPA Biosystems) for rRNA depletion and Illumina-compatible indexes (NEXTflex RNA-Seq Barcodes, NOVA-512915, PerkinElmer, and IDT for Illumina TruSeq UD Indexes, 20022370). Pooled libraries (average insert size: 375 bp) passing the quality criteria were sequenced either on an Illumina HiSeq 2500 (125 bp paired-end) or an Illumina NovaSeq (100 bp paired-end). The samples had a median sequencing depth of 42 million read pairs, with a range between 16 and 167 million read pairs.

    Data processing

    Samples were uniformly processed using RAPiD-nf, an efficient RNA-Seq processing pipeline implemented in the NextFlow framework. Following adapter trimming with Trimmomatic (version 0.36), all samples were aligned to the hg38 build (GRCh38.primary_assembly) of the human reference genome using STAR (2.7.2a), with indexes created from GENCODE, version 30. Gene expression was quantified using RSEM (1.3.1) with GENCODE v30. Quality control was performed using SAMtools and Picard, and the results were collated using MultiQC. Various technical metrics for sequencing quality control are provided in the metadata. Estimated read counts and normalised transcripts per million (TPM) matrices are provided for each tissue.

    Provided data:

    gencode.v30.gene_meta.tsv.gz - tab separated table with columns "genename", the HGNC gene symbol, and "geneid" the Ensembl ID, as set in the GENCODE v30 comprehensive annotation.

    For {tissue} in Cervical_Spinal_Cord, Thoracic_Spinal_Cord, Lumbar_Spinal_Cord:

    {tissue}_metadata.tsv.gz - metadata describing each sample. Each row describes a sample. Descriptions of each column below.

    {tissue}_gene_tpm.tsv.gz - the normalised TPM values from RSEM for all 58,884 genes in GENCODE v30. Each row describes a gene and each column describes a sample.

    {tissue}_gene_counts.tsv.gz - the estimated read counts from RSEM for all 58,884 genes in GENCODE v30. Each row describes a gene and each column describes a sample.

    Metadata Column Description

    rna_id - de-identified sample ID for each unique RNA-seq sample

    dna_id - de-identified donor ID for each patient enrolled in the study

    site_id - de-identified site name for each contributing site

    tissue - name of tissue/region

    age_rounded - age at death, rounded to nearest decade

    sex - biological sex of donor

    subject_group - long form disease group

    disease - short form disease group

    site_of_motor_onset - for ALS donors, where did symptoms start?

    disease_duration - for ALS donors, how long did donor live with disease?

    mutations - any known ALS gene mutations

    library_prep - type of library preparation method used

    seq_platform - sequencing platform used for sequencing

    rin - RNA integrity number, 0-10

    c9orf72_repeat_size - estimated C9orf72 repeat expansion size

    gPC1 to gPC5 - principal components of genetic ancestry from whole genome sequencing

    Remaining metadata columns are from Picard - see here: http://broadinstitute.github.io/picard/picard-metric-definitions.html#RnaSeqMetrics
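
Because each row of a metadata file describes a sample and each column of an expression matrix describes a sample, the two tables can be joined on the sample ID. The sketch below groups TPM values per gene by the `disease` column. It assumes, without confirmation from the description, that the matrix column headers are `rna_id` values and that the first matrix column holds the gene ID; the miniature tables and the function name `tpm_by_group` are hypothetical.

```python
import csv
import io

# Hypothetical miniature stand-ins for {tissue}_metadata.tsv.gz and
# {tissue}_gene_tpm.tsv.gz (already decompressed, values invented).
metadata_tsv = (
    "rna_id\tdisease\trin\n"
    "S1\tALS\t6.5\n"
    "S2\tControl\t7.1\n"
    "S3\tALS\t5.9\n"
)
tpm_tsv = (
    "geneid\tS1\tS2\tS3\n"
    "ENSG_X\t10.0\t4.0\t12.0\n"
    "ENSG_Y\t0.5\t0.7\t0.3\n"
)


def tpm_by_group(metadata_handle, tpm_handle, group_column="disease"):
    """Return {gene_id: {group: [TPM, ...]}} by joining on the sample ID."""
    # Assumption: matrix column headers match the metadata rna_id column.
    sample_group = {
        row["rna_id"]: row[group_column]
        for row in csv.DictReader(metadata_handle, delimiter="\t")
    }
    reader = csv.reader(tpm_handle, delimiter="\t")
    sample_ids = next(reader)[1:]  # skip the gene ID column
    result = {}
    for row in reader:
        groups = {}
        for sample_id, value in zip(sample_ids, row[1:]):
            groups.setdefault(sample_group[sample_id], []).append(float(value))
        result[row[0]] = groups
    return result


grouped = tpm_by_group(io.StringIO(metadata_tsv), io.StringIO(tpm_tsv))
```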

  4. Data_Sheet_1_BioVDB: biological vector database for high-throughput gene...

    • figshare.com
    • frontiersin.figshare.com
    pdf
    Updated Mar 8, 2024
    Cite
    Michał J. Winnicki; Chase A. Brown; Hunter L. Porter; Cory B. Giles; Jonathan D. Wren (2024). Data_Sheet_1_BioVDB: biological vector database for high-throughput gene expression meta-analysis.PDF [Dataset]. http://doi.org/10.3389/frai.2024.1366273.s001
    Available download formats: pdf
    Dataset updated
    Mar 8, 2024
    Dataset provided by
    Frontiers
    Authors
    Michał J. Winnicki; Chase A. Brown; Hunter L. Porter; Cory B. Giles; Jonathan D. Wren
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    High-throughput sequencing has created an exponential increase in the amount of gene expression data, much of which is freely, publicly available in repositories such as NCBI's Gene Expression Omnibus (GEO). Querying this data for patterns such as similarity and distance, however, becomes increasingly challenging as the total amount of data increases. Furthermore, vectorization of the data is commonly required in Artificial Intelligence and Machine Learning (AI/ML) approaches. We present BioVDB, a vector database for storage and analysis of gene expression data, which enhances the potential for integrating biological studies with AI/ML tools. We used a previously developed approach called Automatic Label Extraction (ALE) to extract sample labels from metadata, including age, sex, and tissue/cell-line. BioVDB stores 438,562 samples from eight microarray GEO platforms. We show that it allows for efficient querying of data using similarity search, which can also be useful for identifying and inferring missing labels of samples, and for rapid similarity analysis.

  5. Paired differential gene expression and splicing analyses results of 199...

    • zenodo.org
    • data.niaid.nih.gov
    Updated Jul 19, 2023
    Cite
    Søren Helweg Dam; Lars Rønn Olsen; Kristoffer Vitting-Seerup (2023). Paired differential gene expression and splicing analyses results of 199 baseline vs. case comparisons across 100 datasets (Limma) [Dataset]. http://doi.org/10.5281/zenodo.7866420
    Dataset updated
    Jul 19, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Søren Helweg Dam; Lars Rønn Olsen; Kristoffer Vitting-Seerup
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Note: these are the limma results of the analysis. See https://doi.org/10.5281/zenodo.7032090 for the DESeq2/DEXSeq results.

    This dataset contains results from paired differential expression and differential splicing analyses as well as gene-set over-representation analysis results for 199 baseline vs. case comparisons across 100 randomly curated datasets with accompanying metadata (preprint).
    All results were computed using the R package pairedGSEA, which utilized Limma (Ritchie et al., 2015) and fgsea (Korotkevich et al., 2019).

    Each .RDS file contains a list with four objects: A 'metadata' object with the metadata of the respective raw data, a 'genes' object with gene-level differential splicing and expression results, a 'gene_set' object with over-representation results, and 'experiment' with the experiment title.

    The filenames follow this pattern: "[dataset ID]_[GEO accession number]_[Manually assigned comparison title].RDS".

    All datasets were obtained from a local copy of the ARCHS4 v11 database of transcript counts (Lachmann et al., 2018).
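
The filename pattern given above could be parsed along the lines of the sketch below. It assumes that the dataset ID and the GEO accession contain no underscores, so that everything after the second underscore belongs to the comparison title; the example filename and the names `ResultFile` and `parse_result_filename` are hypothetical.

```python
from typing import NamedTuple


class ResultFile(NamedTuple):
    dataset_id: str
    geo_accession: str
    comparison_title: str


def parse_result_filename(filename):
    """Split '[dataset ID]_[GEO accession]_[comparison title].RDS' into parts."""
    stem, dot, extension = filename.rpartition(".")
    if not dot or extension.upper() != "RDS":
        raise ValueError(f"not an .RDS file: {filename!r}")
    # Assumption: only the title may contain underscores, so split at most twice.
    parts = stem.split("_", 2)
    if len(parts) != 3:
        raise ValueError(f"unexpected filename layout: {filename!r}")
    return ResultFile(*parts)


# Hypothetical filename, not taken from the dataset.
parsed = parse_result_filename("12_GSE12345_Treated_vs_control.RDS")
```

Splitting at most twice keeps underscores inside the manually assigned title intact; if dataset IDs can themselves contain underscores, a regular expression anchored on the GEO accession would be a safer choice.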

  6. AUROC values from different algorithms for variables in microarray data.

    • figshare.com
    • plos.figshare.com
    xls
    Updated Jun 13, 2023
    Cite
    Pei-Yau Lung; Dongrui Zhong; Xiaodong Pang; Yan Li; Jinfeng Zhang (2023). AUROC values from different algorithms for variables in microarray data. [Dataset]. http://doi.org/10.1371/journal.pcbi.1007450.t002
    Available download formats: xls
    Dataset updated
    Jun 13, 2023
    Dataset provided by
    PLOS Computational Biology
    Authors
    Pei-Yau Lung; Dongrui Zhong; Xiaodong Pang; Yan Li; Jinfeng Zhang
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    AUROC values from different algorithms for variables in microarray data.

  7. Data from: Discrete regulatory modules instruct hematopoietic lineage...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Aug 28, 2021
    Cite
    Vierstra, Jeff (2021). Discrete regulatory modules instruct hematopoietic lineage commitment and differentiation [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5291736
    Dataset updated
    Aug 28, 2021
    Dataset provided by
    Georgolopoulos, Grigorios
    Som, Tannishtha
    Yiangou, Minas
    Stamatoyannopoulos, John A
    Psatha, Nikoletta
    Nishida, Andrew
    Vierstra, Jeff
    Iwata, Mineo
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    bioRxiv preprint: https://www.biorxiv.org/content/10.1101/2020.04.02.022566v4

    Contact: Grigorios Georgolopoulos (ggeorgol@altius.org); Jeff Vierstra (jvierstra@altius.org)

    Lineage commitment and differentiation are driven by the concerted action of master transcriptional regulators at their target chromatin sites. Multiple efforts have characterized the key transcription factors (TFs) that determine the various hematopoietic lineages. However, the temporal interactions between individual TFs and their chromatin targets during differentiation, and how these interactions dictate lineage commitment, remain poorly understood. Here we delineate the temporal interplay between the cis- and the trans-regulatory landscape in establishing lineage commitment and differentiation in human hematopoiesis by performing a dense timecourse of chromatin accessibility (DNase I-seq) and gene expression (total and single cell RNA-seq).

    All data uploaded correspond to human genome build version GRCh38.

    Contents

    DNase I Hotspot (DHS) metadata: Supplementary_Data_1.txt

    DNase I Hotspot quantile-normalized counts: A tab-separated matrix with quantile-normalized DNase I density counts from 79,085 FDR 5% hotspots, across 12 erythroid differentiation timepoints from 3 donors, present in at least n=2 samples. Rows correspond to DHS information in Supplementary_Data_1.txt (hotspots.fdr.0.05.qnorm.counts.tsv.gz)

    Column information for DNase I Hotspot quantile-normalized counts: hotspots.fdr.0.05.qnorm.counts.info.tsv

    Developmentally regulated gene metadata (erythroid): Supplementary_Data_2.csv

    Gene matrix of quantile-normalized FPKM values (erythroid): A tab-separated matrix with the quantile-normalized FPKM values of all detected genes, across 13 erythroid differentiation timepoints from 3 donors. (fpkm_erythroid_qnorm.tsv.gz)

    Column information for the quantile-normalized FPKM gene matrix (erythroid): A tab-separated table (fpkm_erythroid_qnorm.info.tsv)

    CD34+ HSPC TADs at 10kb resolution: Supplementary_Data_3.bed

    Day 11 ex vivo erythroid progenitor TADs at 10kb resolution: Supplementary_Data_4.bed

    Transcription factor motif enrichment per DHS cluster: Supplementary_Data_5.csv

    Correlation information (links) between developmentally regulated DHS and target genes: Supplementary_Data_6.csv

    Chromatin anchor loops called from 10kb resolution Hi-C data: Supplementary_Data_7.bedgraph

    Developmentally regulated gene metadata (megakaryocytic): Supplementary_Data_8.csv

    Gene matrix of quantile-normalized FPKM values (megakaryocytic): A tab-separated matrix with the quantile-normalized FPKM values of all detected genes, across 13 megakaryocytic differentiation timepoints from 3 donors. (fpkm_megakaryocyte_qnorm.tsv.gz)

    Column information for the quantile-normalized FPKM gene matrix (megakaryocytic): A tab-separated table (fpkm_megakaryocyte_qnorm.info.tsv)

    Marker (differentially expressed) genes per single cell population: Supplementary_Data_9.csv

    A SCANPY h5ad AnnData object: an annotated data object in h5ad format, including the gene-by-cell count matrix and a Velocyto splicing kinetics (RNA velocity) information layer, along with obs, obsm, var, varm, and uns layers. (SCANPY_anndata_object.h5ad)

  8. Sequenced genes (ureC gene) and a metagenome from Archaea in Arctic and...

    • obis.org
    • gbif.org
    Updated Mar 19, 2019
    Cite
    Koninklijk Belgisch Instituut voor Natuurwetenschappen (2019). Sequenced genes (ureC gene) and a metagenome from Archaea in Arctic and Antarctic marine environments [Dataset]. https://obis.org/dataset/50a9fc90-2fbc-47e4-8891-40502c347845
    Dataset updated
    Mar 19, 2019
    Dataset authored and provided by
    Koninklijk Belgisch Instituut voor Natuurwetenschappen
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Antarctica, Arctic
    Description

    Microbial dataset containing sequenced genes (ureC gene) from Thaumarchaeota from the Beaufort Sea (Arctic) and the Amundsen Sea (Antarctica), as well as a metagenome (454 pyrosequencing) from the Beaufort Sea.

  9. Acute Respiratory Distress Syndrome-Database of Genes (ARDS-DB)

    • zenodo.org
    bin
    Updated Sep 17, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Erick Quintanilla; Fajar Adnan; Kimberly Diwa; Ashley Nguyen; Lavang Vu; Mary Claryl Truz; Inimary Toby (2020). Acute Respiratory Distress Syndrome-Database of Genes (ARDS-DB) [Dataset]. http://doi.org/10.5281/zenodo.4033491
    Available download formats: bin
    Dataset updated
    Sep 17, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Erick Quintanilla; Fajar Adnan; Kimberly Diwa; Ashley Nguyen; Lavang Vu; Mary Claryl Truz; Inimary Toby
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    To better understand the gene-level associations that are most relevant to Acute Respiratory Distress Syndrome (ARDS), a comprehensive resource is needed. There’s currently no freely available database dedicated to ARDS that provides comprehensive gene lists from experimentally verifiable studies, gene function, gene location, and additional metadata for tracking related link-out resources. The need for such a database is only accentuated by the steep rise in ARDS cases due to the 2020 Coronavirus pandemic, in which infected patients admitted to the ICU develop ARDS at a rate of 67% to 85%, calling for an increase in ARDS research.

    Our goal was to develop such a resource for use by the scientific community to enhance our studies of ARDS and associated genes. Our first step was to perform data mining and curation of scientific literature through a robust review process. Subsequent steps enabled us to refine our data by capturing specific metadata and incorporating these into our database. Version 1 of the database will provide users with access to the database flat file with current genes, gene location, chromosomal information, and more in a freely accessible and downloadable format. Future project goals are to develop a standalone web portal that will integrate the gene-level information with network analysis and other visualizations for users.

  10. Gene Expression Dataset 4

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 8, 2023
    Cite
    Verma, Ghanshyam (2023). Gene Expression Dataset 4 [Dataset]. https://search.dataone.org/view/sha256%3A012ed856972b9937db5977c20978dc16c56ada2ed357b2936a42191dc9bd8172
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Verma, Ghanshyam
    Description

    Gene Expression Dataset 4. Visit https://dataone.org/datasets/sha256%3A012ed856972b9937db5977c20978dc16c56ada2ed357b2936a42191dc9bd8172 for complete metadata about this dataset.

  11. Acute Respiratory Distress Syndrome-Database of Genes (ARDS-DB)

    • data.niaid.nih.gov
    Updated Sep 17, 2020
    Cite
    Adnan, Fajar (2020). Acute Respiratory Distress Syndrome-Database of Genes (ARDS-DB) [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_4015738
    Dataset updated
    Sep 17, 2020
    Dataset provided by
    Adnan, Fajar
    Nguyen, Ashley
    Quintanilla, Erick
    Truz, Mary Claryl
    Toby, Inimary
    Diwa, Kimberly
    Vu, Lavang
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    To better understand the gene-level associations that are most relevant to Acute Respiratory Distress Syndrome (ARDS), a comprehensive resource is needed. There’s currently no freely available database dedicated to ARDS that provides comprehensive gene lists from experimentally verifiable studies, gene function, gene location, and additional metadata for tracking related link-out resources. The need for such a database is only accentuated by the steep rise in ARDS cases due to the 2019 Coronavirus pandemic, in which infected patients admitted to the ICU develop ARDS at a rate of 67% to 85%, calling for an increase in ARDS research.

    Our goal was to develop such a resource for use by the scientific community to enhance our studies of ARDS and associated genes. Our first step was to perform data mining and curation of scientific literature through a robust review process. Subsequent steps enabled us to refine our data by capturing specific metadata and incorporating these into our database. Version 1 of the database will provide users with access to the database flat file with current genes, gene location, chromosomal information, and more in a freely accessible and downloadable format. Future project goals are to develop a standalone web portal that will integrate the gene-level information with network analysis and other visualizations for users.

  12. Mouse list of microarray datasets before and after hypoxic stress

    • figshare.com
    txt
    Updated Jan 23, 2018
    + more versions
    Cite
    Hidemasa Bono (2018). Mouse list of microarray datasets before and after hypoxic stress [Dataset]. http://doi.org/10.6084/m9.figshare.5811735.v1
    Available download formats: txt
    Dataset updated
    Jan 23, 2018
    Dataset provided by
    figshare
    Authors
    Hidemasa Bono
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    List of NCBI Gene Expression Omnibus (GEO) or EBI ArrayExpress IDs before and after hypoxic stress in mouse.

  13. Data from: Meta-Analysis of Public RNA Sequencing Data of Abscisic...

    • figshare.com
    bin
    Updated Feb 17, 2024
    + more versions
    Cite
    Mitsuo Shintani (2024). Meta-Analysis of Public RNA Sequencing Data of Abscisic Acid-Related Abiotic Stresses in Arabidopsis thaliana [Dataset]. http://doi.org/10.6084/m9.figshare.22566583.v3
    Available download formats: bin
    Dataset updated
    Feb 17, 2024
    Dataset provided by
    figshare
    Authors
    Mitsuo Shintani
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    File 1 - Metadata for Curated Datasets: This file contains the metadata for the curated datasets used in the meta-analysis, including Sequence Read Archive (SRA) study ID, run ID, sample tissue, treatment type, treatment time, and sequence library type.

    File 2 - TPM Data for Gene Expression under Stress Conditions: This file contains the transcripts per million (TPM) data for five different treatment types (ABA, Salt, Dehydration, Mannitol, and Cold).

    File 3 - TN-Ratio Data for Gene Expression under Stress Conditions: This file contains the TN-ratio data, which represents the ratio of gene expression between stress-treated (T) and non-treated (N) samples.

    File 4 - TN-Score Data for Gene Expression under Stress Conditions: This file contains the TN-score data, calculated by subtracting the number of downregulated experiments from the number of upregulated experiments. The TN-score was used to assess changes in gene expression under stress conditions across experiments.

    File 5a - Lists of Upregulated Genes for Each of the Five Stress Treatment Types: This file contains the lists of upregulated genes identified in the meta-analysis for each of the five stress treatment types.

    File 5b - Lists of Downregulated Genes for Each of the Five Stress Treatment Types: This file contains the lists of downregulated genes identified in the meta-analysis for each of the five stress treatment types.

    File 6 - Enrichment Analysis of Differentially Expressed Genes for Five Stress Treatment Types: Gene set enrichment analysis of the genes regulated under the five treatments is shown in A–J, indicating upregulated and downregulated genes in the ABA (A, B), salt (C, D), dehydration (E, F), mannitol (G, H), and cold (I, J) treatments, respectively.

    File 7a - Overlap of Commonly Regulated Genes across ABA, Salt, and Dehydration Treatments: This file contains the lists of commonly regulated genes across three stress treatments: ABA, Salt, and Dehydration.

    File 7b - The Results of Enrichment Analysis for Commonly Regulated Genes across ABA, Salt, and Dehydration Treatments: This file contains the results of the enrichment analysis focusing on 166 upregulated and 66 downregulated genes that are commonly regulated across three different stress treatments: ABA, Salt, and Dehydration.

    File 8a - Overlap of Commonly Upregulated Genes across ABA, Salt, Dehydration, Mannitol, and Cold Treatments: This file contains the lists of commonly upregulated genes across five stress treatments: ABA, Salt, Dehydration, Mannitol, and Cold.

    File 8b - Overlap of Commonly Downregulated Genes across ABA, Salt, Dehydration, Mannitol, and Cold Treatments: This file contains the lists of commonly downregulated genes across five stress treatments: ABA, Salt, Dehydration, Mannitol, and Cold.

    File 9a - Overlap of Commonly Upregulated Genes across ABA, Salt, Dehydration, Mannitol, Cold, and Hypoxia Treatments: This file contains the lists of commonly upregulated genes across six stress treatments: ABA, Salt, Dehydration, Mannitol, Cold, and Hypoxia.

    File 9b - Overlap of Commonly Downregulated Genes across ABA, Salt, Dehydration, Mannitol, Cold, and Hypoxia Treatments: This file contains the lists of commonly downregulated genes across six stress treatments: ABA, Salt, Dehydration, Mannitol, Cold, and Hypoxia.
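
The TN-ratio and TN-score described for Files 3 and 4 can be illustrated with a small worked example. This is only a sketch under stated assumptions: the description gives neither the fold-change thresholds used to call an experiment up- or downregulated nor any pseudocount, so the values `2.0`, `0.5`, and `1.0` below are hypothetical, as are the TPM pairs.

```python
def tn_ratio(treated_tpm, untreated_tpm, pseudocount=1.0):
    """Ratio of stress-treated (T) to non-treated (N) expression.

    The pseudocount is an assumption added here to avoid division by zero.
    """
    return (treated_tpm + pseudocount) / (untreated_tpm + pseudocount)


def tn_score(ratios, up_threshold=2.0, down_threshold=0.5):
    """Number of upregulated experiments minus number of downregulated ones.

    The thresholds are hypothetical; the dataset description does not state them.
    """
    upregulated = sum(1 for ratio in ratios if ratio >= up_threshold)
    downregulated = sum(1 for ratio in ratios if ratio <= down_threshold)
    return upregulated - downregulated


# Invented (treated, untreated) TPM pairs for one gene across four experiments.
pairs = [(30.0, 5.0), (12.0, 10.0), (1.0, 9.0), (40.0, 3.0)]
ratios = [tn_ratio(treated, untreated) for treated, untreated in pairs]
score = tn_score(ratios)  # two experiments up, one down
```

With these invented values two experiments pass the upregulation threshold and one passes the downregulation threshold, so the score is 2 - 1 = 1.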

  14. GEO gene expression dataset recompute for selected tumor samples

    • data.niaid.nih.gov
    Updated May 13, 2024
    + more versions
    Cite
    Visentin, Luca (2024). GEO gene expression dataset recompute for selected tumor samples [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10817923
    Dataset updated
    May 13, 2024
    Dataset authored and provided by
    Visentin, Luca
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We aligned and quantified RNA-Seq data present in GEO with a standardized pipeline to homogenize data preprocessing for downstream applications.

    All uploaded files are UTF-8, .csv-formatted matrices. The *_expected_count.csv.gz files are unlogged, raw expression counts as reported by rsem-quantify-expression (see details below). The associated *_metadata.csv.gz files contain metadata pertinent to each column of the corresponding expression matrix. Some metadata files may have more rows than the associated number of columns. This is the case for series that were only partially RNA-Seq based (e.g. combined RNA-Seq plus miRNA-Seq samples in the same GEO accession ID).

    Metadata columns are derived from GEO series files, and follow their definitions. See each GEO entry directly to determine metadata meaning.

    Each recompute has at least the gene_id column holding Ensembl Gene IDs. The remaining columns are ENA run accession IDs of the specific recomputed samples. Each associated metadata file has at least the following columns:

    geo_accession: The GEO sample ID of the sample.

    ena_sample: The ENA sample ID of the sample.

    ena_run: The ENA run accession ID of the sample, to be cross-referenced with the expression matrices.

    The remaining columns are derived from GEO metadata files and other ENA-provided data. Please refer to the x.FASTQ package for more information.
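As an illustration of how the ena_run column cross-references the expression matrices, here is a minimal sketch using only the Python standard library. The helper names `read_csv_gz` and `match_runs` are hypothetical, and the accession IDs and counts are made-up stand-ins for a real *_expected_count.csv.gz and *_metadata.csv.gz pair.

```python
import csv
import gzip
import io


def read_csv_gz(raw_bytes):
    """Parse gzip-compressed CSV bytes into a list of row dictionaries."""
    text = gzip.decompress(raw_bytes).decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


def match_runs(expression_rows, metadata_rows):
    """Map each ENA run column of the matrix to its GEO sample ID."""
    run_columns = [c for c in expression_rows[0] if c != "gene_id"]
    geo_by_run = {row["ena_run"]: row["geo_accession"] for row in metadata_rows}
    # Metadata may hold extra rows (e.g. miRNA-Seq samples); they are ignored here.
    return {run: geo_by_run[run] for run in run_columns if run in geo_by_run}


# Made-up stand-ins for an expression matrix and its metadata file.
counts_gz = gzip.compress(b"gene_id,ERR0000001,ERR0000002\nENSG00000000003,12.0,7.5\n")
meta_gz = gzip.compress(
    b"geo_accession,ena_sample,ena_run\n"
    b"GSM0000001,SAMEA0000001,ERR0000001\n"
    b"GSM0000002,SAMEA0000002,ERR0000002\n"
    b"GSM0000003,SAMEA0000003,ERR0000003\n"
)

run_to_geo = match_runs(read_csv_gz(counts_gz), read_csv_gz(meta_gz))
print(run_to_geo)
```

The third metadata row has no matching matrix column, which mirrors the case where a series was only partially RNA-Seq based.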

    Pipeline Details

    Alignment and quantification were performed with the x.FASTQ tool (available on GitHub), installed locally on an Arch Linux machine at commit 3a93dd77a70df59c74f7b15216c26f12cd918e81, running the Linux 6.7.8-zen1-1-zen kernel with an 11th Gen Intel i7-1185G7 (8) CPU and an Intel TigerLake-LP GT2 [Iris Xe Graphics] GPU. Please note that no samples were filtered or omitted based on sample quality or sequencing depth. However, sensible trimming (e.g. of low-quality bases and common adapters) was performed on all samples.

    The reference genome was downloaded from Ensembl, version hg38. STAR was used to create the genome index with the overhang set to 149.
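For orientation, a STAR index build with a given overhang is typically requested through the `--sjdbOverhang` option of `--runMode genomeGenerate`. The sketch below only assembles such a command line and does not run it. The function name and file paths are hypothetical, and the link to 150-base reads is an assumption based on STAR's usual "read length minus 1" guidance, not something stated in this record. The x.FASTQ tool may invoke STAR differently.

```python
def star_index_command(genome_dir, fasta, gtf, read_length=150):
    """Build a STAR genomeGenerate command line as a list of arguments."""
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", fasta,
        "--sjdbGTFfile", gtf,
        # Overhang is conventionally read length - 1 (assumed 150 - 1 = 149 here).
        "--sjdbOverhang", str(read_length - 1),
    ]


# Hypothetical paths, for illustration only.
cmd = star_index_command("star_index", "genome.fa", "annotation.gtf")
print(" ".join(cmd))
```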

  15. D

    Metadata for: ‘Long-read sequencing identifies copy-specific markers of SMN...

    • dataverse.nl
    txt, xlsx
    Updated Feb 27, 2025
    + more versions
    Cite
    Ewout Groen; Ewout Groen (2025). Metadata for: ‘Long-read sequencing identifies copy-specific markers of SMN gene conversion in spinal muscular atrophy’ [Dataset]. http://doi.org/10.34894/G7YG0V
    Explore at:
    Available download formats: xlsx (17140), txt (2141)
    Dataset updated
    Feb 27, 2025
    Dataset provided by
    DataverseNL
    Authors
    Ewout Groen; Ewout Groen
    License

    https://dataverse.nl/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.34894/G7YG0V

    Description

    This DataverseNL item contains the metadata of the Nanopore sequencing dataset and limited clinical data used in ‘Long-read sequencing identifies copy-specific markers of SMN gene conversion in spinal muscular atrophy’. Access to this data is restricted due to privacy regulations; conditions and instructions for access are listed below.

    Abstract

    Background: The complex 2 Mb survival motor neuron (SMN) locus on chromosome 5q13, including the spinal muscular atrophy (SMA)-causing gene SMN1 and modifier SMN2, remains incompletely resolved due to numerous segmental duplications. Variation in SMN2 copy number, presumably influenced by SMN1 to SMN2 gene conversion, affects disease severity, though SMN2 copy number alone has insufficient prognostic value due to limited genotype-phenotype correlations. With advancements in newborn screening and SMN-targeted therapies, identifying genetic markers to predict disease progression and treatment response is crucial. Progress has thus far been limited by methodological constraints.

    Methods: To address this, we developed HapSMA, a method to perform polyploid phasing of the SMN locus to enable copy-specific analysis of SMN and its surrounding genes. We used HapSMA on publicly available Oxford Nanopore Technologies (ONT) sequencing data of 29 healthy controls and performed long-read, targeted ONT sequencing of the SMN locus of 31 patients with SMA.

    Results: In healthy controls, we identified single nucleotide variants (SNVs) specific to SMN1 and SMN2 haplotypes that could serve as gene conversion markers. Broad phasing including the NAIP gene allowed for a more complete view of SMN locus variation. Genetic variation in SMN2 haplotypes was larger in SMA patients. 42% of SMN2 haplotypes of SMA patients showed varying SMN1 to SMN2 gene conversion breakpoints, serving as direct evidence of gene conversion as a common genetic characteristic in SMA and highlighting the importance of inclusion of SMA patients when investigating the SMN locus.

    Conclusions: Our findings illustrate that both methodological advances and the analysis of patient samples are required to advance our understanding of complex genetic loci and address critical clinical challenges.

    GitHub

    The code for HapSMA is available at: https://github.com/UMCUGenetics/HapSMA (v1.0.0 was used for analyses in this study, v1.1.0 contains extra support for different types of data input). The code for analyses subsequent to HapSMA and input files used in these analyses are available at: https://github.com/UMCUGenetics/ManuscriptSMNGeneConversion.

    IRB approval

    The study protocol (09307/NL29692.041.09) was approved by the Medical Ethical Committee of the University Medical Center Utrecht and registered at the Dutch registry for clinical studies and trials (https://www.ccmo.nl/). Written informed consent was obtained from all adult patients, and from patients and/or parents additionally in case of children younger than 18 years old.

    Contact information

    Requests for data can be made by contacting the principal investigators of this study, Ludo van der Pol (w.l.vanderPol@umcutrecht.nl), Gijs van Haaften (G.vanHaaften@umcutrecht.nl) or Ewout Groen (e.j.n.groen-3@umcutrecht.nl) at University Medical Center Utrecht, UMC Utrecht Brain Center, Heidelberglaan 100, 3584 CX Utrecht, The Netherlands. Expected response time for processing a data sharing agreement is 4 to 6 weeks.

  16. Z

    KG for heart failure gene expression data

    • data.niaid.nih.gov
    • zenodo.org
    Updated Apr 2, 2023
    Cite
    Luca Farinola (2023). KG for heart failure gene expression data [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7790930
    Explore at:
    Dataset updated
    Apr 2, 2023
    Dataset authored and provided by
    Luca Farinola
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Preprocessed gene expression data for different heart failure. Includes count table, gene patients metadata, gene length.

  17. m

    NCBI accession metadata for 18S rRNA gene tag sequences from DNA and RNA...

    • darchive.mblwhoilibrary.org
    • bco-dmo.org
    • +1 more
    pdf, text/tsv, txt +2
    Updated Jul 24, 2019
    + more versions
    Cite
    Sarah K Hu; David Caron (2019). NCBI accession metadata for 18S rRNA gene tag sequences from DNA and RNA from samples collected in coastal California in 2013 and 2014 [Dataset]. https://darchive.mblwhoilibrary.org/entities/publication/438f7d51-f9e5-5c8d-b797-b10f4b04156a
    Explore at:
    Available download formats: pdf, xml, text/tsv, zip, txt
    Dataset updated
    Jul 24, 2019
    Dataset provided by
    Biological and Chemical Oceanography Data Management Office (BCO-DMO). Contact: bco-dmo-data@whoi.edu
    Authors
    Sarah K Hu; David Caron
    Area covered
    Description

    NSF Division of Ocean Sciences (NSF OCE) OCE-1737409

  18. Z

    Historical NCI Genomic Data Commons data (09-14-2017)

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Cite
    Seim, Inge (2020). Historical NCI Genomic Data Commons data (09-14-2017) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1186944
    Explore at:
    Dataset updated
    Jan 24, 2020
    Dataset authored and provided by
    Seim, Inge
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Historical NCI Genomic Data Commons data (v09-14-2017). Clinical ('phenotype') and gene expression (HTSeq FPKM-UQ).

    TCGA-COAD.GDC_phenotype.tsv

    dataset: phenotype - Phenotype

    cohort: GDC TCGA Colon Cancer (COAD)
    dataset ID: TCGA-COAD/Xena_Matrices/TCGA-COAD.GDC_phenotype.tsv
    download: https://gdc.xenahubs.net/download/TCGA-COAD/Xena_Matrices/TCGA-COAD.GDC_phenotype.tsv.gz; Full metadata
    samples: 570
    version: 11-27-2017
    hub: https://gdc.xenahubs.net
    type of data: phenotype
    author: Genomic Data Commons
    raw data: https://docs.gdc.cancer.gov/Data/Release_Notes/Data_Release_Notes/#data-release-90
    raw data: https://api.gdc.cancer.gov/data/
    input data format: ROWs (samples) x COLUMNs (identifiers) (i.e. clinicalMatrix)
    570 samples X 151 identifiers

    TCGA-COAD.htseq_fpkm-uq.tsv

    dataset: gene expression RNAseq - HTSeq - FPKM-UQ

    cohort: GDC TCGA Colon Cancer (COAD)
    dataset ID: TCGA-COAD/Xena_Matrices/TCGA-COAD.htseq_fpkm-uq.tsv
    download: https://gdc.xenahubs.net/download/TCGA-COAD/Xena_Matrices/TCGA-COAD.htseq_fpkm-uq.tsv.gz; Full metadata
    samples: 512
    version: 09-14-2017
    hub: https://gdc.xenahubs.net
    type of data: gene expression RNAseq
    unit: log2(fpkm-uq+1)
    platform: Illumina
    ID/Gene Mapping: https://gdc.xenahubs.net/download/probeMaps/gencode.v22.annotation.gene.probeMap.gz; Full metadata
    author: Genomic Data Commons
    raw data: https://docs.gdc.cancer.gov/Data/Release_Notes/Data_Release_Notes/#data-release-80
    raw data: https://api.gdc.cancer.gov/data/
    wrangling: Data from the same sample but from different vials/portions/analytes/aliquots is averaged; data from different samples is combined into genomicMatrix; all data is then log2(x+1) transformed.
    input data format: ROWs (identifiers) x COLUMNs (samples) (i.e. genomicMatrix)
    60,484 identifiers X 512 samples
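The wrangling described for the expression matrix (average values from the same sample, then apply log2(x+1)) can be sketched as follows. This is an illustration under stated assumptions rather than the hub's actual code: the function names `wrangle` and `unlog`, the sample labels, and the numeric values are all invented for the example.

```python
import math
from statistics import mean


def wrangle(values_by_sample):
    """Average replicate measurements per sample, then apply log2(x + 1)."""
    return {
        sample: math.log2(mean(values) + 1)
        for sample, values in values_by_sample.items()
    }


def unlog(value):
    """Invert log2(x + 1) to recover the original-scale value."""
    return 2 ** value - 1


# Made-up FPKM-UQ values; the first sample has two aliquots.
raw = {"sample_A": [3.0, 11.0], "sample_B": [0.0]}
transformed = wrangle(raw)
print(transformed)
```

Because the stored unit is log2(fpkm-uq+1), a helper like `unlog` is what a downstream user would apply to get back to the FPKM-UQ scale; the +1 offset keeps zero expression at zero after the transform.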

  19. d

    TWIS meta-analyzed summary statistics

    • search.dataone.org
    • datadryad.org
    Updated Nov 29, 2023
    Cite
    Luke Evans (2023). TWIS meta-analyzed summary statistics [Dataset]. http://doi.org/10.5061/dryad.866t1g1tw
    Explore at:
    Dataset updated
    Nov 29, 2023
    Dataset provided by
    Dryad Digital Repository
    Authors
    Luke Evans
    Time period covered
    Dec 9, 2022
    Description

    It remains unknown to what extent gene-gene interactions contribute to complex traits. Here, we introduce a new approach using predicted gene expression to perform exhaustive transcriptome-wide interaction studies (TWISs) for multiple traits across all pairs of genes expressed in several tissue types. Using imputed transcriptomes, we simultaneously reduce the computational challenge and improve interpretability and statistical power. We discover and replicate several interaction associations and find several hub genes with numerous interactions. We also demonstrate that TWIS can identify novel associated genes because genes with many or strong interactions have smaller single-locus model effect sizes. Finally, we develop a method to test gene set enrichment of TWIS associations (E-TWIS), finding numerous pathways and networks enriched in interaction associations. Epistasis is likely widespread, and our procedure represents a tractable framework for beginning to explore gene interactions...

  20. Data from: The new bioinformatics: integrating ecological data from the gene...

    • zenodo.org
    • data.niaid.nih.gov
    • +1 more
    csv
    Updated May 30, 2022
    + more versions
    Cite
    Matthew B. Jones; Mark P. Schildahuer; O. J. Reichman; Shawn Bowers; Mark P. Schildhauer; O.J. Reichman; Matthew B. Jones; Mark P. Schildahuer; O. J. Reichman; Shawn Bowers; Mark P. Schildhauer; O.J. Reichman (2022). Data from: The new bioinformatics: integrating ecological data from the gene to the biosphere [Dataset]. http://doi.org/10.5061/dryad.qb0d6
    Explore at:
    Available download formats: csv
    Dataset updated
    May 30, 2022
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Matthew B. Jones; Mark P. Schildahuer; O. J. Reichman; Shawn Bowers; Mark P. Schildhauer; O.J. Reichman; Matthew B. Jones; Mark P. Schildahuer; O. J. Reichman; Shawn Bowers; Mark P. Schildhauer; O.J. Reichman
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Bioinformatics, the application of computational tools to the management and analysis of biological data, has stimulated rapid research advances in genomics through the development of data archives such as GenBank, and similar progress is just beginning within ecology. One reason for the belated adoption of informatics approaches in ecology is the breadth of ecologically pertinent data (from genes to the biosphere) and its highly heterogeneous nature. The variety of formats, logical structures, and sampling methods in ecology create significant challenges. Cultural barriers further impede progress, especially for the creation and adoption of data standards. Here we describe informatics frameworks for ecology, from subject-specific data warehouses, to generic data collections that use detailed metadata descriptions and formal ontologies to catalog and cross-reference information. Combining these approaches with automated data integration techniques and scientific workflow systems will maximize the value of data and open new frontiers for research in ecology.
