43 datasets found
  1. RNA-Sequencing Part 1 Generation and characterization of a novel mouse model...

    • zenodo.org
    application/gzip, bin +2
    Updated Sep 24, 2025
    Lucie Perillat (2025). RNA-Sequencing Part 1 Generation and characterization of a novel mouse model of Becker Muscular Dystrophy with a deletion of exons 52 to 55 [Dataset]. http://doi.org/10.5281/zenodo.17087788
    Available download formats: tsv, application/gzip, txt, bin
    Dataset updated
    Sep 24, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Lucie Perillat
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Becker muscular dystrophy (BMD) is a rare X-linked recessive neuromuscular disorder, frequently caused by in-frame deletions in the DMD gene that result in the production of a truncated, yet functional, dystrophin protein. The consequences of BMD-causing in-frame deletions on the organism are difficult to predict, especially in regard to long-term prognosis. Here, we used CRISPR-Cas9 to generate a new Dmd Δ52-55 mouse model by deleting exons 52-55 in the Dmd gene, resulting in a BMD-like in-frame deletion. To delineate the long-term effects of this deletion, we studied these mice over 52 weeks by performing histology and echocardiography analyses and assessing motor functions. To further delineate the effects of the exons 52-55 in-frame deletion, we performed RNA-Seq pre- and post-exercise and identified several differentially expressed pathways that could explain the abnormal muscle phenotype observed at 52 weeks in the BMD model.

    This dataset contains the results and raw data of the RNA-sequencing and transcriptomic analysis of 52-week-old exercised and non-exercised mice (4 BMD, 4 WT and 4 DMD, as indicated in each file name).

    1. Due to size restrictions, this RNA-Seq dataset will be published on Zenodo in 3 parts. This first part contains the data for the exercised mice, including the fastq (R1 and R2) and associated (md5) files for the 4 BMD mice (15315-15318) and 2 DMD mice (15319 and 15320), all the raw gene counts (txt files), and all the differentially expressed genes (tsv files).

    Workflow (performed by TCAG at SickKids):

    2. RNA-Seq Library and Reference Genome Information

    Type of library: stranded, paired end

    Genome reference sequence: GRCm39, M31 Gencode gene models.

    3. Read Pre-processing, Alignment and Obtaining Gene Counts

    3.1 Read Pre-processing

    The sequencing data is in FASTQ format. The quality of the data is assessed using FastQC v.0.11.5 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/).

    Adaptors are trimmed using Trim Galore (http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) v. 0.5.0, which runs Cutadapt (https://cutadapt.readthedocs.org/en/stable/) v. 1.10. Trim Galore is run with the following parameters:

    -q 25 – low-quality bases are trimmed from the 3' end of each read using a Phred quality cutoff of 25 (Cutadapt's BWA-style trimming algorithm);

    --clip_R1 6, --clip_R2 6 – clip the first 6 nucleotides from the 5' ends of read 1 and read 2;

    --stringency 5 – an overlap of at least 5 nucleotides with the Illumina primer sequence is required for trimming;

    --length 40 – any read that is shorter than 40 nucleotides as a result of trimming is discarded;

    --paired – only pairs of reads are retained (for paired-end reads only, not for single reads).
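    The trimming parameters above can be sketched in Python as below. This is a minimal illustration under stated assumptions, not Trim Galore itself. Qualities are assumed to be already decoded Phred integers, and adaptor removal is left out. The helper names (`quality_trim_index`, `trim_read`, `trim_pair`) are hypothetical. The order of operations is also a simplification: 5' clip, then 3' quality trim, then the pair-level length check. The 3' step follows the BWA-style algorithm that Cutadapt documents for `-q`, which is why a single base just above the cutoff does not by itself stop trimming.

```python
from typing import List, Optional, Tuple

# (sequence, Phred qualities as integers); hypothetical representation
Read = Tuple[str, List[int]]


def quality_trim_index(quals: List[int], cutoff: int = 25) -> int:
    """Length to keep after BWA-style 3' quality trimming.

    Walk from the 3' end accumulating (cutoff - q); stop once the running sum
    becomes negative, and cut at the position where the sum was largest.
    """
    running = 0
    best = 0
    keep = len(quals)
    for i in range(len(quals) - 1, -1, -1):
        running += cutoff - quals[i]
        if running < 0:
            break
        if running > best:
            best = running
            keep = i
    return keep


def trim_read(read: Read, clip5: int = 6, cutoff: int = 25) -> Read:
    """Apply the 5' clip (--clip_R1/--clip_R2) and the 3' quality trim (-q) to one read."""
    seq, quals = read
    seq, quals = seq[clip5:], quals[clip5:]
    keep = quality_trim_index(quals, cutoff)
    return seq[:keep], quals[:keep]


def trim_pair(r1: Read, r2: Read, min_len: int = 40) -> Optional[Tuple[Read, Read]]:
    """--length 40 with --paired: discard the whole pair if either mate is too short."""
    t1, t2 = trim_read(r1), trim_read(r2)
    if len(t1[0]) < min_len or len(t2[0]) < min_len:
        return None
    return t1, t2


# A low-quality 3' tail containing one base just above the cutoff (26) is still removed.
print(quality_trim_index([30, 30, 10, 26, 10, 10]))  # 2
```

    With a naive "stop at the first base above 25" rule, the example would keep four bases. The running-sum rule keeps two, because the low-quality bases on either side of the 26 outweigh it.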

    The adaptor type is detected automatically by screening the first 1 million sequences of the first specified file for the first 12 or 13 nucleotides of the standard Illumina or Nextera primers; the sequence from the start of the primer to the 3' end of the read is then trimmed.
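    The adaptor auto-detection and 3' trimming described above can be sketched as follows. The motifs are assumed to be the 13-nt Illumina (`AGATCGGAAGAGC`) and 12-nt Nextera (`CTGTCTCTTATA`) prefixes that Trim Galore reports using. Matching here is exact, whereas Cutadapt also tolerates a small error rate. The fallback to the Illumina adaptor when nothing is found reflects our understanding of Trim Galore's default. `min_overlap=5` stands in for `--stringency 5`, and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable

# Adaptor prefixes screened for during auto-detection (assumed values, see text above).
ADAPTERS = {"Illumina": "AGATCGGAAGAGC", "Nextera": "CTGTCTCTTATA"}


def detect_adapter(reads: Iterable[str], max_reads: int = 1_000_000) -> str:
    """Return the adaptor whose prefix occurs in the most of the first max_reads reads."""
    counts: Counter = Counter()
    for n, seq in enumerate(reads):
        if n >= max_reads:
            break
        for name, motif in ADAPTERS.items():
            if motif in seq:
                counts[name] += 1
    if not counts:
        return "Illumina"  # assumed default when no adaptor is seen
    return counts.most_common(1)[0][0]


def trim_adapter(seq: str, adapter: str, min_overlap: int = 5) -> str:
    """Remove everything from the adaptor start to the 3' end (exact matching only).

    A full adaptor match anywhere in the read is cut; otherwise a partial adaptor
    at the very 3' end is cut if it overlaps by at least min_overlap bases.
    """
    pos = seq.find(adapter)
    if pos != -1:
        return seq[:pos]
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[: len(seq) - k]
    return seq


reads = ["ACGTACGTAGATCGGAAGAGCTTTT", "GGGGAGATCGGAAGAGCAA", "ACGTACGTAGATC"]
adapter = ADAPTERS[detect_adapter(reads)]
print([trim_adapter(r, adapter) for r in reads])  # ['ACGTACGT', 'GGGG', 'ACGTACGT']
```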

    The quality of the trimmed reads is re-assessed with FastQC.

    The trimmed reads are also screened for the presence of rRNA and mtRNA sequences using FastQ Screen v.0.10.0 (http://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/).

    To assess the read distribution and positional read duplication, and to confirm the strandedness of the alignments, we use the RSeQC package (http://rseqc.sourceforge.net/), v. 2.6.2. The distribution of reads across exonic, intronic and intergenic sequences is assessed with read_distribution.py, strandedness is confirmed with infer_experiment.py, and positional read duplication (the percentage of reads mapping to exactly the same genomic location) is obtained with read_duplication.py. A sufficient proportion of reads should map to exonic sequences (ideally > 70-80%). A large fraction of reads mapping to intronic sequences in a poly-A mRNA library suggests a significant presence of pre-mRNA or other issues with RNA preparation. For stranded RNA-seq experiments, the majority of reads should map exclusively to one strand, either the same as or opposite to the transcript, depending on the library preparation method. For non-stranded experiments, the reads should be distributed equally between both strands.
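    The strandedness check performed by infer_experiment.py comes down to counting how often each mate lies on the same strand as the gene it overlaps. Below is a minimal sketch for paired-end reads. It assumes each aligned read has already been reduced to a (mate, read strand, gene strand) tuple. Reads over loci with genes on both strands, which RSeQC reports as undetermined, are not modelled. The 0.8 cut-off in `classify_library` is a hypothetical threshold, not an RSeQC default.

```python
from typing import Iterable, Tuple

# (mate number 1 or 2, strand of the read alignment, strand of the overlapping gene)
Hit = Tuple[int, str, str]


def strand_fractions(hits: Iterable[Hit]) -> Tuple[float, float]:
    """Fractions explained by '1++,1--,2+-,2-+' and by '1+-,1-+,2++,2--'."""
    sense = antisense = 0
    for mate, read_strand, gene_strand in hits:
        same_strand = read_strand == gene_strand
        # Read 1 on the gene's strand, or read 2 on the opposite strand -> first pattern
        if (mate == 1) == same_strand:
            sense += 1
        else:
            antisense += 1
    total = sense + antisense
    if total == 0:
        return 0.0, 0.0
    return sense / total, antisense / total


def classify_library(frac_sense: float, frac_antisense: float, threshold: float = 0.8) -> str:
    """Label the library using a hypothetical majority threshold."""
    if frac_sense >= threshold:
        return "stranded: read 1 on transcript strand"
    if frac_antisense >= threshold:
        return "stranded: read 1 opposite to transcript"
    return "unstranded or inconclusive"


hits = [(1, "-", "+"), (2, "+", "+"), (1, "+", "-"), (2, "-", "-"), (1, "+", "+")]
fractions = strand_fractions(hits)
print(fractions)                     # (0.2, 0.8)
print(classify_library(*fractions))  # stranded: read 1 opposite to transcript
```

    For a non-stranded library, both fractions would sit near 0.5, matching the expectation stated above.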

    3.2. Read Alignment

    The trimmed reads are aligned to the reference genome using the STAR aligner, v.2.6.0c (https://github.com/alexdobin/STAR, https://academic.oup.com/bioinformatics/article/29/1/15/272537). The alignments are contained in the .bam files. Together with the .bai index files, the .bam files can be used to view the alignments in the Integrative Genomics Viewer (IGV, http://software.broadinstitute.org/software/igv/).

    3.3. Obtaining Gene Counts

    The filtered STAR alignments are processed to extract raw read counts for genes using htseq-count v.0.6.1p2 (HTSeq, http://www-huber.embl.de/users/anders/HTSeq/doc/overview.html). Reads are assigned to genes in the “intersection-nonempty” mode. In this mode, the sets of genes overlapping each base of the read are intersected, ignoring bases that overlap no gene, and the read is counted towards a gene only if exactly one gene remains. For example, a read that overlaps gene A along its whole length and gene B along only part of it is counted towards gene A. A read whose bases all overlap both gene A and gene B is ambiguous and is not counted towards either gene. htseq-count does not count reads with multiple alignments, to avoid introducing bias in the expression results; only uniquely mapping reads are counted.
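    Below is a minimal sketch of the intersection-nonempty rule as described in the HTSeq documentation. It assumes each alignment has already been converted into a list of gene-ID sets (one per aligned base) plus its NH (number of hits) value. The special counter names mirror those htseq-count reports. Treating an empty intersection as `__no_feature` follows our reading of HTSeq and is an assumption for edge cases. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def assign_read(per_base_genes: List[Set[str]], nh: int = 1) -> str:
    """Assign one alignment to a gene using the intersection-nonempty rule."""
    if nh > 1:
        return "__alignment_not_unique"  # multi-mapping reads are not counted
    nonempty = [genes for genes in per_base_genes if genes]  # ignore bases with no gene
    if not nonempty:
        return "__no_feature"
    common = set.intersection(*nonempty)
    if len(common) == 1:
        return next(iter(common))
    if len(common) > 1:
        return "__ambiguous"
    return "__no_feature"  # empty intersection (assumption, see text above)


def count_genes(alignments: Iterable[Tuple[List[Set[str]], int]]) -> Counter:
    """Tally gene assignments (and special counters) over all alignments."""
    return Counter(assign_read(bases, nh) for bases, nh in alignments)


alignments = [
    ([{"A"}, {"A", "B"}, {"A", "B"}], 1),  # overlaps A everywhere, B only partly -> A
    ([{"A", "B"}, {"A", "B"}], 1),         # every base overlaps both genes -> ambiguous
    ([set(), {"A"}], 1),                   # partly outside any gene -> A
    ([{"A"}], 3),                          # NH = 3, multi-mapping -> not counted
]
print(count_genes(alignments))
```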

    4. Pre-processing, Alignment and Gene Counts QC

    MultiQC (https://multiqc.info/) is a reporting tool that aggregates statistics generated by bioinformatics analyses across multiple samples. MultiQC v. 1.14 was used to generate a consolidated report from FastQC screening of both untrimmed and trimmed reads, and from RSeQC, FastQ Screen, STAR and htseq-count results. The MultiQC report is contained in the MultiQC_Report_*.html file.

    5. DGE Analysis with edgeR

    Differential expression analysis was performed with the edgeR R package v.3.28.1, using R v.3.6.1 (http://www.bioconductor.org/packages/release/bioc/html/edgeR.html). The data set was filtered to retain only genes with counts > 50 in at least 3 samples, to remove genes that are not expressed or are expressed at a very low level.
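    The expression filter above is a row-wise test on the raw count matrix. Below is a NumPy sketch. It assumes a genes × samples array and reads ">50" as strictly greater than 50. `keep_expressed_genes` is a hypothetical name.

```python
import numpy as np


def keep_expressed_genes(counts: np.ndarray, min_count: int = 50, min_samples: int = 3) -> np.ndarray:
    """Boolean mask of genes (rows) whose count exceeds min_count in >= min_samples samples."""
    counts = np.asarray(counts)
    return (counts > min_count).sum(axis=1) >= min_samples


counts = np.array([
    [60, 60, 60, 0],    # > 50 in 3 samples -> kept
    [60, 60, 10, 0],    # > 50 in only 2 samples -> dropped
    [50, 50, 50, 50],   # exactly 50 is not > 50 -> dropped
])
print(keep_expressed_genes(counts))
```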

    The method used for normalizing the data was TMM, implemented by the calcNormFactors(y) function. All samples were normalized and filtered together. The glmLRT functionality in edgeR was used for the differential expression tests, with sample group taken into account.

    EdgeR Results Legend:

    · GeneID – Ensembl Gene ID;

    · Chr.Start.End - gene coordinates;

    · GeneName, GeneType, etc. – Gene attributes, derived from the genome annotation;

    · logFC - Log2 Fold Change (use this column for selection of DEGs);

    · logCPM - Log2 Counts Per Million, average for all libraries;

    · LR – likelihood ratio statistic from the likelihood ratio test;

    · PValue - Differential expression P value;

    · FDR – Differential expression False Discovery Rate, calculated by the Benjamini-Hochberg method (use this column for selection of DEGs);

    · (columns labeled with sample names) – Fragments Per Kilobase of transcript per Million mapped reads (FPKMs) for the given samples.
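    The FDR column can be derived from the PValue column with the Benjamini-Hochberg step-up procedure, and DEGs are then picked on logFC and FDR together. In the NumPy sketch below, `bh_adjust` follows the same formula as R's p.adjust(method = "BH"). The cut-offs in `select_degs` (|logFC| ≥ 1, FDR < 0.05) are hypothetical examples, because the legend above does not state which thresholds were used.

```python
import numpy as np


def bh_adjust(pvalues) -> np.ndarray:
    """Benjamini-Hochberg adjusted p-values (FDR)."""
    p = np.asarray(pvalues, dtype=float)
    n = len(p)
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)            # p_(i) * n / i
    # Enforce monotonicity: running minimum from the largest p-value downwards
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(n)
    out[order] = np.minimum(adjusted, 1.0)
    return out


def select_degs(logfc, fdr, min_abs_logfc: float = 1.0, max_fdr: float = 0.05) -> np.ndarray:
    """Boolean mask of genes passing both (hypothetical) thresholds."""
    return (np.abs(np.asarray(logfc)) >= min_abs_logfc) & (np.asarray(fdr) < max_fdr)


pvals = [0.01, 0.04, 0.03, 0.5]
print(bh_adjust(pvals))  # values: 0.04, 0.0533, 0.0533, 0.5 (rounded)
print(select_degs([2.0, -1.5, 0.2], [0.01, 0.04, 0.001]))
```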

  2. Results of Data analysis of RNA-Seq

    • figshare.com
    xlsx
    Updated Jan 11, 2018
    Kiichi Hirota (2018). Results of Data analysis of RNA-Seq [Dataset]. http://doi.org/10.6084/m9.figshare.5353462.v1
    Available download formats: xlsx
    Dataset updated
    Jan 11, 2018
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Kiichi Hirota
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data analysis of RNA-Seq: FASTQ files for RCC4-EV cells (DRR100656) and RCC4-VHL cells (DRR100657) were obtained from the Sequence Read Archive (https://trace.ddbj.nig.ac.jp/dra/index_e.html). The quality of the sequence data was evaluated with FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) after trimming with fastx_toolkit v0.0.14 (http://hannonlab.cshl.edu/fastx_toolkit/). The human reference sequence file (hs37d5.fa) was downloaded from the 1000 Genomes FTP site (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/), and the annotation file in general feature format (GFF) was downloaded from the Illumina iGenomes FTP site (ftp://igenome:G3nom3s4u@ussd-ftp.illumina.com/Homo_sapiens/NCBI/build37.2/). The human genome index was constructed with bowtie-build in Bowtie v.2.2.9. The FASTQ files were aligned to the reference genome sequence with TopHat v.2.1.1 using default parameters; Bowtie2 v2.2.9 and Samtools v.1.3.1 were used with the TopHat program [47]. Transcript abundance was estimated, and the count values were normalized as upper-quartile-normalized fragments per kilobase of transcript per million mapped fragments (FPKM) using Cufflinks (cuffdiff) v2.1.1. The cuffdiff output (gene_exp.diff) is provided as gene_exp.diff.txt.
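    FPKM divides each gene's fragment count by the gene length in kilobases and by the library size in millions of fragments. The upper-quartile variant mentioned above replaces the total fragment count with a more robust per-sample statistic. The NumPy sketch below illustrates both approaches; it is not Cufflinks' code, and the function names are hypothetical. Rescaling the upper quartiles so that their mean equals the mean library size is our own choice to keep values on a familiar scale, not a documented Cuffdiff constant.

```python
import numpy as np


def fpkm(counts, lengths_bp, lib_sizes=None) -> np.ndarray:
    """FPKM = count * 1e9 / (gene length in bp * library size); counts are genes x samples."""
    counts = np.asarray(counts, dtype=float)
    lengths = np.asarray(lengths_bp, dtype=float)[:, None]
    if lib_sizes is None:
        lib_sizes = counts.sum(axis=0)  # total mapped fragments per sample
    return counts * 1e9 / (lengths * np.asarray(lib_sizes, dtype=float))


def upper_quartile_lib_sizes(counts) -> np.ndarray:
    """Per-sample 75th percentile of non-zero gene counts, rescaled to the mean library size."""
    counts = np.asarray(counts, dtype=float)
    uq = np.array([np.percentile(col[col > 0], 75) for col in counts.T])
    return uq * counts.sum(axis=0).mean() / uq.mean()


# Sample 2 differs from sample 1 only by one very highly expressed gene.
counts = np.array([[10, 10], [20, 20], [30, 30], [40, 40],
                   [50, 50], [60, 60], [70, 70], [80, 10000]])
lengths = [1000] * 8
print(fpkm(counts, lengths)[0])                                    # differs between samples
print(fpkm(counts, lengths, upper_quartile_lib_sizes(counts))[0])  # equal for both samples
```

    With total-count scaling, the single dominant gene deflates every other gene's FPKM in sample 2. The upper quartile is unaffected by that one gene, so the remaining genes stay comparable across samples.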

  3. Data from: RNA-seq of Arabidopsis root growth responses to mechanical...

    • data.niaid.nih.gov
    • search.dataone.org
    • +1 more
    zip
    Updated Mar 8, 2021
    Keith Lindsey; Amy Jacobsen; Jian Xu; Jennifer Topping; George Jervis (2021). RNA-seq of Arabidopsis root growth responses to mechanical impedance [Dataset]. http://doi.org/10.5061/dryad.wpzgmsbk0
    Available download formats: zip
    Dataset updated
    Mar 8, 2021
    Dataset provided by
    Radboud University Nijmegen
    Durham University
    Authors
    Keith Lindsey; Amy Jacobsen; Jian Xu; Jennifer Topping; George Jervis
    License

    CC0 1.0, https://spdx.org/licenses/CC0-1.0.html

    Description
    1. The growth and development of root systems, essential for plant performance, are influenced by the mechanical properties of the substrate in which the plants grow. Mechanical impedance, such as that caused by compacted soil, can reduce root elongation and limit crop productivity.

    2. To better understand the mechanisms involved in plant root responses to mechanical impedance stress, we investigated changes in the root transcriptome and hormone signalling responses of Arabidopsis to artificial root barrier systems in vitro.

    3. We demonstrate that, upon encountering a barrier, reduced Arabidopsis root growth and the characteristic 'step-like' growth pattern are due to a reduction in cell elongation associated with changes in signalling gene expression. Data from RNA-sequencing combined with reporter line and mutant studies identified essential roles for reactive oxygen species, ethylene and auxin signalling during the barrier response.

    4. We propose a model in which early responses to mechanical impedance include reactive oxygen signalling integrated with ethylene and auxin responses to mediate root growth changes. Inhibition of ethylene responses allows improved growth in response to root impedance, an observation that may inform future crop breeding programmes.

    Methods: 20 mg of tissue was ground in liquid nitrogen using a TissueLyser II (QIAGEN, Manchester, UK), and RNA was extracted using the Qiagen ReliaPrep™ RNA Tissue Miniprep System. RNA quality was determined using a NanoDrop ND-1000 spectrophotometer (ThermoFisher Scientific) and an Agilent 2200 TapeStation. Libraries were constructed from 100 ng and 1 μg total RNA using the NEBNext Ultra™ Directional RNA Library Prep Kit for Illumina with the NEBNext Poly(A) mRNA Magnetic Isolation Module (NEB, Hitchin, UK). mRNA was isolated, fragmented and primed, cDNA was synthesised, and end prep was performed. The NEBNext Adaptor was ligated, and the ligation reaction was purified using AMPure XP Beads. PCR enrichment of adaptor-ligated DNA was conducted using NEBNext Multiplex Oligos for Illumina (Set 1, NEB#E7335). The PCR reaction was purified using Agencourt AMPure XP Beads. Library quality was then assessed using a DNA analysis ScreenTape on the Agilent Technologies 2200 TapeStation. Samples were quantified by qPCR using the NEBNext® Library Quant Kit for Illumina (Quick Protocol). Samples were diluted to 10 nM, and 7 μl of each 10 nM sample was pooled; the pool was run on two lanes of an Illumina HiSeq2500 (DBS Genomics facility, Durham University). Approximately 30M unique 125 bp paired-end reads were obtained per sample. Primers were designed using Primer-BLAST (http://www.ncbi.nlm.nih.gov/tools/primer-blast/) and synthesised by MWG Eurofins (http://www.eurofinsdna.com/). FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) was used to assess read quality, and Trimmomatic (Bolger et al., 2014) was used to trim and remove low-quality reads. Salmon (Patro et al., 2017) was used for quasi-mapping of reads against the AtRTD2-QUASI (Brown et al., 2017; Zhang et al., 2017) transcriptome and to estimate transcript-level abundances.

    The tximport R package (Soneson et al., 2016) was used to import transcript-level abundances, estimated counts and transcript lengths, and to summarise them into matrices for downstream analysis in R. Before differential expression analysis, low-quality reads were filtered out of the data set. Only genes with a count per million (CPM) of 0.744 in 6 or more samples were retained. The DESeq2 (Love et al., 2014) R package was used to estimate the variance-mean dependence in the count data and to test for differential expression using a negative binomial distribution model. A padj value of ≤ 0.05 and a log2 fold change of ≥ 0.5 were used to identify differentially expressed genes (DEGs). The 3D RNA-Seq online App (Guo et al., 2019; Calixto et al., 2018) was used for independent verification of the estimated DEGs and for differential alternative splicing analysis.
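    The CPM filter and the DEG thresholds above can be sketched as follows. CPM scales each count by its library size (count / total × 10^6). The sketch makes three assumptions:

    - The matrix holds genes × samples estimated counts.
    - "A count per million of 0.744" means CPM ≥ 0.744.
    - The fold-change cut-off applies to the absolute log2 fold change.

    The function names are hypothetical, and this is not the authors' script.

```python
import numpy as np


def cpm(counts) -> np.ndarray:
    """Counts per million for a genes x samples matrix."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=0) * 1e6


def keep_by_cpm(counts, min_cpm: float = 0.744, min_samples: int = 6) -> np.ndarray:
    """Keep genes with CPM >= min_cpm in at least min_samples samples."""
    return (cpm(counts) >= min_cpm).sum(axis=1) >= min_samples


def is_deg(padj, log2fc, max_padj: float = 0.05, min_abs_lfc: float = 0.5) -> np.ndarray:
    """DEG call with padj <= 0.05 and |log2 fold change| >= 0.5."""
    return (np.asarray(padj) <= max_padj) & (np.abs(np.asarray(log2fc)) >= min_abs_lfc)


counts = np.array([
    [1] * 6,               # about 1 CPM in all 6 samples -> kept
    [1, 1, 1, 1, 1, 0],    # above threshold in only 5 samples -> dropped
    [0] * 6,               # never expressed -> dropped
    [999_998] * 6,         # highly expressed -> kept
])
print(keep_by_cpm(counts))
print(is_deg([0.01, 0.2, 0.04], [-0.6, 2.0, 0.3]))
```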

  4. Selective Functions of Individual Zinc Fingers Within the DNA-Binding Domain...

    • datamed.org
    Updated Feb 1, 2021
    (2021). Selective Functions of Individual Zinc Fingers Within the DNA-Binding Domain of Ikaros (RNA-seq: Thymocytes) [Dataset]. https://datamed.org/display-item.php?repository=0008&id=5914e1845152c67771b40fb8&query=ZNF3&datatypes=Unspecified
    Dataset updated
    Feb 1, 2021
    Description

    The C2H2 zinc finger is the most prevalent DNA-binding motif in the mammalian proteome, with DNA-binding domains usually containing more tandem fingers than are needed for stable sequence-specific DNA recognition. To examine the reason for the frequent presence of multiple zinc fingers, we generated mice lacking finger 1 or finger 4 of the 4-finger DNA-binding domain of Ikaros, a critical regulator of lymphopoiesis and leukemogenesis. Each mutant strain exhibited a specific subset of the phenotypes observed with Ikaros null mice. Of particular relevance, fingers 1 and 4 contributed to distinct stages of B- and T-cell development, and finger 4 was selectively required for tumor suppression in thymocytes and in a new model of BCR-ABL+ acute lymphoblastic leukemia. These results, combined with transcriptome profiling (this GEO submission: RNA-Seq of whole thymus from wt and the two ZnF mutants), reveal that different subsets of fingers within multi-finger transcription factors can regulate distinct target genes and biological functions, and they demonstrate that selective mutagenesis can facilitate efforts to elucidate the functions and mechanisms of action of this prevalent class of factors. Overall design: RNA-Seq from whole thymus comparing wt (3 replicates), Ikaros-ZnF1-/- mutant (2 replicates) and Ikaros-ZnF4-/- mutant (2 replicates). RPKM_Thymocytes.txt (linked below as a supplementary file) reports the relative mRNA expression levels (RPKM values) for all annotated RefSeq genes that had at least one read in at least one of the samples, with duplicates for the same gene (different transcripts of the same gene) filtered out. RPKM values (Mortazavi et al., 2008) were calculated from exonic reads using the software SeqMonk (Babraham Bioinformatics) and reference genome annotations from NCBI (mm9).

  5. Genomic data reveal the biogeographic and demographic history of Ammospiza...

    • datadryad.org
    • data.niaid.nih.gov
    • +1 more
    zip
    Updated Jun 2, 2022
    Jennifer Walsh; Adrienne Kovach; Phred Benham; Gemma Clucas; Virginia Winder; Irby Lovette (2022). Genomic data reveal the biogeographic and demographic history of Ammospiza sparrows in northeast tidal marshes [Dataset]. http://doi.org/10.5061/dryad.73n5tb2x6
    Available download formats: zip
    Dataset updated
    Jun 2, 2022
    Dataset provided by
    Dryad
    Authors
    Jennifer Walsh; Adrienne Kovach; Phred Benham; Gemma Clucas; Virginia Winder; Irby Lovette
    Time period covered
    May 18, 2021
    Description

    Aim: Shaped by both climate change and sea-level rise, tidal salt marshes represent ephemeral systems that are home to only a few, highly specialized species. The dynamic ecological histories and spatial complexities of these habitats, however, render it challenging to reconstruct the complete biogeographic histories of their endemic taxa. Here, we leverage three species of North American Ammospiza sparrows that inhabit tidal marshes (Ammospiza caudacuta, A. maritima, and A. n. subvirgatus) and closely related freshwater species to demonstrate the utility of whole-genome data in resolving demographic and evolutionary history as it relates to divergence and dispersal events in ephemeral ecosystems. We employ a combination of demographic and biogeographic reconstructions to shed new light on the colonization history of freshwater-saline environments in this system.

    Location: North America

    Taxon: Ammospiza Sparrows

    Methods: We sequenced whole genomes from Ammospiza sparrows to address...

  6. [IODP360 - iTAG and metatranscriptome data] - Supplementary Table 4C:...

    • erddap.bco-dmo.org
    Updated Jul 9, 2020
    BCO-DMO (2020). [IODP360 - iTAG and metatranscriptome data] - Supplementary Table 4C: Statistics of reads retained through bioinformatic processing of iTAG data for the 11 samples and control samples and metatranscriptome data. (Collaborative Research: Delineating The Microbial Diversity and Cross-domain Interactions in The Uncharted Subseafloor Lower Crust Using Meta-omics and Culturing Approaches) [Dataset]. https://erddap.bco-dmo.org/erddap/info/bcodmo_dataset_813173/index.html
    Dataset updated
    Jul 9, 2020
    Dataset provided by
    Biological and Chemical Oceanographic Data Management Office (BCO-DMO)
    Authors
    BCO-DMO
    License

    https://www.bco-dmo.org/dataset/813173/license

    Variables measured
    depth, iTAG_OTU, iTAG_Raw, latitude, Sample_ID, longitude, Metatr_Raw, iTAG_Paired_QC, Metatr_Paired_QC, Metatr_Reads_Remaining, and 2 more
    Description

    Supplementary Table 4C: Metatranscriptome data summary for cellular activities presented and statistics on sequencing and removal of potential contaminant sequences: Statistics of reads retained through bioinformatic processing of iTAG data for the 11 samples and control samples and metatranscriptome data. Samples were taken on board the R/V JOIDES Resolution between November 30, 2015 and January 30, 2016.

    Access formats: .htmlTable, .csv, .json, .mat, .nc, .tsv, .esriCsv, .geoJson

    Acquisition description: Rock material was crushed while still frozen in a Progressive Exploration Jaw Crusher (Model 150) whose surfaces were sterilized with 70% ethanol and RNase AWAY (Thermo Fisher Scientific, USA) inside a laminar flow hood. Powdered rock material was returned to the -80 °C freezer until extraction.

    DNA was extracted from 20, 30, or 40 grams of powdered rock material, depending on the quantity of rock available. A DNeasy PowerMax Soil Kit (Qiagen, USA) was used following the manufacturer’s protocol, modified to include three freeze/thaw treatments prior to the addition of Soil Kit solution C1. Each treatment consisted of 1 minute in liquid nitrogen followed by 5 minutes at 65 °C. DNA extracts were concentrated by isopropanol precipitation overnight at 4 °C.

    The low biomass in our samples required whole genome amplification (WGA) prior to PCR amplification of marker genes. Genomic DNA was amplified by Multiple Displacement Amplification (MDA) using the REPLI-g Single Cell Kit (Qiagen) as directed. MDA bias was minimized by splitting each WGA sample into triplicate 16 μL reactions after 1 hr of amplification and then resuming amplification for the manufacturer-specified 7 hrs (8 hrs total).

    DNA was also recovered from samples of drilling mud and drilling fluid (surface water collected during the coring process) for negative controls, as well as two “kit control” samples, in which no sample was added, to account for any contaminants originating from either the DNeasy PowerMax Soil Kit or the REPLI-g Single Cell Kit.

    Bacterial SSU rRNA gene fragments were PCR amplified from MDA samples and sequenced at the Georgia Genomics and Bioinformatics Core (Univ. of Georgia). The primers used were Bac515-Y and Bac926R. Dual-indexed libraries were prepared with (HT) iTruS (Kapa Biosystems) chemistry, and sequencing was performed on an Illumina MiSeq 2 x 300 bp system with all samples combined equally on a single flow cell.

    Raw sequence reads were processed through Trim Galore [http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/], FLASH (ccb.jhu.edu/software/FLASH/) and FASTX Toolkit [http://hannonlab.cshl.edu/fastx_toolkit/] for trimming and removal of low quality/short reads.

    Quality filtering required a minimum average quality of 25 and rejected paired reads shorter than 250 nucleotides.
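    The two filtering criteria above can be expressed as a per-read check. Below is a minimal sketch. It assumes Phred+33 encoded quality strings. It also assumes that the 250-nucleotide minimum applies to each read after the pairs have been merged (for example by FLASH). The function names are hypothetical.

```python
from typing import List


def phred_scores(quality_string: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in quality_string]


def passes_filter(quality_string: str, min_mean_quality: float = 25.0, min_length: int = 250) -> bool:
    """Reject reads shorter than min_length or with mean Phred quality below min_mean_quality."""
    if len(quality_string) < min_length:
        return False
    scores = phred_scores(quality_string)
    return sum(scores) / len(scores) >= min_mean_quality


# '?' encodes Phred 30 and '5' encodes Phred 20 under the +33 offset.
print(passes_filter("?" * 260), passes_filter("?" * 240), passes_filter("5" * 300))  # True False False
```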

    Operational Taxonomic Unit (OTU) clusters were constructed at 99% similarity with the script pick_otus.py within the Quantitative Insights Into Microbial Ecology (QIIME) v.1.9.1 software and ‘uclust’. Any OTU that matched an OTU in one of our control samples (drilling fluids, drilling mud, extraction and WGA controls) was removed (using filter_otus_from_otu_table.py). Also removed were any sequences of land plants and human pathogens that may have survived the control filtering due to clustering at 99% (filter_taxa_from_otu_table.py). As an additional quality control measure, genera that are commonly identified as PCR contaminants were removed. Unclassified OTUs were queried using BLAST against the GenBank nr database; further information about these OTUs is provided in the Supplementary Discussion text under the section “Taxonomic diversity information from iTAGs.” OTUs that could not be assigned to Bacteria or Archaea were removed from further analysis. For downstream analyses, any OTUs not representing more than 0.01% of the relative abundance of sequences overall were removed, as those are unlikely to contribute significantly to in situ communities. The OTU data table was transformed into a presence/absence table, and the Jaccard method was used to generate a distance matrix using the dist.binary() function in the R package ade4.

    Award: OCE-1658031 (NSF Division of Ocean Sciences, NSF OCE; program manager: David L. Garrison), http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=1658031
    Comment: Supplementary Table 4C: iTAG. PI: Virginia Edgcomb. Data Version 1: 2020-05-28
    Dataset state: Final and no updates
    DOI: 10.26008/1912/bco-dmo.813173.1
    Location: latitude -32.70567 (degrees north), longitude 57.278183 (degrees east); depth 10.7 to 747.7 m (positive down)
    Information URL: https://www.bco-dmo.org/dataset/813173
    Institution: BCO-DMO
    Instrument: Automated DNA Sequencer (supplied name: Illumina MiSeq 2 x 300 bp platform); DNA sequencing performed using the Illumina MiSeq 2 x 300 bp platform (Univ. of Georgia).
    Instrument description: General term for a laboratory instrument used for deciphering the order of bases in a strand of DNA. Sanger sequencers detect fluorescence from different dyes that are used to identify the A, C, G, and T extension reactions. Contemporary or Pyrosequencer methods are based on detecting the activity of DNA polymerase (a DNA synthesizing enzyme) with another chemiluminescent enzyme. Essentially, the method allows sequencing of a single strand of DNA by synthesizing the complementary strand along it, one base pair at a time, and detecting which base was actually added at each step.
    Metadata source: https://www.bco-dmo.org/api/dataset/813173
    Parameter source: https://www.bco-dmo.org/mapserver/dataset/813173/parameters
    People: Virginia P. Edgcomb, Woods Hole Oceanographic Institution (WHOI), Principal Investigator and Contact; Karen Soenen, WHOI BCO-DMO, BCO-DMO Data Manager
    Project: Subseafloor Lower Crust Microbiology

    NSF abstract: The lower ocean crust has remained largely unexplored and represents one of the last frontiers for biological exploration on Earth. Preliminary data indicate an active subsurface biosphere in samples of the lower oceanic crust collected from Atlantis Bank in the SW Indian Ocean as deep as 790 m below the seafloor. Even if life exists in only a fraction of the habitable volume where temperatures permit and fluid flow can deliver carbon and energy sources, an active lower oceanic crust biosphere would have implications for deep carbon budgets and yield insights into microbiota that may have existed on early Earth. This is all of great interest to other research disciplines, educators, and students alike.

    A K-12 education program will capitalize on groundwork laid by outreach collaborator A. Martinez, a 7th grade teacher in Eagle Pass, TX, who sailed as outreach expert on Drilling Expedition 360. Martinez works at a Title 1 school with ~98% Hispanic and ~2% Native American students and a high number of English Language Learners and migrants. During annual school visits, the project investigators present hands-on activities introducing students to microbiology, and talks on marine microbiology, the project, and how to pursue science-related careers. In addition, monthly Skype meetings with students and PIs update them on project progress. Students travel to the University of Texas Marine Science Institute annually, where they get a campus tour and a 3-hour cruise on the R/V Katy, during which they learn about and help with different oceanographic sampling approaches. The project partially supports two graduate students, a Woods Hole undergraduate summer student, the participation of multiple Texas A&M undergraduate students, and 3 principal investigators at two institutions, including one early-career researcher who has not previously received NSF support of his own. Given the dearth of knowledge of the lower oceanic crust, this project is poised to transform our understanding of life in this vast environment. The project assesses metabolic functions within all three domains of life in this crustal biosphere, with a focus on nutrient cycling and evaluation of connections to other deep marine microbial habitats. The lower ocean crust represents a potentially vast biosphere whose microbial constituents and the biogeochemical cycles they mediate are likely linked to deep ocean processes through faulting and subsurface fluid flow. Atlantis Bank represents a tectonic

  7. Genomic differentiation and local adaptation on a microgeographic scale in a...

    • datadryad.org
    • borealisdata.ca
    • +5 more
    zip
    Updated Sep 1, 2020
    Jennifer Walsh; Stepfanie Aguillon; Yvonne Chan; Peter Arcese; Phred Benham; Irby Lovette; Chloe Mikles (2020). Genomic differentiation and local adaptation on a microgeographic scale in a resident songbird [Dataset]. http://doi.org/10.5061/dryad.ncjsxkssb
    Available download formats: zip
    Dataset updated
    Sep 1, 2020
    Dataset provided by
    Dryad
    Authors
    Jennifer Walsh; Stepfanie Aguillon; Yvonne Chan; Peter Arcese; Phred Benham; Irby Lovette; Chloe Mikles
    Time period covered
    Sep 1, 2020
    Description

    All sample information for individuals included in this VCF can be found in Supporting Information Table S1. This is the filtered VCF used.

  8. Data from: The cacao gene atlas: A transcriptome developmental atlas reveals...

    • dataone.org
    • data.niaid.nih.gov
    • +3 more
    Updated Jun 28, 2024
    Mark Guiltinan (2024). The cacao gene atlas: A transcriptome developmental atlas reveals highly tissue-specific and dynamically-regulated gene networks in Theobroma cacao L [Dataset]. http://doi.org/10.5061/dryad.0k6djhb59
    Dataset updated
    Jun 28, 2024
    Dataset provided by
    Dryad Digital Repository
    Authors
    Mark Guiltinan
    Time period covered
    Sep 5, 2023
    Description

    A large dataset of replicated transcriptomes was developed to accelerate Theobroma cacao genomics research, with the long-term goal of advancing breeding toward high-yielding elite varieties of cacao. RNAs were extracted and transcriptomes were sequenced from 123 different tissues and developmental stages representing the major organs and stages of the cacao life cycle. In addition, several experimental treatments and time courses were performed to measure gene expression in tissues responding to biotic and abiotic stressors. Samples were collected in replicates (3-5) to enable statistical analysis of gene expression levels, for a total of 390 transcriptomes. We describe the creation of the atlas and its global characterization, and define sets of genes co-regulated in highly organ- and temporally-specific manners. To promote wider use of these data, all raw sequencing data, expression read mapping matrices, scripts, and other information used to create the resourc...

    RNA was extracted from about 400 different tissues/treatments and replicates. Transcriptome sequencing was performed by QuantSeq (Lexogen). Raw QuantSeq reads were first examined with FastQC (v0.11.9, https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) to assess overall data quality before processing. Reads were then processed using bbduk (BBMap tools v37.76; https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/bbduk-guide/) to trim adapter sequences, poly-A tails, and low-quality bases, and to discard fragments shorter than 20 bp after trimming. Trimmed reads were mapped to the CCN-51 and SCA6 Theobroma cacao genotype reference genomes using the STAR aligner version 2.7.5b (Dobin et al. 2013). Expression quantification was performed with featureCounts from the Subread package version 2.0.1 (Liao et al. 2013) in fractional read-counting mode, to distribute multi-mapping reads among features, using gene annotation GFF3 files modified wit...

    The files can be opened with Excel or any text editor or spreadsheet program.

    # The cacao gene atlas: A transcriptome developmental atlas reveals highly tissue-specific and dynamically-regulated gene networks in Theobroma cacao L
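    In fractional counting, a read reported at several genomic locations is split across them rather than counted in full at each one. A minimal Python sketch of the idea, assuming (as in featureCounts `-M --fraction`) that each assigned alignment carries 1/x of a count, where x is the number of alignments reported for that read (its NH tag); the input format and function name are hypothetical:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def fractional_counts(assignments: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Sum fractional counts per gene.

    Each item is (gene_id, nh) for one alignment that was assigned to a gene,
    where nh is the total number of alignments reported for that read.
    A uniquely mapped read (nh == 1) adds a full count; a read with
    nh alignments adds 1/nh through each assigned alignment.
    """
    counts: Dict[str, float] = defaultdict(float)
    for gene_id, nh in assignments:
        if nh < 1:
            raise ValueError("NH must be >= 1")
        counts[gene_id] += 1.0 / nh
    return dict(counts)


if __name__ == "__main__":
    # read1 maps uniquely to geneA; read2 maps twice (once to geneA, once to geneB)
    demo = [("geneA", 1), ("geneA", 2), ("geneB", 2)]
    print(fractional_counts(demo))  # {'geneA': 1.5, 'geneB': 0.5}
```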

    Description of the Data and file structure

    1. The first row lists all tissues, replicates and time points for each sample. The first column lists each cacao gene that was detected. All other cells contain the number of transcripts mapped for each gene/sample combination.
    2. CPM counts are normalized to counts per million; these are the values used on the BAR website.
    3. To compare values in the gene expression matrix with the BAR website, be sure to use the CPM read counts.
    4. Fractional reads are unnormalized raw counts intended for downstream analyses such as DESeq2; do not compare these counts with the counts on BAR.
    5. The genotype of each tissue is indicated in the metadata; the data were mapped to multiple genomes, which may differ from the genotype of the tissue.
    6. For example, CCN51 tissues were mapped to both the CCN51 genome and the SCA6 genome.
    7. The ...
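    The CPM values in items 2-3 rescale each sample by its library size. A minimal Python sketch, assuming CPM = count / (sum of all counts in that sample) × 10^6 applied to the raw count matrix; the description does not state exactly how the BAR values were derived, and the gene names and values below are made up:

```python
from typing import Dict, List


def counts_per_million(counts: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Convert a gene -> per-sample count matrix into counts per million."""
    n_samples = len(next(iter(counts.values())))
    # Library size of each sample = column sum over all genes
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [c / lib_sizes[j] * 1e6 if lib_sizes[j] else 0.0 for j, c in enumerate(row)]
        for gene, row in counts.items()
    }


if __name__ == "__main__":
    demo = {"geneA": [1.5, 0.0], "geneB": [2.5, 4.0]}
    print(counts_per_million(demo))
    # {'geneA': [375000.0, 0.0], 'geneB': [625000.0, 1000000.0]}
```

    As item 4 notes, statistical tools such as DESeq2 should be given the raw counts rather than CPM values.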
  9. Dataset: The potential of genome-wide RAD sequences for resolving rapid...

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    Updated Jul 28, 2020
    Khan, Gulzar; Zappi, Daniela C.; Franco, Fernando Faria; Ribolla, Paulo Eduardo Martins; Taylor, Nigel; Silva, Gislaine Angélica Rodrigues; Amaral, Danilo Trabuco; Moraes, Evandro Marsola; da Silva Andrade, Sónia Cristina; Eaton, Deren A. R.; Alonso, Diego Peres; Bombonato, Juliana Rodrigues (2020). Dataset: The potential of genome-wide RAD sequences for resolving rapid radiations: a case study in Cactaceae [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000568613
    Dataset updated
    Jul 28, 2020
    Authors
    Khan, Gulzar; Zappi, Daniela C.; Franco, Fernando Faria; Ribolla, Paulo Eduardo Martins; Taylor, Nigel; Silva, Gislaine Angélica Rodrigues; Amaral, Danilo Trabuco; Moraes, Evandro Marsola; da Silva Andrade, Sónia Cristina; Eaton, Deren A. R.; Alonso, Diego Peres; Bombonato, Juliana Rodrigues
    Description

    The reconstruction of relationships within recently radiated groups is challenging even when massive amounts of sequencing data are available. The use of restriction site-associated DNA sequencing (RAD-Seq) to this end is promising. Here, we assessed the performance of RAD-Seq in inferring the species-level phylogeny of the rapidly radiating genus Cereus (Cactaceae). To examine how the amount of genomic data affects resolution in this group, we used distinct datasets and implemented different analyses. We sampled 52 individuals of Cereus, representing 18 of the 25 species currently recognized, plus members of the closely allied genera Cipocereus and Praecereus, and 11 other Cactaceae genera as outgroups. Three scenarios of permissiveness to missing data were carried out in iPyRAD, assembling datasets with 30% (333 loci), 45% (1440 loci), and 70% (6141 loci) of missing data. For each dataset, Maximum Likelihood (ML) trees were generated using two supermatrices, i.e., only SNPs, and SNPs plus invariant sites. Accuracy and resolution improved when the dataset with the highest number of loci was used (6141 loci), despite the high percentage of missing data included (70%). Coalescent trees estimated using SVDQuartets and ASTRAL are similar to those obtained by the ML reconstructions. Overall, we reconstruct a well-supported phylogeny of Cereus, which is resolved as monophyletic and composed of four main clades with high support for their internal relationships. Our findings also provide insights into the impact of missing data on phylogeny reconstruction using RAD loci.

    Sampling: Our dataset includes 63 samples spanning 52 ingroups of Cereus and 11 outgroups (Table 1).

    ddRAD library preparation and sequencing: Genomic DNA was extracted from root tissues using the DNeasy Plant Mini Kit (Qiagen). ddRAD libraries were prepared using high-fidelity EcoRI and HpaII restriction enzymes following Campos et al. (2017) and Khan et al. (2019). Details of library preparation and sequencing are given in the Supplementary material.

    Bioinformatics analyses: Raw data were trimmed for adapters and quality filtered before SNP calling. The quality of sequencing data was checked with FastQC 0.11.2 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc), visualized in MultiQC 1.0 (https://github.com/ewels/MultiQC), and filtered with SeqyClean 1.9.12 (Zhbannikov et al., 2017) using the following settings: minimum quality (Phred score 20), minimum size (>65 bp), and Illumina contaminants (UniVec.fas). We used the iPyRAD pipeline (available at http://github.com/dereneaton/ipyrad) to identify homology among reads, make SNP calls, and format output files. The following parameter settings were implemented: mindepth_majrule = 6 (minimum depth for majority-rule base calling), clust_threshold = 0.85 (clustering threshold for de novo assembly), filter_adapters = 2 (strict filter), max_Hs_consens = 6 (maximum heterozygotes in consensus), and min_samples_locus (minimum number of samples per locus for output). For the latter, values varied across three scenarios concerning permissiveness to missing data: the final set of loci had to contain at least 39 samples (scenario 1, approximately 30% missing data), 26 samples (scenario 2, approximately 45% missing data), or 13 samples (scenario 3, approximately 70% missing data). After SNP calling, CD-HIT (Li and Godzik, 2006; Fu et al., 2012) was used to identify reverse-complement duplicates among the loci recovered by iPyRAD.
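    The three scenarios differ only in the minimum number of samples a locus must contain to be kept. A minimal Python sketch of that trade-off, assuming loci are represented as sets of sample IDs (a hypothetical structure, not iPyRAD's internal format): lowering the threshold retains more loci but raises the share of missing sample-by-locus cells.

```python
from typing import Dict, Set, Tuple


def filter_loci(
    loci: Dict[str, Set[str]], n_samples: int, min_samples_locus: int
) -> Tuple[Dict[str, Set[str]], float]:
    """Keep loci present in >= min_samples_locus samples.

    Returns the retained loci and the proportion of missing cells in the
    resulting sample x locus matrix.
    """
    kept = {name: s for name, s in loci.items() if len(s) >= min_samples_locus}
    if not kept:
        return kept, 0.0
    present = sum(len(s) for s in kept.values())
    missing = 1.0 - present / (len(kept) * n_samples)
    return kept, missing


if __name__ == "__main__":
    demo = {"loc1": {"a", "b", "c", "d"}, "loc2": {"a", "b"}, "loc3": {"a"}}
    for threshold in (4, 2, 1):
        kept, missing = filter_loci(demo, n_samples=4, min_samples_locus=threshold)
        print(threshold, sorted(kept), round(missing, 3))
    # 4 ['loc1'] 0.0
    # 2 ['loc1', 'loc2'] 0.25
    # 1 ['loc1', 'loc2', 'loc3'] 0.417
```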

  10. Transcriptomic atlas in Pisum sativum

    • zenodo.org
    bin
    Updated Jun 10, 2025
    Gatepe Cedoine Kodjovi; Ingrid Goma-Louamba; Bouziane MOUMEN; Joan Doidy (2025). Transcriptomic atlas in Pisum sativum [Dataset]. http://doi.org/10.5281/zenodo.15635249
    Available download formats: bin
    Dataset updated
    Jun 10, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Gatepe Cedoine Kodjovi; Ingrid Goma-Louamba; Bouziane MOUMEN; Joan Doidy
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We integrated a total of 149 publicly available RNA-seq libraries from 7 international studies (see atlas_info). These transcriptomes were generated from 10 different pea varieties and cover a wide range of biological conditions, including a comprehensive collection of plant organs, various modalities of abiotic stress (mineral nutrition, water supply and temperature) and biotic interactions (nodules). The raw expression data from the source RNA-seq libraries were re-mapped to the reference genome (Kreplak et al., 2019), and the mean expression value was computed across the biological replicates produced in each study (see atlas_info), thus providing a transcriptomic atlas of the 44,756 genes in the pea genome across 81 biological conditions.

    Method: RNA-seq data (sequenced reads in FASTQ files) generated from P. sativum were downloaded from the NCBI Sequence Read Archive (BioProject numbers listed in the info table) using SRAtools v3.0.1 (SRA Toolkit). fastp v0.22.0 (Chen et al., 2018) was used to trim adapters and filter out reads with low quality scores, followed by a quality assessment with FastQC v0.12.1 (Babraham Bioinformatics). The RNA-seq reads were mapped to the P. sativum reference genome v1a (Kreplak et al., 2019) using STAR v2.7.10b (Dobin et al., 2012). Gene count tables were generated using featureCounts v2.0.1 (Liao et al., 2013) and normalized with the median-of-ratios method of the DESeq2 R package. The transcriptomic atlas comprises expression data for 44,756 genes of the pea genome across 81 biological conditions.
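    The median-ratio normalization mentioned above computes, for each sample, a size factor equal to the median (over genes with non-zero counts in every sample) of that sample's count divided by the gene's geometric mean across samples. A simplified Python illustration of the method, not DESeq2's own code; the count values are made up:

```python
import math
from statistics import median
from typing import Dict, List


def size_factors(counts: Dict[str, List[float]]) -> List[float]:
    """Median-of-ratios size factors for a gene -> per-sample count matrix."""
    n = len(next(iter(counts.values())))
    ratios: List[List[float]] = [[] for _ in range(n)]
    for row in counts.values():
        if any(c <= 0 for c in row):
            continue  # geometric mean would be 0, so the gene is skipped
        log_geo_mean = sum(math.log(c) for c in row) / n
        for j, c in enumerate(row):
            ratios[j].append(math.exp(math.log(c) - log_geo_mean))
    if not ratios[0]:
        raise ValueError("no gene has non-zero counts in every sample")
    return [median(r) for r in ratios]


def normalize(counts: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Divide each sample's counts by its size factor."""
    sf = size_factors(counts)
    return {g: [c / sf[j] for j, c in enumerate(row)] for g, row in counts.items()}


if __name__ == "__main__":
    demo = {"g1": [10, 20], "g2": [40, 80], "g3": [5, 0]}
    # g3 is ignored (zero in sample 2); sample 2 looks ~2x deeper than sample 1
    print(size_factors(demo))
    print(normalize(demo))
```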

  11. Recurrent Campylobacter jejuni infections with in vivo selection of...

    • data.niaid.nih.gov
    Updated Jun 6, 2023
    Nunes, Alexandra; Oleastro, Mónica; Alves, Frederico; Liassine, Nadia; Lowe, David M.; Benejat, Lucie; Ducounau, Astrid; Jehanne, Quentin; Borges, Vítor; Gomes, João Paulo; Godbole, Gauri; Lehours, Philippe (2023). Recurrent Campylobacter jejuni infections with in vivo selection of resistance to macrolides and carbapenems – assembly dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7684723
    Dataset updated
    Jun 6, 2023
    Dataset provided by
    Public Health England, London, UK
    Bordeaux Hospital University Centre, Bordeaux, France
    National Institute of Health Doutor Ricardo Jorge (INSA), Lisbon, Portugal; Lusófona University, Campo Grande Lisbon, Portugal
    National Institute of Health Doutor Ricardo Jorge (INSA), Lisbon, Portugal
    Bordeaux Hospital University Centre, Bordeaux, France; Bordeaux Institute of Oncology, Bordeaux, France
    Institute of Immunity and Transplantation, University College London, London, UK
    Laboratoire Dianalabs, Geneva, Switzerland
    Authors
    Nunes, Alexandra; Oleastro, Mónica; Alves, Frederico; Liassine, Nadia; Lowe, David M.; Benejat, Lucie; Ducounau, Astrid; Jehanne, Quentin; Borges, Vítor; Gomes, João Paulo; Godbole, Gauri; Lehours, Philippe
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset

    This dataset comprises the genome assemblies of the first isolate collected from each clinical case (A1 and B1), together with the respective annotation.

    File “Cje_metadata.xlsx” contains the genome assembly statistics for each isolate, including the European Nucleotide Archive (ENA) accession numbers, genotyping and antibiotic resistance profiles.

    The directory “Assemblies/” contains the genome assembly (.fasta and .gbk formats) of each isolate presented in the metadata file.

    Genome assembly and annotation

    Read quality control and improvement, species confirmation (using the 8GB Kraken database available at https://ccb.jhu.edu/software/kraken/) and de novo assembly were performed using the INNUca v4.2.2 pipeline (https://github.com/B-UMMI/INNUca). Briefly, after read quality analysis using FastQC v0.11.5 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and cleaning with Trimmomatic v0.38 (http://www.usadellab.org/cms/?page=trimmomatic), genomes were de novo assembled with SPAdes 3.14.0 (http://bioinf.spbau.ru/spades) at a mean depth of coverage above 160x and subsequently improved using Pilon v1.23. Multi-Locus Sequence Typing (MLST) was performed using mlst v2.18.1 (https://github.com/tseemann/mlst). Genome annotation was performed with RAST server v2.0 (http://rast.nmpdr.org/).
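    MLST reduces each genome to an allele number at a fixed set of housekeeping loci and looks that combination up in a profile table to obtain a sequence type (ST). A minimal Python sketch of the final lookup step only, assuming the seven-locus C. jejuni/C. coli scheme (aspA, glnA, gltA, glyA, pgm, tkt, uncA); the allele numbers and ST are invented, and the allele calling that mlst performs on the assembly is not modeled:

```python
from typing import Dict, Optional, Tuple

# Loci of the seven-gene C. jejuni/C. coli MLST scheme, in profile order
LOCI: Tuple[str, ...] = ("aspA", "glnA", "gltA", "glyA", "pgm", "tkt", "uncA")


def assign_st(alleles: Dict[str, int], profiles: Dict[Tuple[int, ...], int]) -> Optional[int]:
    """Return the sequence type for a full allele profile, or None.

    None means either a locus lacks an allele call or the allele
    combination is not in the profile table (a potentially novel ST).
    """
    if any(locus not in alleles for locus in LOCI):
        return None
    return profiles.get(tuple(alleles[locus] for locus in LOCI))


if __name__ == "__main__":
    # Hypothetical profile table: (aspA, glnA, gltA, glyA, pgm, tkt, uncA) -> ST
    table = {(1, 2, 3, 4, 5, 6, 7): 9999}
    isolate = dict(zip(LOCI, (1, 2, 3, 4, 5, 6, 7)))
    print(assign_st(isolate, table))  # 9999
```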

    The raw sequence reads of each isolate were deposited at ENA under the study accession numbers PRJEB42628 and PRJNA505131.

    Funding

    This work was supported by GenomePT (ref. POCI-01-0145-FEDER-022184) from Fundação para a Ciência e Tecnologia, Portugal.

  12. Additional file 3: Figure S2. of TRAPLINE: a standardized and automated...

    • springernature.figshare.com
    zip
    Updated May 30, 2023
    Markus Wolfien; Christian Rimmbach; Ulf Schmitz; Julia Jung; Stefan Krebs; Gustav Steinhoff; Robert David; Olaf Wolkenhauer (2023). Additional file 3: Figure S2. of TRAPLINE: a standardized and automated pipeline for RNA sequencing data analysis, evaluation and annotation [Dataset]. http://doi.org/10.6084/m9.figshare.c.3631766_D9.v1
    Available download formats: zip
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Markus Wolfien; Christian Rimmbach; Ulf Schmitz; Julia Jung; Stefan Krebs; Gustav Steinhoff; Robert David; Olaf Wolkenhauer
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Visualization for RNA transcript quality control and comparison of per-base quality score Q. The images were taken before (A) and after (B) the quality trimming procedure (which removes reads with Q ≤ 20) to estimate the effect of trimming. The quality score Q is plotted against read position using the FastQC package in Galaxy (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). The background color indicates quality: red, low quality; orange, medium quality; green, good quality. The red line is the median of the measured values (yellow boxes show the inter-quartile range), and the blue line represents the mean quality. (ZIP 81 kb)
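    The per-base quality plot summarizes Phred scores decoded from FASTQ quality strings. A minimal Python sketch, assuming Phred+33 (Sanger/Illumina 1.8+) encoding; it computes the per-position mean (the blue line) and applies a hypothetical read filter that drops reads whose mean Q is ≤ 20, one possible reading of the trimming rule above rather than TRAPLINE's confirmed criterion:

```python
from statistics import mean
from typing import List, Sequence


def phred33(qual: str) -> List[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(ch) - 33 for ch in qual]


def per_position_mean(quals: Sequence[str]) -> List[float]:
    """Mean quality at each read position across all reads (ragged lengths allowed)."""
    length = max(len(q) for q in quals)
    totals = [0] * length
    counts = [0] * length
    for q in quals:
        for i, score in enumerate(phred33(q)):
            totals[i] += score
            counts[i] += 1
    return [totals[i] / counts[i] for i in range(length)]


def keep_read(qual: str, threshold: int = 20) -> bool:
    """Hypothetical filter: keep a read only if its mean Q exceeds the threshold."""
    return mean(phred33(qual)) > threshold


if __name__ == "__main__":
    reads = ["II5", "++5"]  # 'I' = Q40, '+' = Q10, '5' = Q20
    print(per_position_mean(reads))  # [25.0, 25.0, 20.0]
    print([keep_read(q) for q in reads])  # [True, False]
```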

  13. Supporting data for "The methylome of Biomphalaria glabrata and other...

    • zenodo.org
    bin
    Updated Apr 24, 2025
    Nelia Luviano; Marie Lopez; Fleur Gawehns; Cristian Chaparro; Paola Arimondo; Ludovic Halby; Slavica Ivanovic; Patrice David; Céline Cosseau; Christoph Grunau (2025). Supporting data for "The methylome of Biomphalaria glabrata and other mollusks: enduring modification of epigenetic landscape and phenotypic traits by a new DNA methylation inhibitor" [Dataset]. http://doi.org/10.5281/zenodo.4277533
    Available download formats: bin
    Dataset updated
    Apr 24, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Nelia Luviano; Marie Lopez; Fleur Gawehns; Cristian Chaparro; Paola Arimondo; Ludovic Halby; Slavica Ivanovic; Patrice David; Céline Cosseau; Christoph Grunau
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Methylome of the freshwater snail Biomphalaria glabrata. DNA was extracted from the feet of 10 individuals of B. glabrata originally isolated from Brazil; these snails have been cultivated in the laboratory since 1960. Tissues were ground at 4°C and incubated overnight at 55°C in 1 ml of lysis buffer (20 mM TRIS pH 8; 1 mM EDTA; 100 mM NaCl; 0.5% SDS) with 0.3 mg of proteinase K. The lysate was then purified with phenol-chloroform, and DNA was precipitated with isopropanol. The extracted DNA (around 138 ng/µL) was pooled in equivalent amounts, and Whole Genome Bisulfite Sequencing (WGBS) was performed by GATC-biotech (www.gatc-biotech.com). The principle of this treatment is to convert non-methylated cytosines of gDNA into deoxy-uracil, whereas methylated cytosines remain intact. WGBS was done according to the Lister protocol (sequencing only the 2 forward strands). The reference genome (Biomphalaria-glabrata-BB02_SCAFFOLDS_BglaB1.fa) and annotation (Biomphalaria-glabrata-BB02_BASEFEATURES_BglaB1.3.gff3) used in this project are available on VectorBase (https://www.vectorbase.org/). To align the short reads, we chose two bisulfite mapping tools, BSMAP 1.0.0 (https://code.google.com/p/bsmap/) and Bismark 0.10.2 (www.bioinformatics.babraham.ac.uk/projects/bismark/), to compare their efficiency and convenience and then work with the more suitable one on our datasets. IGV (Integrative Genomics Viewer, https://www.broadinstitute.org/igv/) was used to visualize final alignments.
    BSMAP performed better than Bismark and was used for downstream analyses. With default parameters, the alignment efficiency of BSMAP is 47.1%; allowing 2 mismatches increases it to 55.6%. Methylation occurs predominantly in CpGs (C methylated in CpG context: 12.4%; in CHG context: 0.5%; in CHH context: 0.5%). The majority of CpG sites (95.7%) were unmethylated; of the remaining 4.3%, around 3.8% had low methylation and 0.5% were completely methylated. Methylation is of the mosaic type. Overall methylation is relatively low, at 1.2% of total cytosines. Our analyses suggest that conserved genes and genes with stable expression are located in highly methylated regions of the genome. Finally, repetitive sequences were predominantly located in weakly methylated regions of the B. glabrata genome.
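    Because bisulfite converts unmethylated cytosines (read as T after PCR) but leaves methylated cytosines as C, methylation at a site is commonly estimated as the fraction of C calls among C and T calls. A minimal Python sketch of that calculation per site and pooled over many sites; the read counts are invented, and we do not assume this is exactly how the BSMAP-based summary figures above were produced:

```python
from typing import Iterable, List, Tuple


def methylation_level(c_reads: int, t_reads: int) -> float:
    """Fraction methylated at one cytosine: C / (C + T) among covering reads."""
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("site has no informative reads")
    return c_reads / total


def global_methylation(sites: Iterable[Tuple[int, int]]) -> float:
    """Pooled methylation over many sites: sum of C calls / sum of C + T calls."""
    pairs: List[Tuple[int, int]] = list(sites)
    c_total = sum(c for c, _ in pairs)
    covered = sum(c + t for c, t in pairs)
    return c_total / covered if covered else 0.0


if __name__ == "__main__":
    demo = [(3, 1), (0, 4)]  # (C reads, T reads) at two CpG cytosines
    print([methylation_level(c, t) for c, t in demo])  # [0.75, 0.0]
    print(global_methylation(demo))  # 0.375
```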

    Wiggle files were generated for CpG pairs only.

    Produced at IHPE (http://ihpe.univ-perp.fr/)

  14. Data from: High-resolution estimates of crossover and noncrossover...

    • datadryad.org
    • zenodo.org
    zip
    Updated Mar 30, 2022
    Jeffrey Wall; Jacqueline Robinson; Laura Cox (2022). High-resolution estimates of crossover and noncrossover recombination from a captive baboon colony [Dataset]. http://doi.org/10.7272/Q6HH6H9D
    Available download formats: zip
    Dataset updated
    Mar 30, 2022
    Dataset provided by
    Dryad
    Authors
    Jeffrey Wall; Jacqueline Robinson; Laura Cox
    Time period covered
    Mar 9, 2022
    Description

    VCF files: the VCF files contain raw, unfiltered genotypes from 66 olive baboons (Papio anubis) from the Southwest National Primate Research Center (SNPRC). Genomes are aligned to the Panubis1.0 reference genome (GCA_008728515.1; Batra et al., 2020, https://doi.org/10.1093/gigascience/giaa134). Sequencing was performed on HiSeq 4000 and HiSeq X machines (450 bp mean insert size, 150 bp x 150 bp paired-end sequencing) using DNA extracted from blood samples. Sequences generated for this study (n=23) were combined with previously generated sequence data from Robinson et al., 2019 (https://doi.org/10.1101/gr.247122.118) and Wu et al., 2020 (https://doi.org/10.1371/journal.pbio.3000838). All raw sequence data are available from the Sequence Read Archive under BioProject PRJNA433868. Median depth of coverage across samples is 35.6X. Briefly, reads were trimmed with TrimGalore v0.6.4 (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore) using the following options: -q 20 --stringency 1 --len...
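    The `-q 20` option trims low-quality bases from the 3' end of each read. As far as we understand, Trim Galore delegates this to Cutadapt, whose documentation describes a BWA-style algorithm: subtract the cutoff from each quality, accumulate from the 3' end, and cut where the accumulated deficit peaks. A minimal Python sketch of that algorithm on already-decoded Phred scores; adapter trimming, `--stringency` and the truncated length option are not modeled:

```python
from typing import Sequence


def quality_trim_length(quals: Sequence[int], cutoff: int = 20) -> int:
    """Number of 5' bases to keep after BWA-style 3' quality trimming.

    Scanning from the 3' end, accumulate (cutoff - q). The read is cut at
    the position where this running sum peaks; the scan stops as soon as
    the sum turns negative, i.e. once good-quality bases outweigh bad ones.
    """
    keep = len(quals)
    running = 0
    best = 0
    for i in range(len(quals) - 1, -1, -1):
        running += cutoff - quals[i]
        if running < 0:
            break
        if running > best:
            best = running
            keep = i
    return keep


if __name__ == "__main__":
    print(quality_trim_length([30, 30, 30, 10, 10]))  # 3: both low-quality 3' bases removed
    print(quality_trim_length([30, 30, 12, 30, 10]))  # 4: an isolated Q12 base is kept
```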

  15. Raw RNAseq data - Lemarchand et al 2024

    • figshare.com
    xlsx
    Updated Mar 25, 2024
    Eloise Lemarchand (2024). Raw RNAseq data - Lemarchand et al 2024 [Dataset]. http://doi.org/10.6084/m9.figshare.25471963.v1
    Available download formats: xlsx
    Dataset updated
    Mar 25, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Eloise Lemarchand
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Raw (unmapped) paired-end reads from an Illumina HiSeq4000 sequencer were assessed with FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Sequence adapters were removed, and reads were quality trimmed using Trimmomatic 0.36. The reads were then mapped against the reference mouse genome (mm10) with STAR 2.7.2b, which was also used to calculate counts per gene with the GENCODE M25 annotation (http://www.gencodegenes.org/).

  16. Human RNA-seq ratio data before and after hypoxic stress

    • figshare.com
    txt
    Updated Jan 23, 2018
    Hidemasa Bono (2018). Human RNA-seq ratio data before and after hypoxic stress [Dataset]. http://doi.org/10.6084/m9.figshare.5812704.v1
    Available download formats: txt
    Dataset updated
    Jan 23, 2018
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Hidemasa Bono
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    RNA-seq reads were processed using RSEM after adapter trimming with Trim Galore! (version 0.4.1), a wrapper script that automates quality and adapter trimming as well as quality control (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/). Bowtie2 (version 2.3.2) was used as the read-mapping tool within RSEM (version 1.2.31), following a short tutorial (https://github.com/bli25ucb/RSEM_tutorial).
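    The description does not state how the before/after ratios were computed. One common choice, shown here purely as an assumption, is a per-gene log2 ratio of expression values (e.g. RSEM TPM) after versus before hypoxia, with a pseudocount so that genes with zero expression do not yield infinite values. A minimal Python sketch with invented gene IDs and values:

```python
import math
from typing import Dict


def log2_ratios(
    after: Dict[str, float], before: Dict[str, float], pseudocount: float = 1.0
) -> Dict[str, float]:
    """Per-gene log2((after + pc) / (before + pc)) for genes present in both tables."""
    shared = after.keys() & before.keys()
    return {
        gene: math.log2((after[gene] + pseudocount) / (before[gene] + pseudocount))
        for gene in sorted(shared)
    }


if __name__ == "__main__":
    hypoxia = {"GENE_A": 7.0, "GENE_B": 0.0}
    normoxia = {"GENE_A": 3.0, "GENE_B": 0.0}
    print(log2_ratios(hypoxia, normoxia))  # {'GENE_A': 1.0, 'GENE_B': 0.0}
```

    The pseudocount of 1 is arbitrary; smaller values exaggerate fold changes of weakly expressed genes.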

  17. Deep enzymology data related to Adam et al.: Flanking sequences influence...

    • darus.uni-stuttgart.de
    Updated Feb 7, 2022
    Albert Jeltsch; Pavel Bashtrykov; Sabrina Adam (2022). Deep enzymology data related to Adam et al.: Flanking sequences influence the activity of TET1 and TET2 methylcytosine dioxygenases and affect genomic 5hmC patterns [Dataset]. http://doi.org/10.18419/DARUS-2114
    Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Feb 7, 2022
    Dataset provided by
    DaRUS
    Authors
    Albert Jeltsch; Pavel Bashtrykov; Sabrina Adam
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Dataset funded by
    DFG
    Description

    Experimental procedures for deep enzymology reactions with randomized substrates: For analysis of the flanking sequence preferences of the TET enzymes, an approach similar to that described for DNMTs (Emperle et al., 2019; Gao et al., 2020; Adam et al., 2020; Dukatz et al., 2020) was used. Briefly, the following single-stranded oligonucleotides, containing a methylated or hydroxymethylated CpG or CpH site flanked by 10 randomized nucleotides on either side, were obtained from IDT, and primer extension was performed to obtain the double-stranded DNA substrates. A CpN substrate was prepared as a mixture of CpG and CpH in a 1:3 ratio. For the randomized hydroxymethylated substrate, the single-stranded oligo was purchased coupled to Desthiobiotin-TEG; primer extension was conducted, and the substrate was purified via Streptavidin beads (Dynabeads M-280, ThermoFisher Scientific) and eluted with a biotin solution.

    HM rand.: GAGTGTGACTAGGCTCTCACTGCCNNNNNNNNNN mC GNNNNNNNNNNGAGAGGAGACCTAGTGAGAAG
    OH rand.: GAGTGTGACTAGGCTCTCACTGCCNNNNNNNNNN hmC GNNNNNNNNNNGAGAGGAGACCTAGTGAGAAG
    CH rand.: GAGTGTGACTAGGCTCTCACTGCCNNNNNNNNNN mC HNNNNNNNNNNGAGAGGAGACCTAGTGAGAAG

    The randomized double-stranded substrates were incubated with the TET enzyme at 37 °C for 45 min (CN context) or 1 h (CG context) in mixtures containing 1x reaction buffer (50 mM HEPES pH 6.8, 100 mM NaCl, 1 mM DTT, 1 mM alpha-ketoglutarate and 2 mM ascorbic acid) and 100 µM ammonium iron(II) sulfate, using different enzyme concentrations and variable amounts of dialysis buffer to keep the salt and glycerol concentrations fixed. Reactions were stopped by freezing in liquid nitrogen. Afterwards, the enzyme was inactivated by Proteinase K (NEB) treatment for 1 h at 50 °C, followed by purification with a PCR clean-up kit (MACHEREY-NAGEL). Hairpin ligation was followed by bisulfite conversion using the EZ DNA Methylation-Lightning kit (ZYMO).
    Library preparation for Illumina next-generation sequencing was conducted using a two-step PCR approach as described in Gao et al. (2020). Unique combinations of barcode and index sequences were introduced to distinguish different samples and experiments. For bioinformatic analysis of the NGS datasets, a local instance of a Galaxy server (Afgan et al., 2018) was used. Sequence reads were trimmed with Trim Galore! (Galaxy Version 0.4.3.1, https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/), keeping only sequences with a quality score above 20 for further analysis, and filtered according to the expected DNA size using the Filter FASTQ tool (Blankenberg et al., 2010). The data in this entry comprise the FASTQ sequence files and extracted DNA sequences obtained with the hemimethylated CpG substrate (HM CG), the hemimethylated CpN substrate mixture (HM CN) and the hemihydroxymethylated CpG substrate (OH CG). Enzyme kinetics were conducted with TET1 and two versions of TET2 (V1 and V2) as described in the accompanying paper. Individual repeats of experiments are indicated with R1-R5 as appropriate. Control reactions are samples treated identically but without enzyme. The cited references are listed in the publication accompanying this dataset.
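    Because every substrate shares the same constant arms and differs only in the 10 randomized positions on each side of the target site, flanks can be pulled out of size-filtered reads by position. A minimal Python sketch, assuming reads already filtered to exactly the template length and in the same orientation as the CpG template; the function names are hypothetical, and bisulfite decoding and hairpin-based strand reconstruction, which a real analysis must handle, are ignored:

```python
import re
from typing import Iterable, List, Tuple

# Layout of the CpG substrate: constant arm, 10 N, CpG site, 10 N, constant arm
TEMPLATE = "GAGTGTGACTAGGCTCTCACTGCC" + "N" * 10 + "CG" + "N" * 10 + "GAGAGGAGACCTAGTGAGAAG"


def random_blocks(template: str) -> List[Tuple[int, int]]:
    """(start, end) coordinates of each run of N in the template."""
    return [m.span() for m in re.finditer(r"N+", template)]


def extract_flanks(reads: Iterable[str], template: str = TEMPLATE) -> List[Tuple[str, str, str]]:
    """Return (left flank, site, right flank) for reads of exactly template length."""
    (l_start, l_end), (r_start, r_end) = random_blocks(template)
    out = []
    for read in reads:
        if len(read) != len(template):
            continue  # mimic the size filter: drop reads of unexpected length
        out.append((read[l_start:l_end], read[l_end:r_start], read[r_start:r_end]))
    return out


if __name__ == "__main__":
    print(random_blocks(TEMPLATE))  # [(24, 34), (36, 46)]
    read = TEMPLATE[:24] + "ACGTACGTAC" + "CG" + "TTTTGGGGCC" + TEMPLATE[46:]
    print(extract_flanks([read, read[:-1]]))  # the truncated read is dropped
```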

  18. Population genetic structure of the gastropod species Bulinus truncatus

    • datadryad.org
    • dataone.org
    • +2 more
    zip
    Updated Oct 10, 2022
    Carl Vangestel (2022). Population genetic structure of the gastropod species Bulinus truncatus [Dataset]. http://doi.org/10.5061/dryad.37pvmcvpc
    Available download formats: zip
    Dataset updated
    Oct 10, 2022
    Dataset provided by
    Dryad
    Authors
    Carl Vangestel
    Time period covered
    Sep 19, 2022
    Description

    Linux platform

  19. SEEDSTICK Is a Master Regulator of Development and Metabolism in Arabidopsis...

    • ebi.ac.uk
    Updated Dec 16, 2014
    Ignacio Ezquer; Chiara Mizzotti (2014). SEEDSTICK Is a Master Regulator of Development and Metabolism in Arabidopsis Seed Coat [Dataset]. https://www.ebi.ac.uk/biostudies/arrayexpress/studies/E-GEOD-59637
    Dataset updated
    Dec 16, 2014
    Authors
    Ignacio Ezquer; Chiara Mizzotti
    Description

    Purpose: The goals of this study are to compare NGS-derived transcriptome profiles (RNA-seq) of the SEEDSTICK (STK) mutant in Arabidopsis with those of the wild type, to unveil the role of this transcription factor in seed development, and to decipher its impact on proanthocyanidin (PA) metabolism.

    Methods: Total RNA was extracted from two biological replicates each of wild-type and stk mutant inflorescences and siliques up to 5 DAP with the Qiagen Kit, according to the manufacturer's instructions. DNA contamination was removed using PROMEGA RQ1 RNase-Free DNase according to the manufacturer's instructions. RNA integrity was analyzed by gel electrophoresis and validated on a Bioanalyzer 2100 (Agilent, Santa Clara, CA); RNA Integrity Number (RIN) values were greater than 7 for all samples. To confirm that STK was not expressed in the stk mutant samples, STK expression was checked by real-time PCR with primers RT 780 (5′-TGCGATGCAGAAGTTGCGCTC-3′) and RT 781 (5′-AGTACGCGGCATTGATTTCTTG-3′). Sequencing libraries were prepared with the TruSeq RNA Sample Prep kit (Illumina Inc.) according to the manufacturer's instructions and sequenced on an Illumina HiSeq2000 in one lane, single-read 50 bp. Image processing, base calling and quality value calculation were performed with the Illumina data processing pipeline (version 1.8). Raw reads were filtered to obtain high-quality reads by removing reads in which more than 30% of bases had Q < 20. Finally, quality control of the raw sequence data was performed using FastQC [http://www.bioinformatics.babraham.ac.uk/projects/fastqc/].

    Results: A total of 102,278,242 reads passed the quality filter, and 85% were mapped to the Arabidopsis TAIR10 genome. Approximately 90% mapped uniquely to a single location and could be assigned to a single annotated TAIR10 gene. Expression values were normalized as RPKM values. All other parameters were kept at default levels. The CLC Genomics Workbench was also used to identify all differentially expressed transcripts in each cDNA library. Baggerley's test and an FDR correction were used for the statistical analysis of samples. The analysis revealed that 156 genes were upregulated and 90 genes were downregulated in the stk mutant compared to the wild type. Data analysis revealed a significant enrichment for terms related to the phenylpropanoid metabolic process, the flavonoid biosynthetic process and the cellular amino acid derivative metabolic process.

    Conclusion: Our genome-wide transcriptomic analysis suggests that the ovule identity factor STK is involved in the regulation of several metabolic processes, providing a strong connection between cell fate determination, development and metabolism. In particular, we characterize, through phenotypic, genetic, biochemical and transcriptomic approaches, the role of STK in PA biosynthesis. Our results indicate that STK exerts this role through the direct regulation of the gene encoding BANYULS/ANTHOCYANIDIN REDUCTASE (BAN/ANR), which converts anthocyanidins into their corresponding 2,3-cis-flavan-3-ols. Our study also demonstrates that the level of the H3K9ac chromatin modification directly correlates with the active state of BAN in an STK-dependent way. This supports the theory that MADS-domain proteins control the expression of their target genes by modifying chromatin states. STK might recruit or negatively regulate histone-modifying factors to control their activity. Moreover, we show that STK controls, through a complex regulatory network, not only BAN directly but also other regulators of this key gene in tannin production.

    mRNA profiles from Arabidopsis wild-type and stk mutant inflorescences and siliques up to 5 DAP were generated by deep sequencing, in duplicate; libraries were prepared with the TruSeq RNA Sample Prep kit (Illumina Inc.) according to the manufacturer's instructions and sequenced on an Illumina HiSeq2000 in one lane, single-read 50 bp.
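The read filter in the STK methods (discard reads in which more than 30% of bases have Q < 20) can be sketched as below. This is a minimal illustration, not the Illumina pipeline actually used; it assumes Phred+33 quality encoding, and the function names are hypothetical.

```python
# Minimal sketch of the read filter described above: drop a read when more
# than 30% of its bases have Phred quality below 20.
# Assumption: qualities are Phred+33 encoded (Illumina 1.8+ / Sanger).

def phred_scores(quality_string, offset=33):
    """Decode an ASCII FASTQ quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in quality_string]


def passes_filter(quality_string, min_q=20, max_low_fraction=0.30):
    """Return True if at most `max_low_fraction` of bases have Q < min_q."""
    scores = phred_scores(quality_string)
    if not scores:
        return False  # empty reads carry no usable information
    low = sum(1 for q in scores if q < min_q)
    return low / len(scores) <= max_low_fraction


def filter_fastq_records(records):
    """Yield (header, sequence, quality) tuples that pass the filter."""
    for header, seq, qual in records:
        if passes_filter(qual):
            yield header, seq, qual
```

With Phred+33, `I` encodes Q40 and `#` encodes Q2, so a 10-base read with three `#` characters (exactly 30% low-quality bases) is kept, while one with four is removed.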

  20. Data from: Finding complexity in complexes: assessing the causes of...

    • datadryad.org
    • search.datacite.org
    zip
    Updated Jun 16, 2020
    Thomas Firneno; Justin O'Neill; Daniel Portik; Alyson Emery; Josiah Townsend; Matthew Fujita (2020). Finding complexity in complexes: assessing the causes of mitonuclear discordance in a problematic species complex of Mesoamerican toads [Dataset]. http://doi.org/10.5061/dryad.q573n5tfw
    Available download formats: zip
    Dataset updated
    Jun 16, 2020
    Dataset provided by
    Dryad
    Authors
    Thomas Firneno; Justin O'Neill; Daniel Portik; Alyson Emery; Josiah Townsend; Matthew Fujita
    Description

    We collected ddRADseq data for 84 individuals following the protocol described in Peterson et al. (2012) and following parameters specified in Streicher et al. (2014). Our final library was analyzed on one Illumina HiSeq2500 lane (150 bp single end reads) at the Genomic Sequencing and Analysis Facility (GSAF) at The University of Texas (https://www.wikis.utexas.edu/display/GSAF). The workflow for data processing, filtering, and formatting was automated using scripts available from Portik et al. 2017 (https://github.com/dportik/Stacks_pipeline). In brief, the raw Illumina reads were demultiplexed using stacks v1.35 (Catchen, Hohenlohe, Bassham, Amores, & Cresko, 2013), the restriction site overhangs were removed using the fastx_trimmer module of the fastx-toolkit (www.hannonlab.cshl.edu/fastx_toolkit), and the sequencing quality was examined on a per sample basis using fastqc v0.10.1 (www.bioinformatics.babraham.ac.uk/projects/fastqc). Loci were created, catalogued, and identified us...

Lucie Perillat; Lucie Perillat (2025). RNA-Sequencing Part 1 Generation and characterization of a novel mouse model of Becker Muscular Dystrophy with a deletion of exons 52 to 55 [Dataset]. http://doi.org/10.5281/zenodo.17087788

RNA-Sequencing Part 1 Generation and characterization of a novel mouse model of Becker Muscular Dystrophy with a deletion of exons 52 to 55

Available download formats: tsv, application/gzip, txt, bin
Dataset updated
Sep 24, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Lucie Perillat; Lucie Perillat
License

Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Becker muscular dystrophy (BMD) is a rare X-linked recessive neuromuscular disorder, frequently caused by in-frame deletions in the DMD gene that result in the production of a truncated, yet functional, dystrophin protein. The consequences of BMD-causing in-frame deletions on the organism are difficult to predict, especially in regard to long-term prognosis. Here, we used CRISPR-Cas9 to generate a new Dmd Δ52-55 mouse model by deleting exons 52-55 in the Dmd gene, resulting in a BMD-like in-frame deletion. To delineate the long-term effects of this deletion, we studied these mice over 52 weeks by performing histology and echocardiography analyses and assessing motor functions. To further delineate the effects of the exons 52-55 in-frame deletion, we performed RNA-Seq pre- and post-exercise and identified several differentially expressed pathways that could explain the abnormal muscle phenotype observed at 52 weeks in the BMD model.

This dataset contains the results and raw data of the RNA sequencing and transcriptomic analysis of 52-week-old exercised and non-exercised mice (4 BMD, 4 WT and 4 DMD, as indicated in each file name).

1. Due to size restrictions, this RNA-Seq dataset will be published on Zenodo in three parts. This first part contains the data for the exercised mice, including the FASTQ files (R1 and R2) and their associated MD5 checksum files for the 4 BMD mice (15315-15318) and 2 DMD mice (15319 and 15320), all raw gene counts (txt files), and all differentially expressed genes (tsv files).

Workflow (performed by TCAG at SickKids):

2. RNA-Seq Library and Reference Genome Information

Type of library: stranded, paired end

Genome reference sequence: GRCm39, with GENCODE release M31 gene models.

3. Read Pre-processing, Alignment and Obtaining Gene Counts

3.1 Read Pre-processing

The sequencing data is in FASTQ format. The quality of the data is assessed using FastQC v.0.11.5 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/).

Adaptors are trimmed using Trim Galore (http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) v. 0.5.0, which runs Cutadapt (https://cutadapt.readthedocs.org/en/stable/) v. 1.10. Trim Galore is run with the following parameters:

-q 25 – reads are trimmed base by base from the 3' end; trimming stops once a base with quality greater than 25 is reached;

--clip_R1 6, --clip_R2 6 – clip the first 6 nucleotides from the 5' ends of read 1 and read 2;

--stringency 5 – an overlap of at least 5 nucleotides with the Illumina primer sequence is required for trimming;

--length 40 – any read that is shorter than 40 nucleotides as a result of trimming is discarded;

--paired – only pairs of reads are retained (for paired-end reads only, not for single reads).
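The quality, clipping, length and pairing options above can be sketched for a single read pair as follows. This is an illustrative approximation, not Trim Galore itself: it assumes qualities are already decoded to integers, it does not model adapter removal (`--stringency`), and the order "quality-trim, then clip the 5' end" is an assumption. The 3' trimming follows the partial-sum algorithm that the Cutadapt documentation describes for `-q`, under which a single good base does not necessarily stop trimming.

```python
# Sketch of per-pair trimming: -q 25, --clip_R1/--clip_R2 6, --length 40,
# --paired. Function names are hypothetical; adapter trimming is omitted.

def quality_trim_index(qualities, cutoff=25):
    """Return the index at which to cut the 3' end (keep qualities[:index])."""
    running = 0
    best = 0
    cut = len(qualities)
    for i in range(len(qualities) - 1, -1, -1):
        running += cutoff - qualities[i]
        if running < 0:
            break  # a high-quality stretch outweighs the low-quality tail
        if running > best:
            best = running
            cut = i
    return cut


def trim_read(seq, quals, cutoff=25, clip5=6):
    """Quality-trim the 3' end, then clip `clip5` bases from the 5' end."""
    cut = quality_trim_index(quals, cutoff)
    seq, quals = seq[:cut], quals[:cut]
    return seq[clip5:], quals[clip5:]


def trim_pair(read1, read2, cutoff=25, clip5=6, min_length=40):
    """Trim both mates; return None unless both keep >= min_length bases."""
    t1 = trim_read(*read1, cutoff=cutoff, clip5=clip5)
    t2 = trim_read(*read2, cutoff=cutoff, clip5=clip5)
    if len(t1[0]) < min_length or len(t2[0]) < min_length:
        return None  # --paired: the whole pair is discarded
    return t1, t2
```

For a 50-base read that is uniformly high quality, only the 6-base 5' clip applies and 44 bases remain; if one mate ends in a long low-quality tail and falls below 40 bases, both mates are discarded.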

The adaptor type is detected automatically by screening the first 1 million sequences of the first specified file for the first 12 or 13 nucleotides of the standard Illumina or Nextera primers; the sequence from the start of the matched primer to the 3' end of the read is then trimmed.
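The adaptor auto-detection step can be illustrated with a simplified sketch that counts exact matches of each candidate prefix among the first reads and then cuts each read at the match. The prefixes below (`AGATCGGAAGAGC` for Illumina, `CTGTCTCTTATA` for Nextera) are the commonly used Trim Galore defaults but are an assumption here, as is the fallback to Illumina when nothing is found. Cutadapt itself tolerates mismatches and partial 3' overlaps, which this exact-match sketch does not.

```python
# Simplified sketch of adaptor auto-detection and 3' adaptor removal.
# Assumptions: candidate prefixes, exact matching, Illumina fallback.

ADAPTER_PREFIXES = {
    "illumina": "AGATCGGAAGAGC",  # 13 nt, standard Illumina adaptor prefix
    "nextera": "CTGTCTCTTATA",    # 12 nt, Nextera adaptor prefix
}


def detect_adapter(reads, max_reads=1_000_000):
    """Count reads containing each prefix; return (best_name, counts)."""
    counts = {name: 0 for name in ADAPTER_PREFIXES}
    for n, seq in enumerate(reads):
        if n >= max_reads:
            break  # only the first `max_reads` sequences are screened
        for name, prefix in ADAPTER_PREFIXES.items():
            if prefix in seq:
                counts[name] += 1
    best = max(counts, key=counts.get)  # ties resolve to "illumina" (first key)
    if counts[best] == 0:
        return "illumina", counts  # assumed fallback when nothing is found
    return best, counts


def trim_adapter(seq, adapter):
    """Remove everything from the first exact adaptor match to the 3' end."""
    idx = seq.find(adapter)
    return seq if idx == -1 else seq[:idx]
```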

The quality of the trimmed reads is re-assessed with FastQC.

The trimmed reads are also screened for the presence of rRNA and mtRNA sequences using FastQ-Screen v.0.10.0 (http://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/).

The RSeQC package (http://rseqc.sourceforge.net/), v. 2.6.2, is used to assess the read distribution and positional read duplication and to confirm the strandedness of the alignments. The distribution of reads across exonic, intronic and intergenic sequences is assessed by read_distribution.py, strandedness is confirmed with infer_experiment.py, and positional read duplication (the percentage of reads mapping to exactly the same genomic location) is obtained with read_duplication.py. A sufficient proportion of reads should map to exonic sequences (ideally > 70-80%). A large proportion of reads mapping to intronic sequences in a poly-A mRNA library suggests a significant presence of pre-mRNA or other problems with RNA preparation. For stranded RNA-seq experiments, the majority of reads should map exclusively to one strand, either the same as or opposite to the transcript, depending on the library preparation method. For non-stranded experiments, reads should be distributed equally between both strands.
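The strandedness and exonic-fraction checks can be turned into a small decision helper, sketched below. For paired-end data, infer_experiment.py reports the fraction of reads explained by "1++,1--,2+-,2-+" (read 1 on the transcript strand) and by "1+-,1-+,2++,2--" (read 1 on the opposite strand). The 0.8 strandedness threshold and the 0.7 exonic threshold are illustrative assumptions; the mapping to htseq-count's `--stratified` option is omitted, but its `--stranded` values (`yes`, `reverse`, `no`) are shown.

```python
# Sketch for interpreting RSeQC summaries. Thresholds are assumptions.

def infer_strandedness(frac_forward, frac_reverse, threshold=0.8):
    """frac_forward: fraction explained by "1++,1--,2+-,2-+";
    frac_reverse: fraction explained by "1+-,1-+,2++,2--"."""
    determined = frac_forward + frac_reverse
    if determined == 0:
        return "undetermined"
    if frac_forward / determined >= threshold:
        return "forward"   # read 1 on the same strand as the transcript
    if frac_reverse / determined >= threshold:
        return "reverse"   # read 1 on the opposite strand (e.g. dUTP libraries)
    return "unstranded"    # reads split roughly evenly between strands


# htseq-count --stranded value matching each inferred library type
HTSEQ_STRANDED = {"forward": "yes", "reverse": "reverse", "unstranded": "no"}


def exonic_fraction_ok(exonic_tags, total_assigned_tags, minimum=0.7):
    """True if at least `minimum` of assigned tags fall in exons."""
    return total_assigned_tags > 0 and exonic_tags / total_assigned_tags >= minimum
```

A stranded paired-end library such as the one described here would be expected to classify as either "forward" or "reverse", not "unstranded".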

3.2. Read Alignment

The trimmed reads are aligned to the reference genome using the STAR aligner v.2.6.0c (https://github.com/alexdobin/STAR, https://academic.oup.com/bioinformatics/article/29/1/15/272537). The alignments are stored in .bam files, which, together with their .bai index files, can be viewed in the Integrative Genomics Viewer (IGV, http://software.broadinstitute.org/software/igv/).

3.3. Obtaining Gene Counts

The filtered STAR alignments are processed to extract raw read counts for genes using htseq-count v.0.6.1p2 (HTSeq, http://www-huber.embl.de/users/anders/HTSeq/doc/overview.html). htseq-count assigns reads to genes in the “intersection-nonempty” mode: for example, if a read lies entirely within gene A and only partly overlaps gene B, it is counted towards gene A, whereas if it falls entirely within a region shared by gene A and gene B, it is ambiguous and not counted towards either gene. htseq-count does not count reads with multiple alignments, to avoid introducing bias into the expression results; only uniquely mapping reads are counted.
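The "intersection-nonempty" rule can be illustrated on a toy annotation. The sketch below follows the logic described in the HTSeq documentation: for each read position, take the set of genes covering it, ignore positions covered by no gene, and intersect the remaining sets. One gene left means the read is counted; several means ambiguous; none means no feature. Genes are represented here as hypothetical half-open intervals rather than GTF records, and the multi-mapping check stands in for the NH tag.

```python
# Toy illustration of htseq-count's intersection-nonempty assignment.
# Gene intervals are half-open [start, end); names are hypothetical.

def features_at(pos, genes):
    """Set of gene IDs whose interval covers `pos`."""
    return {gid for gid, (start, end) in genes.items() if start <= pos < end}


def assign_read(read_start, read_end, genes, n_hits=1):
    """Assign a read covering [read_start, read_end) to a gene or category."""
    if n_hits > 1:
        return "__alignment_not_unique"  # multi-mapping reads are not counted
    result = None
    for pos in range(read_start, read_end):
        covering = features_at(pos, genes)
        if not covering:
            continue  # positions outside all genes are ignored in this mode
        result = set(covering) if result is None else result & covering
    if not result:
        return "__no_feature"
    if len(result) > 1:
        return "__ambiguous"
    return next(iter(result))
```

With gene A at [0, 100) and gene B at [80, 200), a read at [60, 90) is counted for gene A, while a read at [85, 95) lies only in the shared region and is ambiguous.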

4. Pre-processing, Alignment and Gene Counts QC

MultiQC (https://multiqc.info/) is a reporting tool that aggregates statistics generated by bioinformatics analyses across multiple samples. MultiQC v. 1.14 was used to generate a consolidated report from FastQC screening of both untrimmed and trimmed reads, and from RSeQC, FastQ Screen, STAR and htseq-count results. The MultiQC report is contained in the MultiQC_Report_*.html file.

5. DGE Analysis with edgeR

Differential expression analysis was performed with the edgeR R package v.3.28.1 in R v.3.6.1 (http://www.bioconductor.org/packages/release/bioc/html/edgeR.html). The data set was filtered to retain only genes with counts > 50 in at least 3 samples, in order to remove genes that are not expressed or are expressed at very low levels.
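The expression filter amounts to a per-gene count of samples above the threshold. A minimal sketch, assuming raw counts are held in a dictionary of per-sample lists (a hypothetical layout, not the format of the distributed txt files):

```python
# Keep a gene only if its raw count is > 50 in at least 3 samples.

def filter_low_expression(counts, min_count=50, min_samples=3):
    """counts: dict mapping gene_id -> list of raw counts, one per sample."""
    return {
        gene: row
        for gene, row in counts.items()
        if sum(1 for c in row if c > min_count) >= min_samples
    }
```

In R, the same rule corresponds to `rowSums(counts > 50) >= 3` applied to a genes-by-samples count matrix. Note that a count of exactly 50 does not pass, because the threshold is strict.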

The method used for normalizing the data was TMM, implemented by the calcNormFactors(y) function. All samples were normalized and filtered together. The glmLRT functionality in edgeR was used for the differential expression tests, with sample group taken into account.
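To show what TMM normalization does, here is a simplified re-implementation of the trimmed mean of M-values. It is a sketch, not edgeR's code: the reference sample is passed in explicitly (edgeR picks one automatically by default), ties are broken by sort order rather than R's `rank()`, and genes with a zero count in either sample are simply skipped. For each gene it computes the log ratio M and average log abundance A, trims 30% of M-values and 5% of A-values from each end, takes a precision-weighted mean of the remaining M-values, and finally rescales the factors so that they multiply to one.

```python
import math


def tmm_factor(obs, ref, logratio_trim=0.3, sum_trim=0.05):
    """Simplified TMM scaling factor of sample `obs` relative to `ref`."""
    n_obs, n_ref = sum(obs), sum(ref)
    rows = []  # (M value, A value, approximate variance) per gene
    for o, r in zip(obs, ref):
        if o == 0 or r == 0:
            continue  # infinite log ratio: gene is skipped
        p_o, p_r = o / n_obs, r / n_ref
        m = math.log2(p_o / p_r)
        a = (math.log2(p_o) + math.log2(p_r)) / 2
        var = (n_obs - o) / n_obs / o + (n_ref - r) / n_ref / r
        if var > 0:
            rows.append((m, a, var))
    n = len(rows)
    if n == 0:
        return 1.0
    lo_m = math.floor(n * logratio_trim)
    lo_a = math.floor(n * sum_trim)
    by_m = sorted(range(n), key=lambda i: rows[i][0])
    by_a = sorted(range(n), key=lambda i: rows[i][1])
    keep = set(by_m[lo_m:n - lo_m]) & set(by_a[lo_a:n - lo_a])
    if not keep:
        return 1.0
    # Precision-weighted mean of the retained log ratios
    num = sum(rows[i][0] / rows[i][2] for i in keep)
    den = sum(1 / rows[i][2] for i in keep)
    return 2 ** (num / den)


def calc_norm_factors(samples, ref_index=0):
    """samples: list of per-sample count lists in the same gene order."""
    ref = samples[ref_index]
    factors = [tmm_factor(s, ref) for s in samples]
    # Rescale so the factors multiply to one
    geo_mean = math.exp(sum(math.log(f) for f in factors) / len(factors))
    return [f / geo_mean for f in factors]
```

When one gene is strongly induced in a sample, raw library sizes overstate that sample's depth for every other gene. TMM corrects this, so that the effective library sizes (library size times factor) of two samples that differ only in that gene become equal.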

EdgeR Results Legend:

· GeneID – Ensembl Gene ID;

· Chr.Start.End - gene coordinates;

· GeneName, GeneType, etc. – Gene attributes, derived from the genome annotation;

· logFC - Log2 Fold Change (use this column for selection of DEGs);

· logCPM - Log2 Counts Per Million, average for all libraries;

· LR – Statistic calculated by the LR-Test;

· PValue - Differential expression P value;

· FDR – Differential expression False Discovery Rate, calculated by the Benjamini-Hochberg method (use this column for selection of DEGs);

· (columns labeled with sample names) – Fragments Per Kilobase of transcript per Million mapped reads (FPKMs) for the given samples.
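The FDR and FPKM columns in this legend follow standard formulas, and the sketch below reproduces them. The Benjamini-Hochberg adjustment matches the method named for the FDR column. The DEG selection thresholds (FDR < 0.05, |logFC| ≥ 1) are illustrative assumptions, since the legend says only which columns to use. The FPKM helper uses the textbook definition, and how gene length was computed for the files is not stated in the source.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (FDR) in the original order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0  # caps adjusted values at 1
    for rank_from_top, i in enumerate(order):
        rank = n - rank_from_top  # 1-based rank in ascending p-value order
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(rows, fdr_max=0.05, min_abs_logfc=1.0):
    """rows: dicts with at least "logFC" and "FDR" keys (assumed thresholds)."""
    return [r for r in rows if r["FDR"] < fdr_max and abs(r["logFC"]) >= min_abs_logfc]


def fpkm(count, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return count * 1e9 / (gene_length_bp * total_mapped_fragments)
```

For example, 500 fragments on a 2,000 bp gene in a library of 10 million mapped fragments gives an FPKM of 25.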
