16 datasets found

Training data for de novo transcriptome reconstruction from RNA-seq data
zenodo.org
data.niaid.nih.gov
bin
Updated Jan 24, 2020
Cite
Mohammad Heydarian; Mallory Freeberg; Krzysztof Poterlowicz (2020). Training data for de novo transcriptome reconstruction from RNA-seq data [Dataset]. https://zenodo.org/records/583140
Available download formats: bin
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Mohammad Heydarian; Mallory Freeberg; Krzysztof Poterlowicz
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The data provided here are part of a Galaxy Training Network tutorial that analyzes RNA-seq data using a de novo transcriptome reconstruction strategy from a study published by Wu et al., 2014 (DOI:10.1101/gr.164830.113). The goal of this study was to investigate "the dynamics of occupancy and the role in gene regulation of the transcription factor Tal1, a critical regulator of hematopoiesis, at multiple stages of hematopoietic differentiation." To this end, RNA-seq libraries were constructed from multiple mouse cell types including G1E - a GATA-null immortalized cell line derived from targeted disruption of GATA-1 in mouse embryonic stem cells - and megakaryocytes. This RNA-seq data was used to determine differential gene expression between G1E and megakaryocytes and later correlated with Tal1 occupancy. This dataset (GEO Accession: GSE51338) consists of biological replicate, paired-end, polyA selected RNA-seq libraries. Because of the long processing time for the large original files, we have downsampled the original raw data files to include only reads that align to a subset of interesting genomic loci identified by Wu et al. This dataset represents an even smaller set of data than another training data set (DOI:10.5281/zenodo.254485).
The Galaxy platform for accessible, reproducible and collaborative...
ckan.earlham.ac.uk
Updated Apr 2, 2019
Cite
ckan.earlham.ac.uk (2019). The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update - Datasets - CKAN [Dataset]. https://ckan.earlham.ac.uk/dataset/27a03fa3-12ad-40a6-9f80-cf348da2899d
Dataset updated
Apr 2, 2019
Dataset provided by
CKAN (https://ckan.org/)
Description
Galaxy (homepage: https://galaxyproject.org, main public server: https://usegalaxy.org) is a web-based scientific analysis platform used by tens of thousands of scientists across the world to analyze large biomedical datasets such as those found in genomics, proteomics, metabolomics and imaging. Started in 2005, Galaxy continues to focus on three key challenges of data-driven biomedical science: making analyses accessible to all researchers, ensuring analyses are completely reproducible, and making it simple to communicate analyses so that they can be reused and extended. During the last two years, the Galaxy team and the open-source community around Galaxy have made substantial improvements to Galaxy's core framework, user interface, tools, and training materials. Framework and user interface improvements now enable Galaxy to be used for analyzing tens of thousands of datasets, and >5500 tools are now available from the Galaxy ToolShed. The Galaxy community has led an effort to create numerous high-quality tutorials focused on common types of genomic analyses. The Galaxy developer and user communities continue to grow and be integral to Galaxy's development. The number of Galaxy public servers, developers contributing to the Galaxy framework and its tools, and users of the main Galaxy server have all increased substantially.
GTN_PAR-CLIP_workflow
data.niaid.nih.gov
Updated Aug 4, 2022
Cite
Fallmann Joerg (2022). GTN_PAR-CLIP_workflow [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_2553518
Dataset updated
Aug 4, 2022
Dataset authored and provided by
Fallmann Joerg
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data from https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?exp=SRX105188&cmd=search&m=downloads&s=seq and ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/002/985/GCA_000002985.3_WBcel235

For Galaxy Training https://rna.usegalaxy.eu/workflows/run?id=a108b575b16e6cb9
Additional file 5 of LotuS2: an ultrafast and highly accurate tool for...
figshare.com
zip
Updated Jun 3, 2023
Cite
Ezgi Özkurt; Joachim Fritscher; Nicola Soranzo; Duncan Y. K. Ng; Robert P. Davey; Mohammad Bahram; Falk Hildebrand (2023). Additional file 5 of LotuS2: an ultrafast and highly accurate tool for amplicon sequencing analysis [Dataset]. http://doi.org/10.6084/m9.figshare.21359930.v1
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.21359930.v1
Dataset updated
Jun 3, 2023
Dataset provided by
figshare
Authors
Ezgi Özkurt; Joachim Fritscher; Nicola Soranzo; Duncan Y. K. Ng; Robert P. Davey; Mohammad Bahram; Falk Hildebrand
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Additional file 5: Supplementary Figure S1. Galaxy web interface of LotuS2. Raw reads can be uploaded into LotuS2 via the Galaxy web interface and analysed (accessible at https://usegalaxy.eu/).
Training material for the SIGU course "Data analysis and interpretation for...
zenodo.org
data.niaid.nih.gov
application/gzip, bin +1
Updated Apr 26, 2021
Cite
Paolo Uva; Alessandro Bruselles; Andrea Ciolfi; Gianmauro Cuccuru; Giuseppe Marangi; Tommaso Pippucci (2021). Training material for the SIGU course "Data analysis and interpretation for clinical genomics" (part 1/4) [Dataset]. http://doi.org/10.5281/zenodo.3689711
Available download formats: application/gzip, vcf, bin
Unique identifier
https://doi.org/10.5281/zenodo.3689711
Dataset updated
Apr 26, 2021
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Paolo Uva; Alessandro Bruselles; Andrea Ciolfi; Gianmauro Cuccuru; Giuseppe Marangi; Tommaso Pippucci
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In 2018-2019, on behalf of the Italian Society of Human Genetics (SIGU), we organized an itinerant Galaxy-based “hands-on-computer” training activity entitled “Data analysis and interpretation for clinical genomics”. This one-day course was offered to participants including clinical doctors, biologists, laboratory technicians and bioinformaticians. Topics covered by the course were NGS data quality check; detection of variants, copy number alterations and runs of homozygosity; annotation and filtering; and clinical interpretation of sequencing results.

To meet the constant need for training on basic NGS analysis and interpretation of sequencing data in the clinical setting, we designed an on-line Galaxy-based training resource dedicated to this topic, articulated in presentations and practical assignments by which students will learn how to approach NGS data processing at the level of FASTQ, BAM and VCF files and clinically-oriented examination of variants emerging from sequencing experiments such as whole exomes.

This repository contains datasets required for the online training "Data analysis and interpretation for clinical genomics" available at https://sigu-training.github.io/clinical_genomics/.

Tools used in the training are available at the European Galaxy instance running at https://usegalaxy.eu, which also includes a copy of this repository in the Shared Data Libraries. Files named Fam_*.bam are based on hg38 reference genome; all the other files refer to hg19.

This is part of a 4 dataset submission.
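
The naming convention stated in the description above (files named Fam_*.bam use hg38, everything else uses hg19) can be sketched as a small helper. This is only an illustration: the function name `reference_build` and the example file names are hypothetical and are not part of the training material.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def reference_build(filename: str) -> str:
    """Return the reference genome implied by the file name.

    Follows the description literally: files matching Fam_*.bam are
    based on hg38, all other files in this dataset refer to hg19.
    """
    # Only the final path component is compared against the pattern.
    name = PurePosixPath(filename).name
    return "hg38" if fnmatchcase(name, "Fam_*.bam") else "hg19"


if __name__ == "__main__":
    # Hypothetical file names, used only to exercise the rule.
    for example in ("Fam_trio.bam", "panel.vcf"):
        print(example, reference_build(example))
```

The match is case-sensitive and applies only to this part of the submission; the rule would need checking against the actual file listing before being relied on.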
Training material for the SIGU course "Data analysis and interpretation for...
data.niaid.nih.gov
zenodo.org
Updated Apr 26, 2021
Cite
Paolo Uva (2021). Training material for the SIGU course "Data analysis and interpretation for clinical genomics" (part 4/4) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4270090
Dataset updated
Apr 26, 2021
Dataset provided by
Paolo Uva
Gianmauro Cuccuru
Giuseppe Marangi
Alessandro Bruselles
Andrea Ciolfi
Tommaso Pippucci
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository contains datasets required for the online training "Data analysis and interpretation for clinical genomics" available at https://sigu-training.github.io/clinical_genomics/.

Tools used in the training are available at the European Galaxy instance running at https://usegalaxy.eu, which also includes a copy of this repository in the Shared Data Libraries. BAM files in this dataset are based on the hg38 reference genome.

This is part of a 4 dataset submission. Refer to this dataset for details.
EOSC4Cancer Longitudinal Synthetic Colorectal Cancer Genomic data developed...
ega-archive.org
Updated Feb 9, 2024
Cite
(2024). EOSC4Cancer Longitudinal Synthetic Colorectal Cancer Genomic data developed at BSC [Dataset]. https://ega-archive.org/datasets/EGAD50000000276
Dataset updated
Feb 9, 2024
License
https://ega-archive.org/dacs/EGAC00001000514
Description
The synthetic genomes were created to mimic real cancer data from 4 patients (named 185, 186, 187 and 188). Mutations are based on real CRC patients from the PCAWG dataset. For each patient, two tumor samples at different time points and one healthy sample have been simulated. Intra-tumor heterogeneity and evolution in the patients are depicted by simulating reads from tumor subclones separately and then mixing them according to their clonal proportions in each sample. For rapid use and transfer, only selected chromosomes have been generated for each patient.

Chromosomes per patient:
- 185: chr4, chr5, chr7, chr17
- 186: chr1, chr7, chr12, chr17
- 187: chr1, chr2, chr5, chr12, chr17
- 188: chr2, chr5, chr12, chr13, chr17

Workflows used to create BAM/BAI, VCF and MAF files from FASTQ (alignment with GRCh38):
- https://usegalaxy.eu/published/workflow?id=2c3d05023c02113e
- https://usegalaxy.eu/published/workflow?id=1da86d74f8535f4e
Data from: Genetic Characteristics and Phylogenetic Relationships of 18...
figshare.com
zip
Updated Mar 29, 2025
Cite
Wenyu Sun (2025). Genetic Characteristics and Phylogenetic Relationships of 18 Anchovy Species Based on Mitochondrial Genomes in the Seas Around China [Dataset]. http://doi.org/10.6084/m9.figshare.28227167.v2
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.28227167.v2
Dataset updated
Mar 29, 2025
Dataset provided by
figshare
Authors
Wenyu Sun
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We downloaded the complete mitochondrial genome data of 18 Engraulidae fish species from the NCBI database (https://www.ncbi.nlm.nih.gov/). These files were stored in the “Download data” folder. Subsequently, we reannotated these mitochondrial genomes using the MITOS2 online tool available on the Galaxy website (https://usegalaxy.org/) and manually modified the original gb files to adjust the inaccurately annotated control regions and to add the annotation information for the light-strand replication origin. The revised files were saved in the “Reannotation” folder and were used for subsequent analyses.
Data from: NGS data related to Albrecht et al.: Locus specific and stable...
darus.uni-stuttgart.de
Updated Jan 10, 2024
Cite
Albert Jeltsch; Pavel Bashtrykov; Claudia Albrecht (2024). NGS data related to Albrecht et al.: Locus specific and stable DNA demethylation at the H19/IGF2 ICR1 by epigenome editing using a dCas9-SunTag system and the catalytic domain of TET1 [Dataset]. http://doi.org/10.18419/DARUS-3790
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Unique identifier
https://doi.org/10.18419/DARUS-3790
Dataset updated
Jan 10, 2024
Dataset provided by
DaRUS
Authors
Albert Jeltsch; Pavel Bashtrykov; Claudia Albrecht
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset funded by
DFG
BW Foundation
Description
Method overview

For targeted DNA demethylation of the H19/IGF2 ICR1, HEK293 cells were transfected with two plasmids, one containing dCas9 fused to a SunTag with five repeats of the GCN4 peptide, separated by 22 aa long linkers, and scFv-fused TET1CD, as well as a GFP reporter protein. The second plasmid encodes five sgRNAs targeting the ICR1 and a DsRed fluorophore. On day 3 post-transfection, GFP- and DsRed-positive cells were sorted by FACS. A part of the sorted cells was used immediately for downstream analysis, the other part was re-seeded to harvest at later time points. Genomic DNA was isolated from the cells and bisulfite or oxidative bisulfite conversion was conducted. For amplicon-based DNA methylation analysis, libraries were prepared from bisulfite-converted DNA using two consecutive PCRs in which barcodes, indices and sequencing adapters are added. Samples were sequenced by NGS and data was analyzed.

Method details

The gDNA of transfected HEK293 cells sorted by FACS was isolated using the QIAamp DNA Mini Kit (Qiagen) according to the manufacturer's instructions. 500 ng gDNA was fragmented enzymatically by overnight digestion using 40 U EcoRV-HF (a non-cutter in the genomic regions desired for amplification) (New England BioLabs, Inc.) in CutSmart buffer in a total volume of 20 µl. The next day, bisulfite conversion was conducted using the EZ DNA Methylation-Lightning™ Kit (ZYMO RESEARCH) according to the manufacturer's protocol. Oxidative bisulfite conversion was performed using the TrueMethyl® oxBS Module (Part No. 0414, Tecan Genomics, Inc.) according to the manufacturer's instructions. Amplicons of interest were amplified in a first PCR (PCR1) with locus-specific primers, which also contained barcodes and adapters complementary to PCR2 primers. The PCR1 product was used as template for PCR2, in which Illumina TruSeq sequencing indices are added to the amplicons. Sample concentrations were measured using the NanoDrop 1000 (Thermo Fisher Scientific) and equimolar amounts of samples were pooled. Paired-end Illumina sequencing with 250 bp read length was performed by Novogene (UK) Company Limited.

Data analysis

NGS data in FASTQ format was analyzed essentially as described (Rajaram et al., 2023) on the Galaxy platform (https://usegalaxy.org/) (The Galaxy platform for accessible, reproducible and collaborative biomedical analyses, 2022), where all the following tools are available. In brief, Illumina adapter sequences were removed using Trim Galore!. Afterwards, the two paired-end reads were merged using Pear and reads with low quality were removed with Filter FASTQ. De-multiplexing of individual samples tagged with combinations of barcodes and Illumina indices was done by converting the FASTQ files using FASTQ to Tabular, followed by selection of lines with the tool Select and re-conversion of the files to a FASTQ format with Tabular to FASTQ. For the alignment of reads to a reference sequence, bwameth was used and the DNA methylation at each CpG site was analyzed by applying the tool MethylDackel. The output files were processed using Microsoft Excel.

References

The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2022 update. Nucleic Acids Research 2022, 50, W345-W351, doi: 10.1093/nar/gkac247

Rajaram, N.; Kouroukli, A.G.; Bens, S.; Bashtrykov, P.; Jeltsch, A. Development of super-specific epigenome editing by targeted allele-specific DNA methylation. Epigenetics Chromatin 2023, 16, 41, doi: 10.1186/s13072-023-00515-5
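
The de-multiplexing route described in the data analysis above (FASTQ to Tabular, Select, Tabular to FASTQ) can be approximated outside Galaxy with the minimal Python sketch below. It is not the Galaxy tools themselves: the function names are hypothetical, the barcode in the usage example is invented, and the sketch assumes four-line FASTQ records and that the barcode is matched against the read sequence.

```python
import re


def fastq_to_tabular(fastq_text):
    """Convert four-line FASTQ records into (name, sequence, quality) rows."""
    lines = [ln for ln in fastq_text.splitlines() if ln.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("expected 4 lines per FASTQ record")
    rows = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record number {i // 4 + 1}")
        rows.append((header[1:], seq, qual))
    return rows


def select_rows(rows, pattern):
    """Keep rows whose sequence column matches the regular expression."""
    regex = re.compile(pattern)
    return [row for row in rows if regex.search(row[1])]


def tabular_to_fastq(rows):
    """Convert (name, sequence, quality) rows back into FASTQ text."""
    return "".join(f"@{name}\n{seq}\n+\n{qual}\n" for name, seq, qual in rows)


if __name__ == "__main__":
    # Two invented reads; "ACGT" stands in for a sample barcode.
    reads = "@r1\nACGTTTGA\n+\nIIIIIIII\n@r2\nTTTTACGT\n+\nIIIIIIII\n"
    sample = select_rows(fastq_to_tabular(reads), "^ACGT")
    print(tabular_to_fastq(sample), end="")
```

Going through a tabular form makes each read a single row, so an ordinary line filter can select reads by barcode without splitting a record.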
Data from: Aequatus: An open-source homology browser
ckan.earlham.ac.uk
Updated Jun 2, 2019
Cite
ckan.earlham.ac.uk (2019). Aequatus: An open-source homology browser [Dataset]. https://ckan.earlham.ac.uk/dataset/fc5855ca-bfc7-4f70-82e2-53d5f505f1dc
Dataset updated
Jun 2, 2019
Dataset provided by
CKAN (https://ckan.org/)
Description
Phylogenetic information inferred from the study of homologous genes helps us to understand the evolution of genes and gene families, including the identification of ancestral gene duplication events as well as regions under positive or purifying selection within lineages. Gene family and orthogroup characterisation enables the identification of syntenic blocks, which can then be visualised with various tools. Unfortunately, currently available tools display only an overview of syntenic regions as a whole, limited to the gene level, and none provide further details about structural changes within genes, such as the conservation of ancestral exon boundaries amongst multiple genomes. We present Aequatus, a standalone web-based tool that provides an in-depth view of gene structure across gene families, with various options to render and filter visualisations. It relies on pre-calculated alignment and gene feature information typically held in, but not limited to, the Ensembl Compara and Core databases. We also offer Aequatus.js, a reusable JavaScript module that fulfils the visualisation aspects of Aequatus, available within the Galaxy web platform as a visualisation plugin, which can be used to visualise gene trees generated by the GeneSeqToFamily workflow. Aequatus is an open-source tool freely available to download under the MIT license at https://github.com/TGAC/Aequatus. A demo server is available at http://aequatus.earlham.ac.uk/. A publicly available instance of the GeneSeqToFamily workflow to generate gene tree information and visualise it using Aequatus is available on the Galaxy EU server at https://usegalaxy.eu.
Restart dataset for a single location in Norway ALP1 (61.0243N,8.12343E) for...
zenodo.org
data.niaid.nih.gov
tar, txt
Updated Mar 18, 2021
Cite
Fouilloux Anne (2021). Restart dataset for a single location in Norway ALP1 (61.0243N,8.12343E) for CTSM/FATES EMERALD Galaxy tutorial [Dataset]. http://doi.org/10.5281/zenodo.4126404
Available download formats: tar, txt
Unique identifier
https://doi.org/10.5281/zenodo.4126404
Dataset updated
Mar 18, 2021
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Fouilloux Anne
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Norway
Description
Restart files for CLM-FATES version 2.0.1 for CLM-FATES EMERALD version 2.0.1.

CTSM_FATES-EMERALD_on_inputdata_version2.0.0_ALP1.tar_(restart_info):
- ALP1_refcase.datm.r.2300-01-01-00000.nc
- ALP1_refcase.datm.rs1.2300-01-01-00000.bin
- ALP1_refcase.cpl.r.2300-01-01-00000.nc
- ALP1_refcase.clm2.r.2300-01-01-00000.nc

This dataset is being used in the Galaxy Training tutorial on CLM-FATES.

This work has been done in collaboration with Galaxy Europe and EOSC-Life:
- Within the 1st EOSC-Life Training Open Call, two out of four proposals have been awarded to the European Galaxy team to develop climate science e-learning material and mentoring and training opportunities for our communities.

CLM-FATES documentation can be found here.
Sentiment analysis in Galaxy with IMDB movie review dataset
data.niaid.nih.gov
Updated Aug 4, 2022
Cite
Kaivan Kamali (2022). Sentiment analysis in Galaxy with IMDB movie review dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4477880
Dataset updated
Aug 4, 2022
Dataset authored and provided by
Kaivan Kamali
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
IMDB movie review sentiment classification dataset (Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. (2011). Learning Word Vectors for Sentiment Analysis. The 49th Annual Meeting of the Association for Computational Linguistics (ACL 2011)). For more information please refer to: https://ai.stanford.edu/~amaas/data/sentiment/

The IMDB dataset was modified as follows to prepare it for use in a Galaxy Training Tutorial (https://training.galaxyproject.org/):

- The top 50 words are excluded (mostly stop words).
- The next 10,000 most frequent words are included.
- Reviews are limited to 500 words (longer reviews are trimmed and shorter reviews are padded).
- 25,000 reviews each are used for training and testing.
- Files are in TSV (tab-separated values) format to be consumed by Galaxy (www.usegalaxy.org).
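
The preprocessing listed above can be illustrated with a minimal Python sketch. It is not the script used to build the dataset. The function names are hypothetical, and several details are assumptions the description does not settle: words outside the kept vocabulary are dropped, reviews are trimmed and padded at the end, and index 0 is reserved for padding.

```python
from collections import Counter


def build_vocab(tokenized_reviews, skip_top=50, num_words=10000):
    """Map words to integer ids, skipping the most frequent `skip_top`
    words and keeping the next `num_words` words by frequency."""
    counts = Counter(tok for review in tokenized_reviews for tok in review)
    ranked = [word for word, _ in counts.most_common()]
    kept = ranked[skip_top:skip_top + num_words]
    # Ids start at 1 so that 0 can serve as the padding value (assumption).
    return {word: idx + 1 for idx, word in enumerate(kept)}


def encode_review(tokens, vocab, max_len=500, pad_id=0):
    """Encode one review as exactly `max_len` integer ids."""
    ids = [vocab[tok] for tok in tokens if tok in vocab]  # drop unknown words
    ids = ids[:max_len]                                   # trim long reviews
    return ids + [pad_id] * (max_len - len(ids))          # pad short reviews


if __name__ == "__main__":
    # Tiny invented corpus; the small limits keep the example readable.
    corpus = [["the", "the", "the", "movie", "movie", "good"]]
    vocab = build_vocab(corpus, skip_top=1, num_words=2)
    print(encode_review(["the", "good", "movie", "bad"], vocab, max_len=5))
```

Fixed-length integer sequences are what sequence models typically expect as input, which is why the reviews are trimmed or padded to a common length.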
raw_mapped_bam_from_Galaxy
figshare.com
application/gzip
Updated Jun 1, 2023
Cite
Caroline Werlang (2023). raw_mapped_bam_from_Galaxy [Dataset]. http://doi.org/10.6084/m9.figshare.14214131.v1
Available download formats: application/gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.14214131.v1
Dataset updated
Jun 1, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Caroline Werlang
License
MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
Description
Mapped reads (BAM files) on the UA159 genome. Data published in Werlang et al., Nature Microbiology 2021, https://doi.org/10.1038/s41564-021-00876-1. Generated using Galaxy: usegalaxy.org

Steps used to generate the files from raw sequencing data. This represents Step 0 of the full analysis pipeline available at https://github.com/cwerlang/Smutans-MUC5B-RNASeq

0. The raw bam files are available from the Gene Expression Omnibus under accession number GSE163258
1. Load the 'data/external/GCF_000007465.2_ASM746v2_genomic.fna' file and raw bam files into Galaxy
2. Map with BWA (Burrows-Wheeler Aligner). Options: unpaired single end short reads
3. Download the .bam files into the "raw_mapped_bam_from_Galaxy" folder
4. Rename the files to 'xxxx.mapped.bam'
5. Fill in the 'data/raw_mapped_bam_from_Galaxy/file_list_mapped_bam.csv' file with the new filenames
Data from: NGS data related to Rajaram et al.: Allele specific DNA...
darus.uni-stuttgart.de
Updated Aug 12, 2024
Cite
Albert Jeltsch; Pavel Bashtrykov; Nivethika Rajaram (2024). NGS data related to Rajaram et al.: Allele specific DNA demethylation ... [Dataset]. http://doi.org/10.18419/DARUS-4230
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Unique identifier
https://doi.org/10.18419/DARUS-4230
Dataset updated
Aug 12, 2024
Dataset provided by
DaRUS
Authors
Albert Jeltsch; Pavel Bashtrykov; Nivethika Rajaram
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset funded by
BW Foundation
Description
Method overview

To achieve targeted locus and allele-specific DNA demethylation, HEK293 cells were transfected with two plasmids. One plasmid contains dCas9 fused to a SunTag with five repeats of the GCN4 peptide, separated by 22 aa long linkers, and scFv-fused TET1CD, as well as a GFP reporter protein. The other plasmid is a multiguide plasmid with 4 individual sgRNAs flanked by U6 promoter and gRNA scaffold, and a DsRed fluorophore. Control experiments were conducted with a scrambled sgRNA that does not have a binding site in the human genome. Initial studies showed that cells positive for two plasmids exhibited detectable fluorescence of the corresponding reporter proteins on day 3 post-transfection. Hence, FACS sorting was conducted at this time point. A part of the sorted cells was used immediately for downstream analysis, the other part was re-seeded to harvest at later time points. For DNA methylation analysis, genomic DNA was isolated from the cell samples and subjected to bisulfite treatment. Library preparation was performed using the bisulfite-converted samples, followed by NGS and data analysis. All methylation experiments were conducted in three independent biological replicates. For measurement of the genomic allele frequencies, genomic DNA of the untreated samples was used for the amplification of the region around the target SNP and an exonic region with an additional SNP for each target, which was followed by library preparation, NGS and data analysis. To monitor the variation in the expression of the target genes, RNA was isolated from the treated cells on Day 6. cDNA synthesized from the isolated RNA was used for the library preparation of the exonic region. The library was subjected to NGS followed by data analysis. All experiments were conducted in three independent biological replicates.

Method details

The gDNA of transfected HEK293 cells sorted by FACS was extracted using the QIAamp DNA Mini Kit (Qiagen). 500 ng of genomic DNA was subjected to overnight digestion with EcoRV, which does not cut in any of the target amplicons. The Zymo EZ DNA Methylation-Lightning Kit (D5030-E) was used for bisulfite conversion. The library for NGS was prepared by two consecutive PCR reactions (Leitao et al, 2018). Firstly, bisulfite-converted genomic DNA of each sample was amplified with target gene specific primers. The gene-specific optimized amount of product from the first PCR was used as a template for the second PCR to add the Illumina TruSeq sequencing adapters. Final products were quantified, pooled in equimolar amounts and purified using SPRIselect beads (Beckman Coulter). Ready-to-use pools of libraries were sequenced on NovaSeq 6000 using a PE250 flow cell (Novogene). For expression analysis, RNA was isolated from the sorted cells using the Qiagen RNeasy extraction kit (Cat. No. 74034). The residual genomic DNA was removed from the samples by an additional treatment with the TURBO DNA-free™ Kit (Ambion #AM1907). 500 ng of the DNase-free RNA was used for cDNA synthesis with the Applied Biosystems High-Capacity cDNA Reverse Transcription Kit (Cat No 4368814). NRT was used as a negative control for cDNA synthesis, where the reaction was conducted without addition of the reverse transcriptase enzyme. In addition, NTC (no template control) reactions were included. The transcripts were subjected to library preparation in a two-step PCR process as mentioned above. For amplification of the genomic regions, 10 ng of the isolated genomic DNA was used. Two-step library preparation was carried out for NGS of genomic regions. All NGS data were obtained in the form of FASTQ files.

Data analysis

NGS data in FASTQ format was analyzed as described (Rajaram et al., 2023) on the Galaxy platform (https://usegalaxy.org/) (The Galaxy platform for accessible, reproducible and collaborative biomedical analyses, 2022), where all the following tools are available. First, Illumina adapter sequences were removed using Trim Galore!. Afterwards, the two paired-end reads were merged using Pear and reads with low quality were removed with Filter FASTQ. All NGS data files were subjected to this processing. For quantitative analysis of the methylation at individual CpG sites, the following steps were carried out. De-multiplexing of individual samples tagged with combinations of barcodes and Illumina indices was done by converting the FASTQ files using FASTQ to Tabular, followed by selection of lines with the tool Select and re-conversion of the files to a FASTQ format with Tabular to FASTQ. For the alignment of reads to a reference sequence, bwameth was used and the DNA methylation at each CpG site was analyzed by applying the tool MethylDackel. The output files were processed using Microsoft Excel. For the analysis of the allelic ratios of the transcript and genomic region, de-multiplexing of individual samples tagged with combinations of barcodes and Illumina indices was done by converting the FASTQ files using FASTQ to...
Data from: Two male-killing Wolbachia from Drosophila birauraia that are...
data.niaid.nih.gov
dataone.org
zip
Updated Dec 22, 2023
Cite
Hiroshi Arai; Masayoshi Watada; Daisuke Kageyama (2023). Two male-killing Wolbachia from Drosophila birauraia that are closely related but distinct in genome structure [Dataset]. http://doi.org/10.5061/dryad.j9kd51cjh
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.j9kd51cjh
Dataset updated
Dec 22, 2023
Dataset provided by
Ehime University
National Agriculture and Food Research Organization
Authors
Hiroshi Arai; Masayoshi Watada; Daisuke Kageyama
License
https://spdx.org/licenses/CC0-1.0.html
Description
Insects harbour diverse maternally inherited bacteria and viruses, some of which have evolved to kill the male progeny of their hosts (male killing: MK). The fly species Drosophila biauraria carries a maternally transmitted MK-inducing partiti-like virus, but it was unknown if it carries other MK-inducing endosymbionts. Here, we identified two male-killing Wolbachia strains (wBiau1 and wBiau2) from D. biauraria and compared their genomes to elucidate their evolutionary processes. The two strains were genetically closely related but had exceptionally different genome structures with considerable rearrangements compared with combinations of other Wolbachia strains. Despite substantial changes in the genome structure, the two Wolbachia strains did not experience gene losses that would disrupt the male-killing expression or persistence in the host population. The two Wolbachia-infected matrilines carried distinct mitochondrial haplotypes, suggesting that wBiau1 and wBiau2 have invaded D. biauraria independently and undergone considerable genome changes owing to unknown selective pressures in evolutionary history. This study demonstrated the presence of three male-killers from two distinct origins in one fly species and highlighted the diverse and rapid genome evolution of MK Wolbachia in the host.

Methods

Collection and rearing of Drosophila biauraria
D. biauraria samples were collected from the Field Science Center for Northern Biosphere, Hokkaido University, Tomakomai, Hokkaido, Japan in 2015 and 2017. Flies were collected by sweeping and banana traps. The collected females were individually maintained at 19 °C with the standard banana medium [24]. The sex ratios of the lines derived from field-collected females were determined at the adult stage. The normal sex ratio (NSR) isofemale line SP11-20 [25] was maintained for more than 70 generations. The all-female matrilines (W1 and W2), each derived from a single female, were maintained by crossing with males of the SP11-20 line. Wolbachia and DbMKPV1 infections were detected by PCR, as described previously [24-25].

Tetracycline treatment
All-female matrilines (W1 and W2) were reared on tetracycline-containing banana medium (0.05% [w/v]) [24] for two generations.

Egg hatching rates
Egg-hatching rates were estimated by counting hatched larvae and unhatched embryos. A total of 50–100 females of either W1 or NSR (SP11-20) were allowed to oviposit on grape juice agar medium for 1 d [25]. The eggs were collected and maintained in phosphate-buffered saline with Tween 20 (PBST; 137 mmol/l NaCl, 8.1 mmol/l Na2HPO4, 2.68 mmol/l KCl, 1.47 mmol/l KH2PO4, 0.02% Tween 20, pH 7.4) for 4 d. The number of neonates and remaining embryos were counted manually under a microscope. This treatment was repeated at least four times.

Sex determination of embryos and hatchlings of D. biauraria
We determined the sex of embryos and hatchlings by PCR targeting a male-specific Y chromosome marker. Briefly, each embryo and neonate was squashed in 20 μL of PrepMan™ Ultra Sample Preparation Reagent (ThermoFisher). Samples were then incubated at 100°C for 10 min, vortexed for 15 s, centrifuged at 20,000 × g for 2 min, and finally subjected to PCR. A Y chromosome-linked male-specific marker for D. biauraria [25] was amplified using a pair of primers, DbY_c52202_F2 (5′-ACCGAGCGCGAAATCATAAAACCAGCATC-3′) and DbY_c52202_R2 (5′-CTCATATCACTTCATGTATCCCACACTTTTAACAG-3′). Db-actin5C-68-F (5′-GGCCATCCAGGCCGTGCTCTC-3′) and Db-actin5C-68-R (5′-GCGCTCGGCAGTGGTGGTGAAG-3′) were used to amplify actin-5C to confirm proper D. biauraria genomic DNA extraction. These markers were amplified using the Emerald Amp Max Master mix (TaKaRa) with an initial denaturation at 94°C for 3 min, followed by 35 cycles of denaturation at 94°C for 30 s, annealing at 55°C for 30 s, and extension at 72°C for 30 s, and a final extension at 72°C for 7 min. Of the actin-positive samples, those that were positive for the Y marker were classified as male, and those that were negative for the Y marker were classified as female.

Genome sequencing of flies and construction of Wolbachia genomes
For genome sequencing of fly lines W1 and W2, high molecular weight DNA was extracted from 0.1 g adult females (approximately 100–200 individuals) using the Nanobind Tissue Big DNA Kit (Circulomics Inc., Baltimore, MD, USA) and was used for library construction using the Ligation Sequencing Kit v14 (Oxford Nanopore Technologies, Oxford, UK) following the manufacturer’s protocol. The constructed libraries were sequenced using the ONT MinION flow cell (R 10.4) (Oxford Nanopore Technologies). The extracted DNA was also subjected to Illumina paired-end 150 bp sequencing (PE-150) at the Bioengineering Lab. Co., Ltd. (Japan). The obtained nanopore reads were assembled using Flye 2.3 [26] in Galaxy Europe (https://usegalaxy.eu/). Homologies between the assembled contigs of W1 and W2 and all Wolbachia genomes available in the NCBI database were assessed using BLASTn searches. Contigs showing homology to known Wolbachia genomes were designated as candidate contigs of Wolbachia strains in D. biauraria. The raw data of W1 and W2 were mapped to Wolbachia-like contigs using minimap2 v2.17-r941 [27], and the mapped reads were extracted using SAMtools v1.9 [28] and assembled using Flye 2.3 [26]. The circularity of the Wolbachia wBiau1 and wBiau2 genomes was confirmed using Bandage v0.8.1 [29]. Circular Wolbachia genomes were polished against Illumina data using minimap2 [20] and Pilon v1.23 [30].

The polished closed genomes of the wBiau1 and wBiau2 strains were annotated via the DFAST web server [31]. Prophage regions were annotated using the PHASTER web server [32]. Wolbachia genes wmk [33], cifs (cifA and cifB) [34-35], and oscar [21, 36] were used to identify homologues in the wBiau1 and wBiau2 genomes using local BLASTn and BLASTp searches (default parameters). Motifs in the wmk, cifA, cifB, and oscar gene homologues were surveyed using InterPro (https://www.ebi.ac.uk/interpro/) and HHpred (https://toolkit.tuebingen.mpg.de/tools/hhpred). Phylogenetic trees of Wolbachia wsp and MLST genes were constructed based on maximum likelihood with bootstrap re-sampling of 1,000 replicates using MEGA7 [37].

Phylogenetic analysis of mitochondrial CO1
The mitochondrial CO1 of D. biauraria lines was amplified using HCO and LCO primer sets targeting the CO1 gene [38]. Amplicons were purified with the Wizard® SV Gel and PCR Clean-Up System (Promega) and sequenced using BigDye Terminator v3.1 (Applied Biosystems) under the following conditions: 96°C for 1 min, followed by 25 cycles of 96°C for 10 s, 50°C for 5 s, and 60°C for 4 min. A phylogenetic tree of CO1 was constructed based on maximum likelihood with bootstrap re-sampling of 1,000 replicates using MEGA7 [37].
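The PCR-based sex call described in the Methods above (actin-5C as an extraction control, the Y-linked marker as the male indicator) can be written as a small decision rule. The following is a minimal sketch only; the function name `classify_sex` and the boolean inputs are hypothetical and are not part of the authors' materials.

```python
from typing import Optional


def classify_sex(actin_positive: bool, y_marker_positive: bool) -> Optional[str]:
    """Score one embryo or neonate from its two PCR results.

    Hypothetical helper: samples without an actin-5C band are treated as
    failed extractions and left unscored (None), since the Methods only
    classify actin-positive samples.
    """
    if not actin_positive:
        return None  # extraction control failed; sample is not scored
    return "male" if y_marker_positive else "female"


# Illustrative (made-up) PCR results as (actin, Y marker) pairs.
samples = [(True, True), (True, False), (False, False)]
calls = [classify_sex(actin, y) for actin, y in samples]
print(calls)
```

Returning `None` for actin-negative samples keeps failed extractions from being miscounted as females, which would otherwise inflate an apparent female bias.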
Galaxy brand profile in the UK 2022
statista.com
Updated Apr 3, 2025
Cite
Statista (2025). Galaxy brand profile in the UK 2022 [Dataset]. https://www.statista.com/forecasts/1352680/galaxy-chocolate-brand-profile-in-the-uk
Dataset updated
Apr 3, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
May 12, 2022 - Oct 17, 2022
Area covered
United Kingdom
Description
How high is the brand awareness of Galaxy in the UK?
Among chocolate eaters in the UK, brand awareness of Galaxy is 96%. The survey used aided brand recognition, showing respondents both the brand's logo and the written brand name.

How popular is Galaxy in the UK?
In total, 70% of UK chocolate eaters say they like Galaxy. Among the 96% of UK respondents who know Galaxy, however, 73% like the brand.

What is the usage share of Galaxy in the UK?
All in all, 64% of chocolate eaters in the UK use Galaxy. That means that, of the 96% who know the brand, 67% use it.

How loyal are the customers of Galaxy?
Around 57% of chocolate eaters in the UK say they are likely to use Galaxy again. Set in relation to the brand's 64% usage share, this means that 89% of its customers show loyalty to the brand.

What's the buzz around Galaxy in the UK?
In October 2022, about 26% of UK chocolate eaters had heard about Galaxy in the media, on social media, or in advertising over the past three months. Of the 96% who know the brand, that is 27%, meaning there was some buzz around Galaxy in the UK at the time of the survey.

If you want to compare brands, do deep-dives by survey items of your choice, filter by total online population or users of a certain brand, or drill down on your very own hand-tailored target groups, our Consumer Insights Brand KPI survey has you covered.
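The second figure in each answer above is the first figure re-expressed against a narrower base (brand-aware respondents, or current users for loyalty). A minimal sketch of that arithmetic follows, assuming simple rounding to whole percentages; the helper name `share_of_base` is hypothetical and Statista's exact rounding method is not stated.

```python
def share_of_base(value_pct: float, base_pct: float) -> int:
    """Re-express a share of all chocolate eaters as a share of a subgroup.

    Both arguments are percentages of the same total population; the
    result is rounded to a whole percentage (an assumption).
    """
    return round(100 * value_pct / base_pct)


awareness, popularity, usage, loyalty, buzz = 96, 70, 64, 57, 26

print(share_of_base(popularity, awareness))  # liking among those aware
print(share_of_base(usage, awareness))       # usage among those aware
print(share_of_base(loyalty, usage))         # loyalty among users
print(share_of_base(buzz, awareness))        # buzz among those aware
```

These calls reproduce the 73%, 67%, 89%, and 27% figures quoted in the description.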



Training data for de novo transcriptome reconstruction from RNA-seq data

The Galaxy platform for accessible, reproducible and collaborative...

GTN_PAR-CLIP_workflow

Additional file 5 of LotuS2: an ultrafast and highly accurate tool for...

Training material for the SIGU course "Data analysis and interpretation for...

Training material for the SIGU course "Data analysis and interpretation for...

EOSC4Cancer Longitudinal Synthetic Colorectal Cancer Genomic data developed...

Data from: Genetic Characteristics and Phylogenetic Relationships of 18...

Data from: NGS data related to Albrecht et al.: Locus specific and stable...

Data from: Aequatus: An open-source homology browser

Restart dataset for a single location in Norway ALP1 (61.0243N,8.12343E) for...

Sentiment analysis in Galaxy with IMDB movie review dataset

raw_mapped_bam_from_Galaxy

Data from: NGS data related to Rajaram et al.: Allele specific DNA...

Data from: Two male-killing Wolbachia from Drosophila birauraia that are...

Galaxy brand profile in the UK 2022
