Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The data here is a copy of the corresponding SRR records in the NCBI SRA. The duplication serves a dual purpose:
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
Galaxy is an open-source, web-based platform for data-intensive biomedical research. It makes bioinformatics applications accessible to users lacking programming skills, enabling them to easily build analysis workflows for NGS data.
The course "Exome analysis using Galaxy" is aimed at PhD students, biologists, clinicians and researchers who are analysing, or will need to analyse in the near future, high-throughput exome sequencing data. The aim of the course is to familiarise participants with the Galaxy platform and prepare them to work independently, using state-of-the-art tools for the analysis of exome sequencing data.
The course will be delivered using a mixture of lectures and computer-based hands-on practical sessions. Lectures will provide an up-to-date overview of the strategies for the analysis of next-generation exome sequencing experiments, starting from the raw sequence data. Analyses include sequence quality control, alignment to a reference genome, refinement of aligned sequences, variant calling, annotation and interpretation, and tools for visual inspection of results. Participants will apply the knowledge gained during the course to the analysis of real Illumina exome datasets, and implement workflows to reproduce the complete analysis. After the course, participants will be able to create pipelines for their individual analyses.
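As an illustration of the analysis stages listed above (quality control, alignment, variant calling), the sketch below chains one common open-source toolchain outside Galaxy; the tool choice (FastQC, bwa, samtools, bcftools), file names and reference path are placeholder assumptions, not the course's prescribed workflow.

```python
# Illustrative, non-Galaxy sketch of the stages named above: quality control,
# alignment to a reference genome, and variant calling. Tool choice, file
# names and the reference path are placeholders.
import subprocess

def run(cmd):
    """Run one pipeline step through the shell and stop on error."""
    print("+", cmd)
    subprocess.run(cmd, shell=True, check=True)

reference = "reference.fa"                              # placeholder genome
r1, r2 = "sample_R1.fastq.gz", "sample_R2.fastq.gz"     # placeholder exome reads

# 1. Sequence quality control
run(f"fastqc {r1} {r2}")

# 2. Alignment to the reference genome, followed by sorting and indexing
run(f"bwa mem -t 4 {reference} {r1} {r2} | samtools sort -o sample.sorted.bam -")
run("samtools index sample.sorted.bam")

# 3. Variant calling (refinement steps such as duplicate marking are omitted)
run(f"bcftools mpileup -f {reference} sample.sorted.bam | bcftools call -mv -Oz -o sample.vcf.gz")
```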
These are the datasets needed for this course.
U.S. Government Works: https://www.usa.gov/government-works
With NGS technologies, life sciences face a raw data deluge. Classical analysis of such data often begins with an assembly step that requires large amounts of computing resources and can remove or modify parts of the biological information contained in the data. Our approach instead focuses directly on biological questions by considering raw, unassembled NGS data, through a suite of six command-line tools. Dedicated to "whole-genome assembly-free" treatments, the Colib’read tool suite uses optimized algorithms for various analyses of NGS datasets, such as variant calling or read-set comparisons. Based on de Bruijn graphs and Bloom filters, such analyses can be performed in a few hours using small amounts of memory. Applications on real data demonstrate the good accuracy of these tools compared to classical approaches. To facilitate data analysis and tool dissemination, we developed Galaxy tools and Tool Shed repositories. With the Colib’read Galaxy tool suite, we enable a broad range of life scientists to analyze raw NGS data. More importantly, our approach retains as much biological information as possible from the data while using a very low memory footprint.
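As a toy illustration of why Bloom filters keep the memory footprint low, the sketch below stores k-mers from a read in a fixed-size bit array and answers probabilistic membership queries; it is a didactic example only, not the Colib’read implementation.

```python
# Toy Bloom filter over k-mers: constant-size bit array, probabilistic
# membership, no false negatives. A didactic sketch of the data structure
# mentioned above, not the Colib'read implementation.
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1 << 20, n_hashes=3):
        self.size = size_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

def kmers(read, k=31):
    """Yield all k-mers of a read."""
    return (read[i:i + k] for i in range(len(read) - k + 1))

bf = BloomFilter()
for kmer in kmers("ACGTACGTACGTACGTACGTACGTACGTACGTACGT"):
    bf.add(kmer)
print("ACGTACGTACGTACGTACGTACGTACGTACG" in bf)  # True: an added k-mer is always found
```

The trade-off is a small, tunable false-positive rate in exchange for memory that stays constant regardless of how many k-mers are inserted.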
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Visualization for RNA transcript quality control and comparison of per-base quality score Q. The images are taken before (A) and after (B) the quality trimming procedure (which removes reads with Q ≤ 20) to estimate the effect of trimming. The quality score Q is plotted against the read position using the FastQC package in Galaxy (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). The background colour indicates the quality of the read: red for low quality, orange for medium quality, green for good quality. The red line marks the median of the measured values (yellow boxes show the inter-quartile range) and the blue line represents the mean quality. (ZIP 81 kb)
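For readers unfamiliar with where the plotted Q values come from, the snippet below decodes a FASTQ quality string with the standard Phred+33 offset and applies a mean-quality cutoff of Q > 20 in the spirit of the trimming step; it is a minimal illustration, not the FastQC or Galaxy trimming tool itself, and the quality string is invented.

```python
# Minimal illustration of the plotted Q values: decode a FASTQ quality string
# (Phred+33) and keep a read only if its mean quality exceeds 20, mirroring
# the idea of the trimming step described above.

def phred_scores(quality_string, offset=33):
    """Convert a FASTQ quality string into per-base Phred scores."""
    return [ord(c) - offset for c in quality_string]

def passes_quality(quality_string, min_mean_q=20):
    scores = phred_scores(quality_string)
    return sum(scores) / len(scores) > min_mean_q

qual = "IIIIIIIIHHHHHGGGG#####"          # invented example quality string
print(phred_scores(qual)[:5])            # [40, 40, 40, 40, 40]
print(passes_quality(qual))              # True; a read dominated by '#' (Q = 2) bases would fail
```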
The NCBI BLAST suite has become ubiquitous in modern molecular biology and is used for tasks ranging from checking capillary sequencing results of single PCR products to genome annotation and even larger-scale pan-genome analyses. For early adopters of the Galaxy web-based biomedical data analysis platform, integrating BLAST into Galaxy was a natural step for sequence comparison workflows. Here we provide the command-line NCBI BLAST+ tool suite wrapped for use within Galaxy. The integration of the BLAST+ tool suite into Galaxy has the goal of making common BLAST tasks easy and advanced tasks possible. This project is an informal international collaborative effort, and it is deployed and used on Galaxy servers worldwide.
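Outside Galaxy, the wrappers described here invoke the standard BLAST+ binaries directly; the sketch below runs blastn with tabular output. The query file and database name are placeholders for a real, pre-built BLAST database.

```python
# The Galaxy wrappers described above call the standard NCBI BLAST+ binaries;
# this sketch runs one of them (blastn) directly with tabular output.
import subprocess

subprocess.run(
    [
        "blastn",
        "-query", "contigs.fasta",   # placeholder query sequences
        "-db", "reference_db",       # placeholder BLAST database (built with makeblastdb)
        "-outfmt", "6",              # tabular output
        "-evalue", "1e-10",
        "-out", "hits.tsv",
    ],
    check=True,
)

# Each line of hits.tsv holds: query id, subject id, % identity, alignment length,
# mismatches, gap opens, query start/end, subject start/end, e-value, bit score.
```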
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This repository contains datasets required for the online training "Data analysis and interpretation for clinical genomics" available at https://sigu-training.github.io/clinical_genomics/.
Tools used in the training are available at the European Galaxy instance running at https://usegalaxy.eu, which also includes a copy of this repository in the Shared Data Libraries. BAM files in this dataset are based on the hg38 reference genome.
This is part of a 4-dataset submission. Refer to this dataset for details.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The data provided here are part of a Galaxy Training Network tutorial that analyzes RAD-seq data from a study published by Hohenlohe et al., 2010 (DOI:10.1371/journal.pgen.1000862) to identify and type single nucleotide polymorphisms (SNPs) in each of 100 individuals from two oceanic and three freshwater populations, and thus estimate genetic diversity and differentiation among populations.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
ReadMe. This file gives instructions concerning the prerequisites and the installation of sRNAPipe. (TXT 3 kb)
Bioinformaticians routinely use multiple software tools and data sources in their day-to-day work and have been guided in their choices by a number of cataloguing initiatives. The ELIXIR Tools and Data Services Registry (bio.tools) aims to provide a central information point, independent of any specific scientific scope within bioinformatics or technological implementation. Meanwhile, efforts to integrate bioinformatics software in workbench and workflow environments have accelerated to enable the design, automation, and reproducibility of bioinformatics experiments. One such popular environment is the Galaxy framework, with currently more than 80 publicly available Galaxy servers around the world. In the context of a generic registry for bioinformatics software, such as bio.tools, Galaxy instances constitute a major source of valuable content. Yet there has been, to date, no convenient mechanism to register such services en masse. Findings: We present ReGaTE (Registration of Galaxy Tools in Elixir), a software utility that automates the process of registering the services available in a Galaxy instance. This utility uses the BioBlend application programming interface to extract service metadata from a Galaxy server, enhance the metadata with the scientific information required by bio.tools, and push it to the registry. Conclusions: ReGaTE provides a fast and convenient way to publish Galaxy services in bio.tools. By doing so, service providers may increase the visibility of their services while enriching the software discovery function that bio.tools provides for its users. The source code of ReGaTE is freely available on GitHub at https://github.com/C3BI-pasteur-fr/ReGaTE.
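For a sense of the BioBlend-based extraction step, the sketch below lists tool metadata from a Galaxy server via BioBlend's GalaxyInstance; the server URL and API key are placeholders, and the enrichment and mapping onto the bio.tools schema that ReGaTE performs are left out.

```python
# Sketch of the kind of metadata extraction ReGaTE automates, using the
# BioBlend API mentioned above. Server URL and API key are placeholders.
from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url="https://usegalaxy.example.org", key="YOUR_API_KEY")

for tool in gi.tools.get_tools():
    # Each entry is a dictionary of tool metadata exposed by the Galaxy API.
    print(tool.get("id"), tool.get("name"), tool.get("version"))
```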
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
To date, genome assemblies of non-model organisms are usually not at chromosomal level and are highly fragmented. This fragmentation is recognized to be, in part, the result of poor assembly of transposable element (TE) copies, increasing the difficulty of detecting and annotating them. In this context, we designed a new bioinformatics pipeline named PiRATE to detect, classify and annotate TEs of non-model organisms. PiRATE combines multiple analysis packages representing all the major approaches for TE detection. The goal is to promote the detection of complete TE sequences of every TE family. The detection of complete TE sequences, bearing recognizable conserved domains or specific motifs, facilitates the classification step. The classification step of PiRATE has been optimized for algal genomes. Each tool used by PiRATE is automated into a stand-alone Galaxy instance. This PiRATE-Galaxy can be used through a virtual machine, which can be downloaded below. PiRATE-Galaxy is a suitable and flexible platform to study TEs in the genome of any organism. You can find a tutorial below. Please contact us if you have any issues or comments: berthelier.j [at] laposte.net or gregory.carrier [at] ifremer.fr, or you can leave a message on GitHub: https://github.com/jberthelier/pirate/issues
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The table summarizes the report generated by Metavisitor from a batch of 40 sequence datasets (S14 File). Metadata associated with each sequence dataset are indicated, as well as the ability of Metavisitor to detect HIV in the datasets and patients.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
Proteogenomics combines large-scale genomic and transcriptomic data with mass-spectrometry-based proteomic data to discover novel protein sequence variants and improve genome annotation. In contrast with conventional proteomic applications, proteogenomic analysis requires a number of additional data processing steps. Ideally, these required steps would be integrated and automated via a single software platform offering accessibility for wet-bench researchers as well as flexibility for user-specific customization and integration of new software tools as they emerge. Toward this end, we have extended the Galaxy bioinformatics framework to facilitate proteogenomic analysis. Using analysis of whole human saliva as an example, we demonstrate Galaxy’s flexibility through the creation of a modular workflow incorporating both established and customized software tools that improve depth and quality of proteogenomic results. Our customized Galaxy-based software includes automated, batch-mode BLASTP searching and a Peptide Sequence Match Evaluator tool, both useful for evaluating the veracity of putative novel peptide identifications. Our complex workflow (approximately 140 steps) can be easily shared using built-in Galaxy functions, enabling its use and customization by others. Our results provide a blueprint for the establishment of the Galaxy framework as an ideal solution for the emerging field of proteogenomics.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
In 2018-2019, we organized, on behalf of the Italian Society of Human Genetics (SIGU), an itinerant Galaxy-based “hands-on-computer” training activity entitled “Data analysis and interpretation for clinical genomics”. This one-day course was offered to participants including clinical doctors, biologists, laboratory technicians and bioinformaticians. Topics covered by the course were NGS data quality check, detection of variants, copy number alterations and runs of homozygosity, annotation and filtering, and clinical interpretation of sequencing results.
To meet the constant need for training on basic NGS analysis and interpretation of sequencing data in the clinical setting, we designed an online Galaxy-based training resource dedicated to this topic. It is articulated in presentations and practical assignments through which students learn how to approach NGS data processing at the level of FASTQ, BAM and VCF files, and how to carry out clinically oriented examination of variants emerging from sequencing experiments such as whole exomes.
This repository contains datasets required for the online training "Data analysis and interpretation for clinical genomics" available at https://sigu-training.github.io/clinical_genomics/.
Tools used in the training are available at the European Galaxy instance running at https://usegalaxy.eu, which also includes a copy of this repository in the Shared Data Libraries. Files named Fam_*.bam are based on the hg38 reference genome; all other files refer to hg19.
This is part of a 4-dataset submission.
This record includes training materials associated with the Australian BioCommons webinar ‘Here’s one we prepared earlier: (re)creating bioinformatics methods and workflows with Galaxy Australia’. This webinar took place on 26 October 2022.

Event description: Have you discovered a brilliant bioinformatics workflow but you’re not quite sure how to use it? In this webinar we will introduce the power of Galaxy for construction and (re)use of reproducible workflows, whether building workflows from scratch, recreating them from published descriptions and/or extracting them from Galaxy histories. Using an established bioinformatics method, we’ll show you how to:
- Use the workflow creator in Galaxy Australia
- Build a workflow based on a published method
- Annotate workflows so that you (and others) can understand them
- Make workflows findable and citable (important and very easy to do!)

Materials are shared under a Creative Commons Attribution 4.0 International agreement unless otherwise specified and were current at the time of the event.

Files and materials included in this record:
- Event metadata (PDF): Information about the event including description, event URL, learning objectives, prerequisites, technical requirements etc.
- Index of training materials (PDF): List and description of all materials associated with this event, including the name, format, location and a brief description of each file.
- GalaxyWorkflows_Slides (PDF): A PDF copy of the slides presented during the webinar.

Materials shared elsewhere: A recording of this webinar is available on the Australian BioCommons YouTube Channel: https://youtu.be/IMkl6p7hkho
Computational Genomics

Instructor: Dr. Rodolfo Aramayo, PhD
Email address: raramayo@tamu.edu
Location: Department of Biology, Room 412A, Biological Sciences Building West (BSBW), Texas A&M University, College Station, TX 77843-3258

Description: This repository contains materials used to teach Computational Genomics in Spring 2023. This course was heavily based on materials extracted from and/or adapted from:
- ENSEMBL, and ENSEMBL Tutorials and Examples
- Genomes, 2nd edition
- Current Topics in Genome Analysis
- Galaxy Training Materials

Course Topics:
- History of Bioinformatics
- History of Genomics
- Cloning Basics
- The Carbon Clarke Formula
- Introduction to Galaxy
- Genome Files: FASTA Format
- Uploading Data into Galaxy
- Introduction to Text Manipulations
- Introduction to Regular Expressions
- Introduction to Gene Models and Tables: GFF3 Files
- Introduction to Genome Annotation
- Cyverse User Portal
- Introduction to Genome Browsers (ENSEMBL)
- Introduction to Comparative Genomics
- Working with Genome Files
- Introduction to Sequence Analysis
- Computational Arithmetics

Author: Rodolfo Aramayo (raramayo@tamu.edu)
License: All content produced in this site is licensed under CC BY-NC-SA 4.0
Method overview

To achieve targeted locus- and allele-specific DNA demethylation, HEK293 cells were transfected with two plasmids. One plasmid contains dCas9 fused to a SunTag with five repeats of the GCN4 peptide, separated by 22-aa-long linkers, and scFv-fused TET1CD, as well as a GFP reporter protein. The other plasmid is a multiguide plasmid with 4 individual sgRNAs, each flanked by a U6 promoter and gRNA scaffold, and a DsRed fluorophore. Control experiments were conducted with a scrambled sgRNA that does not have a binding site in the human genome. Initial studies showed that cells positive for both plasmids exhibited detectable fluorescence of the corresponding reporter proteins on day 3 post-transfection; hence, FACS sorting was conducted at this time point. Part of the sorted cells was used immediately for downstream analysis, and the other part was re-seeded for harvesting at later time points. For DNA methylation analysis, genomic DNA was isolated from the cell samples and subjected to bisulfite treatment. Library preparation was performed using the bisulfite-converted samples, followed by NGS and data analysis. All methylation experiments were conducted in three independent biological replicates. For measurement of the genomic allele frequencies, genomic DNA of the untreated samples was used for the amplification of the region around the target SNP and of an exonic region with an additional SNP for each target, followed by library preparation, NGS and data analysis. To monitor the variation in the expression of the target genes, RNA was isolated from the treated cells on day 6. cDNA synthesized from the isolated RNA was used for the library preparation of the exonic region. The library was subjected to NGS followed by data analysis. All experiments were conducted in three independent biological replicates.

Method details

The gDNA of transfected HEK293 cells sorted by FACS was extracted using the QIAamp DNA Mini Kit (Qiagen). 500 ng of genomic DNA was subjected to overnight digestion with EcoRV, which does not cut within any of the target amplicons. The Zymo EZ DNA Methylation-Lightning Kit (D5030-E) was used for bisulfite conversion. The library for NGS was prepared by two consecutive PCR reactions (Leitao et al., 2018). First, bisulfite-converted genomic DNA of each sample was amplified with target-gene-specific primers. An amount of first-PCR product optimized for each gene was used as a template for the second PCR to add the Illumina TruSeq sequencing adapters. Final products were quantified, pooled in equimolar amounts and purified using SPRIselect beads (Beckman Coulter). Ready-to-use pools of libraries were sequenced on a NovaSeq 6000 using a PE250 flow cell (Novogene). For expression analysis, RNA was isolated from the sorted cells using the Qiagen RNeasy extraction kit (Cat. No. 74034). Residual genomic DNA was removed from the samples by an additional treatment with the TURBO DNA-free™ Kit (Ambion #AM1907). 500 ng of the DNase-free RNA was used for cDNA synthesis with the Applied Biosystems High-Capacity cDNA Reverse Transcription Kit (Cat. No. 4368814). An NRT reaction, conducted without addition of the reverse transcriptase enzyme, was used as a negative control for cDNA synthesis. In addition, NTC (no template control) reactions were included. The transcripts were subjected to library preparation in a two-step PCR process as mentioned above. For amplification of the genomic regions, 10 ng of the isolated genomic DNA was used.
Two-step library preparation was carried out for NGS of genomic regions. All NGS data were obtained in the form of FASTQ files.

Data analysis

NGS data in FASTQ format were analyzed as described (Rajaram et al., 2023) on the Galaxy platform (https://usegalaxy.org/) (The Galaxy platform for accessible, reproducible and collaborative biomedical analyses, 2022), where all the following tools are available. First, Illumina adapter sequences were removed using Trim Galore!. Afterwards, the paired-end reads were merged using Pear and reads with low quality were removed with Filter FASTQ. All NGS data files were subjected to this processing. For quantitative analysis of the methylation at individual CpG sites, the following steps were carried out. De-multiplexing of individual samples tagged with combinations of barcodes and Illumina indices was done by converting the FASTQ files using FASTQ to Tabular, followed by selection of lines with the tool Select and re-conversion of the files to FASTQ format with Tabular to FASTQ. For the alignment of reads to a reference sequence, bwameth was used, and the DNA methylation at each CpG site was analyzed by applying the tool MethylDackel. The output files were processed using Microsoft Excel. For the analysis of the allelic ratios of the transcript and genomic region, de-multiplexing of individual samples tagged with combinations of barcodes and Illumina indices was done by converting the FASTQ files using FASTQ to Tabular, followed by selection of lines with the tool Select. Input for the selection of lines was provided in accordance with the SNP of interest. The output of the tool Select provides the number of reads corresponding to each allele.
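As a hedged illustration of the final allele-counting step, the sketch below scans a merged FASTQ and tallies reads containing one of two allele-specific motifs around a SNP; the motif sequences and file name are invented placeholders, and this is not the Galaxy Select tool itself.

```python
# Hedged sketch of the allele-counting idea described above: reads in a merged
# FASTQ are matched against two short allele-specific sequences around the SNP
# of interest. Motifs and the file name are placeholders.

def count_alleles(fastq_path, allele_a, allele_b):
    counts = {"allele_A": 0, "allele_B": 0, "unassigned": 0}
    with open(fastq_path) as fq:
        for i, line in enumerate(fq):
            if i % 4 != 1:              # sequence lines are every 4th line, offset 1
                continue
            read = line.strip()
            if allele_a in read:
                counts["allele_A"] += 1
            elif allele_b in read:
                counts["allele_B"] += 1
            else:
                counts["unassigned"] += 1
    return counts

print(count_alleles("merged_reads.fastq", "ACCTGAT", "ACCTGGT"))  # placeholder motifs
```

The ratio of the two allele counts then gives the allelic ratio reported above.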
Methylation experiments:
For the competitive nucleosome methylation experiments, 0.6 pmol of each nucleosome variant were digested with MluI (NEB) for 60 min at 37°C in 10 µL NEB Cutsmart buffer (50 mM KOAc/20 mM Tris-acetate pH 7.9, 10 mM Magnesium Acetate, 100 µg/mL BSA) to remove residual unbound DNA. Afterwards, DNMT3A2 or DNMT3AC was added to the mixture to a final concentration ranging from 0.5 µM to 3 µM in 80 µL NEB Cutsmart buffer supplemented with 10 mM EDTA and 25 µM AdoMet (Perkin Elmer). The methylation reaction was allowed to proceed for 2 h at 37°C. To stop the reaction and remove all nucleosome-bound proteins, proteinase K was added to the reaction and the sample was incubated for a further 60 min at 37°C. The resulting unbound DNA was purified from the reaction mixture using the Nucleospin Gel and PCR cleanup kit (Macherey-Nagel). Bisulfite conversion of the methylated DNA was performed using the EZ DNA Methylation-Lightning kit (Zymo Research). Methylation of free DNA was conducted the same way using 15 µM DNA.
Library preparation and sequencing analysis:
Sample-specific barcodes and indices were added to the DNA by PCR amplification in a two-step PCR process. Briefly, in the first PCR, barcoded primers were used to amplify the bisulfite-converted nucleosome DNA using the HotStartTaq Polymerase (Qiagen) and the resulting 321 bp fragment was purified using the Nucleospin Gel and PCR cleanup kit (Macherey-Nagel). In the second PCR step, adaptors and indices required for sequencing were added by amplification with the respective primers and the Phusion polymerase (ThermoFisher). The final 390-bp product was purified and used for Illumina paired-end 2x250 bp sequencing. Datasets were analyzed using a local instance of the Galaxy bioinformatics server. Sequence reads were trimmed with the Trim Galore! tool (developed by Felix Krueger at the Babraham Institute) and subsequently paired using PEAR. The reads were filtered according to the expected DNA length using the Filter FASTQ tool and mapped to the corresponding reference sequence using bwameth to determine the percentage of methylated CpGs.
The naming of the files is described in the Supplemental Table 1 of the accompanying manuscript.
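To make the reported quantity concrete, the toy sketch below computes the percentage of methylated CpGs from bisulfite reads that are assumed to be already aligned to a short reference without indels; in the real analysis this is handled by bwameth mapping as described above, and the sequences here are invented placeholders.

```python
# Toy illustration of the percentage of methylated CpGs: in bisulfite data an
# unconverted 'C' at a CpG position is read as methylated and a converted 'T'
# as unmethylated. Reads are assumed pre-aligned to the reference, no indels.

reference = "TTCGATACGTT"                 # placeholder reference with two CpG sites
reads = [
    "TTCGATATGTT",                        # CpG 1 methylated, CpG 2 unmethylated
    "TTTGATACGTT",                        # CpG 1 unmethylated, CpG 2 methylated
    "TTCGATACGTT",                        # both CpGs methylated
]

cpg_positions = [i for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"]

for pos in cpg_positions:
    calls = [r[pos] for r in reads if r[pos] in "CT"]
    methylated = calls.count("C")
    print(f"CpG at position {pos}: {100 * methylated / len(calls):.0f}% methylated")
```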
Methylation of substrate libraries
Single-stranded DNA oligonucleotides used for the generation of double-stranded substrates with different distances between CpG sites were obtained from IDT. Sixteen single-stranded oligonucleotides were pooled in equimolar amounts and the second-strand synthesis was conducted by a primer extension reaction using one universal primer. The obtained mix of double-stranded DNA oligonucleotides was methylated by the DNMT3A catalytic domain and DNMT3A/3L and incubated for 60 min at 37 °C in the presence of 0.8 mM S-adenosyl-L-methionine (Sigma) in reaction buffer (20 mM HEPES pH 7.5, 1 mM EDTA, 50 mM KCl, 0.05 mg/mL bovine serum albumin). For DNMT3A, concentrations of 0.25 µM, 0.5 µM, 1 µM and 2 µM were used; for DNMT3A/3L, 0.125 µM and 0.25 µM. In addition, a no-enzyme control was processed identically to all other samples. Reactions were stopped by shock freezing in liquid nitrogen, then treated with proteinase K for 2 hours at 42 °C. Afterwards, the DNA was digested with the BsaI-HFv2 enzyme and a hairpin (pGAGAAGGGATGTGGATACACATCCCT) was ligated using T4 DNA ligase (NEB). The DNA was bisulfite converted using the EZ DNA Methylation-Lightning kit (Zymo Research) according to the manufacturer's protocol, purified and eluted with 10 µL ddH2O.
NGS library generation
Libraries for Illumina Next Generation Sequencing (NGS) were produced with the two-step PCR approach. In the first PCR, 2 µL of bisulfite-converted DNA were amplified with the HotStartTaq DNA Polymerase (QIAGEN) and primers containing internal barcodes using the following conditions: 15 min at 95 °C; 10 cycles of 30 sec at 94 °C, 30 sec at 50 °C, and 1 min 30 sec at 72 °C; and a final 5 min at 72 °C, using a mixture containing 1x PCR Buffer, 1x Q-Solution, 0.2 mM dNTPs, 0.05 U/µL HotStartTaq DNA Polymerase, and 0.4 µM forward and 0.4 µM reverse primers in a total volume of 20 µL. In the second PCR, 1 µL of the obtained products was amplified by Phusion Polymerase (Thermo) with another set of primers to introduce the adapters and indices needed for NGS (30 sec at 98 °C; 10 cycles of 10 sec at 98 °C and 40 sec at 72 °C; and 5 min at 72 °C). The second PCR was carried out in 1x Phusion HF Buffer, 0.2 mM dNTPs, 0.02 U/µL Phusion HF DNA Polymerase, and 0.4 µM forward and 0.4 µM reverse primers in a total volume of 20 µL. The obtained libraries were pooled in equimolar amounts, purified and sequenced at the Max Planck Genome Centre Cologne.
Bioinformatic analysis
Bioinformatic analysis of the obtained NGS data was conducted with a local Galaxy server and with home-written scripts. Briefly, FASTQ files were analyzed with FastQC, 3’ ends of reads with a quality lower than 20 were trimmed, and reads containing both full-length sense and antisense strands were selected. Next, the samples were split using the internal barcodes corresponding to the different experimental conditions. Afterwards, the insert DNA sequence was extracted and used for further downstream analysis. The uploaded text files contain the bisulfite-converted sequences with pairs of CpG sites at variable distances, as described in the further documentation (info.pdf).
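As a hedged sketch of the barcode-splitting and insert-extraction step, the code below assigns reads to conditions by a short internal barcode at the read start and keeps the downstream insert; the barcode sequences, barcode length and input file name are placeholder assumptions, not the home-written scripts themselves.

```python
# Hedged sketch of the barcode splitting described above: reads are assigned
# to experimental conditions by an internal barcode at the read start, and
# the sequence after the barcode (the insert) is kept for downstream analysis.
from collections import defaultdict

barcodes = {"ACGT": "condition_1", "TGCA": "condition_2"}   # placeholder barcodes

def split_by_barcode(fastq_path, barcode_len=4):
    inserts = defaultdict(list)
    with open(fastq_path) as fq:
        for i, line in enumerate(fq):
            if i % 4 != 1:                                  # keep only sequence lines
                continue
            read = line.strip()
            sample = barcodes.get(read[:barcode_len])
            if sample is not None:
                inserts[sample].append(read[barcode_len:])  # insert after the barcode
    return inserts

for sample, seqs in split_by_barcode("merged_reads.fastq").items():
    print(sample, len(seqs), "reads")
```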
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
See Method section for a description of the columns.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The data here is a copy of the corresponding SRR records in the NCBI SRA. The duplication serves a dual purpose: