Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datasets produced during the validation of CWL-based pipelines designed for the analysis of data from RNA-Seq, ChIP-Seq and germline variant calling experiments. Specifically, the workflows were tested using publicly available high-throughput sequencing (HTS) data from published studies on Chronic Lymphocytic Leukemia (CLL) (accession numbers: E-MTAB-6962, GSE115772) and Genome in a Bottle (GIAB) project samples (accession numbers: SRR6794144, SRR22476789, SRR22476790, SRR22476791).
The supporting data include:
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data provided here are part of a Galaxy tutorial that analyzes ChIP-seq data from a study published by Wu et al., 2014 (DOI:10.1101/gr.164830.113). The goal of this study was to investigate "the dynamics of occupancy and the role in gene regulation of the transcription factor Tal1, a critical regulator of hematopoiesis, at multiple stages of hematopoietic differentiation." To this end, ChIP-seq experiments were performed in multiple mouse cell types including a G1E cell line and megakaryocytes, the two cell types represented here. The dataset contains biological replicate Tal1 ChIP-seq and input control experiments (*.fastqsanger files). Because of the long processing time for the large original files, we have downsampled the original raw data files to include only reads that align to chromosome 19 and a subset of interesting genomic loci (ChIPseq_regions_of_interest_v4.bed) pulled from the Wu et al. publication. Also included is a gene annotation file (RefSeq_gene_annotations_mm10.bed) with gene names added for viewing in a genome browser.
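The idea of restricting data to chromosome 19 can be illustrated on an interval file. The following is a minimal sketch, not the tutorial authors' procedure: the BED records are made-up toy coordinates (they are not taken from ChIPseq_regions_of_interest_v4.bed), and the variable names are hypothetical.

```shell
# Toy BED intervals (made-up coordinates, for illustration only).
bed=$(mktemp)
printf 'chr19\t100\t200\tregionA\n' >  "$bed"
printf 'chr1\t500\t900\tregionB\n'  >> "$bed"
printf 'chr19\t3000\t3500\tregionC\n' >> "$bed"
printf 'chrX\t10\t20\tregionD\n'    >> "$bed"

# Keep only records whose first column is exactly "chr19".
# An exact comparison avoids confusing "chr1" with "chr19".
chr19_only=$(awk -F'\t' '$1 == "chr19"' "$bed")
echo "$chr19_only"

# Count the retained intervals.
n_kept=$(printf '%s\n' "$chr19_only" | wc -l | tr -d ' ')
echo "intervals kept: $n_kept"
```

Filtering aligned reads rather than intervals would normally be done with an alignment-aware tool; the exact-match comparison on the chromosome column is the part this sketch is meant to show.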
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Test files for running snakePipes workflows
snakePipes are pipelines built using Snakemake and Python for the analysis of epigenomic datasets. Please refer to this link for further information on snakePipes.
This folder contains test files that can be used to run the ChIP-seq workflow under snakePipes. To test the workflow, follow these steps:
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 2 Results_of_intePareto. Full list of the results of integrative analysis using intePareto.
Information from GEO states the sample type, source name, organism, characteristics, extracted molecule (genomic DNA), extraction protocol, library construction protocol, library strategy, library source, library selection, instrument model, description, and data processing. The design description is Human Chromatin IP REMC Sequencing on Illumina.
Data were also deposited in the Baylor College of Medicine's Genboree platform.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
We report a computational approach for investigation of chromatin state plasticity. We applied this approach to investigate an ENCODE ChIP-seq dataset profiling the genome-wide distribution of H3K27me3 in 19 human cell lines. We found that high plasticity regions (HPRs) can be divided into two functionally and mechanistically distinct groups, consisting of CpG island proximal and distal regions. We identified cell-type specific regulators correlating with H3K27me3 patterns at distal HPRs in ENCODE cell lines. Furthermore, we applied this approach to investigate mechanisms for poised enhancer establishment in primary human erythroid precursors. We predicted and validated a previously unrecognized role of TAL1 in modulating H3K27me3 patterns through interaction with additional cofactors, such as GFI1B. Our integrative approach provides mechanistic insights into chromatin state plasticity and is broadly applicable to other epigenetic marks.
modENCODE_submission_5166 This submission comes from a modENCODE project of Jason Lieb. For full list of modENCODE projects, see Project Goal: The focus of our analysis will be elements that specify nucleosome positioning and occupancy, control domains of gene expression, induce repression of the X chromosome, guide mitotic segregation and genome duplication, govern homolog pairing and recombination during meiosis, and organize chromosome positioning within the nucleus. Our 126 strategically selected targets include key histone modifications and histone variants. We will integrate information generated with existing knowledge on the biology of the targets and perform ChIP-seq analysis on mutant and RNAi extracts lacking selected target proteins. For data usage terms and conditions, please refer to and EXPERIMENT TYPE: CHIP-seq. BIOLOGICAL SOURCE: Strain: N2; Developmental Stage: Early Embryo; Genotype: wild type; Sex: mixed Male and Hermaphrodite population; EXPERIMENTAL FACTORS: Developmental Stage Early Embryo; temp (temperature) 20 degree celsius; Strain N2; Antibody WA305-34819 H3K4me3 (target is H3K4me3)
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sheet1 ToolsComparison: ChIP-seq pipeline software comparison. Sheet2 Examples of ChiLin report. A summary of example data annotation of transcription factor, chromatin regulatory factor and histone modification ChIP-seq data. Sheet3 Protein classification standard for the 8 categories. Sheet4 Protein classification results. Sheet5 BWA QC Database. ChiLin samples and datasets quality metrics across three layers. A cleaned-up table of cistrome samples and datasets quality metrics for ChiLin users’ reference. The QC results are based on the hg38 and mm10 reference assemblies. (XLSX 10363 kb)
E12.5 wild-type embryos were dissected to collect ventral midbrain regions. Samples were crosslinked for 10 min with 1% formaldehyde and processed for chromatin extraction and chromatin immunoprecipitation (ChIP) following the Millipore Upstate protocol. Libraries were prepared using the Illumina ChIP-Seq DNA Sample Prep Kit. ChIP-Seq libraries were sequenced on the Illumina GAIIx.
To identify how MnTE-2-PyP affects p300 association with chromatin genome-wide, we performed a p300 chromatin immunoprecipitation assay followed by next-generation sequencing on PC3 cells treated with or without MnTE-2-PyP one hour post-irradiation (Figure 3A). Based on the called peaks near genes, we predicted that the HIF-1β and CREB transcription factors were associating with DNA less in the presence of MnTE-2-PyP. DNA was ChIP-Fixed from PC3 cells treated with 20 Gy radiation and with and without T2E drug. There are 2 biological replicates of PC3 untreated cells and 3 biological replicates of PC3 cells treated with MnTE-2-PyP. There are two corresponding input samples for the biological replicates.
Genomic locations of V5-tagged budding yeast Rif1 (including wild-type and designer mutants) were analysed by ChIP-Seq. Mutants tested were rif1-7A and rif1-7E, in which Ser/Thr residues in the cluster of SQ/TQ sites were mutated to Ala or Glu, respectively. We also tested rif1-Δ594, in which the C-terminal 594 amino acids were deleted.
This study aims to investigate whether the passage of human chromosome 21 through the mouse male germline results in changes in the transcriptional deployment of the exogenous chromosome in the offspring generation. We used the Tc1 mouse model, which stably carries almost an entire copy of human chromosome 21, and profiled the genome-wide pattern of H3K4me3, H3K27ac, CEBPA, HNF4A and RNA polymerase II in liver tissue of male- and female-germline-derived Tc1 mice using ChIP-Seq. Furthermore, the genome-wide pattern of H3K4me3 was profiled in additional tissues including kidney, liver and brain.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datasets for Galaxy Training on ChIP-seq analysis. Raw files can be downloaded from SRA project SRP051214.
The experiment contains ChIP-seq data for Vibrio cholerae strain E7946. The strain was grown at 37 °C in LB medium and crosslinked with 1% (v/v) formaldehyde. After sonication to break open cells and fragment DNA, immunoprecipitations were done using anti-FLAG antibodies. Libraries were prepared using DNA remaining after immunoprecipitation.
To identify direct transcriptional targets of RFX6, we performed chromatin immunoprecipitation of HA epitope tagged RFX6 followed by massively parallel DNA sequencing (ChIP-seq). Using CRISPR/Cas9 gene editing, the HA epitope was inserted into the 3' end of the RFX6 gene in H9 hESC. Pluripotent cells were then differentiated into PDX1+RFX6+ pancreatic progenitors and endogenous RFX6-HA was immunoprecipitated with an anti-HA antibody. To eliminate background signal caused by non-specific antibody binding, a control experiment using wild-type H9 hESC was performed in parallel.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This workflow adapts the approach and parameter settings of Trans-Omics for Precision Medicine (TOPMed). The RNA-seq pipeline originated from the Broad Institute. There are five steps in total in the workflow, starting from:
For testing and analysis, the workflow authors provided example data created by down-sampling the read files of TOPMed public-access data. Chromosome 12 was extracted from the Homo sapiens assembly 38 reference sequence and provided by the workflow authors. The required GTF and RSEM reference data files are also provided. The workflow is well documented, and a detailed set of instructions for the steps performed to down-sample the data is also provided for transparency. The availability of example input data, the use of containerization for the underlying software, and detailed documentation are important factors in choosing this specific CWL workflow for CWLProv evaluation.
This dataset folder is a CWLProv Research Object that captures the Common Workflow Language execution provenance, see https://w3id.org/cwl/prov/0.5.0 or use https://pypi.org/project/cwl
Steps to reproduce
To build the research object again, use Python 3 on macOS. Built with:
Install cwltool
pip3 install cwltool==1.0.20180912090223
Install Git LFS
Downloading the data with the git repository requires Git LFS to be installed:
https://www.atlassian.com/git/tutorials/git-lfs#installing-git-lfs
Get the data and make the analysis environment ready:
git clone https://github.com/FarahZKhan/cwl_workflows.git
cd cwl_workflows/
git checkout CWLProvTesting
./topmed-workflows/TOPMed_RNAseq_pipeline/input-examples/download_examples.sh
Run the following commands to create the CWLProv Research Object:
cwltool --provenance rnaseqwf_0.5.0_mac --tmp-outdir-prefix=/CWLProv_workflow_testing/intermediate_temp/temp --tmpdir-prefix=/CWLProv_workflow_testing/intermediate_temp/temp topmed-workflows/TOPMed_RNAseq_pipeline/rnaseq_pipeline_fastq.cwl topmed-workflows/TOPMed_RNAseq_pipeline/input-examples/Dockstore.json
zip -r rnaseqwf_0.5.0_mac.zip rnaseqwf_0.5.0_mac
sha256sum rnaseqwf_0.5.0_mac.zip > rnaseqwf_0.5.0_mac.zip.sha256
The https://github.com/FarahZKhan/cwl_workflows repository is a frozen snapshot from https://github.com/heliumdatacommons/TOPMed_RNAseq_CWL commit 027e8af41b906173aafdb791351fb29efc044120
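A checksum file written with sha256sum can later be used to confirm that an archive is unchanged. The following is a minimal sketch of that check, assuming GNU coreutils; `demo_archive.zip` is a hypothetical placeholder file created only so the example is self-contained, not the real Research Object archive.

```shell
# Create a scratch directory with a stand-in "archive" (hypothetical file).
workdir=$(mktemp -d)
printf 'placeholder contents\n' > "$workdir/demo_archive.zip"

(
  cd "$workdir" || exit 1
  # Record the checksum in "<hash>  <filename>" format.
  sha256sum demo_archive.zip > demo_archive.zip.sha256
  # Recompute and compare; exits 0 when the file matches the recorded hash.
  sha256sum -c demo_archive.zip.sha256
)
verify_status=$?
echo "verification exit status: $verify_status"
```

If `sha256sum` is not available on a given macOS installation, `shasum -a 256` is a common substitute and accepts the same `-c` option.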
This study describes the epigenetic profiling of H3K4me2 and Pol II in the growth stage of Tetrahymena thermophila. ChIP-Seq analysis of Pol II and H3K4me2 occupancy.
ChIP-seq (chromatin immunoprecipitation followed by sequencing) is commonly used to identify genome-wide protein-DNA interactions. However, ChIP-seq often gives a low yield, which is not ideal for quantitative outcomes. An alternative method to ChIP-seq is ChEC-seq (Chromatin endogenous cleavage with high-throughput sequencing). In this method, the endogenous TF (transcription factor) of interest is fused with MNase (micrococcal nuclease) that non-specifically cleaves DNA near binding sites. Compared to the original ChEC-seq method, the modified version requires far less amplification. Since MACS3 failed to identify peaks in data generated from the modified ChEC-seq method, a new peak finder has been developed specifically for it. There are three functions in the peak_finder/. callpeaks() is used to identify peaks from BAM files. goanalysis() is used to make GO (Gene Ontology) term plots from peaks. bedtomeme() is a wrapper function to perform MEME analysis in R after MEME Suite is inst...
****EXCERPTED FROM BIORXIV PREPRINT; SEE PREPRINT OR PUBLISHED PAPER FOR REFERENCES AND DETAILS****
Yeast strains
All yeast strains were derived from BY4741. A C-terminal micrococcal nuclease fusion was introduced to the protein of interest through transformation and homologous recombination of PCR-amplified DNA. Primers were designed with 50-bp of homology to the 3’ end of the coding sequence of interest. The 3xFLAG-MNase with a KanR marker was amplified from pGZ108 (Zentner et al., 2015) and transformed into BY4741 as previously described. Successful transformation was confirmed by immunoblotting and PCR, followed by sequencing.
Lyophilized DNA oligonucleotides were resuspended in molecular-grade water to a concentration of 100 µM. For ligation, the following pair of oligonucleotides were annealed to produce the Y-adapter: Tn5ME-A (5’-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG-3’) and Y-Adapt-i5 R (5’-CTGTCTCTTATACACATCTTCATAGTAATCATC-3’).
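The two oligonucleotide sequences given above can be checked for the complementarity that annealing relies on. The following is a minimal sketch using only the sequences quoted in the text; the helper name `revcomp` and the choice to compare 19 nt (the length of the Tn5 mosaic end) are illustrative assumptions, not part of the published protocol.

```shell
# Oligonucleotide sequences copied from the text above (written 5'->3').
tn5me_a="TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"
y_adapt_i5_r="CTGTCTCTTATACACATCTTCATAGTAATCATC"

# Hypothetical helper: reverse the sequence, then swap complementary bases.
revcomp() { printf '%s\n' "$1" | rev | tr 'ACGT' 'TGCA'; }

# Last 19 nt of Tn5ME-A and first 19 nt of Y-Adapt-i5 R.
a_tail=$(printf '%s' "$tn5me_a" | tail -c 19)
r_head=$(printf '%s' "$y_adapt_i5_r" | cut -c 1-19)

# If the two ends can base-pair, these two strings are identical.
a_tail_rc=$(revcomp "$a_tail")
echo "Tn5ME-A 3' end, reverse complement: $a_tail_rc"
echo "Y-Adapt-i5 R 5' end:                $r_head"
```

The two printed strings match, so the 3’ end of Tn5ME-A can pair with the 5’ end of Y-Adapt-i5 R, while the remaining 14 nt of each oligonucleotide stay unpaired, which is consistent with a Y-shaped adapter.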
For Tn5 Tagmentation, the following i7 oligonucle...
# DoubleChEC TF binding site finder
ChIP-seq (chromatin immunoprecipitation followed by sequencing) is commonly used to identify genome-wide protein-DNA interactions. However, ChIP-seq often gives a low yield, which is not ideal for quantitative outcomes. An alternative method to ChIP-seq is ChEC-seq (Chromatin endogenous cleavage with high-throughput sequencing). In this method, an endogenous TF (transcription factor) fused to MNase (micrococcal nuclease) cleaves DNA near binding sites. This package is designed to identify high-confidence binding sites from cleavage patterns from ChEC-seq2, a variant form of ChEC-seq.
There are three functions in the peak_finder/. callpeaks() is used to identify peaks from single-end mapped reads input as BAM files. goanalysis() is used to make GO (Gene Ontology) term plots from peaks. bedtomeme() is a wrapper function to perform MEME analysis in R **after [MEME Suite](https://meme-...
modENCODE_submission_5236 This submission comes from a modENCODE project of Jason Lieb. For full list of modENCODE projects, see http://www.genome.gov/26524648 Project Goal: The focus of our analysis will be elements that specify nucleosome positioning and occupancy, control domains of gene expression, induce repression of the X chromosome, guide mitotic segregation and genome duplication, govern homolog pairing and recombination during meiosis, and organize chromosome positioning within the nucleus. Our 126 strategically selected targets include key histone modifications and histone variants. We will integrate information generated with existing knowledge on the biology of the targets and perform ChIP-seq analysis on mutant and RNAi extracts lacking selected target proteins. For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf EXPERIMENT TYPE: CHIP-seq. BIOLOGICAL SOURCE: Strain: fem-2(b245); Developmental Stage: Germline containing young adult; Genotype: fem-2(b245)III; Sex: Hermaphrodite; EXPERIMENTAL FACTORS: Developmental Stage Germline containing young adult; temp (temperature) 20 degree celsius; Strain fem-2(b245); Antibody HIM-3 SDQ4498 (target is HIM-3)
This SuperSeries is composed of the following subset Series:
GSE32141: Expression analysis LPS stimulated THP-1 cells in four paired samples
GSE32324: ChIP-seq analysis LPS stimulated THP-1 cells
Refer to individual Series