Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data provided here are part of a Galaxy tutorial that analyzes ChIP-seq data from a study published by Wu et al., 2014 (DOI:10.1101/gr.164830.113). The goal of this study was to investigate "the dynamics of occupancy and the role in gene regulation of the transcription factor Tal1, a critical regulator of hematopoiesis, at multiple stages of hematopoietic differentiation." To this end, ChIP-seq experiments were performed in multiple mouse cell types including a G1E cell line and megakaryocytes, the two cell types represented here. The dataset contains biological replicate Tal1 ChIP-seq and input control experiments (*.fastqsanger files). Because of the long processing time for the large original files, we have downsampled the original raw data files to include only reads that align to chromosome 19 and a subset of interesting genomic loci (ChIPseq_regions_of_interest_v4.bed) pulled from the Wu et al. publication. Also included is a gene annotation file (RefSeq_gene_annotations_mm10.bed) with gene names added for viewing in a genome browser.
https://ega-archive.org/dacs/EGAC00001002224
This dataset gathers ChIP-seq data produced in our own laboratory by immunoprecipitating the CTCF factor in the MM.1S cell line under EtOH and Dex conditions. It also gathers ChIP-seq datasets produced by an external laboratory (Active Motif) for the H3K27ac mark and the GR transcription factor in the same cell line and conditions (MM.1S, EtOH/Dex).
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This repository contains datasets necessary for using the Virtual ChIP-seq software.
Virtual ChIP-seq requires the following datasets to predict transcription factor binding:
chipExpDir_AtoH_V1.0.0.tar.gz: Reference matrices of correlation between TF binding and gene expression for TFs starting with letters A-H.
chipExpDir_ItoZ_V1.0.0.tar.gz: Reference matrices of correlation between TF binding and gene expression for TFs starting with letters I-Z.
refTables_V1.1.0.tar.gz: PhastCons genomic conservation, FIMO PWM scores for JASPAR motifs, and ChIP-seq data of ENCODE and Cistrome database.
hg38_chrsize.tsv: Length of chromosomes in hg38
trainedModels_V1.0.0.tar.gz: Virtual ChIP-seq scikit-learn trained models saved in joblib format
.tar.gz: Pre-calculated matrices suitable for training with other algorithms or re-training with Virtual ChIP-seq.
Some predictive features of TF binding are the same in each cell type and are stored together for simplicity in refTables_V1.0.0.tar.gz. You can use datasets from other cell types (named here as .tar.gz) for the purpose of re-training the model. The .tar.gz files contain pre-calculated predictive features of transcription factor binding in 4 chromosomes (5, 10, 15, 20).
These features include:
PhastCons genomic conservation
FIMO score for sequence motifs of TF in the JASPAR database
Chromatin accessibility
TF binding in ENCODE + Cistrome DB datasets
Virtual ChIP-seq expression score
Database for visualizing and making use of public ChIP-seq data. ChIP-Atlas covers almost all public ChIP-seq experiments and data submitted to the SRA (Sequence Read Archive) in NCBI, DDBJ, or ENA.
Target genes of transcription factors from published ChIP-chip, ChIP-seq, and other transcription factor binding site profiling studies
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The number of experiments in which each gene was up- or down-regulated in RNA-seq data, and the average of the ChIP-seq MACS2 values of HIF1A and EPAS1 (HIF2A) in the ChIP-Atlas database. Both were calculated from a public NGS database (SRA). For selecting up- and down-regulated genes, a 2-fold threshold was adopted.
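The counting described above can be sketched as follows. This is a minimal illustration, not the dataset authors' pipeline: the function name, the input layout (one dict per experiment mapping a gene to a pair of expression values), and the choice to treat the 2-fold threshold as inclusive are all assumptions.

```python
from collections import Counter


def count_regulation(experiments, fold=2.0):
    """Count, per gene, the experiments in which it is up- or down-regulated.

    experiments: list of dicts mapping gene -> (condition_value, baseline_value).
    This layout is hypothetical; the source does not describe its input format.
    """
    up, down = Counter(), Counter()
    for experiment in experiments:
        for gene, (condition, baseline) in experiment.items():
            if condition <= 0 or baseline <= 0:
                continue  # a ratio is undefined without positive values
            ratio = condition / baseline
            if ratio >= fold:          # assumed inclusive threshold
                up[gene] += 1
            elif ratio <= 1.0 / fold:
                down[gene] += 1
    return up, down


experiments = [
    {"VEGFA": (8.0, 2.0), "ACTB": (5.0, 5.0)},
    {"VEGFA": (3.0, 2.0), "GENE_X": (1.0, 4.0)},
]
up, down = count_regulation(experiments)
```

With these toy values, VEGFA passes the threshold in only one of the two experiments, so its up-regulation count is 1.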
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data provided here are part of a Galaxy Training Network tutorial that analyzes ChIP-seq data from a study published by Ross-Innes et al., 2012 (DOI:10.1038/nature10730) to identify the binding sites of the estrogen receptor, a transcription factor known to be associated with different types of breast cancer.
Attribution 1.0 (CC BY 1.0) https://creativecommons.org/licenses/by/1.0/
License information was derived automatically
To combat DNA damage, organisms mount a DNA damage response (DDR) that results in cell cycle regulation, DNA repair and, in severe cases, cell death. Underscoring the importance of gene regulation in this response, studies in Arabidopsis have demonstrated that all of the aforementioned processes rely on SUPPRESSOR OF GAMMA RESPONSE 1 (SOG1), a NAC family transcription factor (TF) that has been functionally equated to the mammalian tumor suppressor, p53. However, the expression networks connecting SOG1 to these processes remain largely unknown and, although the DDR spans from minutes to hours, most transcriptomic data correspond to single time-point snapshots. Here, we generated transcriptional models of the DDR from GAMMA (γ)-irradiated wild-type and sog1 seedlings during a 24-hour time course using DREM, the Dynamic Regulatory Events Miner, revealing 11 coexpressed gene groups with distinct biological functions and cis-regulatory features. Within these networks, additional chromatin immunoprecipitation and transcriptomic experiments revealed that SOG1 is the major activator, directly targeting the most strongly up-regulated genes, including TFs, repair factors, and early cell cycle regulators, while three MYB3R TFs are the major repressors, specifically targeting the most strongly down-regulated genes, which mainly correspond to G2/M cell cycle-regulated genes. Together these models reveal the temporal dynamics of the transcriptional events triggered by γ-irradiation and connect these events to TFs and biological processes over a time scale commensurate with key processes coordinated in response to DNA damage, greatly expanding our understanding of the DDR.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Shown are the number of peaks called and the total number of bp covered by each peak set for H3K4me3, H3K36me3, and H3K9me3 using the original Sole-search program or the program which has been modified to identify broad regions covered by modified histones. Also shown is the increase in genome coverage (fold difference) that results when using the modified peak calling program. Both the original and the modified program can be accessed at http://chipseq.genomecenter.ucdavis.edu/cgi-bin/chipseq.cgi.
This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Philip Cayting mailto:pcayting@stanford.edu). If you have questions about the Genome Browser track associated with this data, contact ENCODE (mailto:genome@soe.ucsc.edu). This track shows probable binding sites of the specified transcription factors (TFs) in the given cell types as determined by chromatin immunoprecipitation followed by high throughput sequencing (ChIP-Seq). Included for each cell type is the input signal, which represents the control condition where no antibody targeting was performed. For each experiment (cell type vs. antibody) this track shows a graph of enrichment for TF binding (Signal), along with sites that have the greatest evidence of transcription factor binding (Peaks). For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf Cells were grown according to the approved ENCODE cell culture protocols. Further preparations were similar to those previously published (Euskirchen et al., 2007) with the exceptions that the cells were unstimulated and sodium orthovanadate was omitted from the buffers. For details on the chromatin immunoprecipitation protocol used, see Euskirchen et al. (2007) and Rozowsky et al. (2009). DNA recovered from the precipitated chromatin was sequenced on the Illumina (Solexa) sequencing platform and mapped to the genome using the Eland alignment program. ChIP-seq data was scored based on sequence reads (length ~30 bp) that align uniquely to the human genome.
From the mapped tags a signal map of ChIP DNA fragments (average fragment length ~200 bp) was constructed where the signal height is the number of overlapping fragments at each nucleotide position in the genome. For each 1 Mb segment of each chromosome a peak height threshold was determined by requiring a false discovery rate <= 0.05 when comparing the number of peaks above threshold with the number obtained from multiple simulations of a random null background with the same number of mapped reads (also accounting for the fraction of mappable bases for sequence tags in that 1 Mb segment). The number of mapped tags in a putative binding region is compared to the normalized (normalized by correlating tag counts in genomic 10 kb windows) number of mapped tags in the same region from an input DNA control. Using a binomial test, only regions that have a p-value <= 0.05 are considered to be significantly enriched compared to the input DNA control.
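The binomial comparison against the input control can be illustrated with a short sketch. The description does not give the exact formulation, so this assumes a common one: under the null hypothesis each tag in a region is equally likely to come from the ChIP sample or the normalized control (null probability 0.5), and the test is one-sided. The function name and the rounding of the normalized control count are also assumptions.

```python
from math import comb


def binomial_enrichment_pvalue(chip_tags, control_tags, p_null=0.5):
    """One-sided binomial p-value for ChIP enrichment over a normalized control.

    Assumed null model: each of the n = chip + control tags in the region
    falls in the ChIP sample with probability p_null.
    """
    control = round(control_tags)  # normalized counts may be fractional
    n = chip_tags + control
    # P(X >= chip_tags) for X ~ Binomial(n, p_null)
    return sum(
        comb(n, k) * p_null**k * (1 - p_null) ** (n - k)
        for k in range(chip_tags, n + 1)
    )


enriched = binomial_enrichment_pvalue(10, 0)      # all tags from ChIP
not_enriched = binomial_enrichment_pvalue(5, 5)   # balanced counts
```

Under this model a region with 10 ChIP tags and no control tags passes the p-value <= 0.05 cutoff, while a region with 5 tags in each sample does not.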
https://ega-archive.org/dacs/EGAC00001000135
ChIP-Seq data for 7 Acute myeloid leukemia sample(s). 23 run(s), 23 experiment(s), 23 alignment(s). Part of BLUEPRINT release January 2015. Analysis documentation available at http://ftp.ebi.ac.uk/pub/databases/blueprint/releases/20140811/homo_sapiens/README_chipseq_analysis_ebi_20140811
Chromatin immunoprecipitation and sequencing (ChIP-seq) has been widely used to map DNA-binding proteins, histone proteins and their modifications. ChIP-seq data contains redundant reads termed duplicates, referring to those mapping to the same genomic location and strand. There are two main sources of duplicates: polymerase chain reaction (PCR) duplicates and natural duplicates. Unlike natural duplicates that represent true signals from sequencing of independent DNA templates, PCR duplicates are artifacts originating from sequencing of identical copies amplified from the same DNA template. In analysis, duplicates are removed from peak calling and signal quantification. Nevertheless, a significant portion of the duplicates is believed to represent true signals. Obviously, removing all duplicates will underestimate the signal level in peaks and impact the identification of signal changes across samples. Therefore, an in-depth evaluation of the impact from duplicate removal is needed. Using eight public ChIP-seq datasets from three narrow-peak and two broad-peak marks, we tried to understand the distribution of duplicates in the genome, the extent by which duplicate removal impacts peak calling and signal estimation, and the factors associated with duplicate level in peaks. The three PCR-free histone H3 lysine 4 trimethylation (H3K4me3) ChIP-seq data had about 40% duplicates and 97% of them were within peaks. For the other datasets generated with PCR amplification of ChIP DNA, as expected, the narrow-peak marks have a much higher proportion of duplicates than the broad-peak marks. We found that duplicates are enriched in peaks and largely represent true signals, more conspicuous in those with high confidence. Furthermore, duplicate level in peaks is strongly correlated with the target enrichment level estimated using nonredundant reads, which provides the basis to properly allocate duplicates between noise and signal. 
Our analysis supports the feasibility of retaining the portion of signal duplicates into downstream analysis, thus alleviating the limitation of complete deduplication.
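To make the definition of duplicates concrete (reads mapping to the same genomic location and strand), here is a minimal sketch that computes a duplicate rate and the fraction of duplicates falling inside peaks. It is an illustration only, not the study's method: the function name, the tuple-based read and peak representations, and the half-open peak intervals are assumptions.

```python
from collections import Counter


def duplicate_stats(reads, peaks):
    """Return (duplicate rate, fraction of duplicates inside peaks).

    reads: list of (chrom, position, strand) tuples, one per mapped read.
    peaks: list of (chrom, start, end) half-open intervals.
    Every read beyond the first at a given (chrom, position, strand)
    is counted as a duplicate.
    """
    def in_peak(chrom, pos):
        return any(c == chrom and s <= pos < e for c, s, e in peaks)

    total_dups = 0
    dups_in_peaks = 0
    for (chrom, pos, _strand), n in Counter(reads).items():
        extra = n - 1
        total_dups += extra
        if extra and in_peak(chrom, pos):
            dups_in_peaks += extra

    dup_rate = total_dups / len(reads) if reads else 0.0
    frac_in_peaks = dups_in_peaks / total_dups if total_dups else 0.0
    return dup_rate, frac_in_peaks


reads = (
    [("chr1", 100, "+")] * 3
    + [("chr1", 100, "-")]
    + [("chr1", 500, "+")] * 2
)
dup_rate, frac_in_peaks = duplicate_stats(reads, [("chr1", 50, 150)])
```

Note that the read on the minus strand at position 100 is not a duplicate of the plus-strand reads at the same position, since strand is part of the key.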
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
See "Read Me" document and "Data Dictionary" file for detailed information. ChIP-seq: processed and ready for visualization in a public genome browser (.bigwig).
This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Florencia Pauli mailto:fpauli@hudsonalpha.org). If you have questions about the Genome Browser track associated with this data, contact ENCODE (mailto:genome@soe.ucsc.edu). The ChIP-Seq method was used to assay chromatin fragments bound by specific or general transcription factors as described below. DNA isolated by ChIP-Seq was size-selected (~225 bp) and sequenced. Short reads of 25-36 bp were mapped to the human reference genome, and enriched regions of high read density relative to a total input chromatin control reads were identified. The sequence reads with quality scores (fastq files) and alignment coordinates (BAM files) from these experiments are available for download. For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf Cells were grown according to the approved ENCODE cell culture protocols (http://hgwdev.cse.ucsc.edu/ENCODE/protocols/cell). Cross-linked chromatin was immunoprecipitated with an antibody. The Protein:DNA crosslinks were then reversed and the DNA fragments were recovered and sequenced. Please see protocol notes below and go to http://hudsonalpha.org/myers-lab/protocols for the most current version of the protocol. Biological replicates from each experiment were completed. Libraries were sequenced with an Illumina Genome Analyzer I or an Illumina Genome Analyzer IIx according to the manufacturer's recommendations. Sequence data produced by the Illumina data pipeline software were quality filtered and then mapped to NCBI Build37 (hg19) using the integrated Eland software; 32 nt of the sequence reads were used for alignment; up to two mismatches were tolerated; reads that mapped to multiple sites in the genome were discarded. 
To identify likely binding sites, peak calling was applied to the aligned sequence data sets using Model-based Analysis of ChIP-Seq (MACS) (Zhang Y, et al., 2008) (http://liulab.dfci.harvard.edu/MACS/00README.html). MACS models the shift size of ChIP-Seq tags empirically, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to capture local biases in the genome, allowing for more robust predictions (Zhang Y, et al., 2008). Protocol Notes: Several changes and improvements were made to the original ChIP-Seq protocol (Johnson et al., 2008). The major differences between protocols are the number of cells and magnetic beads used for IP, the method of sonication used to fragment DNA, and the number of cycles of PCR used to amplify the sequencing library. The most current protocol used by the Myers lab can be found at http://hudsonalpha.org/myers-lab/protocols. The protocol field for each file denotes the version of the protocol used as being PCR1x, PCR2x or a version number (for example, v041610.1). The sequencing libraries labeled as PCR2x were made with two rounds of amplification (25 and 15 cycles) and those labeled as PCR1x were made with one 15-cycle round of amplification. These experiments were completed prior to January 2010 and were originally aligned to NCBI Build36 (hg18). They have been re-aligned to NCBI Build37 (hg19) with the Bowtie software (Langmead, et al., 2009) for this data release (http://bowtie-bio.sourceforge.net/index.shtml). The libraries labeled with a protocol version number were completed after January 2010 and were only aligned to NCBI Build37 (hg19). Please refer to the Myers Lab website (http://hudsonalpha.org/myers-lab/protocols) for details on each protocol version. Verification: The MACS (http://liulab.dfci.harvard.edu/MACS/00README.html) peak caller was used to call significant peaks on the individual replicates of a ChIP-Seq experiment. 
Afterwards, the irreproducible discovery rate (IDR) method, developed by Li et al. (submitted), was used to quantify the consistency between pairs of ranked peak lists from replicates. The IDR method uses a model that assumes that the ranked lists of peaks in a pair of replicates consist of two groups - a reproducible group and an irreproducible group. In general, the signals in the reproducible group are more consistent (i.e. with a larger rank correlation coefficient) and are ranked higher than the irreproducible group. The proportion of peaks that belong to the irreproducible component and the correlation of the reproducible component are estimated adaptively from the data. The model also provides an IDR score for each peak, which reflects the posterior probability of the peak belonging to the irreproducible group. The aligned reads were pooled from all replicates and the MACS peak caller was used to call significant peaks on the pooled data. Only datasets containing at least 100 peaks passing the IDR threshold are considered valid and submitted for release.
We have analyzed publicly available K562 Hi-C data, which enables genome-wide unbiased capturing of chromatin interactions, using a Mixture Poisson Regression Model to define a highly specific set of interacting genomic regions. We integrated multiple ENCODE Consortium resources with the Hi-C data, using DNase-seq data and ChIP-seq data for 46 transcription factors and 8 histone modifications. We classified 12 different sets (clusters) of interacting loci that can be distinguished by their chromatin modifications and which can be categorized into three types of chromatin hubs. The different clusters of loci display very different relationships with transcription factor binding sites. As expected, many of the transcription factors show binding patterns specific to clusters composed of interacting loci that encompass promoters or enhancers. However, cluster 6, which is distinguished by marks of open chromatin but not by marks of active enhancers or promoters, was not bound by most transcription factors but was highly enriched for 3 transcription factors (GATA1, GATA2, and c-Jun) and 3 chromatin modifiers (BRG1, INI1, and SIRT6). To validate the identification of the clusters and to dissect the impact of chromatin organization on gene regulation, we performed RNA-seq analyses before and after knockdown of GATA1 or GATA2. We found that knockdown of the GATA factors greatly alters the expression of genes within cluster 6. Our work, in combination with previous studies linking regulation by GATA factors with c-Jun and BRG1, provides genome-wide evidence that Hi-C data identifies sets of biologically relevant interacting loci. RNA-seq of control, siGATA1 and siGATA2 K562 cells
https://ega-archive.org/dacs/EGAC00001001626
This dataset contains CTCF ChIP-sequencing data from seven samples (six tumor samples and one tumor derived cell line). Following library amplification, DNA fragments were sequenced using Illumina HiSeq 2000 paired-end sequencing resulting in 14 FASTQ files.
https://spdx.org/licenses/CC0-1.0.html
ChIP-seq (chromatin immunoprecipitation followed by sequencing) is commonly used to identify genome-wide protein-DNA interactions. However, ChIP-seq often gives a low yield, which is not ideal for quantitative outcomes. An alternative method to ChIP-seq is ChEC-seq (Chromatin endogenous cleavage with high-throughput sequencing). In this method, the endogenous TF (transcription factor) of interest is fused with MNase (micrococcal nuclease) that non-specifically cleaves DNA near binding sites. Compared to the original ChEC-seq method, the modified version requires far less amplification. Since MACS3 failed to identify peaks in data generated from the modified ChEC-seq method, a new peak finder has been developed specifically for it. There are three functions in the peak_finder/. callpeaks() is used to identify peaks from BAM files. goanalysis() is used to make GO (Gene Ontology) term plots from peaks. bedtomeme() is a wrapper function to perform MEME analysis in R after MEME Suite is installed locally. Methods ****EXCERPTED FROM BIORXIV PREPRINT; SEE PREPRINT OR PUBLISHED PAPER FOR REFERENCES AND DETAILS**** Yeast strains All yeast strains were derived from BY4741. A C-terminal micrococcal nuclease fusion was introduced to the protein of interest through transformation and homologous recombination of PCR-amplified DNA. Primers were designed with 50-bp of homology to the 3’ end of the coding sequence of interest. The 3xFLAG-MNase with a KanR marker was amplified from pGZ108 (Zentner et al., 2015) and transformed into BY4741 as previously described. Successful transformation was confirmed by immunoblotting and PCR, followed by sequencing. Lyophilized DNA oligonucleotides were resuspended in molecular-grade water to a concentration of 100 µM. For ligation, the following pair of oligonucleotides were annealed to produce the Y-adapter: Tn5ME-A (5’-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG-3’) and Y-Adapt-i5 R (5’-CTGTCTCTTATACACATCTTCATAGTAATCATC-3’). 
For Tn5 Tagmentation, the following i7 oligonucleotides were annealed: Tn5ME-B (5’ -GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG-3’) and Tn5MErev, (5’-PO4-CTGTCTCTTATACACATCT-3’). Pairs of oligonucleotides were annealed as follows: 45 µl of each oligo (100 µM) was combined with 10 µl of 1 M Potassium Acetate, 300 mM HEPES, pH 7.5 in a 0.2 ml PCR tube. In a thermocycler, the mixture was heated to 95˚C for 4 minutes, cooled 1°C/minute until 50°C, incubated at 50°C for 5 minutes, and then cooled 1°C/minute until 4°C. Hybridized oligos were stored in 15 µl aliquots at -20˚C. Tn5 purification and adapter loading Tn5 E54K L372P was purified as previously described (Hennig et al., 2017). We found that Tn5 was sufficiently pure following purification on Ni2+-chromatography and we therefore omitted the final gel filtration step. Purified Tn5 was aliquoted and stored at -80°C. Optimal Tn5 activity was determined by cleaving genomic DNA and assessing fragmentation using the Femto Pulse (Figure S2d), and resulting DNA libraries were confirmed to be of appropriate length for Illumina Sequencing by TapeStation (Figure S2e). Tn5 was thawed on ice and 100 µl Tn5 was added to 10 µl i7 (45 µM) in a 1.7 ml tube and mixed by gently pipetting. The mixture was incubated at 23°C, mixing at 350 rpm for 45 minutes. Adapter-loaded Tn5 was stored at -20°C and used within 24 hours.
Chromatin endogenous cleavage detailed protocol
Chromatin digestion
1. Grow cells in 10 ml overnight at 30°C, 200 rpm.
2. Dilute cultures into 50 ml media to OD600 ~ 0.1.
3. When cultures reach OD600 = 0.5 - 0.8, harvest 25 ODs (i.e. 50 ml if the OD600 = 0.5) of cells by centrifugation at 2500 x g for 1 minute.
4. Resuspend cells in 1 ml Buffer A and transfer to a 1.5 ml tube.
5. Pellet cells by centrifugation at 2500 x g for 1 minute, remove supernatant.
6. Wash cells 2 x 1 ml Buffer A, removing supernatant.
7. Resuspend cells in 600 µl Buffer A + 0.1% Digitonin.
8. Transfer tube to a 30°C heat block and incubate for 5 minutes.
9. Add 5 µl of 333 mM CaCl2, mix by inverting, incubate at 30°C for the appropriate cleavage time (determined empirically for each protein).
10. To stop the reaction, remove 200 µl cells and combine with 200 µl 2x Stop Buffer.
11. Add 8 µl Proteinase K (20 µg/µl) and mix.
12. Incubate at 50°C, agitating 800 rpm for 30 minutes in a thermomixer.
13. Remove samples from thermomixer and cool at room temperature for 5 minutes.
14. Add 400 µl Phenol-Chloroform-Isoamyl Alcohol (25:24:1), pH 7.8, mix.
15. Centrifuge at 24,000 x g, 5 minutes.
16. Transfer aqueous phase to a phase-lock tube.
17. Add 200 µl Phenol-Chloroform-Isoamyl Alcohol (25:24:1), pH 7.8.
18. Invert 10x to mix.
19. Centrifuge at 24,000 x g for 5 minutes.
20. Transfer aqueous phase to a tube containing 1 ml 100% Ethanol.
21. Add 2 µl of linear acrylamide (5 µg/µl).
22. Invert 10x to mix.
23. Incubate at -80°C for 30 minutes.
24. Centrifuge at 24,000 x g, 4°C for 10 minutes.
25. Pour off supernatant.
26. Wash DNA pellet in 1 ml of 70% ethanol.
27. Centrifuge at 24,000 x g for 1 minute.
28. Pour off supernatant. Collect residual ethanol by centrifugation and remove by pipetting.
29. Dry DNA pellet until all ethanol has evaporated.
30. Add 58 µl of 10 mM Tris-HCl, pH 8.5 to DNA pellet.
31. Incubate overnight at room temperature.
32. Incubate at 37°C for 30 minutes.
33. Add 2 µl RNase A (10 µg/µl) to DNA.
34. Incubate at 37°C for 30 minutes.
35. Evaluate molecular weight of DNA by gel electrophoresis, 0.8% agarose or TapeStation.
36. Quantify DNA concentration with the Qubit double-stranded DNA, Broad Range Assay.
37. Store DNA at 4°C until library preparation is performed (up to a month), then store at -20°C.
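Step 3 of the protocol harvests a fixed 25 OD units of cells, so the culture volume to collect depends on the measured OD600 (the protocol's own example is 50 ml at OD600 = 0.5). The arithmetic can be sketched as below; the function name and the range check are illustrative assumptions, not part of the protocol.

```python
def harvest_volume_ml(od600, target_od_units=25.0):
    """Culture volume (ml) containing the target number of OD units.

    OD units = OD600 x volume in ml, so volume = target / OD600.
    The 0.5-0.8 harvest window comes from step 3 of the protocol.
    """
    if not 0.5 <= od600 <= 0.8:
        raise ValueError("protocol harvests cultures at OD600 between 0.5 and 0.8")
    return target_od_units / od600


volume_low = harvest_volume_ml(0.5)
volume_high = harvest_volume_ml(0.8)
```

At the upper end of the harvest window (OD600 = 0.8) the same 25 OD units are contained in 31.25 ml.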
Buffer A
15 mM Tris-HCl, pH 7.5
80 mM KCl
0.1 mM EGTA
1.0 mM PMSF
0.5 mM Spermidine
0.2 mM Spermine
Add 1 EDTA-Free Protease Inhibitor Tab per 50 ml Buffer A (Roche; Sigma # 11873580001)

2x Stop Buffer
400 mM NaCl
20 mM EDTA
4 mM EGTA
Bioinformatic analysis Quality Control, Trimming, and Mapping Read quality and sequencer performance were evaluated with FASTQC. Reads were adapter and quality trimmed with Trimmomatic (Bolger et al., 2014) using single-end settings. Bases at either end of a read were trimmed if base-call quality was less than 30, and only reads of length ≥25 bp were retained. Trimmed reads were mapped to the Saccharomyces cerevisiae genome (Engel et al., 2013), version R64-4-1 with Bowtie2 (Langmead and Salzberg, 2012) and mapped reads with a MAPQ <10 were removed with Samtools (Li et al., 2009). DoubleChEC identification of high-confidence TF binding sites For peak calling analysis, BAM files for three or more biological replicates of the TF-MNase and soluble MNase were read and trimmed to the first base pair. Unnormalized counts and normalized counts per million (CPMn) were tallied for each base pair in the yeast genome and the average CPMn values among replicates were calculated for each position. Next, mean CPMn values were smoothed using a sliding window of 3 and a step size of 2. Windows with CPMn values less than three times the genome average were filtered out. After this filtering, local maxima (windows with values greater than their immediate neighbors) were identified. Unnormalized reads were smoothed, retaining positions that were identified as local maxima, and input into DESeq2 (version 1.36.0) to identify windows with values significantly higher than those in the soluble MNase control. Only TF-MNase peaks with a log2 fold change greater than 1.7 and an adjusted p-value less than 0.0001 over soluble MNase were retained. Finally, the peaks were filtered again to identify doublet peaks that are between 15 bp and 50 bp apart, which were merged to single peaks. GO term plot A list of genes whose 700 bp upstream regions overlap with peaks identified by the peak finder was input to enrichGO (Wu et al., 2021) to generate GO term plots based on biological functions. 
The 10 most significant GO terms with adjusted p-values less than 0.05 were plotted. MEME analyses The MEME Suite (version 5.5.1) was installed onto the local computer and two custom wrapper functions were written in R for the local bed2fasta and meme programs. These functions were then used to convert bed files, generated from peak calling, into FASTA files. These FASTA files were subsequently used to generate motif logos. Both bed2fasta and meme programs were run using their default parameter values.
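The smoothing, filtering, and local-maximum steps of the peak finder can be sketched in a few lines. This is a rough Python illustration under stated assumptions, not the published R implementation: it assumes smoothing means averaging within each window, that the genome average is the mean of the per-base values, and that each window is compared against its immediate neighbors whether or not they passed the filter. The function name and input layout are hypothetical.

```python
def candidate_peak_windows(cpm, window=3, step=2, fold=3.0):
    """Return start positions of windows that are candidate peaks.

    cpm: per-base mean CPM values across replicates (list of floats).
    A window is kept if its mean is at least `fold` times the genome
    average and strictly greater than both neighboring windows.
    """
    if len(cpm) < window:
        return []
    genome_avg = sum(cpm) / len(cpm)
    starts = list(range(0, len(cpm) - window + 1, step))
    smoothed = [sum(cpm[s:s + window]) / window for s in starts]

    peaks = []
    for i, value in enumerate(smoothed):
        if value < fold * genome_avg:
            continue  # below three times the genome average
        left = smoothed[i - 1] if i > 0 else float("-inf")
        right = smoothed[i + 1] if i + 1 < len(smoothed) else float("-inf")
        if value > left and value > right:
            peaks.append(starts[i])
    return peaks


cpm = [0.0] * 20
cpm[9], cpm[10], cpm[11], cpm[12] = 10.0, 30.0, 10.0, 5.0
peak_starts = candidate_peak_windows(cpm)
```

In the described workflow, the surviving windows would then be tested against the soluble MNase control with DESeq2; that step is outside the scope of this sketch.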
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
See "Read Me" document and "Data Dictionary" file for detailed information. ChIP-seq and ATAC-seq: processed and ready for visualization in a public genome browser (.bigwig).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Chromatin immunoprecipitation followed by sequencing (ChIP-seq) analysis was performed to evaluate whether CgCrzA plays a role in regulating CWI-related genes. Compared to the control, the ChIP-seq samples exhibited enrichment of CgCrzA-bound DNA fragments under CFW conditions
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Transcriptomic profiling is an immensely powerful hypothesis generating tool. However, accurately predicting the transcription factors (TFs) and cofactors that drive transcriptomic differences between samples is challenging. A number of algorithms draw on ChIP-seq tracks to define TFs and cofactors behind gene changes. These approaches assign TFs and cofactors to genes via a binary designation of ‘target’ or ‘non-target’, followed by Fisher Exact Tests to assess enrichment of TFs and cofactors. ENCODE archives 2314 ChIP-seq tracks of 684 TFs and cofactors assayed across 117 human cell lines under a multitude of growth and maintenance conditions. The algorithm presented herein, Mining Algorithm for GenetIc Controllers (MAGIC), uses ENCODE ChIP-seq data to look for statistical enrichment of TFs and cofactors in gene bodies and flanking regions in gene lists without an a priori binary classification of genes as targets or non-targets. When compared to other TF mining resources, MAGIC displayed favourable performance in predicting TFs and cofactors that drive gene changes in 4 settings: 1) A cell line expressing or lacking a single TF, 2) Breast tumors divided along PAM50 designations, 3) Whole brain samples from WT mice or mice lacking a single TF in a particular neuronal subtype, 4) Single cell RNAseq analysis of neurons divided by Immediate Early Gene expression levels. In summary, MAGIC is a standalone application that produces meaningful predictions of TFs and cofactors in transcriptomic experiments.