This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Philip Cayting mailto:pcayting@stanford.edu). If you have questions about the Genome Browser track associated with this data, contact ENCODE (mailto:genome@soe.ucsc.edu).
This track shows probable binding sites of the specified transcription factors (TFs) in the given cell types as determined by chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-Seq). Included for each cell type is the input signal, which represents the control condition where no antibody targeting was performed. For each experiment (cell type vs. antibody) this track shows a graph of enrichment for TF binding (Signal), along with sites that have the greatest evidence of transcription factor binding (Peaks).
For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf
Cells were grown according to the approved ENCODE cell culture protocols. Further preparations were similar to those previously published (Euskirchen et al., 2007), with the exceptions that the cells were unstimulated and sodium orthovanadate was omitted from the buffers. For details on the chromatin immunoprecipitation protocol used, see Euskirchen et al. (2007) and Rozowsky et al. (2009).
DNA recovered from the precipitated chromatin was sequenced on the Illumina (Solexa) sequencing platform and mapped to the genome using the Eland alignment program. ChIP-seq data was scored based on sequence reads (length ~30 bp) that align uniquely to the human genome. From the mapped tags, a signal map of ChIP DNA fragments (average fragment length ~200 bp) was constructed, where the signal height is the number of overlapping fragments at each nucleotide position in the genome.
For each 1 Mb segment of each chromosome, a peak height threshold was determined by requiring a false discovery rate <= 0.05 when comparing the number of peaks above threshold with the number obtained from multiple simulations of a random null background with the same number of mapped reads (also accounting for the fraction of mappable bases for sequence tags in that 1 Mb segment). The number of mapped tags in a putative binding region is compared to the normalized number of mapped tags in the same region from an input DNA control (normalization is done by correlating tag counts in genomic 10 kb windows). Using a binomial test, only regions that have a p-value <= 0.05 are considered to be significantly enriched compared to the input DNA control.
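The signal-map and input-comparison steps described above can be illustrated with a minimal Python sketch. It is not the submitting laboratory's pipeline: the function names, the forward-strand-only tag extension, and the way the normalization factor enters the binomial test (as a null probability `scale / (1 + scale)`) are assumptions made for illustration.

```python
from math import comb


def fragment_pileup(tag_starts, region_length, fragment_length=200):
    """Signal height = number of overlapping fragments at each position.

    Assumption: every tag lies on the forward strand and is extended
    downstream to `fragment_length` (the source quotes ~200 bp on average).
    """
    diff = [0] * (region_length + 1)  # difference array over the region
    for start in tag_starts:
        if 0 <= start < region_length:
            end = min(start + fragment_length, region_length)
            diff[start] += 1
            diff[end] -= 1
    signal, depth = [], 0
    for delta in diff[:region_length]:
        depth += delta
        signal.append(depth)
    return signal


def binomial_enrichment_pvalue(chip_tags, input_tags, scale=1.0):
    """One-sided binomial p-value for ChIP enrichment over input in a region.

    Assumption: `scale` is the expected ChIP-to-input tag ratio under the
    null (the normalization factor), so a tag in the region comes from the
    ChIP sample with probability scale / (1 + scale).
    """
    n = chip_tags + input_tags
    p = scale / (1.0 + scale)
    return sum(
        comb(n, k) * p**k * (1.0 - p) ** (n - k)
        for k in range(chip_tags, n + 1)
    )


if __name__ == "__main__":
    signal = fragment_pileup([10, 50, 120], region_length=400)
    print(max(signal))
    # Hypothetical counts: 30 ChIP tags vs 10 input tags in one region.
    print(binomial_enrichment_pvalue(30, 10) <= 0.05)
```

A region would be kept only if its p-value is <= 0.05, matching the cutoff stated in the text. The simulation-based peak height threshold per 1 Mb segment is not covered by this sketch.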
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Transcriptomic profiling is an immensely powerful hypothesis-generating tool. However, accurately predicting the transcription factors (TFs) and cofactors that drive transcriptomic differences between samples is challenging. A number of algorithms draw on ChIP-seq tracks to define TFs and cofactors behind gene changes. These approaches assign TFs and cofactors to genes via a binary designation of ‘target’ or ‘non-target’, followed by Fisher exact tests to assess enrichment of TFs and cofactors. ENCODE archives 2314 ChIP-seq tracks of 684 TFs and cofactors assayed across 117 human cell lines under a multitude of growth and maintenance conditions. The algorithm presented herein, Mining Algorithm for GenetIc Controllers (MAGIC), uses ENCODE ChIP-seq data to look for statistical enrichment of TFs and cofactors in gene bodies and flanking regions in gene lists without an a priori binary classification of genes as targets or non-targets. When compared to other TF mining resources, MAGIC displayed favourable performance in predicting TFs and cofactors that drive gene changes in 4 settings: 1) a cell line expressing or lacking a single TF; 2) breast tumors divided along PAM50 designations; 3) whole brain samples from WT mice or mice lacking a single TF in a particular neuronal subtype; and 4) single-cell RNA-seq analysis of neurons divided by immediate early gene expression levels. In summary, MAGIC is a standalone application that produces meaningful predictions of TFs and cofactors in transcriptomic experiments.
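For contrast with MAGIC, the binary target/non-target approach attributed above to other algorithms can be sketched as a one-sided Fisher exact test (a hypergeometric tail). This is an illustration of that general technique only, not MAGIC and not any specific tool; the function and parameter names are hypothetical.

```python
from math import comb


def fisher_enrichment_pvalue(targets_in_list, list_size, targets_total, genes_total):
    """One-sided Fisher exact test for over-representation of TF targets.

    Returns P(X >= targets_in_list), where X is the number of binary
    'target' genes found when `list_size` genes are drawn without
    replacement from `genes_total` genes, of which `targets_total`
    are targets of the TF.
    """
    max_k = min(list_size, targets_total)
    denom = comb(genes_total, list_size)
    tail = sum(
        comb(targets_total, k) * comb(genes_total - targets_total, list_size - k)
        for k in range(targets_in_list, max_k + 1)
    )
    return tail / denom


if __name__ == "__main__":
    # Hypothetical numbers: 40 of 200 changed genes are targets of a TF
    # that has 1000 targets among 20000 genes overall.
    print(fisher_enrichment_pvalue(40, 200, 1000, 20000))
```

The test depends entirely on where the target/non-target line is drawn, which is the a priori classification the abstract says MAGIC avoids.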
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We downloaded data for 2,216 ChIP-seq experiments from the ENCODE Project. The list of the data is in Supplementary Table S8. The data were lifted over from hg19 to hg38. We identified overlapping peaks in four different categories: (1) the 500 bp upstream promoter region of pcRNA-associated coding genes, (2) the 500 bp upstream promoter region of pcRNAs, (3) pcRNA genomic loci, and (4) pcRNA genomic loci not overlapping with the promoter region. To understand the correlation of TF binding patterns in the four categories, we made a binary matrix per category that consists of rows of TFs and columns of pcRNA/coding genes. Hence, each matrix contains the connections between TFs and pcRNAs/associated coding genes. The matrix of category 2 was clustered by Euclidean distance. To check the extent to which promoter sharing or proximity determines TFBS correlation, we also separated the pcRNA bidirectional transcript (BIDIR) subgroup from the other subgroups (Non-BIDIR) in the clustered heat-map. To directly compare the TF binding patterns between categories, the other three matrices were sorted in the same order as the clustered matrix. We used the MATLAB function corr2 to calculate the r-value between categories (1) and (2). We performed a Monte Carlo simulation to calculate the p-value and test the significance of the r-value.
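The matrix comparison described above can be sketched in plain Python. The `corr2` function below computes the same 2-D correlation coefficient that MATLAB's corr2 defines (Pearson correlation over all matrix elements). The Monte Carlo null is an assumption: the source does not state how the simulation was done, so this sketch shuffles the gene columns of one matrix, and the function names and toy matrices are hypothetical.

```python
import random
from math import sqrt


def corr2(a, b):
    """2-D correlation coefficient between two equally sized matrices.

    Note: undefined (division by zero) if either matrix is constant.
    """
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    mean_a = sum(flat_a) / len(flat_a)
    mean_b = sum(flat_b) / len(flat_b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(flat_a, flat_b))
    var_a = sum((x - mean_a) ** 2 for x in flat_a)
    var_b = sum((y - mean_b) ** 2 for y in flat_b)
    return cov / sqrt(var_a * var_b)


def monte_carlo_pvalue(a, b, n_iter=1000, seed=0):
    """Empirical one-sided p-value for the observed r between a and b.

    Assumed null: gene columns of `b` are randomly permuted, breaking the
    pairing between each pcRNA and its associated coding gene.
    """
    rng = random.Random(seed)
    observed = corr2(a, b)
    n_cols = len(b[0])
    hits = 0
    for _ in range(n_iter):
        order = list(range(n_cols))
        rng.shuffle(order)
        shuffled = [[row[j] for j in order] for row in b]
        if corr2(a, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_iter + 1)


if __name__ == "__main__":
    # Toy binary matrices: rows are TFs, columns are genes.
    coding = [[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 1]]
    pcrna = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 0, 1]]
    print(monte_carlo_pvalue(coding, pcrna, n_iter=200))
```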
www.encodeproject.org/help/citing-encode/
CTCF - TF ChIP-seq - Homo sapiens HepG2 - ENCODE - U54HG004576 - Richard Myers, HAIB
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ENCODE data sets used in ChromNet (1451 total). (TXT 34 kb)
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Classification dataset for machine learning on epigenomic landscapes from ENCODE and Roadmap Epigenomics. The dataset includes the original peak files (BED format) for the DNase-seq and transcription factor (TF) ChIP-seq experiments that were used to label genomic regions as positives or negatives. The DNase-seq peak files, along with processing details, are in the file "encode-roadmap.DNase-seq.peaks.tar.gz". The TF ChIP-seq peak files, along with processing details, are in the file "encode.ChIP-seq.peaks.tar.gz". The processed dataset, stored as HDF5 files, is in the file "nn.encode-roadmap.hdf5_files.tar.gz" along with processing details.
www.encodeproject.org/help/citing-encode/
H3K4me3 - Histone ChIP-seq - Homo sapiens K562 - ENCODE - U54HG004570 - Bradley Bernstein, Broad
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A large proportion of the SNPs identified by genome-wide association studies are non-coding, which heightens the need for effective identification of regulatory SNPs (rSNPs) among them. However, this task remains challenging because the regulatory part of the human genome is annotated much more poorly than the coding regions. Here we describe an approach that aggregates the whole set of ENCODE ChIP-seq data in order to search for rSNPs, and we provide experimental evidence of its efficiency. The algorithm is based on the assumption that enrichment of a genomic region with transcription factor binding loci (ChIP-seq peaks) indicates a regulatory function, and therefore SNPs located in such a region are more likely to influence transcription regulation. To ensure that the approach preferentially selects functionally meaningful SNPs, we performed enrichment analysis of several human SNP datasets associated with phenotypic manifestations. All samples were significantly enriched with SNPs falling into regions of multiple ChIP-seq peaks compared with randomly selected SNPs. For experimental verification, 40 SNPs falling into overlapping regions of at least 7 TF binding loci were selected from OMIM. The effect of these SNPs on the binding of DNA fragments containing them to nuclear proteins from four human cell lines (HepG2, HeLaS3, HCT-116, and K562) was tested by EMSA. A radical change in the binding pattern was observed for 29 SNPs, and 6 more SNPs showed less pronounced changes. Taken together, the results demonstrate an effective way to search for potential rSNPs with the aid of ChIP-seq data provided by the ENCODE project.
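The selection criterion described above (SNPs lying where several ChIP-seq peaks overlap) can be sketched as follows. This is a naive illustration, not the authors' implementation: the function names, the tuple layouts, the BED-style 0-based half-open peak coordinates, and the toy data are assumptions. The default cutoff of 7 peaks is taken from the text.

```python
from collections import defaultdict


def count_overlapping_peaks(snps, peaks):
    """Count the ChIP-seq peaks that cover each SNP position.

    snps:  iterable of (snp_id, chrom, pos), pos 0-based (assumed).
    peaks: iterable of (chrom, start, end), half-open BED-style (assumed).
    Naive scan per SNP; an interval index would be needed at genome scale.
    """
    peaks_by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        peaks_by_chrom[chrom].append((start, end))
    counts = {}
    for snp_id, chrom, pos in snps:
        counts[snp_id] = sum(
            1 for start, end in peaks_by_chrom[chrom] if start <= pos < end
        )
    return counts


def candidate_rsnps(snps, peaks, min_peaks=7):
    """Return the IDs of SNPs covered by at least `min_peaks` peaks."""
    counts = count_overlapping_peaks(snps, peaks)
    return [snp_id for snp_id, _, _ in snps if counts[snp_id] >= min_peaks]


if __name__ == "__main__":
    # Hypothetical peaks and SNP identifiers, for illustration only.
    peaks = [("chr1", 100, 200), ("chr1", 150, 250), ("chr2", 100, 200)]
    snps = [("rsA", "chr1", 160), ("rsB", "chr1", 220), ("rsC", "chr3", 5)]
    print(candidate_rsnps(snps, peaks, min_peaks=2))
```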
modENCODE_submission_5017 This submission comes from a modENCODE project of Kevin White. For full list of modENCODE projects, see http://www.genome.gov/26524648 Project Goal: The White Lab is aiming to map the association of all the Transcription Factors (TF) on the genome of Drosophila melanogaster. One technique that we use for this purpose is chromatin immunoprecipitation coupled with deep sequencing (ChIP-seq) utilizing an Illumina next generation sequencing platform. The data generated by ChIP-seq experiments consist basically of a plot of signal intensity across the genome. The highest signals correspond to positions in the genome occupied by the tested TF. For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf EXPERIMENT TYPE: CHIP-seq. BIOLOGICAL SOURCE: Strain: Y cn bw sp; Developmental Stage: Embryo 0-8; Genotype: y[1] oc[R3.2]; Gr22b[1] Gr22d[1] cn[1] CG33964[R4.2] bw[1] sp[1]; LysC[1] lab[R4.2] MstProx[1] GstD5[1] Rh6[1]; Sex: Unknown; EXPERIMENTAL FACTORS: Developmental Stage Embryo 0-8; Strain Y cn bw sp; Antibody Su(H) (target is fly genes:Su(H))
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Mapping the chromosomal locations of transcription factors, nucleosomes, histone modifications, chromatin remodeling enzymes, chaperones, and polymerases is one of the key tasks of modern biology, as evidenced by the Encyclopedia of DNA Elements (ENCODE) Project. To this end, chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is the standard methodology. Mapping such protein-DNA interactions in vivo using ChIP-seq presents multiple challenges not only in sample preparation and sequencing but also for computational analysis. Here, we present step-by-step guidelines for the computational analysis of ChIP-seq data. We address all the major steps in the analysis of ChIP-seq data: sequencing depth selection, quality checking, mapping, data normalization, assessment of reproducibility, peak calling, differential binding analysis, controlling the false discovery rate, peak annotation, visualization, and motif analysis. At each step in our guidelines we discuss some of the software tools most frequently used. We also highlight the challenges and problems associated with each step in ChIP-seq data analysis. We present a concise workflow for the analysis of ChIP-seq data in Figure 1 that complements and expands on the recommendations of the ENCODE and modENCODE projects. Each step in the workflow is described in detail in the following sections.
Control ChIP-seq on embryonic 13.5 day mouse liver For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODE_Data_Use_Policy_for_External_Users_03-07-14.pdf https://www.encodeproject.org/ENCSR220PXJ/
H3K9ac ChIP-seq on embryonic 14.5 day mouse intestine For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODE_Data_Use_Policy_for_External_Users_03-07-14.pdf https://www.encodeproject.org/ENCSR260OUP/
www.encodeproject.org/help/citing-encode/
CTCF - TF ChIP-seq - Homo sapiens GM12878 - ENCODE - U54HG004570 - Bradley Bernstein, Broad
www.encodeproject.org/help/citing-encode/
CTCF - TF ChIP-seq - Homo sapiens HeLa-S3 - ENCODE - U54HG004570 - Bradley Bernstein, Broad
ChIP-Seq on HepG2 For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODE_Data_Use_Policy_for_External_Users_03-07-14.pdf https://www.encodeproject.org/ENCSR224NQI/
H3K9me3 ChIP-seq on embryonic 13.5 day mouse hindbrain For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODE_Data_Use_Policy_for_External_Users_03-07-14.pdf https://www.encodeproject.org/ENCSR325JLI/
H3K27me3 ChIP-seq on embryonic 15.5 day mouse kidney For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODE_Data_Use_Policy_for_External_Users_03-07-14.pdf https://www.encodeproject.org/ENCSR740DYF/
modENCODE_submission_2437 This submission comes from a modENCODE project of Michael Snyder. For full list of modENCODE projects, see http://www.genome.gov/26524648 Project Goal: We are identifying the DNA binding sites for 300 transcription factors in C. elegans. Each transcription factor gene is tagged with the same GFP fusion protein, permitting validation of the gene's correct spatio-temporal expression pattern in transgenic animals. Chromatin immunoprecipitation on each strain is performed using an anti-GFP antibody, and any bound DNA is deep-sequenced using Solexa GA2 technology. For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf EXPERIMENT TYPE: CHIP-seq. BIOLOGICAL SOURCE: Strain: N2(genotype : wild type genotype : DR subclone of DB original (Tc1 pattern I) official name : N2 ); Developmental Stage: fed L1; Genotype: wild type; Sex: Hermaphrodite; EXPERIMENTAL FACTORS: Developmental Stage fed L1; Target gene ama-1; Strain N2(genotype : wild type genotype : DR subclone of DB original (Tc1 pattern I) official name : N2 ); temp (temperature) 20 degree celsius Series_type: CHIP-seq
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Appendix C Morrissey Dissertation
H3K9ac ChIP-seq on embryonic 11.5 day mouse embryonic facial prominence For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODE_Data_Use_Policy_for_External_Users_03-07-14.pdf https://www.encodeproject.org/ENCSR076FAM/