https://ega-archive.org/dacs/EGAC00001002224
This dataset contains ATAC-seq data generated in the MM.1S cell line under EtOH (control) or dexamethasone (treatment) conditions.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We generated a total of 17 simulated datasets from bulk ATAC-seq data of bone marrow, which contains six FACS-sorted cell populations. Following a previously published benchmarking framework for scATAC-seq tools, we set the parameter n, which determines the fragment count within a single cell, to 250, 500, 1500, 2500, and 5000, obtaining five datasets of varying sequencing depth. We set the parameter q, which controls the proportion of cell-specific reads, to 0, 0.1, 0.2, 0.3, and 0.4, obtaining five datasets of differing noise levels. Lastly, we randomly dropped valid reads at rates ranging from 10% to 70%, generating seven datasets with varying degrees of dropout. Additionally, we collected 11 publicly available scATAC-seq datasets with given cell type labels to validate the effectiveness of scAGDE. These datasets, generated on different platforms and including human and mouse samples, vary in sparsity and scale. Six datasets annotated through computational approaches were ‘Forebrain’ (GSE100033), ‘Splenocyte’ (E-MTAB-6714), ‘GM12878vsHEK’ (GSE65360), ‘GM12878vsHL’ (GSE149683), ‘Lung’ (GSE149683), and ‘Liver’ (GSE65360). Three datasets containing FACS-sorted cell populations were ‘Blood2K’ (GSE96772), ‘10XBlood’ (GSE129785), and ‘DropBlood’ (GSE123581). The remaining two datasets were ‘Leukemia’ (GSE74310), which mixes cells from a healthy donor with leukemia cells from two acute myeloid leukemia (AML) patients, and ‘InSilico’ (GSE65360), which combines six individual scATAC-seq datasets from distinct cell lines. The human fetal atlas dataset from Domcke et al. can be obtained from the public resource at https://descartes.brotmanbaty.org/bbi/human-chromatin-during-development/. The human brain dataset, downloadable from GSE184462, comes from a single-cell scATAC-seq atlas of the human genome.
The reference single-cell RNA-seq dataset from brain tissue used in our study can be found at GSE207334 and utilizes data from human samples. All processed datasets for benchmarking analysis and the human brain dataset have been deposited in the Zenodo database at https://doi.org/10.5281/zenodo.11609252.
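The read-dropout step described above can be illustrated with a minimal sketch. It is not the authors' simulation code: the function name `drop_reads`, the toy count matrix, and the choice of seven evenly spaced rates (10%, 20%, ..., 70%) are assumptions made for illustration; the description only states that rates range from 10% to 70% across seven datasets.

```python
import random

def drop_reads(counts, rate, seed=0):
    """Randomly discard each read with probability `rate`.

    `counts` is a cells x peaks matrix of integer read counts
    (a hypothetical toy representation, not the authors' format).
    """
    rng = random.Random(seed)
    return [
        # Each of the c reads survives independently with probability 1 - rate.
        [sum(1 for _ in range(c) if rng.random() >= rate) for c in row]
        for row in counts
    ]

# Assumed rates: seven evenly spaced values from 10% to 70%.
DROPOUT_RATES = [round(0.1 * i, 1) for i in range(1, 8)]

toy_counts = [[3, 0, 5], [1, 2, 0]]
simulated = {rate: drop_reads(toy_counts, rate, seed=42) for rate in DROPOUT_RATES}
```

Each entry of `simulated` holds one thinned copy of the toy matrix; counts can only stay the same or decrease, and zero entries stay zero.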
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) is a fundamental epigenomics approach and has been widely used in profiling the chromatin accessibility dynamics in multiple species. A comprehensive reference of ATAC-seq datasets for mammalian tissues is important for the understanding of regulatory specificity and developmental abnormality caused by genetic or environmental alterations. Here, we report a mouse ATAC-seq atlas by producing a total of 66 ATAC-seq profiles from 20 primary tissues of both male and female mice. The ATAC-seq read enrichment, fragment size distribution, and reproducibility between replicates demonstrated the high quality of the full dataset.
https://ega-archive.org/dacs/EGAC00001000010
ATAC-seq data, 72 cases
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
It contains ATAC-seq data files from Shim et al. 2021. Reference: https://arxiv.org/abs/2106.13634
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Assay for Transposase Accessible Chromatin by sequencing (ATAC-seq) is becoming popular in the neuroscience field, where chromatin regulation is thought to be involved in neurodevelopment, activity-dependent gene regulation, hormonal and environmental responses, and the pathophysiology of neuropsychiatric disorders. The advantages of ATAC-seq include the small amount of material needed, a fast protocol, and the ability to capture a range of gene regulatory elements with a single assay. With increasing interest in chromatin research, it is imperative to have feasible, reliable assays that are compatible with a range of neuroscience study designs. Here we tested three protocols for neuronal chromatin accessibility analysis that vary in the brain tissue freezing method, followed by fluorescence-activated nuclei sorting (FANS) and ATAC-seq. Our study shows that the cryopreservation method impacts the number of open chromatin regions identified from frozen brain tissue using ATAC-seq. However, we show that all protocols generate consistent and robust data and enable the identification of functional regulatory elements in neuronal cells. Our study implies that the broad biological interpretation of chromatin accessibility data is not significantly affected by the freezing condition. We also reveal additional challenges of doing chromatin analysis on post-mortem human brain tissue. Overall, ATAC-seq coupled with FANS is a powerful method to capture cell-type-specific chromatin accessibility information in mouse and human brain. Our study provides alternative brain preservation methods that generate high-quality ATAC-seq data while fitting different study designs, and further encourages the use of this method to uncover the role of epigenetic (dys)regulation in the brain.
https://ega-archive.org/dacs/EGAC00001003193
RNA-seq and ATAC-seq data for the FMF patients and healthy control. The RNA-seq data were sequenced on a BGI MGI G400 machine with PE100 reads. ATAC-seq libraries were prepared with Illumina Nextera primers and sequenced on the NovaSeq 6000 platform with 50 bp paired-end sequencing, where each sample was sequenced to approximately 60 million reads.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Gene transcription is largely regulated by cis-regulatory elements. Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) is an emerging technology that can accurately map cis-regulatory elements in animals and plants. However, the presence of cell walls and chloroplasts in plants hinders the extraction of high-quality nuclei, thereby affecting the quality of ATAC-seq data. Meanwhile, it is difficult to perform ATAC-seq on different tissue types, especially those of limited size and amount. Moreover, despite the rapid growth of ATAC-seq datasets from plants, powerful and easy-to-use data analysis pipelines for ATAC-seq, especially for wheat, are lacking. Here, we provide an all-in-one solution for mapping open chromatin in wheat, including both experimental and data analysis procedures. We efficiently obtained nuclei with less cell debris from various wheat tissues. We generated high-quality ATAC-seq data from young spike and ovary, which are hard to harvest. We determined that the saturation sequencing depth of wheat ATAC-seq is about 16 Gb. In particular, we developed a powerful and easy-to-use online pipeline to analyze wheat ATAC-seq data, and this pipeline can be easily extended to other plant species. The method developed here will facilitate regulatory genome studies not only in wheat but also in other plant species.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
This dataset contains single-cell ATAC sequencing data from nineteen cases of childhood BCP-ALL and four samples of mononuclear cells from normal bone marrow from healthy donors. The dataset is available as raw sequencing reads (fastq; restricted access) or as an annotated ATAC dataset (h5ad). The libraries were prepared according to the manufacturer’s instructions (10x Genomics CG000169: Nuclei Isolation for Single Cell ATAC Sequencing; 10x Genomics CG000209: Chromium Single Cell ATAC Reagent Kits v1.1) and sequenced on a NovaSeq 6000.
https://ega-archive.org/dacs/EGAC00001000515
High-grade glioma sample; gender: male; age: 46. Single-nucleus ATAC-seq data from high-grade human glioma samples. A NovaSeq 6000 was used for ATAC-seq. The uploaded files are BAM files created with the GRCh38 reference.
Aire is a transcriptional regulator that induces promiscuous expression of thousands of tissue-restricted antigen (TRA) genes in medullary thymic epithelial cells (mTECs). While the target genes of Aire are well characterized, the transcriptional programs regulating its own expression remain elusive. We used Affymetrix microarrays to analyze the gene expression patterns of Aire-expressing cells (mature mTECs and thymic B cells) and compared them to control counterparts, namely immature mTECs, cortical thymic epithelial cells (cTECs), and splenic B cells. We used the assay for transposase-accessible chromatin using sequencing (ATAC-Seq) on the different thymic epithelial cell populations to assess chromatin accessibility around the Aire locus in these cells. Moreover, we used the indexing-first chromatin immunoprecipitation (iChIP) technique to assess the occupancy of the Irf8 transcription factor at the Aire locus. Overall design: Mature EpCAM+MHC-II high mTECs, immature EpCAM+MHC-II low mTECs, and EpCAM+Ly51+ cTECs were flow-sorted from thymi of 6-week-old C57BL/6 mice. These cells were then subjected to ATAC-Seq. Typically, 10 thousand cells were used per replicate. Mature EpCAM+MHC-II high mTECs and immature EpCAM+MHC-II low mTECs were flow-sorted and pooled from thymi of 6-week-old C57BL/6 mice, and iChIP using an Irf8-specific antibody was then performed. Typically, 250 thousand cells were used per replicate.
ChIP-Atlas is a database and web interface that provides analysis results processed from all ChIP-Seq data archived in the Sequence Read Archive. We have curated the metadata described by the original data submitters to enable further data analysis. See details here: https://github.com/inutano/chip-atlas/wiki
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data used for tutorial.
https://ega-archive.org/dacs/EGAC00001000162
Open chromatin regions in the MYC super-enhancer region were investigated by ATAC-seq in t(3;8) AML. ATAC-seq was performed as described (Buenrostro et al, 2013) with a modification in the lysis buffer (0.30 M sucrose, 10 mM Tris pH 7.5, 60 mM KCl, 15 mM NaCl, 5 mM MgCl2, 0.1 mM EGTA, 0.1% NP40, 0.15 mM Spermine, 0.5 mM Spermidine, 2 mM 6AA) to reduce mitochondrial DNA contamination.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract
Transcription factors read the genome, fundamentally connecting DNA sequence to gene expression across diverse cell types. Determining how, where, and when TFs bind chromatin will advance our understanding of gene regulatory networks and cellular behavior. The 2017 ENCODE-DREAM in vivo Transcription-Factor Binding Site (TFBS) Prediction Challenge highlighted the value of chromatin accessibility data to TFBS prediction, establishing state-of-the-art methods for TFBS prediction from DNase-seq. However, the more recent Assay-for-Transposase-Accessible-Chromatin (ATAC)-seq has surpassed DNase-seq as the most widely-used chromatin accessibility profiling method. Furthermore, ATAC-seq is the only such technique available at single-cell resolution from standard commercial platforms. While ATAC-seq datasets grow exponentially, suboptimal motif scanning is unfortunately the most common method for TFBS prediction from ATAC-seq. To enable community access to state-of-the-art TFBS prediction from ATAC-seq, we (1) curated an extensive benchmark dataset (127 TFs) for ATAC-seq model training and (2) built “maxATAC”, a suite of user-friendly, deep neural network models for genome-wide TFBS prediction from ATAC-seq in any cell type. With models available for 127 human TFs, maxATAC is the first collection of high-performance TFBS prediction models for ATAC-seq.
Repository Overview
This repository contains all of the processed training data used by maxATAC for model training and benchmarking. All directories are provided as .tar.gz archives.
In this repository you will find the directories:
ATAC_Peaks: ATAC-seq peak files called with MACS2. These files are generated for the hg38 reference genome. The files have the extension .bed.gz.
ATAC_Signal_File: ATAC-seq signal file. This file has been read-depth normalized and min-max normalized between 0 and 1 using the 99th percentile max value. These files are presented as bigwig files with a .bw extension.
ChIP_Binding_File: ChIP-seq signal tracks. These files are the binary signal tracks in bigwig format that are found in the ChIP_Peaks directory.
ChIP_Peaks: ChIP-seq peak files. This directory contains the ENCODE IDR peak sets and peak sets created in the maxATAC publication. These files have the extension .bed.gz.
Full_Models: Current set of 127 maxATAC TF models. This directory includes the information for thresholding and the .h5 model files.
hg38: This directory includes the hg38 reference genome information that was used in this publication.
Prediction_and_Benchmarking: This directory contains all of the predictions for chr1 used for benchmarking in a round-robin training approach.
Tn5_CutSites: This directory contains the Tn5 cut sites that have been shifted +4 on the (+) strand and -5 on the (-) strand. The cut sites were then slopped 20 bp using bedtools slop. These files are presented as bed files that have been bzipped. Each file represents an individual biological replicate.
scATAC: This directory includes data used for scATAC-seq based predictions.
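The Tn5 cut-site shifting and the signal normalization described for these directories can be sketched as follows. This is an illustrative sketch, not the maxATAC implementation: the function names, the 0-based half-open coordinate convention, the treatment of the cut site as a 1 bp interval before slopping, and the clipping of values above the 99th percentile to 1 are all assumptions.

```python
def shift_and_slop(strand, start, end, chrom_size, slop=20):
    """Shift a read to its Tn5 cut site (+4 on '+', -5 on '-') and
    extend the 1 bp cut site by `slop` bp on each side.

    Coordinates are assumed to be 0-based, half-open (BED style).
    """
    # Assumed convention: '+' reads are shifted from their start,
    # '-' reads from their end.
    cut = start + 4 if strand == "+" else end - 5
    # Mimic slopping a 1 bp interval [cut, cut + 1), clipped to the chromosome.
    return max(0, cut - slop), min(chrom_size, cut + 1 + slop)

def percentile(values, q):
    """Percentile with linear interpolation between closest ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def min_max_p99(values):
    """Min-max normalize to [0, 1], using the 99th percentile as the max.

    Values above the 99th percentile are clipped to 1 (an assumption).
    """
    lo = min(values)
    hi = percentile(values, 99)
    if hi == lo:
        return [0.0 for _ in values]
    return [min(1.0, (v - lo) / (hi - lo)) for v in values]

window_plus = shift_and_slop("+", 100, 150, chrom_size=1000)   # (84, 125)
window_minus = shift_and_slop("-", 100, 150, chrom_size=1000)  # (125, 166)
normalized = min_max_p99(list(range(101)))
```

Using the 99th percentile rather than the absolute maximum keeps a few extreme signal values from compressing the rest of the track toward zero.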
For additional details please see the maxATAC GitHub Repository and bioRxiv pre-print.
Salignon et al. created Cactus, a new pipeline that can be used for comprehensive ATAC-Seq and mRNA-Seq data analysis. Cactus contains multiple unique functions compared to other, similar pipelines, e.g. enrichment in chromatin states and ChIP-Seq binding sites.
https://ega-archive.org/dacs/EGAC00001002719
ATAC-seq data. Dataset includes FASTQ files, BAM files, and analysis files with the ATAC-seq peaks determined using MACS2.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
As chromatin accessibility provides rich information on the transcription factor binding process, for a given TF-based raw regulon we first test whether the TF motif is enriched in this regulon. To perform this efficiently, we have built our own database in BED format, which contains all available TF motifs and their occurrences across the potential binding regions (TSS ± 10 kb) of all mouse genes.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
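As a rough illustration of the idea, the sketch below builds a TSS ± 10 kb window, checks whether a motif occurrence falls inside it, and scores enrichment with a one-sided hypergeometric test. The description does not state which statistical test is used, so the hypergeometric test is an assumed, commonly used stand-in, and all function names and numbers are hypothetical.

```python
from math import comb

def tss_window(tss, flank=10_000):
    """Potential binding region around a TSS, clipped at position 0."""
    return max(0, tss - flank), tss + flank

def motif_in_window(motif_start, motif_end, window):
    """True if a half-open motif interval overlaps the window."""
    w_start, w_end = window
    return motif_start < w_end and motif_end > w_start

def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when drawing n genes from N, of which K carry the motif.

    k: regulon genes with the motif, n: regulon size,
    K: all genes with the motif, N: all genes considered.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

window = tss_window(5_000)                        # clipped at 0
hit = motif_in_window(14_990, 15_010, window)     # overlaps the window edge
p = hypergeom_pvalue(k=2, n=2, K=5, N=10)         # toy numbers
```

A small p-value would suggest that the motif occurs in the regulon's windows more often than expected by chance; precomputing motif occurrences per gene window, as the BED database does, turns each test into a simple count lookup.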
License information was derived automatically
Chromatin information content landscapes inform transcription factor and DNA interactions
Authors: Ricardo D’Oliveira Albanus, Yasuhiro Kyono, John Hensley, Arushi Varshney, Peter Orchard, Jacob O. Kitzman, Stephen C. J. Parker
https://doi.org/10.1101/777532
This record contains the processed data used in our manuscript. For instructions on how to use or regenerate this data, please refer to https://github.com/ParkerLab/chromatin_information.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data provided here are part of a Galaxy tutorial that analyzes ChIP-seq data from a study published by Wu et al., 2014 (DOI:10.1101/gr.164830.113). The goal of this study was to investigate "the dynamics of occupancy and the role in gene regulation of the transcription factor Tal1, a critical regulator of hematopoiesis, at multiple stages of hematopoietic differentiation." To this end, ChIP-seq experiments were performed in multiple mouse cell types including a G1E cell line and megakaryocytes, the two cell types represented here. The dataset contains biological replicate Tal1 ChIP-seq and input control experiments (*.fastqsanger files). Because of the long processing time for the large original files, we have downsampled the original raw data files to include only reads that align to chromosome 19 and a subset of interesting genomic loci (ChIPseq_regions_of_interest_v4.bed) pulled from the Wu et al. publication. Also included is a gene annotation file (RefSeq_gene_annotations_mm10.bed) with gene names added for viewing in a genome browser.